f5cf1a36ad
There is a scenario in which `active_con` is properly chosen but isn't marked as `ready_to_send`. If `renew_subs()` is called during `on_session_opened()`, the flag is turned on only after the subscriptions are renewed, which cannot happen because the renewal requires the flag to be already set. In other words, there is a circular data dependency. The net result is a stalled subscription machinery, particularly the `OSDMap` subs. This caused a nasty peering issue at Sepia [1] where PG 2.7 got stuck in the `GetInfo` state.

```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136908$ less ./remote/smithi039/log/ceph-osd.1.log.gz
...
DEBUG 2021-05-26 20:19:48,134 [shard 0] osd - pg_epoch 14 pg[2.7( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0) [] r= -1 lpr=0 crt=0'0 mlcod 0'0 unknown enter Initial
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=0 crt=0'0 mlcod 0'0 unknown enter Reset
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Started
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Start
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Started/Primary
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating enter Started/Primary/Peering
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetInfo
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering build_prior all_probe
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering build_prior final: probe 0,1 down blocked_by {}
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering up_thru 0 < same_since 14, must notify monitor
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: no prior_set down osds, clearing prior_readable_until_ub
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: querying info from osd.0
...
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering got osd.0 2.7( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0)
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Adding osd: 0 peer features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common peer features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common acting features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common upacting features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering exit Started/Primary/Peering/GetInfo 0.099480 4 2021-05-26T20:19:48.146172+0000
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetLog
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetMissing
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/WaitUpThru
...
DEBUG 2021-05-26 20:19:49,139 [shard 0] osd - pg_epoch 15 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating enter Started/Primary/Active
...
DEBUG 2021-05-26 20:19:49,142 [shard 0] osd - pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+activating enter Started/Primary/Active/Activating
...
DEBUG 2021-05-26 20:19:49,204 [shard 0] osd - pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Started/Primary/Active/Recovered
...
DEBUG 2021-05-26 20:19:49,204 [shard 0] osd - pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Started/Primary/Active/Clean
...
DEBUG 2021-05-26 20:22:31,223 [shard 0] osd - pg_epoch 86 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Reset
...
<a lot of flipping>
...
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163 ) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown activate_map
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163 ) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown exit Reset 0.035744 1 2021-05-26T20:24:07.817331+0000
INFO 2021-05-26 20:24:07,851 [shard 0] osd - Exiting state: Reset, entered at 1622060647.8158188, 1622060647.8173316 spent on 1 events
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163 ) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started
INFO 2021-05-26 20:24:07,851 [shard 0] osd - Entering state: Started
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163 ) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Start
INFO 2021-05-26 20:24:07,851 [shard 0] osd - Entering state: Start
INFO 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163 ) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown state<Start>: transitioning to Primary
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163 ) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown exit Start 0.000041 0 0.000000
INFO 2021-05-26 20:24:07,851 [shard 0] osd - Exiting state: Start, entered at 1622060647.8516333, 0.0 spent on 0 events
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163 ) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started/Primary
INFO 2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163 ) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started/Primary/Peering
INFO 2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary/Peering
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering enter Started/Primary/Peering/GetInfo
INFO 2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary/Peering/GetInfo
...
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior all_probe 0,1,4
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior maybe_rw interval:139, acting: 0
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior final: probe 0,1,4 down blocked_by {}
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering up_thru 125 < same_since 163, must notify monitor
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>: no prior_set down osds, clearing prior_readable_until_ub
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>: querying info from osd.0
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>: querying info from osd.4
...
DEBUG 2021-05-26 20:24:07,924 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] connect to existing
DEBUG 2021-05-26 20:24:07,924 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] --> #62 === pg_query2(2.7 2.7 query(info 0'0 epoch_sent 163) e163/163) v1 (131)
...
DEBUG 2021-05-26 20:24:07,942 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] GOT AckFrame: seq=62
...
<plenty of osd_ping messaging but no reply to the pg_query for 2.7>
...
DEBUG 2021-05-26 20:58:19,829 [shard 0] ms - [osd.1(hb_front) v2:172.21.15.39:6807/34727 >> osd.4 v2:172.21.15.62:6807/34686@54816] <== #772 === osd_ping(ping e17 up_from 10 ping_stamp 2021-05-26T20:58:19.825573+0000/2319.780029297s send_stamp 2319.780029297s) v5 (70)
DEBUG 2021-05-26 20:58:19,829 [shard 0] ms - [osd.1(hb_front) v2:172.21.15.39:6807/34727 >> osd.4 v2:172.21.15.62:6807/34686@54816] --> #772 === osd_ping(ping_reply e249 up_from 10 ping_stamp 2021-05-26T20:58:19.825573+0000/2319.780029297s send_stamp 2320.039062500s) v5 (70
```

The peering request got stuck because it was waiting for an `OSDMap`.

```
DEBUG 2021-05-26 20:24:07,930 [shard 0] ms - [osd.4(cluster) v2:172.21.15.62:6802/34686 >> osd.1 v2:172.21.15.39:6803/34727@61064] <== #62 === pg_query2(2.7 2.7 query(info 0'0 epoch_sent 163) e163/163) v1 (131)
DEBUG 2021-05-26 20:24:07,930 [shard 0] osd - handle_peering_op on 2.7 from 1
DEBUG 2021-05-26 20:24:07,930 [shard 0] osd - peering_event(id=517, detail=PeeringEvent(from=1 pgid=2.7 sent=163 requested=163 evt=epoch_sent: 163 epoch_requested: 163 MQuery 2.7 from 1 query_epoch 163 query: query(info 0'0 epoch_sent 163))): star
```

```
INFO 2021-05-26 20:19:49,127 [shard 0] osd - evt epoch is 15, i have 14, will wait
INFO 2021-05-26 20:19:49,128 [shard 0] osd - osdmap_subscribe(14)
DEBUG 2021-05-26 20:19:49,128 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #9 === mon_subscribe({osdmap=14}) v3 (15)
...
INFO 2021-05-26 20:19:49,131 [shard 0] osd - handle_osd_map osd_map(14..15 src has 1..15) v4
INFO 2021-05-26 20:19:49,131 [shard 0] osd - handle_osd_map epochs [14..15], i have 15, src has [1..15]
...
INFO 2021-05-26 20:19:49,138 [shard 0] osd - handle_osd_map osd_map(14..15 src has 1..15) v4
INFO 2021-05-26 20:19:49,138 [shard 0] osd - handle_osd_map epochs [14..15], i have 15, src has [1..15]
...
INFO 2021-05-26 20:19:49,139 [shard 0] osd - evt epoch is 15, i have 14, will wait
INFO 2021-05-26 20:19:49,141 [shard 0] osd - osdmap_subscribe(14)
WARN 2021-05-26 20:19:49,141 [shard 0] monc - renew_subs - empty
...
INFO 2021-05-26 20:19:50,140 [shard 0] osd - handle_osd_map osd_map(15..16 src has 1..16) v4
INFO 2021-05-26 20:19:50,140 [shard 0] osd - handle_osd_map epochs [15..16], i have 15, src has [1..16]
DEBUG 2021-05-26 20:19:50,141 [shard 0] bluestore - do_transaction
INFO 2021-05-26 20:19:50,145 [shard 0] osd - osd.4: committed_osd_maps(16, 16)
...
INFO 2021-05-26 20:20:42,881 [shard 0] osd - handle_osd_map epochs [16..17], i have 16, src has [1..17]
DEBUG 2021-05-26 20:20:42,882 [shard 0] bluestore - do_transaction
INFO 2021-05-26 20:20:42,886 [shard 0] osd - osd.4: committed_osd_maps(17, 17)
...
INFO 2021-05-26 20:20:43,941 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO 2021-05-26 20:20:43,941 [shard 0] osd - osdmap_subscribe(17)
...
INFO 2021-05-26 20:20:43,957 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO 2021-05-26 20:20:43,957 [shard 0] osd - osdmap_subscribe(17)
...
INFO 2021-05-26 20:20:43,969 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO 2021-05-26 20:20:43,969 [shard 0] osd - osdmap_subscribe(17)
...
DEBUG 2021-05-26 20:20:46,930 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@57288 >> mon.2 v2:172.21.15.39:3301/0] <== #4 === osd_map(20..21 src has 1..21) v4 (41)
INFO 2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map osd_map(20..21 src has 1..21) v4
INFO 2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map epochs [20..21], i have 17, src has [1..21]
INFO 2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map message skips epochs 18..19
INFO 2021-05-26 20:20:46,930 [shard 0] osd - osdmap_subscribe(18)
...
DEBUG 2021-05-26 20:20:47,936 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@57288 >> mon.2 v2:172.21.15.39:3301/0] <== #5 === osd_map(21..22 src has 1..22) v4 (41)
INFO 2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map osd_map(21..22 src has 1..22) v4
INFO 2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map epochs [21..22], i have 17, src has [1..22]
INFO 2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map message skips epochs 18..20
INFO 2021-05-26 20:20:47,936 [shard 0] osd - osdmap_subscribe(18)
...
<osdmap_subscribe(18) over and over>
```

```
2021-05-26T20:19:42.048+0000 7f4712ffd700 1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] <== osd.4 v2:172.21.15.62:6801/34686 4 ==== mon_subscribe({mgrmap=0+,osd_pg_creates=0+,osdmap=0+}) v3 ==== 82+0+0 (secure 0 0 0) 0x7f46fc04e150 con 0x7f470401c480
2021-05-26T20:19:42.048+0000 7f4712ffd700 20 mon.b@1(peon) e1 _ms_dispatch existing session 0x7f46fc02f500 for osd.4
2021-05-26T20:19:42.048+0000 7f4712ffd700 20 mon.b@1(peon) e1 entity_name osd.4 global_id 4168 (new_ok) caps allow *
2021-05-26T20:19:42.048+0000 7f4712ffd700 10 mon.b@1(peon) e1 handle_subscribe mon_subscribe({mgrmap=0+,osd_pg_creates=0+,osdmap=0+}) v3
...
2021-05-26T20:19:49.129+0000 7f4712ffd700 1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] <== osd.4 v2:172.21.15.62:6801/34686 9 ==== mon_subscribe({osdmap=14}) v3 ==== 36+0+0 (secure 0 0 0) 0x7f46e8556210 con 0x7f470401c480
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon) e1 _ms_dispatch existing session 0x7f46fc02f500 for osd.4
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon) e1 entity_name osd.4 global_id 4168 (new_ok) caps allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon) e1 handle_subscribe mon_subscribe({osdmap=14}) v3
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 is_capable service=mon command= read addr v2:172.21.15.62:6801/34686 on cap allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 allow so far , doing grant allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 allow all
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 is_capable service=osd command= read addr v2:172.21.15.62:6801/34686 on cap allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 allow so far , doing grant allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 allow all
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon).osd e15 check_osdmap_sub 0x7f46e84f0150 next 14 (onetime)
2021-05-26T20:19:49.129+0000 7f4712ffd700 5 mon.b@1(peon).osd e15 send_incremental [14..15] to osd.4
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon).osd e15 build_incremental [14..15] with features 3f01cfbb7ffdffff
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon).osd e15 build_incremental inc 15 622 bytes
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon).osd e15 build_incremental inc 14 578 bytes
2021-05-26T20:19:49.129+0000 7f4712ffd700 1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] --> v2:172.21.15.62:6801/34686 -- osd_map(14..15 src has 1..15) v4 -- 0x7f46e856a100 con 0x7f470401c480
```

```
seastar::future<> Client::renew_subs()
{
  if (!sub.have_new()) {
    logger().warn("{} - empty", __func__);
    return seastar::now();
  }
  logger().trace("{}", __func__);

  auto m = crimson::make_message<MMonSubscribe>();
  m->what = sub.get_subs();
  m->hostname = ceph_get_short_hostname();
  return send_message(std::move(m)).then([this] {
    sub.renewed();
  });
}
```

```
INFO 2021-05-26 20:19:42,081 [shard 0] osd - osdmap_subscribe(1)
DEBUG 2021-05-26 20:19:42,081 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #6 === mon_subscribe({osdmap=1}) v3 (15)
...
INFO 2021-05-26 20:19:49,128 [shard 0] osd - osdmap_subscribe(14)
DEBUG 2021-05-26 20:19:49,128 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #9 === mon_subscribe({osdmap=14}) v3 (15)
...
INFO 2021-05-26 20:19:49,141 [shard 0] osd - osdmap_subscribe(14)
WARN 2021-05-26 20:19:49,141 [shard 0] monc - renew_subs - empty
<no MMonSubscribe>
...
INFO 2021-05-26 20:20:43,941 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO 2021-05-26 20:20:43,941 [shard 0] osd - osdmap_subscribe(17)
<no MMonSubscribe>
...
INFO 2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map message skips epochs 18..19
INFO 2021-05-26 20:20:46,930 [shard 0] osd - osdmap_subscribe(18)
<no MMonSubscribe>
```

[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136908

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Ceph - a scalable distributed storage system
Please see http://ceph.com/ for current info.
Contributing Code
Most of Ceph is dual licensed under the LGPL version 2.1 or 3.0. Some miscellaneous code is under BSD-style license or is public domain. The documentation is licensed under Creative Commons Attribution Share Alike 3.0 (CC-BY-SA-3.0). There are a handful of headers included here that are licensed under the GPL. Please see the file COPYING for a full inventory of licenses by file.
Code contributions must include a valid "Signed-off-by" acknowledging the license for the modified or contributed file. Please see the file SubmittingPatches.rst for details on what that means and on how to generate and submit patches.
We do not require assignment of copyright to contribute code; code is contributed under the terms of the applicable license.
Checking out the source
You can clone from github with
git clone git@github.com:ceph/ceph
or, if you are not a github user,
git clone git://github.com/ceph/ceph
Ceph contains many git submodules that need to be checked out with
git submodule update --init --recursive
Build Prerequisites
The Debian or RPM package dependencies can be installed with:
./install-deps.sh
Building Ceph
Note that these instructions are meant for developers who are
compiling the code for development and testing. To build binaries
suitable for installation we recommend you build deb or rpm packages,
or refer to the ceph.spec.in or debian/rules to see which
configuration options are specified for production builds.
Build instructions:
./do_cmake.sh
cd build
ninja
(do_cmake.sh now defaults to creating a debug build of ceph that can be up to 5x slower with some workloads. Please pass "-DCMAKE_BUILD_TYPE=RelWithDebInfo" to do_cmake.sh to create a non-debug release.
The number of jobs used by ninja is derived from the number of CPU cores of
the building host if unspecified. Use the -j option to limit the number of
jobs if the build jobs are running out of memory. On average, each job takes
around 2.5 GiB of memory.)
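As a rough illustration of the memory guidance above, the following sketch derives a -j value from the core count and the available memory. The helper name `ninja_jobs` is hypothetical and not part of the Ceph build scripts, and the 2.5 GiB per-job figure is only the average quoted above, so treat the result as a starting point rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Ceph tree): pick a ninja job count
# that fits both the CPU core count and the available memory.
# Usage: ninja_jobs <cores> <available_memory_in_KiB>
ninja_jobs() {
    cores=$1
    mem_kib=$2
    # 2.5 GiB expressed in KiB: 2.5 * 1024 * 1024 = 2621440
    per_job_kib=2621440
    by_mem=$((mem_kib / per_job_kib))
    # Always allow at least one job, even on very small hosts.
    if [ "$by_mem" -lt 1 ]; then
        by_mem=1
    fi
    # Use whichever limit is smaller: memory-bound or core-bound.
    if [ "$by_mem" -lt "$cores" ]; then
        echo "$by_mem"
    else
        echo "$cores"
    fi
}

# Example with fixed numbers: 16 cores and 16 GiB (16777216 KiB) of memory.
ninja_jobs 16 16777216   # prints 6
```

On a Linux host, one possible way to feed it real values (an assumption about the environment, not something the README prescribes) is `ninja -j"$(ninja_jobs "$(nproc)" "$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)")"`.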
This assumes you make your build dir a subdirectory of the ceph.git
checkout. If you put it elsewhere, just point CEPH_GIT_DIR to the correct
path to the checkout. Any additional CMake args can be specified by setting
ARGS before invoking do_cmake.sh. See CMake Options for more details. For example:
ARGS="-DCMAKE_C_COMPILER=gcc-7" ./do_cmake.sh
To build only certain targets use:
ninja [target name]
To install:
ninja install
CMake Options
If you run the cmake command by hand, there are many options you can
set with "-D". For example, the option to build the RADOS Gateway
defaults to ON. To build without the RADOS Gateway:
cmake -DWITH_RADOSGW=OFF [path to top level ceph directory]
Another example below is building with debugging and alternate locations for a couple of external dependencies:
cmake -DLEVELDB_PREFIX="/opt/hyperleveldb" \
-DCMAKE_INSTALL_PREFIX=/opt/ceph -DCMAKE_C_FLAGS="-O0 -g3 -gdwarf-4" \
..
To view an exhaustive list of -D options, you can invoke cmake with:
cmake -LH
If you often pipe ninja to less and would like to maintain the
diagnostic colors for errors and warnings (and if your compiler
supports it), you can invoke cmake with:
cmake -DDIAGNOSTICS_COLOR=always ..
Then you'll get the diagnostic colors when you execute:
ninja | less -R
Other available values for 'DIAGNOSTICS_COLOR' are 'auto' (default) and 'never'.
Building a source tarball
To build a complete source tarball with everything needed to build from source and/or build a (deb or rpm) package, run
./make-dist
This will create a tarball like ceph-$version.tar.bz2 from git. (Ensure that any changes you want to include in your working directory are committed to git.)
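Because the tarball is created from git, a quick check for uncommitted work before running make-dist can avoid surprises. The sketch below relies on `git status --porcelain`, which prints nothing when the working tree has no modified or untracked files. The helper name `tree_is_clean` is hypothetical and not part of the Ceph scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Ceph tree): succeed only when the
# git working tree in the given directory has nothing uncommitted.
# Usage: tree_is_clean [directory]   (defaults to the current directory)
tree_is_clean() {
    dir=${1:-.}
    # Fail early when the directory is not inside a git working tree.
    git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1 || return 2
    # --porcelain prints one line per modified or untracked path and
    # nothing at all for a clean tree. Untracked files are flagged too,
    # which is deliberately conservative.
    [ -z "$(git -C "$dir" status --porcelain)" ]
}

if tree_is_clean .; then
    echo "working tree is clean"
else
    echo "not a clean git working tree; commit changes before ./make-dist"
fi
```

Running the check from the top of the checkout before `./make-dist` gives an early warning about changes that have not been committed yet.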
Running a test cluster
To run a functional test cluster,
cd build
ninja vstart # builds just enough to run vstart
../src/vstart.sh --debug --new -x --localhost --bluestore
./bin/ceph -s
Almost all of the usual commands are available in the bin/ directory. For example,
./bin/rados -p rbd bench 30 write
./bin/rbd create foo --size 1000
To shut down the test cluster,
../src/stop.sh
To start or stop individual daemons, the sysvinit script can be used:
./bin/init-ceph restart osd.0
./bin/init-ceph stop
Running unit tests
To build and run all tests (in parallel using all processors), use ctest:
cd build
ninja
ctest -j$(nproc)
(Note: Many targets built from src/test are not run using ctest.
Targets starting with "unittest" are run in ninja check and thus can
be run with ctest. Targets starting with "ceph_test" cannot, and should
be run by hand.)
When failures occur, look in build/Testing/Temporary for logs.
To build and run all tests and their dependencies without other unnecessary targets in Ceph:
cd build
ninja check -j$(nproc)
To run an individual test manually, run ctest with -R (regex matching):
ctest -R [regex matching test name(s)]
(Note: ctest does not build the test it's running or the dependencies needed
to run it)
To run an individual test manually and see all of the test's output, run
ctest with the -V (verbose) flag:
ctest -V -R [regex matching test name(s)]
To run tests manually and run the jobs in parallel, run ctest with
the -j flag:
ctest -j [number of jobs]
There are many other flags you can give ctest for better control
over manual test execution. To view these options run:
man ctest
Building the Documentation
Prerequisites
The list of package dependencies for building the documentation can be
found in doc_deps.deb.txt:
sudo apt-get install `cat doc_deps.deb.txt`
Building the Documentation
To build the documentation, ensure that you are in the top-level
/ceph directory, and execute the build script. For example:
admin/build-doc