[Sheepdog] Segmentation faults and cluster failure

Shawn Moore smmoore at gmail.com
Fri Sep 16 16:25:19 CEST 2011


I apologize in advance for this really long post, but I like to
provide as much data as I can up front.

I have been experimenting with sheepdog for about a day, and one of
our largest issues was how to handle replicating to different data
centers.  After browsing the listserv and git repo, it looks like
zones can handle this.

So last night I tore down the cluster and set it up again with zones 1
and 2 and copies set to 2.  That worked just like I wanted it to.

DC 1		DC 2
node173		node156
node174		node157

Today I did some testing on failing nodes.  When I killed one of them
everything seemed OK, and when I brought it back it appeared to
re-sync just fine.

I waited until I saw "recovery complete" and then about 5 minutes
later, I killed an entire DC (or zone).  In this example I killed DC2
which had nodes 156 and 157.

Listing which nodes had the vdi object, I could see that one node had
the object (there should be 2) and one didn't, which I understand
because the "mirror" side was down.

So then I brought zone 2 back up and could see recovery starting to
take place.  Once I saw "recovery complete" I ran:
[node156 ~]# collie node info
Id	Size	Used	Use%
 0	386 GB	21 GB	  5%
 1	381 GB	17 GB	  4%
 2	398 GB	21 GB	  5%
 3	394 GB	17 GB	  4%

Total	1.5 TB	76 GB	  4%, total virtual VDI Size	100 GB


Looks good I think.  Then I do:
[node156 ~]# collie node list
   Idx - Host:Port          Vnodes       Zone
---------------------------------------------
     0 - 192.168.0.156:7000 	64          2
     1 - 192.168.0.157:7000 	64          2
*    2 - 192.168.0.173:7000 	64          1
     3 - 192.168.0.174:7000 	64          1

Still looks good.  Then when I do:
[node156 ~]# collie cluster info
Cluster status: running

Creation time        Epoch Nodes
2011-09-15 20:21:18     15 [192.168.0.156:7000, 192.168.0.157:7000,
192.168.0.173:7000, 192.168.0.174:7000]
2011-09-15 20:21:18     14 [192.168.0.156:7000, 192.168.0.173:7000,
192.168.0.174:7000]
2011-09-15 20:21:18     13 [192.168.0.173:7000, 192.168.0.174:7000]
2011-09-15 20:21:18     12 [192.168.0.156:7000, 192.168.0.173:7000,
192.168.0.174:7000]
2011-09-15 20:21:18     11 [192.168.0.156:7000, 192.168.0.157:7000,
192.168.0.173:7000, 192.168.0.174:7000]
2011-09-15 20:21:18     10 [192.168.0.156:7000, 192.168.0.173:7000,
192.168.0.174:7000]
2011-09-15 20:21:18      9 [192.168.0.156:7000, 192.168.0.157:7000,
192.168.0.173:7000, 192.168.0.174:7000]
2011-09-15 20:21:18      8 [192.168.0.157:7000, 192.168.0.173:7000,
192.168.0.174:7000]
1996-09-04 21:47:32 825112369 [3030:3000:7d7f:0:4682:c294:7d7f:0,
80f4:e394:7d7f:0:90e0:3ae2:ff7f:0:57504,
a8e0:3ae2:ff7f:0:100:0:100:0:21887, a50b:4000::d0de:e394:7d7f:0:21418,
3f00:0:7d7f:0:300:::14641, 3034:2032:313a:3437:3a33:3200:ff7f:0,
5a97:4000::5c78:9694:7d7f:0:46, c0ea:6000::f877:8a94:7d7f:0:50776,
403c:8a94:7d7f:0:ffff:ffff:::58232, 100::28c2:6000:0:0,
2000:0:2f00:0:1500:0:400:0:8, 300:0:f700:0:100:0:7d7f:0:51136,
9022:b801::bf00:0:3b8a:0:34560, 0:0:f0a0:500::, :::12596,
::8000:300:3:1c7f:1504:1:58232, 300::b0e1:3ae2:ff7f:0:57948,
::5a97:4000:0:0:9011, 100::78e3:3ae2:ff7f:0:58232,
ca00::2095:4000:0:0:50624, c04f:4000::dba1:4000:0:0:51328,
::13a3:4000:0:0:51520, c06a:4000::, ::4fe2:3ae2:ff7f:0:1,
::88a9:9294:7d7f:0, c085:4000:::5779,
98e3:3ae2:ff7f:0:586:4000:::64360, :::7088, 70e3:3ae2:ff7f::,
5dec:8b94:7d7f:::58232, ::300:0:201e:4000:0:0,
4b16:7372:df8d:7014:b01b:4000:::58224, :::5707,
4b16:43aa:c8a4:8bea::ff7f:0, ::c085:4000:0:0:58232, 300::,
b01b:4000::70e3:3ae2:ff7f:0, d91b:4000::68e3:3ae2:ff7f:0:28,
300::16f9:3ae2:ff7f:0:63773, 25f9:3ae2:ff7f:::63786,
47f9:3ae2:ff7f:0:52f9:3ae2:ff7f:0:63842,
70f9:3ae2:ff7f:0:93f9:3ae2:ff7f:0:63910,
b0f9:3ae2:ff7f:0:affe:3ae2:ff7f:0:65225,
15ff:3ae2:ff7f:0:1fff:3ae2:ff7f:0:65328,
47ff:3ae2:ff7f:0:4fff:3ae2:ff7f:0:65370,
67ff:3ae2:ff7f:0:9dff:3ae2:ff7f:0:65471, d4ff:3ae2:ff7f:::33,
f0:3fe2:ff7f:0:1000:::62463, 600::10:0:0:0:17, 6400::300:0:0:0:64,
400::3800:0:0:0:5, 800::700:0:0:0:61440, 800:::9, b01b:4000:0:0:b00::,
c00:::13, ::e00:0:0:0, 1700:::25, 79e5:3ae2:ff7f:0:1f00:::65511,
f00::89e5:3ae2:ff7f:0, :::63232,
5439:b9ef:4638:8a25:8b78:3836:5f36:3400, ::, ::, ::, ::, ::, ::, ::,
::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::,
::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::,
::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::,
::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::,
::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::,
::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::,
::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::,
::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::,
::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::,
::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::,
::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::,
::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::, ::,
6c6c:6965:63:6c75:7374:6572:69:6e66:111,
4d45:3d6e:6f64:6531:3536:2e63:6174:6177:24930,
4552:4d3d:7874:6572:6d00:5348:454c:4c3d:25135,
6800:4849:5354:5349:5a45:3d31:3030:3000:21331,
4e54:3d31:3532:2e34:312e:3231:312e:3131:8249,
3232:53:5348:5f54:5459:3d2f:6465:762f:29808,
4552:3d72:6f6f:7400:4c53:5f43:4f4c:4f52:15699,
693d:3031:3b33:343a:6c6e:3d30:313b:3336:27962,
693d:3430:3b33:333a:736f:3d30:313b:3335:25658,
353a:6264:3d34:303b:3333:3b30:313a:6364:13373,
313a:6f72:3d34:303b:3331:3b30:313a:6d69:12349,
373b:3431:3a73:753d:3337:3b34:313a:7367:13117,
613d:3330:3b34:313a:7477:3d33:303b:3432:28474,
323a:7374:3d33:373b:3434:3a65:783d:3031:13115,
723d:3031:3b33:313a:2a2e:7467:7a3d:3031:13115,
6a3d:3031:3b33:313a:2a2e:7461:7a3d:3031:13115,
683d:3031:3b33:313a:2a2e:6c7a:6d61:3d30:15153,
6c7a:3d30:313b:3331:3a2a:2e74:787a:3d30:15153,
6970:3d30:313b:3331:3a2a:2e7a:3d30:313b:12595,
313b:3331:3a2a:2e64:7a3d:3031:3b33:313a:11818,
3331:3a2a:2e6c:7a3d:3031:3b33:313a:2a2e:31352,
3a2a:2e62:7a32:3d30:313b:3331:3a2a:2e74:31330,
3a2a:2e74:627a:323d:3031:3b33:313a:2a2e:31330,
3a2a:2e74:7a3d:3031:3b33:313a:2a2e:6465:15714,
2a2e:7270:6d3d:3031:3b33:313a:2a2e:6a61:15730,
2a2e:7261:723d:3031:3b33:313a:2a2e:6163:15717,
2a2e:7a6f:6f3d:3031:3b33:313a:2a2e:6370:28521,
3a2a:2e37:7a3d:3031:3b33:313a:2a2e:727a:12349,
2e6a:7067:3d30:313b:3335:3a2a:2e6a:7065:15719,
2a2e:6769:663d:3031:3b33:353a:2a2e:626d:15728,
2a2e:7062:6d3d:3031:3b33:353a:2a2e:7067:15725,
2a2e:7070:6d3d:3031:3b33:353a:2a2e:7467:15713,
2a2e:7862:6d3d:3031:3b33:353a:2a2e:7870:15725,
2a2e:7469:663d:3031:3b33:353a:2a2e:7469:26214,
3a2a:2e70:6e67:3d30:313b:3335:3a2a:2e73:26486,
3a2a:2e73:7667:7a3d:3031:3b33:353a:2a2e:28269,
353a:2a2e:7063:783d:3031:3b33:353a:2a2e:28525,
353a:2a2e:6d70:673d:3031:3b33:353a:2a2e:28781,
3335:3a2a:2e6d:3276:3d30:313b:3335:3a2a:27950,
3335:3a2a:2e6f:676d:3d30:313b:3335:3a2a:27950,
3335:3a2a:2e6d:3476:3d30:313b:3335:3a2a:27950,
3b33:353a:2a2e:766f:623d:3031:3b33:353a:11818,
3335:3a2a:2e6e:7576:3d30:313b:3335:3a2a:30510,
3335:3a2a:2e61:7366:3d30:313b:3335:3a2a:29230,
353a:2a2e:726d:7662:3d30:313b:3335:3a2a:26158,
3335:3a2a:2e61:7669:3d30:313b:3335:3a2a:26158,
3335:3a2a:2e66:6c76:3d30:313b:3335:3a2a:26414,
353a:2a2e:646c:3d30:313b:3335:3a2a:2e78:26211,
3a2a:2e78:7764:3d30:313b:3335:3a2a:2e79:30325,
3a2a:2e63:676d:3d30:313b:3335:3a2a:2e65:26221,
3a2a:2e61:7876:3d30:313b:3335:3a2a:2e61:30830,
3a2a:2e6f:6776:3d30:313b:3335:3a2a:2e6f:30823,
3a2a:2e61:6163:3d30:313b:3336:3a2a:2e61:15733,
2a2e:666c:6163:3d30:313b:3336:3a2a:2e6d:25705,
3a2a:2e6d:6964:693d:3031:3b33:363a:2a2e:27501,
363a:2a2e:6d70:333d:3031:3b33:363a:2a2e:28781,
363a:2a2e:6f67:673d:3031:3b33:363a:2aSegmentation fault
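
A note on the garbage entries above: several of them decode to readable
ASCII, which suggests collie printed memory past the end of the real
node list as if it were more address:port pairs.  The sketch below is
only an illustration under stated assumptions: each entry is taken to
be 16 address bytes (printed as eight hex groups) followed by a 16-bit
port printed in decimal and stored little-endian on the host.  The
function names are made up for this example, and entries shortened
with "::" are not handled.

```python
import string

# Characters shown as-is; every other byte is rendered as "."
PRINTABLE = set(string.ascii_letters + string.digits + string.punctuation + " ")


def decode_entry(entry: str) -> bytes:
    """Rebuild the 18 raw bytes behind one 'g1:...:g8:port' entry.

    Assumption: 8 hex groups (2 bytes each, printed high byte first)
    plus a decimal port that was stored little-endian in memory.
    """
    parts = entry.strip().split(":")
    if len(parts) != 9 or "" in parts:
        raise ValueError("only fully written entries (8 groups + port) are handled")
    raw = bytearray()
    for group in parts[:8]:
        raw += int(group, 16).to_bytes(2, "big")
    raw += int(parts[8]).to_bytes(2, "little")
    return bytes(raw)


def as_text(raw: bytes) -> str:
    """Show printable ASCII and replace everything else with '.'."""
    return "".join(chr(b) if chr(b) in PRINTABLE else "." for b in raw)


# First entry after the long run of "::" in the output above
print(as_text(decode_entry("6c6c:6965:63:6c75:7374:6572:69:6e66:111")))
# prints: llie.cluster.info.
```

That entry reads like the tail of the command line itself ("collie
cluster info"), and the entries after it look like environment
variables, so the segfault may simply be the point where the read ran
off mapped memory.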


So that doesn't look good.  Then I ran the same command, "collie
cluster info", on the other zone 2 node and it did the exact same
thing: "Segmentation fault".  Then I went to the zone 1 nodes and ran
it there, and they segfaulted as well.  So now the cluster is
completely down.  I have tried to find the node with the highest epoch
and start it back up first, but no matter what I do I can't get the
cluster up; it always segfaults.

On node173 I get:
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000001
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969200000000
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000002
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969200000000
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000003
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969200000000
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000004
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969200000000
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000005
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969200000000
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000006
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969200000000
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000007
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969300000000
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969200000000
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000008
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969300000000
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969200000000
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000009
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969200000000
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969300000000
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000010
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969200000000
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969300000000
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000011
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969200000000
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969300000000
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000012
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000013
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969200000000
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969300000000
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000014
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000015
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969200000000
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969300000000
Sep 16 09:41:01 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000016
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969300000000
Sep 16 09:41:01 init_epoch_path(1934) found the vdi obj, 80f5969200000000
Sep 16 09:41:01 jrnl_recover(2240) Openning the directory
/node/sheepdog/journal/00000016/.
Sep 16 09:41:01 worker_routine(206) started this thread 0
Sep 16 09:41:01 worker_routine(206) started this thread 0
Sep 16 09:41:01 worker_routine(206) started this thread 3
Sep 16 09:41:01 worker_routine(206) started this thread 1
Sep 16 09:41:01 worker_routine(206) started this thread 0
Sep 16 09:41:01 worker_routine(206) started this thread 0
Sep 16 09:41:01 worker_routine(206) started this thread 0
Sep 16 09:41:01 worker_routine(206) started this thread 1
Sep 16 09:41:01 worker_routine(206) started this thread 2
Sep 16 09:41:01 worker_routine(206) started this thread 2
Sep 16 09:41:01 worker_routine(206) started this thread 3
Sep 16 09:41:01 set_addr(1723) addr = 192.168.0.173, port = 7000
Sep 16 09:41:01 create_cluster(1778) zone id = 1
Sep 16 09:41:01 main(167) Sheepdog daemon (version 0.2.3) started
Sep 16 09:41:01 sd_confchg(1621) confchg nodeid add92998
Sep 16 09:41:01 sd_confchg(1623) 1 0 1
Sep 16 09:41:01 sd_confchg(1627) [0] node_id: -1378276968, pid: 19921, reason: 0
Sep 16 09:41:01 sd_confchg(1641) allow new confchg, 0x24e5020
Sep 16 09:41:01 start_cpg_event_work(1465) 0 0
Sep 16 09:41:01 cpg_event_fn(1279) 0x24e5020, 0 2
Sep 16 09:41:01 cpg_event_done(1315) 0x24e5020
Sep 16 09:41:01 __sd_confchg_done(1206) 19921 add92998
Sep 16 09:41:01 update_cluster_info(683) l nodeid: add92998, pid:
19921, ip: 192.168.0.173:7000
Sep 16 09:41:01 cpg_event_done(1373) free 0x24e5020


Then after this I try to bring up node174.  node174 says:
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000001
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000002
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000003
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000004
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000005
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000006
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000007
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969500000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969400000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969600000000
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000008
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969600000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969400000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969500000000
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000009
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969600000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969500000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969400000000
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000010
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969600000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969500000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969400000000
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000011
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969600000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969400000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969500000000
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000012
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000013
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969500000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969400000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969600000000
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000014
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000015
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969600000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969400000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969500000000
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000016
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969400000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969500000000
Sep 16 09:41:18 init_epoch_path(1934) found the vdi obj, 80f5969600000000
Sep 16 09:41:18 init_epoch_path(1913) found the obj dir,
/node/sheepdog/obj//00000017
Sep 16 09:41:18 jrnl_recover(2240) Openning the directory
/node/sheepdog/journal/00000017/.
Sep 16 09:41:18 worker_routine(206) started this thread 0
Sep 16 09:41:18 worker_routine(206) started this thread 0
Sep 16 09:41:18 worker_routine(206) started this thread 1
Sep 16 09:41:18 worker_routine(206) started this thread 0
Sep 16 09:41:18 worker_routine(206) started this thread 0
Sep 16 09:41:18 worker_routine(206) started this thread 1
Sep 16 09:41:18 worker_routine(206) started this thread 0
Sep 16 09:41:18 worker_routine(206) started this thread 2
Sep 16 09:41:18 worker_routine(206) started this thread 2
Sep 16 09:41:18 worker_routine(206) started this thread 3
Sep 16 09:41:18 worker_routine(206) started this thread 3
Sep 16 09:41:18 set_addr(1723) addr = 192.168.0.174, port = 7000
Sep 16 09:41:18 create_cluster(1778) zone id = 1
Sep 16 09:41:18 main(167) Sheepdog daemon (version 0.2.3) started
Sep 16 09:41:18 sd_confchg(1621) confchg nodeid add92998
Sep 16 09:41:18 sd_confchg(1623) 2 0 1
Sep 16 09:41:18 sd_confchg(1627) [0] node_id: -1378276968, pid: 19921,
reason: -1308593377
Sep 16 09:41:18 sd_confchg(1627) [1] node_id: -1361499752, pid: 24781,
reason: 6485728
Sep 16 09:41:18 sd_confchg(1641) allow new confchg, 0x269a020
Sep 16 09:41:18 start_cpg_event_work(1465) 0 0
Sep 16 09:41:18 cpg_event_fn(1279) 0x269a020, 0 2
Sep 16 09:41:18 cpg_event_done(1315) 0x269a020
Sep 16 09:41:18 send_join_request(1168) 2933467544 24781
Sep 16 09:41:18 cpg_event_done(1373) free 0x269a020
Sep 16 09:41:18 sd_deliver(987) op: 1, state: 1, size: 32840, from:
192.168.0.174:7000, nodeid: 2933467544, pid: 24781
Sep 16 09:41:18 sd_deliver(996) allow new deliver, 0x269a160
Sep 16 09:41:18 start_cpg_event_work(1465) 0 1
Sep 16 09:41:18 cpg_event_fn(1279) 0x269a160, 1 2
Sep 16 09:41:18 cpg_event_fn(1293) 1
Sep 16 09:41:18 __sd_deliver(839) op: 1, state: 1, size: 32840, from:
192.168.0.174:7000, pid: 24781
Sep 16 09:41:18 cpg_event_done(1315) 0x269a160
Sep 16 09:41:18 __sd_deliver_done(955) op: 1, state: 1, size: 32840,
from: 192.168.0.174:7000
Sep 16 09:41:18 cpg_event_done(1373) free 0x269a160
Sep 16 09:41:18 sd_deliver(987) op: 1, state: 3, size: 32840, from:
192.168.0.174:7000, nodeid: 2916690328, pid: 19921
Sep 16 09:41:18 sd_deliver(996) allow new deliver, 0x269a160
Sep 16 09:41:18 start_cpg_event_work(1465) 0 1
Sep 16 09:41:18 cpg_event_fn(1279) 0x269a160, 1 2
Sep 16 09:41:18 cpg_event_fn(1293) 3
Sep 16 09:41:18 __sd_deliver(839) op: 1, state: 3, size: 32840, from:
192.168.0.174:7000, pid: 24781
Sep 16 09:41:18 cpg_event_done(1315) 0x269a160
Sep 16 09:41:18 update_cluster_info(611) failed to join sheepdog, 65
Sep 16 09:41:18 __sd_deliver_done(955) op: 1, state: 3, size: 32840,
from: 192.168.0.174:7000
Sep 16 09:41:18 cpg_event_done(1373) free 0x269a160


During this same time, node173 says:
Sep 16 09:41:18 sd_confchg(1621) confchg nodeid add92998
Sep 16 09:41:18 sd_confchg(1623) 2 0 1
Sep 16 09:41:18 sd_confchg(1627) [0] node_id: -1378276968, pid: 19921,
reason: 1404584655
Sep 16 09:41:18 sd_confchg(1627) [1] node_id: -1361499752, pid: 24781,
reason: 6485728
Sep 16 09:41:18 sd_confchg(1641) allow new confchg, 0x24e5020
Sep 16 09:41:18 start_cpg_event_work(1465) 0 0
Sep 16 09:41:18 cpg_event_fn(1279) 0x24e5020, 0 2
Sep 16 09:41:18 cpg_event_done(1315) 0x24e5020
Sep 16 09:41:18 __sd_confchg_done(1232) l nodeid: add92998, pid:
19921, ip: 192.168.0.173:7000
Sep 16 09:41:18 cpg_event_done(1373) free 0x24e5020
Sep 16 09:41:18 sd_deliver(987) op: 1, state: 1, size: 32840, from:
192.168.0.174:7000, nodeid: 2933467544, pid: 24781
Sep 16 09:41:18 sd_deliver(996) allow new deliver, 0x24e51a0
Sep 16 09:41:18 start_cpg_event_work(1465) 0 1
Sep 16 09:41:18 cpg_event_fn(1279) 0x24e51a0, 1 2
Sep 16 09:41:18 cpg_event_fn(1293) 1
Sep 16 09:41:18 __sd_deliver(839) op: 1, state: 1, size: 32840, from:
192.168.0.174:7000, pid: 24781
Sep 16 09:41:18 cpg_event_done(1315) 0x24e51a0
Sep 16 09:41:18 __sd_deliver_done(955) op: 1, state: 1, size: 32840,
from: 192.168.0.174:7000
Sep 16 09:41:18 get_cluster_status(435) sheepdog is waiting with older
epoch, 17 16 192.168.0.174:7000
Sep 16 09:41:18 cpg_event_done(1373) free 0x24e51a0
Sep 16 09:41:18 sd_deliver(987) op: 1, state: 3, size: 32840, from:
192.168.0.174:7000, nodeid: 2916690328, pid: 19921
Sep 16 09:41:18 sd_deliver(996) allow new deliver, 0x24e51a0
Sep 16 09:41:18 start_cpg_event_work(1465) 0 1
Sep 16 09:41:18 cpg_event_fn(1279) 0x24e51a0, 1 2
Sep 16 09:41:18 cpg_event_fn(1293) 3
Sep 16 09:41:18 __sd_deliver(839) op: 1, state: 3, size: 32840, from:
192.168.0.174:7000, pid: 24781
Sep 16 09:41:18 cpg_event_done(1315) 0x24e51a0
Sep 16 09:41:18 __sd_deliver_done(955) op: 1, state: 3, size: 32840,
from: 192.168.0.174:7000
Sep 16 09:41:18 cpg_event_done(1373) free 0x24e51a0
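
To compare epochs across nodes without reading each log by eye, the
highest obj directory reported at startup can be pulled out of the
log text.  This is a minimal sketch, assuming the directory names
under obj/ are zero-padded decimal epoch numbers (the logs above go
from 00000009 to 00000010) and that the log text has already been
collected per node.  The helper names and the shortened sample strings
are hypothetical.

```python
import re

# Matches the 8-digit directory in ".../obj//00000016", even when the
# mail client wrapped the path onto its own line.
EPOCH_DIR = re.compile(r"/obj/+(\d{8})\b")


def highest_epoch(log_text: str) -> int:
    """Return the largest epoch directory number mentioned in a log."""
    epochs = [int(m, 10) for m in EPOCH_DIR.findall(log_text)]
    if not epochs:
        raise ValueError("no obj epoch directories found in log")
    return max(epochs)


def newest_node(logs: dict) -> str:
    """Return the name of the node whose log shows the highest epoch."""
    return max(logs, key=lambda name: highest_epoch(logs[name]))


# Shortened stand-ins for the startup logs quoted above
logs = {
    "node173": "init_epoch_path(1913) found the obj dir,\n"
               "/node/sheepdog/obj//00000016\n",
    "node174": "init_epoch_path(1913) found the obj dir,\n"
               "/node/sheepdog/obj//00000017\n",
}
print(newest_node(logs))  # prints: node174
```

With the logs quoted here this points at node174 (17) over node173
(16), which matches the "waiting with older epoch, 17 16" line.
Whether starting that node first is enough to recover the cluster is
not something these logs confirm.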


Currently nothing on here is important and it's only testing data, but
I would like to know how to recover from this situation if it were for
real.


