At Fri, 23 Sep 2011 11:03:36 -0400,
Shawn Moore wrote:
>
> > BTW, would you please try 'collie cluster info' to check if the outputs are
> > consistent on each node.
>
> In my testing last night, I went from 4 nodes (2 zones of 2 nodes) to
> 6 nodes (3 zones of 2 nodes):
> ZONE 1: node173, node174
> ZONE 2: node156, node157
> ZONE 3: blade161, blade162
>
> These two nodes (blade161 and blade162) were added to the already
> running cluster, so the copies is still 2. Is there any way to change
> the copies after the cluster creation and re-distribute?

No, unfortunately. I think it is a bit difficult to support it, but
we would like to in the future.

>
> Nodes 173, 174, 156 and 157 all said the same thing:
> 2011-09-15 20:21:18 34 [192.168.0.156:7000, 192.168.0.157:7000,
> 192.168.0.161:7000, 192.168.0.162:7000, 192.168.0.173:7000,
> 192.168.0.174:7000]
> 2011-09-15 20:21:18 33 [192.168.0.156:7000, 192.168.0.161:7000,
> 192.168.0.162:7000, 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18 32 [192.168.0.161:7000, 192.168.0.162:7000,
> 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18 31 [192.168.0.161:7000, 192.168.0.162:7000,
> 192.168.0.174:7000]
> 2011-09-15 20:21:18 30 [192.168.0.161:7000, 192.168.0.162:7000]
> 2011-09-15 20:21:18 29 [192.168.0.161:7000, 192.168.0.162:7000,
> 192.168.0.174:7000]
> 2011-09-15 20:21:18 28 [192.168.0.161:7000, 192.168.0.162:7000,
> 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18 27 [192.168.0.162:7000, 192.168.0.173:7000,
> 192.168.0.174:7000]
>
>
> When I got to blade161, I see:
> 2011-09-15 20:21:18 34 [192.168.0.156:7000, 192.168.0.157:7000,
> 192.168.0.161:7000, 192.168.0.162:7000, 192.168.0.173:7000,
> 192.168.0.174:7000]
> 2011-09-15 20:21:18 33 [192.168.0.156:7000, 192.168.0.161:7000,
> 192.168.0.162:7000, 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18 32 [192.168.0.161:7000, 192.168.0.162:7000,
> 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18 31 [192.168.0.161:7000, 192.168.0.162:7000,
> 192.168.0.174:7000]
> 2011-09-15 20:21:18 30 [192.168.0.161:7000, 192.168.0.162:7000]
> 2011-09-15 20:21:18 29 [192.168.0.161:7000, 192.168.0.162:7000,
> 192.168.0.174:7000]
> 2011-09-15 20:21:18 28 [192.168.0.161:7000, 192.168.0.162:7000,
> 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18 27 [192.168.0.162:7000, 192.168.0.173:7000,
> 192.168.0.174:7000]
> 1969-12-31 19:00:21 19 [b00:0:4500:0:300:0:6c01:0,
> b0b9:ffff:ffff:ffff:60e0:7402:::63516, 87:0:2200::e040:500, ::,
> 1500::8000:300:0:0:14641, 3331:2031:393a:3030:3a32:3100:ff7f:0:49056,
> 20ea:3532:ff7f:0:300:::26800, 10c5:b1e5:ec7f:0:400:::44618,
> 100::a8f6:b1e5:ec7f:0:65535, 24eb:3532:ff7f:0:1400:0:ec7f:0:60048,
> 3234:6562:3a33:3533:323a:6666:3766:3a30:12602,
> 3a36:3636:363a:3337:3636:3a33:6133:303a:12849, 100:::65535,
> 5:0:500:0:bf00:0:3b8a:0:768, 11:131a:12:f17:1600::,
> cc76:66e5:ec7f:0:5:0:500:0:31744,
> 3:1c7f:1504:1:90ea:3532:ff7f:0:60568, 300:::63782,
> 400::3823:4000:0:0:39898, 300::98ec:3532:ff7f:0:34835,
> 7699:4000::20d4:6000:0:0:15472, 5ba6:4000::e0d6:6000:0:0,
> 93a7:4000::a0d7:6000:0:0:27072, ::, ::48f4:b1e5:ec7f:0, :::3629,
> 822:3a32:ff7f:0:80f6:b1e5:ec7f:0, ::, 35b0:700::308a:4000:0:0,
> 5318:4000::b8ec:3532:ff7f:0:31744, 100::]
> Segmentation fault
>
>
> Then I go to blade162 and get:
> [root@blade162 ~]# collie cluster info
> failed to send a req, Success
> failed to get a rsp, Success
>
>
> I then look at the commands again on 173, 174, 156, and 157 and they all report:
> 2011-09-15 20:21:18 36 [192.168.0.156:7000, 192.168.0.157:7000,
> 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18 35 [192.168.0.156:7000, 192.168.0.157:7000,
> 192.168.0.162:7000, 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18 34 [192.168.0.156:7000, 192.168.0.157:7000,
> 192.168.0.161:7000, 192.168.0.162:7000, 192.168.0.173:7000,
> 192.168.0.174:7000]
> 2011-09-15 20:21:18 33 [192.168.0.156:7000, 192.168.0.161:7000,
> 192.168.0.162:7000, 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18 32 [192.168.0.161:7000, 192.168.0.162:7000,
> 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18 31 [192.168.0.161:7000, 192.168.0.162:7000,
> 192.168.0.174:7000]
> 2011-09-15 20:21:18 30 [192.168.0.161:7000, 192.168.0.162:7000]
> 2011-09-15 20:21:18 29 [192.168.0.161:7000, 192.168.0.162:7000,
> 192.168.0.174:7000]
>
>
> I know I built all the nodes off of git version:
> collie-sheepdog-v0.2.3-75-g066d753.tar.gz
> It seems the logs just always report as "version 0.2.3" no matter what
> git version I've used.

I'll add a git revision to the version number. :)

>
>
> Also it seems the sheep.log uses NON 24hr time but collie uses 24hr
> time which is preferred, and the command "collie cluster info" shows
> the same date/time for all epochs. Shouldn't it show the date/time of
> when that epoch was created?

Good point. The epoch creation time is not used internally by
Sheepdog, but it would be useful for users.

>
>
> These logs were collected after 161 and 162 died and before they were
> brought back. You can find the node logs from all nodes below, ~13MB
> compressed over 200MB un-compressed.
> http://www.stormpoint.com/files/sd_2011-09-23.tgz

Thanks, this would be helpful for us.

Kazutaka