At Fri, 14 Oct 2011 11:39:47 +0100,
Chris Webb wrote:
>
> MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> writes:
>
> > Before pulling vdiattr branch, didn't this bug happen?
>
> I think we never got far enough in the process to try the write because the
> setattr -x/getattr stuff failed beforehand, so I can't be sure it hasn't
> happened all along.
>
> > If possible, can you check what is written in sheep.log when this problem
> > happens?
>
> I've put the three log files up at
>
>   http://cdw.me.uk/tmp/sheep-00.log
>   http://cdw.me.uk/tmp/sheep-01.log
>   http://cdw.me.uk/tmp/sheep-02.log
>
> They're very short as I created the cluster afresh and immediately ran the
> commands that triggered the problem.

Thanks, the reason for this problem is that you used a direct I/O option,
but the offset and length given to "collie vdi write" were not aligned to
the sector size (512 bytes).  I didn't expect that because a VM's I/O
requests are always sector-aligned.

Is it okay to exit with an error when the offset is not aligned to
512 bytes?  And is it okay to enlarge the read/write buffer length to
the sector-aligned size when it is not aligned?  If possible, I don't
want to treat "collie vdi read/write" as special cases.

>
> > > 0026# collie vdi list
> > >   name        id    size    used  shared    creation time   vdi id
> > > ------------------------------------------------------------------
> > > Floating point exception (core dumped)
> >
> > Can you get a stack trace from the core?
>
> This is a small collie interface bug I've seen before and meant to fix
> myself but hadn't got around to: it's a division by zero in hval_to_sheep()
> (line 205 of include/sheep.h). You do
>
>   ret = get_nth_node(entries, nr_entries, (i + 1) % nr_entries, idx);
>
> which is a division-by-zero if nr_entries = 0, i.e. where all the nodes have
> gone away as in this case. (There aren't any nodes to pick from in that
> case, so this should fail but not dump core!)

Thanks!  I'll fix it in the next patchset too.

Kazutaka
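
For illustration only, here is a minimal sketch of the two checks discussed
above: rejecting or rounding unaligned direct I/O requests, and guarding the
modulo against an empty node list.  This is not the actual patch.  All names
here (SECTOR_SIZE, is_sector_aligned, round_up_to_sector, next_node_index)
are hypothetical and are not taken from the sheepdog source; the only values
taken from this thread are the 512-byte sector size and the
(i + 1) % nr_entries expression.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sector size: the 512 bytes mentioned in this thread. */
#define SECTOR_SIZE 512u

/* True if value (an offset or a length) is a multiple of the sector size. */
static bool is_sector_aligned(uint64_t value)
{
	return (value & (SECTOR_SIZE - 1)) == 0;
}

/*
 * Round a buffer length up to the next sector boundary.
 * Overflow for lengths within 511 bytes of UINT64_MAX is not handled.
 */
static uint64_t round_up_to_sector(uint64_t len)
{
	return (len + SECTOR_SIZE - 1) & ~(uint64_t)(SECTOR_SIZE - 1);
}

/*
 * Index of the node following i, wrapping around at nr_entries.
 * Returns -1 when there are no nodes, instead of evaluating
 * (i + 1) % 0, which is what raises the floating point exception.
 */
static int next_node_index(int i, int nr_entries)
{
	if (nr_entries <= 0)
		return -1;
	return (i + 1) % nr_entries;
}
```

With helpers like these, a caller could exit with an error when
is_sector_aligned(offset) is false, allocate round_up_to_sector(len) bytes
for the buffer, and treat a negative return from next_node_index() as
"no nodes available" rather than dumping core.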