On 10/22/2012 08:35 PM, 迪八哥 wrote:
> We are exploring Ceph now, and it shows better performance than
> sheepdog, especially for sequential R/W and random write. I think
> Sheepdog and Ceph share a similar internal design (i.e. the image is
> split into 4M objects and some kind of consistent hashing is used).
> What we think is most important is the **journal disk** in the storage
> node: Ceph's performance boosts up to 3X when an extra 7200rpm SATA
> disk is used for the journal. Will sheepdog consider a similar
> mechanism?

Hi Xiaoxi,

I have prototyped a journal-device-like mechanism in the 'jd' branch. The currently missing part is the journal recovery logic after a crash. I have tested it on a machine with 2 disks of the same IO capability and got a rather promising improvement (80+X):

-----------------------------------------------------
./script/vditest test -c none -w -B 4096 -h1 -a

on the jd branch:
Total write throughput: 33875968.0B/s (32.3M/s), IOPS 8270.5/s.

and on current master:
Total write throughput: 410419.2B/s (400.8K/s), IOPS 100.2/s.
-----------------------------------------------------

For comparison, I also add an object cache (buffered IO) test number:

./script/vditest test -c writeback -w -B 4096 -h1 -a
Total write throughput: 61089792.0B/s (58.3M/s), IOPS 14914.5/s.

With the 'journal file', even on a disk of the same IO capability, we get 55% of buffered IO performance. I have also tested 'journal file' mode with a ramdisk as the journal device and got nearly 80% of buffered IO performance. So I conclude that with a faster journal device we can get even more. I only tested write performance, but I think read performance will improve too.

Would you please try the 'jd' branch and compare its performance with Ceph? To launch the sheep with a separate journal device, you need to pass:

sheep -j /path/to/journal_directory other_options

*NOTE* Because of the internal design, you need to pass the path of a directory instead of a file entry. This means we can't operate on a raw device file.
For a quick overview of the 'journal file' I introduced: the idea is very simple. We use a dedicated device to log all IO operations sequentially, and then we are safe to change the backend IO from O_DSYNC & O_DIRECT to plain O_RDWR (buffered IO), which benefits both read and write performance a lot.

Thanks,
Yuan