On Tue, Apr 10, 2012 at 4:01 PM, Liu Yuan <namei.unix at gmail.com> wrote: > v4: > - change trace interface > new interface: > start trace: collie debug trace start > stop trace: collie debug trace stop > cat output: collie debug trace cat > > - address Kazum's comments, thanks! > - fix() trace_lookup_ip() race. > - disable trace as default > - refactor trace.h, now trace code is totally removed from binary when disabled. > - fix make_text_writable() bug > - use SIGUSR2 instead of SIGUSR1 > > v3: > - fix trace_lookup_ip() > - fix main thread ring buffer init > - fix pop_return_trace() > > v2: > - use ring buffer for trace output > - add a cat operation for collie > > This patch set aims to provide a ftrace-like trace utility for performance tuning > of online system in production environment. This means virtually no overhead when > disabled, rather low even enabled. > > Currently 'graph' tracer is implemented, accounting CPU usage at funciton granularity. > > The patch set achieves this by dynamically patching the binary code at runtime, making > use of GCC -pg instrumentation features, which instrument a 5-bytes opcode in every function. > > When the trace is > turned off, the opcode is patched as NOP opcode. > turned on, the opcode is a call opcode to our trace functions and we can hook callbacks both for > function entry and exit. This is how the patch atually calculate the time spent by function. > > The underlying infrastructure is quite modular, I hope we can implement other tracers that considered useful > later. > > I use the below script[*] to get the following output: > --------------------------------------------------------- > tailai.ly at taobao:~/sheepdog$ ./test3.sh "-b farm" > using backend farm store > Formatting 'sheepdog:test', fmt=raw size=1073741824 > wrote 10485760/10485760 bytes at offset 0 > 10 MiB, 1 ops; 0.8899 sec (11.237 MiB/sec and 1.1237 ops/sec) > read 10485760/10485760 bytes at offset 0 > 10 MiB, 1 ops; 0.5130 sec (19.492 MiB/sec and 1.9492 ops/sec) > > Begin tracing > > wrote 10485760/10485760 bytes at offset 10485760 > 10 MiB, 1 ops; 0.8662 sec (11.544 MiB/sec and 1.1544 ops/sec) > read 10485760/10485760 bytes at offset 10485760 > 10 MiB, 1 ops; 0.5425 sec (18.431 MiB/sec and 1.8431 ops/sec) > > Cat the trace buffer > > Time(us) | Function Graph > ------------------------------- > | do_io_request() { > | forward_write_obj_req() { > | addr_to_str() { > 2.132 | } > | get_sheep_fd() { > 29.960 | } > | send_req() { > 1538.482 | } > | addr_to_str() { > 5.644 | } > | addr_to_str() { > 0.598 | } > | get_sheep_fd() { > 25.426 | } > | send_req() { > 4184.318 | } > | do_local_io() { > | do_process_work() { > | store_write_obj() { > | farm_open() { > | strbuf_add() { > | strbuf_grow() { > | xrealloc() { > 0.786 | } > 1.625 | } > 2.555 | } > | strbuf_addf() { > | strbuf_grow() { > | xrealloc() { > 1.636 | } > 1.894 | } > 4.325 | } > | strbuf_release() { > 0.98 | } > 31.288 | } > | do_write_obj() { > | farm_write() { > | xpwrite() { > 234185.848 | } > | trunk_update_entry() { > | lookup_trunk_entry() { > 0.691 | } > 2.784 | } > 234194.492 | } > 234195.533 | } > | farm_close() { > 6.697 | } > 234237.173 | } > 234238.461 | } > 234260.257 | } > | do_read() { > 8.18 | } > | do_read() { > 9.308 | } > 406753.259 | } > 406774.40 | } > ... > > [*] > #!/bin/bash > > pkill -9 sheep > rm store/* -rf > for i in 0 1 2; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i;sleep 1;done > collie/collie cluster format $1 > qemu-img create -f raw sheepdog:test 1G > qemu-io -c "write -P 0x1 0 10M" sheepdog:test > qemu-io -c "read -P 0x1 0 10M" sheepdog:test > echo -e "\nBegin tracing\n" > collie/collie debug trace start > qemu-io -c "write -P 0x1 10M 10M" sheepdog:test > qemu-io -c "read -P 0x1 10M 10M" sheepdog:test > collie/collie debug trace stop > echo -e "\nCat the trace buffer\n" > collie/collie debug trace cat > t Applied after fixing coding style errors, thanks. Kazutaka |