On 01/17/2012 07:59 PM, Liu Yuan wrote: > This patch set aims to provide a ftrace-like trace utility for performance tuning > of online system in production environment. This means virtually no overhead when > disabled. > > The patch set achieves this by dynamically patching the binary code at runtime, making > use of GCC -pg instrumentation features, which instrument a 5-bytes opcode in every function. > > When the trace is > turned off, the opcode is patched as NOP opcode. > turned on, the opcode is a call opcode to our trace functions and we can hook callbacks both for > function entry and exit. This is how the patch atually calculate the time spent by function. > > This is an early draft set, thought the trace infrastrure is almost done. The work left is most related > to the trace buffer implementation. Currently in the patch set, a dynamical buffer is drafted but not used > yet. So the output is rely on the original dprintf, which is slow and doing sync logging. A final trace > output version should be per-thread buffer and will be manipulated by collie command. > > Further more, the trace infrasture is quite modular, we can hook different callbacks for different purposes. > > So now the output is like: > ... > Jan 17 19:49:52 default_trace_graph_return(47) strbuf_grow: 8259 (ns) < how much nanosecond used in the function > Jan 17 19:49:52 default_trace_graph_return(47) strbuf_addf: 15663 (ns) > Jan 17 19:49:52 default_trace_graph_return(47) strbuf_release: 106 (ns) > Jan 17 19:49:52 default_trace_graph_return(47) farm_open: 61610 (ns) > Jan 17 19:49:52 default_trace_graph_return(47) xpread: 1647242 (ns) > Jan 17 19:49:52 default_trace_graph_return(47) farm_read: 1704604 (ns) > Jan 17 19:49:52 default_trace_graph_return(47) farm_close: 8024 (ns) > Jan 17 19:49:52 default_trace_graph_return(47) store_read_obj: 1802385 (ns) > Jan 17 19:49:52 default_trace_graph_return(47) do_process_work: 1827672 (ns) > Jan 17 19:49:52 default_trace_graph_return(47) do_local_io: 1845099 (ns) > Jan 17 19:49:52 default_trace_graph_return(47) forward_read_obj_req: 1860139 (ns) > Jan 17 19:49:52 default_trace_graph_return(47) get_sd_op: 141 (ns) > ... For your convenience, I use the following script to test the functionality: pkill -9 sheep pkill -9 qemu-io rm store/* -rf for i in 0 1 2; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i;sleep 1;done collie/collie cluster format -b farm qemu-img create -f raw sheepdog:test 1G qemu-io -c "write -P 0x1 0 10M" sheepdog:test qemu-io -c "read -P 0x1 0 10M" sheepdog:test collie/collie debug trace -e #start the trace qemu-io -c "write -P 0x1 0 20M" sheepdog:test qemu-io -c "read -P 0x1 0 10M" sheepdog:test collie/collie debug trace -t #end the trace qemu-io -c "read -P 0x1 0 10M" sheepdog:test Thanks, Yuan |