This patch set aims to provide a ftrace-like trace utility for performance tuning of online system in production environment. This means virtually no overhead when disabled. The patch set achieves this by dynamically patching the binary code at runtime, making use of GCC -pg instrumentation features, which instrument a 5-bytes opcode in every function. When the trace is turned off, the opcode is patched as NOP opcode. turned on, the opcode is a call opcode to our trace functions and we can hook callbacks both for function entry and exit. This is how the patch atually calculate the time spent by function. This is an early draft set, thought the trace infrastrure is almost done. The work left is most related to the trace buffer implementation. Currently in the patch set, a dynamical buffer is drafted but not used yet. So the output is rely on the original dprintf, which is slow and doing sync logging. A final trace output version should be per-thread buffer and will be manipulated by collie command. Further more, the trace infrasture is quite modular, we can hook different callbacks for different purposes. So now the output is like: ... Jan 17 19:49:52 default_trace_graph_return(47) strbuf_grow: 8259 (ns) < how much nanosecond used in the function Jan 17 19:49:52 default_trace_graph_return(47) strbuf_addf: 15663 (ns) Jan 17 19:49:52 default_trace_graph_return(47) strbuf_release: 106 (ns) Jan 17 19:49:52 default_trace_graph_return(47) farm_open: 61610 (ns) Jan 17 19:49:52 default_trace_graph_return(47) xpread: 1647242 (ns) Jan 17 19:49:52 default_trace_graph_return(47) farm_read: 1704604 (ns) Jan 17 19:49:52 default_trace_graph_return(47) farm_close: 8024 (ns) Jan 17 19:49:52 default_trace_graph_return(47) store_read_obj: 1802385 (ns) Jan 17 19:49:52 default_trace_graph_return(47) do_process_work: 1827672 (ns) Jan 17 19:49:52 default_trace_graph_return(47) do_local_io: 1845099 (ns) Jan 17 19:49:52 default_trace_graph_return(47) forward_read_obj_req: 1860139 (ns) Jan 17 19:49:52 default_trace_graph_return(47) get_sd_op: 141 (ns) ... and the supposed one should be more humand readable like following: (take the example from early version of Linux kernel's ftrace output.) ... sys_read() { fget_light() { } 526 <---- nanosecond vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 ... Liu Yuan (11): trace: driver trace to work trace: low level functions in assembly trace: add low level ip2function support logger: teach logger funcs to notrace sheep: export some structures for trace use trace: low-level trace infrastructure proper trace: add return hooker trace: add a new interface for trace utility trace: add a dynamic buffer sheep: revert coroutine for fixing group_handler() race sheep: fix group_handler face collie/Makefile.am | 6 + collie/collie.c | 5 + collie/collie.h | 1 + collie/debug.c | 67 +++++++++++++ configure.ac | 30 ++++++- include/sheep.h | 1 + include/util.h | 47 +++++++++ lib/Makefile.am | 2 +- lib/dbuffer.c | 44 +++++++++ lib/logger.c | 22 ++-- sheep/Makefile.am | 6 +- sheep/group.c | 40 +++----- sheep/ops.c | 19 ++++ sheep/sheep.c | 8 ++ sheep/strbuf.h | 20 ---- sheep/trace/graph.c | 85 ++++++++++++++++ sheep/trace/mcount.S | 84 ++++++++++++++++ sheep/trace/stabs.c | 198 +++++++++++++++++++++++++++++++++++++ sheep/trace/trace.c | 263 ++++++++++++++++++++++++++++++++++++++++++++++++++ sheep/trace/trace.h | 76 +++++++++++++++ sheep/trace/trace.ld | 18 ++++ sheep/work.c | 46 ++++------ sheep/work.h | 32 ++++++ 23 files changed, 1031 insertions(+), 89 deletions(-) |