I am trying to do profile with libunwind (using linux perf), with perf top
monitoring the target process, I get this assembly time cost screen:
0.19 │ mov %rcx,0x18(%rsp) ▒
│ trace_lookup(): ▒
1.54 │ mov 0x8(%r9),%rcx ▒
│ _ULx86_64_tdep_trace(): ▒
0.52 │ and $0x1,%edx ◆
0.57 │ mov %r14d,0xc(%rsp) ▒
0.40 │ mov 0x78(%rsp),%r10 ▒
1.24 │ sub %rdx,%r15 ▒
│ trace_lookup(): ▒
0.35 │ shl %cl,%r12d ▒
│ _ULx86_64_tdep_trace(): ▒
2.18 │ mov 0x90(%rsp),%r8 ▒
│ trace_lookup(): ▒
0.46 │ imul %r15,%r13 ▒
│ _ULx86_64_tdep_trace(): ▒
0.59 │ mov %r15,0x88(%rsp) ▒
│ trace_lookup(): ▒
0.50 │ lea -0x1(%r12),%rdx ▒
1.22 │ shr $0x2b,%r13 ▒
0.37 │ and %r13,%rdx ▒
0.57 │177: mov %rdx,%rbp ▒
0.43 │ shl $0x4,%rbp ▒
1.33 │ add %rdi,%rbp ▒
0.49 │ mov 0x0(%rbp),%rsi ▒
24.40 │ cmp %rsi,%r15 ▒
│ ↓ jne 420 ▒
│ _ULx86_64_tdep_trace(): ▒
2.10 │18e: movzbl 0x8(%rbp),%edx ▒
3.68 │ test $0x8,%dl ▒
│ ↓ jne 370 ▒
1.27 │ mov %edx,%eax ▒
0.06 │ shl $0x5,%eax ▒
0.73 │ sar $0x5,%al ▒
1.70 │ cmp $0xfe,%al ▒
│ ↓ je 380 ▒
0.01 │ ↓ jle 2f0 ▒
0.01 │ cmp $0xff,%al ▒
│ ↓ je 3a0 ▒
0.02 │ cmp $0x1,%al ▒
│ ↓ jne 298 ▒
0.01 │ and $0x10,%edx ▒
│ movl $0x1,0x10(%rsp) ▒
│ movl $0x1,0x1c8(%rbx) ▒
0.00 │ ↓ je 393
The corresponding source code is here trace_lookup source code, If I read correctly, the number of lines of code corresponding to this hot path cmp
instruction is line 296, but I don't know why this line is so slow and cost most of the time?
mov
instruction. – Dimissoryperf top
command? Sometimes in defaultperf top
output it can be easy to focus on wrong process, for example on the perf itself, or on wrong function. – Maddening