mysterious rtm abort using haswell tsx
Q


I'm experimenting with the TSX extensions on Haswell by adapting an existing medium-sized codebase (thousands of lines) to use GCC's transactional memory extensions, which on this machine indirectly use Haswell TSX, instead of coarse-grained locks. I am not writing my own _xbegin / _xend calls directly. I run with ITM_DEFAULT_METHOD=htm.

I'm having trouble getting it to run fast enough because I see high rates of hardware transaction aborts for mysterious reasons. As shown below, these aborts are due neither to conflicts nor to capacity limitations.

Here is the perf command I used to quantify the failure rate and underlying causes:

perf stat \
 -e cpu/event=0x54,umask=0x2,name=tx_mem_abort_capacity_write/ \
 -e cpu/event=0x54,umask=0x1,name=tx_mem_abort_conflict/ \
 -e cpu/event=0x5d,umask=0x1,name=tx_exec_misc1/ \
 -e cpu/event=0x5d,umask=0x2,name=tx_exec_misc2/ \
 -e cpu/event=0x5d,umask=0x4,name=tx_exec_misc3/ \
 -e cpu/event=0x5d,umask=0x8,name=tx_exec_misc4/ \
 -e cpu/event=0x5d,umask=0x10,name=tx_exec_misc5/ \
 -e cpu/event=0xc9,umask=0x1,name=rtm_retired_start/ \
 -e cpu/event=0xc9,umask=0x2,name=rtm_retired_commit/ \
 -e cpu/event=0xc9,umask=0x4,name=rtm_retired_aborted/pp \
 -e cpu/event=0xc9,umask=0x8,name=rtm_retired_aborted_misc1/ \
 -e cpu/event=0xc9,umask=0x10,name=rtm_retired_aborted_misc2/ \
 -e cpu/event=0xc9,umask=0x20,name=rtm_retired_aborted_misc3/ \
 -e cpu/event=0xc9,umask=0x40,name=rtm_retired_aborted_misc4/ \
 -e cpu/event=0xc9,umask=0x80,name=rtm_retired_aborted_misc5/ \
./myprogram -th 1 -reps 3000000

So the program runs some code containing transactions 30 million times. Each request involves one GCC __transaction_atomic block. There is only one thread in this run.

This particular perf command captures most of the relevant TSX performance events described in the Intel Software Developer's Manual, vol 3.

The output from perf stat is the following:

             0 tx_mem_abort_capacity_write                                  [26.66%]
             0 tx_mem_abort_conflict                                        [26.65%]
    29,937,894 tx_exec_misc1                                                [26.71%]
             0 tx_exec_misc2                                                [26.74%]
             0 tx_exec_misc3                                                [26.80%]
             0 tx_exec_misc4                                                [26.92%]
             0 tx_exec_misc5                                                [26.83%]
    29,906,632 rtm_retired_start                                            [26.79%]
             0 rtm_retired_commit                                           [26.70%]
    29,985,423 rtm_retired_aborted                                          [26.66%]
             0 rtm_retired_aborted_misc1                                    [26.75%]
             0 rtm_retired_aborted_misc2                                    [26.73%]
    29,927,923 rtm_retired_aborted_misc3                                    [26.71%]
             0 rtm_retired_aborted_misc4                                    [26.69%]
           176 rtm_retired_aborted_misc5                                    [26.67%]

  10.583607595 seconds time elapsed

As you can see from the output:

  • The rtm_retired_start count is 30 million (matches input to program)
  • The rtm_retired_aborted count is about the same (no commits at all)
  • The tx_mem_abort_conflict and tx_mem_abort_capacity_write counts are 0, so these are not the reasons. Also, recall that only one thread is running, so conflicts should be rare anyway.
  • The only real leads here are the high values of tx_exec_misc1 and rtm_retired_aborted_misc3, whose descriptions are somewhat similar.

The Intel manual (vol 3) defines the rtm_retired_aborted_misc3 counter as follows:

code: C9H 20H

mnemonic: RTM_RETIRED.ABORTED_MISC3

description: Number of times an RTM execution aborted due to HLE unfriendly instructions.

The definition of tx_exec_misc1 uses similar wording:

code: 5DH 01H

mnemonic: TX_EXEC.MISC1

description: Counts the number of times a class of instructions that may cause a transactional abort was executed. Since this is the count of execution, it may not always cause a transactional abort.
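The "code" pairs quoted from the manual (event select, then unit mask) are the same values passed as event= and umask= in the perf command above. As a rough sketch of how the two bytes combine into the raw value that perf also accepts in its rUUEE form, the helper below packs them together. The function names are hypothetical, and the sketch assumes that only the event and umask fields are needed, with everything else left at zero.

```c
#include <stdint.h>
#include <stdio.h>

/* Pack an Intel event select code and unit mask into a raw perf config.
 * Assumption: only the event (bits 0-7) and umask (bits 8-15) fields are
 * used; other fields such as cmask, inv and edge are left at zero. */
uint64_t raw_event_config(uint8_t event, uint8_t umask)
{
    return ((uint64_t)umask << 8) | event;
}

/* Print the raw encoding next to the name chosen in the perf command. */
void print_raw_event(const char *name, uint8_t event, uint8_t umask)
{
    printf("%-28s r%04llx\n", name,
           (unsigned long long)raw_event_config(event, umask));
}
```

For example, print_raw_event("rtm_retired_aborted_misc3", 0xc9, 0x20) prints the name followed by r20c9, which should be usable as perf stat -e r20c9 in place of the longer cpu/event=.../ spelling.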

I checked the assembly location of the aborts with perf record / perf report, using high-precision (PEBS) support for rtm_retired_aborted. The location is a register-to-register mov instruction. I see no unusual instructions nearby.

Update:

Here are two things I've tried since then:

1) The tx_exec_misc1 and rtm_retired_aborted_misc3 signature we see here can be reproduced, for example, by a dummy block of the form

for (int i = 0; i < 10000000; i++){
  __transaction_atomic{
    _xabort(1);
  }
}

or one of the form

for (int i = 0; i < 10000000; i++){
  __transaction_atomic{
    printf("hello");
    fflush(stdout);
  }
}

In both cases, the perf counters look similar to what I see in the real program. However, in both cases the perf report for -e cpu/tx-abort/ points to the intuitively correct assembly lines: an xabort instruction in the first example and a syscall in the second. In the real codebase, the perf report points to a stack push right at the start of a function:

           :    00000000004167e0 <myns::myfun()>:
    100.00 :      4167e0:       push   %rbp
      0.00 :      4167e1:       mov    %rsp,%rbp
      0.00 :      4167e4:       push   %r15
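A complementary check, independent of perf, is to inspect the abort status word that RTM leaves in EAX when a transaction aborts. The sketch below only decodes such a value. It assumes the status was captured from a hand-written _xbegin() path, which the code in this question does not have because libitm hides it. The bit positions follow the Intel SDM; the enum and function names are hypothetical, chosen so the sketch builds without -mrtm (immintrin.h exposes the same bits as _XABORT_EXPLICIT, _XABORT_RETRY and so on).

```c
#include <stdio.h>

/* Bit layout of the RTM abort status returned in EAX (Intel SDM). */
enum {
    RTM_ABORT_EXPLICIT = 1u << 0,  /* abort caused by an XABORT instruction */
    RTM_ABORT_RETRY    = 1u << 1,  /* transaction may succeed on a retry */
    RTM_ABORT_CONFLICT = 1u << 2,  /* memory conflict with another thread */
    RTM_ABORT_CAPACITY = 1u << 3,  /* internal buffer overflowed */
    RTM_ABORT_DEBUG    = 1u << 4,  /* debug breakpoint was hit */
    RTM_ABORT_NESTED   = 1u << 5   /* abort happened in a nested transaction */
};

/* Argument passed to XABORT, held in bits 31:24. It is meaningful only
 * when RTM_ABORT_EXPLICIT is set. */
unsigned rtm_abort_code(unsigned status)
{
    return (status >> 24) & 0xffu;
}

/* Returns 1 when none of the explicit, conflict, capacity or debug bits is
 * set, i.e. the hardware gave no specific cause for the abort. */
int rtm_abort_has_no_cause_bit(unsigned status)
{
    unsigned cause = RTM_ABORT_EXPLICIT | RTM_ABORT_CONFLICT |
                     RTM_ABORT_CAPACITY | RTM_ABORT_DEBUG;
    return (status & cause) == 0;
}

/* Print a one-line summary of a captured abort status. */
void rtm_print_abort(unsigned status)
{
    printf("explicit=%d retry=%d conflict=%d capacity=%d code=%u\n",
           (status & RTM_ABORT_EXPLICIT) != 0,
           (status & RTM_ABORT_RETRY) != 0,
           (status & RTM_ABORT_CONFLICT) != 0,
           (status & RTM_ABORT_CAPACITY) != 0,
           rtm_abort_code(status));
}
```

With the _xabort(1) test loop, the status should have the explicit bit set and an abort code of 1. A status with no cause bit set would at least be consistent with the zero conflict and capacity counters, though it does not by itself identify the offending instruction.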

2) I have also run the same command under the Intel Software Development Emulator. It turns out that the problem goes away in that case: I get no aborts as far as the application is concerned.

Quickel answered 6/5, 2015 at 6:49 Comment(4)
Could you post your transactional loop? – Tenishatenn
Hi Matthew. Unfortunately it's a somewhat large loop (it spans multiple function calls, and some of the functions are textually large, though the actual execution path need not be). – Quickel
Is it possible for you to start cutting down the loop to see what's triggering this? This sounds like an accidental system call somewhere... though your perf results seem a bit bizarre. – Tenishatenn
I will try to, will be back in a few days. Thanks! – Quickel
R

Though it's been a while, I found this unanswered question while searching, so here's the answer: this is a hardware bug in Haswell and early Broadwell chips.

The hardware erratum assigned by Intel is HSW136, and it cannot be fixed with a microcode update. Indeed, I think it was as of stepping 4 that the feature was no longer reported as available by the cpuid instruction, even though the (faulty) silicon implementing it was still on the chip.
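To see what a given machine currently reports, one option is to query CPUID leaf 7 directly. The sketch below assumes an x86 target and a GCC or Clang toolchain that provides __get_cpuid_count in cpuid.h. The HLE and RTM bit positions (EBX bits 4 and 11) are taken from the Intel SDM; the macro and function names are hypothetical.

```c
#include <cpuid.h>   /* GCC/Clang helper for the x86 CPUID instruction */
#include <stdio.h>

/* Feature bits in CPUID.(EAX=07H, ECX=0):EBX, per the Intel SDM. */
#define CPUID7_EBX_HLE (1u << 4)
#define CPUID7_EBX_RTM (1u << 11)

/* Pure helpers: interpret an EBX value already read from leaf 7. */
int ebx_reports_hle(unsigned ebx) { return (ebx & CPUID7_EBX_HLE) != 0; }
int ebx_reports_rtm(unsigned ebx) { return (ebx & CPUID7_EBX_RTM) != 0; }

/* Returns 1 if the CPU reports RTM, 0 if it does not, and -1 if CPUID
 * leaf 7 is not available at all. */
int cpu_reports_rtm(void)
{
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return -1;
    return ebx_reports_rtm(ebx);
}

/* Print the result in a human-readable form. */
void print_rtm_support(void)
{
    int r = cpu_reports_rtm();
    printf("RTM reported by cpuid: %s\n",
           r < 0 ? "unknown (no leaf 7)" : (r ? "yes" : "no"));
}
```

A "yes" here only means the feature bit is set. On an affected part without the disabling microcode, the bit can still be set even though transactions misbehave, so this check is a starting point, not proof that TSX is usable.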

Ranch answered 14/11, 2017 at 14:1 Comment(1)
Microcode updates have disabled TSX in Broadwell as well. And due to TAA vulnerabilities, most Skylake-family CPUs also have HLE disabled in microcode, with RTM always aborting. (Or the OS can set a bit to not advertise the useless "feature".) intel.com/content/www/us/en/support/articles/000059422/… . And the feature has been removed from Ice Lake. en.wikipedia.org/wiki/… – Dross
