It seems to be just a marketing embellishment.
I found no references to "TSX-NI" on the Internet, in the Intel manuals, or in the Intel ISA extensions manual.
Quoting Intel [1]
Intel Transactional Synchronization Extensions (Intel TSX) comes in two flavours: HLE and RTM.
Due to their implementation, these two aspects are decoupled (either can be supported separately from the other) and only RTM introduces new instructions.
So they are probably referring to RTM.
I believe HLE was introduced first and there should be processors that support HLE but not RTM (the converse, while possible, seems implausible).
So maybe it is just the marketing-correct way of saying: "This CPU supports our latest TSX features!".
For reference, I wrote a brief introduction to the two parts of Intel TSX on the assumption that "TSX-NI" refers to "TSX RTM".
A complete reference can be found in Intel Manual 1, Chapter 15.
HLE
The HLE (Hardware Lock Elision) part is backwards compatible.
We can still test its availability with CPUID.07H.EBX.HLE[bit 4], but it is implemented by changing the semantics of the prefixes repne/repe for certain instructions.
This feature consists of two "new" prefixes: xacquire and xrelease.
The CPU is now capable of entering a transactional state where every read is added to the read-set of the transaction and every write is added to the write-set of the transaction and is not carried out to memory until the transaction commits.
The granularity is the size of a cache line.
If another thread reads from the write-set, or writes to either the read-set or the write-set, of a transaction then that transaction is aborted.
The CPU restores the architectural state as it was at the beginning of the transaction and re-executes the instructions non-transactionally.
If the transaction completes successfully, all the written memory is committed atomically.
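The conflict rule can be made concrete with a toy model. The sketch below is only an illustration, not how the hardware is implemented: it tracks read and write sets as arrays of cache-line indices and follows the rule from Intel's manual that a foreign write to a line in either set, or a foreign read of a line in the write-set, is a conflict. The 64-byte line size, the 16-line capacity, and all names (`tx_model`, `tx_read`, `tx_write`, `tx_conflicts`) are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u  /* assumption: typical x86 cache line size */
#define MAX_LINES 16u        /* assumption: arbitrary capacity for the model */

typedef struct {
    uintptr_t read_set[MAX_LINES];
    size_t n_read;
    uintptr_t write_set[MAX_LINES];
    size_t n_write;
} tx_model;

/* Tracking happens per cache line, not per byte. */
static uintptr_t line_of(uintptr_t addr) { return addr / CACHE_LINE_SIZE; }

static bool set_contains(const uintptr_t *set, size_t n, uintptr_t line)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == line)
            return true;
    return false;
}

/* Adds a line to a set; returns false when the model's capacity is exceeded. */
static bool set_add(uintptr_t *set, size_t *n, uintptr_t line)
{
    if (set_contains(set, *n, line))
        return true;
    if (*n == MAX_LINES)
        return false;
    set[(*n)++] = line;
    return true;
}

/* Transactional read: the line joins the read-set. */
static bool tx_read(tx_model *tx, uintptr_t addr)
{
    return set_add(tx->read_set, &tx->n_read, line_of(addr));
}

/* Transactional write: the line joins the write-set. */
static bool tx_write(tx_model *tx, uintptr_t addr)
{
    return set_add(tx->write_set, &tx->n_write, line_of(addr));
}

/* Would an access by ANOTHER thread conflict with (and abort) tx? */
static bool tx_conflicts(const tx_model *tx, uintptr_t addr, bool is_write)
{
    uintptr_t line = line_of(addr);
    if (set_contains(tx->write_set, tx->n_write, line))
        return true;  /* any foreign access to the write-set conflicts */
    return is_write && set_contains(tx->read_set, tx->n_read, line);
}
```

Note that two threads merely reading the same line never conflict in this model, and that two unrelated variables sharing a cache line do conflict, which is the practical consequence of the cache-line granularity.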
Transactions are delimited by xacquire and xrelease.
They can nest, but there is a limit on the depth (above which the transaction is aborted) and on the number of different locks that can be elided (beyond that, the CPU won't elide new locks but won't abort the transaction either).
When a nested transaction is aborted, the CPU restarts executing the outermost transaction.
xacquire (opcode F2, same as repne) is used in front of the instruction that would acquire a lock (i.e. write to the lock) and marks the beginning of a transaction.
This write is not added to the write-set (or no concurrency could happen, as every thread writes to the lock and that would abort any subsequent transaction immediately).
The lock is added to the read-set instead.
xrelease (opcode F3, same as repe) is used in front of the instruction that would release a lock and marks the end of the transaction.
xrelease must be used on the same lock used with xacquire to pair with it and complete the transaction.
xacquire can only be used with the locked version of these instructions: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, XCHG (XCHG with a memory operand is implicitly locked, so it qualifies with or without an explicit lock prefix).
xrelease can be used with the same instructions, plus MOV mem, reg and MOV mem, imm without a lock prefix.
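As an illustration of how the two prefixes pair up around a lock, here is a minimal spinlock sketch in C. It relies on the `__atomic` builtins of GCC/Clang and on the `__ATOMIC_HLE_ACQUIRE` / `__ATOMIC_HLE_RELEASE` flags that GCC predefines on x86; where those flags are missing, the sketch falls back to 0 and degrades to an ordinary spinlock. The names `hle_spinlock`, `hle_lock` and `hle_unlock` are made up for this example.

```c
#include <stdbool.h>

/* GCC on x86 predefines the HLE hint flags. Elsewhere we fall back to 0,
   so the code is a plain spinlock (an assumption of this sketch). */
#ifdef __ATOMIC_HLE_ACQUIRE
#define HLE_ACQUIRE_HINT __ATOMIC_HLE_ACQUIRE
#define HLE_RELEASE_HINT __ATOMIC_HLE_RELEASE
#else
#define HLE_ACQUIRE_HINT 0
#define HLE_RELEASE_HINT 0
#endif

typedef struct {
    int locked;  /* 0 = free, 1 = held */
} hle_spinlock;

static void hle_lock(hle_spinlock *l)
{
    /* With the hint, GCC emits the exchange with an xacquire prefix:
       the write to the lock is elided and a transaction begins. */
    while (__atomic_exchange_n(&l->locked, 1,
                               __ATOMIC_ACQUIRE | HLE_ACQUIRE_HINT)) {
        /* The lock is really held: wait on plain loads until it is free. */
        while (__atomic_load_n(&l->locked, __ATOMIC_RELAXED))
            ;
    }
}

static void hle_unlock(hle_spinlock *l)
{
    /* With the hint, GCC emits a store with an xrelease prefix; it targets
       the same lock word as the acquire, which ends the transaction. */
    __atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE | HLE_RELEASE_HINT);
}
```

Because the prefixes are ignored by processors without HLE, the same binary should run everywhere; only the elision is lost.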
A new instruction, xtest, is available if HLE (or RTM) is present; it sets the ZF if the processor is not inside a transaction.
RTM
The RTM (Restricted Transactional Memory) part is not backwards compatible.
It can be tested with CPUID.07H.EBX.RTM [bit 11].
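Both feature bits live in the same CPUID leaf, so one query covers HLE and RTM. The following is a minimal sketch assuming GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid_count`; the helper names (`ebx_has_hle`, `ebx_has_rtm`, `read_cpuid7_ebx`) are invented for this example. On non-x86 targets the query simply reports 0.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Bit positions in CPUID.(EAX=07H, ECX=0):EBX, as given in the text. */
#define TSX_EBX_HLE_BIT 4u
#define TSX_EBX_RTM_BIT 11u

static bool ebx_has_hle(uint32_t ebx)
{
    return ((ebx >> TSX_EBX_HLE_BIT) & 1u) != 0;
}

static bool ebx_has_rtm(uint32_t ebx)
{
    return ((ebx >> TSX_EBX_RTM_BIT) & 1u) != 0;
}

/* Returns EBX of leaf 7, subleaf 0, or 0 when the leaf is unavailable. */
static uint32_t read_cpuid7_ebx(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return ebx;
#endif
    return 0;
}
```

A caller would typically evaluate `ebx_has_rtm(read_cpuid7_ebx())` once at startup and select the RTM or the fallback code path accordingly.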
It introduces three new instructions: xbegin, xend and xabort.
They are just a new interface to the already specified, and common, transactional execution capability.
xbegin must provide, as a relative offset, a pointer to the fallback code path.
This code is executed whenever the transaction fails to be committed.
In such cases eax holds the reason for the abort.
xend ends the transaction and instructs the CPU to commit it.
xabort lets the programmer abort the transaction explicitly with a custom error code.
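To give an idea of what the fallback path can do with eax, here is a small decoding sketch. The bit layout follows the abort status definition in the Intel manual (bit 0: explicit xabort, bit 1: retry may succeed, bit 2: memory conflict, bit 3: internal buffer overflow, bit 4: debug breakpoint, bit 5: abort in a nested transaction, bits 31:24: the xabort argument); the macro and function names are local to this example and not an official API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Abort status bits left in eax when a transaction started by xbegin aborts.
   Names are specific to this sketch. */
#define ABORT_EXPLICIT (1u << 0)  /* caused by xabort */
#define ABORT_RETRY    (1u << 1)  /* the transaction may succeed on a retry */
#define ABORT_CONFLICT (1u << 2)  /* another logical processor conflicted */
#define ABORT_CAPACITY (1u << 3)  /* an internal buffer overflowed */
#define ABORT_DEBUG    (1u << 4)  /* a debug breakpoint was hit */
#define ABORT_NESTED   (1u << 5)  /* abort happened in a nested transaction */

static bool abort_is_explicit(uint32_t eax)
{
    return (eax & ABORT_EXPLICIT) != 0;
}

static bool abort_may_retry(uint32_t eax)
{
    return (eax & ABORT_RETRY) != 0;
}

/* The 8-bit xabort argument; only meaningful when abort_is_explicit(). */
static uint8_t abort_code(uint32_t eax)
{
    return (uint8_t)(eax >> 24);
}

/* One possible policy for a fallback path: retry only when the hardware
   hints that a retry may succeed, otherwise take the non-transactional path. */
static const char *abort_action(uint32_t eax)
{
    if (abort_is_explicit(eax))
        return "explicit abort: inspect abort_code()";
    if (abort_may_retry(eax))
        return "retry the transaction";
    return "use the fallback path";
}
```

The retry policy in `abort_action` is only one reasonable choice; real code usually also bounds the number of retries before falling back.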
Intel makes no guarantees about the ability of the processor to successfully commit a transaction.
While HLE has a set of very specific conditions, RTM is a "best effort" kind of feature, hence the requirement for a fallback code path.
RTM is more low-level than HLE: it lets the programmer use transactional memory with or without locks.
Mixing HLE and RTM
Quoting Intel:
The behaviour when HLE and RTM are nested together—HLE
inside RTM or RTM inside HLE—is implementation specific. However, in all cases, the
implementation will maintain HLE and RTM semantics. An implementation may
choose to ignore HLE hints when used inside RTM regions, and may cause a transactional abort when RTM instructions are used inside HLE regions. In the latter case,
the transition from transactional to non-transactional execution occurs seamlessly
since the processor will re-execute the HLE region without actually doing elision, and
then execute the RTM instructions.