How fast is std::swap for integer types?

Asked 17/8, 2013 at 9:52 Answered 16/12, 2020 at 15:9

The STL implements a generic std::swap function that swaps two values. It can be written as follows:

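#include <utility>  // std::move

template <class T> void swap (T& a, T& b)
{
  T c(std::move(a));  // move a into a temporary
  a=std::move(b);     // move b into a
  b=std::move(c);     // move the temporary (old a) into b
}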

However, there is an XOR swap algorithm for swapping two integers (http://en.wikipedia.org/wiki/XOR_swap_algorithm):

void swap_u( size_t& x, size_t& y )
{
   x = x^y;
   y = x^y;
   x = x^y;
}
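A worked trace may help show why the three XORs exchange the values. This is a minimal sketch, not from the question: the values 5 and 3 are arbitrary, `Pair` and `xor_swap_trace` are hypothetical names, and the function is `constexpr` (C++14) only so the trace can be checked at compile time. It relies on `a ^ a == 0` and `a ^ 0 == a`.

```cpp
#include <cstdint>

// Hypothetical result holder for the trace.
struct Pair
{
    std::uint32_t x;
    std::uint32_t y;
};

// Same three steps as swap_u, applied to local copies of the inputs.
constexpr Pair xor_swap_trace(std::uint32_t x, std::uint32_t y)
{
    x = x ^ y;  // with x=5 (0101), y=3 (0011): x = 0110 (6), i.e. "x XOR y"
    y = x ^ y;  // y = 0110 ^ 0011 = 0101 (5): (x^y)^y gives the original x
    x = x ^ y;  // x = 0110 ^ 0101 = 0011 (3): (x^y)^x gives the original y
    return Pair{x, y};
}

static_assert(xor_swap_trace(5, 3).x == 3, "x now holds the old y");
static_assert(xor_swap_trace(5, 3).y == 5, "y now holds the old x");
```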

My questions:

Is it still an optimization nowadays (on x86 or ARM)?
Does the C++ standard favor this kind of optimization?
Are there any real STL implementations in the wild that specialize std::swap for integers?

Rotter asked 17/8, 2013 at 9:52

XOR swap is not necessarily efficient - it's more of a novelty than a useful optimisation - just use a temporary variable, keep it simple and avoid gimmicks - let the compiler do the clever stuff. – Interdependent 17/8, 2013 at 9:55

The second implementation won't work if x and y refer to the same memory location (which can happen if you pass array elements with variable indexes). – Langsdon 17/8, 2013 at 9:56
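A minimal C++14 sketch of that failure, assuming the question's swap_u algorithm (renamed here to the hypothetical `xor_swap_unchecked` and made `constexpr` so the result can be checked with `static_assert`). When both references bind to the same object, the first XOR already computes `v ^ v == 0`, and the remaining steps only XOR zero with zero:

```cpp
#include <cstddef>

// Same algorithm as swap_u in the question, with no aliasing check.
constexpr void xor_swap_unchecked(std::size_t& x, std::size_t& y)
{
    x = x ^ y;
    y = x ^ y;
    x = x ^ y;
}

// Binds both parameters to the same object, as swap_u(arr[i], arr[j])
// would when i == j (hypothetical helper for illustration).
constexpr std::size_t self_swap(std::size_t v)
{
    xor_swap_unchecked(v, v);  // first step sets v to v ^ v == 0
    return v;                  // the original value is lost
}

static_assert(self_swap(42) == 0, "the value is destroyed, not preserved");
```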

Swapping using a register as the temporary should be faster, and that is presumably what the compiler is doing in the first case. – Torre 17/8, 2013 at 9:56

A smart compiler understands what std::swap is doing and can simply note that the variables can now be referred to through different registers or addresses. This makes the swap effectively free. You can see this if you look at the generated assembly. – Gesticulatory 17/8, 2013 at 10:3

Important lesson here: just because something looks complicated and hackish does not automatically mean it is faster. :) – Theressa 17/8, 2013 at 10:29

@Gesticulatory this doesn't (always) even require a compiler that understands std::swap, or even a terribly smart compiler -- unnecessary register moves will generally be eliminated by fairly simple peephole optimization and register scheduling. (The XOR trick, on the other hand, would take a much smarter compiler to optimize away). – Truncated 17/8, 2013 at 11:34

In the vast majority of situations, XOR swap is not an optimisation.

See the Wikipedia entry on the XOR swap algorithm:

In most practical scenarios, the trivial swap algorithm using a temporary register is more efficient. Limited situations in which XOR swapping may be practical include:

On a processor where the instruction set encoding permits the XOR swap to be encoded in a smaller number of bytes;

In a region with high register pressure, it may allow the register allocator to avoid spilling a register.

In microcontrollers where available RAM is very limited.

Because these situations are rare, most optimizing compilers do not generate XOR swap code.

Also note that your implementation of XOR swap is broken. You need to first check that x and y aren't aliased. This check will definitely make XOR swap slower.
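One way to add that check, as a sketch under the assumption that comparing the two addresses is an acceptable aliasing test (`xor_swap_checked` and `checked_swap_ok` are hypothetical names; `constexpr` is only there so the checks run at compile time). The extra compare and branch is the slowdown the answer refers to; how much it costs in practice would have to be measured:

```cpp
#include <cstddef>

// XOR swap guarded against aliasing.
constexpr void xor_swap_checked(std::size_t& x, std::size_t& y)
{
    if (&x == &y)  // same object: swapping it with itself must be a no-op
        return;
    x ^= y;        // x = x0 ^ y0
    y ^= x;        // y = y0 ^ x0 ^ y0 = x0
    x ^= y;        // x = x0 ^ y0 ^ x0 = y0
}

// Checks both a normal swap and a self-swap (hypothetical test helper).
constexpr bool checked_swap_ok(std::size_t a, std::size_t b)
{
    std::size_t x = a, y = b;
    xor_swap_checked(x, y);
    std::size_t z = a;
    xor_swap_checked(z, z);  // aliased call leaves z untouched
    return x == b && y == a && z == a;
}
```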

I'm not aware of any standard library implementation that uses XOR swap.

Note that, regardless of what the standard library implements, if XOR swap were really faster than normal swap then optimizing compilers would do a peephole optimization to turn it into an XOR swap. This really is a case of just letting the compiler choose for you.

Ganister answered 17/8, 2013 at 10:0

We learned a good lesson here: freaky, tricky wiki code is not necessarily an optimization – Company 19/8, 2013 at 9:50

XOR swap is really only a gimmick and can fail in certain cases (e.g. both variables are references to the same object).

XOR swap is also not particularly efficient as it has serial dependencies so it will always take at least three instruction cycles. Using a straightforward swap with a temporary has fewer dependencies, allowing for some parallelism on modern superscalar CPUs - on some CPUs it can even be implemented in one instruction, but even without special instructions it may well execute in two cycles.
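The dependency argument can be read straight off the source. In this sketch (hypothetical function names; `constexpr` only so the results can be checked at compile time), each XOR reads the value written by the previous one, while in the temporary-based version the new `a` depends only on the old `b` and vice versa. How this maps to actual cycles depends on the CPU and on what the compiler emits:

```cpp
// XOR swap: a read-after-write chain, so the three operations
// have to execute one after another.
constexpr void swap_xor(unsigned& a, unsigned& b)
{
    a ^= b;  // needs the original a and b
    b ^= a;  // needs the new a from the line above
    a ^= b;  // needs the new b from the line above
}

// Temporary swap: the new a needs only the old b, and the new b needs
// only the old a, so the two copies are independent of each other.
constexpr void swap_tmp(unsigned& a, unsigned& b)
{
    unsigned t = a;
    a = b;
    b = t;
}

// Both versions agree on distinct objects (hypothetical test helper).
constexpr bool same_result(unsigned a, unsigned b)
{
    unsigned x1 = a, y1 = b, x2 = a, y2 = b;
    swap_xor(x1, y1);
    swap_tmp(x2, y2);
    return x1 == x2 && y1 == y2;
}
```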

Interdependent answered 17/8, 2013 at 10:2

On x86, a triple XOR swap between memory locations (not CPU registers) takes the same number of processor cycles as a triple copy. The copy can take even fewer cycles if the temporary is a register.

Vshaped answered 17/8, 2013 at 10:8

As has already been explained, in most scenarios the XOR bit-fiddling will be slower.

But it also depends a lot on the surrounding code. Let's say that this swap is done on its own, far away from any other code that needs those values (so they are not already loaded into registers), and that we are working with "normal" x86 processors.

Any algorithm that swaps the 2 values will need at least 2 operations to load the values from memory into registers and another 2 operations to store them back to memory (x86 has no instruction that swaps the contents of 2 memory locations directly).

When using a temporary variable like this:

void swap (int& a, int& b)
{
  int temp = a;
  a = b;
  b = temp;
}

Basically any optimizing compiler will recognize that 'temp' is only used locally for the swap and will not give it a memory location. And since it only holds the value of 'a', it will not even need a separate register.

The resulting assembly will look something like this (pseudo-assembly):

load a to rA
load b to rB
store rA to b
store rB to a

So in most scenarios this is the most efficient version possible in terms of memory accesses, number of instructions and number of registers.

Only if the compiler fails to recognize that 'temp' is not used for anything else, and stores it in a separate register (or, worse, in actual memory), could the XOR variant be more efficient at all.

But this is still purely theoretical, because your swap will be surrounded by other code, and that code matters far more. If the values are not used anymore, the whole swap will be eliminated. If the values are used directly afterwards for other computations, the following code may simply use the 2 registers the other way round, so the swap itself costs 0 instructions. And you will be hard pressed to find any solution that is more efficient than literally having nothing to do.
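A small illustration of the "0 instructions" case, assuming an optimizing compiler (the function name is hypothetical). Because the swapped values are consumed immediately, the compiler is free to compute the result from the original arguments directly instead of moving anything around:

```cpp
// The swap is immediately followed by a computation that uses both values,
// so an optimizer can fold it away and evaluate (original b) - (original a).
constexpr int difference_after_swap(int a, int b)
{
    int temp = a;  // plain swap through a temporary, as in this answer
    a = b;
    b = temp;
    return a - b;  // same value as b - a on the original arguments
}
```

With optimizations enabled, GCC and Clang will typically reduce this to a single subtraction, but that is worth confirming in the generated assembly for your own compiler and flags.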

And of course there are other, more obscure instruction sets that might have instructions to directly swap the contents of 2 memory locations.

Cobol answered 16/12, 2020 at 15:9
