There is some disagreement about the most effective way to accomplish this iteration through array partitions.
First the one time integer modulo method, this must define auto size
in addition to the changes in my answer because gcc does not yet support size
:
auto foo = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
auto size = distance(cbegin(foo), cend(foo));
auto bar = cbegin(foo);
auto finish = prev(cend(foo), size % 3);
for(auto it = size <= 3 ? cend(foo) : next(bar, 3); it != finish; bar = it, it = next(bar, 3)) {
for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
cout << endl;
}
for_each(bar, finish, [](const auto& i) { cout << i << '\t'; });
cout << endl;
for_each(finish, cend(foo), [](const auto& i) { cout << i << '\t'; });
cout << endl;
This creates 112 lines of assembly, most notably the conditional it != finish
generates these instructions:
cmpq %r12, %r13
je .L19
movq %r12, %rbx
jmp .L10
Second the repeated iterator subtraction using Ben Voigt's try_advance
but only with the random access specialization because there is a compiler conflict for random access iterators:
auto foo = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
auto bar = cbegin(foo);
for (auto it = cbegin(foo), end = cend(foo); try_advance(it, 3, end); bar = it) {
for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
cout << endl;
}
for_each(bar, cend(foo), [](const auto& i) { cout << i << '\t'; });
cout << endl;
This creates 119 lines of assembly, most notably the conditional in try_advance
: if (end - it < stride) return false;
incurs a per iteration generating the code:
movq %r12, %rax
subq %rbp, %rax
cmpq $11, %rax
ja .L3
Upon learning that cmpq
is really just a subtract and compare operation I have written some bench-marking code: http://coliru.stacked-crooked.com/a/ad869f69c8dbd96f I needed to use Coliru to be able to turn on optimization, but it keeps giving me bogus increments of my test count for times, I'm not sure what's going on there. What I can say is locally, the repeated iterator subtraction is always faster, sometimes significantly so. Upon learning this I believe that Ben Voigt's answer should be marked as the correct one.
EDIT:
I've made an interesting discovery. It's the algorithm that goes first that always looses. I've rewriten the code to swap the first algorithm on each pass. When this is done the integer modulo method always beats the iterator subtraction method as would be suspected by looking at the assembly, again something fishy is going on with Coliru, but you can take this code and run it locally: http://coliru.stacked-crooked.com/a/eb3e0c70cc138ecf
The next issue is that both of these algorithms are lazy; in the event that size(foo)
is a multiple of 3 they allocate an empty vector
at the end of the vector
. That requires significant branching for the integer modulo algorithm to remedy, but only the simplest of changes for the repeated iterator subtraction algorithm. The resulting algorithms exhibit effectively equal benchmark numbers but the edge goes to the repeated iterator subtraction for simplicity:
Integer modulo algorithm:
auto bar = cbegin(foo);
const auto size = distance(bar, cend(foo));
if (size <= 3) {
for_each(bar, cend(foo), [](const auto& i) { cout << i << '\t'; });
cout << endl;
}
else {
auto finish = prev(cend(testValues), (size - 1) % 3 + 1);
for (auto it = next(bar, 3); it != finish; bar = it, advance(it, 3)) {
for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
cout << endl;
}
for_each(bar, finish, [](const auto& i) { cout << i << '\t'; });
cout << endl;
for_each(finish, cend(foo), [](const auto& i) { cout << i << '\t'; });
cout << endl;
}
Repeated iterator subtraction algorithm:
auto bar = cbegin(foo);
for (auto it = cbegin(foo); distance(it, cend(foo)) > 3; bar = it) {
advance(it, 3);
for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
cout << endl;
}
for_each(bar, cend(foo), [](const auto& i) { cout << i << '\t'; });
cout << endl;
EDIT: Throwing the Remaining Size Algorithm into the hat
Both the Integer Modulo and Repeated Subtraction Algorithms above suffer from iterating over the input sequence more than once, other than being slower this isn't that serious because currently we're using a Bidirectional Iterator, but should our input iterator fail to qualify for Bidirectional Iterator this would be excessively expensive. Independent of iterator type the Remaining Size Algorithm beats all challengers every time at 10,000,000+ testbench iterations:
auto bar = cbegin(foo);
for (auto i = size(foo); i > STEP; i -= STEP) {
for(auto j = 0; j < STEP; ++j, ++bar) cout << *bar << '\t';
cout << endl;
}
for(auto i = 0; j < STEP; ++j, ++bar) cout << *bar << '\t';
cout << endl;
I've again copied my local testing to Coliru, which gives weird results but you can verify locally: http://coliru.stacked-crooked.com/a/361f238216cdbace
int
till I dereference it, right? – Lumbarlast + 10
would not be pointing to an invalid memory, but would not make sense, as let's saylast + 5
is the last memory address. So the standard says it's undefined behavior – Nairnint
. But we know that C++ has to be able to return anend
pointer which means that it cannot allocate space in the very last line of accessible memory. At a maximum it could only represent the allocate on the next to last line. Meaning that as long as my modulo number is smaller thansizeof(int*) * 8
there shouldn't be any undefined behavior. – Lumbarlast
there is no guarantee, Obtaining one pastlast
is undefined behavior. And I think you confuse things a bit. I don't see any relation tosizeof(int*) * 8
. I say again:last
can hypothetically be a pointer to the last addressable memory for an object type stored by the array. Going pastlast
is undefined behavior. – Nairnint
. There arepointers
anditerators
and then there isint
. The standard only guarantees that a pointer can be converted to anintptr_t
and then back again to a pointer and you get the original pointer value. It doesn't even make guarantees about theintptr_t
value. – Nairnint
here (or along
to be precise.) Pointers are simply a memory index that boils down to a unique number, hence we can compareint*
s just likeint
s, and that's what I'm doing here. – Lumbarint a; int b
it is undefined behavior to compare the pointers toa
andb
, i.e.&a < &b
is undefined behavior, even if aint *
is just an integer underneath. There are rules for integers and then there are rules for pointers. For better or worse that's how C++ was designed. – Nairn&a < &b
being undefined? – Lumbar