C memcpy in reverse
Asked Answered
I

2

19

I am working with audio data. I'd like to play the sample file in reverse. The data is stored as unsigned ints and packed nice and tight. Is there a way to call memcpy that will copy in reverse order. i.e. if I had 1,2,3,4 stored in an array, could I call memcpy and magically reverse them so I get 4,3,2,1.

Intermeddle answered 11/2, 2010 at 5:36 Comment(7)
C doesn't have a function like that, but it's very easy to write one.Ardenia
It might be worth considering changing how you iterate over the data rather than changing the order, I suspect it'd be more efficient...Gardant
You've got me stumped with magically.Gibberish
Mark, i dont iterate, i copy buffers in chunks. Craig - computers are magic right? :)Intermeddle
Computers might be way cool, which is a rough approximation of magic for some purposes I guess. :-)Gibberish
did you notice the magic tag? i didnt even add it...Intermeddle
Do you have a cpu architecture and buffer size in mind? They could affect the algorithm. For example, x86 with 128-bit buffers is a little different than x64 with 128-bit buffers. And ARM with NEON could use a different algorithm than x86/x64. Are the buffers unique, or can you operate in-place? A 128-bit buffer is slightly different than a 256-bit or 512-bit buffer on x86/x64 due to AVX. And a fixed buffer is different than a unknown or arbitrary buffer size.Dunseath
A
11

This works for copying ints in reverse:

void reverse_intcpy(int *restrict dst, const int *restrict src, size_t n)
{
    size_t i;

    for (i=0; i < n; ++i)
        dst[n-1-i] = src[i];

}

Just like memcpy(), the regions pointed-to by dst and src must not overlap.

If you want to reverse in-place:

void reverse_ints(int *data, size_t n)
{
    size_t i;

    for (i=0; i < n/2; ++i) {
        int tmp = data[i];
        data[i] = data[n - 1 - i];
        data[n - 1 - i] = tmp;
    }
}

Both the functions above are portable. You might be able to make them faster by using hardware-specific code.

(I haven't tested the code for correctness.)

Ardenia answered 11/2, 2010 at 5:43 Comment(18)
whats the effeciency of this in comparison with memcpy?Intermeddle
memcpy should be O(n) and so should this reverse_memcpy function.Nary
With my quick testing, with -O3 optimization, reverse_memcpy() is about 3 times slower than memcpy() for copying 1000000 bytes. For 10000 iterations with 1000000 bytes, memcpy() took 4 seconds, and reverse_memcpy() took 11. But these numbers are for a very specific case, so you may want to test things for yourself. Of course, as dreamlax said, both are O(n).Ardenia
It's not correct either; if the original sample data is in unsigned ints, this will also byte-reverse it.Trophoplasm
The slowdown is probably because your reads and writes aren't aligned. memcpy probably works faster by copying aligned chunks of memory (when it can).Nary
The above is plain broken. By the time the loop reaches half way it will have overwritten the second half with the "reversed" first half. 1,2,3,4 wont become 4,3,2,1 but will actually be 1,2,2,1. One needs to add a read from the end and a read from the beginning , swap and then write the values back.Mythological
@mP - memcpy will have the same result, as it assumes the two pointers do not overlap and is optimized on that assumption. The standard library memmove is used in the specific case where the two pointers may point to overlapping space. If the OP wants memmove-like behavior, he should specify it. If the OP wants reverse-in-place behavior, he should specify that as well, as that case can possibly be optimized.Confidence
@mP: I realized that the OP might want to reverse in-place. I was editing my post when you made the comment. Agree with the your point completely. The "first edition" of my answer had a warning about the need for non-overlapping regions and the use of restrict keyword in C99...Ardenia
@Alok: I noticed you edited the answer, but it appears as if the OP actually does want a solution where the buffer is both copied and reversed.Nary
@dreamlax: yes, that is how I read the question too, but I edited it anyway just in case. Thanks for your comments.Ardenia
@Aran, memcpy and these reverse functions perform different operations, thus a comparison of efficiencies is quite meaningless. Either you need it to be reversed, then you have to write your own function and can compare different implementations of it when you find it to be too slow. Or you don't need it reversed, then simply use memcpy or memmove.Subdivision
The first version would benefit a lot from restricting the arguments.Ditch
@AlexandreC. agreed. Particularly since it doesn't work if the arguments overlap.Ardenia
You can simplify the in-place code with two variables: void reverse_ints(int *data, size_t n) { if (n < 2) return; for (size_t i=0, j=n-1; i < j; ++i, --j) { int tmp = data[i]; data[i] = data[j]; data[j] = tmp; } }Wentzel
@JonathanLeffler, sure. There are many ways to write that loop.Ardenia
I've never thought much about measuring a memcpy implementation's performance by using time-complexity. It's not the right tool for the job. We can pretty much all agree that O(n) will always be the case, unless something horribly went wrong! Doing side-by-side benchmarks with various data sets under equal conditions is the way to go.Lohner
I want something with the efficiency of memcpy. Keeping memory caches, SSE, and such in mind, this solution is going to be slow. And as Nuzzolilo says, big-O isn't the question here.Trager
@Nuzzolilo - Concerning benchmarks, definitely agree. I'm searching for a portable pattern that compilers like Clang, GCC, ICC and MSVC recognize to generate architecture specific code, like SSE2 (128-bit buffers), NEON (128-bit buffers), AVX (256-bit buffers) and AVX-512 (512-bit buffers).Dunseath
A
13

No, memcpy won't do that backwards. If you're working in C, write a function to do it. If you're really working in C++ use std::reverse or std::reverse_copy.

Amino answered 11/2, 2010 at 5:40 Comment(1)
I know this is old, but could you post an example of a function that does this? It's simple, but it might help someone.Rhee
A
11

This works for copying ints in reverse:

void reverse_intcpy(int *restrict dst, const int *restrict src, size_t n)
{
    size_t i;

    for (i=0; i < n; ++i)
        dst[n-1-i] = src[i];

}

Just like memcpy(), the regions pointed-to by dst and src must not overlap.

If you want to reverse in-place:

void reverse_ints(int *data, size_t n)
{
    size_t i;

    for (i=0; i < n/2; ++i) {
        int tmp = data[i];
        data[i] = data[n - 1 - i];
        data[n - 1 - i] = tmp;
    }
}

Both the functions above are portable. You might be able to make them faster by using hardware-specific code.

(I haven't tested the code for correctness.)

Ardenia answered 11/2, 2010 at 5:43 Comment(18)
whats the effeciency of this in comparison with memcpy?Intermeddle
memcpy should be O(n) and so should this reverse_memcpy function.Nary
With my quick testing, with -O3 optimization, reverse_memcpy() is about 3 times slower than memcpy() for copying 1000000 bytes. For 10000 iterations with 1000000 bytes, memcpy() took 4 seconds, and reverse_memcpy() took 11. But these numbers are for a very specific case, so you may want to test things for yourself. Of course, as dreamlax said, both are O(n).Ardenia
It's not correct either; if the original sample data is in unsigned ints, this will also byte-reverse it.Trophoplasm
The slowdown is probably because your reads and writes aren't aligned. memcpy probably works faster by copying aligned chunks of memory (when it can).Nary
The above is plain broken. By the time the loop reaches half way it will have overwritten the second half with the "reversed" first half. 1,2,3,4 wont become 4,3,2,1 but will actually be 1,2,2,1. One needs to add a read from the end and a read from the beginning , swap and then write the values back.Mythological
@mP - memcpy will have the same result, as it assumes the two pointers do not overlap and is optimized on that assumption. The standard library memmove is used in the specific case where the two pointers may point to overlapping space. If the OP wants memmove-like behavior, he should specify it. If the OP wants reverse-in-place behavior, he should specify that as well, as that case can possibly be optimized.Confidence
@mP: I realized that the OP might want to reverse in-place. I was editing my post when you made the comment. Agree with the your point completely. The "first edition" of my answer had a warning about the need for non-overlapping regions and the use of restrict keyword in C99...Ardenia
@Alok: I noticed you edited the answer, but it appears as if the OP actually does want a solution where the buffer is both copied and reversed.Nary
@dreamlax: yes, that is how I read the question too, but I edited it anyway just in case. Thanks for your comments.Ardenia
@Aran, memcpy and these reverse functions perform different operations, thus a comparison of efficiencies is quite meaningless. Either you need it to be reversed, then you have to write your own function and can compare different implementations of it when you find it to be too slow. Or you don't need it reversed, then simply use memcpy or memmove.Subdivision
The first version would benefit a lot from restricting the arguments.Ditch
@AlexandreC. agreed. Particularly since it doesn't work if the arguments overlap.Ardenia
You can simplify the in-place code with two variables: void reverse_ints(int *data, size_t n) { if (n < 2) return; for (size_t i=0, j=n-1; i < j; ++i, --j) { int tmp = data[i]; data[i] = data[j]; data[j] = tmp; } }Wentzel
@JonathanLeffler, sure. There are many ways to write that loop.Ardenia
I've never thought much about measuring a memcpy implementation's performance by using time-complexity. It's not the right tool for the job. We can pretty much all agree that O(n) will always be the case, unless something horribly went wrong! Doing side-by-side benchmarks with various data sets under equal conditions is the way to go.Lohner
I want something with the efficiency of memcpy. Keeping memory caches, SSE, and such in mind, this solution is going to be slow. And as Nuzzolilo says, big-O isn't the question here.Trager
@Nuzzolilo - Concerning benchmarks, definitely agree. I'm searching for a portable pattern that compilers like Clang, GCC, ICC and MSVC recognize to generate architecture specific code, like SSE2 (128-bit buffers), NEON (128-bit buffers), AVX (256-bit buffers) and AVX-512 (512-bit buffers).Dunseath

© 2022 - 2024 — McMap. All rights reserved.