How is strcpy implemented?

I have a question about using strcpy. I know the ANSI C standard says that the source and destination must not overlap; otherwise the behaviour is undefined. Below is a piece of code that works as I expect when it is compiled with an old GNU C compiler under Linux.

#include <string.h>
#include <stdio.h>

char S[80],*P;

int main() {
    strcpy(S,"abcdefghi\r\njklmnopqr\r\nstuvwxyz\r\n");
    for (P=S; P=strchr(P,'\r'); P++) strcpy(P,P+1);
    printf("%s\n",S);
    return 0;
}

This loop removes every \r (carriage return) from the input string. I know (from Kernighan and Ritchie) that a very simple implementation of strcpy is the following:

while (*t++=*s++) ;

Now I compiled my program using gcc (Gentoo 4.5.4 p1.0, pie-0.4.7) 4.5.4 and it prints this:

abcdefghi
jklmnpqr          <-- missing 'o'
stuvwxxyz         <-- doubled 'x'

I suppose this compiler (in fact its library) uses a very sophisticated sequence for strcpy, and I don't understand the reason.

Lodhia answered 17/10, 2012 at 13:25 Comment(7)
Heh, @jsalonen beat me to the edit - Raines
You could see the implementation by finding the .asm file on your system. - Obolus
It probably uses optimizations that copy bigger (multi-byte) chunks. A common technique is to cast the pointers to the longest available integer unit (like long long *) and copy that. This means that the copy can overwrite data that has not been copied yet. - Loginov
The bizarre result is abcdefghi then jklmnpqr then stuvwxxyz. On the second line o is missing and on the third line x is doubled. - Lodhia
I viewed S using gdb (the GNU debugger): every '\r' is dropped and S is "abcdefghi\njklmnpqr\nstuvwxxyz\n" - Lodhia
I should mention you don't actually ask a question here. - Ketene
Valgrind warns about this error. See #4824164 - Behoof

You were warned not to do that. The reason is that a byte-for-byte copy is actually quite slow and requires a lot of looping to get through a string. The compiler and its C library can optimize this easily (for instance, by copying an int-sized chunk at a time, or by using some platform-specific parallelization).

But if the strings overlap, those optimizations rely on assumptions about your data that no longer hold. As a result, the behaviour is undefined and you can get corrupted output like yours. It is likely your older GCC simply didn't do any such optimizations.

Since the documentation for strcpy() says not to use overlapping strings, don't.

Ketene answered 17/10, 2012 at 13:29 Comment(0)

The best way to figure out what your implementation is doing is of course to read the source of its library.

If the source is not available, the next best choice might be to read the assembly code that the compiler generates.

You can also look at "serious" open source implementations of the library, and maybe draw some conclusions from that.

One idea might be that the library is copying data in larger chunks than a character at a time, which breaks when you violate the design assumptions.

Juline answered 17/10, 2012 at 13:28 Comment(0)
