C strcpy() - evil?
Asked Answered
Z

17

22

Some people seem to think that C's strcpy() function is bad or evil. While I admit that it's usually better to use strncpy() in order to avoid buffer overflows, the following (an implementation of the strdup() function for those not lucky enough to have it) safely uses strcpy() and should never overflow:

char *strdup(const char *s1)
{
  char *s2 = malloc(strlen(s1)+1);
  if(s2 == NULL)
  {
    return NULL;
  }
  strcpy(s2, s1);
  return s2;
}

*s2 is guaranteed to have enough space to store *s1, and using strcpy() saves us from having to store the strlen() result in another function to use later as the unnecessary (in this case) length parameter to strncpy(). Yet some people write this function with strncpy(), or even memcpy(), which both require a length parameter. I would like to know what people think about this. If you think strcpy() is safe in certain situations, say so. If you have a good reason not to use strcpy() in this situation, please give it - I'd like to know why it might be better to use strncpy() or memcpy() in situations like this. If you think strcpy() is okay, but not here, please explain.

Basically, I just want to know why some people use memcpy() when others use strcpy() and still others use plain strncpy(). Is there any logic to preferring one over the three (disregarding the buffer checks of the first two)?

Zebada answered 4/3, 2009 at 11:54 Comment(3)
In multi-threaded environment it's rarely reasonable for every library function to handle locking itself.Divestiture
Since strlen may segfault, or return an unreasonably large value on an improperly terminated s1, your strdup is not safe.Messieurs
If 'malloc' changes 's1', then there is no guarantee that the buffer will be large enough when you later get around to copying into it with 'strcpy'. Suppose 's1' is a pointer to a string maintained internally by the memory management system -- maybe the last time 'malloc' was called.Kenaz
B
25

memcpy can be faster than strcpy and strncpy because it does not have to compare each copied byte with '\0', and because it already knows the length of the copied object. It can be implemented in a similar way with the Duff's device, or use assembler instructions that copy several bytes at a time, like movsw and movsd

Beset answered 4/3, 2009 at 17:8 Comment(1)
This is certainly the reason for for using memcpy here, it has nothing to do with safety. It would be an interesting performance analysis to determine if memcpy here is actually faster or not. Like you I assume it would be for most cases, but possibly for very small strings, strcpy may end up being faster.Sha
F
19

I'm following the rules in here. Let me quote from it

strncpy was initially introduced into the C library to deal with fixed-length name fields in structures such as directory entries. Such fields are not used in the same way as strings: the trailing null is unnecessary for a maximum-length field, and setting trailing bytes for shorter names to null assures efficient field-wise comparisons. strncpy is not by origin a ``bounded strcpy,'' and the Committee has preferred to recognize existing practice rather than alter the function to better suit it to such use.

For that reason, you will not get a trailing '\0' in a string if you hit the n not finding a '\0' from the source string so far. It's easy to misuse it (of course, if you know about that pitfall, you can avoid it). As the quote says, it wasn't designed as a bounded strcpy. And i would prefer not to use it if not necessary. In your case, clearly its use is not necessary and you proved it. Why then use it?

And generally speaking, programming code is also about reducing redundancy. If you know you have a string containing 'n' characters, why tell the copying function to copy maximal n characters? You do redundant checking. It's little about performance, but much more about consistent code. Readers will ask themselves what strcpy could do that could cross the n characters and which makes it necessary to limit the copying, just to read in manuals that this cannot happen in that case. And there the confusion start happen among readers of the code.

For the rational to use mem-, str- or strn-, i chose among them like in the above linked document:

mem- when i want to copy raw bytes, like bytes of a structure.

str- when copying a null terminated string - only when 100% no overflow could happen.

strn- when copying a null terminated string up to some length, filling the remaining bytes with zero. Probably not what i want in most cases. It's easy to forget the fact with the trailing zero-fill, but it's by design as the above quote explains. So, i would just code my own small loop that copies characters, adding a trailing '\0':

char * sstrcpy(char *dst, char const *src, size_t n) {
    char *ret = dst;
    while(n-- > 0) {
        if((*dst++ = *src++) == '\0')
            return ret;
    }
    *dst++ = '\0';
    return ret;
}

Just a few lines that do exactly what i want. If i wanted "raw speed" i can still look out for a portable and optimized implementation that does exactly this bounded strcpy job. As always, profile first and then mess with it.

Later, C got functions for working with wide characters, called wcs- and wcsn- (for C99). I would use them likewise.

Floatable answered 4/3, 2009 at 12:24 Comment(1)
There is no general pattern for strn* functions. The function strncpy is for use in cases where a string which may either be zero-padded or zero-terminated should be copied, zero-padded, to a buffer the same size as the maximum string length. No other function named strn* has any similar semantics. The strncat function is for use in cases where one doesn't know the length of the present string in the destination buffer, but knows that at least a certain amount of space remains within it. A totally different (and IMHO far less likely) situation.Brame
C
16

The reason why people use strncpy not strcpy is because strings are not always null terminated and it's very easy to overflow the buffer (the space you have allocated for the string with strcpy) and overwrite some unrelated bit of memory.

With strcpy this can happen, with strncpy this will never happen. That is why strcpy is considered unsafe. Evil might be a little strong.

Citronella answered 4/3, 2009 at 12:10 Comment(5)
strncpy is also dangerous because it does NOT guarantee that the destination string is 0-erminated! Better use strncpy_s, though I'm not sure whether these functions are MS specific.Moratorium
strncpy only keeps you safe if you pass in the right length. If you're strncpying to a dst which is not the start of a buffer, you still have to calculate the available space. This isn't fundamentally different from using strlen to check your strcpy will fit: it's still subtraction. But less code.Fourgon
@jn - strncpy_s is a proposed addition to ISO/ANSI C (ISO/IEC TR 24731-1) except that MS doesn't implement it quite this way.Unprofitable
"strings are not always null terminated" is a false statement. A string, by definition, is a sequence of characters that's null terminated. If you have a char buffer without a null terminator, it is by definition not a string.Marconigraph
Except that in the example we know the string is null-terminated, and that we have enough space. Using strncpy as a general rule is fine, but strcpy is safe in this case.Haematoma
D
11

Frankly, if you are doing much string handling in C, you should not ask yourself whether you should use strcpy or strncpy or memcpy. You should find or write a string library that provides a higher level abstraction. For example, one that keeps track of the length of each string, allocates memory for you, and provides all the string operations you need.

This will almost certainly guarantee you make very few of the kinds of mistakes usually associated with C string handling, such as buffer overflows, forgetting to terminate a string with a NUL byte, and so on.

The library might have functions such as these:

typedef struct MyString MyString;
MyString *mystring_new(const char *c_str);
MyString *mystring_new_from_buffer(const void *p, size_t len);
void mystring_free(MyString *s);
size_t mystring_len(MyString *s);
int mystring_char_at(MyString *s, size_t offset);
MyString *mystring_cat(MyString *s1, ...); /* NULL terminated list */
MyString *mystring_copy_substring(MyString *s, size_t start, size_t max_chars);
MyString *mystring_find(MyString *s, MyString *pattern);
size_t mystring_find_char(MyString *s, int c);
void mystring_copy_out(void *output, MyString *s, size_t max_chars);
int mystring_write_to_fd(int fd, MyString *s);
int mystring_write_to_file(FILE *f, MyString *s);

I wrote one for the Kannel project, see the gwlib/octstr.h file. It made life much simpler for us. On the other hand, such a library is fairly simple to write, so you might write one for yourself, even if only as an exercise.

Divestiture answered 4/3, 2009 at 12:52 Comment(5)
My current project doesn't make extensive use of strings, but this is a very good suggestion. I'll keep that in mind. And, if only for the learning experience, I'll probably be going the "roll-my-own" way.Zebada
-1 A library is not always suitable, and not related to the question.Nonfiction
+1. Programming is all about abstractions to handle the complexity, but a lot of C programmers seem to think that, because it is C, you have to stay as close to the bare metal as possible, even resisting to put some function call with repeated boilerplate code into a little wrapper function. This may work for some small toy programs or for some narrow parts where speed really matters (whatever). Any project of sufficient size will quickly end in memory management hell. Just because it's C it doesn't mean that you can't use principles of modern software engineering.Zincate
@Tomas: a library is the right thing to use for C applications that does string handling. Examples: qmail, postfix.Sane
@ninjalj: I wholeheartly agreeNonfiction
M
9

No one has mentioned strlcpy, developed by Todd C. Miller and Theo de Raadt. As they say in their paper:

The most common misconception is that strncpy() NUL-terminates the destination string. This is only true, however, if length of the source string is less than the size parameter. This can be problematic when copying user input that may be of arbitrary length into a fixed size buffer. The safest way to use strncpy() in this situation is to pass it one less than the size of the destination string, and then terminate the string by hand. That way you are guaranteed to always have a NUL-terminated destination string.

There are counter-arguments for the use of strlcpy; the Wikipedia page makes note that

Drepper argues that strlcpy and strlcat make truncation errors easier for a programmer to ignore and thus can introduce more bugs than they remove.*

However, I believe that this just forces people that know what they're doing to add a manual NULL termination, in addition to a manual adjustment to the argument to strncpy. Use of strlcpy makes it much easier to avoid buffer overruns because you failed to NULL terminate your buffer.

Also note that the lack of strlcpy in glibc or Microsoft's libraries should not be a barrier to use; you can find the source for strlcpy and friends in any BSD distribution, and the license is likely friendly to your commercial/non-commercial project. See the comment at the top of strlcpy.c.

Meunier answered 5/3, 2009 at 5:10 Comment(1)
As an OS X developer, I have all these fun functions like strdup() and strcpy(), but it looks fairly easy to write oneself in a pinch.Zebada
P
8

I personally am of the mindset that if the code can be proven to be valid—and done so quickly—it is perfectly acceptable. That is, if the code is simple and thus obviously correct, then it is fine.

However, your assumption seems to be that while your function is executing, no other thread will modify the string pointed to by s1. What happens if this function is interrupted after successful memory allocation (and thus the call to strlen), the string grows, and bam you have a buffer overflow condition since strcpy copies to the NULL byte.

The following might be better:

char *
strdup(const char *s1) {
  int s1_len = strlen(s1);
  char *s2 = malloc(s1_len+1);
  if(s2 == NULL) {
    return NULL;
  }

  strncpy(s2, s1, s1_len);
  return s2;
}

Now, the string can grow through no fault of your own and you're safe. The result will not be a dup, but it won't be any crazy overflows, either.

The probability of the code you provided actually being a bug is pretty low (pretty close to non-existent, if not non-existent, if you are working in an environment that has no support for threading whatsoever). It's just something to think about.

ETA: Here is a slightly better implementation:

char *
strdup(const char *s1, int *retnum) {
  int s1_len = strlen(s1);
  char *s2 = malloc(s1_len+1);
  if(s2 == NULL) {
    return NULL;
  }

  strncpy(s2, s1, s1_len);
  retnum = s1_len;
  return s2;
}

There the number of characters is being returned. You can also:

char *
strdup(const char *s1) {
  int s1_len = strlen(s1);
  char *s2 = malloc(s1_len+1);
  if(s2 == NULL) {
    return NULL;
  }

  strncpy(s2, s1, s1_len);
  s2[s1_len+1] = '\0';
  return s2;
}

Which will terminate it with a NUL byte. Either way is better than the one that I quickly put together originally.

Pirri answered 4/3, 2009 at 12:6 Comment(17)
Not having worked with threads, I didn't think about this, but I'm glad to know there is a rational reason for this. How often do functions in one thread modify variables while other threads are working with them in threaded programs? Or is that a stupid question?Zebada
You shouldn't be accessing any buffer that other threads modify without locking it first. Using strncpy does not make your function thread safe.Reni
'What happens if this function is interrupted after successful memory allocation (and thus the call to strlen), the string grows, and bam you have a buffer overflow condition since strcpy copies to the NULL byte.' well...Floatable
I can't say how much it happens, though I've had trouble with bugs related to threading and pieces of memory being moved around like this. It's just a defensive maneuver. Upside is that it won't happen (ever) if your function uses only thread-local storage. But can you ensure that? Probably not.Pirri
"what happens if another thread calls "free" in the memory block that is pointed to by s1?" your code will be equally broken. i think that's not a good argument. he wrote his code not with multiple threads in mind, and you always have to make some assumptions based on guarantees you have.Floatable
@ssg: True enough, though it is very common for that to not happen, particularly in older C code.Pirri
Keep in mind that everything that starts as a very unlikely race condition can easily end up as an aggressive exploit. The probability of this becoming a bug in a well-written program is very low, but it's still a security problem.Plowboy
@litb Definitely. It's not completely bullet-proof, of course. Nothing really is, though: there is always a corner case that will break code.Pirri
Please don't think that single-threaded code is safe from this potential issue. A signal handler (or other platform-specific interrupt) could just as easily modify the contents of s1.Fourgon
If s1 is modified outside the function, you're done anyway. Using strncpy won't change anything. In particular, s2 may not be null terminated if s1 length changes, which only make the problem worse (more difficult to debug). There is almost no reason to use strncpy in your code.Dilator
Not to mention that this code shows the classic bug with strncpy(): forgetting to explicitly set NUL in the last byteSubclavius
As for other code modifying the string, freeing the memory, and so on: while these will lead to invalid results, they won't necessarily crash the program (unless the freed memory is released from the virtual memory map)Subclavius
@kdgregory: agreed. Having a string with unspecified value is less bad than overwriting a buffer with undefined results. David is right that "you're done anyway", but you have a better chance of only being DoS done, rather than arbitrary code execution done.Fourgon
You can still have buffer overflow with strncpy: the code above using strncpy may not be NULL terminated (actually, it is never NULL terminated - it happens that when compiling the above on my platform, the buffer from malloc is fill with '\0' - if it weren't, s2 would have not been NULL terminated)Dilator
Bear in mind that only the last of your three strncmp examples is safe, assuming s1 can be modified while the function is running. It's the only one that will null-terminate s2 if s1 grows. If it takes three tries to get something right, and the first two look good, there's a problem.Haematoma
@David: There are plenty of times in C where you do not NULL terminate an array, but instead pass its length back. You can sometimes do both. They are both safe, it just depends on the convention that you follow in the calling functions.Pirri
An array, yes, a string, I am not so sure: that's the definition of a C string after all (to be NULL terminated). In the given example, since the string is returned, I think it is fair to expect it to be null terminated.Dilator
C
5

I agree. I would recommend against strncpy() though, since it will always pad your output to the indicated length. This is some historical decision, which I think was really unfortunate as it seriously worsens the performance.

Consider code like this:

char buf[128];
strncpy(buf, "foo", sizeof buf);

This will not write the expected four characters to buf, but will instead write "foo" followed by 125 zero characters. If you're for instance collecting a lot of short strings, this will mean your actual performance is far worse than expected.

If available, I prefer to use snprintf(), writing the above like:

snprintf(buf, sizeof buf, "foo");

If instead copying a non-constant string, it's done like this:

snprintf(buf, sizeof buf, "%s", input);

This is important, since if input contains % characters snprintf() would interpret them, opening up whole shelvefuls of cans of worms.

Crumple answered 4/3, 2009 at 12:15 Comment(3)
strncpy was designed to fill the filename fields in directory entries in Really Ancient Unix (think 1970s), which were up to 14 characters and zero padded if shorter. The padding was important to prevent information leaking at the end of the buffer. This justifies strncpy's design.Divestiture
It's also useful when debugging. If you have guard pages after buffers, then strncpy's 0-fill ensures that even if you do pass in the wrong length (well, subject to alignment rounding), you trap immediately, rather than only trapping if the src string is long enough.Fourgon
How much overhead does snprintf's parsing of its format string add?Grath
F
5

I think strncpy is evil too.

To truly protect yourself from programming errors of this kind, you need to make it impossible to write code that (a) looks OK, and (b) overruns a buffer.

This means you need a real string abstraction, which stores the buffer and capacity opaquely, binds them together, forever, and checks bounds. Otherwise, you end up passing strings and their capacities all over the shop. Once you get to real string ops, like modifying the middle of a string, it's almost as easy to pass the wrong length into strncpy (and especially strncat), as it is to call strcpy with a too-small destination.

Of course you might still ask whether to use strncpy or strcpy in implementing that abstraction: strncpy is safer there provided you fully grok what it does. But in string-handling application code, relying on strncpy to prevent buffer overflows is like wearing half a condom.

So, your strdup-replacement might look something like this (order of definitions changed to keep you in suspense):

string *string_dup(const string *s1) {
    string *s2 = string_alloc(string_len(s1));
    if (s2 != NULL) {
        string_set(s2,s1);
    }
    return s2;
}

static inline size_t string_len(const string *s) {
    return strlen(s->data);
}

static inline void string_set(string *dest, const string *src) {
    // potential (but unlikely) performance issue: strncpy 0-fills dest,
    // even if the src is very short. We may wish to optimise
    // by switching to memcpy later. But strncpy is better here than
    // strcpy, because it means we can use string_set even when
    // the length of src is unknown.
    strncpy(dest->data, src->data, dest->capacity);
}

string *string_alloc(size_t maxlen) {
    if (maxlen > SIZE_MAX - sizeof(string) - 1) return NULL;
    string *self = malloc(sizeof(string) + maxlen + 1);
    if (self != NULL) {
        // empty string
        self->data[0] = '\0';
        // strncpy doesn't NUL-terminate if it prevents overflow, 
        // so exclude the NUL-terminator from the capacity, set it now,
        // and it can never be overwritten.
        self->capacity = maxlen;
        self->data[maxlen] = '\0';
    }
    return self;
}

typedef struct string {
    size_t capacity;
    char data[0];
} string;

The problem with these string abstractions is that nobody can ever agree on one (for instance whether strncpy's idiosyncrasies mentioned in comments above are good or bad, whether you need immutable and/or copy-on-write strings that share buffers when you create a substring, etc). So although in theory you should just take one off the shelf, you can end up with one per project.

Fourgon answered 4/3, 2009 at 13:14 Comment(0)
J
4

I'd tend to use memcpy if I have already calculated the length, although strcpy is usually optimised to work on machine words, it feels that you should provide the library with as much information as you can, so it can use the most optimal copying mechanism.

But for the example you give, it doesn't matter - if it's going to fail, it will be in the initial strlen, so strncpy doesn't buy you anything in terms of safety (and presumbly strncpy is slower as it has to both check bounds and for nul), and any difference between memcpy and strcpy isn't worth changing code for speculatively.

Jacks answered 4/3, 2009 at 12:1 Comment(0)
C
4

The evil comes when people use it like this (although the below is super simplified):

void BadFunction(char *input)
{
    char buffer[1024]; //surely this will **always** be enough

    strcpy(buffer, input);

    ...
}

Which is a situation that happens suprising often.

But yeah, strcpy is as good as strncpy in any situation where you are allocating memory for the destination buffer and have already used strlen to find the length.

Compeer answered 4/3, 2009 at 12:7 Comment(0)
D
1

strlen finds upto last null terminating place.

But in reality buffers are not null terminated.

that's why people use different functions.

Dagda answered 4/3, 2009 at 11:59 Comment(1)
strlen() computes the length of string which always has a terminating null character. To use strlen() on something else like a non-null terminated array of char or strlen(3.14159) is simple bad code. Of course a good compiler will flag the 2nd one.Inerasable
B
0

Well, strcpy() is not as evil as strdup() - at least strcpy() is part of Standard C.

Bigmouth answered 4/3, 2009 at 12:16 Comment(1)
and these is strdupa() as well :-)Beset
D
0

In the situation you describe, strcpy is a good choice. This strdup will only get into trouble if the s1 was not ended with a '\0'.

I would add a comment indicating why there are no problems with strcpy, to prevent others (and yourself one year from now) wondering about its correctness for too long.

strncpy often seems safe, but may get you into trouble. If the source "string" is shorter than count, it pads the target with '\0' until it reaches count. That may be bad for performance. If the source string is longer than count, strncpy does not append a '\0' to the target. That is bound to get you into trouble later on when you expect a '\0' terminated "string". So strncpy should also be used with caution!

I would only use memcpy if I was not working with '\0' terminated strings, but that seems to be a matter of taste.

Dashiell answered 4/3, 2009 at 12:49 Comment(0)
A
0
char* dupstr(char* str)
{
   int full_len; // includes null terminator
   char* ret;
   char* s = str;

#ifdef _DEBUG
   if (! str)
      toss("arg 1 null", __WHENCE__);
#endif

   full_len = strlen(s) + 1;
   if (! (ret = (char*) malloc(full_len)))
      toss("out of memory", __WHENCE__);
   memcpy(ret, s, full_len); // already know len, so strcpy() would be slower

   return ret;
}
Amii answered 9/5, 2009 at 6:48 Comment(0)
B
0
char *strdup(const char *s1)
{
  char *s2 = malloc(strlen(s1)+1);
  if(s2 == NULL)
  {
    return NULL;
  }
  strcpy(s2, s1);
  return s2;
}

Problems:

  1. s1 is unterminated, strlen causes the access of unallocated memory, program crashes.
  2. s1 is unterminated, strlen while not causing the access of unallocated memory access memory from another part of your application. It's returned to the user (security issue) or parsed by another part of your program (heisenbug appears).
  3. s1 is unterminated, strlen results in a malloc which the system can't satisfy, returns NULL. strcpy is passed NULL, program crashes.
  4. s1 is unterminated, strlen results in a malloc which is very large, system allocs far too much memory to perform the task at hand, becomes unstable.
  5. In the best case the code is inefficient, strlen requires access to every element in the string.

There are probably other problems... Look, null termination isn't always a bad idea. There are situations where, for computational efficiency, or to reduce storage requirements it makes sense.

For writing general purpose code, e.g. business logic does it make sense? No.

Burgos answered 27/9, 2011 at 22:7 Comment(1)
Your answer doesn't make sense. Either you're assuming nul-terminated strings for your application, or you should be using a string library (even if it's just a quickly put-together struct { size_t len; char str[]; } and a few functions to work with them). Why should business logic be at all related to how your code handles strings? If nul-termination is a danger, it's a danger for every str* standard library function.Zebada
I
0

This answer uses size_t and memcpy() for a fast and simple strdup().

Best to use type size_t as that is the type returned from strlen() and used by malloc() and memcpy(). int is not the proper type for these operations.

memcpy() is rarely slower than strcpy() or strncpy() and often significantly faster.

// Assumption: `s1` points to a C string.
char *strdup(const char *s1) {
  size_t size = strlen(s1) + 1;
  char *s2 = malloc(size);
  if(s2 != NULL) {
    memcpy(s2, s1, size);
  }
  return s2;
} 

§7.1.1 1 "A string is a contiguous sequence of characters terminated by and including the first null character. ..."

Inerasable answered 21/4, 2014 at 17:2 Comment(0)
F
-1

Your code is terribly inefficient because it runs through the string twice to copy it.

Once in strlen().

Then again in strcpy().

And you don't check s1 for NULL.

Storing the length in some additional variable costs you about nothing, while running through each and every string twice to copy it is a cardinal sin.

Farrier answered 4/3, 2009 at 13:54 Comment(4)
Since the function doesn't get told how long the string is, how do you avoid the double traversal? AFAICS, there isn't a way, so "terribly inefficient" is inaccurate.Aquinas
agree, it is inefficient. If you pass the already known length to memcpy(), you remove second scan of the string for '\0'Beset
In C if there is any chance to do so, you should cache a once determined string length (if the string has not been modified in the meantime)Farrier
And that's why you should use pascal-style strings - struct { size_t len; char str[]; }Abase

© 2022 - 2024 — McMap. All rights reserved.