If you know the size of the data to be copied, then memcpy() should be as fast or faster than strcpy(). Otherwise, memcpy() can't be used alone, and strcpy() should be as fast or faster than strlen() followed by memcpy().
However...
A lot of implementations of memcpy() and/or strcpy() and/or strlen() are designed to efficiently handle large amounts of data. This often means additional startup overhead (e.g. determining alignment, setting up SIMD, cache management, etc.), which makes these implementations bad (slow) for copying small amounts of data (which is far more likely in well-written code). Because of this, "should be as fast or faster" does not necessarily imply "is as fast or faster". For example, for small amounts of data a memcpy() optimised for large amounts of data may be significantly slower than a strcpy() that wasn't optimised for large amounts of data.
Also note that the main problem here is that generic code (e.g. memcpy() and strcpy()) can't be optimised for a specific case. The best solution would have been to have multiple functions - e.g. a memcpy_small() that's optimised for copying small amounts of data and a memcpy_large() that's optimised for bad code that failed to avoid copying a large amount of data.
memcpy from many years ago broke the copy down into three sections: an unaligned prefix, the main body, and an unaligned suffix. Which is to say that alignment issues are transparent to the user, and the bulk of the copy (the main body) is done at maximum aligned speed using full-sized (e.g. 32-bit) transfers. – Petualoop
rep movsb is one of the slowest ways; no one uses it anymore. Old libc implementations used rep movsb whereas newer ones use SIMD to speed things up. Why are complicated memcpy/memset superior? – Onstage