Why does this implementation of strlen() work?

Asked 1/12, 2013 at 12:58 Answered 1/12, 2013 at 13:36

_{(Disclaimer: I've seen this question, and I am not re-asking it -- I am interested in why the code works, and not in how it works.)}

So here's this implementation of Apple's (well, FreeBSD's) strlen(). It uses a well-known optimization trick, namely it checks 4 or 8 bytes at once, instead of doing a byte-by-byte comparison to 0:

size_t strlen(const char *str)
{
    const char *p;
    const unsigned long *lp;

    /* Skip the first few bytes until we have an aligned p */
    for (p = str; (uintptr_t)p & LONGPTR_MASK; p++)
        if (*p == '\0')
            return (p - str);

    /* Scan the rest of the string using word sized operation */
    for (lp = (const unsigned long *)p; ; lp++)
        if ((*lp - mask01) & mask80) {
        p = (const char *)(lp);
        testbyte(0);
        testbyte(1);
        testbyte(2);
        testbyte(3);
#if (LONG_BIT >= 64)
        testbyte(4);
        testbyte(5);
        testbyte(6);
        testbyte(7);
#endif
    }

    /* NOTREACHED */
    return (0);
}

Now my question is: maybe I'm missing the obvious, but can't this read past the end of a string? What if we have a string of which the length is not divisible by the word size? Imagine the following scenario:

|<---------------- all your memories are belong to us --------------->|<-- not our memory -->
+-------------+-------------+-------------+-------------+-------------+ - -
|     'A'     |     'B'     |     'C'     |     'D'     |      0      |
+-------------+-------------+-------------+-------------+-------------+ - -
^                                                      ^^
|                                                      ||
+------------------------------------------------------++-------------- - -
                       long word #1                      long word #2

When the second long word is read, the program accesses bytes that it shouldn't in fact be accessing... isn't this wrong? I'm pretty confident that Apple and the BSD folks know what they are doing, so could someone please explain why this is correct?

One thing I've noticed is that beerboy asserted this to be undefined behavior, and I also believe it indeed is, but he was told that it isn't, because "we align to word size with the initial for loop" (not shown here). However, I don't see at all why alignment would be any relevant if the array is not long enough and we are reading past its end.

Salinometer answered 1/12, 2013 at 12:58 Comment(3)

Note: Aside from the potential UB well answered, the method's "5.2 times as fast" by taking advantage that char is typically ASCII (codes 0 - 127) is itself noteworthy. – Henning 1/12, 2013 at 22:13

As a general principle the code that appears in a standard library implementation doesn't exist in a standard C environment. Sometimes this is to the "benefit" of the code -- it can rely on implementation-specific guarantees that the standard doesn't offer, as in this case. Sometimes it's to the "detriment" of the code -- for example there might be an obvious implementation of X in terms of Y that you can't do because you already chose the obvious implementation of Y in terms of X. – Rella 19/8, 2014 at 22:42

For x86, duplicate of Is it safe to read past the end of a buffer within the same page on x86 and x64? – Casimiracasimire 30/9, 2020 at 11:51

Although this is technically undefined behavior, in practice no native architecture checks for out-of-bounds memory access at a finer granularity than the size of a word. So while garbage past the terminator may end up being read, the result will not be a crash.

Kevon answered 1/12, 2013 at 13:4 Comment(15)

I gues it might crash if it happens to be at the boundary of a guard page? – Gradient 1/12, 2013 at 13:7

No, because every word which is read is guaranteed to be at least partially in the array, and hence not trapped. – Kevon 1/12, 2013 at 13:9

There is a school of thinking that asserts that any operation merely touching random bytes leads into an undefined state. However, the mere act of reading data is not UB -- and the random data that gets accessed "after" the end of the string gets never processed. – Stanleigh 1/12, 2013 at 13:17

@Stanleigh In fact, reading uninitialized variables, for example, is in itself undefined behavior, so is accessing an array out of bounds. – Salinometer 1/12, 2013 at 14:54

@H2CO3: I don't think it is. Consider if (x != 0 && y/x) -- dividing by zero is bad but it is never done. Reading a memory-wise valid but undefined byte and never doing anything with it sounds okay to me. I agree this philosophically touches upon the sound of a falling tree. Maybe we should call it "quantum behavior". – Stanleigh 1/12, 2013 at 15:0

@Stanleigh What "sounds OK" to you needn't agree with what's in the Standard. And the Standard says it's UB, so it's UB, period. – Salinometer 1/12, 2013 at 15:10

@H2CO3: that's rather worrisome, I use strlen all the time! What should I use instead? – Stanleigh 1/12, 2013 at 15:13

@Devolus: notice that the code starts reading byte by byte until it reaches a word-aligned address. Then, only words containing at least some part of the array are read. Though, it's easy to blunder and read one word too much: #18525008 – Brazee 1/12, 2013 at 15:14

@Stanleigh strlen(). I don't doubt it works on actual, specific implementations, despite being UB, but that is only the case because the writers of the standard library have intimate knowledge of the platform and/or the compiler they are using. In a sense, it can be OK for library/other implementation code to rely on certain aspects of UB, but it is never OK for user code to do so. – Salinometer 1/12, 2013 at 15:16

@Jongware: the implementation has additional guarantess that the standard doesn't mandate, and abuses them in their internal implementation of strlen(). You don't care how strlen() is implemented internally, you use the guarantees the standard promises you abot strlen(). – Brazee 1/12, 2013 at 15:17

@H2CO3: can we agree, then, that in the case under discussion -- this implementation of strlen -- UB is not triggered because it relies on the fact that any bytes in the UB DangerZone are never tested? – Stanleigh 1/12, 2013 at 15:21

@Stanleigh I think yes... or, more precisely, one can tell this if one considers the actual, low-level implementation details of the actual OS, compiler, etc., and it's still UB according to the standard, but here the undefined-ness, thanks to the implementation, results in correct working. – Salinometer 1/12, 2013 at 15:24

@Jongware: it is UB: you cannot take that code and move it to an arbitrary architecture, most notably to a !MMU (non-paged based) architecture, where a string may end at the end of the data segment which may have a byte-granularity limit set at the MPU. – Brazee 1/12, 2013 at 15:25

@ninjalj: I get your point, but this is only FreeBSD's version of strlen. I suppose any architectures or compilers where this problem might happen supply their own, adjusted, libraries. (And regularly have to explain "no, this one trick will not work on our system".) – Stanleigh 1/12, 2013 at 15:34

That's why FreeBSD developers use very specific compilers to build FreeBSD: so that all "UB"-parts actually has (implementation-)defined behaviour they can rely on. – Ergonomics 1/12, 2013 at 17:5

I don't see at all why alignment would be any relevant if the array is not long enough and we are reading past its end.

The routine starts with aligning to a word boundary for two reasons: first, reading words from an aligned address is faster on most architectures (and it's also mandatory on a few CPUs). The speed increase is enough to use the same trick in a host of similar operations: memcpy, strcpy, memmove, memchr, etc.

Second: if you continue reading words starting at a word boundary, you are assured the rest of the string resides in the same memory page. A string (including its terminating zero) cannot straddle a memory page boundary, and neither can reading a word. (1)

So this is always fastest and safest, even if the memory page granularity is sizeof(LONG_BIT) (which it isn't).

Picking up an entire word near the end of a string may pick up additional bytes after the final zero, but reading Undefined Bytes from valid memory is not UB -- only acting upon its contents is (2). If the word contains a zero terminator anywhere inside, the individual bytes are inspected with test_byte, and this, as is shown in the original source, will never act on bytes after the terminator.

(1) Obviously they can, but I meant "never into a locked page" or something similar.

(2) Under Discussion. See (sorry about that!) under Sneftel's answer.

Stanleigh answered 1/12, 2013 at 13:36 Comment(0)

Recommended topics

Hot tags