Implementing strnstr
Asked Answered
G

3

12

I am trying to implement a strnstr function into C (strstr but it checks the length), for some reason it doesn't work (output is always no):

#include <stdio.h>

char *searchingFor = "stackdummy";
char *in = "la da\ndoo a da\nnow here comes the stack\nok there it was.\n";

char *strnstr(char *s1, char *s2, int length) {
    if(s1 == NULL || s2 == NULL) return NULL;
    printf("searching \n\n\"%s\"\n for %.*s\n", s1, length, s2);
    char *ss1 = malloc(strlen(s1) + 1);
    strcpy(ss1, s1);
    char *ss2 = malloc(length + 1);
    strncpy(ss2, s2, length);
    char *result = strstr(ss1, ss2);
    free(ss1);
    free(ss2);
    return result;
}

int main(void) {
    printf("found: %s\n", strnstr(in, searchingFor, 5) ? "yes" : "no");
    printf("found: %s\n", strnstr(in, searchingFor, 5) ? "yes" : "no");
    printf("found: %s\n", strnstr(in, searchingFor, 5) ? "yes" : "no");
    return 0;
}
Genisia answered 2/6, 2014 at 17:7 Comment(7)
strncpy does not add a terminating zero.Depopulate
Well, at least it's in the documentation.Depopulate
@MOehm my mistake, really. There's nothing wrong with pointers in a Boolean context. Thanks for the critique, btw :)Besot
@7heo.tk: Well, I haven't said anything, then.Dividivi
1) After strncpy(ss2, s2, length);, add ss2[length] = '\0'; to insure ss2 is terminated. 2) note: returning pointer to a malloc'd variable.Cellobiose
An alternative implementation can be found here: Simple and safe implementationSapro
I am and I think others are confused by the question since the function, strnstr is in FreeBSD. It has different semantics than what I assume yours does. But, you may not know about that function or be trying to mimic its behavior. I suspect the comment from chux and others about terminating the string may make your test pass. ... but, it returns a pointer to freed memory. That is asking for a hard crash! If you want to return a boolean indication, then return something like !!result and declare result as bool or int or similar.Triumphant
F
10

The implementation provided by Chris Dodd has the following disadvantages:

  1. It defeats the purpose of strnstr in that the while condition uses the unbounded string function strchr
  2. It depends on haystack being NULL terminated, which is a deviation from the usual implementation of strnstr, for example as provided by GNU-Darwin
  3. The call to strchr is an unnecessary function call when strchar is not inlined
  4. Returns haystack instead of NULL when len is zero, a deviation from the accepted strstr semantics
  5. Returns an empty string instead of haystack when needle has length of zero

The following implementation remedies the above problems without becoming as difficult to read as the GNU-Darwin implementation, and is Creative Commons licensed:

#include <string.h>

char *strnstr(const char *haystack, const char *needle, size_t len)
{
        int i;
        size_t needle_len;

        if (0 == (needle_len = strnlen(needle, len)))
                return (char *)haystack;

        for (i=0; i<=(int)(len-needle_len); i++)
        {
                if ((haystack[0] == needle[0]) &&
                        (0 == strncmp(haystack, needle, needle_len)))
                        return (char *)haystack;

                haystack++;
        }
        return NULL;
}
Folklore answered 6/9, 2014 at 22:15 Comment(8)
This seems broken as it will fail in the OP's code, because it runs off the end of needle (which is not null terminated). Also, strchr (or memchr) is generally faster than a 1-char-at-a-time for-loop, as it usually works word-at-a-time.Klara
@ChrisDodd: You are correct regarding needle. However the NULL termination is required in all of the common string library (Linux, Mac, FreeBSD) implementations and so my assumption was that the OP really meant strnstr when he wrote strnstr. Regarding the use of strchr instead of a one-at-a-time loop, strchr itself is implemented as a one-at-a-time loop, so moving the loop to strnstr saves the function call overhead. The memchr implementation is even slower than strchr, and here you would need to add an additional test for a NULL character before len chars.Folklore
This doesn't work if len == strlen(needle) which looks like a pretty common scenario.Gies
@almosnow Whoops, your right. Off by one error. Try it now. Thanks!Folklore
There's a difference between the real strnstr and this tho, as the real one crashes (segmentation fault) when NULL is given for big and len is greater than the length of needle. In your case the for doesn't execute (cause len_little - len is negative) so the program doesn't reach if((haystack[0]....) to crash since it attempts NULL + 0, so your implementation doesn't crush whereas the real one does. Might not seem like much, but if you want an implementation that matches strnstr in everything the real one does, then you have to account for crashes as well.Nikos
One more difference between this implementation and BSD one when calling with the following arguments: strnstr("abcdef", "abc", 2). This impl returns ptr to haystack, the BSD impl returns NULL. NULL is correct because we only should use 2-char portion of haystack.Noon
I'm afraid your implementation is incorrect: len is the maximum number of characters from haystack to consider, all characters in null terminated needle must match a portion of the first len characters of haystack.Sporozoite
"Returns haystack instead of NULL when len is zero, a deviation from the accepted strstr semantics". Doesn't this implementation do the same? in "if (0 == (needle_len = strnlen(needle, len))) return (char *)haystack";Valiant
K
2

How about:

char *strnstr(char *haystack, char *needle, size_t len) {
    if (len == 0) return haystack; /* degenerate edge case */
    while (haystack = strchr(haystack, needle[0])) {
        if (!strncmp(haystack, needle, len)) return haystack;
        haystack++; }
    return 0;
}

If you want haystack to not be null terminated, you'll need two length args:

char *memmem(char *haystack, size_t hlen, char *needle, size_t nlen) {
    if (nlen == 0) return haystack; /* degenerate edge case */
    if (hlen < nlen) return 0; /* another degenerate edge case */
    char *hlimit = haystack + hlen - nlen + 1;
    while (haystack = memchr(haystack, needle[0], hlimit-haystack)) {
        if (!memcmp(haystack, needle, nlen)) return haystack;
        haystack++; }
    return 0;
}

which is available in GNU libc, though older versions are broken.

Klara answered 2/6, 2014 at 17:24 Comment(8)
Wow, that's a much cleaner approach than mine.Genisia
Though first line should also check for if(len == 0 || !haystack || !needle) return haystack;Genisia
As written its analogous to strncmp -- len == 0 is well defined as comparing zero chars, but passing a null pointer is undefined if len > 0.Klara
@ChrisDodd Possibly pointless optimization, but given that you've already identified haystack[0] == needle[0], you could strncmp(haystack+1, needle+1, len-1)... I guess the pointer adjustments could potentially outweigh the gain from not re-comparing the first character in some cases...Carousel
This approach is also pathologically slow (quadratic time). If your system has memmem (nonstandard but fairly common) I think it could be used to implement "strnstr" efficiently, but doing so is not entirely trivial still.Alkahest
Note: This code does not prevent reading &haystack[len] and beyond.Cellobiose
@R: This is O(nm) time, not quadratic, and is slower than optimal only on pathological inputs. @chux: len is the length of needle and has no relation to the length of haystack.Klara
FreeBSD's version of strnstr treats len as length of portion of haystack to scan, not the length of needle. In your code you also treat it this way: if (len == 0) return haystack;, if (nlen == 0) return haystack; (which contradicts your comment). So results of your implementation differ from FreeBSD ones, e.g. for std::strnstr("abcdef", "abc", 2) and std::strnstr("abcdef", "bcd", 3).Noon
S
0

The strnstr function is not defined in the C Standard, it is available on BSD and some other systems as an extension.

Here is the man page on OS/X:

NAME

strstr, strcasestr, strnstr -- locate a substring in a string

LIBRARY

Standard C Library (libc, -lc)

SYNOPSIS

    #include <string.h>

[...]

    char *strnstr(const char *haystack, const char *needle, size_t len);

[...]

DESCRIPTION

[...]

The strnstr() function locates the first occurrence of the null-terminated string needle in the string haystack, where not more than len characters are searched. Characters that appear after a '\0' character are not searched. Since the strnstr() function is a FreeBSD specific API, it should only be used when portability is not a concern.

RETURN VALUES

If needle is an empty string, haystack is returned; if needle occurs nowhere in haystack, NULL is returned; otherwise a pointer to the first character of the first occurrence of needle is returned.

EXAMPLES

The following sets the pointer ptr to the "Bar Baz" portion of largestring:

       const char *largestring = "Foo Bar Baz";
       const char *smallstring = "Bar";
       char *ptr;

       ptr = strstr(largestring, smallstring);

The following sets the pointer ptr to NULL, because only the first 4 characters of largestring are searched:

       const char *largestring = "Foo Bar Baz";
       const char *smallstring = "Bar";
       char *ptr;

       ptr = strnstr(largestring, smallstring, 4);

This specification is not concise enough, (the man page for the linux kernel version is even more imprecise), yet the example on BSD systems (notably above here) is clear: len is the maximum number of bytes to consider in haystack, not needle, which is just a regular null terminated C string.

Your function does not work for multiple reasons:

  • the semantics are incorrect as you consider length to limit s2 instead of s1
  • in your approach, duplicating s1 is useless and counter-productive: result, if non NULL, will point into the allocated copy that is freed before returning from the function, hence accessing the string pointed to by the return value will have undefined behavior.
  • strncpy does not null terminate the destination array if the source string has at least length characters before its own null terminator. You must set ss2[length] = '\0'; for your approach to work, but again, the real strnstr() function operates differently.
  • using malloc() and free() is probably not what you are expected to do, and failing to test for potential allocation failure is a mistake.

Here is a corrected version:

char *strnstr(const char *s1, const char *s2, size_t n) {
    // simplistic algorithm with O(n2) worst case
    size_t i, len;
    char c = *s2;

    if (c == '\0')
        return (char *)s1;

    for (len = strlen(s2); len <= n; n--, s1++) {
        if (*s1 == c) {
            for (i = 1;; i++) {
                if (i == len)
                    return (char *)s1;
                if (s1[i] != s2[i])
                    break;
            }
        }
    }
    return NULL;
}
Sporozoite answered 21/9, 2020 at 9:54 Comment(1)
I suspect that the OP is not trying to mimic the behavior of the FreeBSD function with the same name. But this thread is so old we'll probably never know.Triumphant

© 2022 - 2024 — McMap. All rights reserved.