How to calculate the length of a string in C efficiently?
Asked Answered
B

14

25

How to calculate the length of a string in C efficiently (in time)?

Right now I'm doing:

int calculate_length(char *string) {
    int length = 0;
    while (string[length] != '\0') {
        length++;
    }
    return length;
}

But it's very slow compared to strlen() for example, is there any other way to do it?

Thanks.

EDIT: I'm working in a freestanding environment, I'm not allowed to use any external lib including "string.h".

Beatrisbeatrisa answered 15/1, 2010 at 9:34 Comment(5)
Why not use strlen then? Or is this an exercise?Pawnshop
It's not an exercise, the environment where I'm working in doesn't allow me to include other "libs", including "string.h" so I have to implement it and would like it to be as efficient as possible while being maintainble.Briney
You may want to edit your original post to mention that you're in a freestanding environment.Maelstrom
Take into account that the std library can also be compiled with compiler optimizations activated and your code don't.Scratchy
There are excellent answers here, but keep in mind that this is micro-optimization, and not all programmers understand the use and high importance of macro-optimization. Here's an example of a 40x speedup in perfectly OK-looking code: #926766Infralapsarian
I
46

From the FreeBSD source code:

size_t
strlen(const char *str)
{
    const char *s;
    for (s = str; *s; ++s);
    return(s - str);
}

Compared to your code, this probably maps very nicely to an assembler instruction, which can explain a big performance difference.

Indonesia answered 15/1, 2010 at 9:40 Comment(1)
The compiler should be able to optimise this fairly effectively, which means the code is still readable, and should still run fairly quickly.Maelstrom
H
11

strlen(). Odds are, if somebody had found a better, faster generic method, strlen would have been replaced with that.

Heyday answered 15/1, 2010 at 9:39 Comment(0)
P
9

Take a look at the source code of strlen in the standard libc. Functions in standard libraries are generally highly optimized. Check it out here (coded in assembly) - this is from the GNU libc.

size_t
DEFUN(strlen, (str), CONST char *str)
{
  int cnt;

  asm("cld\n"                   /* Search forward.  */
      /* Some old versions of gas need `repne' instead of `repnz'.  */
      "repnz\n"                 /* Look for a zero byte.  */
      "scasb" /* %0, %1, %3 */ :
      "=c" (cnt) : "D" (str), "0" (-1), "a" (0));

  return -2 - cnt;
}
Pyrogallate answered 15/1, 2010 at 9:39 Comment(1)
An assembly version may be faster, but you'd need some numbers to back up that claim. See leaf.dragonflybsd.org/mailarchive/commits/2011-11/msg00195.htmlCabriole
A
6

Take a look at GNU C library's strlen() source.

It uses a number of non-obvious tricks to gain speed without dropping to assembly, including:

  • getting to a character that's properly aligned
  • reading those aligned parts of the string into an int (or some larger datatype) to read several chars at a time
  • using bit twiddling tricks to check if one of the chars embedded in that block of chars is zero

etc.

Admetus answered 15/1, 2010 at 9:44 Comment(3)
Current FreeBSD one uses something similar, could come handy too: freebsd.org/cgi/cvsweb.cgi/src/lib/libc/string/…Incumber
What do you mean "without dropping to assembly"? On i386, it does use assembly (see Sudhanshu's reply).Indoors
Sudhanshu's is different than the one I linked to. It's certainly possible that when glibc is built for an x86 Sudhanshu's gets used (I'm honestly not sure); however, the example I pointed to is straight C code that can be used as an example of some possible optimizations.Admetus
N
3

The easiest way is to call strlen(). Seriously. It's already optimized by your compiler and/or library vendors, to be as fast as possible for your architecture.

One common optimization is to remove the need to increase a counter, and compute the length from the pointer:

size_t my_strlen(const char *s)
{
  const char *anchor = s;

  while(*s)
   s++;

  return s - anchor;
}
Nahuatl answered 15/1, 2010 at 9:39 Comment(0)
A
3

C strings are intrinsically inefficient, there are two reasons for using the ASCIZ convention:

  • The standard C library uses it
  • The compiler uses it for literal string constants

The first of these is academic in this instance since you are not using the standard library, the second is easily overcome by creating functions or macros that provide conversions from C strings to a more efficient convention such as Pascal strings. The point is you need not be a slave to the C convention if you are not using the C library.

Ancon answered 15/1, 2010 at 10:39 Comment(3)
++ You're right, but sometimes we look for cycles in all the wrong places. I haven't seen any real code where the speed of strlen was even on the radar, considering the multitude of macro-ways that typically make software slow.Infralapsarian
@Mike: Couldn't agree more. Likely to be premature optimisation, but the article I linked gives a couple of real world examples where it has been critical. A strlen() function for a pascal string is both fast and deterministic.Ancon
C strings are inefficient for a lot of use cases, but superior to pascal strings for some use cases (e.g. substring = &string[skipped];). Tracking string length elsewhere (not prepending it to the string itself) may be more efficient than both pascal strings and C strings.Mannie
B
2

Yet another way to speed up char counting is to use vectorization!

Here's an example of how to do this with respect to UTF8-encoded strings:

Even faster UTF-8 character counting,

http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html

Blockbusting answered 15/1, 2010 at 12:4 Comment(0)
H
1

Some of the above answers are very good, and this is my take. There is a keyword known as "register"

#include <stdio.h>
size_t strlenNew(char *s);

int main(int argc, char* argv[])
{
    printf("Size of \"Hello World\" is ::\t%d",strlenNew("Hello World"));
    return 0;
}

size_t strlenNew(char *s)
{
    register int i=0;
    while(s[i]!='\0') i++;
    return i;
}

Read here: http://gustedt.wordpress.com/2010/08/17/a-common-misconsception-the-register-keyword/ and http://msdn.microsoft.com/en-us/library/482s4fy9(v=vs.80).aspx

From the first link:

This can be particularly useful for array variables. An array variable is easily confounded with a pointer variable. Unless it is followed by a [expr] or with a sizeof it evaluates to the address of the first element. If you declare the array register all these uses are forbidden; we only access individual elements or ask for the total size. Such an register-array then may be much easier used as if it just were a set of variable by the optimizer. No aliasing (accessing the same variable through different pointers) may occur.

Thus, sometimes, there may be performance fluctuations. Personally, this is one of my fav implementations, but Sudhanshu and Andomar also provide a good implementation :)

Hendrik answered 4/4, 2013 at 18:5 Comment(0)
I
0

On i386 processors, libc often use an ultra-optimized version of strlen, often written in assembly language. The paper "String Length" explains how they work.

Here is one optimized version for OpenBSD. (They also have a portable version.) Here is the version for the GNU libc.

Indoors answered 15/1, 2010 at 12:55 Comment(0)
I
0

I had the same problem, and I resolved it. The key is the 2nd condition of the for loop:

int longitud(char cad[]){

    int i, cont;

    cont = 0;

    for(i = 0; i < 30 && cad[i] != '\0'; i++){
        if(cad[i] != '\0'){
            if(cad[i] != ' '){
                cont++;
            }
        }
    }
    cont--;
    return cont;
}
Imbecility answered 30/8, 2016 at 12:39 Comment(0)
C
0

Basic C program to calculate string length.

#include <stdio.h>

/**
* Method to calculate string length.
* Returns -1 in case of null pointer, else return string length.
**/
int length(char *str) {

    int i = -1;
    // Check for NULL pointer, then return i = -1;
    if(str == NULL) return i;

    // Iterate till the empty character.
    while (str[++i] != '\0');
    return i;  // Return string length.
}

int main (int argc, char **argv) {

    int len = 0;
    char abc[] = "hello";
    len = length(abc);
    printf("%d", len);  
    return 0;
}

NOTE: For better way, we should always pass the array size to function to avoid the case of out of bound access. For example the prototype of method should be:

/**
* @desc calculate the length of str.
* @param1 *str pointer to base address of char array.
* @param2 size = capacity of str to hold characters.
* @return int -1 in case of NULL, else return string length.
**/
int length (char *str, int size);
Chuffy answered 20/9, 2017 at 15:7 Comment(0)
E
-1

I am not quite sure of what you want to do.

You want to re-write strlen to make your code compatible with standard c-Library, or you want to manage strings.

In the first case, I think you'd better directly use the standard libraries.

The other case is interesting : you should take a look at the c++ string class, that have an implementation of the traits strategy (allowing quick manipulations of very big strings).

Exaggeration answered 25/7, 2016 at 21:9 Comment(1)
The question is pretty precise in what he meant, and states that he can't use the standard includes as he's in a free standing environment.Pines
E
-1

I did not find better :

inline size_t mystrlen(char *_)

  { return ((_ == NULL) ? (_[0] != '\0')) ? 0 : (1 + mystrlen(_ + 1)); }
Exaggeration answered 24/9, 2016 at 18:37 Comment(1)
uses recursion which add overhead of calling itself (you cant inline function that recursing to unknown depth)Pulchia
P
-5
int max;
max = sizeof(str);
return (--max);
Popinjay answered 25/1, 2012 at 5:54 Comment(2)
This would only work for char arrays and C-string literals. This won't work for pointers to strings.Sarina
this WON'T WORK with string variablesBonkers

© 2022 - 2024 — McMap. All rights reserved.