atoi implementation in C
Asked Answered
B

6

29

I can't understand the following atoi implementation code, specifically this line:

k = (k << 3) + (k << 1) + (*p) - '0';

Here is the code:

int my_atoi(char *p) {
    int k = 0;
    while (*p) {
        k = (k << 3) + (k << 1) + (*p) - '0';
        p++;
     }
     return k;
}

Can someone explain it to me ?

Another question: what should be the algorithm of atof implementation ?

Berry answered 8/10, 2012 at 23:57 Comment(0)
M
32
k = (k << 3) + (k << 1);

means

k = k * 2³ + k * 2¹ = k * 8 + k * 2 = k * 10

Does that help?

The *p - '0' term adds the value of the next digit; this works because C requires that the digit characters have consecutive values, so that '1' == '0' + 1, '2' == '0' + 2, etc.

As for your second question (atof), that should be its own question, and it's the subject for a thesis, not something simple to answer...

Monograph answered 8/10, 2012 at 23:58 Comment(3)
@Berry The algorithm is simply "For each new digit, multiply the value of the preceding digits by 10 and add the new digit". Note the problems mentioned by Karoly Horvath in his answer, though, if the input contains anything but decimal digits, you'll get garbage - in particular, it doesn't handle negative numbers. And if the input is too long, you'll have overflow and undefined behaviour.Prepare
Now it's much better... can you show me the code for it without bitwise ? another issue what should be atof algorithm ?Berry
@DanielFischer tbf, atoi doesn't handle overflow, eitherBotello
K
34

<< is bit shift, (k<<3)+(k<<1) is k*10, written by someone who thought he was more clever than a compiler (well, he was wrong...)

(*p) - '0' is subtracting the value of character 0 from the character pointed by p, effectively converting the character to a number.

I hope you can figure out the rest... just remember how the decimal system works.

Here is a specification for the standard function atoi. Sorry for not quoting the standard, but this will work just as fine (from: http://www.cplusplus.com/reference/clibrary/cstdlib/atoi/ )

The function first discards as many whitespace characters (as in isspace) as necessary until the first non-whitespace character is found. Then, starting from this character, takes an optional initial plus or minus sign followed by as many base-10 digits as possible, and interprets them as a numerical value.

The string can contain additional characters after those that form the integral number, which are ignored and have no effect on the behavior of this function.

If the first sequence of non-whitespace characters in str is not a valid integral number, or if no such sequence exists because either str is empty or it contains only whitespace characters, no conversion is performed and zero is returned.

Kirman answered 8/10, 2012 at 23:59 Comment(9)
"he was wrong" - Um, maybe, maybe not. It depends on the compiler and when said implementation was written. Perhaps this was written some time ago when optimizing compilers weren't nearly as good. There's no reason to go back and change it now. Writers of standard library implementations have to deal with many (many) more use cases then we typically do.Lettering
@Ed S.: maybe with some #ifdefs.. I doubt you will find this in any common standard library code..Kirman
That I agree with, so I suppose you're probably right in this case.Lettering
@EdS. - Even if the compiler at the time could not optimize k*10, that's no excuse for writing k*10 in such an obtuse way. Improving performance by obfuscating code is never a good idea.Raffinate
@aroth: "Improving performance by obfuscating code is never a good idea" - You almost always make a piece of code harder to understand when optimizing it, and from time to time it does in fact get quite tricky (a comment is warranted in that case though). Sometimes things really do require very low level optimizations, and unless you have that rare breed of user who cares more about code quality than performance, saying that it is "never a good idea" is just silly. I think Duff's Device would like to have a chat.Lettering
@KarolyHorvath: Can you edit your message and add your full suggested implementation ?Berry
@EdS. - We'll have to agree to disagree then. Though I'll point out that there are more people involved than the end user and the guy writing the code. There's also all the other people who might come along to maintain the obtuse code, long after the original developer has left. And in most cases (unless the optimization is of the sort that replaces an O(n^2) algorithm with an O(n log n) one) the end user will see more performance improvement from a single iteration of Moore's Law than from any amount of obfuscating code optimizations.Raffinate
I disagree with both of you. Moore's Law ended 5+ years ago; cpus are not getting faster anymore, they're just getting more cores, which generally does not help the end user perceptibly. However, it would take a pathologically bad compiler not to choose the best way of performing multiplication by a constant. Writing the multiplication by 10 as bitshifts is not an optimization; it's an expression of the assembly-language code the author wanted the compiler to generate on a particular machine, which is not what should be expressed by C.Monograph
@R.. - Hooray for disagreement! Although Moore's Law is still going strong. You're correct that most new transistors go towards adding cores, however some go towards architectural improvements that increase IPC, and process improvements allow for higher clock speeds. So single-threaded performance is still improving with each iteration (by about 10-20%). Not by the full 2x, but still enough to be noticeable and to eclipse gains that might be made by obfuscating code here and there.Raffinate
M
32
k = (k << 3) + (k << 1);

means

k = k * 2³ + k * 2¹ = k * 8 + k * 2 = k * 10

Does that help?

The *p - '0' term adds the value of the next digit; this works because C requires that the digit characters have consecutive values, so that '1' == '0' + 1, '2' == '0' + 2, etc.

As for your second question (atof), that should be its own question, and it's the subject for a thesis, not something simple to answer...

Monograph answered 8/10, 2012 at 23:58 Comment(3)
@Berry The algorithm is simply "For each new digit, multiply the value of the preceding digits by 10 and add the new digit". Note the problems mentioned by Karoly Horvath in his answer, though, if the input contains anything but decimal digits, you'll get garbage - in particular, it doesn't handle negative numbers. And if the input is too long, you'll have overflow and undefined behaviour.Prepare
Now it's much better... can you show me the code for it without bitwise ? another issue what should be atof algorithm ?Berry
@DanielFischer tbf, atoi doesn't handle overflow, eitherBotello
A
1
#include <stdio.h>
#include <errno.h>
#include <limits.h>

double atof(const char *string);

int debug=1;

int main(int argc, char **argv)
{
    char *str1="3.14159",*str2="3",*str3="0.707106",*str4="-5.2";
    double f1,f2,f3,f4;
    if (debug) printf("convert %s, %s, %s, %s\n",str1,str2,str3,str4);
    f1=atof(str1);
    f2=atof(str2);
    f3=atof(str3);
    f4=atof(str4);

    if (debug) printf("converted values=%f, %f, %f, %f\n",f1,f2,f3,f4);
    if (argc > 1)
    {
        printf("string %s is floating point %f\n",argv[1],atof(argv[1]));
    }
}

double atof(const char *string)
{
    double result=0.0;
    double multiplier=1;
    double divisor=1.0;
    int integer_portion=0;

    if (!string) return result;
    integer_portion=atoi(string);

    result = (double)integer_portion;
    if (debug) printf("so far %s looks like %f\n",string,result);

    /* capture whether string is negative, don't use "result" as it could be 0 */
    if (*string == '-')
    {
        result *= -1; /* won't care if it was 0 in integer portion */
        multiplier = -1;
    }

    while (*string && (*string != '.'))
    {
        string++;
    }
    if (debug) printf("fractional part=%s\n",string);

    // if we haven't hit end of string, go past the decimal point
    if (*string)
    {
        string++;
        if (debug) printf("first char after decimal=%c\n",*string);
    }

    while (*string)
    {
        if (*string < '0' || *string > '9') return result;
        divisor *= 10.0;
        result += (double)(*string - '0')/divisor;
        if (debug) printf("result so far=%f\n",result);
        string++;
    }
    return result*multiplier;
}
Atworth answered 24/12, 2012 at 6:13 Comment(1)
This does not work for "1 .1" among other problems.Jaella
A
0

Interestingly, the man page for atoi doesn't indicate setting of errno so if you're talking any number > (2^31)-1, you're out of luck and similarly for numbers less than -2^31 (assuming 32-bit int). You'll get back an answer but it won't be what you want. Here's one that could take a range of -((2^31)-1) to (2^31)-1, and return INT_MIN (-(2^31)) if in error. errno could then be checked to see if it overflowed.

#include <stdio.h>
#include <errno.h>  /* for errno */
#include <limits.h> /* for INT_MIN */
#include <string.h> /* for strerror */

extern int errno;

int debug=0;
int atoi(const char *c)
{
    int previous_result=0, result=0;
    int multiplier=1;

    if (debug) printf("converting %s to integer\n",c?c:"");
    if (c && *c == '-')
    {
        multiplier = -1;
        c++;
    }
    else
    {
        multiplier = 1;
    }
    if (debug) printf("multiplier = %d\n",multiplier);
    while (*c)
    {
        if (*c < '0' || *c > '9')
        {
            return result * multiplier;
        }
        result *= 10;
        if (result < previous_result)
        {
            if (debug) printf("number overflowed - return INT_MIN, errno=%d\n",errno);
            errno = EOVERFLOW;
            return(INT_MIN);
        }
        else
        {
            previous_result *= 10;
        }
        if (debug) printf("%c\n",*c);
        result += *c - '0';

        if (result < previous_result)
        {
            if (debug) printf("number overflowed - return MIN_INT\n");
            errno = EOVERFLOW;
            return(INT_MIN);
        }
        else
        {
            previous_result += *c - '0';
        }
        c++;
    }
    return(result * multiplier);
}

int main(int argc,char **argv)
{
    int result;
    printf("INT_MIN=%d will be output when number too high or too low, and errno set\n",INT_MIN);
    printf("string=%s, int=%d\n","563",atoi("563"));
    printf("string=%s, int=%d\n","-563",atoi("-563"));
    printf("string=%s, int=%d\n","-5a3",atoi("-5a3"));
    if (argc > 1)
    {
        result=atoi(argv[1]);
        printf("atoi(%s)=%d %s",argv[1],result,(result==INT_MIN)?", errno=":"",errno,strerror(errno));
        if (errno) printf("%d - %s\n",errno,strerror(errno));
        else printf("\n");
    }
    return(errno);
}
Atworth answered 24/12, 2012 at 4:4 Comment(1)
Your attempt at detecting signed overflow is doomed: the behavior is undefined if signed integer overflow occurs. Testing the resulting value is meaningless. Furthermore, atoi does not need to handle overflow in any particular manner, so why bother? Btw the errno value for out of range is ERANGE.Jaella
S
0

Here is my implementation(tested successfully with cases containing and starting with letters, +, - and zero's). I tried to reverse-engineer atoi function in Visual Studio. If the input string only contained numerical characters, it could be implemented in one loop. but it gets complicated because you should take care of -,+ and letters.

int atoi(char *s)
{    
    int c=1, a=0, sign, start, end, base=1;
//Determine if the number is negative or positive 
    if (s[0] == '-')
        sign = -1;
    else if (s[0] <= '9' && s[0] >= '0')
        sign = 1;
    else if (s[0] == '+')
        sign = 2;
//No further processing if it starts with a letter 
    else 
        return 0;
//Scanning the string to find the position of the last consecutive number
    while (s[c] != '\n' && s[c] <= '9' && s[c] >= '0')
        c++;
//Index of the last consecutive number from beginning
    start = c - 1;
//Based on sign, index of the 1st number is set
    if (sign==-1)       
        end = 1;
    else if (sign==1)
        end = 0;
//When it starts with +, it is actually positive but with a different index 
//for the 1st number
    else
    { 
        end = 1;
        sign = 1;
    }
//This the main loop of algorithm which generates the absolute value of the 
//number from consecutive numerical characters.  
    for (int i = start; i >=end ; i--)
    {
        a += (s[i]-'0') * base;
        base *= 10;
    }
//The correct sign of generated absolute value is applied
    return sign*a;
}
Spraggins answered 20/10, 2018 at 13:13 Comment(0)
C
0

about atoi() hint code from here:

and based on the atoi(), my implementation of atof():

[have same limitation of original code, doesn't check length, etc]

double atof(const char* s)
{
  double value_h = 0;
  double value_l = 0;
  double sign = 1;

  if (*s == '+' || *s == '-')
  {
    if (*s == '-') sign = -1;
    ++s;
  }

  while (*s >= 0x30 && *s <= 0x39)
  {
    value_h *= 10;
    value_h += (double)(*s - 0x30);
    ++s;
  }

  // 0x2E == '.'
  if (*s == 0x2E)
  {
    double divider = 1;
    ++s;
    while (*s >= 0x30 && *s <= 0x39)
    {
      divider *= 10;
      value_l *= 10;
      value_l += (double)(*s - 0x30);
      ++s;
    }
    return (value_h + value_l/divider) * sign;
  }
  else
  {
    return value_h * sign;
  }
}
Camm answered 30/4, 2019 at 6:16 Comment(2)
You should use '.', '0' and '9' instead of hardcoded ASCII values.Jaella
Your implementation is not perfect because it does not support the precise rounding specification for border cases, it is a good illustration. You can extend it to support the exponent notation with a small effort.Jaella

© 2022 - 2024 — McMap. All rights reserved.