Correct usage of strtol

The program below converts a string to a long, but as far as I can tell it also reports an error. I am relying on the assumption that if strtol successfully converts the string to a long, then the second parameter to strtol should be equal to NULL. When I run the application below with 55, I get the following message.

./convertToLong 55
Could not convert 55 to long and leftover string is: 55 as long is 55

How can I successfully detect errors from strtol? In my application, zero is a valid value.

Code:

#include <stdio.h>
#include <stdlib.h>

static long parseLong(const char * str);

int main(int argc, char ** argv)
{
    printf("%s as long is %ld\n", argv[1], parseLong(argv[1]));
    return 0;
 }

static long parseLong(const char * str)
{
    long _val = 0;
    char * temp;

    _val = strtol(str, &temp, 0);

    if(temp != '\0')
            printf("Could not convert %s to long and leftover string is: %s", str, temp);

    return _val;
}
Hourglass answered 5/1, 2013 at 20:32 Comment(8)
Read the documentation again; you also should handle errors like overflow.Nominee
Also, the proper error checking for strto* functions is not done by checking the output pointer. It should be done by checking for a zero return value and a set errno.Prioress
Why don't you use std::stoi in C++ ? (you added the C++ tag)Euripus
@BatchyX, It won't work quite as well for strings like "123abc" (as was the consensus in my previous question). The OP is checking for the entire string to be converted.Waxwing
@chris: You can do exactly the same thing with std::stoi. In fact, the prototype of stoi is almost the same as strtol, but uses exceptions where exceptions are due, instead of an error return value with global error variable hackery.Euripus
@BatchyX, True, but it's really annoying trying to see if the whole string was converted. I'd expect implementations to use strtol under the hood anyway, as one exception is based on a reported failure from strtol, but completely leave out converting the whole string in the checking. I find boost::lexical_cast a good substitute for that behaviour, though people have made a case against it as well.Waxwing
@chris: come on... doing that with std::stoi is just if (*pos != string.length()) throw std::invalid_argument();, and it will reuse your invalid_argument exception handler. And sometimes, you want to accept an unconverted string if it begins with a space.Euripus
@BatchyX, Whatever works. I'm just surprised it doesn't do that in the first place, so you have to add your own code onto it if you want that functionality.Waxwing
W
22

You're almost there. temp itself will not be null, but it will point to a null character if the whole string is converted, so you need to dereference it:

if (*temp != '\0')
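
As a minimal sketch of the difference (the helper name whole_string_converted is made up for illustration), the pointer that strtol() stores is never NULL; only the character it points at shows whether the end of the string was reached:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: true when strtol() stopped at the terminating
   null character of str. */
bool whole_string_converted(const char *str)
{
    char *temp;

    (void)strtol(str, &temp, 0);

    /* temp always points somewhere inside str, so comparing temp itself
       with '\0' (a null pointer constant) is always "not equal".
       Dereference it to look at the first unconverted character. */
    return *temp == '\0';
}
```

Note that this check alone accepts an empty string and ignores overflow, so it is only the first step.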
Waxwing answered 5/1, 2013 at 20:35 Comment(1)
Additional checks are needed to handle overflows and parsing an empty string. See Jonathan Leffler's answer.Electuary
C
80

Note that names beginning with an underscore are reserved for the implementation; it is best to avoid using such names in your code. Hence, _val should be just val.

The full specification of error handling for strtol() and its relatives is complex, surprisingly complex, when you first run across it. One thing you're doing absolutely right is using a function to invoke strtol(); using it 'raw' in code is probably not correct.

Since the question is tagged with both C and C++, I will quote from the C2011 standard; you can find the appropriate wording in the C++ standard for yourself.

ISO/IEC 9899:2011 §7.22.1.4 The strtol, strtoll, strtoul and strtoull functions

long int strtol(const char * restrict nptr, char ** restrict endptr, int base);

¶2 [...] First, they decompose the input string into three parts: an initial, possibly empty, sequence of white-space characters (as specified by the isspace function), a subject sequence resembling an integer represented in some radix determined by the value of base, and a final string of one or more unrecognized characters, including the terminating null character of the input string. [...]

¶7 If the subject sequence is empty or does not have the expected form, no conversion is performed; the value of nptr is stored in the object pointed to by endptr, provided that endptr is not a null pointer.

Returns

¶8 The strtol, strtoll, strtoul, and strtoull functions return the converted value, if any. If no conversion could be performed, zero is returned. If the correct value is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type and sign of the value, if any), and the value of the macro ERANGE is stored in errno.

Remember that no standard C library function ever sets errno to 0. Therefore, to be reliable, you must set errno to zero before calling strtol().
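
A small sketch of why the reset matters, using a hypothetical helper name: if errno already holds ERANGE from some earlier call, a perfectly successful strtol() leaves it untouched, and a later errno check would report a false overflow.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns the errno value seen after a successful
   strtol() call when errno was NOT cleared beforehand. */
int errno_after_successful_call(void)
{
    errno = ERANGE;                /* stale value from an earlier failure */
    (void)strtol("42", NULL, 10);  /* succeeds; errno is not reset to 0 */
    return errno;                  /* the stale ERANGE is still there */
}
```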

So, your parseLong() function might look like:

#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

static long parseLong(const char *str)
{
    errno = 0;
    char *temp;
    long val = strtol(str, &temp, 0);

    if (temp == str || *temp != '\0' ||
        ((val == LONG_MIN || val == LONG_MAX) && errno == ERANGE))
        fprintf(stderr, "Could not convert '%s' to long and leftover string is: '%s'\n",
                str, temp);
        // cerr << "Could not convert '" << str << "' to long and leftover string is '"
        //      << temp << "'\n";
    return val;
}

Note that on error, this returns 0 or LONG_MIN or LONG_MAX, depending on what strtol() returned. If your calling code needs to know whether the conversion was successful or not, you need a different function interface — see below. Also, note that errors should be printed to stderr rather than stdout, and error messages should be terminated by a newline \n; if they're not, they aren't guaranteed to appear in a timely fashion.

Now, in library code you probably do not want any printing, and your calling code might want to know whether the conversion was successful or not, so you might revise the interface too. In that case, you'd probably modify the function so it returns a success/failure indication:

#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

bool parseLong(const char *str, long *val)
{
    char *temp;
    bool rc = true;
    errno = 0;
    *val = strtol(str, &temp, 0);

    if (temp == str || *temp != '\0' ||
        ((*val == LONG_MIN || *val == LONG_MAX) && errno == ERANGE))
        rc = false;

    return rc;
}

which you could use like:

if (parseLong(str, &value))
    …conversion successful…
else
    …handle error…

If you need to distinguish between 'trailing junk', 'invalid numeric string', 'value too big' and 'value too small' (and 'no error'), you'd use an integer or enum instead of a boolean return code. If you want to allow trailing white space but no other characters, or if you don't want to allow any leading white space, you have more work to do in the function. The code allows octal, decimal and hexadecimal; if you want strictly decimal, you need to change the 0 to 10 in the call to strtol().
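
One possible shape for the enum variant is sketched below. The type ParseStatus, its enumerator names and the function name parseLongStatus are all invented for this illustration; the order of the tests (no conversion, then range, then trailing junk) is one reasonable choice, not the only one.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical status codes; the names are illustrative only. */
typedef enum {
    PARSE_OK,
    PARSE_NOT_A_NUMBER,   /* nothing in the string looked like a long */
    PARSE_TRAILING_JUNK,  /* a number followed by other characters */
    PARSE_TOO_BIG,        /* value above LONG_MAX */
    PARSE_TOO_SMALL       /* value below LONG_MIN */
} ParseStatus;

ParseStatus parseLongStatus(const char *str, long *val)
{
    char *end;

    errno = 0;                      /* strtol() never clears errno itself */
    *val = strtol(str, &end, 0);

    if (end == str)
        return PARSE_NOT_A_NUMBER;
    if (errno == ERANGE)            /* result was clamped to a limit */
        return (*val == LONG_MAX) ? PARSE_TOO_BIG : PARSE_TOO_SMALL;
    if (*end != '\0')
        return PARSE_TRAILING_JUNK;
    return PARSE_OK;
}
```

With this interface, an input of "0" yields PARSE_OK with *val set to 0, so zero stays distinguishable from every failure case.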

If your functions are to masquerade as part of the standard library, they should not set errno to 0 permanently, so you'd need to wrap the code to preserve errno:

int saved = errno;  // At the start, before errno = 0;

…rest of function…

if (errno == 0)     // Before the return
    errno = saved;
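
Put together, the wrapper might look like the sketch below. The name parseLongKeepErrno is hypothetical; the checks are the same as in the bool version above, with the save and restore added around them.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical name: like the bool parseLong() above, but the caller's
   errno is restored unless strtol() itself reported an error. */
bool parseLongKeepErrno(const char *str, long *val)
{
    int saved = errno;   /* remember whatever the caller had */
    char *temp;
    bool rc = true;

    errno = 0;
    *val = strtol(str, &temp, 0);

    if (temp == str || *temp != '\0' ||
        ((*val == LONG_MIN || *val == LONG_MAX) && errno == ERANGE))
        rc = false;

    if (errno == 0)      /* strtol() set nothing: put the old value back */
        errno = saved;

    return rc;
}
```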
Cinnamon answered 5/1, 2013 at 21:30 Comment(14)
Thanks for the extensive answer! But why do you explicitly check for "errno == ERANGE" instead of "errno != 0"? If the user could specify an own base for conversion, errno could also be set to EINVAL... Also, "man strtol" (linux.die.net/man/3/strtol) uses the following code for error checking, and I really don't get the reason for this: "if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN)) || (errno != 0 && val == 0)){ error }". Why isn't this a simple "errno != 0" as well?Coumas
The standard doesn't mention setting errno == EINVAL for values of base other than 0 or 2..36, but it is a reasonable thing to do. In general, you should be cautious about trying to detect error conditions with errno rather than the return from a function; the library can set errno to a non-zero value even if the function succeeds. (On Solaris, if the output was not a terminal, you'd find errno == ENOTTY after a successful operation.) In theory, strtol() could convert "1" to 1 and set errno to a non-zero value and this would be legitimate but perverted (and successful).Cinnamon
Is there a reason errno == ERANGE is checked unconditionally, whether strtol returned LONG_MIN/LONG_MAX or not? (For the reason you give in the comment, a library function may set errno on success.)Bloodroot
@mafso: Originally, some variation on the theme of exhaustion, laziness or carelessness. I've updated the answer to address your valid point, and miscellaneous other minor issues (spelling, etc).Cinnamon
There's an error in your example. val is a long int *, but you do the check val == LONG_MIN, it should be *val == LONG_MIN...Sihunn
@Joakim: Thanks! You're right; the second example was using val where *val was necessary. I've compiled the amended version of both samples under stringent warning options; they're probably OK now.Cinnamon
Why check temp == str || *temp != '\0' ? Isn't the check temp == str already covered? (If temp points to str, then it is not '\0' unless str is a null-string, in which case the conversion would also have failed... Or am I missing something?Tadtada
@BmyGuest: Two different failure modes. If temp == str, then there was nothing in the string that was recognizable as a long. If *temp != '\0', then there was a number, but it did not use the whole string — there was some other character after the number. You're at liberty to decide that it isn't a problem if you are given "19Z" and the Z isn't convertible; the test shown assumes it is a problem (the 'trailing junk' mentioned in the answer). But good question: it is important to understand what you're using.Cinnamon
Disagree with "the library can set errno to a non-zero value even if the function succeeds." C11 §7.5 ¶3 discusses that, but it does not apply to strtol() because of the condition "provided the use of errno is not documented in the description of the function", and the description of strtol() does document it. if (temp == str || *temp != '\0' || errno == ERANGE) is sufficient. IMO if (temp == str || *temp != '\0' || errno) is better as it catches some implementation-defined extensions. The (*val == LONG_MIN || *val == LONG_MAX) tests are not needed.Quicklime
@chux: That comment is subject to the 'In general' prefix; you're right that it doesn't apply when the use of errno is specified (so it doesn't apply to strtol()) and I don't explicitly say so. It gets tricky when the C standard only says ERANGE but some implementations might set EINVAL instead when the base is invalid. It's undefined behaviour to call the function with invalid values; you get what you get (setting the output pointer to the input pointer and returning 0 and setting errno to EINVAL is all reasonable if base is not 0 or 2..36).Cinnamon
@JonathanLeffler Agree about EINVAL and so the suggested temp == str || *temp != '\0' || errno - I think we agree well there. Yet the comment is about the need for *val == LONG_MIN || *val == LONG_MAX, which is not enhanced given the other errno possibilities. If errno == ERANGE is true, then even if *val == LONG_MIN || *val == LONG_MAX was false on some unicorn machine, the strtol() should still be considered as failed.Quicklime
"Note that names beginning with an underscore are reserved for the implementation; it is best to avoid using such names in your code. Hence, _val should be just val." This isn't quite true AFAIK. The standard reserves names beginning with an underscore followed by either an underscore or a capital letter. So __val and _Val are reserved, but _val is not.Strigose
@celticminstrel: For C, part of C11 §7.1.3 Reserved identifiers says: — All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use. All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces. See also What does double underscore (__const) mean in C? Yes, you can use names that start with underscores; no, you won't always get away with it.Cinnamon
That sounds like it's safe to use an initial underscore followed by a lowercase character or a digit for local variables or class variables, but not for global file-static variables or variables in an anonymous namespace. I bet std::placeholders is one of the main reasons for that second rule…Strigose
Q
7

How can I successfully detect errors from strtol?

static long parseLong(const char * str) {
    int base = 0;
    char *endptr;
    errno = 0;
    long val = strtol(str, &endptr, base);

3 tests specified/supported by the standard C library:

  1. Any conversion done?

     if (str == endptr) puts("No conversion.");
    
  2. In range?

     // Best to set errno = 0 before the strtol() call.
     else if (errno == ERANGE) puts("Input out of long range.");
    
  3. Trailing junk?

     else if (*endptr) puts("Extra junk after the numeric text.");
    

Success

    else printf("Success %ld\n", val);

Input such as str == NULL, or a base that is neither 0 nor in the range 2 to 36, is undefined behavior. Various implementations (extensions to the C library) provide defined behavior and report via errno. We could add a 4th test.

    else if (errno) puts("Some implementation error found.");

Or combine with the errno == ERANGE test.


Sample terse code that also takes advantage of common implementation extensions.

#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

long my_parseLong(const char *str, int base, bool *success) {
    char *endptr = 0;
    errno = 0;
    long val = strtol(str, &endptr, base);
   
    if (success) {
      *success = endptr != str && errno == 0 && endptr && *endptr == '\0';
    }
    return val;
}
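
A possible way to exercise the function is sketched here. my_parseLong() is repeated from above only so that the sketch compiles on its own; the report() helper and its output format are assumptions made for illustration. It also shows how the base argument changes what counts as a valid number.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Repeated from the answer so that this sketch is self-contained. */
long my_parseLong(const char *str, int base, bool *success) {
    char *endptr = 0;
    errno = 0;
    long val = strtol(str, &endptr, base);

    if (success) {
      *success = endptr != str && errno == 0 && endptr && *endptr == '\0';
    }
    return val;
}

/* Hypothetical helper: prints how one input fares under a given base. */
void report(const char *str, int base)
{
    bool ok;
    long val = my_parseLong(str, base, &ok);

    if (ok)
        printf("base %2d: \"%s\" -> %ld\n", base, str, val);
    else
        printf("base %2d: \"%s\" rejected\n", base, str);
}
```

With base 0, "010" is read as octal 8 and "0x1A" as hexadecimal 26. With base 10, "010" is read as 10, and "0x1A" is rejected because conversion stops at the x and leaves trailing characters.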
Quicklime answered 27/9, 2020 at 14:13 Comment(0)
P
4

You're missing a level of indirection. You want to check whether the character is the terminating NUL, and not if the pointer is NULL:

if (*temp != '\0')

By the way, this is not a good approach for error checking. Error checking for the strto* family of functions should not rely on comparing the output pointer with the end of the string; it should check for a zero return value together with the value of errno.

Prioress answered 5/1, 2013 at 20:36 Comment(0)
R
1

You should be checking

*temp != '\0'

You should also be able to check the value of errno after calling strtol according to this:

RETURN VALUES
     The strtol(), strtoll(), strtoimax(), and strtoq() functions return the result
     of the conversion, unless the value would underflow or overflow.  If no conver-
     sion could be performed, 0 is returned and the global variable errno is set to
     EINVAL (the last feature is not portable across all platforms).  If an overflow
     or underflow occurs, errno is set to ERANGE and the function return value is
     clamped according to the following table.


       Function       underflow     overflow
       strtol()       LONG_MIN      LONG_MAX
       strtoll()      LLONG_MIN     LLONG_MAX
       strtoimax()    INTMAX_MIN    INTMAX_MAX
       strtoq()       LLONG_MIN     LLONG_MAX
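
The clamping behaviour in the table can be observed with a short sketch like this one. The helper name out_of_long_range is made up; base 10 is an arbitrary choice for the illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: true when strtol() had to clamp the value of str
   to LONG_MIN or LONG_MAX and reported ERANGE. */
bool out_of_long_range(const char *str)
{
    errno = 0;                         /* clear any stale value first */
    long val = strtol(str, NULL, 10);

    return errno == ERANGE && (val == LONG_MAX || val == LONG_MIN);
}
```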
Roadway answered 5/1, 2013 at 20:41 Comment(5)
Citing from "the following table" does not make sense if you don't say where the "following table" can be found.Lasky
Did you write this documentation yourself, or did you just forget to mention the source you copied it from?Lasky
No it's a man page. Just "man strtol" on any unix based system.Roadway
I'm just asking since the NetBSD man page looks quite different, even though it is a UNIX-like system.Lasky
Furthermore, the question is tagged as "C, C++", therefore the proper reference is from the C or C++ standard, not from a particular implementation on a particular hardware architecture.Lasky
