Locale-independent "atof"?

7

28

I'm parsing GPS status entries in fixed NMEA sentences, where the fractional part of the geographical minutes always comes after a period. However, on systems where the locale defines a comma as the decimal separator, the atof function ignores the period and the whole fractional part.

What is the best method to deal with this issue? The longitude/latitude string is stored in a character array, if it matters.

Example Code:

m_longitude = atof((char *)pField); 

Where

pField[] = "01000.3897"; 

Cross-platform project, compiled for Windows XP and CE.

Comment to solution:

Accepted answer is more elegant, but this answer (and comment) is also worth knowing as a quick fix

Wickerwork answered 26/8, 2009 at 9:28 Comment(4)
Can you give us a few examples of the data you have to work with? It might help us provide a better solution. - Poly
m_longitude = atof((char *)pField); where pField[] = "01000.3897"; Cross-platform project, compiled for Windows XP and CE. - Wickerwork
Is there a valid reason not to use strtod (which has the same locale behavior but better error handling)? - Rajput
Sorry, but the accepted answer is kind of a lowlife solution for me :( - Holcomb
19

You could always use (modulo error-checking):

#include <sstream>
...

float longitude = 0.0f;
std::istringstream istr(pField);

istr >> longitude;

The standard iostreams use the global locale by default (which in turn should be initialized to the classic (US) locale). Thus the above should work in general, even on a non-English platform, unless someone has previously changed the global locale to something else. To be absolutely sure that the desired locale is used, create a specific locale and "imbue" the stream with that locale before reading from it:

#include <sstream>
#include <locale>

...
float longitude = 0.0f;
std::istringstream istr(pField);

istr.imbue(std::locale("C"));
istr >> longitude;

As a side note, I've usually used regular expressions to validate NMEA fields, extract the different parts of the field as captures, and then convert the different parts using the above method. An NMEA longitude field is actually formatted as "DDDMM.mmm..", where DDD corresponds to degrees and MM.mmm to minutes (but I guess you already knew that).
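To make the "DDDMM.mmm" remark concrete, here is a minimal sketch that reads the field through a stream imbued with the classic locale and converts it to decimal degrees. The function name nmeaToDegrees is made up for this illustration, and the sketch skips the regex validation mentioned above.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original answer): converts an NMEA
// "DDDMM.mmmm" field into decimal degrees, independent of the global locale.
// Returns false if the field cannot be parsed as a single number.
bool nmeaToDegrees(const std::string& field, double& degrees)
{
    std::istringstream istr(field);
    istr.imbue(std::locale::classic());   // always '.' as decimal separator

    double raw = 0.0;
    if (!(istr >> raw))
        return false;

    // Reject trailing garbage such as "01000.3897x".
    istr >> std::ws;
    if (!istr.eof())
        return false;

    const int wholeDegrees = static_cast<int>(raw / 100.0);  // DDD
    const double minutes = raw - wholeDegrees * 100.0;        // MM.mmmm
    degrees = wholeDegrees + minutes / 60.0;
    return true;
}
```

For the question's sample "01000.3897" this yields 10 degrees plus 0.3897 minutes, roughly 10.006495 degrees. The hemisphere (N/S/E/W) comes from a separate NMEA field and is not handled here.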

Logogram answered 26/8, 2009 at 10:59 Comment(6)
It uses the global C++ locale. Modifying the global C++ locale also modifies the C locale if the new locale has a name; if it doesn't, the effect on the C locale is implementation-defined. - Rajput
@AProgrammer: Did you actually read and understand my reply before commenting/downvoting? - Logogram
@AProgrammer: Ok, re-reading my reply, it might not have been very clear. Nevertheless, I never suggested changing the global locale; I just mentioned that if someone else did, it would affect the sample code. - Logogram
@Cwe: yes, I read it (I even gave it a +1, mainly for mentioning that the data isn't decimal). I confirmed what you were writing (you expressed a doubt in your first formulation) and added information about the interaction between the C++ and C global locales. - Rajput
C++ has a function to get a reference to the C locale directly: std::locale::classic(), so creating a temporary via std::locale("C") isn't needed. - Blowup
Be sure to read this before utilizing this approach: cplusplus.github.io/LWG/lwg-active.html#2381 - Sonorous
7

A nasty solution I've used once is to sprintf() 0.0f and grab the second character of the output, then replace '.' in the input string with that character. This solves the comma case, and would also work if a locale defined some other decimal separator.

Lording answered 26/8, 2009 at 9:47 Comment(1)
localeconv (in <locale.h>) returns a pointer to a struct whose decimal_point member contains that value. Note that the pointer is only valid until the next call to localeconv() or setlocale(). - Rajput
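Combining this answer with the localeconv comment, a sketch of the quick fix might look like the following. The helper name parseDotDecimal is hypothetical; it uses strtod instead of atof so that failures can be detected, and assumes C++11 for nullptr.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Hypothetical helper: parses a number that always uses '.' as its decimal
// separator, whatever separator the current C locale expects.
// Returns false if nothing was converted or characters were left over.
bool parseDotDecimal(const char* text, double& value)
{
    std::string copy(text);

    // decimal_point is a string; copy it right away because the struct
    // returned by localeconv() may be overwritten by later locale calls.
    const std::string separator = std::localeconv()->decimal_point;

    const std::string::size_type dot = copy.find('.');
    if (dot != std::string::npos)
        copy.replace(dot, 1, separator);

    char* end = nullptr;
    value = std::strtod(copy.c_str(), &end);
    return end != copy.c_str() && *end == '\0';
}
```

Called as parseDotDecimal((char *)pField, m_longitude), it leaves the locale untouched, which matters if changing it is not an option.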
7

This question is old, but in the meantime in C++ we got a "locale-independent" atof:

std::from_chars (with its sibling std::to_chars), added in C++17, provides locale-independent float scanning (and formatting). Both are located in the header <charconv>.

You can read more about them here:

https://en.cppreference.com/w/cpp/utility/from_chars

https://en.cppreference.com/w/cpp/utility/to_chars

I recommend Stephan T. Lavavej's wonderful talk about these two tools; here's the link to the part where he talks about using std::from_chars: https://youtu.be/4P_kbF0EbZM?t=1367

And a short example by me:

#include <charconv>
#include <iostream>
#include <system_error>

int main()
{
    char buffer[16] { "123.45678" };
    float result;
    auto [p, ec] = std::from_chars(std::begin(buffer), std::end(buffer), result);
    if(ec == std::errc{})
        std::cout << result;
}

Unfortunately, as of today (05.06.2020), only MSVC supports these functions with floating-point types. Implementing them efficiently turned out to be a big problem.

Edit (27.04.2021): libstdc++, released today with the stable GCC 11.1, adds floating-point support to <charconv>. However, this implementation does not seem to be standard-compliant: it copies the text into another buffer and calls strto(f/d/ld) with the default C locale and a set floating-point environment, taking the error from errno. In extremely weird cases it can allocate, and throw and catch exceptions underneath. You can find the implementation here: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/src/c%2B%2B17/floating_from_chars.cc#L304

Nu answered 5/6, 2020 at 17:21 Comment(0)
2

Any reason why you can't do a setlocale "C" before the atof and restore the locale afterwards? Maybe I misunderstood the question...

Adamite answered 26/8, 2009 at 9:50 Comment(3)
Definitely. I can't risk any impact on other parts of the system, and changing the locale can certainly affect other processes. - Wickerwork
The setlocale call only affects the locale of the current process. If you have other threads that do locale-dependent things, they would need to be synchronized. - Adamite
AFAIK, under Windows CE locales are global, not replicated per process. - Wickerwork
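For completeness, the save/switch/restore idea from this answer could be sketched as below. The function name atofInCLocale is made up for illustration, and the caveats from the comments still apply: the locale change is visible to every thread in the process (and possibly beyond it on platforms with a truly global locale) while the call is in progress.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Hypothetical helper: runs atof() under the "C" numeric locale and then
// restores whatever LC_NUMERIC setting was active before.
// Not thread-safe: setlocale() changes state shared by the whole process.
double atofInCLocale(const char* text)
{
    // setlocale(category, nullptr) only queries; copy the name because the
    // returned buffer may be overwritten by the next setlocale() call.
    const char* current = std::setlocale(LC_NUMERIC, nullptr);
    const std::string saved = current ? current : "C";

    std::setlocale(LC_NUMERIC, "C");
    const double value = std::atof(text);
    std::setlocale(LC_NUMERIC, saved.c_str());

    return value;
}
```

Only LC_NUMERIC is switched, so date, collation and message settings stay as they were.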
0

You could iterate through all the characters in the array and swap any non-digit with a . character, which should work as long as the coordinates are in a number-single_delimiter_character-number format.

Poly answered 26/8, 2009 at 9:53 Comment(2)
Misunderstanding. There will always be a single period, but sometimes atof will expect a comma and ignore the fractional part after the period. - Wickerwork
Right. In that case, I'd go with MSalters's solution: print a float, get the delimiter, then replace the . with it. - Poly
0

Do you really need to get locale behavior for numerics? If not

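setlocale(LC_ALL, "");       /* user's locale for every category */
setlocale(LC_NUMERIC, "C");  /* LC_* values are not bit flags, so reset numerics separately */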

or the equivalent use of std::locale constructor.
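A rough sketch of the std::locale variant follows. The function name is hypothetical, and the try/catch is an added precaution, since std::locale("") throws if the environment names a locale that is not installed.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: installs the user's preferred locale for everything
// except the numeric category, which is taken from the classic "C" locale.
std::locale installUserLocaleWithClassicNumerics()
{
    std::locale user = std::locale::classic();
    try {
        user = std::locale("");           // user's environment locale
    } catch (const std::runtime_error&) {
        // The environment names a locale that is not available; keep "C".
    }

    // Copy of `user` with the numeric facets replaced by the classic ones.
    const std::locale mixed(user, std::locale::classic(), std::locale::numeric);
    std::locale::global(mixed);
    return mixed;
}
```

Streams created afterwards pick up the mixed locale. Whether the C locale used by atof follows along depends on whether the implementation gives the combined locale a name, as noted in the comments on the accepted answer.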

Rajput answered 26/8, 2009 at 11:22 Comment(0)
0

Some of the solutions above did not seem to work, so I propose this as a perfectly failproof solution. Just copy-paste this function and use it instead.

#include <cmath>  // pow

float stor(const char* str) {
    float result = 0;
    float sign = *str == '-' ? str++, -1 : 1;
    while (*str >= '0' && *str <= '9') {
        result *= 10;
        result += *str - '0';
        str++;
    }
    if (*str == ',' || *str == '.') {
        str++;
        float multiplier = 0.1;
        while (*str >= '0' && *str <= '9') {
            result += (*str - '0') * multiplier;
            multiplier /= 10;
            str++;
        }
    }
    result *= sign;
    if (*str == 'e' || *str == 'E') {
        str++;
        float powerer = *str == '-'? str++, 0.1 : 10;
        float power = 0;
        while (*str >= '0' && *str <= '9') {
            power *= 10;
            power += *str - '0';
            str++;
        }
        result *= pow(powerer, power);
    }
    return result;
}
Verbenaceous answered 24/6, 2017 at 21:44 Comment(2)
How does it avoid accumulation of rounding errors? I suspect it doesn't, in which case it's not a real solution. - Constraint
Valid point. The single-precision variables could be replaced by double-precision ones. My data were quite imprecise, so I didn't need such precision for my application (a 16-bit float would be enough if it were available), and I didn't think of it. - Verbenaceous
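One way to address the rounding concern for plain "digits.digits" input is to collect all digits in an integer and divide once at the end. The sketch below is not the answer's code: the name parseFixedDecimal is made up, exponents are not handled, and it assumes at most 18 digits so the integer cannot overflow.

```cpp
#include <cstdint>

// Hypothetical variant for plain "digits[.digits]" input (no exponent).
// Digits are accumulated exactly in an integer and scaled once at the end,
// instead of adding many rounded fractions together.
// Assumes at most 18 digits in total so the integer cannot overflow.
double parseFixedDecimal(const char* str)
{
    const bool negative = (*str == '-');
    if (negative || *str == '+')
        ++str;

    std::uint64_t mantissa = 0;
    double scale = 1.0;
    bool seenSeparator = false;

    for (; *str != '\0'; ++str) {
        if (*str >= '0' && *str <= '9') {
            mantissa = mantissa * 10 + static_cast<std::uint64_t>(*str - '0');
            if (seenSeparator)
                scale *= 10.0;        // powers of ten up to 1e22 are exact
        } else if ((*str == '.' || *str == ',') && !seenSeparator) {
            seenSeparator = true;
        } else {
            break;                    // stop at the first unexpected character
        }
    }

    const double value = static_cast<double>(mantissa) / scale;
    return negative ? -value : value;
}
```

For inputs of up to about 15 digits, such as NMEA coordinates, the integer converts to double exactly, so the single division is the only rounding step.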
