python - locale in dateutil / parser

3

6

I set

locale.setlocale(locale.LC_TIME, ('de', 'UTF-8'))

The string I want to parse is:

Montag, 11. April 2016 19:35:57

I use:

note_date = parser.parse(result.group(2))

but I get the following error:

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1531, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 938, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/adieball/Dropbox/Multiverse/Programming/python/repositories/kindle/kindle2en.py", line 250, in <module>
    main(sys.argv[1:])
  File "/Users/adieball/Dropbox/Multiverse/Programming/python/repositories/kindle/kindle2en.py", line 154, in main
    note_date = parser.parse(result.group(2))
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/dateutil/parser.py", line 1164, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/dateutil/parser.py", line 555, in parse
    raise ValueError("Unknown string format")
ValueError: Unknown string format

Debugging shows that the parser is not using the "correct" (German) weekday and month names; it is still using the English ones.


I'm sure I'm missing something obvious here, but can't find it.

Thanks.

Heiney answered 27/5, 2016 at 13:48 Comment(0)
D
5

dateutil.parser doesn't use the locale. You'll need to subclass dateutil.parser.parserinfo and construct a German equivalent:

from dateutil import parser

class GermanParserInfo(parser.parserinfo):
    WEEKDAYS = [("Mo.", "Montag"),
                ("Di.", "Dienstag"),
                ("Mi.", "Mittwoch"),
                ("Do.", "Donnerstag"),
                ("Fr.", "Freitag"),
                ("Sa.", "Samstag"),
                ("So.", "Sonntag")]

s = 'Montag, 11. April 2016 19:35:57'
note_date = parser.parse(s, parserinfo=GermanParserInfo())

You'd need to extend this to also work for other values, such as month names.
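As a rough sketch of what that extension could look like, the class below adds German month names next to the weekdays. The original example only worked because "April" is spelled the same in English and German. The abbreviations chosen here (for example "Mrz" and "Mär" for March) and the second sample string are my own assumptions, not something dateutil ships.

```python
from dateutil import parser


class GermanParserInfo(parser.parserinfo):
    # Each tuple lists the spellings accepted for one weekday, Monday first.
    WEEKDAYS = [("Mo", "Montag"),
                ("Di", "Dienstag"),
                ("Mi", "Mittwoch"),
                ("Do", "Donnerstag"),
                ("Fr", "Freitag"),
                ("Sa", "Samstag"),
                ("So", "Sonntag")]
    # Each tuple lists the spellings accepted for one month, January first.
    # The abbreviations are illustrative; adjust them to your input data.
    MONTHS = [("Jan", "Januar"),
              ("Feb", "Februar"),
              ("Mrz", "Mär", "März"),
              ("Apr", "April"),
              ("Mai",),
              ("Jun", "Juni"),
              ("Jul", "Juli"),
              ("Aug", "August"),
              ("Sep", "September"),
              ("Okt", "Oktober"),
              ("Nov", "November"),
              ("Dez", "Dezember")]


info = GermanParserInfo()
print(parser.parse("Montag, 11. April 2016 19:35:57", parserinfo=info))
# "Dezember" is not an English month name, so this needs the MONTHS override.
print(parser.parse("Samstag, 3. Dezember 2016 08:15:00", parserinfo=info))
```

Because MONTHS replaces the default list, this parser no longer accepts English-only spellings such as "December"; add them to the tuples if you need both.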

Detta answered 27/5, 2016 at 14:3 Comment(1)
If I recall correctly, there are downstream libraries that do handle other locales, such as dateparser. It's probably better to use one of those than to maintain your own library of locale edge cases. (Joel)
T
3

In another answer, I posted a simple locale-aware parserinfo class. It isn't a complete solution for every language in the world, but it solved all my localization problems.

Here it is:

import calendar
from dateutil import parser
    
class LocaleParserInfo(parser.parserinfo):
    WEEKDAYS = list(zip(calendar.day_abbr, calendar.day_name))
    MONTHS = list(zip(calendar.month_abbr, calendar.month_name))[1:]

And you can use:

In [1]: import locale;locale.setlocale(locale.LC_ALL, "pt_BR.utf8")
In [2]: from localeparserinfo import LocaleParserInfo                                   

In [3]: from dateutil.parser import parse                                                

In [4]: parse("Ter, 01 Out 2013 14:26:00 -0300", parserinfo=LocaleParserInfo())              
Out[4]: datetime.datetime(2013, 10, 1, 14, 26, tzinfo=tzoffset(None, -10800))

Test it, and take a look at the class variables in the original parserinfo, especially the HMS variable. You may need to override other variables as well.

Thrips answered 26/6, 2020 at 16:1 Comment(1)
For the fr locale, define the months without the abbreviation dot, e.g. MONTHS = list(zip(map(lambda x: x.replace(".", ""), calendar.month_abbr), calendar.month_name))[1:]. (Priam)
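One detail worth spelling out: the class attributes above are evaluated once, when the class is defined, so the locale has to be set before the class is defined or imported (as the session above does). A possible way around that ordering constraint is to build the parserinfo on demand. The sketch below is only an illustration; `make_locale_parserinfo` and `clean` are hypothetical helper names, and stripping the trailing dot follows the suggestion in the comment above.

```python
import calendar
from dateutil import parser


def make_locale_parserinfo():
    """Return a parserinfo built from the LC_TIME locale active right now."""

    def clean(name):
        # Some locales (e.g. fr) write abbreviations with a trailing dot.
        return name.rstrip(".")

    weekdays = [(clean(abbr), full)
                for abbr, full in zip(calendar.day_abbr, calendar.day_name)]
    # month_abbr / month_name have an empty placeholder at index 0; drop it.
    months = [(clean(abbr), full)
              for abbr, full in zip(calendar.month_abbr, calendar.month_name)][1:]

    class LocaleParserInfo(parser.parserinfo):
        WEEKDAYS = weekdays
        MONTHS = months

    return LocaleParserInfo()


# Without a prior locale.setlocale() call, Python keeps the "C" locale for
# LC_TIME, so the names picked up here are the English ones.
result = parser.parse("Tue, 01 Oct 2013 14:26:00 -0300",
                      parserinfo=make_locale_parserinfo())
print(result)
```

After calling locale.setlocale(locale.LC_TIME, ...) with a locale installed on your system, call make_locale_parserinfo() again to pick up the new names.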
D
0

Multiple languages - allow English and German month names

This is a way to support multiple languages at once. I know there are other possibilities using calendar.day_abbr and calendar.day_name, but this was the most convenient one for me. Just combine all the month names and list them together; dateutil.parser will then accept any of them.

from dateutil import parser as dateparser

class LocaleParserInfo(dateparser.parserinfo):
    MONTHS = [('Jan', 'Januar', 'January', 'Jänner'),
              ('Feb', 'Februar', 'February'),
              ('Mrz', 'März', 'March', 'Mar'),
              ('Apr', 'April'),
              ('Mai', 'May'),
              ('Jun', 'Juni', 'June'),
              ('Jul', 'Juli', 'July'),
              ('Aug', 'August'),
              ('Sep', 'September'),
              ('Okt', 'Oktober', 'October', 'Oct'),
              ('Nov', 'November'),
              ('Dez', 'Dezember', 'Dec', 'December')]

parsed_date = dateparser.parse("31 Jänner 2022", dayfirst=True, parserinfo=LocaleParserInfo())
parsed_date = dateparser.parse("31.December 2022", dayfirst=True, parserinfo=LocaleParserInfo())
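The class above only overrides MONTHS, so a German weekday such as "Montag" from the original question would still be rejected. One possible variation, sketched here under my own assumptions, is to append the German names to dateutil's built-in English tuples instead of retyping both languages. `GERMAN_WEEKDAYS`, `GERMAN_MONTHS` and `EnglishGermanParserInfo` are hypothetical names introduced for this illustration.

```python
from dateutil import parser

# German spellings in the same order as dateutil's defaults
# (weekdays start with Monday, months with January).
GERMAN_WEEKDAYS = [("Mo", "Montag"), ("Di", "Dienstag"), ("Mi", "Mittwoch"),
                   ("Do", "Donnerstag"), ("Fr", "Freitag"), ("Sa", "Samstag"),
                   ("So", "Sonntag")]
GERMAN_MONTHS = [("Januar", "Jänner"), ("Februar",), ("Mrz", "März"),
                 ("April",), ("Mai",), ("Juni",), ("Juli",), ("August",),
                 ("September",), ("Okt", "Oktober"), ("November",),
                 ("Dez", "Dezember")]


class EnglishGermanParserInfo(parser.parserinfo):
    # Concatenate each default English tuple with its German counterpart,
    # so either spelling maps to the same weekday or month index.
    WEEKDAYS = [en + de for en, de in
                zip(parser.parserinfo.WEEKDAYS, GERMAN_WEEKDAYS)]
    MONTHS = [en + de for en, de in
              zip(parser.parserinfo.MONTHS, GERMAN_MONTHS)]


info = EnglishGermanParserInfo()
print(parser.parse("Montag, 11. April 2016 19:35:57", parserinfo=info))
print(parser.parse("Monday, 11 April 2016 19:35:57", parserinfo=info))
print(parser.parse("31 Jänner 2022", dayfirst=True, parserinfo=info))
```

This keeps the English defaults in sync with whatever dateutil version is installed, at the cost of relying on the two lists being in the same order.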
Dianthe answered 9/9, 2022 at 14:42 Comment(0)
