I am using datetime in some Python UDFs that I use in my Pig script. So far so good. I use Pig 0.12.0 on Cloudera 5.5.
However, I also need to use the pytz or dateutil packages, and they don't seem to be part of a vanilla Python install.
Can I use them in my Pig UDFs in some way? If so, how? I think dateutil is installed on my nodes (I am not an admin, so how can I actually check that this is the case?), but when I type:
import sys
#I append the path to dateutil on my local windows machine. Is that correct?
sys.path.append('C:/Users/me/AppData/Local/Continuum/Anaconda2/lib/site-packages')
from dateutil import tz
in my udfs.py script, I get:
2016-08-30 09:56:06,572 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1121: Python Error. Traceback (most recent call last):
File "udfs.py", line 23, in <module>
from dateutil import tz
ImportError: No module named dateutil
when I run my pig script.
All my other Python UDFs (using datetime, for instance) work just fine. Any idea how to fix that?
Many thanks!
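One way to check what is actually importable without admin access is to ask the interpreter that runs the UDFs. The sketch below is only an illustration: the function name describe_environment is hypothetical, and it assumes Pig executes the UDF file under its bundled Jython rather than under the Python installed on the nodes or on a local Windows machine. It reports the platform, the search path, and whether each named module imports.

```python
import sys


def describe_environment(module_names):
    """Report which interpreter is running and whether each module imports.

    Hypothetical helper for debugging; not part of Pig or of any library.
    """
    report = {
        "platform": sys.platform,           # starts with 'java' under Jython
        "version": sys.version.split()[0],  # e.g. '2.5.3' or '2.7.11'
        "path": list(sys.path),             # directories searched on import
        "modules": {},
    }
    for name in module_names:
        try:
            module = __import__(name)
            location = getattr(module, "__file__", None)
            report["modules"][name] = str(location or "(built-in)")
        except Exception:
            # sys.exc_info() keeps this compatible with old and new Pythons.
            exc_type, exc_value = sys.exc_info()[:2]
            report["modules"][name] = "FAILED: %s: %s" % (
                exc_type.__name__, exc_value)
    return report


if __name__ == "__main__":
    # Write to stderr so the report does not mix with regular output.
    names = ["datetime", "dateutil", "pytz", "six"]
    sys.stderr.write(repr(describe_environment(names)) + "\n")
```

If the reported platform starts with java, paths from a local Windows Anaconda install will not help; only directories that exist on the machine running Pig can be appended to sys.path.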
UPDATE
After playing a bit with the Python path, I am now able to
import dateutil
(at least Pig does not crash). But if I try:
from dateutil import tz
I get an error.
from dateutil import tz
File "/opt/python/lib/python2.7/site-packages/dateutil/tz.py", line 16, in <module>
from six import string_types, PY3
File "/opt/python/lib/python2.7/site-packages/six.py", line 604, in <module>
viewkeys = operator.methodcaller("viewkeys")
AttributeError: type object 'org.python.modules.operator' has no attribute 'methodcaller'
How can I overcome that? I use tz in the following manner:
to_zone = dateutil.tz.gettz('US/Eastern')
from_zone = dateutil.tz.gettz('UTC')
and then I change the timezone of my timestamps. Can I just import dateutil to do that? What is the proper syntax?
UPDATE 2
Following yakuza's suggestion, I am able to
import sys
sys.path.append('/opt/python/lib/python2.7/site-packages')
sys.path.append('/opt/python/lib/python2.7/site-packages/pytz/zoneinfo')
import pytz
but now I get an error again:
Caused by: Traceback (most recent call last):
File "udfs.py", line 158, in to_date_local
File "__pyclasspath__/pytz/__init__.py", line 180, in timezone
pytz.exceptions.UnknownTimeZoneError: 'America/New_York'
when I define
to_zone = pytz.timezone('America/New_York')
from_zone = pytz.timezone('UTC')
I found some hints in the question "UnknownTimezoneError Exception Raised with Python Application Compiled with Py2Exe".
What to do? Awww, I just want to convert timezones in Pig :(
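If neither pytz nor dateutil can be made to load under Pig's interpreter, one possible workaround is to do the UTC to US Eastern conversion with the standard library only. This is a swapped-in technique, not what either library does internally. It is a minimal sketch under stated assumptions: the input is a naive datetime in UTC, only the US daylight saving rules in force since 2007 are applied (second Sunday of March to first Sunday of November), and the names first_sunday_on_or_after and utc_to_us_eastern are hypothetical.

```python
from datetime import datetime, timedelta


def first_sunday_on_or_after(dt):
    """Return dt moved forward to the next Sunday (or dt itself if Sunday)."""
    days_to_go = 6 - dt.weekday()  # weekday(): Monday is 0, Sunday is 6
    return dt + timedelta(days=days_to_go)


def utc_to_us_eastern(dt_utc):
    """Convert a naive UTC datetime to naive US Eastern wall-clock time.

    Assumes the post-2007 US rules; earlier years would need other dates.
    """
    year = dt_utc.year
    # DST starts on the second Sunday of March at 02:00 EST, i.e. 07:00 UTC.
    dst_start_utc = (first_sunday_on_or_after(datetime(year, 3, 8))
                     + timedelta(hours=7))
    # DST ends on the first Sunday of November at 02:00 EDT, i.e. 06:00 UTC.
    dst_end_utc = (first_sunday_on_or_after(datetime(year, 11, 1))
                   + timedelta(hours=6))
    if dst_start_utc <= dt_utc < dst_end_utc:
        return dt_utc - timedelta(hours=4)  # EDT is UTC-4
    return dt_utc - timedelta(hours=5)      # EST is UTC-5
```

For example, utc_to_us_eastern(datetime(2016, 8, 30, 13, 56, 6)) gives datetime(2016, 8, 30, 9, 56, 6). The trade-off is that the rules are hard-coded, so this only suits data from 2007 onwards and a single target zone.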
all_timezones_set. From the source code it seems that this exception is thrown either if the timezone name is not composed of ASCII characters or if it is not in the list of known timezones. Verify that your installation is not corrupted and that this entry is actually located in the pytz/__init__.py file. – Darnelldarner
US/Eastern. That should work, right? – Filippa
pytz - so yes, that should work. – Darnelldarner