Bottom line up front: I had the same error, and this was the solution for me:
!pip install pyarrow==0.13.0
I'm not sure this is limited to Windows 10; I have been getting the same error in AWS SageMaker for the last few days. The same code was working fine before, on a previous SageMaker instance.
Using the Conda Packages menu in Jupyter, the conda_python3 kernel showed it had pyarrow 0.13.0 installed from https://repo.anaconda.com/pkgs/main/linux-64, build py36he6710b0_0.
However, a subsequent call to
!conda list
did not show pyarrow as being in the Jupyter conda_python3 kernel, even after restarting the kernel.
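Since the Conda Packages menu and conda list disagreed, it can help to ask the running kernel directly what it sees. This is a minimal sketch, not something from my original session: installed_version is a hypothetical helper name, and it assumes either Python 3.8+ (importlib.metadata) or an environment where pkg_resources from setuptools is available.

```python
import sys


def installed_version(package_name):
    """Return the installed version of a package as seen by this interpreter, or None."""
    try:
        from importlib import metadata  # standard library on Python 3.8+
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(package_name)
        except metadata.PackageNotFoundError:
            return None

    # Fallback for older interpreters (assumes setuptools is installed).
    import pkg_resources
    try:
        return pkg_resources.get_distribution(package_name).version
    except pkg_resources.DistributionNotFound:
        return None


# Shows which Python the kernel is running and which pyarrow (if any) it can see.
print(sys.executable)
print(installed_version("pyarrow"))
```

If this prints None while the package menu claims pyarrow is installed, the package most likely went into a different environment than the one the kernel uses.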
Normally in a SageMaker Jupyter notebook instance I use !pip commands, because they seem to work better and don't hit the timeout errors I sometimes get with the Conda Packages menu. (I also don't need to worry about passing -y flags; the installs just happen.)
Normally
!pip install pyarrow
was working, but I noticed it was now installing pyarrow 0.15.1 from Nov 1, 2019.
Perhaps there is an error in that version with loading the _orc package, or some other conflicting library.
My intuition is that something is wrong both with the conda build of pyarrow 0.13.0 and with pyarrow 0.15.1.
In a Jupyter cell I tried this:
!pip uninstall pyarrow -y
!pip install pyarrow
from pyarrow import orc
Output:
Uninstalling pyarrow-0.15.1:
Successfully uninstalled pyarrow-0.15.1
Collecting pyarrow
Downloading https://files.pythonhosted.org/packages/6c/32/ce1926f05679ea5448fd3b98fbd9419d8c7a65f87d1a12ee5fb9577e3a8e/pyarrow-0.15.1-cp36-cp36m-manylinux2010_x86_64.whl (59.2MB)
|████████████████████████████████| 59.2MB 381kB/s eta 0:00:01
Requirement already satisfied: numpy>=1.14 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from pyarrow) (1.14.3)
Requirement already satisfied: six>=1.0.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from pyarrow) (1.11.0)
Installing collected packages: pyarrow
Successfully installed pyarrow-0.15.1
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-6-36378dee5a25> in <module>()
1 get_ipython().system('pip uninstall pyarrow -y')
2 get_ipython().system('pip install pyarrow')
----> 3 from pyarrow import orc
~/anaconda3/envs/python3/lib/python3.6/site-packages/pyarrow/orc.py in <module>()
23 from pyarrow import types
24 from pyarrow.lib import Schema
---> 25 import pyarrow._orc as _orc
26
27
ModuleNotFoundError: No module named 'pyarrow._orc'
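The traceback shows that pyarrow itself imports fine and only the compiled pyarrow._orc submodule is missing. A quick way to check for that without triggering the full import error is sketched below. The has_module helper is a hypothetical name I am introducing for illustration; it relies only on importlib.util.find_spec from the standard library.

```python
import importlib.util


def has_module(dotted_name):
    """Return True if the named module can be located by this interpreter."""
    try:
        # For a dotted name, find_spec imports the parent package first,
        # so a missing parent raises ModuleNotFoundError (an ImportError).
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


# True only if the installed pyarrow build ships the ORC extension.
print(has_module("pyarrow._orc"))
```

A result of False with pyarrow installed would match the error above: the package is present but this particular build lacks the ORC extension.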
Note that when you uninstall pyarrow 0.15.1 in order to install a specific older version, like 0.13.0, you should restart the kernel after uninstalling. There are some incompatible binaries that get left behind.
I did not post that output because it was so long.
!pip uninstall pyarrow -y
Restart the kernel, then:
!pip install pyarrow==0.13.0
from pyarrow import orc
Output:
Collecting pyarrow==0.13.0
Using cached https://files.pythonhosted.org/packages/ad/25/094b122d828d24b58202712a74e661e36cd551ca62d331e388ff68bae91d/pyarrow-0.13.0-cp36-cp36m-manylinux1_x86_64.whl
Requirement already satisfied: numpy>=1.14 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from pyarrow==0.13.0) (1.14.3)
Requirement already satisfied: six>=1.0.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from pyarrow==0.13.0) (1.11.0)
Installing collected packages: pyarrow
Successfully installed pyarrow-0.13.0
There is now no error from the import command, and orc files can be read again.
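If !pip ever installs into a different environment than the one the kernel runs in, one option is to call pip through the kernel's own interpreter. The following is only a sketch under that assumption; pip_command and run_pip are hypothetical helper names, and I did not need this in my own session.

```python
import subprocess
import sys


def pip_command(action, package, version=None):
    """Build a pip command that targets the interpreter running this kernel."""
    requirement = package if version is None else "{}=={}".format(package, version)
    command = [sys.executable, "-m", "pip", action]
    if action == "uninstall":
        command.append("-y")  # skip the confirmation prompt
    command.append(requirement)
    return command


def run_pip(action, package, version=None):
    """Run the pip command and raise CalledProcessError if pip fails."""
    return subprocess.run(pip_command(action, package, version), check=True)


# Only prints the command; call run_pip(...) to actually execute it.
print(pip_command("install", "pyarrow", "0.13.0"))
```

The same restart rule applies here: run_pip("uninstall", "pyarrow"), restart the kernel, then run_pip("install", "pyarrow", "0.13.0").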