pyhive connection error: thrift.transport.TTransport.TTransportException: TSocket read 0 bytes

I'm trying to read a table stored in Hive (Hortonworks) that holds Twitter data I collected for a machine learning project. I'm using PyHive because pyhs2 does not support Python 3.6.

Here's my code:

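import sys

import pandas as pd
from pyhive import hive

# Open a connection to HiveServer2 without SASL authentication
conn = hive.Connection(host='192.168.1.11', port=10000, auth='NOSASL')

# Load the whole table into a DataFrame and inspect it
df = pd.read_sql("SELECT * FROM my_table", conn)
print(sys.getsizeof(df))
df.head()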

Got this error:

Traceback (most recent call last):
File "C:\Users\PWST112\Desktop\import.py", line 44, in <module>
conn = hive.Connection(host='192.168.1.11', port=10000, auth='NOSASL')
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\pyhive\hive.py", line 164, in __init__
response = self._client.OpenSession(open_session_req)
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\TCLIService\TCLIService.py", line 187, in OpenSession
return self.recv_OpenSession()
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\TCLIService\TCLIService.py", line 199, in recv_OpenSession
(fname, mtype, rseqid) = iprot.readMessageBegin()
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\protocol\TBinaryProtocol.py", line 148, in readMessageBegin
name = self.trans.readAll(sz)
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\transport\TTransport.py", line 60, in readAll
chunk = self.read(sz - have)
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\transport\TTransport.py", line 161, in read
self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\transport\TSocket.py", line 132, in read
message='TSocket read 0 bytes')
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
[Finished in 0.3s]

Here's the pip list:

beautifulsoup4 (4.6.0)
bleach (2.0.0)
colorama (0.3.9)
cycler (0.10.0)
decorator (4.0.11)
entrypoints (0.2.3)
ez-setup (0.9)
future (0.16.0)
html5lib (0.999999999)
impala (0.2)
ipykernel (4.6.1)
ipython (6.1.0)
ipython-genutils (0.2.0)
ipywidgets (6.0.0)
jedi (0.10.2)
Jinja2 (2.9.6)
jsonschema (2.6.0)
jupyter (1.0.0)
jupyter-client (5.1.0)
jupyter-console (5.1.0)
jupyter-core (4.3.0)
konlpy (0.4.4)
MarkupSafe (1.0)
matplotlib (2.0.2)
mistune (0.7.4)
nbconvert (5.2.1)
nbformat (4.3.0)
nltk (3.2.4)
notebook (5.0.0)
numpy (1.13.1+mkl)
pandas (0.20.3)
pandocfilters (1.4.1)
pickleshare (0.7.4)
pip (9.0.1)
prompt-toolkit (1.0.14)
pure-sasl (0.4.0)
Pygments (2.2.0)
PyHive (0.5.0)
pyhs2 (0.6.0)
pyparsing (2.2.0)
python-dateutil (2.6.0)
pytz (2017.2)
pyzmq (16.0.2)
qtconsole (4.3.0)
sasl (0.2.1)
scikit-learn (0.18.2)
scipy (0.19.1)
setuptools (28.8.0)
simplegeneric (0.8.1)
six (1.10.0)
testpath (0.3.1)
thrift (0.10.0)
thrift-sasl (0.3.0)
tornado (4.5.1)
traitlets (4.3.2)
wcwidth (0.1.7)
webencodings (0.5.1)
wheel (0.30.0)
widgetsnbextension (2.0.0)

Can anyone please help? I'm using Windows 10.

Many thanks in advance.

Walkout answered 16/11, 2017 at 16:28 Comment(0)

I'm not sure about the Hortonworks tools, but Cloudera connections in general seem to have issues with Thrift and SASL.

I was able to get a SQLAlchemy connection (which uses Thrift) pushing and pulling data with help from an issue over at Cloudera's Impyla module. That isn't PyHive, but the Thrift TSocket connection seems to be what is causing the error in your code too. You can try pinning the module versions; the downside is that this requires Python 2.7.

If you want to test version locking, here's what got me to a working Thrift connection:

pip install thrift==0.9.3
pip install thrift_sasl==0.2.1
pip uninstall sasl && pip install pure-sasl
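
Before pinning anything, it can help to confirm which versions are actually installed in the interpreter that runs the script. The following is a minimal sketch, assuming Python 3.8 or later (importlib.metadata is newer than the Python 3.6 used in the question). The package names are taken from the pip list in the question, and installed_versions is a hypothetical helper name, not part of any of these libraries:

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


# Package names copied from the pip list in the question (an assumption
# about which ones matter for the Thrift/SASL stack).
PACKAGES = ["PyHive", "thrift", "thrift-sasl", "sasl", "pure-sasl"]

for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Running it before and after the pip commands above shows whether the pins took effect in that environment.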

Hope this helps!

Hood answered 18/4, 2018 at 21:19 Comment(0)

For anyone encountering the same problem, this is what works for me: don't forget to start HiveServer2 and leave that session open.

Run command: hive --service hiveserver2
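
To check from the client machine that something is listening on the HiveServer2 port, a plain TCP probe may be enough. This is only a sketch using the standard library; port_is_open is a hypothetical helper name. A successful probe shows only that the port accepts connections, not that the authentication settings match:

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

For example, port_is_open('localhost', 10000) should return True once hiveserver2 is running on the default port used in this thread.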

Sample code for connection:

from pyhive import hive
cursor = hive.connect('localhost', port=10000, username="hive").cursor()  # trino.connect can be used the same way for Trino
cursor.execute('SELECT * FROM default.employee LIMIT 10')
print(cursor.fetchall())  # fetch and show the returned rows
Rego answered 13/7, 2023 at 12:10 Comment(0)
