pyspark.sql.utils.AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.;'

This question has a different answer from those given in the post linked below.

I am getting an error that reads

pyspark.sql.utils.AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.;'

when I try to read a Parquet file like this, using Spark 2.1.0:

data = spark.read.parquet('/myhdfs/location/')

I have checked that the file/table is not empty by looking at the Impala table through the Hue web portal. Also, other files that I have stored in similar directories read absolutely fine. For the record, the file names contain hyphens but no underscores or full stops/periods.

Hence, none of the answers in the following post apply: "Unable to infer schema when loading Parquet file".

Any ideas?

Proust answered 2/11, 2018 at 16:54 Comment(5)
Have you checked the answers on this post first: #44955392 - Maureenmaureene
Possible duplicate of "Unable to infer schema when loading Parquet file" - Electrical
Yep. I've read that and none of the answers apply. - Proust
Try reading an individual Parquet file by providing its full path and report the outcome. - Flowering
Aha! It turns out there was another level in the directory structure! - Proust

It turns out I was getting this error because there was another level in the directory structure. This is what I needed:

data = spark.read.parquet('/myhdfs/location/anotherlevel/')
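
One way to spot an extra directory level like this is to list which directories actually contain Parquet files before calling `spark.read.parquet`. The following is a minimal sketch, not part of the original answer: it assumes the data is reachable through the local filesystem (on HDFS the equivalent check would be something like `hdfs dfs -ls -R /myhdfs/location/`), the helper name `find_parquet_dirs` is hypothetical, and the `.parquet`/`.parq` suffixes are an assumption about how the files were named.

```python
import os


def find_parquet_dirs(root, suffixes=(".parquet", ".parq")):
    """Return the sorted directories under root that directly hold Parquet files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Skip names starting with "_" or ".", which Spark is understood to
        # treat as hidden or metadata files (an assumption worth verifying).
        visible = [f for f in filenames if not f.startswith(("_", "."))]
        if any(f.endswith(suffixes) for f in visible):
            hits.append(dirpath)
    return sorted(hits)
```

If the path passed to `spark.read.parquet` is not in the returned list but one of its subdirectories is, the situation is probably the same as the one described here.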
Proust answered 6/11, 2018 at 11:21 Comment(0)

I got the same problem but none of the answers I found online worked for me. It turns out that I was writing the code in this way:

data = spark.read.parquet("/myhdfs/location/anotherlevel/")

That is, I was using double quotes ("). When I switched to single quotes ('), my problem was solved:

data = spark.read.parquet('/myhdfs/location/anotherlevel/')

Sharing in case it helps anybody
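
For what it is worth, Python treats single-quoted and double-quoted string literals identically, so the quoting itself is unlikely to have been the cause; something else, such as the path, probably changed at the same time. A quick check, using the path from this answer:

```python
double_quoted = "/myhdfs/location/anotherlevel/"
single_quoted = '/myhdfs/location/anotherlevel/'

# Both literals produce the same str value, so Spark receives the same path.
paths_match = double_quoted == single_quoted
print(paths_match)
```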

Clarkclarke answered 25/3, 2022 at 16:2 Comment(1)
This does not really answer the question. If you have a different question, you can ask it by clicking Ask Question. To get notified when this question gets new answers, you can follow this question. Once you have enough reputation, you can also add a bounty to draw more attention to this question. - From ReviewLoser

For me, it worked when I specified the properties manually, as shown below:

data = spark.read.parquet("/myhdfs/location/anotherlevel/").select("Property1", "Property2", "Property3")
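
Note that `select` picks columns after the read, whereas the error message asks for a schema to be supplied to the reader itself. A minimal sketch of that approach follows, assuming Spark 2.3 or later (where, as far as I know, `DataFrameReader.schema` accepts a DDL-formatted string; older versions such as 2.1.0 would need a `StructType` instead). The helper name `read_parquet_with_schema` is hypothetical, `spark` is an existing SparkSession, and the column types are placeholders.

```python
def read_parquet_with_schema(spark, path, ddl_schema):
    """Read Parquet from path using an explicit schema instead of inference."""
    # ddl_schema is a DDL string such as "Property1 STRING, Property2 STRING";
    # the column names come from the answer above, the types are placeholders.
    return spark.read.schema(ddl_schema).parquet(path)
```

If the path holds no Parquet files at all, an explicit schema may simply yield an empty DataFrame rather than the expected data, so checking the directory layout first is probably still worthwhile.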

Spermic answered 5/9 at 16:28 Comment(1)
Please use code formatting to improve clarity. - Ailey
