AWS Glue unable to access input data set
I have a dataset registered in Glue / Athena, call it my_db.table. I'm able to query it via Athena and everything generally seems to be in order.

I'm trying to use this table in a Glue job, but am getting the following fairly opaque error message:

py4j.protocol.Py4JJavaError: An error occurred while calling o54.getCatalogSource.
: java.lang.Error: No classification or connection in my_db.table

This would appear to indicate that Glue can't see the catalog entry for my table, or can't use the information in that entry, but I don't have any further visibility than that.

Has anyone run into this error, and what might be causing it?

Elimination answered 7/9, 2017 at 21:59 Comment(1)
Have you tried creating the table with a Glue Crawler? In my project, only the tables created by a crawler work properly with Jobs. The tables generated by the Glue Crawler are also readable by Athena. - Maryleemarylin

The error message actually describes the problem well - there was no classification for the table being queried.

Tables created via Glue are registered with a Classification - csv, parquet, orc, avro, json. See Creating Tables Using Athena for AWS Glue Jobs.

The table I created 'manually' via Athena did not have a classification, which was visible on the Glue 'tables' page.

The solution is easy: append a classification property to the end of the CREATE TABLE script, like so:

CREATE EXTERNAL TABLE IF NOT EXISTS my_db.my_table (
  `id` int,
  `description` string 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ',',
  'collection.delim' = 'undefined',
  'mapkey.delim' = 'undefined'
) LOCATION 's3://my_bucket/'
TBLPROPERTIES ('classification'='csv');

Now the table has a classification within the Glue interface and is accessible via a Glue job.
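To confirm that the property actually landed in the catalog, you can read the table definition back and look at its Parameters. The following is a minimal sketch, not a definitive implementation: `get_classification` and `fetch_classification` are hypothetical helper names, and it assumes a boto3 Glue client (or anything exposing the same `get_table` call) is passed in.

```python
from typing import Any, Mapping, Optional


def get_classification(table: Mapping[str, Any]) -> Optional[str]:
    """Return the 'classification' table parameter, or None if missing or empty.

    `table` is expected to look like the "Table" dict returned by the
    Glue GetTable API.
    """
    params = table.get("Parameters") or {}
    return params.get("classification") or None


def fetch_classification(glue_client: Any, database: str, name: str) -> Optional[str]:
    """Look the table up in the Glue Data Catalog and return its classification.

    `glue_client` is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    table = glue_client.get_table(DatabaseName=database, Name=name)["Table"]
    return get_classification(table)
```

With real credentials this would be called as `fetch_classification(boto3.client("glue"), "my_db", "my_table")`. A result of None suggests the table has no classification set.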

Elimination answered 8/9, 2017 at 17:16 Comment(4)
This did not work for me - adding the 'classification' tblproperty had no effect on whether or not I could read the table via glueContext.create_dynamic_frame.from_catalog. Only tables created by the Glue crawler seemed to work. I even tried making all the properties of my Athena table match exactly those of the Glue crawler table and it still wouldn't work. Still trying to find a solution... - Olson
Interesting - the above didn't work for a CSV table but it did work for a parquet table. Hmm... - Olson
It does not seem to work for Views either, fyi -- I'm hoping to pull data from a view in an ETL job, if anyone has managed to do that. - Concentre
I’m having this problem too using an Athena view (simple union of two tables). Both tables have the ‘classification = csv’ table property. - Kinna

You need to add the classification to the table you've created. To add it via the UI, follow these steps:

  1. Go to the table in Glue.

  2. Click on Edit Table and add the classification property.
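The same edit can probably be scripted instead of done by hand. The sketch below is one way to do it under stated assumptions: `with_classification` and `set_classification` are hypothetical helpers, the client is assumed to be a boto3 Glue client, and the list of copied keys is a subset of the fields UpdateTable accepts in its TableInput (read-only fields returned by GetTable, such as CreateTime, have to be left out).

```python
from typing import Any, Dict, Mapping

# Fields copied from the GetTable response into the UpdateTable input.
# Assumption: this subset is enough for a plain external table.
_TABLE_INPUT_KEYS = (
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "ViewOriginalText", "ViewExpandedText",
    "TableType", "Parameters",
)


def with_classification(table: Mapping[str, Any], classification: str) -> Dict[str, Any]:
    """Build a TableInput dict from a GetTable "Table" dict, setting the classification."""
    table_input = {key: table[key] for key in _TABLE_INPUT_KEYS if key in table}
    # Copy the parameters so the caller's dict is not modified.
    params = dict(table_input.get("Parameters") or {})
    params["classification"] = classification
    table_input["Parameters"] = params
    return table_input


def set_classification(glue_client: Any, database: str, name: str, classification: str) -> None:
    """Read the table, add the classification parameter, and write it back."""
    table = glue_client.get_table(DatabaseName=database, Name=name)["Table"]
    glue_client.update_table(
        DatabaseName=database,
        TableInput=with_classification(table, classification),
    )
```

With real credentials this would be called as `set_classification(boto3.client("glue"), "my_db", "my_table", "csv")`. Keeping the dict-building step separate from the API calls makes it possible to inspect the TableInput before anything is written.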
Jaffe answered 8/1, 2022 at 16:19 Comment(0)
