SerDe properties list for AWS Athena (JSON)
Asked Answered
T

2

7

I'm testing the Athena product of AWS, so far is working very good. But I want to know the list of SerDe properties. I've searched far and wide and couldn't find it. I'm using this one for example "ignore.malformed.json" = "true", but I'm pretty sure there are a ton of other options to tune the queries.

I couldn't find info for example, on what the "path" property does, so having the full list will be amazing.

I have looked at Apache Hive docs but couldn't find this, and neither on AWS docs/forums.

Thanks!

Tandy answered 22/5, 2017 at 17:18 Comment(0)
W
6

It seems you are using the Openx-JsonSerDe
http://docs.aws.amazon.com/athena/latest/ug/json.html

// properties used in configuration
public static final String PROP_IGNORE_MALFORMED_JSON = "ignore.malformed.json";
public static final String PROP_DOTS_IN_KEYS = "dots.in.keys";
public static final String PROP_CASE_INSENSITIVE ="case.insensitive" ;

https://github.com/rcongiu/Hive-JSON-Serde/blob/master/json-serde/src/main/java/org/openx/data/jsonserde/JsonSerDe.java

Wynnie answered 23/5, 2017 at 7:44 Comment(6)
So there are only three properties? What about the "path" property that I saw in the Amazon docs, where is that?Tandy
Show me the linkTeeters
Will be this one: docs.aws.amazon.com/athena/latest/ug/json.htmlTandy
Seems to me like a dummy properties that have no effect.Teeters
As of 9/2/18, the case insensitive property does not work as AWS does not have the version of Openx-JsonSerDe that implements this property. I can confirm that dots.in.keys works. Additionally, a missing feature not mentioned here is the ability to remap fields with name violations such as foo-bar by adding serde property mapping.foo_bar = foo-bar does work. The schema should then specify foo_bar as the column name and you should delete / re-add all partitions to pull the schema update.Buttery
I couldn't work with caseInsensitive nor remapping in Athena + Glue... I agree with @DavidדודוMarkovitz that they seem dummy properties. Nevertheless, they seem to be in the AWS documentation... anybody could move on since 2018?Sarcastic
S
4

As stated in release notes (see bullet #2 please), the JSON OpenX SerDe used in Athena has been improved. The improvements include, but are not limited to, the following:

  • Support for the ConvertDotsInJsonKeysToUnderscores property. When set to TRUE, it allows the SerDe to replace the dots in key names with underscores. For example, if the JSON dataset contains a key with the name "a.b", you can use this property to define the column name to be "a_b" in Athena. The default is FALSE. By default, Athena does not allow dots in column names.
  • Support for the case.insensitive property. By default, Athena requires that all keys in your JSON dataset use lowercase. Using WITH SERDE PROPERTIES ("case.insensitive"= FALSE;) allows you to use case-sensitive key names in your data. The default is TRUE. When set to TRUE, the SerDe converts all uppercase columns to lowercase.

For more information, see OpenX JSON SerDe in the Amazon Athena User Guide.

Simian answered 8/3, 2019 at 23:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.