Pig: loading a data file using an external schema file

I have a data file and a corresponding schema file stored in separate locations. I would like to load the data using the schema in the schema file. I tried using:

A= LOAD '<file path>' USING PigStorage('\u0001') as '<schema-file path>' 

but I get an error.

What is the syntax for correctly loading the file?

The schema file format is something like:

data1 - complex - - - - format - -
data1 event_type - - - - - long - "ends '\001'"
data1 event_id - - - - - varchar(50) - "ends '\001'"
data1 name_format - - - - - varchar(10) - "ends newline"
Gallinule answered 24/11, 2013 at 10:6

The AS clause is for specifying the schema directly, not the path to a schema file. The schema goes in parentheses, not in quotes:

 A = LOAD '<file path>' USING PigStorage('\u0001') AS (type:long, id:chararray, nameformat:chararray);

Alternatively, a file named .pig_schema containing the schema and located in your input directory could work as well. Never tried that, though. It must be a JSON file with the following syntax:

{"fields":[
        {"name":"type","type":15,"description":"Fu","schema":null},
        {"name":"id","type":55,"description":"Bar","schema":null},
        {"name":"nameformat","type":55,"description":"Xu","schema":null}
    ] ,"version":0,"sortKeys":[],"sortKeyOrders":[]}

This file is also generated if you specify the '-schema' option when storing with PigStorage. The numeric "type" values are Pig's internal type codes (15 = long, 55 = chararray).

Brassbound answered 24/11, 2013 at 14:50

It's possible to load data with a schema file.

When you store your data with the '-schema' flag, PigStorage writes a hidden .pig_schema file to the output path that holds the schema as JSON.

You can then use it when loading the data:

B = LOAD '<>' USING PigStorage(',','-schema'); 

You can see the schema by running:

describe B;

Check this good post for more details.

This feature is available beginning with Pig 0.10.
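A minimal round-trip sketch of this workflow, applied to the question's Ctrl-A-delimited data. The paths '/data/events_raw' and '/data/events_pig' are hypothetical placeholders, and the field names come from the question's schema file (varchar mapped to chararray). It is written as two separate scripts because the LOAD in the second part should look for .pig_schema when the script is compiled, so the file needs to exist before that script runs.

```pig
-- Script 1 (run once): load with an explicit schema, then re-store with '-schema'.
-- '/data/events_raw' and '/data/events_pig' are hypothetical paths.
raw = LOAD '/data/events_raw' USING PigStorage('\u0001')
      AS (event_type:long, event_id:chararray, name_format:chararray);
-- '-schema' makes PigStorage write a hidden .pig_schema JSON file next to the data.
STORE raw INTO '/data/events_pig' USING PigStorage('\u0001', '-schema');

-- Script 2 (a separate, later run): no AS clause needed;
-- field names and types are read from /data/events_pig/.pig_schema.
events = LOAD '/data/events_pig' USING PigStorage('\u0001', '-schema');
DESCRIBE events;
```

If you ever need to ignore a stored schema, PigStorage also accepts '-noschema' when loading.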

Stranger answered 30/3, 2014 at 11:57