Hive : How to explode a JSON column with an array, and embedded in a CSV file?
Asked Answered
C

1

6

From a CSV file (with a header and a pipe delimiter) I've got the following content which contains a JSON column (with a collection inside), like this:

ProductId|IngestTime|ProductOrders
9180|20171025145034|[{"OrderId":"299","Location":"NY"},{"OrderId":"499","Location":"LA"}]
8251|20171026114034|[{"OrderId":"1799","Location":"London"}]

What I need is to create a SELECT Hive query which returns:

ProductId  IngestTime      OrderId        OrderLocation
9180       20171025145034  299            NY
9180       20171025145034  499            LA
8251       20171026114034  1799           London

So far, I tried many combinations by using 'explode', 'get_json_object' and so on, but I still haven't found the right SQL query.

Have you got a solution ?

Thanks a lot for your help :-)

Copalite answered 25/10, 2017 at 21:35 Comment(0)
S
1

I was having similar kind of requirement. The solution from this link helped me solve it. BTW, below is the query for your requirement assuming all the columns in your DB_TABLE are of type 'String'.

    SELECT ProductId,
       IngestTime,
       split(split(results,",")[0],':')[1] AS OrderId,
       regexp_replace(split(split(results,",")[1],':')[1], "[\\]|}]", "") AS OrderLocation
    FROM
       (SELECT ProductId,
             IngestTime,
             split(translate(ProductOrders, '"\\[|]|\""',''), "},") AS r
       FROM DB_TABLE) t1 LATERAL VIEW explode(r) rr AS results
Steinberg answered 29/5, 2018 at 10:39 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.