I have the following directory structure in HDFS:
logs_folder
|---2021-03-01
|   |---log1
|   |---log2
|   |---log3
|---2021-03-02
|   |---log1
|   |---log2
|---2021-03-03
|   |---log1
|   |---log2
...
Logs are made up of text data. There is no date in the data because it is already in the folder name. I want to read all the logs and save them in the following format:
date id
where id is a field from the log and the date has to come from the folder name. Expected output:
2021-03-01 id1
2021-03-01 id2
...
2021-03-02 id234
2021-03-02 id456
...
How can I add the date from the folder name to the output?
I found a closely related question about adding the full path name to the data while reading:
A = LOAD '/logs_folder/*' using PigStorage(',','-tagPath');
DUMP A ;
How can I incorporate the current input filename into my Pig Latin script?
That is very close, but how can I get only the parent folder name instead of the full path?
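One direction that might work, as an untested sketch: keep '-tagPath' and cut the parent folder out of the path with the built-in REGEX_EXTRACT. Everything specific here is an assumption: that id is the first column of each log line (so $1 once the path is prepended as $0), that the logs are comma-delimited as in the snippet above, and that '/logs_output' is just a placeholder output location.

```pig
-- '-tagPath' prepends the full input path as field $0
A = LOAD '/logs_folder/*' USING PigStorage(',', '-tagPath');

-- Capture the last directory component of the path (the date folder).
-- Assumption: id is the first column of the log line, i.e. $1 here.
B = FOREACH A GENERATE
        REGEX_EXTRACT((chararray)$0, '.*/([^/]+)/[^/]+', 1) AS log_date,
        $1 AS id;

-- '/logs_output' is a hypothetical path; a space delimiter gives "date id" lines
STORE B INTO '/logs_output' USING PigStorage(' ');
```

The greedy `.*/` should make the capture group land on the folder directly above the file, so a path ending in /logs_folder/2021-03-01/log1 would be expected to yield 2021-03-01.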