Pig UDF for iso to yyyy-mm-dd hh:mm:ss.000

I am looking to convert the ISO time format to yyyy-mm-dd hh:mm:ss.SSS, but I have not been able to achieve the conversion. I am new to Pig and am trying to write a UDF to handle the conversion from ISO format to yyyy-mm-dd hh:mm:ss.SSS.

Kindly guide me. I tried Pig's built-in functions (FORMAT, DATE_FORMAT) but was not able to convert the data to the needed format.

Current data format: 2013-08-22T13:23:18.226220+01:00

Required Data format: 2013-08-22 13:23:18.226

import java.io.IOException;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.EvalFunc;
import org.joda.time.DateTime;
import org.joda.time.format.*;
import org.joda.time.format.DateTimeFormatter;
import org.joda.time.format.DateTimeFormatterBuilder;
public class test extends EvalFunc<String>{

public String exec(Tuple input) throws IOException {

    if ((input == null) || (input.size() == 0))
        return null;
    try{
        String time = (String)input.get(0);
         DateFormat dt = new SimpleDateFormat ("yyyy-mm-dd hh:mm:ss.SSS");
         Date d_t = dt.parse(time);
         String timedt = getTimedt(d_t);
         return timedt; 
    } catch (ParseException e) {

        return null;
    }


}

private String getTimedt(Date d_t) {
     DateTimeFormatterBuilder formatter =  new DateTimeFormatterBuilder();   

    } 
}

How can I deal with date conversions in Pig?

Claudieclaudina answered 6/9, 2013 at 11:33 Comment(1)
Is a UDF required to perform this task? I'm currently facing this issue myself. I've seen some talk of PiggyBank UDFs that might accomplish this, but maybe those are no longer needed in 0.11? – Construe
7

With Pig 0.11.1, a UDF is not required to convert from ISO 8601 format to yyyy-mm-dd hh:mm:ss.SSS format. The following example shows how to convert a column of ISO 8601 dates into yyyy-MM-dd HH:mm:ss.SSS dates.

converted_dates = FOREACH input_dates GENERATE ToString(date,'yyyy-MM-dd HH:mm:ss.SSS') as date:chararray;


NOTE:

I don't think the ToString function is documented. I guessed at this usage from this Google Summer of Code proposal:

http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/zjshen/21002

where the following function is mentioned as needing to be converted from a piggybank UDF into a built-in.

String ToString(DateTime d, String format)

My guess is that it was converted, but hasn't made its way into the main documentation yet. Here is the class documentation for the ToString built-in:

http://pig.apache.org/docs/r0.11.1/api/org/apache/pig/builtin/ToString.html

But we can see that the ToString function is missing from Apache's Pig documentation here:

http://pig.apache.org/docs/r0.11.1/func.html

Construe answered 11/9, 2013 at 6:25 Comment(1)
See issues.apache.org/jira/browse/PIG-3349 for more info. In that thread they notice that ToString(datetime, format_string) is not documented, and commit the doc change for version 0.12. – Donell
1

2013-08-22T13:23:18.226220+01:00 is in XSD dateTime format, and it should be parsed this way:

XMLGregorianCalendar xc = DatatypeFactory.newInstance().newXMLGregorianCalendar("2013-08-22T13:23:18.226220+01:00");

From the XMLGregorianCalendar you can get a GregorianCalendar and then a java.util.Date:

GregorianCalendar gc = xc.toGregorianCalendar();
Date date = gc.getTime();
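
The steps above stop at a java.util.Date. A minimal sketch of carrying them through to the requested string might look like the following. The class and method names are hypothetical, and formatting in the calendar's own time zone is an assumption made so that the +01:00 wall-clock time from the input is kept rather than shifted to the machine's default zone.

```java
import java.text.SimpleDateFormat;
import java.util.GregorianCalendar;
import java.util.Locale;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

public class XsdDateTimeToString {

    // Hypothetical helper: parses an XSD dateTime and prints it as
    // yyyy-MM-dd HH:mm:ss.SSS in the offset carried by the input.
    static String convert(String xsdDateTime) throws DatatypeConfigurationException {
        XMLGregorianCalendar xc =
            DatatypeFactory.newInstance().newXMLGregorianCalendar(xsdDateTime);
        GregorianCalendar gc = xc.toGregorianCalendar();

        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS", Locale.ROOT);
        // Assumption: format in the calendar's own zone so the input's
        // wall-clock time is preserved.
        out.setTimeZone(gc.getTimeZone());
        return out.format(gc.getTime());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convert("2013-08-22T13:23:18.226220+01:00"));
    }
}
```

Digits beyond milliseconds are dropped when the calendar is built, since GregorianCalendar only carries millisecond precision.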

Note that 226220 is the fractional second. If you try to parse it with SimpleDateFormat as SSS, it will be read as 226220 milliseconds, i.e. 226 s 220 ms, instead of 0.226220 s.
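
The effect can be reproduced with a short sketch. The class and method names below are hypothetical, and both formatters are pinned to UTC purely so the result does not depend on the machine's default time zone. SimpleDateFormat also ignores the trailing +01:00 here because the pattern has no zone field.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class FractionPitfall {

    // Parses with SSS and re-formats; both formatters use UTC so the
    // round trip is independent of the default time zone.
    static String roundTrip(String text) throws ParseException {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS", Locale.ROOT);
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS", Locale.ROOT);
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(in.parse(text));
    }

    public static void main(String[] args) throws ParseException {
        // Six fractional digits: SSS reads 226220 as a millisecond count,
        // so roughly 226 seconds are added to 13:23:18.
        System.out.println(roundTrip("2013-08-22T13:23:18.226220+01:00"));
        // Three fractional digits: the time comes back unchanged.
        System.out.println(roundTrip("2013-08-22T13:23:18.226"));
    }
}
```

With the six-digit input the seconds and minutes shift forward, while the three-digit input survives the round trip, which is why truncating the fraction before parsing is a common workaround.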

Alethaalethea answered 6/9, 2013 at 11:58 Comment(0)
0
    DateFormat dffrom = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS");
    DateFormat dfto = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
    //TimeZone zone = TimeZone.getTimeZone("America/Los_Angeles");
    //dfto.setTimeZone(zone);
    String iso = "2013-08-22T13:23:18.226220+01:00";
    // Keep only the first 23 characters (through the milliseconds); otherwise
    // SSS would read 226220 as a millisecond count. The +01:00 offset is not parsed.
    Date date = dffrom.parse(iso.substring(0, 23));
    String s = dfto.format(date);
    System.out.println(s);
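
As an alternative, and assuming Java 8 or later is available (which is not stated in the question), the java.time API parses the six-digit fraction and the offset directly. This is only a sketch with hypothetical class and method names, not a Pig UDF; wrapping it in an EvalFunc would be a separate step.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class IsoToTimestamp {

    private static final DateTimeFormatter OUT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Hypothetical helper: keeps the local time of the input's offset and
    // truncates (does not round) the fraction to milliseconds.
    static String convert(String iso) {
        return OffsetDateTime.parse(iso).format(OUT);
    }

    public static void main(String[] args) {
        System.out.println(convert("2013-08-22T13:23:18.226220+01:00"));
    }
}
```

OffsetDateTime.parse uses the ISO offset date-time format by default, so no input pattern is needed.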
Swarth answered 6/9, 2013 at 11:51 Comment(0)
