How to pass date/timestamp as lowerBound/upperBound in spark-sql-2.4.1v with ojdbc14.jar?
I'm using spark-sql-2.4.1v and ojdbc6.jar to read data from Oracle.

I have an Oracle table as below:

create table schema1.modal_vals(
  FAMILY_ID          NUMBER NOT NULL,
  INSERTION_DATE     DATE NOT NULL,
  ITEM_VALUE         VARCHAR2(4000),
  YEAR               NUMBER,
  QUARTER            NUMBER,
  LAST_UPDATE_DATE   DATE
)

Load sample data:

insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-02','bbb-',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-13','b+',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-17','bbb-',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-13','bb',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-02','ccc-',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-13','aa-',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-OCT-13','a-',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-03','bbb-',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-13','b',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'28-FEB-03','aa+',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-13','aa+',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JAN-19','aaa+',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-18','ccc-',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'01-MAY-19','bb-',2013,2,null);

Trying to load the data into spark-sql as below:

// fill in the respective Oracle connection details

DataFrameReader ora_df_reader = spark.read().format("jdbc")
        .option("url", o_url)
        .option("driver", Constants.ORACLE_DRIVER)
        .option("user", o_userName)
        .option("password", o_passwd)
        .option("fetchsize", 1000);

Dataset<Row> ss = ora_df_reader
        .option("inferSchema", true)
        .option("schema", "schema1")
        .option("numPartitions", 20)
        .option("partitionColumn", "INSERTION_DATE")
        .option("lowerBound", "2002-03-31")
        .option("upperBound", "2019-05-01")
        .option("dateFormat", "yyyy-MM-dd") // tried "yyyy-mm-dd", "yyyy-MM-dd", "YYYY-MM-DD", "DD-MMM-YY", "dd-MMM-yy"
        .option("dbtable", "select * from schema1.modal_vals")
        .load();

Errors when I tried different ways:

Tried 5th-way:

.option("lowerBound", "2002-03-31 00:00:00"); 
.option("upperBound", "2019-05-01 23:59:59");
.option("timestampFormat", "yyyy-mm-dd hh:mm:ss");
.option("numPartitions", 240);

But it gives this error:

java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
    at java.sql.Timestamp.valueOf(Timestamp.java:204)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:179)
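
The stack trace explains the failure: in Spark 2.4, `JDBCRelation.toInternalBoundValue` hands the bound strings straight to `java.sql.Timestamp.valueOf` when the partition column is a timestamp, so the bounds must be in the JDBC escape format `yyyy-[m]m-[d]d hh:mm:ss[.fffffffff]`, and the `timestampFormat` option is not consulted at all. A minimal JDK-only sketch of that parsing behavior (class name is mine):

```java
import java.sql.Timestamp;

// Shows why only "yyyy-MM-dd HH:mm:ss"-shaped bounds survive:
// java.sql.Timestamp.valueOf accepts exactly the JDBC escape
// format yyyy-[m]m-[d]d hh:mm:ss[.fffffffff] and nothing else.
public class TimestampBoundDemo {
    public static void main(String[] args) {
        // Accepted: JDBC escape format
        System.out.println(Timestamp.valueOf("2019-05-01 23:59:59")); // prints 2019-05-01 23:59:59.0

        // Rejected: any other layout throws the exact error from the trace
        try {
            Timestamp.valueOf("05/01/2019 23:59:59");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```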

Tried 2nd-way :

.option("lowerBound","2002-03-31");
.option("upperBound", "2019-05-01");
.option("dateFormat", "yyyy-mm-dd");

I'm getting the error:

ORA-01861: literal does not match format string

How to pass the dates for "lower/upperBound"?

Error:

java.sql.SQLException: ORA-12801: error signaled in parallel query server P001(2)
ORA-01861: literal does not match format string
ORA-02063: preceding 2 lines from CAPDBPROD

    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
    at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:879)
    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:450)
    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
    at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
    at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:207)
    at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:884)
    at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1167)
    at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1289)
    at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3584)
    at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3628)
    at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1493)

As per this fix, it's understood that we can use a date/timestamp column as the partition column: https://issues.apache.org/jira/browse/SPARK-22814

Tried 1st-way:

.option("lowerBound","03/31/2002 00:00:00");
.option("upperBound", "05/01/2019 23:59:59");
.option("dateFormat", "mm/dd/yyyy hh:mm:ss");

java.lang.IllegalArgumentException
    at java.sql.Date.valueOf(Date.java:143)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:178)

Tried 3rd-way:

.option("lowerBound","03/31/2002"); 
.option("upperBound", "05/01/2019");
.option("dateFormat", "mm/dd/yyyy");


java.lang.IllegalArgumentException
    at java.sql.Date.valueOf(Date.java:143)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:178)

Tried 4th-way:

.option("lowerBound", "31.03.2002 00:00:00");
.option("upperBound", "01.05.2019 23:59:59");
.option("dateFormat", "DD.MM.YYYY HH24:MI:SS");

java.lang.IllegalArgumentException
    at java.sql.Date.valueOf(Date.java:143)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:178)
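
The slash- and dot-formatted attempts all die in the same place: for a DATE partition column, Spark 2.4 parses `lowerBound`/`upperBound` with `java.sql.Date.valueOf`, which accepts only `yyyy-[m]m-[d]d`; the `dateFormat` option is not applied to the bounds. A JDK-only sketch (class name is mine):

```java
import java.sql.Date;

// Shows why the mm/dd/yyyy and DD.MM.YYYY bound strings fail:
// java.sql.Date.valueOf, used by Spark 2.4 for DATE partition
// bounds, accepts only the yyyy-[m]m-[d]d escape format.
public class DateBoundDemo {
    public static void main(String[] args) {
        // Accepted
        System.out.println(Date.valueOf("2002-03-31")); // prints 2002-03-31

        // Rejected: both throw IllegalArgumentException, matching the traces
        for (String s : new String[] {"03/31/2002", "31.03.2002"}) {
            try {
                Date.valueOf(s);
            } catch (IllegalArgumentException e) {
                System.out.println(s + " -> IllegalArgumentException");
            }
        }
    }
}
```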

Weakminded answered 7/5, 2019 at 10:2 Comment(11)
FYI, you can't use @ to notify users who haven't been active on a question/answer (e.g. comment, edit, closing, etc.).Eloign
Also, try changing dateFormat to be "yyyy-MM-dd" (from the github pull request: github.com/apache/spark/pull/21834#issuecomment-489972861).Eloign
@Eloign, same error with .option("dateFormat", "yyyy-MM-dd");Weakminded
Do you have a date or a timestamp? What happens with 03/16/2016 00:00:00 and format mm/dd/yyyy hh:mm:ss?Tomekatomes
@AlexandrosBiratsis, tried the 2nd and 3rd ways as mentioned in the question, but no luckWeakminded
Hi @Shyam again, could you try some more, such as DD.MM.YYYY HH24:MI:SS, which is Oracle's official format? I got it from here #1388417Tomekatomes
@AlexandrosBiratsis, tried the 4th way, no luck, same errorWeakminded
@AlexandrosBiratsis, any clue about the "java.sql.SQLException: ORA-12801: error signaled in parallel query server P001" error?Weakminded
No clue @Shyam, unfortunately. Those were all the assumptions I could makeTomekatomes
@AlexandrosBiratsis, the Spark folks added something in the new version, but there's no sample as such ...Weakminded
@764040 can you please help on thisWeakminded

Adding the option below fixes the issue:

.option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'");

Spark 2.4 embeds the computed partition bounds as plain string literals in the WHERE clauses it generates (one per partition), and Oracle converts those literals to DATE using the session's NLS_DATE_FORMAT, so the ORA-01861 goes away once the two formats agree.
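
For reference, a sketch of how this could slot into the Java reader from the question. This is an untested configuration fragment, not a verified implementation: the url, credentials, and table names are the question's own placeholders, and wrapping the query as "(select ...) t" is my assumption about how Oracle expects a subquery in dbtable.

```java
// assumes an existing SparkSession `spark` and the question's
// o_url / o_userName / o_passwd / Constants.ORACLE_DRIVER values
Dataset<Row> ss = spark.read().format("jdbc")
        .option("url", o_url)
        .option("driver", Constants.ORACLE_DRIVER)
        .option("user", o_userName)
        .option("password", o_passwd)
        // bounds must be in java.sql.Date's yyyy-[m]m-[d]d form
        .option("partitionColumn", "INSERTION_DATE")
        .option("lowerBound", "2002-03-31")
        .option("upperBound", "2019-05-01")
        .option("numPartitions", 20)
        // make Oracle parse the string literals Spark puts in the
        // generated WHERE clauses with the same yyyy-MM-dd layout
        .option("sessionInitStatement",
                "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'")
        .option("dbtable", "(select * from schema1.modal_vals) t")
        .load();
```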
Weakminded answered 15/5, 2019 at 14:17 Comment(0)

When using a timestamp column as partitionColumn, the code below worked for me:

spark.read.format("jdbc")\
    .option("url",db_url)\
    .option("dbtable",pushdownquery)\
    .option("user",db_username)\
    .option("password",db_password)\
    .option('partitionColumn',"cdc_time")\
    .option('lowerBound','2023-01-27 00:00:00')\
    .option('upperBound','2023-01-28 00:00:00')\
    .option('numPartitions',5)\
    .option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS'")\
    .option('fetchSize',2000).load()

If you'd like to use a date instead of a timestamp, then add the option below:

.option("oracle.jdbc.mapDateToTimestamp", "false")
Juxtapose answered 27/1, 2023 at 23:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.