Invalid digits on Redshift

I'm trying to load some data from the staging area into the relational environment, and something is happening that I can't figure out.

I'm trying to run the following query:

SELECT
  CAST(SPLIT_PART(some_field,'_',2) AS BIGINT) cmt_par
FROM
  public.some_table;

some_field is a column whose values are two numbers joined by an underscore, like this:

some_field -> 38972691802309_48937927428392

And I'm trying to get the second part.

That said, here is the error I'm getting:

[Amazon](500310) Invalid operation: Invalid digit, Value '1', Pos 0, 
Type: Long 
Details: 
 -----------------------------------------------
  error:  Invalid digit, Value '1', Pos 0, Type: Long 
  code:      1207
  context:   
  query:     1097254
  location:  :0
  process:   query0_99 [pid=0]
  -----------------------------------------------;

Execution time: 2.61s
Statement 1 of 1 finished

1 statement failed.

It's literally saying some numbers are not valid digits. I've already pulled the exact rows that throw the error, and they look like normal values, just as I expected. It happens even if I throw out NULL fields.

I thought it might be an encoding error, but I haven't found any references to support that. Does anyone have any idea?

Thanks everybody.

Crepitate answered 19/3, 2018 at 21:11 Comment(0)

I just ran into this problem and did some digging. It seems like Value '1' is the misleading part of the error; the real problem is that these fields simply aren't valid numbers.

In my case they were empty strings. I found the solution to my problem in this blog post, which is essentially to find any fields that aren't numeric and replace them with NULL before casting.

select cast(colname as integer)
from (
  select
    case when colname ~ '^[0-9]+$' then colname
         else null
    end as colname
  from tablename
) t;
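
Applied to the question's query, the same pattern might look like this (a minimal sketch reusing the table and column names from the question; the CASE nulls out any second part of some_field that isn't purely digits before the cast):

SELECT
  CAST(CASE WHEN SPLIT_PART(some_field, '_', 2) ~ '^[0-9]+$'
            THEN SPLIT_PART(some_field, '_', 2)
            ELSE NULL
       END AS BIGINT) AS cmt_par
FROM public.some_table;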

Bottom line: this Redshift error is completely confusing and really needs to be fixed.

Balneology answered 17/10, 2018 at 18:41 Comment(2)
Warning: ^ and $ match the beginning and end of a line. So, if you have a value like a multi-line address, the above regex could match non-numeric values. To avoid this, use !~ '[^0-9]' instead of ~ '^[0-9]+$'.Frequentation
@KevinBorders Good point. In my case, the goal was to find fields that are not numeric.Balneology

If you get an error message like "Invalid digit, Value 'O', Pos 0, Type: Integer", try executing your COPY command without the header row. Use the IGNOREHEADER parameter in your COPY command to skip the first line of the data file. The COPY command will then look like this:

COPY orders
FROM 's3://sourcedatainorig/order.txt'
CREDENTIALS 'aws_access_key_id=<your access key id>;aws_secret_access_key=<your secret key>'
DELIMITER '\t'
IGNOREHEADER 1;

Unbounded answered 3/1, 2022 at 4:12 Comment(0)

Hmmm. I would start by investigating the problem. Are there any non-digit characters?

SELECT some_field
FROM public.some_table
WHERE SPLIT_PART(some_field, '_', 2) ~ '[^0-9]';

Is the value too long for a bigint?

SELECT some_field
FROM public.some_table
WHERE LEN(SPLIT_PART(some_field, '_', 2)) > 18;

A BIGINT maxes out at 9,223,372,036,854,775,807 (19 digits), so anything longer than 18 digits may not fit. If you need more digits of precision, consider DECIMAL rather than BIGINT.
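
If that turns out to be the case, one option (a sketch only, reusing the names from the question) is to cast to a wide DECIMAL instead; Redshift's DECIMAL type supports up to 38 digits of precision:

SELECT CAST(SPLIT_PART(some_field, '_', 2) AS DECIMAL(38, 0)) AS cmt_par
FROM public.some_table;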

Cocoa answered 19/3, 2018 at 21:17 Comment(2)
Both queries returned empty. All characters are digits, and the biggest number has 17 digits. That's why I'm thinking about encoding, but even that doesn't make sense to me either.Cervantes
@MaurícioBorges . . . This is very curious.Cocoa

For my Redshift SQL, I had to wrap my columns with Cast(col As Datatype) to make this error go away.

For example, casting my columns to Char with a specific length worked:

Cast(COLUMN1 As Char(xx)) = Cast(COLUMN2 As Char(xxx))
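
In context, that kind of cast might appear in a join or filter like the sketch below (hypothetical; table_a, table_b, and the length 20 are placeholders, not names from the original question):

SELECT a.*
FROM table_a a
JOIN table_b b
  ON CAST(a.COLUMN1 AS CHAR(20)) = CAST(b.COLUMN2 AS CHAR(20));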
Sigrid answered 4/10, 2019 at 11:21 Comment(0)

When you are using a Glue job to upsert data from any data source to Redshift:

Glue may rearrange the data before copying, which can cause this issue. This happened to me even after using apply-mapping.

In my case, the datatypes were not an issue at all; in the source they were cast to exactly match the fields in Redshift.

Glue was rearranging the columns alphabetically by column name and then copying the data into the Redshift table (which obviously throws an error, because my first column is an ID key, not a string column like the others).

To fix the issue, I used a SQL query transform within Glue to run a SELECT with the columns in the correct order for the target table. It's strange that Glue did this even after using apply-mapping, but the workaround helped.

For example: the source table has fields ID|EMAIL|NAME with values 1|[email protected]|abcd, and the target table has fields ID|EMAIL|NAME. But when Glue upserts the data, it rearranges the columns by name before writing, so Glue tries to write [email protected]|1|abcd into ID|EMAIL|NAME. This throws an error because ID expects an int value and EMAIL expects a string. I did a SQL query transform using the query "SELECT ID, EMAIL, NAME FROM data" to rearrange the columns before writing the data.

Caveator answered 10/6, 2022 at 18:0 Comment(5)
I think your answer could be much more effective if there was more detail about the query you were able to use to fix the issue. Something more than just text about what you did. Perhaps even serving better as an answer to a different question.Aurilia
For example: the source table has fields ID|EMAIL|NAME with values 1|[email protected]|abcd, and the target table has fields ID|EMAIL|NAME. But when Glue upserts the data, it rearranges the columns by name before writing: Glue tries to write [email protected]|1|abcd into ID|EMAIL|NAME. This throws an error because ID expects an int value and EMAIL expects a string. I did a SQL query transform with the query "SELECT ID, EMAIL, NAME FROM data" to rearrange the columns before writing the data.Caveator
@Caveator - I would just update your original answer to include this comment. This totally fixed my problem, and this was the only place on the internet I could find that had any hints about the issue I was running into. Honestly... I think it may be because many people are using SQL operations in their Glue jobs anyway - BUT for those of us trying to tackle things mainly in Scala/PySpark - this created a huge headache that seemed to have no solution.Putty
@Caveator Your comment is saving my life. Thank you millions <3Tuber
He doesn't even mention Glue. He's talking about tables that already exist within Redshift, not a job/query failing in a Glue job. It's a datatype issue, not a Glue issue.Circularize
