AWS Kinesis Firehose not inserting data in Redshift

I'm trying to get a Kinesis Firehose stream to push data into a Redshift table.

The Firehose stream is working and putting data into S3.

But nothing arrives in the destination table in Redshift.

  • In the metrics, DeliveryToRedshift Success is 0 (DeliveryToRedshift Records is empty)
  • The load logs (redshift web console) and STL_LOAD_ERRORS table are empty.
  • I checked that Firehose is able to connect to Redshift (I see the connections in STL_CONNECTION_LOG)

How can I troubleshoot this?

Dilettante answered 10/12, 2015 at 16:46 Comment(2)
Also experiencing this. Made sure my grant options were set correctly as well and still running into this problem.Bangalore
Edit: Tried superuser and it worked, so not sure why a non-superuser with INSERT privileges doesn't work??Bangalore

In the end, I made it work by deleting and re-creating the Firehose stream :-/ The repeated edits via the web console probably left it in an unstable state.

But here are some troubleshooting guidelines:

  • A good starting point is this procedure: http://docs.aws.amazon.com/firehose/latest/dev/troubleshooting.html
  • Check that data is arriving in S3
  • At this point, the "DeliveryToS3 Success" metric in Firehose monitoring should be non-zero
  • The Redshift cluster must be publicly accessible (see cluster web console)
  • The security group of the cluster must allow inbound traffic from Firehose IP addresses: Firehose currently uses one CIDR block for each available AWS Region:
    • 52.70.63.192/27 for US East (N. Virginia)
    • 52.89.255.224/27 for US West (Oregon)
    • 52.19.239.192/27 for EU (Ireland)
  • Double-check the Redshift user/password you gave to Firehose
  • At this point, you should be able to see the connection attempts in the Redshift logs:

    select * from stl_connection_log where remotehost like '52%' order by recordtime desc;  
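
  • If connection attempts show up with IPv6-mapped client addresses (e.g. ::ffff:52.70.63.207, as one commenter notes below), the pattern above misses them; a broader filter (a sketch of the same check) is:

    select * from stl_connection_log where remotehost like '%52%' order by recordtime desc;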
    
  • Check that the Redshift user used by Firehose has enough privileges on the target table:

    select tablename, 
       HAS_TABLE_PRIVILEGE(tablename, 'select') as select,
       HAS_TABLE_PRIVILEGE(tablename, 'insert') as insert,
       HAS_TABLE_PRIVILEGE(tablename, 'update') as update,
       HAS_TABLE_PRIVILEGE(tablename, 'delete') as delete, 
       HAS_TABLE_PRIVILEGE(tablename, 'references') as references 
    from pg_tables where schemaname='public' order by tablename;
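
  • If the insert privilege comes back false for the target table, granting it is the fix several commenters mention; a minimal sketch (table and user names are placeholders):

    grant select, insert on table public.my_target_table to firehose_user;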
    
  • Then you can check whether the COPY command was run:

    select * from stl_query order by endtime desc limit 10;
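
  • To narrow that down to COPY statements only, a sketch (note that querytxt is truncated to 4000 characters):

    select query, starttime, endtime, trim(querytxt) as querytxt
    from stl_query
    where querytxt ilike 'copy%'
    order by starttime desc
    limit 10;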
    
  • Then check for load errors or server errors:

    select * from stl_load_errors  order by starttime desc;
    select * from stl_error where userid!=0 order by recordtime desc;
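
  • For a more readable view of what went wrong in a given load, a sketch selecting a few useful STL_LOAD_ERRORS columns:

    select starttime, trim(filename) as filename, line_number,
           trim(colname) as colname, trim(err_reason) as err_reason
    from stl_load_errors
    order by starttime desc
    limit 10;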
    
  • If you have format problems in your data or in the COPY options, or a mismatch between your data and the target columns, you should at least see the COPY attempts and some load errors.

  • If you're still stuck with nothing appearing in those log tables, try deleting and recreating the whole Firehose stream, as there may be some bugs related to the web console. (This step is what worked for me.)
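
  • One more sanity check: confirm whether the Redshift user configured in Firehose has executed anything at all; a sketch (the user name below is a placeholder):

    select q.starttime, trim(u.usename) as username, trim(q.querytxt) as querytxt
    from stl_query q
    join pg_user u on u.usesysid = q.userid
    where u.usename = 'firehose_user'
    order by q.starttime desc
    limit 20;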

Dilettante answered 11/12, 2015 at 10:56 Comment(7)
Thank you. This was super helpful! Other sticking points I had: 1. you have to manually create the table in Redshift, and the column names must match your data (if you use the JSON 'auto' copy option). 2. if your column values are huge, your columns should be created to account for that size, e.g. create table test_table (key VARCHAR(255), value VARCHAR(4000))Congregationalism
The IP address needed to be added as an inbound rule (in the security group) for me. There is no way these errors are logged. Kinesis should improve on error loggingMaggio
Recreating the Firehose stream helped in my case. After I changed the Firehose config, events suddenly stopped getting COPY'd into Redshift (though S3 was always receiving events).Stile
After more digging, my real issue was what you pointed out with the security group. Because my cluster is in Ireland, adding 52.19.239.192/27 to the VPC inbound rules (AWS VPC > select current VPC, tab inbound rules) fixed the problem. I think recreating the Firehose (see previous comment) also worked because AWS applied this rule automatically.Stile
Interestingly, re-creating the Firehose stream solved it for me too!Geoff
When checking for connection attempts, I had to modify LIKE to use '%52%' because with IPv6 support the content appears different now: "::ffff:52.70.63.207"Quimby
grant all on <TABLE_NAME> to <USER> solved it for meTabshey

Go to the IAM role (firehose_delivery_role) auto-created during the Kinesis Firehose setup and make sure that the following managed policies are attached:

AmazonS3FullAccess
AmazonRedshiftFullAccess
AmazonKinesisFullAccess
AmazonKinesisFirehoseFullAccess

There is a bug that omits the S3 credentials in IAM, leaving the Kinesis setup unable to work.

Also verify that you in fact see the data files accumulating in S3.

Glioma answered 10/12, 2015 at 18:21 Comment(0)

Wanted to post my case for anyone who's lost here.

We were enforcing server-side encryption calls to our S3 bucket via the directions posted here: http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html

The problem with using this policy on a bucket that Firehose writes to is that Firehose creates unencrypted manifest files as well as encrypted data files. If this policy is in place, the manifest files will not be created in S3 and thus will not trigger the Redshift loads. So in our case, data was arriving in the bucket, but there were no manifest files, ergo no data loads.
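
A quick way to see whether any COPY actually committed files from the bucket (with the manifest missing, nothing will show up here) is to look at STL_LOAD_COMMITS; a sketch, assuming the standard column names:

    select query, trim(name) as filename, lines_scanned, curtime
    from stl_load_commits
    order by curtime desc
    limit 10;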

Jeanelle answered 11/8, 2017 at 21:17 Comment(0)

During Kinesis Firehose setup, use the Redshift master user credentials. Any other user will not work.
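
If you do go with a dedicated (non-master) user instead, as the comment below suggests, a quick way to confirm it has the privilege Firehose needs for its COPY; a sketch (user and table names are placeholders):

    select has_table_privilege('firehose_user', 'public.my_target_table', 'insert');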

Glioma answered 10/12, 2015 at 20:56 Comment(1)
You should specify a specific user with proper rights. Using the master user should be done only when managing other AWS accounts in IAM.Deprecative

If multiple sources are streaming data into the same Firehose and you are testing with demo data, the JSON text of your demo data may get modified so that it no longer matches the Redshift table schema, in which case no data will be loaded into the Redshift table.
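
Since COPY with the json 'auto' option matches JSON keys to column names, comparing the incoming records against the table definition is a quick sanity check; a sketch (table name is a placeholder):

    select "column", type
    from pg_table_def
    where schemaname = 'public' and tablename = 'my_target_table';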

Quadrate answered 21/9, 2022 at 15:40 Comment(0)
