R language support for AWS DynamoDB [duplicate]

2

7

This is a follow-up / updated question to this one:

AWS dynamodb support for "R" programming language

I am looking for examples or documentation on how to read a table from DynamoDB into R.

This question pointed me in the right direction:

R + httr and EC2 api authentication issues

(answered by the great @hadley himself!).

It's OK if I have to use httr and then parse a JSON response, but I can't even figure out how to format the POST request.

Thanks!

Ceria answered 21/9, 2013 at 18:35 Comment(2)
My company (Analytical Flavor Systems) is now working on an open source package for querying DynamoDB from R. I will post the link here when the package is released. – Ceria
Any updates to your DynamoDB - R package? – Parallelepiped
3

Repeating my answer from here since someone sent me this page asking a similar question.

Here's a simplified version of what I'm using for reading data from DynamoDB into R. It relies on the fact that R and Python can exchange data, and a Python library called boto3 makes it really easy to get data from DynamoDB. It would be neat if this were all an R package, but I won't complain given the 25 GB of free storage you can get from Amazon.

First, you need a Python script named query_dynamo.py, like so:

import boto3

# Connect to DynamoDB. Replace the placeholders with your own credentials and region.
dynamodb = boto3.resource('dynamodb',
                          aws_access_key_id='<GET ME FROM AWS>',
                          aws_secret_access_key='<ALSO GET ME FROM AWS CONSOLE>',
                          region_name='us-east-1')

table = dynamodb.Table('comment')  # Your table name in DynamoDB here

# A single scan call returns at most 1 MB of data, so read the first page...
response = table.scan()
data = response['Items']

# ...then keep scanning from LastEvaluatedKey until no pages remain.
while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])
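
One thing to watch for: boto3's resource interface returns DynamoDB numbers as decimal.Decimal, and rPython (as far as I know) moves data between Python and R as JSON, which cannot serialize Decimal values. Below is a minimal sketch of a converter that could be applied to data at the end of query_dynamo.py. The name to_plain and the sample items are hypothetical, not part of boto3, and binary attributes are not handled.

```python
from decimal import Decimal


def to_plain(value):
    """Recursively replace Decimal values with int or float.

    Hypothetical helper, not part of boto3. Sets are turned into
    lists because JSON has no set type.
    """
    if isinstance(value, Decimal):
        # Whole numbers become int, everything else becomes float.
        return int(value) if value == value.to_integral_value() else float(value)
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [to_plain(item) for item in value]
    return value


# Made-up items shaped like the 'Items' list that table.scan() returns.
sample = [{"id": Decimal("1"), "score": Decimal("4.5"), "tags": {"a"}}]
plain = to_plain(sample)
print(plain)
```

In the script itself this would amount to adding `data = to_plain(data)` as the last line, so that R only ever sees plain numbers, strings, lists and dicts.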

Then in R you do this. If you're trying this on Windows, you may want to try rPython-win instead. I did all this on Ubuntu Linux 16.04 LTS.

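library(rPython)

# Run the Python script; it leaves the scanned items in the Python variable 'data'
python.load("query_dynamo.py")

# Copy 'data' into R and transpose it into the final data frame
temp = as.data.frame(python.get('data'))
df = as.data.frame(t(temp))
rm(temp)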

Now you'll have a dataframe called "df" with the contents of whatever you put in DynamoDB.

Stasiastasis answered 1/5, 2017 at 12:26 Comment(1)
Just flagged it. – Stasiastasis
0

You could stream updates from your DynamoDB table to Redshift with a Lambda function. Then connect to Redshift as you would connect to any SQL database, and perform the big data analytics you need against Redshift.
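
A rough sketch of the Lambda side of that idea, assuming the table has a DynamoDB stream whose view type includes the new image. The handler only flattens the typed attribute values in each stream record into plain rows; actually loading those rows into Redshift is left out, and flatten_image and the sample event are made up for illustration.

```python
def flatten_image(image):
    """Turn a typed image such as {"id": {"N": "1"}} into {"id": 1}.

    Hypothetical helper. Only N, S, BOOL and NULL are decoded;
    any other attribute type is stored as its string form.
    """
    row = {}
    for name, typed in image.items():
        # Each attribute is a one-entry dict: {type_code: raw_value}.
        type_code, raw = next(iter(typed.items()))
        if type_code == "N":
            # Numbers arrive as strings in stream records.
            try:
                row[name] = int(raw)
            except ValueError:
                row[name] = float(raw)
        elif type_code in ("S", "BOOL"):
            row[name] = raw
        elif type_code == "NULL":
            row[name] = None
        else:
            row[name] = str(raw)
    return row


def lambda_handler(event, context):
    """Collect flat rows for inserted or modified items in the batch."""
    rows = []
    for record in event.get("Records", []):
        image = record.get("dynamodb", {}).get("NewImage")
        if record.get("eventName") in ("INSERT", "MODIFY") and image is not None:
            rows.append(flatten_image(image))
    # Assumption: a real function would write `rows` to Redshift here.
    return rows


# Made-up event in the shape DynamoDB Streams passes to Lambda.
sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"NewImage": {"id": {"N": "1"}, "text": {"S": "hello"}}}},
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"N": "2"}}}},
    ]
}
rows = lambda_handler(sample_event, None)
print(rows)
```

REMOVE events are skipped in this sketch, so deletions in DynamoDB would not be reflected in Redshift unless they are handled separately.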

Priory answered 13/2, 2017 at 0:41 Comment(0)
