AWS dynamodb support for "R" programming language

Has anyone been able to successfully CRUD records in Amazon DynamoDB using the R programming language? I found this list of supported language bindings:

http://aws.typepad.com/aws/2012/04/amazon-dynamodb-libraries-mappers-and-mock-implementations-galore.html

Alas, no R. We are considering using DynamoDB for a large-scale data project, but our main analyst is most comfortable in R, so we are exploring our options.

Frankel answered 8/1, 2013 at 22:11 Comment(4)
I've considered creating an R package for this purpose, but ultimately decided to use PHP instead. Here is the documentation you would have to follow to create your own package, I think: docs.aws.amazon.com/amazondynamodb/latest/developerguide/… You will probably want to use the RCurl package to help with accessing the API. – Nielson
I had a quick look at the Python code for DynamoDB, and it seems you first have to implement a lot of lower-level, general Amazon API access code. I doubt anyone has done this in R yet. You could use system() to call some Python code, but you'd lose some efficiency. – Stander
It will probably be easier to export the DynamoDB data to S3 or to HDFS on EMR, and use existing R libraries to work with it there. – Mandimandible
One potential strategy would be to build an R wrapper package around the AWS CLI utility (github.com/aws/aws-cli). That would let you avoid implementing the generic AWS API authentication and signing yourself. It would carry a performance penalty compared to a native R implementation, but could be a quick way to get started. Another alternative is to use rPython (rpython.r-forge.r-project.org) to call functions that use the boto Python module (docs.pythonboto.org/en/latest/ref/dynamodb.html) to do the heavy lifting. – Wageworker

Here's a simplified version of what I'm using for reading data from DynamoDB into R. It relies on the fact that R and Python can exchange data, and a Python library called boto3 makes it really easy to get data from DynamoDB. It would be neat if this were all an R package, but I won't complain given the 25GB of free storage you can get from Amazon.

First, you need a Python script named query_dynamo.py, like so:

import boto3

dynamodb = boto3.resource('dynamodb',
                          aws_access_key_id='<GET ME FROM AWS>',
                          aws_secret_access_key='<ALSO GET ME FROM AWS CONSOLE>',
                          region_name='us-east-1')

table = dynamodb.Table('comment')  # Your table name in DynamoDB here

response = table.scan()
data = response['Items']

while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])
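
One thing to watch for: boto3's resource interface returns DynamoDB numbers as decimal.Decimal objects, and as far as I know rPython passes objects to R as JSON, which cannot serialize Decimal. A minimal sketch of a conversion step that could be appended to the script, assuming that is the failure you hit (to_plain is a hypothetical helper name, not part of boto3):

```python
from decimal import Decimal
import json


def to_plain(value):
    """Recursively replace Decimal values with int/float so json.dumps works."""
    if isinstance(value, Decimal):
        # Whole numbers become int, everything else becomes float.
        return int(value) if value == value.to_integral_value() else float(value)
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple, set, frozenset)):
        # DynamoDB sets arrive as Python sets; JSON has no set type, so use a list.
        return [to_plain(item) for item in value]
    return value


# Hypothetical sample shaped like the items that table.scan() returns.
sample = [{"id": "a", "votes": Decimal("2"), "score": Decimal("2.5")}]
plain = [to_plain(item) for item in sample]
json.dumps(plain)  # would raise TypeError on the unconverted sample
```

In the script itself this would be one extra line at the end, data = [to_plain(item) for item in data], so that python.get('data') receives only JSON-friendly values.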

Then in R you do this. If you're trying this on Windows, you may want to try rPython-win instead. I did all this on Ubuntu Linux 16.04 LTS.

library(rPython)


python.load("query_dynamo.py")
temp = as.data.frame(python.get('data'))
df = as.data.frame(t(temp))
rm(temp)

Now you'll have a dataframe called "df" with the contents of whatever you put in DynamoDB.

Angora answered 1/5, 2017 at 12:22 Comment(4)
Unfortunately, I can't get this to work. It appears that the code executes from within R, but I can't seem to get the Python result to return to R. – Immolate
Add "print(data)" without quotes to the bottom of the Python script and then run it alone. My first guess would be that the "data" object isn't getting populated. – Angora
OK, so after a lot of hours and back-and-forth testing, I got this to work on Windows, but only partially. I have detailed the behavior I'm seeing on the developer's website: github.com/cjgb/rPython-win/issues/16. I'd like to know if you experience these same problems. Thanks again for your help @CalZ! – Immolate
I did all this on Ubuntu Linux running within a VM, so I never encountered that data type issue. I found several mentions that rPython-win didn't work nearly as well as regular rPython. My best suggestion would be to install VirtualBox + Ubuntu Linux for free :) – Angora

For anyone who comes across this, there is now the Paws package, an AWS SDK for R. You can install it with install.packages("paws").

Disclaimer: I am a maintainer of the Paws package.

For example:

# Create a client object.
svc <- paws::dynamodb()

# This example retrieves an item from the Music table. The table has a
# partition key and a sort key (Artist and SongTitle), so you must specify
# both of these attributes.
item <- svc$get_item(
  Key = list(
    Artist = list(
      S = "Acme Band"
    ),
    SongTitle = list(
      S = "Happy Day"
    )
  ),
  TableName = "Music"
)

# This example adds a new item to the Music table.
svc$put_item(
  Item = list(
    AlbumTitle = list(
      S = "Somewhat Famous"
    ),
    Artist = list(
      S = "No One You Know"
    ),
    SongTitle = list(
      S = "Call Me Today"
    )
  ),
  ReturnConsumedCapacity = "TOTAL",
  TableName = "Music"
)
Muns answered 17/12, 2020 at 3:35 Comment(1)
Could this be expanded to read a scan into a data frame? – Elver
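
On the scan question: the low-level API returns each attribute wrapped in a type tag (the same S/N wrappers used in the example above), and results come back in pages linked by LastEvaluatedKey. A rough sketch of the unwrapping and paging logic, written in Python to match the boto3 script in the earlier answer. The names unwrap, scan_all and scan_page are hypothetical, and it assumes each attribute value is a single-key dict as in the wire format; the same idea should carry over to R, but I have not verified the exact shape Paws returns.

```python
def unwrap(attr):
    """Convert one typed attribute value, e.g. {"S": "x"}, to plain Python."""
    (type_tag, raw), = attr.items()  # assumes exactly one type tag per value
    if type_tag in ("S", "BOOL"):
        return raw
    if type_tag == "N":
        # Numbers travel as strings in the wire format.
        return float(raw) if any(c in raw for c in ".eE") else int(raw)
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [unwrap(item) for item in raw]
    if type_tag == "M":
        return {key: unwrap(item) for key, item in raw.items()}
    if type_tag == "SS":
        return list(raw)
    if type_tag == "NS":
        return [unwrap({"N": number}) for number in raw]
    raise ValueError(f"unhandled attribute type: {type_tag}")


def scan_all(scan_page):
    """Collect all rows; scan_page(start_key) must return one response page."""
    rows, start_key = [], None
    while True:
        page = scan_page(start_key)
        rows.extend({k: unwrap(v) for k, v in item.items()} for item in page["Items"])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return rows


# With boto3's low-level client, scan_page could be (untested sketch):
#   client = boto3.client("dynamodb")
#   scan_page = lambda key: client.scan(
#       TableName="Music", **({"ExclusiveStartKey": key} if key else {}))
```

The result is a list of flat dicts, one per item, which is the shape a data frame constructor generally expects. Items that lack some attributes will simply have fewer keys, so missing columns need to be filled in when tabulating.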

There are several approaches to this... let me add two:

1- EMR with Hive and Streaming.

Hive would be used to query DynamoDB, and its output could be used as input to Hadoop Streaming, which works with any language that can read from standard input and write to standard output, including R.

Of course that would be very different from your typical R program and environment, but it would leverage the "big data" tools.

2- R-ish in the JVM

If you use an R interpreter for the JVM (such as Renjin) or a similar language on the JVM, you would be able to use the AWS Java SDK and DynamoDB libraries directly. That might be much more familiar to the developer, but you'd be responsible for handling the "bigness" of your data.

Lys answered 8/4, 2014 at 15:0 Comment(0)

Cloudyr's aws.dynamodb is convenient for reading data from DynamoDB. However, it has an unfortunate tendency to coerce things to characters. Also, I have had trouble using the put_item function to add anything but string data to DynamoDB.

AWS CLI works well. Example here:

$ aws dynamodb put-item --table-name "SOMETABLE" --item '{"aStringItem": {"S": "1900-01-02|myid"}, "aNumericItem": {"N": "2"}, "aMapItem": {"M": {"Source": {"S": "CLI"}}}}'
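
Writing the --item JSON by hand gets tedious for anything beyond a few fields. A small sketch of generating it from a plain record, in Python to match the boto3 script in the earlier answer. The names wrap and put_item_command are hypothetical helpers, and the function only builds the argument list for the same put-item call shown above without running anything:

```python
import json


def wrap(value):
    """Wrap a plain Python value in DynamoDB's typed attribute format."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, dict):
        return {"M": {key: wrap(item) for key, item in value.items()}}
    if isinstance(value, (list, tuple)):
        return {"L": [wrap(item) for item in value]}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def put_item_command(table, record):
    """Return the argv list for `aws dynamodb put-item`; nothing is executed."""
    item = {key: wrap(value) for key, value in record.items()}
    return ["aws", "dynamodb", "put-item",
            "--table-name", table, "--item", json.dumps(item)]


command = put_item_command(
    "SOMETABLE",
    {"aStringItem": "1900-01-02|myid", "aNumericItem": 2, "aMapItem": {"Source": "CLI"}},
)
```

The resulting list can be handed to subprocess.run(command, check=True). Passing the arguments as a list rather than a single shell string avoids having to quote the JSON for the shell.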

Two other options that haven't been mentioned are Rcpp and rJava. There are native SDKs available in both Java and C++.

Guesstimate answered 20/10, 2018 at 21:36 Comment(0)
