How to create RDD object on cassandra data using pyspark
I am using Cassandra 2.0.3, and I would like to use pyspark (the Apache Spark Python API) to create an RDD from Cassandra data.

PLEASE NOTE: I do not want to import CQL and then run a CQL query from the pyspark API; rather, I would like to create an RDD on which I can do some transformations.

I know this can be done in Scala, but I cannot find out how to do it from pyspark.

I would really appreciate it if anyone could guide me on this.

Zielinski answered 30/12, 2013 at 8:54
This might not be relevant to you anymore, but I was looking for the same thing and couldn't find anything I was happy with. So I did some work on this: https://github.com/TargetHolding/pyspark-cassandra. It needs a lot of testing before use in production, but I think the integration works quite nicely.
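To illustrate, here is a minimal sketch of reading a Cassandra table as an RDD with pyspark-cassandra. The connection host, the keyspace `demo`, the table `users`, and its `name` column are all hypothetical placeholders, not part of the original answer; the script is meant to be launched with `spark-submit` with the pyspark-cassandra package on the classpath.

```python
def name_and_length(row):
    """Pure per-row transformation; pyspark-cassandra yields each row
    as a plain Python dict keyed by column name."""
    return (row["name"], len(row["name"]))


def main():
    # Invoked under spark-submit; assumes the pyspark-cassandra jar and
    # Python package are available, and Cassandra runs on 127.0.0.1.
    from pyspark import SparkConf
    from pyspark_cassandra import CassandraSparkContext

    conf = (SparkConf()
            .setAppName("cassandra-rdd-demo")
            .set("spark.cassandra.connection.host", "127.0.0.1"))
    sc = CassandraSparkContext(conf=conf)

    # cassandraTable() returns an RDD, so ordinary Spark transformations
    # (map, filter, reduceByKey, ...) apply directly.
    rdd = sc.cassandraTable("demo", "users").map(name_and_length)
    print(rdd.take(5))
```

The key point is that `cassandraTable()` gives you a real RDD rather than a query result, so the transformations the question asks about happen in Spark, not in CQL.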

Treillage answered 21/2, 2015 at 18:3
Congrats, since Jun 2015 it seems to be the official way to go?! Last slide of slideshare.net/JonHaddad/intro-to-py-spark-and-cassandra – Jabber
@user1885518, no, not official by any means :) I don't know of any direct open-source alternative to pyspark-cassandra. But it's just out there. Not an official release by Apache / Datastax / Databricks / whomever ... – Treillage
I am not sure if you have looked at this example yet: https://github.com/apache/spark/blob/master/examples/src/main/python/cassandra_inputformat.py — I have read from Cassandra using a similar pattern.
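The pattern in that example goes through Spark's Hadoop-input-format bridge rather than a dedicated connector. A hedged sketch of the same shape is below; the keyspace/table names and the Thrift address are illustrative placeholders, and the converter classes assume the spark-examples jar is on the classpath, as in the linked file.

```python
def cassandra_hadoop_conf(keyspace, table, host="localhost", port="9160"):
    """Pure helper assembling the Hadoop configuration dict that
    CqlPagingInputFormat reads its connection settings from."""
    return {
        "cassandra.input.thrift.address": host,
        "cassandra.input.thrift.port": port,
        "cassandra.input.keyspace": keyspace,
        "cassandra.input.columnfamily": table,
        "cassandra.input.partitioner.class": "Murmur3Partitioner",
        "cassandra.input.page.row.size": "3",
    }


def main():
    # Invoked under spark-submit with the spark-examples jar available,
    # so the pythonconverters classes below can be loaded.
    from pyspark import SparkContext

    sc = SparkContext(appName="CassandraInputFormat")
    conf = cassandra_hadoop_conf("demo", "users")  # placeholder names
    cass_rdd = sc.newAPIHadoopRDD(
        "org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat",
        "java.util.Map",
        "java.util.Map",
        keyConverter="org.apache.spark.examples.pythonconverters."
                     "CassandraCQLKeyConverter",
        valueConverter="org.apache.spark.examples.pythonconverters."
                       "CassandraCQLValueConverter",
        conf=conf)
    # From here it is an ordinary RDD of (key-dict, value-dict) pairs,
    # open to any Spark transformation.
    print(cass_rdd.take(3))
```

This route needs no third-party connector, at the cost of the more verbose Hadoop configuration shown above.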

Melancholic answered 26/10, 2014 at 22:0
