Update: SHC now seems to work with Spark 2 and the Table API. See https://github.com/GoogleCloudPlatform/cloud-bigtable-examples/tree/master/scala/bigtable-shc
Original answer:
I don't believe any of these (or any other existing connector) will do all that you would like today.
- spark-hbase will probably be the right solution when it is released (HBase 1.4?), but it currently only builds at head, and Spark 2 support is still in progress.
- spark-hbase-connector only seems to support the RDD APIs, but since those are more stable, it might be somewhat helpful.
- hortonworks-spark/shc probably won't work, because I believe it only supports Spark 1 and uses the older HTable APIs, which do not work with Bigtable.
I would recommend just using the HBase MapReduce APIs with RDD methods like newAPIHadoopRDD (or possibly spark-hbase-connector?), then manually converting the RDDs into Datasets. This approach is much easier in Scala or Java than in Python.
This is an area that the HBase community is working to improve, and Google Cloud Dataproc will incorporate those improvements as they happen.
```scala
spark.sparkContext.newAPIHadoopRDD(
  config,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])
```
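To make that runnable end to end, here is a minimal sketch of the read path followed by the manual RDD-to-Dataset conversion described above. The table name "my-table", column family "cf", and qualifier "col" are placeholders I've chosen for illustration; with Bigtable, the same HBase API works through the bigtable-hbase client, which supplies the connection settings.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("bigtable-read").getOrCreate()
import spark.implicits._

// Tell TableInputFormat which table to scan ("my-table" is a placeholder).
val config = HBaseConfiguration.create()
config.set(TableInputFormat.INPUT_TABLE, "my-table")

// Read (row key, Result) pairs through the HBase MapReduce input format.
val rdd = spark.sparkContext.newAPIHadoopRDD(
  config,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

// Manually extract plain values from each Result, then convert the RDD
// of tuples into a Dataset[(String, String)].
val ds = rdd.map { case (key, result) =>
  (Bytes.toString(key.get()),
    Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))))
}.toDS()

ds.show()
```

Mapping to plain tuples before calling toDS keeps the non-serializable ImmutableBytesWritable and Result objects out of the Dataset.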
How should I use this API for bulk writes? – Harkins
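The original answer doesn't spell out the write path, but a minimal sketch using the same HBase MapReduce APIs would pair each record with an ImmutableBytesWritable and write through TableOutputFormat via saveAsNewAPIHadoopDataset. Again, "my-table", "cf", and "col" are placeholders, and an existing SparkSession named spark is assumed:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job

// Configure the output table ("my-table" is a placeholder) and output format.
val config = HBaseConfiguration.create()
config.set(TableOutputFormat.OUTPUT_TABLE, "my-table")
val job = Job.getInstance(config)
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

// Turn each record into a Put keyed by row key, then write the whole RDD.
val records = spark.sparkContext.parallelize(Seq(("row1", "v1"), ("row2", "v2")))
val puts = records.map { case (key, value) =>
  val put = new Put(Bytes.toBytes(key))
  put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
  (new ImmutableBytesWritable(Bytes.toBytes(key)), put)
}
puts.saveAsNewAPIHadoopDataset(job.getConfiguration)
```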