Read Data from HBase running on EMR Cluster with Spark installed on local machine

I have HBase running on an EMR cluster, and I'm trying to access its tables with Spark from my local machine.

It seems to connect to ZooKeeper, but it can't find the table I'm looking for.

Here is my code, hbase-site.xml and the messages I get.

    package org.apache.spark.examples

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.HBaseAdmin
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.spark._

    object HBaseTestEMR {
      def main(args: Array[String]): Unit = {
        val sparkConf = new SparkConf().setAppName("HBaseTest").setMaster("local[4]")
        val sc = new SparkContext(sparkConf)

        // Load the cluster settings from the local copy of hbase-site.xml
        val conf = HBaseConfiguration.create()
        val tableName = "empl"
        conf.addResource(new Path("/home/spark/development/hbase/conf/hbase-site.xml"))
        conf.set(TableInputFormat.INPUT_TABLE, tableName)

        println("-------------1")
        val admin = new HBaseAdmin(conf)
        //println(admin.listTables())
        println("-------------2")
        // This is the last call reached: nothing is printed after marker 2
        if (admin.isTableAvailable(tableName)) println("the table exists")
        else println("the table does not exist")
        println("-------------3")

        admin.close()
        sc.stop()
      }
    }

hbase-site.xml

<configuration>
  <property><name>fs.hdfs.impl</name><value>emr.hbase.fs.BlockableFileSystem</value></property>
  <property><name>hbase.regionserver.handler.count</name><value>100</value></property>
  <property><name>hbase.zookeeper.quorum</name><value>ec2-52-26-***-***.us-west-2.compute.amazonaws.com</value></property>
  <property><name>hbase.rootdir</name><value>hdfs://10.0.0.25:9000/hbase</value></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
  <property><name>hbase.tmp.dir</name><value>/mnt/var/lib/hbase/tmp-data</value></property>
</configuration>
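The file above names two endpoints: the ZooKeeper quorum host (client port 2181, as the log shows) and the private address 10.0.0.25:9000 in hbase.rootdir. A private 10.x address is usually not routable from outside the cluster's network, so it may be worth checking which endpoints the local machine can actually open a TCP connection to. Here is a minimal sketch using only the JDK; the object name `EndpointProbe`, the helper `reachable`, and the 3-second timeout are illustrative choices, not part of HBase or Spark.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object EndpointProbe {
  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  // Unresolvable hosts, refused connections and timeouts all yield false.
  def reachable(host: String, port: Int, timeoutMs: Int = 3000): Boolean =
    Try {
      val socket = new Socket()
      try socket.connect(new InetSocketAddress(host, port), timeoutMs)
      finally socket.close()
    }.isSuccess

  def main(args: Array[String]): Unit = {
    // Each argument is expected in host:port form, e.g. 10.0.0.25:9000
    args.foreach { arg =>
      val Array(host, port) = arg.split(":", 2)
      println(s"$arg reachable: ${reachable(host, port.toInt)}")
    }
  }
}
```

Run it with the quorum host on port 2181 and with 10.0.0.25:9000 as arguments. If the second one fails, the client can probably reach ZooKeeper but not the addresses the cluster advertises internally.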

And the messages I get:

15/06/10 12:00:28 INFO ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/06/10 12:00:28 INFO ZooKeeper: Client environment:java.compiler=<NA>
15/06/10 12:00:28 INFO ZooKeeper: Client environment:os.name=Linux
15/06/10 12:00:28 INFO ZooKeeper: Client environment:os.arch=amd64
15/06/10 12:00:28 INFO ZooKeeper: Client environment:os.version=3.2.0-67-generic
15/06/10 12:00:28 INFO ZooKeeper: Client environment:user.name=spark
15/06/10 12:00:28 INFO ZooKeeper: Client environment:user.home=/home/spark
15/06/10 12:00:28 INFO ZooKeeper: Client environment:user.dir=/home/spark/projetWordCount
15/06/10 12:00:28 INFO ZooKeeper: Initiating client connection, connectString=ec2-52-26-***-***.us-west-2.compute.amazonaws.com:2181 sessionTimeout=90000 watcher=hconnection-0x7ecf3c090x0, quorum=ec2-52-26-***-***.us-west-2.compute.amazonaws.com:2181, baseZNode=/hbase
15/06/10 12:00:28 INFO ClientCnxn: Opening socket connection to server ec2-52-26-***-***.us-west-2.compute.amazonaws.com/52.26.***.***:2181. Will not attempt to authenticate using SASL (unknown error)
15/06/10 12:00:28 INFO ClientCnxn: Socket connection established to ec2-52-26-***-***.us-west-2.compute.amazonaws.com/52.26.***.***:2181, initiating session
15/06/10 12:00:28 INFO ClientCnxn: Session establishment complete on server ec2-52-26-***-***.us-west-2.compute.amazonaws.com/52.26.***.***:2181, sessionid = 0x14ddc7d70ed0023, negotiated timeout = 90000
-------------2

And then nothing happens.

So, is it possible to do what I want? And which part of my configuration is wrong?
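One way to make the silent wait easier to diagnose is to lower the HBase client's retry and timeout settings, so that an unreachable master or region server is more likely to surface as an exception. The sketch below assumes the HBase client jars are on the classpath, as in the code above. The property names are standard HBase client settings, but the values are arbitrary illustrations, and `FailFastConf` and `failFast` are hypothetical names. Whether this turns the hang into an error depends on the HBase version.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration

object FailFastConf {
  // Mutates and returns conf with short retry and timeout settings.
  // All values below are illustrative, not recommendations.
  def failFast(conf: Configuration): Configuration = {
    conf.setInt("hbase.client.retries.number", 2) // give up after a few attempts
    conf.setLong("hbase.client.pause", 500L)      // milliseconds between retries
    conf.setInt("hbase.rpc.timeout", 10000)       // milliseconds per RPC call
    conf.setInt("zookeeper.recovery.retry", 1)    // ZooKeeper operation retries
    conf
  }

  // Convenience: a fresh HBase configuration with the settings applied.
  def create(): Configuration = failFast(HBaseConfiguration.create())
}
```

In the program above, `val conf = FailFastConf.create()` would replace `HBaseConfiguration.create()`; the `addResource` call can stay as it is.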

Comfrey answered 10/6, 2015 at 10:29 Comment(3)
I used the public IP instead of the public DNS name in the hosts file, and it worked fine on an EC2 instance with HBase installed on it, but it didn't work on the EMR cluster. - Comfrey
Late response, but I'm wondering if you ever figured this out on EMR? I'm running into the same problem trying to get Spark to connect to HBase located on the same EMR cluster. I'm actually using Phoenix to talk to HBase from Spark but haven't found a way to connect. - Coquette
I'm encountering a similar problem, accessing HBase from Spark on EC2 (not using EMR though). Running the same HBase code from outside of Spark, on the same machine, works fine. Would love to hear of any solutions... - Hedron
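The hosts-file workaround in the first comment suggests that name resolution matters here: the client has to resolve whatever host names the cluster hands back, not only the quorum name in hbase-site.xml. A minimal sketch for checking how a name resolves on the local machine, using only the JDK (`ResolveCheck` and `resolve` are illustrative names):

```scala
import java.net.InetAddress
import scala.util.Try

object ResolveCheck {
  // Resolves a host name to its IP addresses; returns an empty Seq if
  // the name cannot be resolved from this machine.
  def resolve(host: String): Seq[String] =
    Try(InetAddress.getAllByName(host).map(_.getHostAddress).toSeq)
      .getOrElse(Seq.empty)

  def main(args: Array[String]): Unit =
    args.foreach { host =>
      val addrs = resolve(host)
      if (addrs.isEmpty) println(s"$host: not resolvable from this machine")
      else println(s"$host -> ${addrs.mkString(", ")}")
    }
}
```

Pass it the host names that appear in the client log or in the cluster's configuration. A name that resolves to a private address, or does not resolve at all, would be a candidate for a hosts-file entry like the one described in the comment.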
