GlusterFS as the backend for Hadoop

Asked 12/11, 2015 at 6:52 Answered 8/12, 2015 at 12:0

I've seen that Red Hat has come up with a possible solution in which GlusterFS works as the backend for Hadoop. In this case, you can get rid of the NameNode/DataNode architecture and replace it with GlusterFS, while still keeping Hadoop MapReduce API compatibility.

I'm just wondering how the performance compares against native HDFS. Is it really production ready? Does it support the whole Hadoop ecosystem as well, e.g. Solr Cloud, Spark, Impala, etc.?

Broughton answered 12/11, 2015 at 6:52 Comment(0)

Disclaimer: I work for a storage vendor. I don't know much about GlusterFS in particular, but I can speak about Lustre, as it's a POSIX file system at the end of the day. It's a parallel file system, and the benchmarks I looked into recently showed that it outperforms HDFS. It's definitely a production-ready alternative that offers a single namespace for your data (no more HDFS ingestion).

What works from the Hadoop ecosystem today? What I've seen in production is Spark, Hive, and HBase. Impala looks to me like it requires certain parts of HDFS, which is why it doesn't work with a POSIX FS; it isn't HCFS-compatible. I did a quick test and was able to create the database and everything, but I wasn't able to fetch any rows.

Let me know if you need further help.

Bog answered 8/12, 2015 at 12:0 Comment(3)

Can you be a bit more specific about why it outperforms HDFS, and about which parts of HDFS are required for some frameworks, e.g. Impala? – Broughton 10/12, 2015 at 3:41

The benchmarks that I've seen show that Lustre has lower query execution times than HDFS. The whole idea of going with a POSIX file system mainly comes down to the following points: 1. You skip the step of ingesting data into HDFS (this can take forever if you have a very large dataset). 2. You lose a lot of disk capacity with HDFS, whereas POSIX FS implementations rely on enterprise RAID protection. As for Impala, I'm not sure which parts of the code need HDFS, but as of today I don't know of any Impala deployment that runs on a POSIX FS. – Bog 13/12, 2015 at 18:10

Thanks very much for the explanation. – Broughton 14/12, 2015 at 5:43
