Advantages of a sequence file over an HDFS text file

3

21

What is the advantage of a Hadoop SequenceFile over a flat (text) file in HDFS? In what way is a sequence file more efficient?

Small files can be combined and written into a sequence file, but the same can be done with an HDFS text file. I need to know the difference between the two approaches. I have been googling this for a while, and it would be helpful to get some clarity.

Chamberlain answered 2/8, 2012 at 13:40 Comment(3)
Just some questions for you: Does your text file have checksums? Can your text file be split easily if a record spans more than one line? Those are the advantages of a sequence file. Besides that, a text file holds only strings, whereas you can serialize arbitrary data types in a sequence file. Pierpont
Doesn't every block in HDFS have a checksum? Etti
Yep, you're right, that is a feature of the ChecksumFileSystem. Pierpont
26
  1. Sequence files are appropriate when you want to store keys and their corresponding values. You can do that with text files too, but then you have to parse each line.
  2. They can be compressed and still be split, which allows better parallelism across workers. A compressed text file cannot be split unless it uses a splittable compression format.
  3. They are binary files, which makes them more storage-efficient. In a text file a double is written as a string of characters, which can take considerably more space than its 8-byte binary form.
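
The storage point in item 3 can be checked with a few lines of Python. This is only a rough illustration using the standard `struct` module, not Hadoop's own serialization; the sample value is arbitrary.

```python
import struct

value = 3.141592653589793  # arbitrary sample double

# Text form: one byte per character (a real text file also needs a delimiter).
as_text = repr(value).encode("ascii")

# Binary form: an IEEE 754 double always occupies 8 bytes.
as_binary = struct.pack(">d", value)

print(len(as_text), len(as_binary))

# The binary form round-trips exactly, with no parsing of digits.
restored = struct.unpack(">d", as_binary)[0]
```

The gap grows with the number of significant digits, and the text form additionally has to be parsed back into a number on every read.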
Etti answered 2/8, 2012 at 13:48 Comment(0)
2

Advantages of Hadoop sequence files (as per Siva's article on the hadooptutorial.info website):

  1. They are more compact than text files.
  2. They support compression at different levels: record or block.
  3. They can be split and processed in parallel.
  4. They can solve the problem of a large number of small files in Hadoop, whose main strength is processing large files with MapReduce jobs. A sequence file can be used as a container for a large number of small files.
  5. Temporary output of a Mapper can be stored in sequence files.
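
To make the "container for small files" idea concrete, here is a minimal Python sketch of a key/value container with sync markers. It is a simplified, hypothetical layout written for illustration, not the real SequenceFile on-disk format; the function names `write_container` and `read_container` and the `sync_every` parameter are assumptions of this example.

```python
import io
import os
import struct


def write_container(files, sync_every=2):
    """Pack {name: bytes} into one buffer of length-prefixed key/value records."""
    sync = os.urandom(16)          # random marker, stored once in the header
    out = io.BytesIO()
    out.write(sync)
    for i, (name, data) in enumerate(files.items()):
        if i and i % sync_every == 0:
            # A key length of -1 signals that a sync marker follows; a reader
            # starting mid-file could scan for the marker to find a record boundary.
            out.write(struct.pack(">i", -1))
            out.write(sync)
        key = name.encode("utf-8")
        out.write(struct.pack(">ii", len(key), len(data)))
        out.write(key)
        out.write(data)
    return out.getvalue()


def read_container(buf):
    """Yield (name, bytes) pairs back out of the buffer."""
    stream = io.BytesIO(buf)
    sync = stream.read(16)
    while True:
        head = stream.read(4)
        if not head:
            break
        (key_len,) = struct.unpack(">i", head)
        if key_len == -1:
            if stream.read(16) != sync:
                raise ValueError("corrupt sync marker")
            continue
        (val_len,) = struct.unpack(">i", stream.read(4))
        key = stream.read(key_len).decode("utf-8")
        yield key, stream.read(val_len)


small_files = {"a.txt": b"hello", "b.bin": b"\x00\n\x01", "c.txt": b""}
packed = write_container(small_files)
for name, data in read_container(packed):
    print(name, len(data))
```

Because every record carries its own lengths, values may contain newlines or arbitrary binary data, which a line-oriented text file cannot hold without escaping.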

Disadvantages:

  1. Sequence files are append-only
Northington answered 18/2, 2016 at 10:22 Comment(0)
0

Sequence files are often used as intermediate files between chained MapReduce jobs: one job writes its output as a sequence file and the next job reads from it. They are compressible and fast to process. Both Hadoop and Spark provide APIs to read and write sequence files.
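
On the compressibility point: SequenceFile supports record-level and block-level compression. The following Python sketch imitates the two strategies with `zlib` on made-up records. It does not use the Hadoop API; the record contents and the `block_size` of 100 are arbitrary assumptions.

```python
import zlib

# Made-up, repetitive records similar to log or table rows.
records = [
    f"user{i:04d}\tstatus=active\tregion=eu-west".encode("ascii")
    for i in range(1000)
]
raw = sum(len(r) for r in records)

# Record-level: every record is compressed on its own.
per_record = sum(len(zlib.compress(r)) for r in records)

# Block-level: groups of records are compressed together, so the
# compressor can exploit repetition across records.
block_size = 100
blocks = [
    zlib.compress(b"\n".join(records[i:i + block_size]))
    for i in range(0, len(records), block_size)
]
per_block = sum(len(b) for b in blocks)

print(raw, per_record, per_block)

# Each block decompresses independently of the others, which is what
# keeps a block-compressed file processable in parallel.
third_block = zlib.decompress(blocks[2]).split(b"\n")
```

On data like this, block-level compression is much smaller than record-level compression, because short records give the compressor little to work with individually.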

Fanjet answered 3/1, 2017 at 12:25 Comment(0)
