HDFS Questions

7

I opened localhost:9870 and tried to upload a txt file to HDFS. I see the error message below: Failed to retrieve data from /webhdfs/v1/?op=LISTSTATUS: Server Error
Blessed asked 11/2, 2018 at 20:4

2

Solved

I know that both Under-replicated blocks and Mis-replicated blocks occur when the datanode count is lower than the configured replication factor. But what is the difference between them? On re-sett...
Kaka asked 13/10, 2016 at 9:52

4

Solved

Hadoop has a configuration parameter hadoop.tmp.dir which, as per the documentation, is "A base for other temporary directories." I presume this path refers to the local file system. I set this value to /...
Unfledged asked 1/3, 2010 at 8:15

2

When configuring my hadoop namenode for the first time, I know I need to run bin/hadoop namenode -format, but running this a second time, after loading data into HDFS, will wipe out everything an...
Amherst asked 11/3, 2011 at 20:4

2

I installed Spark on three nodes successfully. I can visit the Spark web UI and see that every worker node and the master node are active. I can run the SparkPi example successfully. My cluster info: 10.45.10.3...
Bluegill asked 12/9, 2016 at 12:3

6

I want to convert a .sas7bdat file to a .csv/txt format so that I can upload it into a hive table. I'm receiving the .sas7bdat file from an outside server and do not have SAS on my machine.
Winslow asked 23/10, 2014 at 16:17
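A common answer to this one is that pandas can read .sas7bdat files directly, no SAS installation needed. A minimal sketch (paths are placeholders; assumes pandas is installed):

```python
# Sketch: convert a .sas7bdat file to CSV without SAS, using pandas'
# built-in SAS reader. Paths below are placeholders.

def convert_sas_to_csv(sas_path, csv_path):
    """Read a SAS dataset and write it back out as CSV."""
    import pandas as pd  # third-party: pip install pandas

    df = pd.read_sas(sas_path, format="sas7bdat")
    df.to_csv(csv_path, index=False)
    return len(df)  # row count, handy for a sanity check

if __name__ == "__main__":
    convert_sas_to_csv("input.sas7bdat", "output.csv")
```

The resulting CSV can then be loaded into a Hive table with LOAD DATA.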

16

I am trying to install Hadoop on Ubuntu 16.04, but while starting Hadoop it gives me the following error: localhost: ERROR: Cannot set priority of datanode process 32156. Starting secondary nam...
Obadiah asked 18/9, 2017 at 15:58

4

I have multiple small Parquet files generated as the output of a Hive QL job, and I would like to merge them into a single Parquet file. What is the best way to do it using some hdfs or linux comman...
Spohr asked 27/7, 2016 at 10:49
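Since plain byte concatenation (e.g. getmerge) does not produce a valid Parquet file, one common approach is to rewrite the files through Spark with coalesce(1). A sketch, assuming PySpark is available and the HDFS paths are placeholders:

```python
# Sketch: merge many small Parquet files into one by re-reading and
# rewriting them through Spark. Paths and the app name are placeholders.

def merge_parquet(spark, src_dir, dst_dir):
    """Rewrite all Parquet files under src_dir as a single file in dst_dir."""
    (spark.read.parquet(src_dir)
          .coalesce(1)               # collapse to one output partition
          .write.mode("overwrite")
          .parquet(dst_dir))

if __name__ == "__main__":
    from pyspark.sql import SparkSession  # third-party: pip install pyspark
    spark = SparkSession.builder.appName("merge-parquet").getOrCreate()
    merge_parquet(spark, "hdfs:///jobs/output/", "hdfs:///jobs/merged/")
```

Note that coalesce(1) funnels all data through one task, so this is only sensible when the combined output fits comfortably on one executor.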

3

Solved

I have a large image classification dataset stored in the .hdf5 format. The dataset has the labels and the images stored in the .hdf5 file. I am unable to view the images as they are stored in form ...
Schleswigholstein asked 2/12, 2023 at 7:34
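The usual tool here is h5py, which exposes the datasets inside an .hdf5 file as numpy arrays. A sketch, where the dataset name "images" is an assumption about the file's layout (use the listing helper to discover the real names):

```python
# Sketch: inspect an image-classification .hdf5 file with h5py.
# The dataset name "images" is an assumption; list_datasets shows
# what the file actually contains.

def list_datasets(h5_path):
    """Return every group/dataset path inside the file."""
    import h5py  # third-party: pip install h5py
    with h5py.File(h5_path, "r") as f:
        names = []
        f.visit(names.append)
        return names

def first_image(h5_path, dataset="images"):
    """Return the first image as a numpy array, e.g. shape (H, W, 3)."""
    import h5py
    with h5py.File(h5_path, "r") as f:
        return f[dataset][0]
```

The array returned by first_image can then be displayed with matplotlib's imshow or saved with Pillow.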

6

Solved

While running the wordcount example in Hadoop, I am facing the following error saying "JAR does not exist or is not a normal file: /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduceexamp...
Summerwood asked 15/6, 2018 at 11:21

8

Solved

Is there a way to delete files older than 10 days on HDFS? In Linux I would use: find /path/to/directory/ -type f -mtime +10 -name '*.txt' -execdir rm -- {} \; Is there a way to do this on HDFS...
Valgus asked 29/5, 2017 at 5:15
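HDFS has no find -mtime equivalent, so a common workaround is to parse the timestamps in `hdfs dfs -ls` output and delete what is too old. A sketch, with the age test kept as a pure function; the directory path is a placeholder:

```python
# Sketch: delete HDFS files older than N days by parsing `hdfs dfs -ls`
# output (columns: perms, repl, owner, group, size, date, time, path).
import datetime as dt
import subprocess

def is_older_than(ls_line, days, now=None):
    """True if the file in this `hdfs dfs -ls` line is older than `days` days."""
    now = now or dt.datetime.now()
    parts = ls_line.split()
    stamp = dt.datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
    return (now - stamp).days > days

def path_of(ls_line):
    return ls_line.split()[7]

if __name__ == "__main__":
    out = subprocess.run(["hdfs", "dfs", "-ls", "/path/to/directory"],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        # plain files start with "-"; directories start with "d"
        if line.startswith("-") and is_older_than(line, 10):
            subprocess.run(["hdfs", "dfs", "-rm", path_of(line)], check=True)
```

Note that ls timestamps are modification times, and the minute-granularity format above matches the default Hadoop ls output.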

14

I am getting this error when I try to boot up a DataNode. From what I have read, the RPC parameters are only used for a HA configuration, which I am not setting up (I think). 2014-05-18 18:05:00,5...
Division asked 18/5, 2014 at 8:19

2

Solved

I am working on Apache Hadoop 2.7.1 and I have a cluster that consists of 3 nodes: nn1, nn2, dn1. nn1 is the dfs.default.name, so it is the master name node. I have installed httpfs and started it o...
Cymar asked 11/4, 2017 at 8:10

4

Solved

I'm using pydoop to read in a file from hdfs, and when I use: import pydoop.hdfs as hd with hd.open("/home/file.csv") as f: print f.read() It shows me the file in stdout. Is there any way for...
Predella asked 26/2, 2016 at 1:57
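The file object returned by pydoop's open is iterable, so the file can be consumed line by line instead of slurped with read(). A sketch, with the iteration helper written against any file-like object; the "rt" text mode and the path are assumptions:

```python
# Sketch: read an HDFS file one line at a time. The helper works on any
# file-like object; the pydoop part assumes pydoop is installed, the
# path exists, and "rt" opens the file in text mode.

def lines(fileobj):
    """Yield one line at a time, with the trailing newline stripped."""
    for line in fileobj:
        yield line.rstrip("\n")

if __name__ == "__main__":
    import pydoop.hdfs as hd  # third-party: pip install pydoop
    with hd.open("/home/file.csv", "rt") as f:
        for row in lines(f):
            print(row)
```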

3

Solved

I know I can connect to an HDFS cluster via pyarrow using pyarrow.hdfs.connect() I also know I can read a parquet file using pyarrow.parquet's read_table() However, read_table() accepts a filepat...
Liddy asked 22/11, 2017 at 20:10
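read_table can be pointed at HDFS by passing the connected filesystem in its filesystem parameter. A sketch using pyarrow's newer fs.HadoopFileSystem (the replacement for the deprecated pyarrow.hdfs.connect); host, port and path are placeholders, and libhdfs must be available:

```python
# Sketch: read a Parquet file stored on HDFS with pyarrow. Assumes the
# Hadoop native client (libhdfs) is configured; host/port are placeholders.

def read_hdfs_parquet(path, host="default", port=8020):
    """Return the Parquet file at `path` on HDFS as a pyarrow Table."""
    import pyarrow.parquet as pq       # third-party: pip install pyarrow
    from pyarrow import fs
    hdfs = fs.HadoopFileSystem(host, port)
    return pq.read_table(path, filesystem=hdfs)
```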

3

Solved

I learned that if you want to copy multiple files from one hadoop folder to another, it is better to create one big 'hdfs dfs -cp' statement with lots of components, instead of creating...
Demetria asked 16/12, 2016 at 13:52
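The single-statement form works because `hdfs dfs -cp` accepts many source paths before the destination, so only one JVM is spawned. A sketch that builds such a command; paths are placeholders:

```python
# Sketch: copy many HDFS files in one `hdfs dfs -cp` invocation instead
# of one process per file. Paths are placeholders.
import subprocess

def cp_command(sources, dest_dir):
    """Build: hdfs dfs -cp src1 src2 ... dest_dir"""
    return ["hdfs", "dfs", "-cp", *sources, dest_dir]

if __name__ == "__main__":
    cmd = cp_command(["/src/a.txt", "/src/b.txt"], "/dst/")
    subprocess.run(cmd, check=True)
```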

4

Can one use Delta Lake without being dependent on the Databricks Runtime? (I mean, is it possible to use Delta Lake with HDFS and Spark on-prem only?) If not, could you elaborate on why that is so from tec...
Marko asked 23/3, 2020 at 16:5
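Delta Lake is open source and does run on plain Spark over HDFS; the delta-spark package plus two Spark settings enable it. A sketch of the session setup, assuming the delta-spark package is on the classpath; the app name is a placeholder:

```python
# Sketch: open-source Delta Lake on plain Spark (no Databricks Runtime).
# Assumes the delta-spark package is installed alongside pyspark.

DELTA_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog":
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}

def delta_session(app="delta-on-hdfs"):
    from pyspark.sql import SparkSession  # third-party: pip install pyspark
    builder = SparkSession.builder.appName(app)
    for key, value in DELTA_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With that session, `df.write.format("delta").save("hdfs:///path")` writes a Delta table straight to HDFS.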

2

Solved

Is there a way to acquire lock on a directory in HDFS? Here's what I am trying to do: I've a directory called ../latest/... Every day I need to add fresh data into this directory, but before I co...
Cheung asked 19/2, 2014 at 0:20

6

I am running hadoop with default configuration with one-node cluster, and would like to find where HDFS stores files locally. Any ideas? Thanks.
Accused asked 1/3, 2010 at 19:19
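The local block storage location is governed by dfs.datanode.data.dir in hdfs-site.xml, defaulting to ${hadoop.tmp.dir}/dfs/data when unset. A sketch that reads the setting out of the config file; the file path you pass in is a placeholder:

```python
# Sketch: find where the DataNode stores block files locally by reading
# dfs.datanode.data.dir from hdfs-site.xml, falling back to Hadoop's
# classic default under hadoop.tmp.dir.
import xml.etree.ElementTree as ET

def data_dirs(hdfs_site_xml):
    """Return the configured DataNode data directories from the XML text."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "dfs.datanode.data.dir":
            return prop.findtext("value").split(",")
    return ["${hadoop.tmp.dir}/dfs/data"]  # Hadoop's default

if __name__ == "__main__":
    with open("/etc/hadoop/conf/hdfs-site.xml") as f:  # placeholder path
        print(data_dirs(f.read()))
```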

8

I am using Cloudera on a VM machine that I am playing around with. Unfortunately I am having issues copying data to the HDFS, I am getting the following: [cloudera@localhost ~]$ hadoop fs -mkdir i...
Meatman asked 27/3, 2014 at 1:31

1

Hadoop: The Definitive Guide says - Each Namenode runs a lightweight failover controller process whose job it is to monitor its Namenode for failures (using a simple heartbeat mechanism) and ...
Rentfree asked 23/10, 2015 at 21:21

5

Solved

Some characteristics of Apache Parquet are: Self-describing Columnar format Language-independent In comparison to Apache Avro, Sequence Files, RC File etc. I want an overview of the formats. I ha...
Darwen asked 24/4, 2016 at 10:59

10

Solved

I have a directory of directories on HDFS, and I want to iterate over the directories. Is there any easy way to do this with Spark using the SparkContext object?
Norval asked 19/11, 2014 at 18:1
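SparkContext itself has no directory-listing API, so one simple route is to list the path with `hdfs dfs -ls` and keep only the directory entries (their permission string starts with "d"). A sketch, with the line filter kept pure; the base path is a placeholder:

```python
# Sketch: enumerate the sub-directories of an HDFS path by parsing
# `hdfs dfs -ls` output; directory entries have a mode starting with "d".
import subprocess

def directory_paths(ls_output):
    """Extract directory paths from `hdfs dfs -ls` output."""
    return [line.split()[-1]
            for line in ls_output.splitlines()
            if line.startswith("d")]

if __name__ == "__main__":
    out = subprocess.run(["hdfs", "dfs", "-ls", "/base/dir"],
                         capture_output=True, text=True, check=True).stdout
    for d in directory_paths(out):
        print(d)  # e.g. feed each directory to spark.read.text(d)
```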

3

Solved

I am unable to read a file from HDFS using Java: String hdfsUrl = "hdfs://<ip>:<port>"; Configuration configuration = new Configuration(); configuration.set("fs.defaultFS", hdfsUrl); F...
Articulation asked 18/8, 2015 at 17:0

3

Solved

I'm trying to restore some historic backup files that were saved in parquet format, and I want to read from them once and write the data into a PostgreSQL database. I know that backup files saved using...
Dobrinsky asked 10/11, 2019 at 8:5
