Databricks dbfs file read issue

I am trying to open a file that I uploaded to DBFS. I can see the file when I run ls, and reading it into an RDD works fine, but opening it with Python's open() fails with a FileNotFoundError. Can someone explain how DBFS behaves here? I tried several times after going through the documentation as well. This is the documentation I followed.

  #ls
  dbutils.fs.ls("/tmp/sample.txt")
Out[82]: [FileInfo(path='dbfs:/tmp/sample.txt', name='sample.txt', size=46044136)]

  #creating RDD from the txt file
  data_file = "/tmp/sample.txt"
  raw_data = sc.textFile(data_file)
  raw_data.take(1)
Out[99]: ["Oct 12 2009 \tNice trendy hotel location not too bad...........\t"]

  # open the txt file with plain Python (local file API)
  import gensim

  with open("/tmp/sample.txt", "r") as f:
      for i, line in enumerate(f):
          if i % 10000 == 0:
              print("read {0} reviews".format(i))
              print(gensim.utils.simple_preprocess(line))
FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/tmp/sample.txt' 

  # as per the documentation, using the /dbfs/ prefix
  with open("/dbfs/tmp/sample.txt", "r") as f:
      for i, line in enumerate(f):
          if i % 10000 == 0:
              print("read {0} reviews".format(i))
              print(gensim.utils.simple_preprocess(line))
FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/tmp/sample.txt'

Been scratching my head on this. Any help will be greatly appreciated.

P.S. I am using the Community Edition of Databricks, if that helps.

Asked by Astray on 24/3/2021 at 15:48

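This is a limitation of Community Edition with DBR >= 7.x: the local /dbfs mount, which lets plain Python file APIs such as open() read DBFS files, is not available there. Spark APIs (sc.textFile) and dbutils.fs still go through DBFS, which is why ls and the RDD read work while open() fails. If you want to access that DBFS file locally, you can use dbutils.fs.cp('dbfs:/file', 'file:/local-path') (or %fs cp dbfs:/file file:/local-path) to copy the file from DBFS to the local file system, where you can work with it.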
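A minimal sketch of that copy-then-open workaround, wrapped in small helpers. The names to_dbfs_uri, copy_dbfs_to_local and iter_lines are hypothetical, and dbutils is passed in as a parameter (an assumption of this sketch, so it can be exercised outside a notebook); in Databricks you would pass the notebook's built-in dbutils. The path normalization assumes absolute paths in one of the three forms seen in this thread.

```python
from typing import Iterator


def to_dbfs_uri(path: str) -> str:
    """Normalize an absolute DBFS path to the explicit dbfs: URI form.

    Accepts "dbfs:/tmp/x", "/dbfs/tmp/x" (local-mount style) or "/tmp/x"
    (scheme-less, which Spark and dbutils resolve against DBFS in the setup
    described in the question).
    """
    if path.startswith("dbfs:"):
        return path
    if path.startswith("/dbfs/"):
        # Drop the mount prefix: "/dbfs/tmp/x" -> "/tmp/x".
        return "dbfs:" + path[len("/dbfs"):]
    return "dbfs:" + path


def copy_dbfs_to_local(dbutils, dbfs_path: str, local_path: str) -> str:
    """Copy a DBFS file to the driver's local disk and return the local path.

    `dbutils` is a parameter (an assumption of this sketch) so the helper can
    be run with a stand-in object outside Databricks.
    """
    # The "file:" scheme makes the destination the driver's local file system.
    dbutils.fs.cp(to_dbfs_uri(dbfs_path), "file:" + local_path)
    return local_path


def iter_lines(local_path: str) -> Iterator[str]:
    """Yield lines from a local file using the ordinary Python file API."""
    with open(local_path, "r") as f:
        yield from f
```

In a notebook this might look like local = copy_dbfs_to_local(dbutils, "/tmp/sample.txt", "/tmp/sample_local.txt") (the local path is just an example), followed by the original loop over enumerate(iter_lines(local)). The copy lives only on the driver's disk, so for very large inputs reading through Spark, as with sc.textFile in the question, avoids the copy.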

Pupillary answered on 27/3/2021 at 19:00

© 2022 - 2024 — McMap. All rights reserved.