I am trying to open a file that I uploaded to DBFS. The file shows up when I run ls, and reading it into an RDD works without any issue, but I get an error when I try to open it with Python's open(). Can someone explain this DBFS behavior? I have tried several times after going through the documentation as well. This is the documentation I followed.
#ls
dbutils.fs.ls("/tmp/sample.txt")
Out[82]: [FileInfo(path='dbfs:/tmp/sample.txt', name='sample.txt', size=46044136)]
#creating RDD from the txt file
data_file = "/tmp/sample.txt"
raw_data = sc.textFile(data_file)
raw_data.take(1)
Out[99]: ["Oct 12 2009 \tNice trendy hotel location not too bad...........\t"]
#open the txt file
import gensim
with open("/tmp/sample.txt", 'r') as f:
    for i, line in enumerate(f):
        if i % 10000 == 0:
            print("read {0} reviews".format(i))
        print(gensim.utils.simple_preprocess(line))
FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/tmp/sample.txt'
#as per documentation
with open("/dbfs/tmp/sample.txt", 'r') as f:
    for i, line in enumerate(f):
        if i % 10000 == 0:
            print("read {0} reviews".format(i))
        print(gensim.utils.simple_preprocess(line))
FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/tmp/sample.txt'
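To narrow this down, here is a small diagnostic sketch that can be run on the driver. It assumes, as I understand the documentation, that DBFS is supposed to be visible to local Python file APIs under a /dbfs mount, while Spark and dbutils resolve "/tmp/sample.txt" against DBFS directly. The helper names dbfs_to_local and describe are my own and not part of any Databricks API.

```python
import os


def dbfs_to_local(path):
    """Map a DBFS path ('dbfs:/tmp/x' or '/tmp/x') to the path expected under the /dbfs mount.

    Hypothetical helper: it only rewrites the string and does not touch the filesystem.
    """
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    return "/dbfs/" + path.lstrip("/")


def describe(path):
    """Report whether the /dbfs mount and the target file are visible to local file APIs."""
    local = dbfs_to_local(path)
    return {
        "local_path": local,
        "mount_exists": os.path.isdir("/dbfs"),   # is /dbfs visible on this machine at all?
        "file_exists": os.path.exists(local),     # can open() be expected to find the file?
    }


print(describe("dbfs:/tmp/sample.txt"))
```

If mount_exists comes back False, the problem would seem to be that /dbfs is not mounted on this cluster at all, rather than anything specific to my file or path.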
Been scratching my head on this. Any help will be greatly appreciated.
P.S. I am using the Community Edition of Databricks, if that helps.