It seems that every line has the format 'word number number ...', so it should be easy to split. But splitting fails for some lines with the script below:
import numpy as np

def loadGloveModel(gloveFile):
    print("Loading Glove Model")
    model = {}
    with open(gloveFile, 'r') as f:
        for line in f:
            splitLine = line.split()
            word = splitLine[0]
            embedding = np.array([float(val) for val in splitLine[1:]])
            model[word] = embedding
    print("Done. {} words loaded!".format(len(model)))
    return model
I loaded glove.840B.300d.txt but got an error. Printing splitLine for the failing lines gives:
['contact', '[email protected]', '0.016426', '0.13728', '0.18781', '0.75784', '0.44012', '0.096794' ... ]
or
['.', '.', '.', '.', '0.033459', '-0.085658', '0.27155', ...]
Note that this script works fine with the glove.6B.* files.
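The printed lists suggest that some "words" in the 840B file contain spaces themselves, so a plain split() pushes part of the word into the number fields. A minimal sketch of one possible workaround follows. It assumes that every line ends with exactly 300 space-separated floats and that everything before them is the word; the helper name `parse_glove_line` and its `dim` parameter are hypothetical and not part of the original script.

```python
def parse_glove_line(line, dim=300):
    """Return (word, values) for one line, assuming the last `dim` fields are floats."""
    # Split from the right at most `dim` times. Whatever remains on the left
    # is treated as the word, even if it contains spaces (e.g. '. . . .').
    parts = line.rstrip("\n").rsplit(" ", dim)
    word = parts[0]
    values = [float(val) for val in parts[1:]]
    return word, values


if __name__ == "__main__":
    # Tiny made-up line with dim=3, only to illustrate the splitting behaviour.
    word, values = parse_glove_line(". . . . 0.5 -1.0 2.0\n", dim=3)
    print(repr(word), values)
```

Inside the loop of loadGloveModel, the returned values could be wrapped in np.array(...) just as before. Whether this is the right fix depends on the assumption about the file layout holding for every line.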
['in', 'emailing', 'Email', 'email', 'At', 'at', 'by', 'to', 'in', 'or', '•', 'Contact','contact', 'is', 'on']
– Rosarosabel
glove.6B.zip is 862182613 bytes – Ovolo