Workaround for python MemoryError - McMap

Asked 11/11, 2018 at 14:20 Answered 12/11, 2018 at 4:34

Solved python keras sentiment-analysis

How can I change this function to make it more memory-efficient? I keep getting a MemoryError.

import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # One row per sequence, one column per word index (float64 by default)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

I call the function here:

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)

The train and test data come from the IMDB dataset for sentiment analysis, i.e.

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

EDIT: I am running this on a 64-bit Ubuntu system with 4 GB of RAM.

Here is the Traceback:

Traceback (most recent call last):
  File "/home/uttam/PycharmProjects/IMDB/imdb.py", line 29, in <module>
    x_test = vectorize_sequences(test_data)
  File "/home/uttam/PycharmProjects/IMDB/imdb.py", line 20, in vectorize_sequences
    results = np.zeros((len(sequences), dimension))
MemoryError

Brimstone asked 11/11, 2018 at 14:20 Comment(4)

Looks like 2x 763 MB of data which is not gigantic. Please post the full error message including the traceback showing the line where it happened. Please also post the details of the hardware and OS where you're running this. – Land 11/11, 2018 at 14:27

Basically you have two options: use less memory or make more memory available. – Schuster 11/11, 2018 at 14:54

@JohnZwinck I have edited the question accordingly. Thanks – Brimstone 11/11, 2018 at 15:0

A related question: #68422910 – Guadalajara 17/7, 2021 at 16:55

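Your array appears to be 10k x 10k, which is 100 million elements of 64 bits each (because the default dtype is float64). That is 800 million bytes, or about 763 MiB. If the array really has one row per review in the Keras IMDB split (25,000 reviews each for training and testing), it is 25,000 x 10,000 and takes about 2 GB as float64, so two of them will not fit in 4 GB of RAM.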
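The size estimate is easy to check by multiplying the number of elements by the item size of the dtype. This is only a rough sketch: the helper name `dense_array_bytes` is made up for illustration, and the 10,000 x 10,000 shape is the one assumed in the estimate above.

```python
import numpy as np

def dense_array_bytes(rows, cols, dtype):
    """Bytes needed for a dense rows x cols array of the given dtype.

    Hypothetical helper, not part of NumPy.
    """
    return rows * cols * np.dtype(dtype).itemsize

# Compare the three dtypes for an assumed 10,000 x 10,000 array
for dtype in (np.float64, np.float32, np.int8):
    size = dense_array_bytes(10_000, 10_000, dtype)
    print(f"{np.dtype(dtype).name}: {size:,} bytes ({size / 2**20:.0f} MiB)")
```

Passing `len(train_data)` as `rows` gives the figure for the real data before anything is allocated.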

If you use float32 it will cut the memory usage in half:

np.zeros((len(sequences), dimension), dtype=np.float32)

Or if you only care about 0 and 1, this will cut it by 87.5% (1 byte per element instead of 8):

np.zeros((len(sequences), dimension), dtype=np.int8)
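Put together with the function from the question, the change might look like the sketch below. The `dtype` parameter is an addition for illustration (the original function has none), and the tiny `sample` list is made-up data standing in for `train_data`.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000, dtype=np.int8):
    # One row per sequence, one column per word index.
    # With int8, each cell takes 1 byte instead of 8 for float64.
    results = np.zeros((len(sequences), dimension), dtype=dtype)
    for i, sequence in enumerate(sequences):
        # Mark every word index that occurs in this sequence
        results[i, sequence] = 1
    return results

# Made-up input: two short "reviews" over a 5-word vocabulary
sample = [[1, 3, 3], [0, 2]]
x = vectorize_sequences(sample, dimension=5)
print(x)
print(x.dtype, x.nbytes)
```

Passing `dtype=np.float32` instead gives the halved-memory variant. Depending on the model, the input may still be converted to a float type later on, so it is worth checking memory use at training time as well.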

Land answered 12/11, 2018 at 4:34 Comment(0)
