I am creating a shelve file of sequences from a genomic FASTA file:
# Import necessary libraries
import shelve
from Bio import SeqIO
# Create dictionary of genomic sequences
genome = {}
with open("Mus_musculus.GRCm38.dna.primary_assembly.fa") as handle:
    for record in SeqIO.parse(handle, "fasta"):
        genome[str(record.id)] = str(record.seq)
# Shelve genome sequences
myShelve = shelve.open("Mus_musculus.GRCm38.dna.primary_assembly.db")
myShelve.update(genome)
myShelve.close()
The FASTA file itself is 2.6 GB. However, when I try to shelve it, a file of more than 100 GB is produced, and my computer complains about running out of memory and the startup disk being full. This only seems to happen under OS X Yosemite; on Ubuntu it works as expected. Any suggestions as to why this is not working? I'm using Python 3.4.2.
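
One way to narrow this down might be to check which dbm backend `shelve` picks on each machine, since `shelve` stores its data through the `dbm` module and the available backends can differ between platforms. Whether that explains the size difference here is unconfirmed. The sketch below is only a diagnostic under that assumption: the helper name `shelve_records` and the toy keys and sequences are made up for illustration and are not part of the script above. It also writes records to the shelf one at a time rather than building the whole dictionary in memory first.

```python
import dbm
import os
import shelve
import tempfile


def shelve_records(records, path):
    """Write (key, value) pairs to a shelf one at a time.

    Hypothetical helper for illustration only. Returns the name of the
    dbm backend that dbm.whichdb() detects for the resulting file.
    """
    with shelve.open(path) as db:
        for key, value in records:
            db[key] = value
    return dbm.whichdb(path)


if __name__ == "__main__":
    # Toy data standing in for (record.id, str(record.seq)) pairs.
    toy_records = [("chr1", "ACGT"), ("chr2", "GGCC")]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_shelf")
        backend = shelve_records(toy_records, path)
        print("dbm backend:", backend)
```

Running this on both the OS X and Ubuntu machines and comparing the reported backend would show whether the two systems are actually using the same storage format.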