I am using mrjob to run Python code on Amazon's Elastic MapReduce, and I have found a way to upgrade the EMR image's numpy and scipy.
Running the following commands from the console works:
tar -cvf py_bundle.tar mymain.py Utils.py numpy-1.6.1.tar.gz scipy-0.9.0.tar.gz
gzip py_bundle.tar
python my_mapper.py -r emr --python-archive py_bundle.tar.gz --bootstrap-python-package numpy-1.6.1.tar.gz --bootstrap-python-package scipy-0.9.0.tar.gz > output.txt
This successfully bootstraps the latest numpy and scipy into the image and works perfectly. My question is about speed: the installation takes 21 minutes on a small instance.
Does anyone have any idea how to speed up the process of upgrading numpy and scipy?
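
For reference, the two archiving commands above (tar followed by gzip) can also be done in one step from Python. This is only a minimal sketch using the standard-library tarfile module; the helper names build_bundle and list_bundle are my own for illustration and are not part of mrjob, and the demo uses empty placeholder files rather than the real sources and tarballs.

```python
import os
import tarfile
import tempfile


def build_bundle(paths, bundle_path):
    """Write the given files into a gzip-compressed tar archive."""
    with tarfile.open(bundle_path, "w:gz") as tar:
        for path in paths:
            # Store each file under its base name, as tar does for files
            # that sit in the current directory.
            tar.add(path, arcname=os.path.basename(path))
    return bundle_path


def list_bundle(bundle_path):
    """Return the sorted member names of a gzip-compressed tar archive."""
    with tarfile.open(bundle_path, "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    # Empty placeholder files stand in for the real files (assumption).
    workdir = tempfile.mkdtemp()
    names = ["mymain.py", "Utils.py",
             "numpy-1.6.1.tar.gz", "scipy-0.9.0.tar.gz"]
    paths = []
    for name in names:
        path = os.path.join(workdir, name)
        open(path, "w").close()
        paths.append(path)

    bundle = build_bundle(paths, os.path.join(workdir, "py_bundle.tar.gz"))
    print(list_bundle(bundle))
```

This only replaces the packaging step; it does not change how long the bootstrap itself takes on the instance.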