Are there any downsides to using virtualenv for scientific python and machine learning?

Asked 21/3, 2013 at 6:10 Answered 6/10, 2022 at 11:14

Solved python-2.7 virtualenv scientific-computing

I have received several recommendations to use virtualenv to clean up my python modules. I am concerned because it seems too good to be true. Has anyone found downsides related to performance or memory when working with multicore settings, starcluster, numpy, scikit-learn, pandas, or IPython notebook?

Zamudio answered 21/3, 2013 at 6:10 Comment(0)

Virtualenv is the best and easiest way to keep some sort of order when it comes to dependencies. Python is really behind Ruby (bundler!) when it comes to dealing with installing and keeping track of modules. The best tool you have is virtualenv.

So I suggest you create a virtualenv directory for each of your applications, put together a file where you list all the 'pip install' commands you need to build the environment and ensure that you have a clean repeatable process for creating this environment.
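As a rough sketch of such a repeatable process: the snippet below uses the standard-library `venv` module (Python 3.3+) as a stand-in for the `virtualenv` tool, and assumes a hypothetical `requirements.txt` that lists one package per line (for example numpy, pandas, scikit-learn). The helper names `pip_install_command` and `build_env` are made up for illustration.

```python
import subprocess
import sys
import venv
from pathlib import Path


def pip_install_command(env_dir, requirements):
    """Return the command that installs the listed packages into env_dir."""
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere.
    if sys.platform == "win32":
        python = Path(env_dir) / "Scripts" / "python.exe"
    else:
        python = Path(env_dir) / "bin" / "python"
    return [str(python), "-m", "pip", "install", "-r", str(requirements)]


def build_env(env_dir, requirements):
    """Create the environment, then install everything from the file."""
    venv.create(env_dir, with_pip=True)  # stdlib stand-in for `virtualenv env_dir`
    subprocess.check_call(pip_install_command(env_dir, requirements))


# Hypothetical usage (downloads packages, so it needs network access):
# build_env("env", "requirements.txt")
```

Keeping the package list in one file means the same script can rebuild the environment on another machine.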

I think that the nature of the application makes little difference. There should not be any performance issue since all that virtualenv does is to load libraries from a specific path rather than load them from the directory where they are saved by default.
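To see which paths are actually in use, something like the following should work. It is only a sketch: `in_virtualenv` and `describe_environment` are hypothetical helper names, and the check assumes that the classic virtualenv on Python 2 sets `sys.real_prefix`, while `venv` and newer virtualenv releases make `sys.prefix` differ from `sys.base_prefix`.

```python
import sys


def in_virtualenv():
    """Best-effort check for whether the interpreter runs inside an environment."""
    # Classic virtualenv (Python 2 era) records the original prefix here.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv: prefix points at the environment directory.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def describe_environment():
    """Collect the locations Python will load modules from."""
    return {
        "in_virtualenv": in_virtualenv(),
        "prefix": sys.prefix,
        "search_path": list(sys.path),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key, "=", value)
```

Running it inside and outside an environment should show that only the directories change.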

In any case (and this may be completely irrelevant), if performance is an issue, then perhaps you ought to be looking at a compiled language. Most likely, though, any performance bottlenecks could be improved with better coding.

Etalon answered 21/3, 2013 at 6:23 Comment(2)

Being aware of performance is not by itself a reason to go to a compiled language. I find working in a dynamically typed, interpreted language with strong functional features and simple syntax is the best way for me to express my thoughts in code. That often comes with a minimal performance penalty that I am happy to pay. But I still try not to use O(n^3) algorithms. – Zamudio 21/3, 2013 at 17:41

I think that using a language that you enjoy and that provides a rich set of libraries for your task is the most important issue. Performance, in most cases, comes at a fairly late stage of the development cycle. You need to be able to build what you want first and then worry about making it faster / less resource hungry. So my compiled-language comment should be a secondary consideration. – Etalon 21/3, 2013 at 17:54

There's no performance overhead to using virtualenv. All it's doing is using different locations in the filesystem.

The only "overhead" is the time it takes to set it up. You'd need to install each package (numpy, pandas, etc.) in your virtualenv.

Ribwort answered 21/3, 2013 at 6:23 Comment(0)

Virtualenvs do not deal with C dependencies, which may be an issue depending on how keen you are about reproducible builds and capturing all of the machine setup in one process. You might end up needing to install C libraries through another package manager such as brew, apt, or rpm, and these dependencies can differ between machines or change over time. To avoid this, you might end up using docker and friends, which then adds another layer of complexity.

conda tries to address the non-python dependencies. The issue is that it is bigger and slower.
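One way to at least notice a missing C library early is a check like the one below. It is a sketch only: `missing_c_libraries` is a hypothetical helper, the library names in the usage comment are placeholders, and `ctypes.util.find_library` only approximates what the system linker would find.

```python
from ctypes.util import find_library


def missing_c_libraries(names):
    """Return the library names that cannot be located on this machine."""
    # find_library returns None when it cannot locate a shared library.
    return [name for name in names if find_library(name) is None]


# Hypothetical usage; replace with the libraries your packages link against:
# print(missing_c_libraries(["openblas", "lapack"]))
```

This does not install anything. It only reports what the virtualenv cannot provide for you.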

Irresolute answered 6/10, 2022 at 11:14 Comment(0)
