Stop the “Works on My Machine” Syndrome

You are not an advanced programmer if you have never faced the "works on my machine" syndrome (and you are a liar if you claim you have never said it to others). If you work in a big team, you will sooner or later hit a problem where certain code (written by you, or a critical third party component) only works with specific versions of its dependencies.

This scenario becomes a nightmare when system libraries and tools (like compilers) are thrown into the "only works with version X" mix. One Python project I knew of only worked with a five year old version of MySQL, and a specific version of Perl. I never understood why they needed Perl for a Python project.

Some general tips

Every time you install a library, or use one provided by the system or the compiler/IDE/etc., make sure you note its version. Even better, put it in source control. At one place I worked, we used a proprietary compiler (because we used a certain embedded processor). The company often updated its libraries without caring about backward compatibility. To make it worse, these libraries would be silently updated when you upgraded the IDE (which you had to do to use newer chips). We ended up adding every single library we used to source control.
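
For Python projects, a lightweight way to do this (a minimal sketch, assuming you use pip) is to pin exact versions in a requirements.txt file that lives in source control:

    # Record the exact version of every installed library
    pip freeze > requirements.txt

    # Later, on another machine, reproduce the same set
    pip install -r requirements.txt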

Make sure all libraries are called explicitly, so there are no hidden dependencies.

There is no simple solution, unfortunately. Today, however, we will talk only about isolating Python code. There are several ways to do it, and you should be aware of all of them so you can choose the best tool for the task.

Isolating Python code

Python stores its libraries in a central, system-wide location, so you can't simply check them into source control the way we checked in our compiler libraries.

Instead, you must create isolated instances of Python, which is possible because you can have multiple Python installs (including multiple versions) on the same machine. Here are a few ways to isolate your Python code:

1. Virtualenv

Virtualenv is the most popular and standard method of creating an isolated Python instance. It sets up a local copy of Python in a directory of your choice, so you control exactly which versions of your libraries you are using.

Virtualenvs are activated using an activate script, which changes your PATH so that you end up using your local Python instead of the system Python. On Linux you can also skip activation and just give the full path to your virtualenv's Python, like this (a minimal sketch; myenv and myscript.py are placeholder names):
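
    # Create an isolated environment in the directory "myenv"
    virtualenv myenv

    # Option 1: activate it, putting myenv/bin first on your PATH
    source myenv/bin/activate
    python myscript.py        # now runs the virtualenv's Python

    # Option 2 (Linux): no activation, call the interpreter directly
    myenv/bin/python myscript.py
    myenv/bin/pip install requests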

Virtualenv is the most common approach, so most people think there are no problems with it. Many Python programmers are primarily web programmers, and that is the only world they know. In that world, virtualenv is good enough.

The problems start when you want to use Python for engineering/science. Many scientific libraries like NumPy need to be compiled from source, and pip doesn't handle that very well. On Linux, you usually have all the compilers and tools you need, but these are rarely present on Windows.

2. Conda

For people on Windows, I usually recommend Anaconda. It solves most of the install problems for libraries like SciPy/NumPy.

Anaconda is based on Conda, which, like virtualenv, allows you to create sandboxed environments for running Python code. Unfortunately, while Conda environments work great on Linux, I could never get them to work on Windows.
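
Here is a minimal sketch of the Conda workflow (the environment name and package versions are placeholders, and the activate syntax shown is the Linux one):

    # Create an environment with pinned Python and NumPy versions
    conda create --name science python=2.7 numpy

    # Activate it (on Linux; on Windows the command is "activate science")
    source activate science

    # Install further packages into the active environment
    conda install scipy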

3. Virtual Machine

Virtual machines have several problems. They are several gigabytes in size, which makes them hard to distribute. Not only that, many VMs come with a lot of decisions already made for you: versions of important tools, databases, etc. Virtual machines weren't really that easy to use.

Until Vagrant came along, that is. It completely changed the game. Vagrant's images are tiny, just a few hundred megabytes. You choose which libraries to install, and even better, you can put the install scripts in version control. That means you can easily share your VM with others.

Vagrant also supports Puppet/Chef/Salt/Ansible, which means you can deploy your code to the server without much effort either.

Vagrant is now my recommended choice. For small, personal projects, it doesn’t matter so much, but if you work in a team and want to share your code, there is nothing better than Vagrant.

If you are using Vagrant, make sure you provide a complete Vagrantfile (i.e., one that contains the scripts to install any libraries/tools you need), so that all the end user has to do is type vagrant up. Don't make the user struggle with installing libraries, as that defeats the purpose of using Vagrant.
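
Here is a minimal sketch of such a self-contained setup (the box name and package list are assumptions; writing the Vagrantfile from a shell heredoc is just for illustration, since normally you would commit the file itself):

    cat > Vagrantfile <<'EOF'
    Vagrant.configure("2") do |config|
      config.vm.box = "ubuntu/trusty64"
      # Provisioning runs once on "vagrant up", so users never
      # have to install anything by hand
      config.vm.provision "shell", inline: <<-SHELL
        apt-get update
        apt-get install -y python-pip
        pip install -r /vagrant/requirements.txt
      SHELL
    end
    EOF

    # One command builds and provisions the entire VM
    vagrant up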

4. Docker

Docker provides another layer of isolation. Unlike a VM, it uses lightweight containers that run on top of the operating system, sharing the host's kernel instead of booting a full guest OS.
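
As a rough sketch of the idea (the image tag, file names, and commands are placeholder assumptions), a containerized Python app is described by a Dockerfile and run with two commands:

    cat > Dockerfile <<'EOF'
    FROM python:2.7
    # Install pinned dependencies, then add the application code
    COPY requirements.txt /app/
    RUN pip install -r /app/requirements.txt
    COPY . /app
    CMD ["python", "/app/main.py"]
    EOF

    docker build -t myapp .    # bake the environment into an image
    docker run --rm myapp      # run it in an isolated container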

While I understand the theory behind Docker, I'm not really sure when or why you would use it. If you have used Docker and have a killer use case for it, feel free to share it in the comments.

If I have missed any other way to isolate your code, please share that as well, and I’ll update the blog.

PS: Interested in leveling up your Python and getting a great job? Check out the Python Apprenticeship Program.

12 thoughts on “Stop the “Works on My Machine” Syndrome”

  1. Docker is like VirtualEnv on crack. It’s not a case of when to use Docker, it’s a case of when NOT to use Docker. You can run your whole infrastructure using containers. It solves so many problems (including the one this article is all about).
    1. Immutable deployments
    2. Fully portable code
    3. No dependency issues (Virtualenv can’t save you when you start having to interface with other languages/systems/etc.; Docker can)
    4. Versionable stacks using Dockerfiles
    5. Scalable
    6. Forces you to write modularised applications
    7. Allows for easy integration/regression testing
    8. Other things I’ve forgotten right now but are also awesome.

  2. A couple other benefits of using Conda:
    1) It's language agnostic, so you can also use it to create packages for Node.js, FORTRAN, Go, R, and Ruby.
    2) You can browse packages on channels like https://binstar.org/javascript and then install them with conda install --channel iojs
    3) Like virtualenv, you can create an environment in conda with conda create --name shrubbery python=2.6, or, if you have an environment.yml file with all the packages you want to install listed, conda env create --file ~/path/to/environment.yml. To activate the environment, run source activate environmentname.

    1. Thanks Andrew, I did not know that you could create packages for other languages.

      I do have problems with your other 2 points:

      1. Binstar.org doesn’t work so well. I tried to install OpenCV from it, but it never worked. I then tried it on Linux, and it worked. But there was no documentation to suggest that the files were Linux-only.

      2. I have never managed to get Conda environments to work on Windows, though it works on Linux.

  3. 5. Vendor your dependencies.

    Sticking all of your deps in your project makes them easy to search, deploy, and patch if necessary. It stops new developers from downloading from PyPI each time. It also means you don’t have to install the one thing that makes deployment and versioning a nightmare: Setuptools.

    Unfortunately this method does require some more manual labor when you first use a package, especially for packages with native extensions. But until we get a real solution to this problem, it’s the one with the least friction.
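
    For example, a minimal sketch of one way to vendor (the package, version, and paths are placeholders; pip's --target flag is assumed):

        # Install a pinned package into a vendor/ directory inside the repo
        pip install --target=vendor requests==2.5.1

        # Run the app with the vendored packages first on the module path
        PYTHONPATH=vendor python main.py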

  4. Don’t forget Buildout! It’s similar to virtualenv, but keeps everything sandboxed in the project directory, and can run “recipes” to do common tasks like set up databases, compile C libraries, etc.

    1. Thanks Philip.

      I looked at the official documentation, and I must confess I have no idea how it works, or what it does.

  5. I have been a huge fan of isolated environments for development, experimentation, or simply getting my hands on something I want to play around with. I spent nearly 10 years in the Java space and continue to work professionally in it, though I am puzzled that I never picked up Python, which I have been using for the last 6 months or so. Let me cut to the chase: I was using VirtualBox long before Vagrant came along, and Vagrant saved me a lot of the headaches of creating VMs, choosing an OS, etc., which you already know.

    But what Docker has done is give me containers which I can reuse again and again with different application code bases.
    Say I deploy a Flask application, App A, in Vagrant, running on port 8080. Now if I want to run App B, I either have to add it into the same shared folder in Vagrant, or spin up another Vagrant VM to run App B in.

    Docker solves this: I create a base image for Flask applications, and all I need to do is start containers from it, with volumes mounted, under different instance names: instanceA running App A and instanceB running App B. Running the same Flask platform in two containers gives me isolation of both code base and runtime, which is great, as I do not want App B's code in instance A or App A's code in instance B.

    Long story short, I can run one Vagrant box hosting multiple Docker containers that I can play around with and destroy. It is more efficient to do all this on my Mac with 8 GB of RAM: I allocate the Vagrant instance 2 GB and I am done, running multiple containers inside it.

    If I instead had to run two Vagrant VMs at 2 GB each, I would be eating into my host OS, which can significantly hurt the performance of the runtime environment, even for dev testing.

    I am soon going to write this up on my blog, though I must warn you that I am a novice blogger. I have seen huge gains in my process using Vagrant + CoreOS (my Linux OS of choice, as it has native support for Docker; you can run on Ubuntu as well) to host Docker containers based on images.
