Friday, December 6, 2013

Why I promote conda

Anaconda users have been enjoying the benefits of conda for quickly and easily
managing their binary Python packages for over a year.  During that time conda
has also been steadily improving as a general-purpose package manager.  I
have recently been promoting the very nice things that conda can do for Python
users generally --- especially with complex binary extensions to Python as
exist in the NumPy stack.   For example, it is very easy to create Python 3
environments and Python 2 environments on the same system and install
scikit-learn into them.   Normally, this process can be painful if you
do not have a suitable build environment, or don't want to wait for
compilation to succeed.

Naturally, I sometimes get asked, "Why did you promote/write another
python package manager (conda) instead of just contributing to the
standard pip and virtualenv?"  The python packaging story is older and
more personal to me than you might think.  Python packaging has been a thorn
in my side personally since 1998 when I released my first Python extension
(called numpyio actually).  Since then, I've written and personally released
many, many Python packages (Multipack which became SciPy, NumPy, llvmpy,
Numba, Blaze, etc.).   There is nothing you want more as a package author than
users.  So, to make Multipack (SciPy), then NumPy available, I had to become a
packaging expert by experiencing a lot of pain with the lack of
suitable tools for my (admittedly complex) task.

Along the way, I've suffered through believing that distutils,
setuptools, distribute, and pip/virtualenv would solve my actual
problem.  All of these tools provided some standardization (at least around what somebody
types at the command line to build a package) but no help in actually doing the
build and no real help in getting compatible binaries of things like SciPy
installed onto many users' machines.

I've personally made terrible software engineering mistakes because of the lack of
good package management.  For example, I allowed the pressure of "no ABI
changes" to severely hamper the progress of the NumPy API.  Instead of pushing
harder and breaking the ABI when necessary to get improvements into NumPy, I
buckled under the pressure and agreed to the requests coming mostly from NumPy
Windows users and froze the ABI.  I could empathize with people who would spend
days building their NumPy stack and literally become fearful of changing it.
From NumPy 1.4 to NumPy 1.7, the partial date-time addition caused various
degrees of broken-ness and is part of why missing data data-types have never
shown up in NumPy at all.   If conda had existed back then with standard
conda binaries released for different projects, there would have been almost
no problem at all.   That pressure would have largely disappeared.   Just
install the packages again --- problem solved for everybody (not just the
Linux users who had apt-get and yum).

Some of the problems with SciPy are also rooted in the lack of good packages
and package management.  SciPy, when we first released it in 2001, was
basically a distribution of multiple modules from Multipack, some new BLAS /
LAPACK and linear algebra wrappers and nascent plotting tools.  It was a SciPy
distribution masquerading as a single library.  Most of the effort spent was
a packaging effort (especially on Windows).  Since then, the scikits effort
has done a great job of breaking up the domain of SciPy into more manageable
chunks and providing a space for the community to grow.   This kind of
refactoring is only possible with good distributions and is really only
effective when you have good package management.   On Mac and Linux,
package managers exist --- on Windows, things like EPD, Anaconda, or C.
Gohlke's collection of binaries have been the only solution.

Through all of this work, I've cut my fingers and toes and sometimes face on
compilers, shared and static libraries on all kinds of crazy systems (AIX,
Windows NT, etc.).  I still remember the night I learned what it meant to have
ABI incompatibility between different compilers (try passing structs
such as complex-numbers between a file compiled with mingw and a library compiled with
Visual Studio).   I've been bitten more than once by unicode-width
incompatibilities, strange shared-library incompatibilities, and the vagaries
of how different compilers and run-times define the `FILE *` file pointer.

In fact, if you have not read "Linkers and Loaders", you should actually do
that right now as it will open your mind to that interesting limbo between
"developer-code" and "running process" overlooked by even experienced
developers.  I'm grateful Dave Beazley recommended it to me over 6 years ago.

We in the scientific python community have had difficulty and a rocky
history with just waiting for the community to solve the
problem.  With distutils for example, we had to essentially re-write
most of it (as numpy.distutils) in order to support compilation of
extensions that needed Fortran-compiled libraries.  This was not an
easy task.  All kinds of other tools could have (and, in retrospect,
should have) been used.  Most of the design of distutils did not help
us in the NumPy stack at all.  In fact, numpy.distutils replaces most
of the innards of distutils but is still shackled by the architecture
and imperative approach to what should fundamentally be a declarative
problem.  We should have just used or written something like waf or
bento or cmake and encouraged its use everywhere.  However, we buckled
under the pressure of the distutils promise of "one right way to do
it" and "one-size fits all" solution that we all hoped for, but
ultimately did not get.  I appreciate the effort of the distutils
authors.  Their hearts were in the right place and they did provide a
useful solution for their use-cases.  It was just not useful for ours,
and we should not have tried to force the issue.  Not all code is
useful to everyone.  The real mistake was the Python community picking
a "standard" that was actually limiting for a sizeable set of users.
This was the real problem --- but it should be noted that this
"problem" is only because of the incredible success and therefore
influence of Python developers.  With this influence, however,
comes a certain danger of limiting progress if all advances have to be
made via committee --- working out specifications instead of watching for
innovation and encouraging it.

David Cooke and many others finally wrestled numpy.distutils to the
point that the library does provide some useful functionality for
helping build extensions requiring NumPy.  Even after all that effort,
however, some in the Python community seem to have no idea of the
history of how these things came about and simply claim that
files that need numpy.distutils are "broken" because they import numpy
before "requiring" it.  To this, I reply that what is actually
broken is the design that does not have a declarative meta-data file
that describes dependencies and then a build process that creates the
environment needed before running any code to do the actual build.
This is what `conda build` does and it works beautifully to create any
kind of binary package you want from any list of dependencies you may
have.  Anything else is going to require all kinds of "bootstrap"
gyrations to fit into the square hole of a process that seems to
require that all things begin with the python incantation.
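
The idea of reading dependencies from declarative meta-data and preparing the environment before any build code runs can be sketched in a few lines of Python. This is only an illustration of the principle, not how `conda build` is implemented: the `RECIPES` dictionary, its `build_requires` key, and both function names are hypothetical, and the "build" step is a placeholder.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical declarative meta-data: plain data that can be read without
# executing any of the packages' own build code.
RECIPES = {
    "scipy": {"build_requires": ["python", "numpy"]},
    "numpy": {"build_requires": ["python"]},
    "python": {"build_requires": []},
}


def build_order(recipes):
    """Order packages so each one comes after everything it requires."""
    graph = {name: set(meta["build_requires"]) for name, meta in recipes.items()}
    return list(TopologicalSorter(graph).static_order())


def build_all(recipes):
    """Work out each build environment from meta-data, then 'build' the package."""
    built = set()
    steps = []
    for name in build_order(recipes):
        env = sorted(recipes[name]["build_requires"])
        missing = [req for req in env if req not in built]
        if missing:
            raise RuntimeError(f"{name} needs unbuilt packages: {missing}")
        # Only at this point would the package's own build script run,
        # inside an environment that already contains everything in `env`.
        steps.append((name, env))
        built.add(name)
    return steps


for name, env in build_all(RECIPES):
    print(f"build {name} in environment {env}")
```

Because the requirements are data rather than code, numpy is already present by the time anything belonging to scipy is executed, so no "bootstrap" step is needed.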

Therefore, you can't really address the problem of Python packaging without
addressing the core problems of trying to use distutils (at least for the
NumPy stack).  The problems for us in the NumPy stack started there and have
to be rooted out there as well.  This was confirmed for me at the first PyData
meetup at Google HQ, where several of us asked Guido what we can do to fix
Python packaging for the NumPy stack.   Guido's answer was to "solve the
problem ourselves".  We at Continuum took him at his word.  We looked at dpkg,
rpm, pip/virtualenv, brew, nixos, and 0installer, and used our past experience
with EPD.  We thought hard about the fundamental issues, and created the conda
package manager and conda environments.  Those of us who have been working on
this for the past year have decades of Python packaging experience between us.
Peter Wang, Ilan Schnell, Bryan Van de Ven, Mark Wiebe, Trent Nelson, Aaron
Meurer, and now Andy Terrel are all helping improve things alongside me.  We
welcome contributions, improvements, and updates from anyone else, as conda is
BSD licensed, completely open source, and can be used and re-used by
anybody.  We've also recently created a conda mailing list, which is open to anyone to join and participate in.

Conda pkg files are similar to .whl files except they are Python-agnostic.  A
conda pkg file is a bzipped tar file with an 'info' directory, and then
whatever other directory structure is created by the install process in
"prefix".   It's the equivalent of taking a file-system diff pre- and
post-install and then tarring the result up.  It's more general than .whl files and
can support any kind of binary file.    Making conda packages is as simple as making a recipe for it.   We make a growing collection of public-domain, example recipes available to everyone and also encourage attachment of a conda recipe directory to every project that needs binaries.
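
The "file-system diff, then tar" idea can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not conda's actual build code: the function names are made up, the install step is simulated by writing two files, and `info/metadata.json` with its fields is a hypothetical stand-in for whatever the real 'info' directory contains.

```python
import io
import json
import tarfile
import tempfile
from pathlib import Path


def snapshot(prefix):
    """Map every file under prefix to a (size, mtime) signature."""
    result = {}
    for path in prefix.rglob("*"):
        if path.is_file():
            stat = path.stat()
            rel = path.relative_to(prefix).as_posix()
            result[rel] = (stat.st_size, stat.st_mtime_ns)
    return result


def build_package(prefix, before, after, name, version, out_dir):
    """Tar up the files that are new or changed since the first snapshot."""
    changed = sorted(f for f, sig in after.items() if before.get(f) != sig)
    pkg_path = out_dir / f"{name}-{version}.tar.bz2"
    # Hypothetical meta-data layout, for illustration only.
    meta = json.dumps({"name": name, "version": version, "files": changed})
    data = meta.encode("utf-8")
    with tarfile.open(pkg_path, "w:bz2") as tar:
        info = tarfile.TarInfo("info/metadata.json")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
        for rel in changed:
            tar.add(prefix / rel, arcname=rel)
    return pkg_path


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        prefix = root / "prefix"
        (prefix / "lib").mkdir(parents=True)
        (prefix / "lib" / "existing.txt").write_text("already installed\n")
        before = snapshot(prefix)

        # Stand-in for a real install step that writes into the prefix.
        (prefix / "bin").mkdir()
        (prefix / "bin" / "tool").write_text("#!/bin/sh\necho hello\n")
        (prefix / "lib" / "libtool.so").write_bytes(b"\x7fELF")
        after = snapshot(prefix)

        pkg = build_package(prefix, before, after, "tool", "1.0", root)
        with tarfile.open(pkg, "r:bz2") as tar:
            return sorted(tar.getnames())


print(demo())
```

Nothing in the sketch cares whether the new files are Python modules, shared libraries, or executables, which is the sense in which such a format is Python-agnostic.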

At the heart of conda package installation is the concept of environments.
Environments are like namespaces in Python -- but for binary packages.  Their
applicability is extensive.  We are using them within Anaconda and Wakari for
all kinds of purposes (from testing to application isolation to easy
reproducibility to supporting multiple versions of packages in different
scripts that are part of the same installation).  Truly, to borrow the famous
Tim Peters' quip: "Environments are one honking great idea -- let's do more of
those".  Rather than tacking this on after the fact like virtualenv does to
pip, OS-level environments are built-in from the beginning.  As a result,
every conda package is always installed into an environment.  There is a
default (root) environment if you don't explicitly specify another one.
Installation of a package is simply merging the unpacked binary into the union
of unpacked binaries already at the root-path of the environment.   If union
filesystems were better implemented in different operating systems, then each
environment would simply be a union of the untarred binary packages.  Instead
we accomplish the same thing with hard-linking, soft-linking, and (when
necessary) copying of files.
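
A rough sketch of that merging step, assuming each package has already been unpacked into its own directory: every file is hard-linked into the environment, with a plain copy as the fallback when linking is not possible. The function name and directory layout are illustrative assumptions rather than conda's real internals, and soft-linking is left out for brevity.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_package(pkg_dir, env_prefix):
    """Merge one unpacked package into an environment, file by file."""
    linked = []
    for src in sorted(pkg_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(pkg_dir)
        dst = env_prefix / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        if dst.exists():
            dst.unlink()  # a later package replaces an earlier file
        try:
            os.link(src, dst)  # share the bytes with the unpacked package
        except OSError:
            shutil.copy2(src, dst)  # e.g. different device, no hard links
        linked.append(rel.as_posix())
    return linked


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg_a = root / "pkgs" / "a-1.0"
        pkg_b = root / "pkgs" / "b-1.0"
        (pkg_a / "lib").mkdir(parents=True)
        (pkg_a / "lib" / "a.txt").write_text("package a\n")
        (pkg_b / "bin").mkdir(parents=True)
        (pkg_b / "bin" / "b.txt").write_text("package b\n")

        env_one = root / "envs" / "one"
        env_two = root / "envs" / "two"
        link_package(pkg_a, env_one)
        link_package(pkg_a, env_two)
        link_package(pkg_b, env_two)

        def listing(env):
            files = (p for p in env.rglob("*") if p.is_file())
            return sorted(p.relative_to(env).as_posix() for p in files)

        return listing(env_one), listing(env_two)


print(demo())
```

With hard links, the same unpacked package can appear in many environments without its contents being duplicated on disk, which is what makes creating additional environments cheap.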

The design is simple, which helps it be easy to understand and easy to
mix with other ideas.  We don't easily see how to take these simple,
powerful ideas and adapt them to .whl and virtualenv, which are trying
to fit into a world created by distutils and setuptools.  It was
actually much easier to just write our own solution and create
hundreds of packages and make them available and provide all the tools
to reproduce what we have done inside conda than to try and untangle
how to provide our solution in that world and potentially even not
quite get the result we want (which can be argued is what happened
with numpy.distutils).

You can use conda to build your own distribution of binaries that
compete with Anaconda if you like.  Please do.  I would be completely
thrilled if every other Python distribution (EPD,
ActiveState, etc.) just used conda packages that they build and in so
doing helped improve the conda package manager.  I recognize that
conda emerged at the same time as the Anaconda distribution was
stabilizing and so there is natural confusion over the two.  So,
I will try to clarify: Conda is an open-source, general,
cross-platform package manager.  One could accurately describe it as a
cross-platform Homebrew written in Python.  Anyone can use the tool and
related infrastructure to build and distribute whatever packages they want.

Anaconda is the collection of conda packages that we at Continuum provide for
free to everyone, based on a particular base Python we choose (which you can
download as Miniconda).  In the past it has
been some work to get conda working outside Miniconda or Anaconda because our
first focus was creating a working solution for our users.  We have been
fixing those minor issues and have now released a version of conda that can be
'pip installed'.   As conda has significant overlap with virtualenv in
particular, we are still working out kinks in the interop of these two
solutions.   But, it all can and should work together and we fix issues as
quickly as we can identify them.

We also provide a service called Binstar (register with beta-code
"binstar in beta") which allows you to host your own binary conda packages.
With this missing piece, you just tell people to point their conda
repositories to your collection -- and they can easily install everything you
want them to.  You can also build your own conda repositories and host them on
your own servers.  It all works, today, now -- for hundreds of thousands of
people.  In this context, Anaconda could be considered a "reference"
distribution and a proof of concept of how to use the conda package manager.
Wakari also uses the conda package manager at its core to share bundles.
Bundles are just conda packages (with a set of dependencies) and capture the
core problems associated with reproducible computing in a light-weight and
easily reproduced way.  We have made the tools available for *anyone* to re-
create this distribution pretty easily and compete with us.

It is very important to keep in mind that we created conda to solve
the problem of distributing an environment to end-users that allows
them to do advanced data analytics, scientific discovery, and general
engineering work.  Python has a chance to play a major role in this
space.  However, it is not the only player.  Other solutions exist in
the space we are targeting (SAS, Matlab, SPSS, and R).  We want Python
to dominate this space.  We could not wait for the packaging solution
we needed to evolve from the lengthy discussions that are on-going
which also have to untangle the history of distutils, setuptools,
easy_install, and distribute.  What we could do is solve our problem
and then look for interoperability and influence opportunities once we
had something that worked for our needs.   That is the approach we took
and I'm glad we did.  We have a working solution now which benefits
hundreds of thousands of users (and could benefit millions more if
IT administrators recognized conda as an acceptable packaging approach
from others in the community).

We are going to keep improving conda until it becomes an obvious
solution for everyone: users, developers, and IT administrators alike.
We welcome additions and suggestions that allow it to interoperate
with anything else in the Python packaging space.   I do believe that the group of people working on Python packaging and Nick Coghlan in particular are doing a valuable service.  It's a very difficult job to take into account the history of Python packaging, fix all the little issues around it, *and* provide a binary distribution system that allows users to not have to think about packaging and distribution.    With our resources we did just the latter.   I admire those who are on the front lines of the former and look to provide as much context as I can to ensure that any future decisions take our use-cases into account.   I am looking forward to continuing to work with the community to reach future solutions that benefit everyone.

If you would like to see more detail about conda and how it can be used, here are some resources:

Talk at PyData NYC 2013:
 - Slides:
 - Video:

Blog Posts:

Mailing list:


  1. Note that if you use the sysconfig "data" directory in wheels, then it will reproduce the *exact* layout used inside the wheel on the target filesystem (relative to the installation root). Most of the time people don't want that though - they want the cross-platform abstraction provided by the sysconfig schema that allows installers to take care of mapping from semantic labels to filesystem locations.

  2. Thanks for pointing that out. I believe there are a lot of similarities between .whl and conda packages. In fact, there should be a way for the two formats to be understood by installers of each.

    The .whl specification has been evolving at the same time and the two formats can learn from each other --- for example, conda is growing that same semantic label notion --- to allow one package to be built and re-used across multiple platforms where it is possible.

  3. Thanks Travis,

    Another great post. I'm hoping that we're going to be able to figure out how to leverage HashDist with and for Conda, and I am also for helping the scientific community standardize around Conda. It is a much bigger solution than just Python, and I am hoping we can get there as soon as possible.

  4. Hi, thanks for the great post!
    One thing that's keeping me from using conda is that the packages always seem a bit behind (in time) of what you would get when using pip. Can you actually mix pip and conda or is it not advisable?

    1. It's completely fine to mix pip and conda. We do it all the time. For example, I start with a conda environment and then pip install things that might not have conda equivalents.

      But, lately, I try conda build --build-recipe and typically can get a conda package if it's a straightforward pip package.

      That way I also get conda packages that people can relocate and create environments with easily.

  5. "Anaconda is the collection of conda packages that we at Continuum provide for free to everyone, based on a particular base Python we choose (which you can download as Miniconda)." What does "which you can download" refer back to? "base Python"?, or "conda packages"? Or "Anaconda"? None of these seem quite right, particularly since "conda packages" is apparently what Anaconda is, not Miniconda. Confused!

  6. What I am looking for is some documentation & examples (hopefully written by Anaconda experts) about how to make use of conda in the context of Apache/mod_wsgi - similar to the way virtualenv is used here:



  8. So would conda be a good way to set up a development environment for SciPy? says to use virtualenv, but it seems like conda does the same thing as virtualenv but better? It would be a better fit for trying the same code out in multiple versions of the dependencies? I've tried to set this up on Windows and it failed, couldn't find packages for ATLAS, etc. If you could write up a walkthrough on how to create a Windows development environment for SciPy using the github master and conda that would be very helpful.