Python & Virtual Environment#
Introduction#
Python versions and Virtual environment#
These days, Python is probably the most popular programming language – as a researcher, we want to be rigorous so let’s keep the word ‘probably’ here. In fact, according to the programming language ranking from multiple resources, Python is indeed #1 in terms of the popularity, e.g., the one by TIOBE Index
[TIOBE, 2024] and IEEE Spectrum
[Cass, 2024]. We love Python because it is easy to learn and use. Meanwhile, we have a large Python community supporting the development, package sharing and Q&A. Together with its popularity and large community support comes enormous tools (which then unavoidably comes with a lot of jargons). One side, this is definitely a good thing since we have a lot of tools to turn to, but the down side is also obvious – we have too many to choose from. As a non-programmer, quite often we found ourselves not even knowing what those tools are for – the only thing, we, as researchers, care about is probably just that we have a Python script and when we do python my_script.py
, it gives me a meaningful output following the physics we put into the script, and that’s it. So, why should we care about those tools and jargons? Yes, that is true – we really don’t need to care about those annoying stuff, until we have to. Why? Well, there are several obvious reasons that immediately pop up,
The language is actively progressing. Developers are working on adding in more functionalities, making it easier to use, and probably faster to run. Meanwhile, we are also progressing. The requirement for features in the language on our side as users will also change from time to time, or from here to there. Therefore, quite often we would find that some useful features that we need are only available in a certain version of Python but meanwhile some other features that we also need at some point are only available in another version of Python. In this case, what do we do?
Sometimes, even the feature requirement does not change, there will be factors that may force us to make changes. One typical reason is
security
. For example, Python2.0
was first released in2000
and finally came to its end of life in2020
. People have been using it for a long time but as it is coming to its end of life, no support will be provided by the community and therefore vulnerabilty will never be patched, which will bring in serious security concerns if we keep using it. In this case, there is nothing we can do but to switch to a higher version of Python. Then at the transition stage, say, we still need some features in Python2.0
for some of our daily routines while being forced to switched to Python3.x
, what do we do?The Python community is providing us with a lot of modules (e.g.,
matplotlib
for plotting,numpy
andscipy
for scientific calculations) as building blocks so that we don’t need to always reinvent the wheel. Again, the language itself is progressing, the modules themselves are also progressing, which naturally brings in the version compatability issue. If we need the feature available innumpy
versionx.y.z
which is only compatible with Python version<=i.j.k
but we also need the feature available innumpy
versionp.q.s
which is only compatible with Python version>=l.m.n
. Then, what do we do?
To deal with such complex scenarios, the virtual environment
comes to rescue. It basically puts a certain version of Python and all the compatible dependencies in a self-contained
environment and we can have multiple such environments that are independent from each other so they are not interfering each other. Probably, we all know the conda environment
with this regard. We all know it because it is probably the most popular one, among all the alternatives. In fact, for many people, conda environment
would be used interchangeably with virtual environment
. Anyhow, it is an environment
and who cares whether it is donda environment
, fonda environment
, or whatever environment
… Yes, that is true and indeed we don’t care, until we have to.
Tools for virtual environment management#
Concerning the virtual environment
, we have multiple alterntive tools to set up such self-contained virtual environments
. Typical examples are virtualenv
, mamba
, pixi
, peotry
and uv
. Maybe there are more, but those are what I am aware of. conda
is one of those tools and it is a bit special and is worth some special notes here. We have two different conda
’s – one is the conda
from the Anaconda
company and the other is the conda
from the conda-forge
community. It is something associated with the Anaconda
version of conda
that is becoming non-free now (though, it is not the conda
tool that is becoming non-free, as we will see later), but the conda-forge
community version of conda
is still all free. In terms of the usage, the two versions of condas
do not really make any differences. The conda
commands that we execute on a daily base are basically the same between the two versions. Another tool to specially mention is mamba
– it is also coming from conda-forge
and works very much like conda
. For many typical commands we were executing before with conda
, like conda env list
, conda activate
, etc., we only need to replace conda
with mamba
.
Packages#
It is actually not true that the Anaconda
and conda-forge
version of conda
is all the same. The major difference is on the channel to pull the conda packages
from. Here, we come across with the terminology of conda package
. Literally, it means a package – developers implement some modules for certain purposes, like numpy
for scientific calculation, and they package the module up in a certain way and host it somewhere on the cloud so that users can install on their machines. conda
is one of the ways for packaging up modules and again, it is probably the most popular one. For conda packages
, they are hosted in the cloud in channels
which is just like a big pool of conda packages
. When uploading or downloading conda packages
, we need to specify the channel
to push into or pull from. If no channel
is specified for the uploading or downloading, thta means we have some default channel
being used there. If we are using the Anaconda
version of conda
, the default channel is just called defaults
. If the conda-forge
version of conda
is used, the default channel is conda-forge
. Now, we are ready to mention that it is those packages included in the defaults
channel maintained by Anaconda
that becomes non-free. This means,
If
conda packages
are installed from channels other than thedefaults
channel maintained byAnaconda
, they can be used, for free, and theAnaconda
version of theconda
tool can still be used – yes, theconda
tool itself is still free even though you installed it fromAnaconda
.If
conda packages
are installed from thedefaults
channel maintained byAnaconda
, theycannot
be used for free, for organizations with 200 or more people.
Apart from conda packages
, there are other ways to package up Python modules to be released for people to downloand and install. Another typical type of packaging is through the The Python Package Index (PyPI)
and the tool associated with the package installation is also very popular, namely pip
. One thing to note is, pip
is only a package installation tool but not
a virtual environment management tool. That means it does not have as powerful compatability checking capability as those environment management tools, such as conda
. Therefore, when using pip
to install packages into a virtual environment, it is possible that packages with conflicting dependencies can be installed into the same environment, which then will break the environment due to the conflicts.
Q & A#
Q: What is ‘conda’?
A: When we say conda
, we may refer to,
the
conda
tool for virtual environment management – basically, it is just theconda
command. As mentioned above, theconda
tool carries two possible meanings,the
conda
tool provided byAnaconda
, which uses thedefaults
channel maintained byAnaconda
as the default channel.the
conda
tool provided byconda-forge
, which uses theconda-forge
channel maintained by the community as the default channel.
the
conda
environment, i.e., the self-contained environment for Python and modules to live in, independentlythe
conda
package, i.e., the package likenumpy
that incorporates the modules to be used in our Python script regarding certain functionalities.
Q: How is the commercial to non-commercial switching for the virutal environment tools relevant to me?
A: It depends. Two scenarios here,
On ORNL Analysis, many data reduction and analysis software are deployed with
conda environment
and the switching to non-commercial tools will have impacts on all of them. However, in this case, as general users of those software, we are not supposed to see any differences in our daily routines. For example, if we want to launch theMantid Workbench
, we still domantidworkbench
.If we want to set up virtual environments on our own, we need to take some actions – details will be covered in the next question.
Q: What should I do regarding the commercial to non-commercial switching?
A: First, we need to make it clear that the conda
tool itself is free
and it is only the packages in the defaults
channel maitained by Anaconda
is non-free
. So, no matter whether the conda
tool was installed via Anaconda
or conda-forge
, the conda
command can still be used without any problems. Action is potentially needed regarding those packages we previously installed via conda
. For example, I have a conda environment
called my_env
and I was ever installing some packages through the defaults
channel into the my_env
environment, then I can no longer use those packages in the my_env
environment. If it is just several packages that become non-usable, we can first execute conda list --show-channel-urls
to see what packages were installed via the defaults
channel and then use conda remove the_package
to remove those packages. Otherwise, we need to remove the my_env
environment, create a new one and reinstall those packages via the conda-forge
channel or whatever other channels. It can be tedious if we reinstall all those packages one by one. The following steps could be followed as a quicker solution,
Export the existing conda environment to a
YAML
file for later use for rebuilding the environment. Assuming the conda environment is calledmy_env
, the export command will be,conda env export -n my_env > environment.yml
Although the
conda
command installed viaAnaconda
is still free, I personally think it is better to uninstall it and install theconda-forge
version ofconda
, the reason being that theAnaconda
version ofconda
is using thedefaults
channel by default and it is highly possible then to hit thenon-free
zone if we are still using theAnaconda
version ofconda
for virtual environments management and packages installation. To uninstallconda
, refer to the instructions here.Install a new virtual environment management tool. The
conda-forge
version ofconda
is for sure a very good solution, since most of theconda
commands that we used before will stay the same. For the installation, see here.Rebuild the environment using the new installed
conda
with the previously exportedYAML
file,conda env create -f environment.yml
Another alternative environment too to choose is
mamba
, which is also provided byconda-forge
. Installation instructions can be found here. The procedure for rebuilding conda environments will be the same as above,except
that we need to replaceconda
withmamba
in those commands.
Q: I heard of tools like ‘pixi’, etc. Should we worry about using them?
A: No, if you don’t want to. Having the conda-forge
version of conda
or the mamba
tool should be good enough for our needs. But if you do want to give it a try, check out their website here. Also, quite a few other options exist, including poetry and uv.