Dear Python enthusiasts,
we post in this thread the transcript of the Questions & Answers sessions that took place during the live webinars on “Interactive Bioimage Analysis with Python and Jupyter” from the NEUBIAS Academy. In this first post you will find the content of the May 7th 2020 session. The content of the May 13th session can be found further down in this thread.
To simplify browsing, questions have been grouped thematically and the answers slightly edited or completed. Credit for answering all questions goes to the moderating team:
- Mykhaylo Vladymyrov, Theodor Kocher Institut, Bern University
- Cédric Vonesch, Science IT Support, Bern University
- Dominik Kutra, Kreshuk Group, EMBL Heidelberg
Don’t hesitate to post your comments, ask for clarifications or correct any errors we might have made!
The webinar session can be viewed on YouTube here for the May 7th session and here for the May 13th session. The complete course material is available on the neubias_academy_biapy GitHub repository and the course slides are also available online. Note also that you will be able to run the course material interactively beyond the live course period via mybinder.org or Google Colab.
Can Jupyter show the list of variables and functions defined by the user, like other IDEs do (Matlab, Spyder)?
Is there a quick way to see the state of variables (as in Matlab, for example)?
Jupyter has an extension for this: https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/varInspector/README.html. Jupyterlab also has such an extension: https://github.com/lckr/jupyterlab-variableInspector
Can I also execute code other than Python, such as shell scripts, in the notebook?
You can execute any shell command if you start the line with an exclamation mark, e.g. !ls. You can also run an entire cell as shell commands if its first line is a cell magic such as %%bash. Alternatively, any process can be started using Python's own facilities for spawning processes (e.g. the subprocess module).
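For illustration, a minimal sketch (each snippet goes into its own notebook cell):

```python
# Cell 1: a single shell command, prefixed with "!"
!ls -lh

# Cell 2: the %%bash cell magic must be the first line of its own cell
%%bash
echo "running this cell as a bash script"
ls
```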
How can I get the kernels in different conda environments displayed in the jupyter notebook?
You cannot switch environments from Jupyter. Instead you spawn a Jupyter instance from the desired environment. You can however switch kernels (e.g. Python versions) and have them displayed as options following these instructions: https://ipython.readthedocs.io/en/latest/install/kernel_install.html#kernels-for-different-environments
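A minimal sketch of the approach from the linked instructions, run in a terminal (the environment name myenv is just a placeholder):

```
conda activate myenv
conda install ipykernel
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"
```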
Somehow this should be super easy, but I cannot find it: how can you change to another harddisk with Jupyter Notebooks?
There is a configuration file in your user directory (under Windows: C:\Users\username\.jupyter) called jupyter_notebook_config.py where you can define the base directory for Jupyter using the variable c.NotebookApp.notebook_dir.
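For example (the path is just a placeholder; the config file is plain Python and can be generated with jupyter notebook --generate-config if it does not exist yet):

```python
# In jupyter_notebook_config.py
c.NotebookApp.notebook_dir = 'D:/my_notebooks'  # example path on another drive
```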
Hello. Are there any advantages of Jupyter notebooks over Google Colab?
For the present course: the Jupyter solution on Binder is completely “installation-free”: no login, no packages to install, no data to download. This comes at the price of shorter sessions and less computing power. In general: you can’t install Colab on your computer or on a cluster, so you are limited by what Google is offering you.
If I have code written with another editor (Atom for example), how can I take it to Jupyter and run it there? Thanks
If you have Python code, you can either import functions from it as a module into your Jupyter notebook, or directly copy the code if you want to continue developing that part.
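As a minimal sketch of the import route (the file my_analysis.py, the function segment_nuclei and the input file are hypothetical names):

```python
# my_analysis.py sits in the same folder as the notebook and defines segment_nuclei()
from my_analysis import segment_nuclei

from skimage import io
image = io.imread('nuclei.tif')   # example input image
labels = segment_nuclei(image)
```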
So what is actually saved if the variables are not defined? Is what I save not the code?
The source code is saved. But if you don’t have a running instance of python in the backend, the variables (allocated memory and variable state) do not exist. They are created when this code is run in a python instance.
How do I tell Jupyter where it should run (locally, server, cloud)?
Basically Jupyter is a kernel + a web server. It can be set up locally or on a server. Then you connect to it via a browser.
Let’s say that someone shared code with me that has different sections/functions besides the main one. How can I run this in Jupyter? Do I run only the main part, or all the sections/functions separately?
Typically you would go through each cell starting from the beginning by typing Shift+Enter. In the Kernel menu you can also use Restart & Run All to evaluate the entire notebook.
Can you run R code together with Python (in a different chunk) in Jupyter notebooks?
Yes, you can mix R and Python code in a single notebook. Using the rpy2 package (https://pypi.org/project/rpy2/) you can run a cell as R code by putting %%R in the first line of the cell, and you can even share variables between R and Python.
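A minimal sketch using rpy2's notebook extension (my_dataframe is a placeholder for a Python variable you already defined):

```python
# Cell 1: load the rpy2 notebook extension (requires rpy2 to be installed)
%load_ext rpy2.ipython

# Cell 2: run the whole cell as R; "-i" pushes a Python variable into R
%%R -i my_dataframe
summary(my_dataframe)
```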
Is there a way to import an already existing python code to a notebook of Jupyter?
There are multiple ways, as always. If you have only some code, then you might consider copy-pasting it. If you want to use your code in many notebooks, then it might make sense to package it properly. You can build your own conda packages! Also, if you have installed conda-build (also needed for building packages), you can “install” your package in development mode to make it available (conda develop .). If you have your code in a file in the same folder as the notebook, you can also import functions and classes from it with import statements.
Are two different notebooks totally hermetic entities, or can I access variables defined in a notebook from a second notebook?
They are independent. Good practice is to make a module out of the code that is reused in multiple notebooks and then just import it.
Are there any unique advantages to using python over other programming languages, such as Java?
There are many reasons to use one or the other programming language. Python might not produce the fastest code, but it has several advantages over other languages. One of the main reasons to use it is the big community: there are many open-source packages, and many people may have run into the same problem as you (so there is often help online already). Also, Python is considered to have a relatively intuitive syntax. But if you work in an institution where everyone uses, e.g., Java, it might make more sense to go for that, because you can get help more easily locally.
Is there an online jupyter to avoid local installation?
Yes, there is Binder. You can follow the complete course there.
Does the repo need to be public on GitHub to be able to run Binder?
On mybinder.org, only public ones. Check here: https://stackoverflow.com/questions/54648514/how-can-one-use-binder-mybinder-org-with-private-github-repositories
Is binder only for python?
Can you use Binder to make versioning of a pipeline directly back to GitHub?
Technically possible but not recommended, see https://mybinder.readthedocs.io/en/latest/faq.html#can-i-push-data-from-my-binder-session-back-to-my-repository
Does my_array[1,:] return a list or a tuple?
You will get a NumPy array (numpy.ndarray).
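For example:

```python
import numpy as np

my_array = np.arange(12).reshape(3, 4)  # small example array
row = my_array[1, :]                    # second row
print(type(row))                        # <class 'numpy.ndarray'>
```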
Can you run napari in Jupyter Notebook?
Napari uses Qt for its graphical user interface, so it will not run in the browser. You can still launch napari from Jupyter; it will open in a new window.
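A minimal sketch (the exact incantation depends on your napari version):

```python
# Run in a notebook cell; napari opens in its own Qt window, not inline
%gui qt   # enable the Qt event loop (older napari versions used "with napari.gui_qt():" instead)

import napari
from skimage import data

viewer = napari.view_image(data.coins())   # example image from scikit-image
```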
Can we run napari remotely, on a server (e.g. HPC)?
Not via X11 forwarding. The developers are apparently working on a solution, but currently it is not possible unless you use a remote desktop connection to the server.
Is it only possible to work with video using napari?
None of the presented packages is designed to work with videos. Such data should first be converted to a series of images. For more details see e.g. https://scikit-image.org/docs/dev/user_guide/video.html
Does Cellpose run in 3D on very large (e.g. 50 GB) datasets, assuming that we do have enough hardware and multiple GPUs?
Yes, Cellpose can run in 3D. The computation time will only depend on your computational resources, but keep in mind that 3D segmentation is very compute-intensive.
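A hedged sketch based on the Cellpose Python API (the file name and parameter values are only examples; check the Cellpose documentation for the options that fit your data):

```python
from cellpose import models
from skimage import io

volume = io.imread('stack.tif')        # example 3D stack, shape (z, y, x)
model = models.Cellpose(gpu=True, model_type='cyto')
masks, flows, styles, diams = model.eval(volume, diameter=30,
                                         channels=[0, 0], do_3D=True)
```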
Can we use jupyter for imageJ macro scripting?
Yes, as ImageJ has a Python interface. This will be briefly covered in the sessions.
How do you use numpy with ImageJ?
You would use a library called pyimagej https://pypi.org/project/pyimagej/
For pyimagej - do you need to write the fiji macro in python or can you use fiji macro language?
You can write a macro in pure Fiji macro language as a string and execute it using pyimagej.
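A hedged sketch based on the pyimagej documentation (the Fiji endpoint, the macro content and the image path are just examples):

```python
import imagej

ij = imagej.init('sc.fiji:fiji')   # download/start Fiji through pyimagej

macro = """
open("/path/to/image.tif");
run("Gaussian Blur...", "sigma=2");
"""
ij.py.run_macro(macro)
```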
How robust is PyImageJ? Because MIJI (ImageJ-Matlab) is sometimes buggy…
PyImageJ is still in development so stability cannot be guaranteed. If you run into trouble, post an issue directly on GitHub (https://github.com/imagej/pyimagej) or on the image.sc forum.
Why use PyImageJ and not Python-based Fiji scripting?
In Python-based Fiji scripting you do not have access to Python packages such as NumPy; you just use Python as a scripting language. So the gain in functionality is very small.
Thanks for the very insightful answer! As far as you know, do you think most Java plugins available in Fiji can be called from Python?
They should work as long as you can use them in headless mode. There are apparently issues with some ImageJ1-based plugins (see e.g. https://github.com/imagej/pyimagej/issues/22)
On specific 2D and 3D functions: 1) Fourier transform 2) skeletonizing 3) distance maps 4) Voronoi tessellation 5) triangulation 6) graph theory; are these all well supported? Thanks!
There are libraries available in Python to perform these operations. Some of them are directly implemented in scikit-image (e.g. skeletonize), several others in the scipy package (e.g. distance transform https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.ndimage.morphology.distance_transform_edt.html). Specific functions for graph theory are available in the NetworkX package (https://networkx.github.io/). Also, some operations are available directly from the ImageJ Python API, which will be presented in the course.
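For example, skeletonization and a Euclidean distance map on a small synthetic mask:

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.morphology import skeletonize

# Small binary example image
mask = np.zeros((50, 50), dtype=bool)
mask[10:40, 20:30] = True

skeleton = skeletonize(mask)                 # scikit-image skeletonization
distance = ndi.distance_transform_edt(mask)  # SciPy Euclidean distance map
```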
Which package would you recommend to select and work with ROI in an image?
Currently the best solution to do interactive work directly on images, such as manually selecting ROIs, is napari.
Is it possible to do cytometry analysis in python?
This package allows you to do flow cytometry analysis: https://github.com/bpteague/cytoflow
Are there any packages for 3D point cloud rendering?
Both ipyvolume and napari can render point clouds.
Will the presenter and NEUBIAS team be willing to do a third session for PhenoGraph and other similar Python packages for MALDI-CyTOF?
We do not plan at the moment to address such specific topics.
Do you know any 3D neuron segmentation algorithms?
3D segmentation is a very hard problem in general and neurons are especially difficult to segment because of their shape. The best methods of course rely today on deep learning. You can find here an example of such an approach: https://www.biorxiv.org/content/10.1101/200675v1.full and code https://github.com/google/ffn. These methods are difficult to use but there are attempts at making them more user-friendly, see here for example: https://github.com/urakubo/UNI-EM
Can we use these tools for microscopy videos (large time-stacks of images)? Any recommendations for this?
Yes, you can use these tools on very large datasets. The only limitation is the computing power you have access to.
In Matlab, the support of both multi-core and GPU parallelization is very good. One can even apply them together if GPU memory is enough. How is that in Python? Thanks!
Python itself is a multi-purpose language. In general, Python code is bound to run in a single thread. However, many of the widely used libraries are written in lower-level languages, compiled for the specific platform and accessible from Python; these are often optimized/parallelized (e.g. numpy, scikit-image). With numpy, for example, you have to employ specific strategies to write fast-running Python code (avoid loops, use vectorization; the same is true for Matlab). Furthermore, Python offers some nice ways to parallelize Python code as well (e.g. Cython, where you can write parallel for-loops with ease; note that this code then has to be compiled). There are also several libraries that use the GPU (e.g. pytorch). A very nice package to exploit multiple cores efficiently, irrespective of whether you run the code on a laptop or a cluster, is Dask https://dask.org/
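A minimal Dask sketch (array sizes and chunking are just examples):

```python
import dask.array as da

# A large array split into chunks, processed lazily and in parallel
x = da.random.random((20000, 20000), chunks=(2000, 2000))
result = x.mean(axis=0)    # builds a task graph; nothing is computed yet
values = result.compute()  # runs the computation on the available cores
```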
This is a very stupid question, but why choose Jupyter over Python? Isn’t it faster to run directly in a Python programming environment or even in IPython?
There are no stupid questions. This goes a bit beyond the course scope. Jupyter gives you the convenience of coding in the browser: you can conveniently edit and re-edit cells, and it allows inline plotting and documentation with Markdown, so it can be easier to go back to a notebook and rerun it. Running plain Python might be faster, and creating command-line tools is not that hard, so you can probably apply such a command-line tool to new data more easily. IPython has the same basic infrastructure, so there too you connect to a kernel.
Is it possible to compile and run code on GPU rather than CPU locally? I think colab has the option but does jupyter notebook handle this? Secondly, is it beneficial to compile on GPU?
You can (of course, provided you have an appropriate GPU). There are Python interfaces to CUDA itself if you want to go to the low level, and many libraries have GPU support as well. GPUs are an indispensable tool for massive processing if the algorithm can be parallelized; image processing and deep network training are just a few examples. TensorFlow and PyTorch are examples of Python software that can use GPU computing.
I’d like to know what you think about managing multi-TB imaging data sets, if you don’t otherwise have a cluster, etc., available
Handling such large datasets without larger-scale computing resources is going to be very difficult.
Maybe it is early to ask about this, but I am wondering about using this approach with 5D imaging data (XYZTC). Representing 2D (XY) images as matrices is pretty straightforward, but I’ve not found much in the way of documentation for importing and processing 3D+ data.
NumPy has native support for multidimensional arrays (ndarray). Many processing functions work in both 2D and 3D. There are examples in the course =)
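A small sketch with a synthetic 5D array (the axis order shown is just a common convention, not a requirement):

```python
import numpy as np
from skimage import filters

# Synthetic 5D dataset ordered as (t, c, z, y, x)
data = np.random.rand(5, 2, 10, 128, 128)

volume = data[0, 0]                           # one 3D volume: first time point, first channel
smoothed = filters.gaussian(volume, sigma=2)  # gaussian filtering works on nD arrays
```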
Can we work in a notebook with raw data directly instead of .TIFF data (e.g. from an LSM780 confocal microscope or a Nikon Ti widefield)?
There is a Python wrapper for the Bio-Formats library, which should allow you to import most microscopy image formats. Also, there are specific data importers for most image types (see the course notebook on data import).
Is it possible to use scikit-image in anaconda?
Yes, scikit-image is included in the standard anaconda distribution
Can you please provide more details about deploying a Jupyter instance on a server/cluster?
This goes a bit beyond the course scope. A good starting point can be found here: https://jupyter-docker-stacks.readthedocs.io/en/latest/. If you only wish to run Jupyter for yourself on a cluster, you can just install it from the command line via conda for example.
If I want users (biologists with basic programming experience) to actively use such notebooks, what is the best way to do that in your experience?
Two choices: either entirely skip the installation problem and set up a JupyterHub, either on a local IT resource or on a remote machine (e.g. Google Compute Engine, AWS), and have people simply log in to the service. One of the simplest ways to install a JupyterHub is The Littlest JupyterHub (TLJH) (http://tljh.jupyter.org/en/latest/index.html). Or have them install Anaconda, which is really like installing any other software, and then send them e.g. an environment.yml file to install the correct packages (see https://docs.anaconda.com/anaconda/navigator/tutorials/manage-environments/#importing-an-environment); a minimal example of such a file is shown below.
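A minimal environment.yml sketch (the environment name and package list are only examples); it can be used with conda env create -f environment.yml:

```yaml
name: bioimage-course
channels:
  - conda-forge
dependencies:
  - python=3.8
  - jupyter
  - numpy
  - scikit-image
  - matplotlib
```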
Will installing packages install them permanently or only for the current session?
It depends on the situation: 1. If you install packages on your own computer, the installation is permanent. If you use virtual environments (e.g. those provided by conda), you can have a different version of the same package in each environment. 2. If you install packages on Binder or Colab, they are only available for that specific session.
So pip is through the notebook and conda through the terminal? But using conda, do you still have to refer to packages in the notebook? Or are they just there?
You can run both from Jupyter. Basically you can run any shell command from Jupyter if you start the line with an exclamation mark: !ls. Note that if you use pip from the notebook, installation will be done in the environment that was activated to run Jupyter. Also, if you use conda in the notebook, note that you should use the -y flag to agree to the installation by default, as you can’t type an answer from the notebook.
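For example (the package name is just a placeholder):

```python
# Installs go into the environment that runs the current kernel
!pip install scikit-image       # pip from a notebook cell

# With conda, pass -y so it does not wait for a confirmation you cannot type
!conda install -y scikit-image
```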
What are the basic differences between virtual environment in python and conda?
Conda both manages virtual environments and installs packages. It also allows you to install non-Python software. Finally, conda takes care of analyzing the dependencies of the packages you want to install to avoid version conflicts.
What is an ‘environment’?
An environment is a set of packages at specific versions. Tools like venv/virtualenv (used with pip) and conda allow you to work with different environments, if you have multiple projects that depend on different versions of a given package.
Can I create a virtual environment just by writing that conda line in a Python file?
You can run any shell command from Jupyter if you start the line with an exclamation mark (e.g. !ls or !pip install …), so you can CREATE an environment. Yet you cannot SWITCH to that environment; you would need to spawn an instance of Jupyter in that environment.
As an Anaconda user, do you recommend using the root environment or always duplicating one? I had issues running some packages when using the root.
If you work on independent projects it’s a good idea to have an environment for each, so versions of modules don’t conflict and get messed up. So it’s about keeping environments tidy rather than not using root. Amended by @jni: Virtually all experienced Python users I know use tidy, specific environments for only specific tasks, but generally work in a “catch-all” environment where everything gets installed and updated to the latest versions, since this works well most times. However, it’s possible to break an environment with sufficient hammering with new and experimental packages, particularly when mixing conda and pip installs, and when it does break, it’s good if that isn’t the base/root environment, because then it is very cheap to burn it down and start again. So I would indeed recommend users to have a clean base miniconda install, where only conda and pip ever get updated, and then use another environment as their “quasi-root”. And this environment can be activated automatically with their bashrc.
Do you have any recommendation as to which Jupyter version is better to use (or shall we just take the latest one)?
Using the latest anaconda distribution is always a good bet.
Is Jupyter better than Spyder? What are the main differences?
Not enough experience with Spyder to properly answer.
What would be your suggestion for the root folder and structure? I see you are using OneDrive. Is this OK when you switch from laptop to desktop, from work to home, etc.? I actually had a hard time setting up Jupyter when my files were in several places.
The installed modules and their versions might differ between systems, but you can set up environments with the required modules and versions to keep the environment the same. In terms of the Python files themselves, it is rather a matter of taste, but it is always a good idea to use version control such as git.
Do I need to use Binder or Google Colab if I already have Anaconda installed?
You can of course run on your local anaconda installation.
What are the main reasons you would recommend Python to an intensive Fiji user?
The main reason is to be able to exploit the data science capabilities of Python. Usually Fiji users export data to other software (R, Excel) to do the data analysis and plotting part of a project. With Python you can go from image import to final plot within a single notebook. Of course, using PyImageJ you can integrate your Fiji routines into a notebook.
What is the advantage of using scikit over other image processing tools such as Fiji/ImageJ?
Each language has its own libraries. If you are proficient in Fiji and Java, there is no reason to switch. If you want to explore Python, scikit-image is the reference.
What about R Shiny to develop apps? Do you think widgets in Python are better for making things user-interactive?
If your stack is based on R, then R Shiny is the way to go. R is very good for working with tabular data. In Python you can also work very efficiently with tabular data, but you additionally have access to other libraries, e.g. for image processing, and can combine these kinds of data. Interactivity should be comparable.