[NEUBIAS Academy@Home] Webinar “Interactive Bioimage Analysis with Python and Jupyter” + Questions & Answers

Dear Python enthusiasts,

we post in this thread the transcript of the Questions & Answers sessions that took place during the live webinars on “Interactive Bioimage Analysis with Python and Jupyter” from the NEUBIAS Academy. In this first post you will find the content of the May 7th 2020 session. The content of the May 13th session can be found further down in this thread.

To simplify browsing, questions have been grouped thematically and the answers slightly edited or completed. Credit for answering all questions goes to the moderating team:

  • Mykhaylo Vladymyrov, Theodor Kocher Institut, Bern University
  • Cédric Vonesch, Science IT Support, Bern University
  • Dominik Kutra, Kreshuk Group, EMBL Heidelberg

Don’t hesitate to post your comments, ask for clarifications or correct any errors we might have made!

The webinar session can be viewed on Youtube here for the May 7th session and here for the May 13th session. The complete course material is available on the neubias_academy_biapy GitHub repository and the course slides are also available online. Note that you will be able to run the course material interactively beyond the live course period via mybinder.org or Google Colab.

Table of contents

Jupyter

Can Jupyter show the list of variables and functions defined by the user, like other IDEs do (Matlab, Spyder)?

Is there a quick way to know what the variable states are (as in Matlab, for example)?

Jupyter has an extension for this: https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/varInspector/README.html. Jupyterlab also has such an extension: https://github.com/lckr/jupyterlab-variableInspector

Can I also execute code other than python, such as shell script, in the notebook?

You can execute any shell command if you start the line with an exclamation mark: !ls. You can also run an entire cell as shell commands if the first line is e.g. %%bash. Alternatively, any process can be started using Python's facilities for spawning processes (e.g. the subprocess module).
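For illustration, two hypothetical notebook cells (the commands are just examples):

!ls -l          # a single line prefixed with ! is run as a shell command

%%bash
# the whole cell is executed by bash
echo "hello from bash"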

How can I get the kernels in different conda environments displayed in the jupyter notebook?

You cannot switch environments from within Jupyter. Instead you spawn a Jupyter instance from the desired environment. You can however install kernels for other environments (e.g. different Python versions) and have them displayed as options, following these instructions: https://ipython.readthedocs.io/en/latest/install/kernel_install.html#kernels-for-different-environments
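A typical sequence in a terminal looks like this (the environment name myenv is just an example):

conda activate myenv
conda install ipykernel          # or: pip install ipykernel
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"

After this, "Python (myenv)" appears as a kernel option in notebooks started from any environment.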

Somehow this should be super easy, but I cannot find it: how can you change to another hard disk with Jupyter notebooks?

There is a configuration file in your user directory (under Windows: C:\Users\username\.jupyter) called jupyter_notebook_config.py where you can define the base directory for Jupyter using the variable c.NotebookApp.notebook_dir
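For example, the relevant line in jupyter_notebook_config.py could look like this (the path is just an illustration):

c.NotebookApp.notebook_dir = 'D:/my_notebooks'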

Hello. Are there any advantages of Jupyter notebooks over Google Colab?

For the present course: the Jupyter solution on Binder is completely “installation-free”: no login, no packages to install, no data to download. This comes at the price of shorter sessions and less computing power. In general: you can’t install Colab on your computer or on a cluster, so you are limited by what Google is offering you.

If I have code written using another editor (Atom, for example), how can I take it to and run it from Jupyter? Thanks

If you have Python code, you can either import functions from it as a module into your Jupyter notebook, or directly copy the code if you want to continue developing this part.

So what is actually saved if the variables are not defined? Is what I save not the code?

The source code is saved. But if you don’t have a running instance of python in the backend, the variables (allocated memory and variable state) do not exist. They are created when this code is run in a python instance.

How do I tell Jupyter where it should run (locally, server, cloud)?

Basically Jupyter is a kernel + a web server. It can be set up locally or on a server. Then you connect to it via a browser.

Let’s say that someone shared code with me, and it has different sections/functions in addition to the main one. How can I run this in Jupyter? Do I run only the main part, or all the sections/functions separately?

Typically you would go through each cell starting from the beginning by typing Shift+Enter. In the Kernel menu you can also use Restart & Run All to evaluate the entire notebook.

Can you run R code together with Python (in a different chunk) in Jupyter notebooks?

Yes, you can mix R and Python code in a single notebook. Using the rpy2 package (https://pypi.org/project/rpy2/) you can run a cell as R simply by putting %%R in the first line of the cell, and you can even share variables between R and Python.
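A minimal sketch, assuming rpy2 is installed and my_dataframe is an existing pandas DataFrame. First load the extension in one cell:

%load_ext rpy2.ipython

then mark another cell as R and pass the Python variable into it:

%%R -i my_dataframe
# the -i option imports the Python variable my_dataframe into the R session
summary(my_dataframe)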

Is there a way to import an already existing python code to a notebook of Jupyter?

There are multiple ways, as always. If you have only some code, then you might consider copy-pasting it. If you want to use your code in many notebooks, then it might make sense to package it properly. You can build your own conda packages! Also, if you have installed conda-build (also needed for building packages) you can “install” your package in development mode to make it available (conda develop .). If you have your code in the same folder as the notebook, you can also import functions and classes from it with import statements.

Are two different notebooks totally hermetic entities, or can I access variables defined in one notebook from a second notebook?

They are independent. Good practice is to make a module out of the code that is reused in multiple notebooks and then just import it.

Python

Are there any unique advantages to using python over other programming languages, such as Java?

There are many reasons to use one or the other programming language. Python might not produce the fastest code, but it has several advantages over other languages. One of the main reasons to use it is the big community: there are many open source packages, and people may have run into the same problem as you (so there might be help online already). Also, Python is considered to have a relatively intuitive syntax. But if you work in an institution where everyone uses, e.g., Java, it might make more sense to go for that, because you can get help more easily locally.

Binder

Is there an online jupyter to avoid local installation?

Yes, there is binder. You can follow the complete course there.

Does the repo need to be public on GitHub to be able to run Binder?

On mybinder.org, only public ones. Check here: https://stackoverflow.com/questions/54648514/how-can-one-use-binder-mybinder-org-with-private-github-repositories

Is binder only for python?

Not only: https://mybinder.readthedocs.io/en/latest/using.html

Can you use Binder to make versioning of a pipeline directly back to GitHub?

Technically possible but not recommended, see https://mybinder.readthedocs.io/en/latest/faq.html#can-i-push-data-from-my-binder-session-back-to-my-repository

Numpy

Does my_array[1,:] return a list or a tuple?

You will get a NumPy array (numpy.ndarray).
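A quick check you can run yourself:

import numpy as np
my_array = np.arange(12).reshape(3, 4)
row = my_array[1, :]    # the second row, shape (4,)
type(row)               # numpy.ndarray, not a list or a tuple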

napari

Can you run napari in Jupyter Notebook?

Napari uses Qt as a graphical user interface, so it will not run in the browser. You can still launch napari from Jupyter; it will open in a separate window.
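A minimal sketch, assuming image is a 2D or 3D NumPy array loaded earlier in the notebook:

%gui qt                 # may be needed first, depending on the napari version
import napari
viewer = napari.view_image(image)   # opens the napari Qt window alongside the notebook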

Can we run napari remotely, on a local server (i.e. HPC)?

Not via X11 forwarding. The developers are apparently working on a solution, but currently it is not possible unless you use a remote desktop connection to the server.

Is it only possible to work with video using napari?

None of the presented packages is designed to work with videos. Such data should first be converted to a series of images. For more details see e.g. https://scikit-image.org/docs/dev/user_guide/video.html

Cellpose

Does Cellpose run in 3D, very large (i.e. 50GB) datasets, assuming that we do have enough hardware and multiple GPUs?

Yes, Cellpose can run in 3D. The computation time will depend on your computational resources. But keep in mind that 3D segmentation is very computationally intensive.

PyImageJ

Can we use jupyter for imageJ macro scripting?

Yes, as ImageJ has a Python interface. This will be briefly covered in the sessions.

How do you use numpy with ImageJ?

You would use a library called pyimagej https://pypi.org/project/pyimagej/

For pyimagej - do you need to write the fiji macro in python or can you use fiji macro language?

You can write a macro in pure Fiji-macro language as a string and execute it using pyimagej.

How robust is PyImageJ? Because MIJI (ImageJ-Matlab) is sometimes buggy…

PyImageJ is still in development so stability cannot be guaranteed. If you run into trouble, post an issue directly on GitHub (https://github.com/imagej/pyimagej) or on the image.sc forum.

Why use PyImageJ and not Python-based Fiji scripting?

In Python-based Fiji scripting, you do not have access to Python packages such as NumPy; you just use Python as a scripting language. So the gain in functionality is very small.

Thanks for the very insightful answer! As far as you know, do you think most Java plugins available in Fiji can be called from Python?

They should work as long as you can use them in headless mode. There are apparently issues with some ImageJ1-based plugins (see e.g. https://github.com/imagej/pyimagej/issues/22)

Applications

On specific 2D and 3D functions - 1) Fourier transform 2) skeletonizing 3) distance maps 4) Voronoi tessellation 5) triangulation 6) graph theory; is it all well compatible? Thanks!

There are libraries available in Python to perform these operations. Some of them are directly implemented in scikit-image (e.g. skeletonize), several others in the scipy package (e.g. the distance transform https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.ndimage.morphology.distance_transform_edt.html). Specific functions for graph theory are available in the NetworkX package (https://networkx.github.io/). Also, some operations are available directly from the ImageJ Python API, which will be presented in the course.
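A small sketch of two of these operations, assuming binary is a 2D boolean mask you have already computed:

from skimage.morphology import skeletonize
from scipy.ndimage import distance_transform_edt

skeleton = skeletonize(binary)               # thin objects down to 1-pixel-wide skeletons
distances = distance_transform_edt(binary)   # Euclidean distance of each foreground pixel to the background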

Which package would you recommend to select and work with ROI in an image?

Currently the best solution to do interactive work directly on images, such as manually selecting ROIs, is napari.

Is it possible to do cytometry analysis in python?

This package allows you to do flow cytometry analysis: https://github.com/bpteague/cytoflow

Are there any packages for 3D point cloud rendering?

Both ipyvolume and napari can render point clouds.

Will the presenter and NEUBIAS team be willing to do a third session on phenograph and other similar Python packages for MALDI-CyTOF?

We do not plan at the moment to address such specific topics.

Do you know any 3D neuron segmentation algorithms?

3D segmentation is a very hard problem in general and neurons are especially difficult to segment because of their shape. The best methods of course rely today on deep learning. You can find here an example of such an approach: https://www.biorxiv.org/content/10.1101/200675v1.full and code https://github.com/google/ffn. These methods are difficult to use but there are attempts at making them more user-friendly, see here for example: https://github.com/urakubo/UNI-EM

Can we use these tools for microscopy videos (large time-stack of images)? Any recommendations for this?

Yes, you can use these tools on very large datasets. The only limitation is the computing power you have access to.

Computing

In Matlab, the support of both multi-core and GPU parallelization is very good. One can even apply them together if GPU memory is enough. How is that in Python? Thanks!

Python itself is a multi-purpose language. In general, plain Python code runs in a single thread. However, many of the widely-used libraries are written in lower-level languages, compiled for the specific platform and accessible from Python; these are often optimized/parallelized (e.g. numpy, scikit-image). With numpy, for example, you have to employ specific strategies to write fast-running code (avoid loops, use vectorization - the same is true for Matlab). Furthermore, Python offers some nice ways to parallelize code as well (e.g. Cython, where you can write parallel for-loops with ease - note that this code then has to be compiled). There are also several libraries that use the GPU (e.g. pytorch). A very nice package to exploit multiple cores efficiently, irrespective of whether you run code on a laptop or a cluster, is Dask https://dask.org/
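A tiny illustration of the Dask idea with dask.array (the array size and chunk shape are arbitrary examples):

import dask.array as da

x = da.random.random((20000, 20000), chunks=(2000, 2000))  # a lazy, chunked array
result = x.mean().compute()   # the per-chunk work is scheduled across the available cores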

This is a very stupid question, but why choose Jupyter over Python? Isn’t it faster to run directly in the Python programming environment or even in IPython?

There are no stupid questions. This goes a bit beyond the course scope. Jupyter gives you the convenience of coding in the browser: you can conveniently edit and re-edit cells, and it allows inline plotting and documentation with markdown, so it might be easier to go back to a notebook and rerun it. Running plain Python might be faster, and creating command line tools is not that hard, so you can probably apply such a command line tool to new data more easily. IPython has the same basic infrastructure, so there too you connect to a kernel.

Is it possible to compile and run code on GPU rather than CPU locally? I think colab has the option but does jupyter notebook handle this? Secondly, is it beneficial to compile on GPU?

You can (provided, of course, that you have an appropriate GPU). There are Python interfaces to CUDA itself, if you want to go low level, and many libraries have GPU support as well. GPUs are an indispensable tool for massive processing if the algorithm can be parallelized; image processing and deep net training are just a few examples. TensorFlow and PyTorch are examples of Python software that can use GPU computing.

I’d like to know what you think about managing multi-TB imaging data sets, if you don’t otherwise have a cluster, etc., available

Handling such large datasets without larger-scale computing resources is going to be very difficult.

Data

Maybe early to ask about this, but I am wondering about using this approach with 5D imaging data (XYZTC). Representing 2D (XY) images as matrices is pretty straightforward, but I’ve not found much in the way of documentation for importing and processing 3D+ data

Numpy has a native support for multidimensional arrays (ndarray). Many processing functions have both 2D and 3D versions. There are examples in the course =)
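For example, a TCZYX stack is just a 5D NumPy array, and slicing works along any axis:

import numpy as np

stack = np.zeros((10, 2, 30, 512, 512), dtype=np.uint16)  # T, C, Z, Y, X
single_plane = stack[0, 1, 15]       # time 0, channel 1, z-slice 15 -> a 2D image
max_proj = stack.max(axis=2)         # maximum intensity projection along Z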

Can we work in the notebook with raw data directly instead of .TIFF data? (e.g. from an LSM780 confocal microscope, or a Nikon Ti widefield)

There is a Python wrapper for the Bio-Formats library, which should allow you to import most microscopy image formats. Also, there are specific data importers for most image types (see the notebook about image import).

Installation

Is it possible to use scikit-image in anaconda?

Yes, scikit-image is included in the standard anaconda distribution

Can you please provide more details about deploying a Jupyter instance on a server/cluster?

This goes a bit beyond the course scope. A good starting point can be found here: https://jupyter-docker-stacks.readthedocs.io/en/latest/. If you only wish to run Jupyter for yourself on a cluster, you can just install it from the command line via conda for example.

If I want users (biologists with basic programming experience) to actively use such notebooks, what is the best way to do that in your experience?

Two choices: either entirely skip the installation problem and set up a JupyterHub, either on a local IT resource or on a remote machine (e.g. Google Compute Engine, AWS), and have people simply log in to the service. One of the simplest ways to install a JupyterHub is The Littlest JupyterHub (TLJH) (http://tljh.jupyter.org/en/latest/index.html). Or have them install Anaconda. This is really like installing any other software. And then send them e.g. an environment.yml file to install the correct packages (see https://docs.anaconda.com/anaconda/navigator/tutorials/manage-environments/#importing-an-environment)

Will installing packages install them permanently or only for the current session?

It depends on the situation: 1. If you install packages on your own computer, it is permanent. Depending on whether you use virtual environments (e.g. those provided by conda) you can install multiple versions of the same package, one in each environment. 2. If you install packages on Binder or Colab they are only available for that specific session.

So pip is through the notebook and conda through the terminal? But using conda, do you still have to refer to packages in the notebook? Or are they just there?

You can run both from Jupyter. Basically you can run any shell command from Jupyter if you start the line with an exclamation mark: !ls. Note that if you use pip from the notebook, installation will be done in the environment that was activated to run Jupyter. Also, if you use conda in the notebook, note that you should use the -y flag to agree to the installation by default, as you can’t type an answer from the notebook.

What are the basic differences between virtual environment in python and conda?

Conda both manages the virtual environment and installs packages. It also allows you to install non-Python software. Finally conda takes care of analyzing the dependencies of the packages you want to install to avoid version conflicts.

What is an ‘environment’?

An environment is a set of packages at specific versions. Tools such as conda, and venv/virtualenv together with pip, allow you to work with different environments, which is useful if you have multiple projects that depend on different versions of a given package.
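As an illustration with conda (the environment name and package list are arbitrary examples):

conda create -n myproject python=3.8 numpy scikit-image
conda activate myproject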

Can I create a virtual environment just by writing that conda line in a Python file?

You can run any shell command from Jupyter if you start the line with an exclamation mark: !ls or !pip install …, so you can CREATE an environment. Yet you cannot SWITCH to that environment. You would need to spawn an instance of Jupyter in that environment.

As an Anaconda user, do you recommend using the root environment or always duplicating one? I had issues running some packages when using the root.

If you work on independent projects it’s a good idea to have an environment for each, so versions of modules don’t conflict and get messed up. So it’s about keeping environments tidy rather than not using root. Amended by @jni: Virtually all experienced Python users I know use tidy, specific environments for only specific tasks, but generally work in a “catch-all” environment where everything gets installed and updated to the latest versions, since this works well most times. However, it’s possible to break an environment with sufficient hammering with new and experimental packages, particularly when mixing conda and pip installs, and when it does break, it’s good if that isn’t the base/root environment, because then it is very cheap to burn it down and start again. So I would indeed recommend users to have a clean base miniconda install, where only conda and pip ever get updated, and then use another environment as their “quasi-root”. And this environment can be activated automatically with their bashrc.

Do you have any recommendation on which Jupyter version is better to use (or shall we just take the latest one)?

Using the latest anaconda distribution is always a good bet.

General

Is Jupyter better than Spyder? What are the main differences?

Not enough experience with Spyder to properly answer.

What would be your suggestion as folder root and structure? I see you are using OneDrive. Is this OK when you switch from laptop to desktop, from work to home, etc.? I actually had a hard time setting up Jupyter when I have files in several places.

The installed modules and their versions might differ on different systems, but you can set up environments with the required modules and versions to keep the environment the same. In terms of the Python files themselves, it’s rather a matter of taste. But it’s always a good idea to use version control such as git.

Do I need to use Binder or Google Colab if I already have Anaconda installed?

You can of course run on your local anaconda installation.

What are the main reasons you would recommend Python to an intensive Fiji user?

The main reason is to be able to exploit the data science capabilities of Python. Usually Fiji users export data to other software (R, Excel) to do the data analysis and plotting part of a project. With Python you can go from image import to final plot within a single notebook. Of course, using PyImageJ you can integrate your Fiji routines into a notebook.

What is the advantage of using scikit over other image processing tools such as Fiji/ImageJ?

Each language has its own libraries. If you are proficient in Fiji and Java, there is no reason to switch. If you want to explore Python, scikit-image is the reference.

What about R Shiny to develop apps? Do you think widgets in Python are better for building interactive tools for users?

If your stack is based on R then R Shiny is the way to go. R is very good for working with tabular data. In Python you can also work very efficiently with tabular data, but you additionally have access to other libraries, e.g. for image processing, and can combine this with your data. Interactivity should be comparable.


Thanks all for this wonderful resource! I would personally amend one of the answers:

Virtually all experienced Python users I know use tidy, specific environments for only specific tasks, but generally work in a “catch-all” environment where everything gets installed and updated to the latest versions, since this works well most times. However, it’s possible to break an environment with sufficient hammering with new and experimental packages, particularly when mixing conda and pip installs, and when it does break, it’s good if that isn’t the base/root environment, because then it is very cheap to burn it down and start again. So I would indeed recommend users to have a clean base miniconda install, where only conda and pip ever get updated, and then use another environment as their “quasi-root”. And this environment can be activated automatically with their bashrc.

So, to be short, I would say that it is also about not using root, so that root always works.


Thank you very much for the course. I found it really interesting. Indeed, I’m trying to start using the notebooks on my data. However, I got stuck on the first cell :sweat_smile:

I would like to use Binder in my own GitHub repo. I have tried, but I am not able to import modules such as pandas, numpy… I have seen that I need to create an environment.yml file in my repo to do that. I tried just downloading the one from your course repo and adding it to mine (I just changed the name). However, it still doesn’t work. I would really appreciate it if you could let me know what I’m missing.

Thank you very much in advance.

You need to place the environment file either in the main directory or in a directory called binder (your folder is called Environment). These are the places where the binder service is looking for installation files (see https://mybinder.readthedocs.io/en/latest/introduction.html#what-is-a-binder). Note that if you only need 1-2 packages like numpy and pandas, you can also just open a notebook in your session and install from there by typing:

!pip install pandas
!pip install numpy

Now it works, thank you very much

Hi Guillaume,

I was working through the ‘04-Image Import’ notebook and got a bit confused with input-output. In section 4.2.1 you demonstrate two ways to pull a subset of images containing d.tif in the filename. However, the images are displayed in a different order for each of the methods. I was curious as to why, and did my best to add the image title to each image plot. I failed to do this, but did manage to list the raw image data for each title. Using this, the data appear to be read into the array in the same order, yet the images are plotted in a different order. How would one know which is which in such a case?

Hi Glyn,

Yes indeed the two methods sort the file names differently. In the first case you can see the filenames in the d_channel variable which is a list of paths. This is also the variable that you should use to add as a title and not channel1_list which is a list of images (numpy arrays). In the second method you can recover the filenames list using the files attribute, i.e. mycollection.files. I’m attaching a screenshot with the relevant code.

Guillaume

That makes sense, I knew I was doing something silly! Many thanks. I enjoyed the course and thought it was well put together- many thanks for doing it.

Glyn

Dear Python enthusiasts,

here is the transcript of the Questions & Answers session that took place on May 13th during the second part of the live webinar on “Interactive Bioimage Analysis with Python and Jupyter” from the NEUBIAS Academy @Home series.

Table of contents

Jupyter

Is it possible to run R Markdown in Jupyter? So we can output everything nicely formatted in PDF (through LaTeX)?

The markdown cells in Jupyter natively support LaTeX formatting. Also, the whole notebook can be exported as pdf, html, tex, etc.

Guillaume answered in PART 1 that stopping/closing the notebook causes the loss of variable values. Is there any extension in JupyterLab, or do you see a practicable way to save the “project” status, to allow storage of variables like the MATLAB workspace?

There are no extensions to do that. A very common solution for that problem is to use pickle to save a series of variables that can then be reloaded at a later time point. https://docs.python.org/3/library/pickle.html
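A minimal sketch with pickle; the file name and the variables image and labels are just placeholders for whatever you want to keep:

import pickle

# save a few variables at the end of a session
with open('workspace.pkl', 'wb') as f:
    pickle.dump({'image': image, 'labels': labels}, f)

# reload them in a later session
with open('workspace.pkl', 'rb') as f:
    workspace = pickle.load(f)
image = workspace['image']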

I keep having an error in JupyterLab when I want to open a notebook from there. Clicking the launcher symbol for a new notebook creates a new one and starts the notebook in the background, but only results in a pop-up window saying:
“Launcher Error
nb.rendermime is undefined”
Can you suggest something on this please? Thanks

Try reinstalling the environment.

scikit-image

I have some questions about the materials. First of all, regarding the thresholding: what does the “selem” argument in some skimage.filters functions do exactly? (ie, skimage.filters.median)

You can specify the region surrounding the pixel which will be used for the analysis. It stands for structuring element. See here https://scikit-image.org/docs/dev/api/skimage.filters.html#skimage.filters.median and here https://scikit-image.org/docs/0.7.0/api/skimage.morphology.selem.html

Does scikit have functions to analyze spot properties? For example, if I track single molecules, could I fit them to a 2D gaussian to get the centroid position?

Function fitting is usually done using scipy. See for example https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html. Note that there are many packages specifically dedicated to single molecule microscopy, see http://bigwww.epfl.ch/smlm/software/
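A rough sketch of such a fit with scipy.optimize.curve_fit; here spot is assumed to be a small 2D crop around a detected spot, and the variable names are illustrative:

import numpy as np
from scipy.optimize import curve_fit

def gauss2d(coords, amplitude, x0, y0, sigma, offset):
    # isotropic 2D Gaussian plus constant background, flattened for curve_fit
    x, y = coords
    return (amplitude * np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * sigma**2)) + offset).ravel()

yy, xx = np.indices(spot.shape)
p0 = [spot.max(), spot.shape[1] / 2, spot.shape[0] / 2, 2.0, spot.min()]  # initial guesses
popt, _ = curve_fit(gauss2d, (xx, yy), spot.ravel(), p0=p0)
x_centroid, y_centroid = popt[1], popt[2]   # sub-pixel centroid position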

PyImageJ

Can I mix ImageJ scripting (in the IJ macro language or whatever language) with Python? Can I also call ImageJ plugins?

You can use the ij.py.run_macro function to execute a macro from python. See the clij notebook for example.

Does the macro have to be written in Jython or can it be written in the imageJ macro language? Or either?

Either works: you can pass a macro written in the ImageJ macro language as a string to the ij.py.run_macro function; scripts in the other scripting languages supported by ImageJ can be run in a similar way. See the clij notebook for an example.

Visualization

Unfortunately I did not watch the first part. Could you explain a way to visualize and process 3D volumes like micro-CT?

Please feel free to go through the slides and the Jupyter notebooks (you can find the links in the chat). There are several tools such as ipyvolume and napari which can be used for the visualization of 3D volumes.

How can you see the labels in the image?

You can display the overlay as shown in chapter 6:
from skimage.measure import label
image_labeled = label(image)                 # connected-component labeling of the mask
plt.imshow(image)
plt.imshow(image_labeled, cmap=cmap);        # cmap: the label colormap defined earlier in the notebook

Going a little ahead of what is being shown right now, but will we go over how to display time series and z-series in an interactive way using Python? (Like in ImageJ, with a slider allowing you to go through all the images easily.)

You can use ipywidgets as shown in 04-Image_import.ipynb to introduce interactivity.
Napari is another good tool for visualizations.
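A minimal sketch with ipywidgets, assuming stack is a 3D NumPy array (time or z as the first axis):

import matplotlib.pyplot as plt
from ipywidgets import interact

@interact(plane=(0, stack.shape[0] - 1))   # creates a slider over the first axis
def show_plane(plane=0):
    plt.imshow(stack[plane], cmap='gray')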

Applications

Could you suggest a strategy to register images from different imaging modes (ie, EM and Fluorescence, or scattering and fluorescence)?

It is very difficult to give a generic answer. Usually you have to do two things: find identical points in both images and then use a registration algorithm to deform the images to match a common base. There are some approaches I know from medical imaging (e.g. https://iplab.dmi.unict.it/miss14/MISS2014-ReadingGroup00-All-Paper.pdf). If you can add fiducials to your data that are visible in all modalities, this makes the problem a lot easier. A term to search for is “multimodality registration” - maybe add “microscopy” to get some more specific results.

Can you recommend a ‘best’ tool (that uses machine learning) for segmentation of nuclei of human cells? I have tried Cellpose (via their website) & StarDist (on ImageJ) which seem to work very well. However, I have installation issues for both of these when I tried to implement these tools in Python/Jupyter. Is there a way to do similarly performing segmentation with Python packages (like scikit-learn for example)? I’ve also tried ilastik, but its segmentation performance is not as good as StarDist & Cellpose. I need to have well segmented images to allow good tracking of my cells

For nuclei, StarDist is a really good option; you can use it directly from Python, but you first need to install it, of course:
pip install stardist. If you want to improve StarDist's performance, you’ll need to retrain it on your own data (with some annotations). Have a look at the StarDist webinar! Cellpose also performs very well for nuclei and cells. If you have trouble with the installation you can either write a post on the image.sc forum or directly post an “issue” on their respective GitHub repositories.
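A small sketch using a pretrained StarDist model, assuming stardist and its dependencies are installed and image is a 2D fluorescence nuclei image:

from csbdeep.utils import normalize
from stardist.models import StarDist2D

model = StarDist2D.from_pretrained('2D_versatile_fluo')    # pretrained fluorescence nuclei model
labels, details = model.predict_instances(normalize(image))  # labels is an integer label image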

Is it possible to use ITK filters for 3d analysis here with scikit-image?

Yes, there are two main packages to run ITK from Python: itk and SimpleITK. I would probably start with SimpleITK. The last time I checked, both packages have a somewhat un-pythonic API, but the SimpleITK one is easier to use (though it has limitations).

Regarding the Machine learning STARDIST and CellPose, is it possible to find training sets online?

The datasets used to train these algorithms are publicly available, and the references can be found in the corresponding publications. One main source for nuclei segmentation is the 2018 Data Science Bowl dataset, which can be found here: https://data.broadinstitute.org/bbbc/BBBC038/

Data

What does the first dimension mean, which is set to have only 1 value?

STCZYX - sites, time, channel, z, y, x. Sites are the multi-position acquisition locations; in this example there is only one site, hence the dimension has size 1.

I’m trying to import AICSImage on my local system, but I can’t find any library starting with AICS in Anaconda. Any ideas? Thanks.

You can install it as described here: https://pypi.org/project/aicsimageio/. The AICSimageio package is not yet available on conda.

How do you access the metadata (such as information about scale, microns per pixel, microscope) in an image file (such as a TIFF) using a python program?

For tiff images, you can use the tifffile package. In particular, you can open the image as a tifffile.TiffFile - there you can access the metadata, see e.g. the Stack Overflow question “Python: copy all metadata from one multipage tif to another”. The AICSImageIO package also provides access to metadata, as long as it is properly formatted, see e.g. https://allencellmodeling.github.io/aicsimageio/#metadata-reading
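A minimal sketch with tifffile (the file name is just an example; which metadata fields are present depends on the file):

import tifffile

with tifffile.TiffFile('my_image.tif') as tif:
    print(tif.pages[0].description)   # raw ImageDescription string, often contains acquisition metadata
    print(tif.imagej_metadata)        # parsed ImageJ-style metadata, or None if the file has none
    image = tif.asarray()             # the pixel data as a numpy array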

How do you ask for the time step and the xyz scale?

This depends on the format of the image. This information is usually supplied as metadata and you need a way to access it. This is slightly different for every file format. The AICSimageio package provides for example access to these sort of metadata.

How to import czi files or lif files for instance?

These formats should be supported via the Python wrapper for the BioFormats library https://pythonhosted.org/python-bioformats/

For BioFormats like Nikon’s .nd2 would you recommend converting the file into a sequence of TIFF files (for example one TIFF for each frame so that it would look like today’s example)? I read that Nikon Elements Viewer does this conversion for .nd2 and I imagine other companies have similar software. How is this different from using the Python wrapper for the BioFormats library and where can I find this wrapper?

Conceptually these are interchangeable; you should decide what better fits your purpose. You should avoid forcing users to perform extra conversions themselves (this is error-prone and may lose meta-information). Also, the extra conversion would lead to an increase in disk space usage. If you want to perform a conversion as part of a performance optimization for (your) downstream processing, you should rather consider HDF5. Note that there is also a specific .nd2 reader for Python https://rbnvrw.github.io/nd2reader/

Does using Colab mean that you are making your data available to Google and is the data properly encrypted? I am asking in terms of recommended policies for data protection given by my university.

You don’t transfer any rights on the data. Yet you do upload it to a public service, so it might be prone to security issues. Please consult your university’s legal service if you are dealing with sensitive data.

Installation

What’s the difference between anaconda and conda?

conda is the software that creates and manages environments. Anaconda is a collection of software that includes conda as well as many other scientific packages.

What should I do, if I want to use two packages with incompatible dependencies in one project?

There is always the option to build the package with conda-build yourself and try to change the code to be compatible with the version of the package that is causing the problem. Conda will hint at the dependencies that cause conflicts. Another possibility is to use two separate environments and exchange data via files, or use inter-process communication with zeromq and the like.

What if you have Jupyterlab installed on your conda base environment with extensions of preference and need to access packages in specific project environments? If we are told not to use the base env, does this mean we need to install jupyter lab in each project env and spawn it from there for package access?

You can indeed re-install Jupyterlab in each environment. This will ensure that you don’t have conflicts between extensions. However you can also install Jupyterlab and all your extensions in a given environment, activate it and start a Jupyterlab session and THEN change the kernel used in your notebook. To make kernels of other environments available in any Jupyter session, see https://ipython.readthedocs.io/en/stable/install/kernel_install.html#kernels-for-different-environments

How to create a yml environment file?

The environment.yml file is just a text file with a series of packages that should be installed. See here: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-file-manually
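For illustration, a minimal environment.yml could look like this (the name and package list are just an example):

name: myenv
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy
  - scikit-image
  - pip
  - pip:
    - aicsimageio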

When I install a package using the pip command in the Terminal, does it install the package for the python in the conda base or does it only install the package for the python on my computer?

When you open a terminal on linux or macOSX or the Anaconda Prompt on Windows, in principle it starts in the base environment (you should see (base) at the beginning of each line in the terminal). If you use pip there, the package is installed in the base environment. To install in a specific environment, you need to first conda activate it. If you use another type of terminal, e.g. powershell on windows, and have another Python installation available there, pip will install the package for that Python installation.

What do I have to consider when installing Jupyter and having to access computational resources on a remote server to which I can connect with ssh?

Installing Jupyter on a remote server or cluster works exactly as on a laptop. Again, the best approach is to install miniconda and then create environments. Here’s an example of a guide on how to then run Jupyter remotely: https://www.digitalocean.com/community/tutorials/how-to-install-run-connect-to-jupyter-notebook-on-remote-server
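A common pattern, sketched with example host and port numbers, is to forward the notebook port over ssh and then open http://localhost:8888 in your local browser:

ssh -L 8888:localhost:8888 user@remote-server    # forward the remote Jupyter port to your machine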

When using environments do you actually download and reinstall the modules each time you set up a new environment? Isn’t this taking up large disk space quickly ?

Yes, packages are downloaded every time you include them in an environment (but note the comment further down in this thread: conda uses hard links, so identical package versions in different environments largely share disk space). The disk usage stays reasonable considering the typical amounts of data one has to deal with. You can also easily delete environments once you don’t need them anymore.

General

Will these notebooks and materials be made available for longer? Because I couldn’t finish reading everything…

We don’t plan to remove material from github. It might be also updated in the future. The material will also remain available in interactive form for as long as the relevant services (Binder, Colab) are available.

Would you recommend changing the brightness/contrast of images before beginning the image processing pipeline (filtering, thresholding, etc) so that the images are easier to see? How do you do this in python? Does this have an effect on results if we only want to measure ROI properties like size/particle tracking?

Such corrections are a PART of the processing pipeline. You can do all image corrections directly there.
Then you build the pipeline introducing the necessary (pre)processing steps so that you achieve the desired performance.

Does the addition of the path to look for the course function not have to go before the import of the function?

The path to your python modules that you are going to import should be added to sys.path before the import.
Paths to all the different pieces of data can be introduced at any time.
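For example (the path and module names are illustrative):

import sys
sys.path.append('/path/to/my_modules')   # must come before the import below
from my_module import my_function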

Is there an advantage of using thresholding instead of edge detection algorithms?

Thresholding usually produces masks, i.e. “areas”, which are more robust to work with than edges (thin elements - how do you join them into objects?…)

What is a data frame? Is it the same as in R?

Yes, this is similar to the R dataframe. The Python implementation of dataframe is in the pandas package.

Would you advise writing the pipeline inside a function?

To make the code reusable it’s good practice to wrap the pipeline in a function (or a set of functions) and put it in a module.

Can you use a Jupyter notebook for a module with many functions, put it in your path, and access it from a different Jupyter notebook?

This is a matter of opinion, but we would rather not recommend using a Jupyter notebook to “store” a module with many functions and accessing it from another notebook. You should create your own package/library of Python functions and import it into multiple Jupyter notebooks. This will make your code much easier to maintain, notably because notebooks don’t integrate very well with git tracking. Note that you can import notebooks in other notebooks: https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Importing%20Notebooks.html. Amended thanks to @VolkerH: There is a tool that lets you build proper Python packages, including documentation, tests etc., from notebooks. You can then upload these packages onto PyPI (making them pip installable) and import them into other notebooks. It is called nbdev https://nbdev.fast.ai/.

When using bash, why is ‘awk’ not supported?

I didn’t try it personally, but this seems to be a solution: https://stackoverflow.com/questions/34972035/awk-print-with-pipes-not-working-ipython-in-jupyter-notebook

Does bash even work in Windows?

On Windows you can execute any shell (cmd) commands in the same way.

What are the major differences between python and R? or what are the benefits to use R instead of python?

Python and R are both widely used for data science. R is particularly good for statistical analysis but packages in Python such as statsmodels https://www.statsmodels.org/dev/index.html are excellent too. In the area of image processing Python currently remains the dominant language.

Can we use jupyter to run Matlab?

No, but you can run an Octave kernel in Jupyter. You can also run any headless Matlab command from the command line.


This FAQ is a real tour de force, amazing work @guiwitz.

A comment on importing notebooks into notebooks:

There is a tool that lets you build proper Python packages, including documentation, tests etc., from notebooks. You can then upload these packages onto PyPI (making them pip installable) and import them into other notebooks. It is called nbdev https://nbdev.fast.ai/ .

I am just mentioning this for completeness, not necessarily as an endorsement. It is very much subjective and also project-dependent whether this is a good way to work. You are depriving yourself of some great programming support present in modern IDEs such as Visual Studio Code or PyCharm, which can do linting (automatically finding potential errors in your code) and offer much better auto-completion than notebooks.

Thanks, great point! I amended this answer in the Q&A. I actually installed that package when it came out but never got around to using it seriously. One of the interesting features is the ability to easily create documentation including examples, plots etc. So for larger projects, an ideal usage is probably to use it directly within VS Code, which can also display notebooks but also offers all the other “goodies”.

This isn’t strictly true. Conda uses hardlinks to prevent duplicating packages. When the packages are the same versions in two different environments, they share the same disk space. See this SO answer: