Looking for life scientists to collaborate on scikit-image tutorials

Dear community,

The scikit-image team strives to serve an ever larger number of scientists, whether students or senior investigators, beginners or experienced practitioners of image processing. The life sciences are of particular importance to us: biological research keeps growing and branching out, it is heavily funded, and it brings into play rich datasets (2 or 3 spatial dimensions, a time dimension, multiple colour channels, various colour models).

Consequently, we have been meaning to expand the scikit-image gallery with examples pertaining to biological applications and/or featuring biomedical images. As of today, interested users are invited to consult the following tutorials:

With this post, we want to reach out to the broader life science communities. We would like to collaborate on new examples reflecting the practices, needs, and challenges faced by life scientists when analyzing image data. We expect these interactions to be mutually stimulating, pushing us to implement, say, missing filters or options, while providing the collaborating scientists with faster and/or more elegant image processing strategies.

In practice, we would be happy to write up short, self-contained tutorials based on your real-world analyses of microscopy, ultrasound, or CT scan images. We would of course credit you and cite you, if applicable.

Naturally, feedback on existing materials would be most welcome as well. Let us mention that we also curate a repository of datasets, where additional images obtained from biological and medical imaging would be greatly appreciated. Please contribute images under license CC0 or in the public domain. We are also interested in collaborating on demos involving larger cloud-based datasets that would not be suitable for running in real time as part of our documentation build process. These larger-scale demos would illustrate solutions to real-world challenges in working with large data, but we would not host the datasets for these in our repository.

We look forward to hearing from you! Don’t hesitate to reply to this post with your example idea. Alternatively, you can email us at scikit-image@python.org or create an issue at our GitHub repository.

Thank you for your attention!

Marianne, for the scikit-image core team


During my PhD I developed algorithms to analyse data from OCT acquisitions of living cells and tissues, and to render them in HSV color space where each channel carries a physical property; see the example below on liver cells.

Of course, generating it requires more than just scikit-image, since the method involves Fourier analysis and SVD filtering. It is explained in detail here: https://www.nature.com/articles/s41377-020-00375-8.
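For the rendering step alone (leaving aside the Fourier analysis and SVD filtering), here is a minimal sketch of how three pre-computed property maps could be mapped to HSV and converted to RGB with scikit-image, using random placeholders for the maps:

```python
import numpy as np
from skimage import color

# Hypothetical inputs: three property maps (e.g. from the Fourier/SVD analysis),
# each already normalized to the [0, 1] range.
rng = np.random.default_rng(0)
hue, saturation, value = rng.random((3, 512, 512))

# Stack the maps as the H, S, and V channels and convert to RGB for display.
hsv = np.stack([hue, saturation, value], axis=-1)
rgb = color.hsv2rgb(hsv)
```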

Do you think that could be worth putting in the gallery of examples/tutorials?

I also have an example of removing dust particles (which we couldn’t remove physically) from the image in post-processing, by using inpainting on the detected dust mask (see here).
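For illustration, a minimal sketch of what the mask-based inpainting step might look like with scikit-image, assuming the dust mask has already been detected (placeholder data below):

```python
import numpy as np
from skimage import restoration

# Hypothetical inputs: `image` is the acquired 2D image and `dust_mask` is a
# boolean array of the same shape marking the detected dust particles.
rng = np.random.default_rng(0)
image = rng.random((256, 256))
dust_mask = np.zeros(image.shape, dtype=bool)
dust_mask[100:110, 50:60] = True  # a stand-in for a detected dust speck

# Fill in the masked pixels by biharmonic inpainting.
restored = restoration.inpaint_biharmonic(image, dust_mask)
```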


Here are two more resources that might be worth mentioning in the context of publicly available image datasets.

  • https://scif.io/images/
  • The list of public data sets on imagej.net

Thanks, Jules!

We can make use of some things from outside scikit-image or its direct dependencies (SciPy, NumPy, etc.) in gallery examples or demos. For example, we use scikit-learn in at least one image classification demo. The demo should make use of scikit-image functionality for a decent portion of the implementation, though. I will take a closer look at the reference you sent; it sounds interesting. Approximately what size of data would be involved (and would we be able to redistribute it)?

Regarding inpainting, I see you used OpenCV’s cv2.INPAINT_NS method. Was this due to the relatively slow performance of scikit-image’s inpaint_biharmonic? I just happened to submit a performance-related PR yesterday that updates the implementation in scikit-image to have performance similar to OpenCV’s. In some of my test cases, the result of the biharmonic method looks subjectively better to me than the ones from OpenCV, but I don’t know if the difference would be noticeable in this application.

For the first example, in addition to scikit-image, I only used NumPy and scikit-learn (for L1 normalization to obtain a quantitative colormap, but that can easily be done with NumPy alone). The raw data would be a (512, 2048, 2048) array of uint16, so approximately 2 GB. We could restrict it to (64, 2048, 2048), but the SNR would suffer then.
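For what it’s worth, the L1 normalization step can indeed be done with NumPy alone; a minimal sketch on placeholder data:

```python
import numpy as np

# Hypothetical per-pixel values to be L1-normalized along the last axis.
rng = np.random.default_rng(0)
values = rng.random((512, 512, 3))

# L1 normalization: divide each pixel's vector by the sum of its absolute values.
norms = np.abs(values).sum(axis=-1, keepdims=True)
norms[norms == 0] = 1.0  # avoid division by zero for all-zero pixels
values_l1 = values / norms
```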

Regarding the second example, you guessed right! At the time, the scikit-image implementation was very slow, so I used OpenCV, but if I were to make a tutorial I would use the scikit-image implementation. I don’t like OpenCV that much since it is not Pythonic and is not straightforward to install. I’m curious to know how you improved the performance?

Dear Marianne,

I wonder if you have already seen our eLife article about the 3D Arabidopsis ovule atlas. What we have is an entire developmental series of confocal microscopy images of Arabidopsis ovules, processed through machine-learning-based cell boundary prediction using PlantSeg and segmented in 3D. It encompasses around 160 z-stacks processed similarly. The dataset is also tissue-labelled.

Please let us know if this dataset is of interest to the scikit-image core team.

Here is a video about the same: https://www.youtube.com/watch?v=QJ9ymZXke40&feature=youtu.be&ab_channel=SchneitzLabatTUM

Article: https://elifesciences.org/articles/63262
PlantSeg article: https://elifesciences.org/articles/57613

Best,
Athul


@mkcor et al,

Most of the 11+M images in the IDR are CC-BY if not CC0. If anyone is in need of data of a particular type or size, I’m happy to provide a query to find a matching image. Also, if it’s useful, we can provide cloud-based versions of the images so that they can be (kindly) hosted on the EBI’s S3 servers for free, rather than needing to download them to GitHub.

All the best,
~Josh


I’m curious to know how you improved the performance?

The fast inpaint_biharmonic PR is here: Fast biharmonic inpainting by grlee77 · Pull Request #5240 · scikit-image/scikit-image · GitHub. It is still under review but will hopefully be in the next release. The first comment in that PR has many details on the specific optimizations that were done. It is a combination of reorganizing some loops, moving one of the loops used to build the sparse matrix to Cython, and precomputing some things. The actual algorithm is the same as before.

For the first example, in addition to scikit-image, I only used NumPy and scikit-learn (for L1 normalization to obtain a quantitative colormap, but that can easily be done with NumPy alone). The raw data would be a (512, 2048, 2048) array of uint16, so approximately 2 GB. We could restrict it to (64, 2048, 2048), but the SNR would suffer then.

Okay, this falls under the “larger-scale” category, then (too large to process online every time we build the docs). It could work for a notebook-based example that users could run offline. There would still need to be a way for users to download the data on request, though.
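For instance, a download-on-request step could be a small helper based on pooch (which scikit-image already uses for its own data registry); a minimal sketch with a hypothetical URL and no pinned hash:

```python
import pooch

# Fetch a large dataset on demand instead of shipping it with the docs.
# The URL below is a placeholder; a real SHA256 hash should be pinned so that
# pooch can verify the downloaded file.
path = pooch.retrieve(
    url="https://example.org/data/large_oct_stack.tif",
    known_hash=None,  # placeholder; skip verification for this sketch
)
print(path)  # local cache path of the downloaded file
```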

Fantastic initiative!

It would be great to see examples that cover digital pathology whole-slide image processing (IF and/or BF). These formats are quickly becoming (if they have not already become) the standard for thin-section, tissue-based image analysis. The pyramidal structure of these formats, along with their large size, presents some interesting technical challenges. It would be nice to create an analysis workflow that bridges segmentation performed at different pyramid levels: for example, region-of-interest detection at low resolution, followed by high-resolution tile-based processing (sequential or parallel); more specifically, tumor/stroma region segmentation followed by cell segmentation/classification.
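Not a substitute for a proper whole-slide reader, but a minimal sketch of low-resolution detection followed by full-resolution tile processing, using a small built-in image as a stand-in for one pyramid channel:

```python
import numpy as np
from skimage import data, filters, measure, transform

# A small grayscale image stands in for one channel of a whole-slide pyramid.
image = data.camera()

# Build an image pyramid; level 0 is full resolution.
pyramid = list(transform.pyramid_gaussian(image, downscale=4, max_layer=2))
low_res = pyramid[-1]
scale = 4 ** (len(pyramid) - 1)

# Detect candidate regions of interest at low resolution (dark regions here).
mask = low_res < filters.threshold_otsu(low_res)
regions = measure.regionprops(measure.label(mask))

# Map each low-resolution bounding box back to full-resolution coordinates
# for subsequent high-resolution, tile-based processing.
for region in regions:
    minr, minc, maxr, maxc = (int(c * scale) for c in region.bbox)
    tile = image[minr:maxr, minc:maxc]
    # ... run segmentation/classification on `tile` here (sequentially or in parallel)
```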

Happy to collaborate on this effort if there is interest.

Thank you very much, @Jules! Let’s see if we can reproduce some of your analyses with @grlee77’s new, fast implementation of inpaint_biharmonic :wink:

And, since you wrote:

may I ask for your expertise on this naive question I had?

Do you have any information regarding the raw data? As far as I understand, it is confocal microscopy with 3 channels that are merged into an RGB image?

@Jules for sure, it is “Mouse kidney tissue on a pre-prepared slide imaged with confocal fluorescence microscopy (Nikon C1 inverted microscope). Image shape is (16, 512, 512, 3). That is 512x512 pixels in X-Y, 16 image slices in Z, and 3 color channels (emission wavelengths 450nm, 515nm, and 605nm, respectively). Real space voxel size is 1.24 microns in X-Y, and 1.25 microns in Z. Data type is unsigned 16-bit integers.”
Reference: Description under kidney-tissue-fluorescence.tif
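For reference, that dataset can be loaded directly through scikit-image’s data module (in releases that include it):

```python
from skimage import data

# The kidney tissue fluorescence stack ships with scikit-image's data registry
# (it is downloaded on first use).
kidney = data.kidney()
print(kidney.shape, kidney.dtype)  # (16, 512, 512, 3) uint16 -> (Z, Y, X, channels)
```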

@mkcor I am not a biologist, but I guess they used 3 different fluorophores to target 3 different biological structures, hence the 3 “tissue parts” you see. It’s standard procedure to merge color channels to colocalize proteins/membranes/etc., but each separate channel carries a lot of information in itself. Let me know if that answers your question.

Dear Marianne @mkcor ,

may I also add an idea for a beginner’s scripting example that we used in the past in some courses and book chapters of the #NEUBIAS society:

There is a nice biological application presented in this publication by Andrea Boni, showing how to measure intensity changes over time in the nuclear envelope of a cell. A key message here relates to good scientific practice: you should segment one channel to measure intensities in another channel.


[Courtesy: Andrea Boni, EMBL Heidelberg / Viventis]

That example workflow was used in this publication by @Kota, and a very similar workflow is used in this online tutorial by @aklemm. Furthermore, Daniela Vorkel and I used it in this preprint to demonstrate GPU-accelerated CLIJ macro programming. The original IJ macro can be found here and our modification here.

I’d assume that if there were a scikit-image example script doing the same, it would help spread the word about Python in the life sciences / bio-image analysis community.
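To illustrate, here is a minimal sketch of what the core of such a script might look like with scikit-image, on placeholder data and without claiming to reproduce the original macro:

```python
import numpy as np
from skimage import filters, measure, morphology

# "Segment one channel to measure intensities in another channel", for a single
# (hypothetical) time point of shape (Y, X, 2): channel 0 marks the structure to
# segment, channel 1 is the signal of interest.
rng = np.random.default_rng(0)
frame = rng.random((256, 256, 2))

# Segment the structural channel.
mask = frame[..., 0] > filters.threshold_otsu(frame[..., 0])
mask = morphology.remove_small_objects(mask, min_size=50)
labels = measure.label(mask)

# Measure mean intensities of the other channel inside the segmented regions.
props = measure.regionprops(labels, intensity_image=frame[..., 1])
mean_intensities = [p.mean_intensity for p in props]

# For a time series, the same two steps would simply be repeated per frame.
```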

Let me know if I can help. :slight_smile:

Cheers,
Robert


Dear @Athul_r.vijayan,

Wow! Impressive work. Thank you for sharing.

I have found the raw 3D datasets (HDF5 files at OSF | Ovules). Is the final dataset (of 158 hand-curated 3D digital ovules) publicly available? Are intermediate outputs available somewhere?

I guess I could recompute them myself, using the raw data and GitHub - hci-unihd/plant-seg: A tool for cell instance aware segmentation in densely packed 3D volumetric images, but I don’t think I have enough computing resources! I’m thinking of maybe focusing on the labelling step (and probably just a small sample of the 3D data), reproducing this semi-automated and manual processing step with scikit-image… I would simplify it and bring it to a more general audience, always keeping you in the loop of course.
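For the labelling step, a minimal sketch of one way it could be approached with scikit-image, using a seeded watershed on a boundary-probability map (placeholder data; not the exact method from the paper):

```python
import numpy as np
from skimage import feature, segmentation

# Hypothetical input: `boundaries` is a (Z, Y, X) cell-boundary probability map,
# such as a PlantSeg prediction; random placeholder data is used here.
rng = np.random.default_rng(0)
boundaries = rng.random((32, 128, 128))

# Seed a watershed from local maxima of the "interior-ness" (1 - boundary).
interior = 1.0 - boundaries
coords = feature.peak_local_max(interior, min_distance=5)
markers = np.zeros(boundaries.shape, dtype=int)
markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)

# Flood the boundary map from the seeds to obtain a candidate cell labelling.
labels = segmentation.watershed(boundaries, markers)
```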

Would this make sense?

Thanks again,
Marianne

Hi Marianne,

Thank you. The datasets used in the study Vijayan et al., 2021, “A digital 3D reference atlas reveals cellular growth patterns shaping the Arabidopsis ovule”, are all available for public download. They contain the raw images, the PlantSeg boundary predictions, and the corrected segmentations (not the raw segmentations from PlantSeg). Please find them here:

The high-quality dataset includes 158 wild-type 3D digital ovules. An additional dataset contains another 85 early-stage WT ovules: https://www.ebi.ac.uk/biostudies/studies/S-BSST475?query=athul

118 3D digital ino-5 mutant ovules: https://www.ebi.ac.uk/biostudies/studies/S-BSST497?query=athul

69 3D digital ovules of pWUS fluorescent reporter expression: https://www.ebi.ac.uk/biostudies/studies/S-BSST498?query=athul

*These are the numbers of ovules, not the numbers of images; we often have multiple ovules per image stack. As mentioned in the manuscript, we have PlantSeg predictions and segmentations for most of the images. Some were already good in the first place, so they might not have a prediction; some were done according to the “plantseg-MGX hybrid method” as described in the manuscript.

Hope that helps!

Best,
Athul