[NEUBIAS Academy@Home] Webinar "Image Big Data III: Frameworks for the quantitative analysis of large images" Questions & Answers

NEUBIAS Webinar Q&A

Hi everyone,

The third NEUBIAS Big-Data Webinar, “Image Big Data III: Frameworks for the quantitative analysis of large images”, is now on YouTube. This post contains the questions that were asked during the webinar, together with the answers :wink:

We’d like to give a huge thanks to @MarionLouveaux, @aklemm, @Ofra_Golani, @RoccoDAntuono and @Julien_Colombelli for all their work as moderators and in helping put this together, and @romainGuiet and everyone else at NEUBIAS who are making the webinars happen!

@akreshuk, @tinevez , @maarzt

Part 1: Labkit

Q: Where can I find additional information about Labkit?

Q: Can we load the tif image directly in Labkit?

Yes, TIFF images are supported.

Q: Can the Labkit plugin only be used on images or also on timelapse videos?

Labkit works nicely with many image modalities: 2D/3D, timelapse, color, multi-channel are all supported.

Q: Is the block size user defined?

No, the block size is fixed, but it should work nicely on all computers. If you need to change the block size, you could have a look into the Labkit source code. If you observe problems with the block size, please describe them in a new forum thread.

Q: Is color segmentation included? Or must each channel be processed independently?

The random forest will look at all channels at once. So yes, color images can be segmented nicely.

Q: You mentioned the image used was 4 GB and classed that as a big image file. How will the toolkit cope with images upwards of 500 GB?

Labkit can easily open 500 GB images. If a few foreground/background labels are enough, training will be fast and you should get a preview of the segmentation quickly. However, calculating the segmentation for the entire image might take a day or so.

Q: Does the Labkit segmentation work on the highest resolution available, or are multiple classifiers trained for different levels of the scale pyramid?

The classifier only uses the highest resolution level.

Q: Do you take advantage of the pyramidal resolution levels for the segmentation / annotation?

No, only the highest resolution level is used for the segmentation.

Q: Does Labkit support Object Classification after Pixel Classification?

No, Labkit does pixel classification only.

Q: Thanks for the talk, Mathias. I don’t see the Labkit-Preview site in the ImageJ Updater, only Labkit appears. Any ideas?

Yes, the “Labkit-Preview” update site is not in the official list of update sites. You need to add it manually: https://sites.imagej.net/Labkit-Preview/

Q: Great talk. Problem: several z-stacks -> collapsed for analysis into one plane -> overlapping DAPI signal -> how good is Labkit at separating nearby/partially overlapping signals?

Give it a try, but that sounds like a hard image for Labkit. If it’s hard for a human to tell the signals apart, Labkit won’t be successful either.

Q: Hi there, I’m not using big-data images yet, but just to make sure I understand the Labkit plugin: if you have a 3D SIM dataset, for example, and need to segment in all three dimensions, do you need to mark the background on each slice? I guess the background could be different on different slices of the image. How do you practically do this?

I would pick maybe five representative slices and mark a few pixels as background and foreground on each of the five slices.

Q: Is Labkit scriptable by imagej macros?

The Labkit-Preview update site has a macro recordable command for segmenting an image with Labkit.

Q: For cluster computing: what needs to be installed on the cluster nodes? Only Java?

Java 8 or newer needs to be installed, and snakemake if you want to use it. You can also write your own script if you don’t have snakemake.
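If you go the own-script route, the shape of such a script is simple: split the work into blocks and submit one cluster job per block. Below is a minimal Python sketch under assumptions only; SLURM’s `sbatch --wrap` is just one example scheduler, and the Labkit command line itself is a placeholder that you would replace with the actual invocation documented in the labkit-command-line repository.

```python
# Minimal sketch of a do-it-yourself cluster run: one job per image block.
# The Labkit invocation below is a PLACEHOLDER; use the command documented
# in the labkit-command-line README instead.
import subprocess

N_BLOCKS = 64  # however many blocks the image is split into

for block in range(N_BLOCKS):
    # Hypothetical command string; replace with the real labkit-command-line call.
    labkit_cmd = f"java -jar labkit-command-line.jar <options for block {block}>"
    # Submit one SLURM job per block (sbatch --wrap runs the given shell command).
    subprocess.run(["sbatch", "--mem=8G", f"--wrap={labkit_cmd}"], check=True)
```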

Q: Why the choice of the random forest classifier, and what other classifiers would you consider?

The random forest can be trained and used very quickly. I would use neural-network-based methods for hard-to-segment images, for example DenoiSeg or StarDist.

Q: Is it possible to use Labkit with a GPU cluster?

Labkit can run on a cluster using the GPUs for computation. This will actually save you a lot of time if you have very large data. (see GitHub - maarzt/labkit-command-line: Labkit Command Line - Segment huge images on a cluster)

Part 2: Ilastik

Q: Regarding ilastik, is there a heuristic for how much RAM you’d want for an X-GB sized 3D dataset? What about for a FIB-SEM dataset?

In interactive mode, the amount of RAM you have primarily influences how much of the intermediate results ilastik can cache and does not need to recalculate when something changes (e.g. when annotations are added, the features for the annotated pixels are ideally cached, so the update is fast). The more RAM, the more you can cache here. For Pixel Classification with 3D data I’d say you should have 8-16 GB of RAM for decent performance. If you intend to use Autocontext, for example, you’d need much more.
In headless mode, the amount of RAM influences the block size. In general, the bigger the block, the smaller the overhead (each block needs to see a little more than just the block itself).
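As a concrete illustration, headless ilastik reads its RAM and thread budget from environment variables, which you can set before launching a batch run. This is a hedged sketch: the project name, input dataset and output paths are placeholders, and you should check the ilastik headless documentation for the exact flags available in your version.

```python
# Sketch of a headless ilastik pixel-classification run with an explicit RAM budget.
# Project file, input dataset and output paths are placeholders.
import os
import subprocess

env = dict(os.environ)
env["LAZYFLOW_TOTAL_RAM_MB"] = "16000"  # RAM budget, which in turn influences block size
env["LAZYFLOW_THREADS"] = "8"           # number of worker threads

subprocess.run(
    [
        "./run_ilastik.sh", "--headless",
        "--project=my_pixel_classification.ilp",          # trained interactively beforehand
        "--export_source=Probabilities",
        "--output_format=hdf5",
        "--output_filename_format=results/{nickname}_probabilities.h5",
        "raw_data.h5/data",                                # input: file plus internal dataset path
    ],
    env=env,
    check=True,
)
```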

Q: Where can we find the file-conversion notebook example?

ilastik/notebooks at master · ilastik/ilastik · GitHub; we are planning to add more soon!

Q: What release of Ilastik allows to zoom in and out?

All versions of ilastik support zooming: Ctrl + mouse wheel (Windows/Linux) or Command + mouse wheel (macOS). If it is not working for you, please open a post at image.sc (with the ilastik tag) and we’ll try to sort it out!

Q: can ilastik be used on 2d images?

Yes

Q: I’ve been having issues with crashes in Object Classification with pixel prediction maps at the thresholding/segmentation step. Have you tried it on larger data? My crash limit is a 250 MB image with a 1 GB prediction map, on a 256 GB RAM workstation.

With the images you are using this surely should not crash… Could you maybe open an issue about it, either on image.sc or at Issues · ilastik/ilastik · GitHub?

Q: Can you say something about .h5 vs. .n5?

Both HDF5 and N5 (and zarr) provide block-wise access to multi-dimensional data. HDF5 manages the blocks of data inside a single file, whereas N5 uses the file system to do so. For very large data, and especially for highly parallel operations on that data, N5/zarr is preferable.
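To make the difference concrete, here is a small Python sketch (assuming h5py and zarr v2, whose N5Store provides N5 output): the same chunked array is written once as a single HDF5 file and once as an N5 directory tree with one small file per block.

```python
# Both formats store the same chunked array; HDF5 keeps every block inside one
# container file, while N5 writes one small file per block on the file system.
import h5py
import numpy as np
import zarr  # zarr v2 assumed here (it provides N5Store)

data = np.random.rand(256, 256, 256).astype("float32")

# HDF5: a chunked, compressed dataset inside a single file.
with h5py.File("volume.h5", "w") as f:
    f.create_dataset("data", data=data, chunks=(64, 64, 64), compression="gzip")

# N5: a directory tree in which each 64^3 block is its own file,
# so many writers can update blocks in parallel without locking one big file.
n5 = zarr.open(zarr.N5Store("volume.n5"), mode="w",
               shape=data.shape, chunks=(64, 64, 64), dtype="float32")
n5[:] = data
```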

Q: If i want to prepare .n5 data for ilastik processing, what is the way to do that?

There are multiple ways to convert your data to N5. If it is already in HDF5, you can use GitHub - saalfeldlab/n5-utils: simple standalone BigDataViewer for multiple N5 (or HDF5) datasets for copying between N5 and HDF5. There is also a plugin for ImageJ: GitHub - saalfeldlab/n5-ij: ImageJ convenience layer for N5. Another possibility is to use the data conversion workflow in ilastik.
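If you prefer to do it yourself in Python, a block-wise copy keeps memory usage low even for very large volumes. This is only a hedged sketch, not one of the tools listed above: the internal dataset name `data`, the chunk size, and the file names are assumptions, and it again relies on zarr v2’s N5Store.

```python
# Do-it-yourself HDF5 -> N5 conversion, copying one chunk-aligned block at a time
# so the full volume never has to fit into RAM. Dataset names are assumptions.
import h5py
import zarr  # zarr v2 assumed (provides N5Store)

CHUNKS = (64, 64, 64)

with h5py.File("big_volume.h5", "r") as src:
    dset = src["data"]  # assumed internal dataset path
    dst = zarr.open(zarr.N5Store("big_volume.n5"), mode="w",
                    shape=dset.shape, chunks=CHUNKS, dtype=dset.dtype)
    # Walk over the volume block by block; slices past the edge are clipped
    # identically by h5py and zarr, so the block shapes always match.
    for z0 in range(0, dset.shape[0], CHUNKS[0]):
        for y0 in range(0, dset.shape[1], CHUNKS[1]):
            for x0 in range(0, dset.shape[2], CHUNKS[2]):
                block = (slice(z0, z0 + CHUNKS[0]),
                         slice(y0, y0 + CHUNKS[1]),
                         slice(x0, x0 + CHUNKS[2]))
                dst[block] = dset[block]
```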

Q: How does ilastik work with time lapse data (image with multiple time frames)? Do i need to split the data for segmentation?

You can load the time series into ilastik. There is a slider to navigate between time frames at the bottom right. The classifier is trained with all annotations you add, no matter which time frame they are in.

Q: If I understood correctly, the “Live update” computation time in e.g. pixel classification shouldn’t differ when loading 10 individual timelapses consisting of 10 timeframes each vs. 1 timelapse of 100 timeframes?

Live update does two things: 1) it triggers training of the classifier, combining all annotations that have been added (no matter on which time frame or which image), and 2) it predicts the current viewport (a single time frame or image). Depending on the number of annotations you have, 2) is usually the more time-consuming step. Computation time during live update (i.e. from changing something until the whole viewport is predicted) depends mostly on the size of the part of the image you are looking at: the more you zoom out, the longer prediction takes. The number of time frames has little to no influence (depending on the file format…).

Q: Is it possible to use ilastik with GPU?

ilastik does not support using a GPU for the simple pixel classification workflow. It nevertheless provides high performance by using highly optimised C libraries.
The ilastik neural network classification workflow does use GPUs, though (see ilastik - Neural Network Classification).

Part 3: Mastodon

Q: Where can I find additional information concerning “big data server”?

https://imagej.net/BigDataServer

Q: What is the name of the famous paper discussing “What is an analysis workflow?”

https://zenodo.org/record/1042570#.YBA2mFUzbuo

Q: Hi, thank you so much for this webinar! TrackMate actually has a part that automatically identifies objects (one of the best, if not THE best, spot detection I know :D). Does this exist in Mastodon as well?

Thank you! Yes!

Q: Does Mastodon support GPUs?

No, no need. Maybe in the future we will use GPU for visualization. But the core of Mastodon doesn’t need a GPU.

Q: What kind of PC specifications are your benchmarks for processing time set around?

For Mastodon you don’t need a big computer; it is made to run on modest hardware. I could use it successfully on an 8 GB laptop. Using a mouse is important for Mastodon, however.

Q: Can Mastodon also be used for data without a time dimension? For example, what could you extract from a big 3D dataset?

Yes, it will work even if you don’t have a time-lapse. You can still use it for detection and inspection. I do this a lot, actually.

Q: Will Mastodon replace MaMuT and TrackMate? (Is all functionality covered by MaMuT?)

It does replace MaMuT. I nonetheless keep supporting all three tools.

Q: Fantastic toolset! We often see nucleus tracking, do you have other successful example of other particles tracked? Can it follow merging particles in the lineage?

I had it working in cell biology experiments too, following organelles. What matters for the detection is that the objects you want to detect resemble ‘blobs’: roundish and bright over a dark background. And yes, Mastodon has basic tracking algorithms that can detect cell divisions (the LAP tracker with segment splitting).


Dear NEUBIAS team and speakers of this webinar,
thank you so much for this awesome overview and the details!

I have now successfully run Mastodon on a fused 1.2 TB Zeiss Lightsheet Multiview dataset (coming from BigStitcher) and am in awe of how fast particle detection and linking is. “No GPUs needed” is an understatement, I think it would even work on my microwave :star_struck: .

@tinevez, Regarding the Multiview dataset, I noticed that Mastodon also supports MultiView datasets before fusion. This is very interesting, since the fusion step is computationally very expensive and takes a long time.
What is the workflow for MultiView datasets?
I noticed that during spot detection one needs to select a view. Do I do spot detection for every view separately, one view at a time, and if so, how are duplicated spots then handled?

Are there any cons when using datasets before fusion?

Thank you very much for your help!


Oh thank you! This feedback is really great for us.
As for the microwave, I don’t know, but I know that some people managed to get Doom running on a pregnancy test.

When it comes to your question about multi-views vs fusion:
Actually using the multiple views is an important but imho understated ability of all the BDV-based tools.

When we were working on the MaMuT project [1], Anastassios and Carsten found out that, realistically, even the fused, deconvolved view was not good enough for some difficult cases.
Initially, on the MaMuT dataset (a Parhyale hawaiensis embryo followed over one week), we had 3, then 5, raw views of the embryo. They were registered, but neither deconvolved nor denoised.
Late in development, the cells become really small and numerous, and at some locations in the embryo some views had really low quality. Because we had 5 views, there was always at least one view in which any given cell was ‘good looking’. So the initial way of working was to follow a cell in a raw view until the quality degraded so much that we could not tell the cells apart, then simply switch to a better view and resume lineaging.

Tassos and friends managed to perform the multi-view deconvolution on the dataset. It generated a really high-quality view of the whole embryo. For instance, Tassos used the deconvolved view to generate this movie:

I thought this new view could replace the 5 others, but Tassos told me there were some cases, very late in development, where even the deconvolved view was contaminated by the low quality of some raw views. So he was happy to be able to switch to the one good-quality raw view when needed.

So in the MaMuT dataset we kept all the views:

  • the 3 raw views (later 5 views)
  • the deconvolved one.

Because the BDV format is so good and flexible, it is possible to have them all registered to a common reference frame, despite different angles, pixel sizes, etc.

  1. https://elifesciences.org/articles/34410

Hi @tinevez , thank you for your reply and the very impressive movie! :smiley:

Concerning the MultiViews, just to confirm that I understood your post properly using an example:

Let’s say you have a dataset with 5 views and 100 time points in total; the procedure you have outlined is then to always detect the spots in the “best view for a given range of time points”, correct?

For example:

  • view 1, tp 0-24
  • view 2, tp 25-50
  • view 3, tp 51-75
  • view 4, tp 76-90
  • view 5, tp 91-99

After the (automatic) detection of all spots is done, can one simply go for the “linking” in Mastodon, and will it automatically link the spots across the entire range of time points and across the different views?
Or is the linking also “per view and tp range”, and if so, how are the tracks linked between the views?

Thank you again for your help!

Hello

So, what is in Mastodon that was not in MaMuT is the fully automated detection.

If you run the detection 5 times, you run the risk of having several spots for one cell.
Fortunately, we thought of that (somehow): if you select the Advanced DoG detector, you will see that you can specify what to do when the detector finds a spot close to an existing spot. You can specify, for instance, not to create a new spot, or to replace the old spot with the new one.

With this strategy, you can simply run the detection once per view on all the time points. This should give you a detection for all the cells that are correctly imaged in at least one view.

(Please let us know how it works, I really want to know now!)
