NEUBIAS Webinar Q&A
The third NEUBIAS Big Data Webinar, “Image Big Data III: Frameworks for the quantitative analysis of large images”, is now on YouTube. This post contains the questions that were asked during the webinar, together with the answers.
We’d like to give a huge thanks to @MarionLouveaux, @aklemm, @Ofra_Golani, @RoccoDAntuono and @Julien_Colombelli for all their work as moderators and in helping put this together, and to @romainGuiet and everyone else at NEUBIAS who is making the webinars happen!
Part 1: Labkit
Q: Where can I find additional information about Labkit?
- Labkit - ImageJ
- GitHub - maarzt/imglib2-labkit: Advanced Tool for Labeling And Segmentation
- GitHub - maarzt/labkit-command-line: Labkit Command Line - Segment huge images on a cluster
Q: Can we load a TIFF image directly in Labkit?
Yes, TIFF images are supported.
Q: Can the Labkit plugin only be used on images or also on timelapse videos?
Labkit works nicely with many image modalities: 2D and 3D images, time-lapse, color, and multi-channel data are all supported.
Q: Is the block size user defined?
No, the block size is fixed, but it should work nicely on all computers. If you need to change the block size, you could have a look at the Labkit source code. If you observe problems with the block size, please describe them in a new forum thread.
Q: Is color segmentation included? Or must each channel be processed independently?
The random forest will look at all channels at once. So yes, color images can be segmented nicely.
Q: You mentioned the image used was 4GB and class that as a big image file. How will the toolkit cope with images upwards of 500GB?
Labkit can easily open a 500 GB image. If a few foreground/background labels are enough, training will work nicely and fast, and you should get a preview of the segmentation quickly. However, calculating the segmentation for the entire image might take a day or so.
Q: Does the Labkit segmentation work on the highest resolution available, or are multiple classifiers trained for different levels of the scale pyramid?
The classifier only uses the highest resolution level.
Q: Do you take advantage of the pyramidal resolution levels for the segmentation / annotation?
No, only the highest resolution is used for the segmentation.
Q: Does Labkit support Object Classification after Pixel Classification?
No, Labkit does pixel classification only.
Q: Thanks for the talk, Mathias. I don’t see the Labkit-Preview site in the ImageJ Updater, only Labkit appears. Any ideas?
Yes, the “Labkit-Preview” update site is not in the official list of update sites. You need to add it manually: https://sites.imagej.net/Labkit-Preview/
Q: Great Talk. Problem: several z-stacks —> collapsing for analysis in one plane —> overlapping DAPI signal —> how good is Labkit in separation of nearby/ partially overlapping signals?
Give it a try, but that sounds like a hard image for Labkit. If it’s hard for a human to see, Labkit won’t be successful either.
Q: Hi there, not using big-data images yet, but to make sure I understand the Labkit plugin: if you have a 3D SIM dataset, for example, and need to segment in three dimensions, do you need to determine the background for each slice? I guess this could be different on the different slices of the image. How do you practically do this?
I would pick maybe five representative slices and mark a few pixels as background and foreground on each of the five slices.
Q: Is Labkit scriptable by ImageJ macros?
The Labkit-Preview update site has a macro recordable command for segmenting an image with Labkit.
Q: For cluster computing: what needs to be installed on the cluster nodes? Only Java?
Java 8 or newer needs to be installed, and Snakemake if you want to use it. You can also write your own script if you don’t have Snakemake.
Q: Why the choice of the random forest classifier, and what other classifiers would you consider?
The random forest can be trained and used very quickly. For hard-to-segment images I would use neural-network-based methods, maybe DenoiSeg or StarDist.
Q: Is it possible to use Labkit with a GPU cluster?
Labkit can run on a cluster and use GPUs for computation, which can save you a lot of time if you have very large data. (See GitHub - maarzt/labkit-command-line: Labkit Command Line - Segment huge images on a cluster.)
Part 2: Ilastik
Q: Regarding ilastik, is there a heuristic for how much RAM you’d want for an X-GB sized 3D dataset? What about for a FIB-SEM dataset?
In interactive mode, the amount of RAM you have primarily influences how much of the intermediate results ilastik can cache and does not have to recalculate when something changes (e.g. if annotations are added, features for those annotated pixels are ideally cached, so the update is fast). The more RAM, the more you can cache here. For Pixel Classification with 3D data I’d say you should have 8-16 GB of RAM for decent performance. If you intend to run Autocontext, for example, then you’d need much more.
In headless mode, the amount of RAM influences the block size. In general, the bigger the block, the smaller the overhead (each block needs to see a little more than just the block itself).
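As a concrete illustration, the RAM and thread budget for a headless run can be set through the documented `LAZYFLOW_TOTAL_RAM_MB` and `LAZYFLOW_THREADS` environment variables. A minimal sketch, in which the project and file names are placeholders and the command is only assembled, not executed:

```python
# Sketch: configuring an ilastik headless run with a RAM budget.
# LAZYFLOW_TOTAL_RAM_MB and LAZYFLOW_THREADS are documented ilastik settings;
# the project name and input path below are hypothetical placeholders.
import os

env = dict(os.environ)
env["LAZYFLOW_TOTAL_RAM_MB"] = "16000"  # cap ilastik at ~16 GB of RAM
env["LAZYFLOW_THREADS"] = "8"           # limit worker threads

cmd = [
    "./run_ilastik.sh", "--headless",
    "--project=pixel_classification.ilp",
    "--output_format=hdf5",
    "my_volume.h5/data",                # input file with internal HDF5 dataset
]
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, env=env, check=True)
```

The bigger the RAM budget, the larger the blocks ilastik will process at once, which reduces the per-block overhead described above.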
Q: Where can we find the file-conversion notebook example?
ilastik/notebooks at master · ilastik/ilastik · GitHub ; we are planning to add more soon!
Q: Which release of ilastik allows you to zoom in and out?
All versions of ilastik support zooming: Ctrl + mouse wheel (on Windows/Linux) or Cmd + mouse wheel (macOS). If it is not working for you, please open a post at image.sc (with the ilastik tag) and we’ll try to sort it out!
Q: Can ilastik be used on 2D images?
Yes, ilastik works on 2D images as well as on 3D and time-lapse data.
Q: I’ve been having issues with crashes in Object Classification with pixel prediction maps at the thresholding/segmentation step. Have you tried it on larger data? My crash limit is a 250 MB image with a 1 GB prediction map, on a workstation with 256 GB of RAM.
With the images you are using this surely should not crash… Could you open an issue about it, either on image.sc or at Issues · ilastik/ilastik · GitHub?
Q: Can you say something about .h5 vs. .n5?
Both HDF5 and N5 (and zarr) provide blockwise access to multi-dimensional data. HDF5 manages blocks of data inside a single file, whereas N5 uses the file system to do so. For very large data, and especially for highly parallel operations on that data, N5/zarr is preferable.
Q: If I want to prepare .n5 data for ilastik processing, what is the way to do that?
There are multiple ways to convert your data to N5. If it is already in HDF5, you can use GitHub - saalfeldlab/n5-utils: simple standalone BigDataViewer for multiple N5 (or HDF5) datasets for copying between N5 and HDF5. There is also an ImageJ plugin: GitHub - saalfeldlab/n5-ij: ImageJ convenience layer for N5. Another possibility is to use the data conversion workflow in ilastik.
Q: How does ilastik work with time-lapse data (an image with multiple time frames)? Do I need to split the data for segmentation?
No. You can load the time series into ilastik; there is a slider at the bottom right to navigate between time frames. The classifier is trained with all the annotations you add, no matter in which time frame they are.
Q: If I understood correctly, the “Live update” computation time in e.g. pixel classification shouldn’t differ when loading 10 individual timelapses consisting of 10 timeframes each vs. 1 timelapse of 100 timeframes?
Live update does two things: 1) it triggers training of the classifier, combining all annotations that have been added (no matter on which time frame or in which image), and 2) it predicts the current viewport (only a single time frame or image). Depending on the number of annotations you have, 2) is usually the more time-consuming step. Computation time during live update (so from, e.g., changing something until the whole viewport is predicted) mainly depends on the size of the part of the image you are looking at: the more you zoom out, the longer prediction takes. The number of time frames has little to no influence (depending on the file format…).
Q: Is it possible to use ilastik with GPU?
ilastik does not support using a GPU for the simple Pixel Classification workflow. It nevertheless provides high performance by using highly optimised C libraries.
The ilastik Neural Network Classification workflow does use GPUs, though (see ilastik - Neural Network Classification).
Part 3: Mastodon
Q: Where can I find additional information concerning “big data server”?
Q: What is the name of the famous paper discussing “What is an analysis workflow?”
Q: Hi, thank you so much for this webinar! TrackMate actually has a part that automatically identifies objects (one of the best, if not THE best, spot detection I know :D). Does this exist in Mastodon as well?
Thank you! Yes!
Q: Does Mastodon support GPUs?
No, there is no need. Maybe in the future we will use the GPU for visualization, but the core of Mastodon doesn’t need one.
Q: What kind of PC specifications are your benchmarks for processing time set around?
For Mastodon you don’t need a big computer; it is made to run on modest hardware. I have used it successfully on an 8 GB laptop. Using a mouse is important for Mastodon, however.
Q: Can Mastodon also be used for data without a time dimension? For example, what could you extract from a big 3D dataset?
Yes, it will work even if you don’t have a time-lapse; you can still use it for detection and inspection. I do this a lot, actually.
Q: Will Mastodon replace MaMuT and TrackMate? (Is all functionality covered by MaMuT?)
It does replace MaMuT. I nonetheless continue to support all three tools.
Q: Fantastic toolset! We often see nucleus tracking; do you have other successful examples of other particles being tracked? Can it follow merging particles in the lineage?
I have had it working in cell-biology experiments too, following organelles. What matters for the detection is that the objects you want to detect resemble ‘blobs’: roundish and bright on a dark background. Also, yes, Mastodon has basic tracking algorithms that can detect cell divisions (the LAP tracker with segment splitting).