[NEUBIAS Academy@Home] Webinar “Big Data IV: Visualizing, Sharing and Annotating Large Image Data in the Cloud” + Questions & Answers

Hi everyone,

The recording of the BigData IV webinar of NEUBIAS Academy@Home is now on YouTube. The webinar took place on February 3, 2021 and focused on the software projects CATMAID and MoBIE.

The questions asked and answered during the presentations are listed below, following a structure similar to previous parts of this series (1, 2, 3). For continued discussion, feel free to either start a new topic in the forum or ask right here.

A big thank you goes to @Julien_Colombelli, @RoccoDAntuono, @aklemm and @romainGuiet for all their help in organizing and moderating this session! Thanks also to the whole NEUBIAS team for running this webinar series, as well as to @albertcardona and @clbarnes, who helped answer live questions.

@Christian_Tischer, @Kimberly_Meechan, @constantinpape, @tomkazimiers

Slides for each part


MoBIE part

Questions and answers


Available datasets

Q1: Which animal species are available in the project? And does it only incorporate the central nervous system?

CATMAID as software is organism-agnostic and has been used for a number of different animals, including C. elegans and Platynereis. Its largest user base is currently in the Drosophila community, and there are some public instances available at virtualflybrain.org. The specific datasets Tom is presenting are all Drosophila. There are also papers using CATMAID for e.g. organizing and annotating collections of LSM stacks of fly ovaries. We also use it for mapping a whole first instar larva, complete with body wall, muscles, gut, and all.

Hardware and connectivity requirements

Q2: In your experience, what are the minimum vs ideal graphical/bandwidth requirements for a fast communication between your laptop/desktop and the data server?

On the minimal bandwidth: ideally as fast as you can get. We worked with fibre optic cables directly to our desktops, for the fastest possible browsing. But much slower connections will do fine.

Data formats

Q3: Can CATMAID deal with time series of 3D datasets?

There was an adaptation of CATMAID to handle 4D data (look up a paper by Fernando Amat from the Keller lab), but presently CATMAID itself doesn’t support it.

Q4: How is CATMAID’s N5 support? Is it possible to have different viewing angles?

CATMAID supports N5 both natively and via a server-side program which (very quickly) generates and serves JPEG tiles. In both forms, the volume can be sliced along any of its axes for orthogonal views. However, unlike in BigDataViewer, these slices are always axis-aligned.

Data loading

Q5: Regarding the image data and assuming you are not changing the zoom-level, can you preload images if you have the bandwidth to increase loading speed?

Preloading of images is done in e.g. the Review widget (there is a checkbox for that). When using e.g. an N5 as an image backend, if each cube is e.g. 256x256x256 px, you’d be loading 256 layers at once. There is code to preload the next set of cubes ahead of time, in a form of predictive loading (depending on how you are browsing), but it’s still in a development branch if I remember correctly.
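
To make the cube arithmetic concrete, here is a minimal sketch of which blocks a field of view touches and which blocks a predictive loader might fetch next. This is not CATMAID’s code: the function names, the (x, y, z) block indexing and the “one block ahead in the browsing direction” rule are assumptions for illustration only.

```python
def blocks_for_view(x_range, y_range, z, block_size=256):
    """List (bx, by, bz) indices of all blocks intersecting a 2D view at slice z.

    Ranges are (start, end) in voxels, with end exclusive.
    """
    x0, x1 = x_range
    y0, y1 = y_range
    bz = z // block_size
    return [
        (bx, by, bz)
        for by in range(y0 // block_size, (y1 - 1) // block_size + 1)
        for bx in range(x0 // block_size, (x1 - 1) // block_size + 1)
    ]


def blocks_to_preload(x_range, y_range, z, direction, block_size=256):
    """Blocks one step ahead along z in the browsing direction (+1 or -1).

    Hypothetical prediction rule: assume the user keeps moving the same way.
    """
    next_z = (z // block_size + direction) * block_size
    if next_z < 0:
        return []  # nothing below the first block
    return blocks_for_view(x_range, y_range, next_z, block_size)


# A 600 x 300 voxel view at slice 100: all 256 slices of block layer 0
# arrive with the current blocks, so only layer 1 is worth preloading.
current = blocks_for_view((0, 600), (0, 300), z=100)
ahead = blocks_to_preload((0, 600), (0, 300), z=100, direction=+1)
print(len(current), "blocks in view,", len(ahead), "blocks to preload")
```

Since one block already covers 256 slices, browsing within a block needs no further requests; preloading only matters near block boundaries.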

As an example of a low-bandwidth situation, I routinely reconstruct neurons from home, with a flimsy internet connection and multiple Zoom sessions running simultaneously. The biggest data transfer is the images; wise use of image mirrors can mitigate the situation a lot. One possible mirror is local: plug in e.g. a USB-3 1TB external SSD containing the images of the subvolume you’ll be working on.

Q6: Do you see a difference in image loading speed between 512x512 JPG (fly larva) versus 1024x1024 (FAFB)?

No, because modern HTTP keeps the connection open (with older HTTP versions, each image used to be loaded via a new connection). Also, use the “tile efficiency fraction”: it prevents loading image tiles when e.g. less than 20% of the area of the image tile would be rendered. It’s located in the menu activated by the lower left button in the stack viewer widget.
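
As a rough illustration of what such a threshold does, the sketch below computes the fraction of a tile’s area that falls inside the field of view and skips tiles below 20%. How CATMAID computes this internally is not shown in the answer, so the rectangle format and function names here are assumptions.

```python
def visible_fraction(tile, view):
    """Fraction of the tile's area that overlaps the view.

    Both rectangles are (x, y, width, height) in pixels.
    """
    tx, ty, tw, th = tile
    vx, vy, vw, vh = view
    overlap_w = max(0, min(tx + tw, vx + vw) - max(tx, vx))
    overlap_h = max(0, min(ty + th, vy + vh) - max(ty, vy))
    return (overlap_w * overlap_h) / (tw * th)


def should_load(tile, view, efficiency_threshold=0.2):
    """Load a tile only if enough of it would actually be rendered."""
    return visible_fraction(tile, view) >= efficiency_threshold


view = (460, 0, 1000, 800)
edge_tile = (0, 0, 512, 512)      # only a 52 px wide strip is visible
inner_tile = (512, 0, 512, 512)   # fully inside the view
print(visible_fraction(edge_tile, view), should_load(edge_tile, view))
print(visible_fraction(inner_tile, view), should_load(inner_tile, view))
```

The trade-off is a thin unrendered strip at the edge of the view in exchange for fewer image requests.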

There may be a small difference when it comes to your field of view - CATMAID will load enough tiles to fill the FOV. Larger tiles means more data outside of the FOV is loaded; smaller tiles give you more granularity. However, too small and you incur more overhead from network round trips.
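
The trade-off can be put in numbers with a small sketch. It counts the tiles of a regular grid needed to cover a field of view and the pixels transferred; the view size and the helper name are made up for illustration and are not taken from CATMAID.

```python
def tiles_to_cover(view, tile_size):
    """Return (tile count, pixels loaded, pixels loaded / pixels shown).

    `view` is (x, y, width, height) in pixels on a grid of square tiles.
    """
    vx, vy, vw, vh = view
    first_col, last_col = vx // tile_size, (vx + vw - 1) // tile_size
    first_row, last_row = vy // tile_size, (vy + vh - 1) // tile_size
    n_tiles = (last_col - first_col + 1) * (last_row - first_row + 1)
    loaded_pixels = n_tiles * tile_size ** 2
    return n_tiles, loaded_pixels, loaded_pixels / (vw * vh)


view = (100, 100, 1500, 1000)  # hypothetical field of view
for size in (512, 1024):
    n, pixels, ratio = tiles_to_cover(view, size)
    print(f"{size} px tiles: {n} requests, {pixels} px loaded, {ratio:.2f}x the view")
```

For this particular view, 512 px tiles need 12 requests and 1024 px tiles only 4, but the larger tiles transfer about a third more pixels.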


Q7: How to search for a widget?

Use the Ctrl+Space keyboard shortcut to select or search for a widget by name. Alternatively, click the first icon button in the top toolbar (“Open Widget Dialog”).

Q8: Can authorisation to edit or view be given to a specific person or group?

Yes, there is an admin interface (a webpage) to specify permissions with some granularity.

Q9: Can I “blast” in CATMAID a neuron that I have segmented elsewhere, and find the most similar neurons in the CATMAID dataset?

Yes. NBLAST uses point clouds rather than segmentations or even skeletons for its comparisons. You can upload CSVs to use as point clouds in the neuron similarity widget. These can also then be visualised in the 3D viewer.


Data access and storage

Q10: Can access to the ‘in the cloud’ solution be restricted?

Yes, the access to the actual image data can be restricted via the object store access credentials.
The metadata is currently always publicly readable, but write permissions can be restricted. Once we support everything on the object store it can also be fully private.

Q11: Did I get that right: the (raw, binary) image data is stored inside a git database?

No, only the metadata (e.g. bookmarks) is stored on GitHub. The raw data is stored on an object store, e.g. Amazon AWS S3.

Q12: How far off is ZARR support, or how difficult would it be to adapt this to zarr currently?

Very close! If you install the MoBIE plugin we already have: Plugins › BigDataViewer › OME ZARR › Open OME ZARR From S3…

Q13: Could MoBIE image data be from an OMERO server?

Very interesting thought! Currently not implemented! If the data are stored in OMERO in a data format that allows lazy loading, then it would be possible. In fact, this may happen at some point in the not-so-distant future.

Q14: Is CATMAID N5 data readable in MoBIE or vice versa? It seems that they both use N5 natively?

Both use different specifications of dataset layout and metadata built on top of the N5 container format. CATMAID’s is very slim but MoBIE has a lot more metadata to manage so they’re unlikely to be directly compatible.

Data updates

Q15: Is there any mechanism for dynamic updates of the dataset, e.g. re-running a segmentation and adding it?

You can always add images to the dataset and then restart the viewer. If, by dynamic, you mean “during a running viewer session”, then no, we don’t have that at the moment. But we are happy to discuss this if you need it.


Q16: How do you pre-compute gene expression?

For the gene expression computation please see this publication: https://www.pnas.org/content/114/23/5878

MoBIE usage and features

Q17: Are gene expression levels predicted from the image before clustering the cells?

Yes, the gene expression levels are “the input” (precomputed as binary masks) and the clustering is “the output”.

Q18: What does the “Affine” you enter when creating datasets in MoBIE mean?

This is a transformation that is applied to the image data on the fly. This means you can determine the transformation to register your images (e.g. in elastix or BigWarp) and then apply it in MoBIE without resaving the data.
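
To illustrate what “applied on the fly” means, the sketch below maps a point through an affine without touching the stored data. It assumes the transform is given as 12 numbers forming a row-major 3x4 matrix, which is the layout BigDataViewer uses in its XML files; the example values are made up.

```python
def apply_affine(affine, point):
    """Apply a 3x4 affine, given as 12 row-major numbers, to an (x, y, z) point."""
    x, y, z = point
    rows = [affine[0:4], affine[4:8], affine[8:12]]
    # Each row holds three linear coefficients followed by a translation.
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in rows)


# Hypothetical registration result: scale x by 2 and shift z by 10.
affine = [2.0, 0.0, 0.0, 0.0,
          0.0, 1.0, 0.0, 0.0,
          0.0, 0.0, 1.0, 10.0]
print(apply_affine(affine, (1.0, 2.0, 3.0)))
```

Because only these 12 numbers change, a new registration can be tried without resaving the image data.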

Q19: Where were the affine transforms determined?

They are stored inside small XML files that go along with each N5 image. The XML files are a standard BigDataViewer file format feature.
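
For readers who want to inspect these transforms, here is a minimal sketch that reads the affine entries from a BigDataViewer-style XML using only the Python standard library. The embedded XML is a heavily trimmed, made-up example (a real file also contains the sequence description and image loader sections), and read_affines is a hypothetical helper, not part of MoBIE.

```python
import xml.etree.ElementTree as ET

# Trimmed illustration of the registration part of a BigDataViewer XML.
EXAMPLE_XML = """<SpimData version="0.2">
  <ViewRegistrations>
    <ViewRegistration timepoint="0" setup="0">
      <ViewTransform type="affine">
        <affine>0.5 0.0 0.0 10.0 0.0 0.5 0.0 20.0 0.0 0.0 0.5 30.0</affine>
      </ViewTransform>
    </ViewRegistration>
  </ViewRegistrations>
</SpimData>"""


def read_affines(xml_text):
    """Map (timepoint, setup) to the list of affines stored for that view.

    Each affine is a list of 12 floats (a row-major 3x4 matrix).
    """
    root = ET.fromstring(xml_text)
    result = {}
    for reg in root.iter("ViewRegistration"):
        key = (int(reg.get("timepoint")), int(reg.get("setup")))
        result[key] = [
            [float(v) for v in node.text.split()]
            for node in reg.iter("affine")
        ]
    return result


affines = read_affines(EXAMPLE_XML)
print(affines[(0, 0)][0])
```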

Q20: Do you have plans to support Thin Plate Spline transforms?

I think something may already exist, implemented by John Bogovic (@bogovicj) from the Saalfeld lab at Janelia.