Creating a Plugin for Neurodata Without Borders (NWB) TwoPhotonSeries

I am interested in building a plugin to read NWB TwoPhotonSeries files. These are HDF5 files in which potentially very large data is stored in a somewhat customizable way.

The metadata should be parsable in a standard way. The data itself could be stored in several ways, and I’d like to support two of them (in priority order):

  • a giant array, loaded as a virtual stack. I’d like to have options to choose the data layout (TZYX, ZTYX, etc.)
  • a set of relative filenames denoting tiff files, which will need to be ordered.
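A minimal sketch of the two storage modes, assuming the layout is given as a string of axis letters. The helpers `plane_index`, `order_tiff_files` and `frame_to_file` are hypothetical names, not part of any existing reader. For the giant-array case, the layout string decides which index tuple selects one 2-D plane; the NWB schema describes `ImageSeries` data as (frame, x, y) or (frame, x, y, z), so a layout such as TXYZ may also need to be accepted. For the TIFF case, file names are put in natural order, and a global frame number is mapped to a file using per-file starting frames like those in the `starting_frame` attribute of NWB's `external_file` dataset.

```python
import bisect
import re

# Hypothetical helpers for the two storage modes; names are illustrative only.

def plane_index(layout, t, z=0):
    """Index tuple selecting one 2-D plane from an array stored in `layout`.

    `layout` is a permutation of "TYX" or "TZYX" (e.g. "TZYX", "ZTYX", "TXYZ").
    The plane comes back in stored order, so it is (X, Y) when X precedes Y.
    """
    layout = layout.upper()
    if sorted(layout) not in (sorted("TYX"), sorted("TZYX")):
        raise ValueError(f"unsupported layout: {layout!r}")
    if "Z" not in layout and z != 0:
        raise IndexError("layout has no Z axis")
    coords = {"T": t, "Z": z, "Y": slice(None), "X": slice(None)}
    return tuple(coords[axis] for axis in layout)


def natural_key(name):
    """Sort key treating digit runs as numbers, so "f_10.tif" sorts after "f_9.tif"."""
    return [int(tok) if tok.isdecimal() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


def order_tiff_files(names):
    """Order relative TIFF paths by natural sort (an assumption about the naming scheme)."""
    return sorted(names, key=natural_key)


def frame_to_file(frame, starting_frames):
    """Map a global frame number to (file_index, frame_within_file).

    `starting_frames[i]` is the first global frame stored in file i, in file
    order. Frames past the end of the last file are not detected here.
    """
    i = bisect.bisect_right(starting_frames, frame) - 1
    if frame < 0 or i < 0:
        raise IndexError(f"frame {frame} precedes the first file")
    return i, frame - starting_frames[i]
```

For example, `plane_index("ZTYX", 3, 2)` selects `array[2, 3, :, :]`, and with starting frames `[0, 100, 250]` global frame 150 is frame 50 of the second file.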

What would be the right starting point? I see an HDF5 plugin, but it doesn’t understand the NWB hierarchy, and it attempts to load the full dataset into memory. I’ve liked the experience of loading HDF5-based stacks (Imaris .ims files), where I get a lot of options for how to load the hyperstack, how to slice the data, etc.
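To illustrate the difference from loading the full dataset up front, here is a sketch of plane-by-plane reading with optional T and Z ranges, similar in spirit to the range options of hyperstack importers. It assumes a TZYX array-like (other orders would need an index permutation first); with an `h5py.Dataset`, indexing reads only the requested hyperslab, so at most one plane is held in memory at a time. `iter_planes` is a hypothetical helper name.

```python
def iter_planes(dataset, t_range=None, z_range=None):
    """Yield (t, z, plane) one YX plane at a time from a TZYX array-like.

    `dataset` only needs `.shape` and NumPy-style indexing; with an
    h5py.Dataset, each `dataset[t, z, :, :]` reads just that plane from disk.
    `t_range` and `z_range` are optional `range` objects, e.g.
    range(0, 100, 2) for every other time point. Because this is a
    generator, nothing (not even the range check) runs until iteration starts.
    """
    size_t, size_z = dataset.shape[0], dataset.shape[1]
    t_range = range(size_t) if t_range is None else t_range
    z_range = range(size_z) if z_range is None else z_range
    for name, rng, size in (("T", t_range, size_t), ("Z", z_range, size_z)):
        if len(rng) and (min(rng) < 0 or max(rng) >= size):
            raise IndexError(f"{name} range {rng} outside 0..{size - 1}")
    for t in t_range:
        for z in z_range:
            # Only this single plane is read from the backing store.
            yield t, z, dataset[t, z, :, :]
```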


Hi @chris.roat,

Welcome to the forum.

I’m not deeply familiar with Bio-Formats, but I’d recommend building on that infrastructure if you can, because that will make your reader usable by the largest number of people fairly quickly.

There’s a page about writing a new format reader, which is probably the right place to start. It’s a lot, but I think the learning curve will be worth it.

@joshmoore and @dgault are better resources than I am.

John

Hi @chris.roat, just to better understand the problem: are you dealing with raw data and looking for a suitable format to store it in? Or are you dealing with an existing HDF5 file with metadata (if so, do you know how it is stored?) and looking for a way to read the data?

Either way I would recommend sticking to existing open formats if possible rather than reinventing the wheel.

For reading, Bio-Formats currently supports a number of formats suited to large data, including the HDF5-based Imaris, CellH5 and Big Data Viewer formats, as well as KLB. For writing, we currently only have a writer for CellH5, though we are beginning to look at developing the next generation of open formats, and it may be interesting to get involved in the discussion over at Next Generation File Formats for BioImaging.

Thanks @bogovicj and @dgault,

We are dealing with raw data and want to find a better format than a large number of tiff files. I was contemplating the NWB format (which is an open format) because it seems to be an emerging standard. But I’m open to anything.

I don’t have experience with the pros/cons of the various formats.
For a large tiff stack, I’d lean toward Big Data Viewer because I’ve heard good things about using it with Fiji. Do either of you have opinions on pros/cons of that format versus the others?

Thanks,
Chris

I don’t have any experience with the NWB format, so I don’t know how it compares or whether there is an easy way to read it in Fiji. If you are looking to use Fiji, your main challenge will most likely be writing the data and having the tools to do so.

In that sense, if you are comfortable sticking with TIFF for the time being, pyramidal OME-TIFF is a robust option, with tools and support for reading and writing it in different environments. However, as your datasets grow larger, it will be less performant than, say, an HDF5-based format.

Once you can read images in Fiji (be it OME-TIFF or another format that Bio-Formats can read), it is easy to export them to the BDV format (an HDF5 file plus an XML file) through the Big Data Viewer plugin. This might be a way to convert your files if it is easier to start with TIFF than to write directly to that format.
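For anyone writing the BDV HDF5 file directly instead of going through the Fiji exporter, here is a sketch of the dataset naming used inside the .h5 file: image data for timepoint t, setup s and resolution level l lives at `t{t:05d}/s{s:02d}/{l}/cells`, with each setup's per-level downsampling factors listed in `s{s:02d}/resolutions`. The power-of-two `pyramid_factors` scheme is a hypothetical choice for illustration, not necessarily what the Big Data Viewer exporter computes.

```python
def bdv_cells_path(timepoint, setup, level=0):
    """HDF5 path of the image data for one timepoint/setup/level in a BDV .h5 file."""
    return f"t{timepoint:05d}/s{setup:02d}/{level}/cells"


def pyramid_factors(shape_xyz, min_size=64):
    """Hypothetical scheme: per-level (x, y, z) downsampling factors.

    Each new level doubles the factor of every axis whose downsampled size is
    still larger than `min_size`, stopping once no axis exceeds it.
    """
    if min_size < 1:
        raise ValueError("min_size must be >= 1")
    factors = [(1, 1, 1)]
    while True:
        last = factors[-1]
        reduced = [size // f for size, f in zip(shape_xyz, last)]
        if max(reduced) <= min_size:
            return factors
        factors.append(tuple(f * 2 if r > min_size else f
                             for f, r in zip(last, reduced)))
```

With this scheme, a 512 × 512 × 40 volume gets four levels, `(1, 1, 1)`, `(2, 2, 1)`, `(4, 4, 1)` and `(8, 8, 1)`, since Z is already small and is never downsampled.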

I have placed some links below which might be useful:
OME Position on file formats
File export code to OME-TIFF
KLB format
BDV format
Big Data Viewer in FIJI


Hi @chris.roat, hi all, I will soon be looking into parsing NWB data to import it into an OMERO server. As I understand it, NWB is more ‘tabular first’ than OME-TIFF, but OMERO also provides a tables interface, likewise backed by HDF5.
Anyway, I just wanted to let you know I’m interested in the topic, and if you start working on a plugin, I’d be happy to help.

Guillaume

Thanks for jumping in! I’m less involved in two-photon work right now, but I am working with some larger confocal datasets. My experience might be helpful, though I needed to move to cloud storage and couldn’t find a way to use NWB or Fiji/ImageJ smoothly.

For processing, I’ve found that the zarr format parallelizes very well, and a lot of work has gone into making it efficient. For viewing, I have used zarr with N5 storage and precomputed volumes (via CloudVolume), both of which the Neuroglancer viewer can read.

These solutions are heavily Python-based and use Dask for parallel processing.
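Part of why chunked formats such as zarr parallelize well is that each chunk is stored separately and can be processed independently. The sketch below enumerates the chunk grid of an N-D array and maps a function over the chunks with a thread pool. It is a standard-library stand-in for what Dask does (for example via `dask.array.from_zarr` and `map_blocks`), not the zarr or Dask API itself; `chunk_slices` and `map_chunks` are hypothetical names.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def chunk_slices(shape, chunks):
    """Yield one tuple of slices per chunk, covering an N-D array of `shape`.

    Edge chunks are clipped to the array bounds.
    """
    starts_per_axis = [range(0, size, step) for size, step in zip(shape, chunks)]
    for starts in itertools.product(*starts_per_axis):
        yield tuple(slice(s, min(s + c, n)) for s, c, n in zip(starts, chunks, shape))


def map_chunks(func, shape, chunks, max_workers=4):
    """Run `func(slices)` for every chunk concurrently; results keep chunk order.

    Chunks share no state, so each one can be handed to a worker independently.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, chunk_slices(shape, chunks)))
```

In a real pipeline, `func` would read its chunk from the store, process it and write the result back, so workers never need to coordinate beyond the chunk boundaries.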

Thanks for your answer. I guess the next step is to write a format reader in Bio-Formats. I am told NWB is poised to become a standard in neuroscience, so I imagine there is interest in it!

I’ll keep this thread updated; I should tackle this in the coming months.

@glyg: the #omero.tables interface is separate from the imaging data. Not to say you shouldn’t use it, but if you want images visualized, then #bio-formats is the way to go.

:+1:
~Josh

Thanks for the input,

After further discussion with Julien Denis, who is managing #NWB, my understanding is that #NWB files are containers for whole experiments: images, but also electrophysiology time series, behavioral movies (in mice) and pose estimation à la DeepLabCut. So images are not first-class citizens here. All the data is encoded in a single HDF5 file.
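Since a single NWB file holds many kinds of data, a reader first has to locate the imaging part. NWB files written with PyNWB/HDMF tag typed groups with a `neurodata_type` attribute, so one approach, sketched here under that assumption, is to walk the file with h5py's `visititems` and collect the groups whose type is `TwoPhotonSeries`. `find_neurodata` is a hypothetical helper name.

```python
def find_neurodata(h5file, neurodata_type="TwoPhotonSeries"):
    """Return HDF5 paths of all objects whose `neurodata_type` attribute matches.

    `h5file` is expected to behave like an open h5py.File (it needs
    `visititems`, which calls visit(name, obj) for every group and dataset).
    """
    found = []

    def visit(name, obj):
        value = obj.attrs.get("neurodata_type")
        # Attributes may come back as bytes or str depending on how they were written.
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        if value == neurodata_type:
            found.append(name)
        # Returning None tells visititems to keep walking the hierarchy.

    h5file.visititems(visit)
    return found
```

With h5py, this would be called on `h5py.File(path, "r")`; each returned path points at a group whose `data` dataset (or `external_file` list, for TIFF-backed series) holds the images.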

I came out of this discussion with Julien feeling that an import/export tool between #NWB and OME-TIFF + tables would be nice (so that we could archive experiment data in an OMERO database and download it from there as NWB). What is the current approach in the community for this type of highly heterogeneous data? Or is it out of scope, since microscopy is only one of several experimental processes here?

If there were a Bio-Formats reader for #NWB then you could either use it to generate OME-TIFF or to import the #NWB files directly into an OMERO. In the latter case, that could be done via symlinks so as not to duplicate the storage. This would essentially “register” the image portion of the files in an OMERO for visualization and management. Tighter integration would then likely be done via links in the metadata so you can navigate between different views of the experiment.

~Josh


This sounds terrific, thanks. I’m not a Java dev, but it’s never too late to learn (plus I can find help).
Thank you Josh,

Guillaume