Speeding up MetaMorph file reading with Bio-Formats

Hello @OMETeam,

I have an .nd file (generated by MetaMorph) which defines a dataset of:

  • 3 channels
  • 33 timepoints
  • ~ 100 positions

This leads to around 10 000 individual TIFF files.

Reading/opening these stacks (time series) with Bio-Formats is quite slow, because (I guess) a TIFFReader is created for each individual TIFF file, re-reading the same metadata over and over again.

Is there a way to make the process faster by avoiding the creation of a TIFFReader for each TIFF?

I tried to ‘memoize the nd file’ but there was no significant speed increase. Also, the goal here is to read the TIFF stacks in order to resave them to another file format, so only one read is performed for each TIFF (and the resaving should be fast as well).
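For reference, this is roughly what I mean by memoizing, as a minimal Java sketch using the loci.formats.Memoizer wrapper (the path is a placeholder):

import loci.formats.ImageReader;
import loci.formats.Memoizer;

public class MemoizeNd {
    public static void main(String[] args) throws Exception {
        // Wrap any reader in a Memoizer: the first setId() writes a .bfmemo
        // cache file next to the data, and later calls re-use that cache to
        // skip most of the initialisation work.
        Memoizer reader = new Memoizer(new ImageReader());
        reader.setId("D:/data/experiment.nd"); // placeholder path
        System.out.println("Series found: " + reader.getSeriesCount());
        reader.close();
    }
}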

Happy to get any hints on how to tackle this issue!

Hi @NicoKiaru

I was also bothered by the slow reading of the .oib files (the content is TIFF) created for each time series by Olympus’s multi-area time-lapse imaging.
So I created a plugin that simply opens the files one by one in parallel and merges them.
The load time improved significantly.
This makes it possible to read files that are split into a time series by .oif.
In addition, because Bio-Formats can get the order of the numbers wrong (specifically, a sorting mistake that occurs with numbers such as 01, 02…10, 11…), I also added an operation to compensate for that (see the sketch below).
I’m not sure whether you can do that with a MetaMorph file, but in my case this has resolved my complaints.
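For illustration only (this is not my plugin’s actual code), a minimal Java sketch of the kind of numeric-aware sort I mean:

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericSort {
    // Compare by the first run of digits in each name, so that t2 < t10;
    // plain lexicographic order would give t1, t10, t11, t2, ...
    static final Pattern NUM = Pattern.compile("(\\d+)");

    static int numericCompare(String a, String b) {
        Matcher ma = NUM.matcher(a), mb = NUM.matcher(b);
        if (ma.find() && mb.find()) {
            int cmp = Long.compare(Long.parseLong(ma.group(1)),
                                   Long.parseLong(mb.group(1)));
            if (cmp != 0) return cmp;
        }
        return a.compareTo(b); // fallback: plain text comparison
    }

    public static void main(String[] args) {
        String[] names = {"img_t10.tif", "img_t2.tif", "img_t1.tif"};
        Arrays.sort(names, NumericSort::numericCompare);
        System.out.println(Arrays.toString(names)); // [img_t1.tif, img_t2.tif, img_t10.tif]
    }
}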
Is this information helpful?

hwada

This seems to be a known issue.
There are several posts on the forum mentioning slow TIFF performance.

At https://imagej.net/TIFF you can find this hint:

The Bio-Formats plugins offer a more complete TIFF importer, accessible via the File › Import › Bio-Formats command.

  • Pro: The Bio-Formats TIFF reader can handle many more varieties of TIFF.
  • Con: The Bio-Formats TIFF support is not as speedy as ImageJ1’s TIFF reader.

and

The SCIFIO library … adapted from the Bio-Formats … supports a wider variety of TIFFs, but is less performant than the ImageJ 1.x reader.

I think we are facing a similar problem here, trying to import a similar experiment (with a .nd file) into OMERO:
around 9 GB and 18 000 individual files.
A normal server-to-server transfer takes around 10 minutes at my institute; importing into OMERO, I gave up after 45 minutes.

thanks @hwada and @phaub, this is some helpful information!

I think I’ll try a ‘brute’ approach by opening/concatenating these as raw data (they are uncompressed TIFFs) in order to see how much speed can be gained. Depending on the outcome I’ll either (1) give up if it’s not worth it, (2) make my own plugin, or (3) try to create another optimized MetamorphReader dedicated to this special case.
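To make the ‘brute’ idea concrete, here is a rough Java sketch. It assumes every TIFF in the series is uncompressed with an identical layout, so that the pixel data starts at the same fixed byte offset in each file; the offset, dimensions and file name below are all hypothetical placeholders:

import java.io.RandomAccessFile;

public class RawTiffRead {
    public static void main(String[] args) throws Exception {
        int width = 512, height = 512;      // assumed image dimensions
        long pixelOffset = 1024;            // hypothetical fixed data offset
        byte[] plane = new byte[width * height * 2]; // assuming 16-bit pixels

        // Skip the TIFF header/IFD entirely and grab the pixel block directly.
        try (RandomAccessFile raf = new RandomAccessFile("pos_s1_t1.TIF", "r")) {
            raf.seek(pixelOffset);
            raf.readFully(plane);
        }
        System.out.println("Read " + plane.length + " bytes of raw pixel data");
    }
}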

@emartini Interesting to know! It’s probably a different issue, because it then involves network transfer.

I am really not sure it is a network transfer problem (maybe related, but not only), since we have imported much larger experiments (by now up to 100 GB) into OMERO quite fast, in more or less the same time we were taking to transfer server to server.

It really seems to me that OMERO is trying to do something with every single TIFF via Bio-Formats.
Anyway I will open a new thread for this issue, but I really think it’s related :wink:

No, you’re right, but because the network is involved, there’s probably an extra layer of complexity.

I’ll keep an eye on it as well!


@NicoKiaru, what sort of read time are you roughly seeing?

If you are reading the .nd file then there will indeed be a separate reader for each of the associated files, and that is indeed likely slowing the process. I quickly profiled a slightly smaller dataset (3000 TIFF files) and the breakdown was roughly 50% of the time for the initial parsing and initialisation, 25% on setting the id for each of the individual readers, and 25% on actually reading the pixel data.
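If anyone wants to reproduce that kind of breakdown, this is a minimal sketch (not the exact profiling code I used; the path is a placeholder) of timing the initialisation against the pixel reads:

import loci.formats.ImageReader;

public class ProfileRead {
    public static void main(String[] args) throws Exception {
        ImageReader reader = new ImageReader();

        long t0 = System.currentTimeMillis();
        reader.setId("D:/data/experiment.nd"); // placeholder path
        long t1 = System.currentTimeMillis();

        // Read every plane of every series once.
        for (int s = 0; s < reader.getSeriesCount(); s++) {
            reader.setSeries(s);
            for (int p = 0; p < reader.getImageCount(); p++) {
                reader.openBytes(p);
            }
        }
        long t2 = System.currentTimeMillis();

        System.out.println("init: " + (t1 - t0) + " ms, pixels: " + (t2 - t1) + " ms");
        reader.close();
    }
}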


Hi @dgault,

Thanks for the benchmarking.

The data I have is a MetaMorph file (from this paper by the Derivery lab), downsized to 10 positions.

Reading and resaving TIFF stacks (one stack per position, around 270 MB per stack, 2.7 GB total) with Bio-Formats using this macro takes about 4.8 s per image:

nPositions=10; //number of positions (I get it elsewhere)

filePath = "C:\\Users\\...";
name="NotchLanding_all36_control36.nd";

timeStart = getTime();
for (iPosition=1;iPosition<nPositions+1; iPosition++){
    //load the images corresponding to position iPosition
    run("Bio-Formats Importer", "open="+filePath+name+" color_mode=Default rois_import=[ROI manager] view=Hyperstack stack_order=XYCZT series_"+iPosition);
    //save
    saveAs("tiff", filePath+"position_"+iPosition);
    close();
}
timeEnd = getTime();

totalTime = (timeEnd-timeStart)/1000;

print("Export took "+totalTime+" seconds for "+nPositions+" images");
print("Export took "+(totalTime/nPositions)+" seconds per image");

//Export took 47.676 seconds for 10 images
//Export took 4.7676 seconds per image

Using a direct Image Sequence import (giving up the metadata), reading and saving takes around 1.5 s per image:


setBatchMode(true);
filePath = "C:\\Users\\...";
filePrefix = "NotchLanding_all36_control36_";

nPositions=10; //number of positions (I get it elsewhere)
timeStart = getTime();
for (iPosition=1;iPosition<nPositions+1; iPosition++){
	run("Close All");
	run("Image Sequence...", "open=["+filePath+"NotchLanding_all36_control36_w1TIRF 637 single LP_s1_t1.TIF] file=["+filePrefix+"w1TIRF 637 single LP_s"+iPosition+"_t] number=33 sort");
	rename("637");
	run("Image Sequence...", "open=["+filePath+"NotchLanding_all36_control36_w1TIRF 637 single LP_s1_t1.TIF] file=["+filePrefix+"w3TIRF 488 single_s"+iPosition+"_t] number=33 sort");
	rename("488");
	run("Image Sequence...", "open=["+filePath+"NotchLanding_all36_control36_w1TIRF 637 single LP_s1_t1.TIF] file=["+filePrefix+"w2TIRF 561 single_s"+iPosition+"_t] number=33 sort");
	rename("561");
	run("Merge Channels...", "c1=561 c2=488 c3=637 create");
	saveAs("Tiff", filePath+filePrefix+iPosition+".TIF");
}

timeEnd = getTime();

totalTime = (timeEnd-timeStart)/1000;

print("Export took "+totalTime+" seconds for "+nPositions+" images");
print("Export took "+(totalTime/nPositions)+" seconds per image");

// use = virtual
// Export took 15.93 seconds for 10 images
// Export took 1.593 seconds per image

// not virtual
// Export took 15.127 seconds for 10 images
// Export took 1.5127 seconds per image

On my machine there’s a factor of 3.2 between Bio-Formats and a ‘direct read’. Maybe some of this (both the Bio-Formats and the ‘Image Sequence’ route) could be made even faster with parallelization, but I’m not sure, and I don’t really know which way to go about it.
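One direction I could imagine (just an untested sketch; Bio-Formats readers are not thread-safe, so each worker would need its own reader, and the path is a placeholder):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import loci.formats.ImageReader;

public class ParallelPositions {
    public static void main(String[] args) throws Exception {
        final String path = "D:/data/experiment.nd"; // placeholder path
        int nPositions = 10;
        ExecutorService pool = Executors.newFixedThreadPool(4);

        for (int i = 0; i < nPositions; i++) {
            final int series = i;
            pool.submit(() -> {
                // One reader per task: IFormatReader instances are not thread-safe.
                try (ImageReader reader = new ImageReader()) {
                    reader.setId(path); // each worker pays the setId cost once
                    reader.setSeries(series);
                    for (int p = 0; p < reader.getImageCount(); p++) {
                        reader.openBytes(p); // read (and, in real code, resave)
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
    }
}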

Running the importer multiple times will add a significant overhead, as the metadata parsing and initialisation takes place for each position. I can put together a jython script that would eliminate that duplication and provide some improvement.
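The idea, sketched here in Java rather than jython (the path is a placeholder; note that opening all series at once holds everything in memory, so for ~100 large positions you would enable one series at a time with setSeriesOn instead):

import ij.ImagePlus;
import loci.plugins.BF;
import loci.plugins.in.ImporterOptions;

public class OpenAllSeries {
    public static void main(String[] args) throws Exception {
        // Parse the .nd a single time and open every series from that one
        // initialisation, instead of re-running the importer per position.
        ImporterOptions options = new ImporterOptions();
        options.setId("D:/data/experiment.nd"); // placeholder path
        options.setOpenAllSeries(true);
        ImagePlus[] positions = BF.openImagePlus(options);
        for (int i = 0; i < positions.length; i++) {
            ij.IJ.saveAsTiff(positions[i], "D:/data/position_" + (i + 1) + ".tif");
            positions[i].close();
        }
    }
}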

If the goal is to convert each position to its own TIFF then the bftools (https://www.openmicroscopy.org/bio-formats/downloads/) can also perform that conversion, if that’s an option (./bfconvert path/to/myFile.nd path/to/position_%s.tiff).


I need to try! However, it’s not working on the linked data currently because of this bug: Error during opening an .nd file. For the tests in this thread I compiled the GPL readers library with your fix @dgault, but I think I’ll wait until the fix makes its way to bftools to try it.

Hello @dgault,

facing the same issue here. I also see extremely slow reading with files coming from Abberior (obj files). The uploaded file is 1.6 MB but it takes a lot of time (several seconds) before the series selection dialog shows up.
IMG0010_SNAPTMR_50nM_610CP_50nM_10uMVera (2).zip (1.6 MB)

For the MetaMorph files, I have a time lapse of 179 time points and 128 positions; each time point is a separate .stk file with 3 Z-slices at 28 MB each. All in all, over 22 000 files.
Just opening the series choice menu takes between 1 and 2 hours.
Currently I am reverting to importing each position as an image sequence with IJ1.
I was wondering what Bio-Formats is doing in the background. Loading and computing the thumbnails? Reading the metadata of each single file out of the 22 000 files?
Could one not disable some of these features?

I tried to use the reader directly. However, this does not help much, as even in this case Bio-Formats seems to reprocess part of the metadata.

I guess bfconvert will also do the first reading of the file.

Thanks

Antonio


Hi @apoliti, the performance will vary depending on the particular format and numerous other factors. For the uploaded file, the majority of the time is spent retrieving the schema and validating the XML contained within. If you are using the reader directly for files such as this one, then using the Memoizer reader wrapper to cache the reader will reduce the initialisation step.

For the MetaMorph files it will be quite different. I don’t have a MetaMorph dataset as large as that, and none that take anywhere near that length of time to load. I suspect in that scenario it may be the thumbnails that are taking most of the time; the metadata from each individual file should only be accessed when the pixel data for that particular file is loaded. To check whether it is thumbnail related: if you use the command line tools from https://www.openmicroscopy.org/bio-formats/downloads/ and run showinf -nopix path/to/myFile.nd, does that take significantly less time?

There are some options we could add to the Bio-Formats plugin to try and improve things, such as disabling thumbnails and adding the Memoizer caching.


Hello @dgault,
I will try, but the download is currently not working. Somehow it times out after a while. Not sure whether it is my location or the Bio-Formats website having an issue.

Antonio

Yeah, unfortunately the website was down (see OME Resources Down due to UoD outage for full details). The download should now be back working at https://downloads.openmicroscopy.org/bio-formats/6.5.1/artifacts/bftools.zip.

Hello @dgault,

I finally had time to test it. The command ./showinf -nopix /path/to/myFile.nd gave an error:

Initializing reader
MetamorphReader initializing D:/TMP/LI03/LI031.nd
Initializing D:/TMP/LI03/LI031.nd
Looking for STK file in D:\TMP\LI03
Failure during the reader initialization

I was able to load the accompanying .stk file, and yes, the usage of -nopix makes it faster.

I tested with other files (.nd files from a Nikon system). Here again the -nopix option makes it faster.
I will let you know the difference.

Performing the command on a 60 GB file with 1134 planes in total (multi-position, CZT):

  1. showinf -nopix: 2.5 s
  2. showinf: 1.5 min. In ImageJ the reading is faster.

Whenever it reads the whole metadata and creates the thumbnails, we observe a considerably longer loading time.

I was not able to use the Memoizer in the right way. To open a series as an ImagePlus I was using BF.openImagePlus. The reader can help in reading the metadata, but I am not sure how to use it to open a complete series with all channels and Z-stacks.
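For reference, this is the kind of thing I was trying to put together (a rough Java sketch; the path and series index are placeholders, and it assumes 16-bit data):

import ij.ImagePlus;
import ij.ImageStack;
import ij.process.ShortProcessor;
import loci.common.DataTools;
import loci.formats.ImageReader;
import loci.formats.Memoizer;

public class OpenSeriesWithMemoizer {
    public static void main(String[] args) throws Exception {
        Memoizer reader = new Memoizer(new ImageReader());
        reader.setId("D:/TMP/LI03/LI031.nd"); // placeholder path
        reader.setSeries(0);                  // placeholder series index

        int w = reader.getSizeX(), h = reader.getSizeY();
        ImageStack stack = new ImageStack(w, h);
        for (int p = 0; p < reader.getImageCount(); p++) {
            // Convert the raw bytes of each plane into 16-bit pixels.
            byte[] bytes = reader.openBytes(p);
            short[] pixels = (short[]) DataTools.makeDataArray(
                bytes, 2, false, reader.isLittleEndian());
            stack.addSlice(new ShortProcessor(w, h, pixels, null));
        }
        ImagePlus imp = new ImagePlus("series 0", stack);
        imp.setDimensions(reader.getSizeC(), reader.getSizeZ(), reader.getSizeT());
        reader.close();
        imp.show();
    }
}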

Also note that I had to increase the heap size up to 5 GB when thumbnails are created.
Overall I think that a loading option without thumbnails would help to speed up the reading.

Thanks

antonio

Thanks for testing and providing the feedback. It does look like having the option to disable thumbnail generation would be required here. We have an existing GitHub Issue with this feature request which I have updated to link to this thread: https://github.com/ome/bioformats/issues/3574