Speeding up Metamorph file reading with Bio-Formats

Hello @OMETeam,

I have an .nd file (generated by Metamorph) defining a dataset of:

  • 3 channels
  • 33 timepoints
  • ~ 100 positions

Leading to around 10,000 individual TIFF files.

Reading/opening these stacks (time series) with Bio-Formats is quite slow, because (I guess) a TIFF reader is created for each individual TIFF file, re-reading the same metadata over and over again.

Is there a way to make the process faster by avoiding the creation of a new TIFF reader for each TIFF?

I tried to ‘memoize the nd file’ but there was no significant speed increase, presumably because the memo file only skips the initialisation cost on repeated opens. Also, the goal here is to read the TIFF stacks in order to resave them to another file format, so only one read is performed for each TIFF (and the resaving should be fast as well).
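(For reference, by ‘memoizing’ I mean wrapping the reader in loci.formats.Memoizer, roughly like this; a minimal Jython sketch with a placeholder path:)

from loci.formats import ImageReader, Memoizer

# After the first setId() the initialised reader state is cached to a
# .bfmemo file next to the data, so only repeated opens get faster.
reader = Memoizer(ImageReader())
reader.setId("C:/Users/.../NotchLanding_all36_control36.nd")  # placeholder path
print(reader.getSeriesCount())
reader.close()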

Happy to get any hints on how to tackle this issue!

Hi @NicoKiaru

I was also bothered by the slow reading of OIB files (the content is TIFF) for each time series created by Olympus’s multi-area time-lapse imaging.
So I created a plugin that simply opens the files one by one in parallel and merges them.
The load time seems to have improved significantly.
This makes it possible to read files that are split into a time series by OIF.
In addition, because Bio-Formats gets the order of the files wrong (specifically, a lexicographic sort mistake that occurs with numbers such as 01, 02, …, 10, 11, …), I also added an operation to compensate for it; essentially a natural-order sort, sketched below.
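For example, the fix amounts to sorting on the embedded numbers instead of comparing plain text (a hypothetical Python sketch, not my actual plugin code; the file names are made up):

import re

def natural_key(name):
    # split into digit / non-digit runs and compare the digit runs numerically
    return [int(t) if t.isdigit() else t for t in re.split(r'(\d+)', name)]

names = ["img_10.tif", "img_2.tif", "img_1.tif"]
print(sorted(names))                   # ['img_1.tif', 'img_10.tif', 'img_2.tif']
print(sorted(names, key=natural_key))  # ['img_1.tif', 'img_2.tif', 'img_10.tif']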
I’m not sure if you can do that with a Metamorph file, but in my case this resolved my complaints.
Is this information helpful?

hwada

This seems to be a known issue.
There are several posts in the forum mentioning slow TIFF performance.

In https://imagej.net/TIFF you can find the hint:

The Bio-Formats plugins offer a more complete TIFF importer, accessible via the File › Import › Bio-Formats command.

  • Pro: The Bio-Formats TIFF reader can handle many more varieties of TIFF.
  • Con: The Bio-Formats TIFF support is not as speedy as ImageJ1’s TIFF reader.

and

The SCIFIO library … adapted from the Bio-Formats … supports a wider variety of TIFFs, but is less performant than the ImageJ 1.x reader.

I think we are facing a similar problem here, trying to import a similar experiment (with a .nd file) into OMERO:
around 9 GB and 18,000 individual files.
A normal server-to-server transfer takes around 10 minutes at my institute; importing into OMERO, I gave up after 45 minutes.

Thanks @hwada and @phaub, this is some helpful information!

I think I’ll try a ‘brute’ approach by opening/concatenating these as raw data (they are uncompressed TIFFs) to see how much speed can be gained. Depending on the outcome I’ll either (1) give up if it’s not worth it, (2) make my own plugin, or (3) try to create another optimized MetamorphReader dedicated to this special case.

@emartini Interesting to know! It’s probably a different issue, because that one involves network transfer.

I am really not sure it is a network transfer problem (maybe related, but not only), since we have imported much larger experiments (up to 100 GB by now) into OMERO quite fast, in more or less the same time it takes us to transfer server to server.

It really seems to me that OMERO is trying to do something with every single TIFF via Bio-Formats.
Anyway I will open a new thread for this issue, but I really think it’s related :wink:

No, you’re right, but because the network is involved there’s probably an extra layer of complexity.

I’ll keep an eye on it as well!


@NicoKiaru, what sort of read time are you roughly seeing?

If you are reading the .nd file then there will indeed be a separate reader for each of the associated files, and that is likely what is slowing the process down. I quickly profiled a slightly smaller dataset (3000 TIFF files) and the breakdown was roughly 50% of the time on the initial parsing and initialisation, 25% on setting the id for each of the individual readers, and 25% on actually reading the pixel data.
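A quick way to get a coarse version of that breakdown yourself (a rough Jython sketch; the path is a placeholder, and the per-file reader initialisation is interleaved with the pixel reads here):

from loci.formats import ImageReader
import time

reader = ImageReader()

t0 = time.time()
reader.setId("C:/path/to/dataset.nd")  # placeholder; triggers parsing + initialisation
t1 = time.time()

for s in range(reader.getSeriesCount()):
    reader.setSeries(s)
    for p in range(reader.getImageCount()):
        reader.openBytes(p)  # raw pixel reads; a real script would do something with them
t2 = time.time()

reader.close()
print("initialisation: %.1f s, pixel reads: %.1f s" % (t1 - t0, t2 - t1))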


Hi @dgault,

Thanks for the benchmarking.

The data I have is a Metamorph dataset (from this paper of the Derivery lab), downsized to 10 positions.

Reading and resaving the TIFF stacks (one stack per position, around 270 MB per stack, 2.7 GB total) with Bio-Formats using this macro takes about 4.8 s per image:

nPositions=10; //number of positions (I get it elsewhere)

filePath = "C:\\Users\\...";
name="NotchLanding_all36_control36.nd";

timeStart = getTime();
for (iPosition=1;iPosition<nPositions+1; iPosition++){
    //load the images corresponding to position iPosition
    run("Bio-Formats Importer", "open="+filePath+name+" color_mode=Default rois_import=[ROI manager] view=Hyperstack stack_order=XYCZT series_"+iPosition);
    //save
    saveAs("tiff", filePath+"position_"+iPosition);
    close();
}
timeEnd = getTime();

totalTime = (timeEnd-timeStart)/1000;

print("Export took "+totalTime+" seconds for "+nPositions+" images");
print("Export took "+(totalTime/nPositions)+" seconds per image");

//Export took 47.676 seconds for 10 images
//Export took 4.7676 seconds per image

Using a direct Image Sequence read (giving up the metadata), reading and saving takes around 1.5 s per image:


setBatchMode(true);
filePath = "C:\\Users\\...";
filePrefix = "NotchLanding_all36_control36_";

nPositions=10; //number of positions (I get it elsewhere)
timeStart = getTime();
for (iPosition=1;iPosition<nPositions+1; iPosition++){
	run("Close All");
	run("Image Sequence...", "open=["+filePath+"NotchLanding_all36_control36_w1TIRF 637 single LP_s1_t1.TIF] file=["+filePrefix+"w1TIRF 637 single LP_s"+iPosition+"_t] number=33 sort");
	rename("637");
	run("Image Sequence...", "open=["+filePath+"NotchLanding_all36_control36_w1TIRF 637 single LP_s1_t1.TIF] file=["+filePrefix+"w3TIRF 488 single_s"+iPosition+"_t] number=33 sort");
	rename("488");
	run("Image Sequence...", "open=["+filePath+"NotchLanding_all36_control36_w1TIRF 637 single LP_s1_t1.TIF] file=["+filePrefix+"w2TIRF 561 single_s"+iPosition+"_t] number=33 sort");
	rename("561");
	run("Merge Channels...", "c1=561 c2=488 c3=637 create");
	saveAs("Tiff", filePath+filePrefix+iPosition+".TIF");
}

timeEnd = getTime();

totalTime = (timeEnd-timeStart)/1000;

print("Export took "+totalTime+" seconds for "+nPositions+" images");
print("Export took "+(totalTime/nPositions)+" seconds per image");

// use = virtual
// Export took 15.93 seconds for 10 images
// Export took 1.593 seconds per image

// not virtual
// Export took 15.127 seconds for 10 images
// Export took 1.5127 seconds per image

On my machine there’s a factor of ~3.2 between Bio-Formats and a ‘direct read’. Maybe some of this (both Bio-Formats and ‘Image Sequence’) could be made even faster with parallelization, but I’m not sure, and I don’t really know how to go about it. Maybe something like the rough sketch below?
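(A completely untested Jython sketch of what I have in mind: one Bio-Formats reader per task, since readers are not thread-safe. The path, the pool size of 4, and the helper name processPosition are all made up, and note that each task still re-parses the .nd metadata in setId, so any gain depends on where the time actually goes.)

from java.util.concurrent import Executors, TimeUnit
from loci.formats import ImageReader

ndPath = "C:/Users/.../NotchLanding_all36_control36.nd"  # placeholder

def processPosition(s):
    reader = ImageReader()  # one reader per task: readers are not thread-safe
    reader.setId(ndPath)    # caveat: re-parses the .nd metadata in every task
    reader.setSeries(s)
    for p in range(reader.getImageCount()):
        reader.openBytes(p)  # resave the plane here instead of discarding it
    reader.close()

pool = Executors.newFixedThreadPool(4)
for s in range(10):  # one task per position
    pool.submit(lambda s=s: processPosition(s))
pool.shutdown()
pool.awaitTermination(1, TimeUnit.HOURS)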

Running the importer multiple times will add a significant overhead, as the metadata parsing and initialisation takes place for each position. I can put together a Jython script that would eliminate that duplication and provide some improvement, along the lines of the sketch below.
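(A minimal, untested sketch of the idea using the loci.formats API directly: the reader is initialised once for the whole .nd dataset and each series is then written out in turn. Paths are placeholders and error handling is omitted.)

from loci.formats import ImageReader, ImageWriter, MetadataTools

ndPath = "C:/Users/.../NotchLanding_all36_control36.nd"  # placeholder
outPattern = "C:/Users/.../position_%d.tif"              # placeholder

reader = ImageReader()
meta = MetadataTools.createOMEXMLMetadata()
reader.setMetadataStore(meta)
reader.setId(ndPath)  # metadata parsing + initialisation happens only once, here

for s in range(reader.getSeriesCount()):
    reader.setSeries(s)
    writer = ImageWriter()
    writer.setMetadataRetrieve(meta)
    writer.setId(outPattern % (s + 1))  # one output file per position
    writer.setSeries(s)
    for p in range(reader.getImageCount()):
        writer.saveBytes(p, reader.openBytes(p))
    writer.close()

reader.close()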

If the goal is to convert each position to its own TIFF, then bftools (https://www.openmicroscopy.org/bio-formats/downloads/) can also perform that conversion, if that’s an option (./bfconvert path/to/myFile.nd path/to/position_%s.tiff). The %s placeholder expands to the series index, producing one file per position.


I need to try that! However, it’s not working on the linked data currently because of this bug: Error during opening an .nd file. For the tests in this thread I compiled the GPL readers library with your fix @dgault, but I think I’ll wait until the fix makes its way to bftools to try it.