Pyramid Creation: Unconverted Bioformats vs Ome-Tiff (was: Bioformats2Raw seems very slow on our data)

Hi all,

After we ran into some problems with pyramid generation times on our OMERO instance, we’re now trying to adjust our workflow to use the bioformats2raw → raw2ometiff → upload to OMERO approach. Still, the pyramid creation times for some typical images seem very long to me, considering statements like “[…] which may be as little as 5 minutes in total for 40 or more GiB of pixel data.” from the Glencoe website.

At the moment I am seeing conversion times of almost 10 minutes on 16 threads of a Ryzen 3950X for a 6144x14336x3 .lsm image (140MB), with 512x512 tiles and 6 resolutions. This seems very long to me, especially considering that converting the .lsm file to an OME-TIFF with Fiji + Bio-Formats and then using the BDV plugin to create N5 pyramids of the same image takes maybe 30s.

Am I right to conclude that the process is quite slow for images of that size? If so, are there known problems with certain file formats read through Bio-Formats, or other known factors that slow down the conversion?

Feedback would be much appreciated and regardless thanks for the great tools you’re providing!

// Julian

After some further research I have some more info:

I switched over to bfconvert from bftools, since I found out that it is able to save pyramids as well and we don’t need the additional readers provided by bioformats2raw. Here I noticed something that seems strange to me:

If I run the tool directly on the original data and create the pyramids on the fly, the process takes ages to complete, e.g. bftools/bfconvert -tilex 512 -tiley 512 -pyramid-scale 3 -pyramid-resolutions 3 -noflat Test.lsm Test.ome.tiff

If, on the other hand, I first convert the .lsm file to a temporary OME-TIFF without pyramids (bftools/bfconvert Test.lsm Test.ome.tiff) and then run bftools/bfconvert -tilex 512 -tiley 512 -pyramid-scale 3 -pyramid-resolutions 3 -noflat Test.ome.tiff Test2.ome.tiff, the pyramid creation only takes a few seconds.
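The two-step workaround described above could be scripted roughly as follows. This is only a sketch: the helper name `two_step_commands` and the output naming scheme are made up for illustration, the `bftools/bfconvert` path is the one used in the post, and the flags are copied verbatim from the commands above. The script only builds and prints the commands; actually running them (for example with `subprocess.run(cmd, check=True)`) is left to the reader.

```python
import shlex
from pathlib import Path


def two_step_commands(source, bfconvert="bftools/bfconvert",
                      tile=512, scale=3, resolutions=3):
    """Build the two bfconvert calls: flat OME-TIFF first, then the pyramid.

    Hypothetical helper; the output names mirror the Test.ome.tiff /
    Test2.ome.tiff naming used in the post.
    """
    src = Path(source)
    flat = src.with_name(f"{src.stem}.ome.tiff")       # temporary, no pyramid
    pyramid = src.with_name(f"{src.stem}2.ome.tiff")   # final pyramidal file

    # Step 1: plain conversion, the source plane is decoded without tiling.
    step1 = [bfconvert, str(src), str(flat)]

    # Step 2: tiled pyramid generation from the already converted file.
    step2 = [
        bfconvert,
        "-tilex", str(tile), "-tiley", str(tile),
        "-pyramid-scale", str(scale),
        "-pyramid-resolutions", str(resolutions),
        "-noflat",
        str(flat), str(pyramid),
    ]
    return step1, step2


step1, step2 = two_step_commands("Test.lsm")
for cmd in (step1, step2):
    print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids shell quoting problems if file names contain spaces.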

I suspect this is also the cause of our problems mentioned in “PixelData threads and pyramid generation issues” and “PixelData / Pyramid Creation does not Multithread correctly”.
Am I right in assuming that the OMERO pyramid creation is based on the same Java library as the bfconvert tool and goes straight from the original format to the pyramidal OME-TIFFs?

// Julian

Hi all again,

After some more research I could reproduce the very different pyramid creation times of unconverted formats vs. non-pyramidal OME-TIFFs directly inside OMERO as well.

If I upload an unconverted .png file (10k x 15k px) to my test server, pyramid creation takes ca. 12 minutes with 1024x1024 tiles. If I convert the png to a non-pyramidal .ome.tiff beforehand, pyramid creation only takes 1.5 minutes.
For smaller tiles (512x512) this is even worse, with the original png taking 30 minutes, while the .ome.tiff only takes 2. Logs for both files are attached and I can provide the raw data as well upon request, if you want to reproduce this.
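A back-of-the-envelope count of full-resolution tiles shows how strongly the tile size changes the number of tile reads. The sketch below assumes the image is exactly 10000 x 15000 pixels (the post only says “10k x 15k”) and counts tiles for a single plane at the highest resolution only; how OMERO counts its own progress steps is not covered here.

```python
import math


def tiles_per_plane(width, height, tile):
    """Number of tile reads needed to cover one plane with square tiles."""
    return math.ceil(width / tile) * math.ceil(height / tile)


# Assumed exact dimensions for illustration; the post says "10k x 15k px".
width, height = 10_000, 15_000

counts = {tile: tiles_per_plane(width, height, tile) for tile in (1024, 512)}
for tile, count in counts.items():
    print(f"{tile}x{tile} tiles: {count} reads per plane")
```

Under these assumptions, halving the tile edge gives four times as many reads per plane, so any fixed cost per tile read is paid four times as often.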

For us this poses a bit of a problem: of course it would be no big technical problem to convert all files to OME-TIFF before uploading them to the OMERO server. But this removes one of the advantages OMERO has for us, namely that the original proprietary data file is hard-linked to the data and still accessible if needed.
Would it be an option to let OMERO convert the inputs to a non-pyramidal OME-TIFF first and then generate the pyramids? From my data (12 vs. 1.5 minutes, 30 vs. 2 minutes) this looks like it would speed up the whole pyramid process by roughly an order of magnitude.

// Julian
Pixel512_log.txt (10.7 KB)
Pixel1024_log.txt (8.4 KB)

Hi @JulianHn, thanks for raising the issue and carrying out some initial investigations. Would you be able to share either the png or the lsm file mentioned so I can carry out some in-depth profiling?

Hi @dgault: Yes I’ll write you a PM with a download link for both files.

Thanks Julian for supplying the sample files.

What looks to be happening here is that the original images are compressed as a full plane, so when tiling is used, the entire plane is decompressed each time a tile is fetched. This will occur whether you are generating pyramids or not.

The reason you aren’t seeing the same slowness when you convert to OME-TIFF first is that the decompression happens only once in this case, and each tile can then be fetched much more quickly. You could try performing the initial conversion and pyramid generation without tiling enabled; this will be much faster but will require more memory to complete.
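The effect described above can be illustrated with a toy model. This is not Bio-Formats code: `PlaneCompressedImage` is a made-up stand-in that uses `zlib` to compress one whole plane as a single block, so every tile read has to decompress the full plane first. Decoding the plane once and cropping tiles from memory gives the same pixels with a single decompression, at the cost of holding the whole plane in RAM.

```python
import zlib


def crop(plane, stride, x, y, w, h):
    """Cut a w x h tile out of a row-major, 1-byte-per-pixel plane."""
    return b"".join(
        plane[(y + row) * stride + x:(y + row) * stride + x + w]
        for row in range(h)
    )


def tile_origins(width, height, tile):
    """Yield (x, y, w, h) for every tile covering the plane."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)


class PlaneCompressedImage:
    """Hypothetical format that stores each plane as one compressed block."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        raw = bytes((x + y) % 251 for y in range(height) for x in range(width))
        self._blob = zlib.compress(raw)
        self.decompressions = 0

    def decode_plane(self):
        self.decompressions += 1
        return zlib.decompress(self._blob)

    def read_tile(self, x, y, w, h):
        # No random access into the compressed block: decode everything.
        return crop(self.decode_plane(), self.width, x, y, w, h)


image = PlaneCompressedImage(256, 256)

# Tiled reading straight from the compressed plane.
tiled = [image.read_tile(*t) for t in tile_origins(256, 256, 64)]
naive_count = image.decompressions

# Decode once, then crop tiles from the in-memory plane.
plane = image.decode_plane()
cached = [crop(plane, image.width, *t) for t in tile_origins(256, 256, 64)]
cached_count = image.decompressions - naive_count

print(f"tile-by-tile: {naive_count} decompressions, decode-once: {cached_count}")
```

In this model the number of full-plane decompressions equals the number of tiles, which is why smaller tiles make the problem worse.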

Thanks David for taking time out of your day to profile that and for the explanation.

Our OMERO server has plenty of memory, so that should in theory not be a problem. Is there a configuration option I’m unaware of that lets OMERO generate the pyramids without tiling? Otherwise this would still require us to go through an external conversion workflow.

// Julian

When OMERO is generating pyramids it actually shouldn’t be using the tiling options, which should mean it will perform similarly to bfconvert without tiling. It will still take time to carry out the decompression step for each pyramid layer though, and large compressed planes are always going to perform slowly here. Pre-converting to OME-TIFF means that the decompression only occurs once, but it obviously adds an extra step to your workflow.

Okay then I seem to have a fundamental misunderstanding of what the PixelData Process is doing.

My understanding was that it creates pyramids from the uploaded image, with the highest-resolution layer being defined by splitting the image into tiles of the size specified by omero.pixeldata.tile_width/height.
This is what I thought I can see in the PixelData-0.log with entries like:

2021-05-04 20:22:21,985 INFO  [                ome.io.nio.PixelsService] (1-thread-2) Pyramid creation for Pixels:14812 193/960 (20%).
2021-05-04 20:23:39,364 INFO  [                ome.io.nio.PixelsService] (1-thread-3) Pyramid creation for Pixels:14809 231/1152 (19%).
2021-05-04 20:23:47,691 INFO  [                ome.io.nio.PixelsService] (1-thread-5) Pyramid creation for Pixels:14807 231/1152 (19%).
2021-05-04 20:24:22,200 INFO  [                ome.io.nio.PixelsService] (1-thread-4) Pyramid creation for Pixels:14811 241/1200 (20%).
2021-05-04 20:26:01,683 INFO  [                ome.io.nio.PixelsService] (1-thread-8) Pyramid creation for Pixels:14808 303/3024 (9%).
2021-05-04 20:27:10,548 INFO  [                ome.io.nio.PixelsService] (1-thread-2) Pyramid creation for Pixels:14812 289/960 (30%).
2021-05-04 20:28:01,965 INFO  [                ome.io.nio.PixelsService] (1-thread-2) Pyramid creation for Pixels:14812 385/960 (40%).
2021-05-04 20:28:08,331 INFO  [                ome.io.nio.PixelsService] (1-thread-2) Pyramid creation for Pixels:14812 481/960 (50%).
2021-05-04 20:28:14,985 INFO  [                ome.io.nio.PixelsService] (1-thread-2) Pyramid creation for Pixels:14812 577/960 (60%).
2021-05-04 20:28:20,087 INFO  [                ome.io.nio.PixelsService] (1-thread-2) Pyramid creation for Pixels:14812 673/960 (70%).
2021-05-04 20:28:23,505 INFO  [                ome.io.nio.PixelsService] (1-thread-2) Pyramid creation for Pixels:14812 769/960 (80%).
2021-05-04 20:28:27,301 INFO  [                ome.io.nio.PixelsService] (1-thread-2) Pyramid creation for Pixels:14812 865/960 (90%).
2021-05-04 20:28:31,051 INFO  [                ome.io.nio.PixelsService] (1-thread-2) SUCCESS -- Pyramid created for pixels id:14812

Since the right number in the progress bars got smaller when I reduced the tile size, this seemed logical to me.
You are now saying that the pyramid creation should not use the tiling. If that is the case, what is happening during the time I see this in my logs? And why does tile size change the number of pyramids OMERO is creating if it does not use tiling?
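To see how long each progress step takes, the PixelData log lines can be parsed and grouped per Pixels id. The sketch below is written against the exact line format shown in the excerpt above; the regular expression and helper names are made up for illustration, and it is an assumption that the `done/total` counters are tile counts. Because several threads work in parallel, the numbers are wall-clock time per step for one image, not pure per-tile cost.

```python
import re
from datetime import datetime

# Pattern written against the excerpt above; other log layouts may differ.
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO\s+"
    r"\[\s*ome\.io\.nio\.PixelsService\] \((?P<thread>[^)]+)\) "
    r"Pyramid creation for Pixels:(?P<pixels>\d+) (?P<done>\d+)/(?P<total>\d+)"
)


def progress_by_pixels(lines):
    """Group (timestamp, done, total) progress entries by Pixels id."""
    progress = {}
    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue  # e.g. the final SUCCESS line
        when = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
        entry = (when, int(m["done"]), int(m["total"]))
        progress.setdefault(int(m["pixels"]), []).append(entry)
    return progress


def seconds_per_step(entries):
    """Wall-clock seconds per progress step between consecutive entries."""
    rates = []
    for (t0, d0, _), (t1, d1, _) in zip(entries, entries[1:]):
        if d1 > d0:
            rates.append((t1 - t0).total_seconds() / (d1 - d0))
    return rates


# A few lines copied from the log excerpt above.
sample = """\
2021-05-04 20:22:21,985 INFO  [                ome.io.nio.PixelsService] (1-thread-2) Pyramid creation for Pixels:14812 193/960 (20%).
2021-05-04 20:23:39,364 INFO  [                ome.io.nio.PixelsService] (1-thread-3) Pyramid creation for Pixels:14809 231/1152 (19%).
2021-05-04 20:27:10,548 INFO  [                ome.io.nio.PixelsService] (1-thread-2) Pyramid creation for Pixels:14812 289/960 (30%).
2021-05-04 20:28:01,965 INFO  [                ome.io.nio.PixelsService] (1-thread-2) Pyramid creation for Pixels:14812 385/960 (40%).
2021-05-04 20:28:31,051 INFO  [                ome.io.nio.PixelsService] (1-thread-2) SUCCESS -- Pyramid created for pixels id:14812
""".splitlines()

progress = progress_by_pixels(sample)
rates = seconds_per_step(progress[14812])
for rate in rates:
    print(f"Pixels:14812 {rate:.2f} s per step")
```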

Thanks and best regards

// Julian

Sorry, it was me who misunderstood how OMERO handles the pyramid creation. The reading step does in fact use tiling, so the slow import process makes sense in this case. There is a post-processing step which recompresses the image in a single block, but the important step for this case is reading and decompressing the original image, and this is indeed using tiling.

Hi David, thanks for the clarification!

Okay, so to summarize:

  1. The problem with our images is that whole planes are compressed and OMERO has to decompress the whole plane for every tile it tries to fetch.
  2. OMERO’s pyramid creation process does use tiles and runs into exactly this problem, since it has to fetch a lot of tiles and in turn decompress a lot.
  3. Pre-converting to OME-TIFF fixes this, since the image is then uncompressed.
  4. There does not seem to be a way to tell the OMERO PixelData process to decompress the whole image before performing the tile fetching in pyramid creation.

What I find confusing about this whole ordeal is that this happens to us with files that come straight from commercial microscopes, so I would suspect that we can’t be the only group to run into this problem.

In the end this is very unfortunate for us, since it means we probably need to go through a pre-conversion workflow (we can’t wait 5 hours per image to convert) and therefore have to run a secondary repository with the original files and a database to link them to the OMERO images.

Unfortunately my Java is absolutely not good enough to judge how complicated it would be to implement an intermediate decompression step for the whole image in the PixelData process, one that people with sufficient RAM could activate to speed up the import process, let alone to implement it myself. It would be on my wishlist, but since I have not seen this specific problem pop up elsewhere here, I suspect we might be the only ones :see_no_evil:

// Julian

Yeah, I think that’s a fair summary of what is happening. You certainly aren’t the only one to encounter this issue, though; plenty of others are struggling with performance for large compressed planes across a variety of formats. In many cases, conversion to a more suitable format is the solution.