Cropping large pyramidal TIFFs with bfconvert crashes

I seem to have run into a similar issue to that in post 32163, but as I have to crop lots of images, I would like to automate it.

Task: Given a large pyramidal TIFF, crop out a specified piece of the image, returning a pyramidal TIFF while maintaining the original metadata (and ideally the original associated slide overview image).

Here’s what I did and what happened (and I’m using a freshly-downloaded bioformats, version 6.5.1, and OpenJDK 14.0.2):

euler: $ bfconvert -crop 24064,0,45392,27760 "1266 HE.tif" 1266_15_HE.tif
1266 HE.tif
VentanaReader initializing 1266 HE.tif
Reading IFDs
Populating metadata
Populating OME metadata
[Ventana .bif] -> 1266_15_HE.tif [Tagged Image File Format]
Switching to BigTIFF (by file size)
Tile size = 1024 x 1024
	Series 0: converted 1/1 planes (100%)
Tile size = 1024 x 1024
Exception in thread "main" loci.formats.FormatException: Invalid tile size: x=38400, y=0, w=1024, h=1024
	at loci.formats.FormatTools.checkTileSize(FormatTools.java:1026)
	at loci.formats.FormatTools.checkPlaneParameters(FormatTools.java:1002)
	at loci.formats.in.VentanaReader.openBytes(VentanaReader.java:205)
	at loci.formats.FormatReader.openBytes(FormatReader.java:878)
	at loci.formats.ImageReader.openBytes(ImageReader.java:449)
	at loci.formats.tools.ImageConverter.getTile(ImageConverter.java:1076)
	at loci.formats.tools.ImageConverter.convertTilePlane(ImageConverter.java:853)
	at loci.formats.tools.ImageConverter.convertPlane(ImageConverter.java:790)
	at loci.formats.tools.ImageConverter.testConvert(ImageConverter.java:718)
	at loci.formats.tools.ImageConverter.main(ImageConverter.java:1095)

Hmm, I’ve no idea where this invalid tile size comes from. So I tried some experiments. For example, if I try specifying the tile size, I still get the same error:

euler: $ bfconvert -tilex 512 -tiley 512 -crop 24064,0,45392,27760 "1266 HE.tif" 1266_15_HE.tif
1266 HE.tif
VentanaReader initializing 1266 HE.tif
Reading IFDs
Populating metadata
Populating OME metadata
[Ventana .bif] -> 1266_15_HE.tif [Tagged Image File Format]
Switching to BigTIFF (by file size)
Tile size = 512 x 512
	Series 0: converted 1/1 planes (100%)
Tile size = 512 x 512
Exception in thread "main" loci.formats.FormatException: Invalid tile size: x=38400, y=0, w=512, h=512
	at loci.formats.FormatTools.checkTileSize(FormatTools.java:1026)
	at loci.formats.FormatTools.checkPlaneParameters(FormatTools.java:1002)
	at loci.formats.in.VentanaReader.openBytes(VentanaReader.java:205)
	at loci.formats.FormatReader.openBytes(FormatReader.java:878)
	at loci.formats.ImageReader.openBytes(ImageReader.java:449)
	at loci.formats.tools.ImageConverter.getTile(ImageConverter.java:1076)
	at loci.formats.tools.ImageConverter.convertTilePlane(ImageConverter.java:853)
	at loci.formats.tools.ImageConverter.convertPlane(ImageConverter.java:790)
	at loci.formats.tools.ImageConverter.testConvert(ImageConverter.java:718)
	at loci.formats.tools.ImageConverter.main(ImageConverter.java:1095)

while if I use a nice power of 2 as the crop size and start in the top corner, it works fine:

euler: $ bfconvert -crop 0,0,65536,65536 "1266 HE.tif" 1266_15_HE.tif
1266 HE.tif
VentanaReader initializing 1266 HE.tif
Reading IFDs
Populating metadata
Populating OME metadata
[Ventana .bif] -> 1266_15_HE.tif [Tagged Image File Format]
Switching to BigTIFF (by file size)
Tile size = 1024 x 1024
	Series 0: converted 1/1 planes (100%)
Tile size = 1024 x 1024
	Series 1: converted 1/1 planes (100%)
Tile size = 1024 x 1024
	Series 2: converted 1/1 planes (100%)
Tile size = 1024 x 1024
	Series 3: converted 1/1 planes (100%)
Tile size = 1024 x 1024
	Series 4: converted 1/1 planes (100%)
	Series 5: converted 1/1 planes (100%)
	Series 6: converted 1/1 planes (100%)
	Series 7: converted 1/1 planes (100%)
	Series 8: converted 1/1 planes (100%)
	Series 9: converted 1/1 planes (100%)
	Series 10: converted 1/1 planes (100%)
	Series 11: converted 1/1 planes (100%)
	Series 12: converted 1/1 planes (100%)
[done]
407.465s elapsed (121.15385+31190.0ms per plane, 406ms overhead)

However, if I specify the tile size, it doesn’t:

euler: $ bfconvert -tilex 512 -tiley 512 -crop 0,0,65536,65536 "1266 HE.tif" 1266_15_HE.tif 
1266 HE.tif
VentanaReader initializing 1266 HE.tif
Reading IFDs
Populating metadata
Populating OME metadata
[Ventana .bif] -> 1266_15_HE.tif [Tagged Image File Format]
Switching to BigTIFF (by file size)
Tile size = 512 x 512
	Series 0: converted 1/1 planes (100%)
Tile size = 512 x 512
	Series 1: converted 1/1 planes (100%)
Tile size = 512 x 512
	Series 2: converted 1/1 planes (100%)
Tile size = 512 x 512
	Series 3: converted 1/1 planes (100%)
Tile size = 512 x 512
	Series 4: converted 1/1 planes (100%)
Tile size = 512 x 512
	Series 5: converted 1/1 planes (100%)
Tile size = 512 x 512
	Series 6: converted 1/1 planes (100%)
Tile size = 512 x 512
	Series 7: converted 1/1 planes (100%)
Tile size = 304 x 512
Exception in thread "main" java.lang.IndexOutOfBoundsException: Range [0, 0 + 155648) out of bounds for length 29184
	at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
	at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckFromIndexSize(Preconditions.java:82)
	at java.base/jdk.internal.util.Preconditions.checkFromIndexSize(Preconditions.java:343)
	at java.base/java.util.Objects.checkFromIndexSize(Objects.java:425)
	at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:129)
	at java.base/java.io.DataOutputStream.write(DataOutputStream.java:106)
	at loci.formats.tiff.TiffSaver.writeImage(TiffSaver.java:354)
	at loci.formats.tiff.TiffSaver.writeImage(TiffSaver.java:278)
	at loci.formats.out.TiffWriter.saveBytes(TiffWriter.java:288)
	at loci.formats.tools.ImageConverter.convertTilePlane(ImageConverter.java:922)
	at loci.formats.tools.ImageConverter.convertPlane(ImageConverter.java:790)
	at loci.formats.tools.ImageConverter.testConvert(ImageConverter.java:718)
	at loci.formats.tools.ImageConverter.main(ImageConverter.java:1095)

(Another issue, which is probably that I’m missing a command-line switch, is that on those occasions where it works, I just get a single resolution image TIFF, not a pyramidal TIFF, at least as far as OpenSlide believes, and I lose the associated image and a fair bit of the original metadata.)

Now here is some information about the original file “1266 HE.tif”. OpenSlide lists the following dimensions:

>>> slide.level_dimensions
((77632, 137424),
 (38816, 68712),
 (19408, 34360),
 (9704, 17184),
 (4856, 8592),
 (2432, 4296),
 (1216, 2152),
 (608, 1080),
 (304, 544),
 (152, 272),
 (80, 136))

and if I run showinf, I get the following:

euler: $ showinf -nopix "1266 HE.tif"
Checking file format [Ventana .bif]
Initializing reader
VentanaReader initializing 1266 HE.tif
Reading IFDs
Populating metadata
Populating OME metadata
Initialization took 0.162s

Reading core metadata
filename = 1266 HE.tif
Series count = 13
Series #0 :
	Image count = 1
	RGB = true (3) 
	Interleaved = false
	Indexed = false (true color)
	Width = 77632
	Height = 137424
	SizeZ = 1
	SizeT = 1
	SizeC = 3 (effectively 1)
	Tile size = 1024 x 1024
	Thumbnail size = 75 x 128
	Endianness = intel (little)
	Dimension order = XYCZT (uncertain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = false
	-----
	Plane #0 <=> Z 0, C 0, T 0

Series #1 :
	Image count = 1
	RGB = true (3) 
	Interleaved = false
	Indexed = false (true color)
	Width = 38816
	Height = 68712
	SizeZ = 1
	SizeT = 1
	SizeC = 3 (effectively 1)
	Tile size = 1024 x 1024
	Thumbnail size = 75 x 128
	Endianness = intel (little)
	Dimension order = XYCZT (uncertain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = true
	-----
	Plane #0 <=> Z 0, C 0, T 0

Series #2 :
	Image count = 1
	RGB = true (3) 
	Interleaved = false
	Indexed = false (true color)
	Width = 19408
	Height = 34360
	SizeZ = 1
	SizeT = 1
	SizeC = 3 (effectively 1)
	Tile size = 1024 x 1024
	Thumbnail size = 75 x 128
	Endianness = intel (little)
	Dimension order = XYCZT (uncertain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = true
	-----
	Plane #0 <=> Z 0, C 0, T 0

Series #3 :
	Image count = 1
	RGB = true (3) 
	Interleaved = false
	Indexed = false (true color)
	Width = 9704
	Height = 17184
	SizeZ = 1
	SizeT = 1
	SizeC = 3 (effectively 1)
	Tile size = 1024 x 1024
	Thumbnail size = 75 x 128
	Endianness = intel (little)
	Dimension order = XYCZT (uncertain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = true
	-----
	Plane #0 <=> Z 0, C 0, T 0

Series #4 :
	Image count = 1
	RGB = true (3) 
	Interleaved = false
	Indexed = false (true color)
	Width = 4856
	Height = 8592
	SizeZ = 1
	SizeT = 1
	SizeC = 3 (effectively 1)
	Tile size = 1024 x 1024
	Thumbnail size = 75 x 128
	Endianness = intel (little)
	Dimension order = XYCZT (uncertain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = true
	-----
	Plane #0 <=> Z 0, C 0, T 0

Series #5 :
	Image count = 1
	RGB = true (3) 
	Interleaved = false
	Indexed = false (true color)
	Width = 2432
	Height = 4296
	SizeZ = 1
	SizeT = 1
	SizeC = 3 (effectively 1)
	Tile size = 1024 x 1024
	Thumbnail size = 75 x 128
	Endianness = intel (little)
	Dimension order = XYCZT (uncertain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = true
	-----
	Plane #0 <=> Z 0, C 0, T 0

Series #6 :
	Image count = 1
	RGB = true (3) 
	Interleaved = false
	Indexed = false (true color)
	Width = 1216
	Height = 2152
	SizeZ = 1
	SizeT = 1
	SizeC = 3 (effectively 1)
	Tile size = 1024 x 1024
	Thumbnail size = 75 x 128
	Endianness = intel (little)
	Dimension order = XYCZT (uncertain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = true
	-----
	Plane #0 <=> Z 0, C 0, T 0

Series #7 :
	Image count = 1
	RGB = true (3) 
	Interleaved = false
	Indexed = false (true color)
	Width = 608
	Height = 1080
	SizeZ = 1
	SizeT = 1
	SizeC = 3 (effectively 1)
	Tile size = 1024 x 1024
	Thumbnail size = 75 x 128
	Endianness = intel (little)
	Dimension order = XYCZT (uncertain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = true
	-----
	Plane #0 <=> Z 0, C 0, T 0

Series #8 :
	Image count = 1
	RGB = true (3) 
	Interleaved = false
	Indexed = false (true color)
	Width = 304
	Height = 544
	SizeZ = 1
	SizeT = 1
	SizeC = 3 (effectively 1)
	Tile size = 1024 x 1024
	Thumbnail size = 75 x 128
	Endianness = intel (little)
	Dimension order = XYCZT (uncertain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = true
	-----
	Plane #0 <=> Z 0, C 0, T 0

Series #9 :
	Image count = 1
	RGB = true (3) 
	Interleaved = false
	Indexed = false (true color)
	Width = 152
	Height = 272
	SizeZ = 1
	SizeT = 1
	SizeC = 3 (effectively 1)
	Tile size = 1024 x 1024
	Thumbnail size = 75 x 128
	Endianness = intel (little)
	Dimension order = XYCZT (uncertain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = true
	-----
	Plane #0 <=> Z 0, C 0, T 0

Series #10 :
	Image count = 1
	RGB = true (3) 
	Interleaved = false
	Indexed = false (true color)
	Width = 80
	Height = 136
	SizeZ = 1
	SizeT = 1
	SizeC = 3 (effectively 1)
	Tile size = 1024 x 1024
	Thumbnail size = 75 x 128
	Endianness = intel (little)
	Dimension order = XYCZT (uncertain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = true
	-----
	Plane #0 <=> Z 0, C 0, T 0

Series #11 :
	Image count = 1
	RGB = true (3) 
	Interleaved = false
	Indexed = false (true color)
	Width = 1016
	Height = 3260
	SizeZ = 1
	SizeT = 1
	SizeC = 3 (effectively 1)
	Tile size = 1016 x 64
	Thumbnail size = 39 x 128
	Endianness = intel (little)
	Dimension order = XYCZT (uncertain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = true
	-----
	Plane #0 <=> Z 0, C 0, T 0

Series #12 :
	Image count = 1
	RGB = false (1) 
	Interleaved = false
	Indexed = false (true color)
	Width = 1016
	Height = 3260
	SizeZ = 1
	SizeT = 1
	SizeC = 1
	Tile size = 1016 x 64
	Thumbnail size = 39 x 128
	Endianness = intel (little)
	Dimension order = XYCZT (uncertain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = true
	-----
	Plane #0 <=> Z 0, C 0, T 0


Reading global metadata
BitsPerSample: 8
Compression: Uncompressed
ImageLength: 3260
ImageWidth: 1016
Instrument Make: Ventana Medical Systems
Instrument Model: iScan HT
MetaDataPhotometricInterpretation: RGB
NumberOfChannels: 3
PhotometricInterpretation: RGB
PlanarConfiguration: Chunky
SamplesPerPixel: 3

Reading series #0 metadata

Any thoughts are gratefully appreciated!

Thanks!

Julian

Hi Julian,

There are a few issues here; I will try to address them in order:

Task: Given a large pyramidal TIFF, crop out a specified piece of the image, returning a pyramidal TIFF while maintaining the original metadata (and ideally the original associated slide overview image).

We will dive more into the specifics of the conversion issue, but it would be useful to know more about the motivation for generating a cropped representation of the original multi-resolution data. Is it related to the file size, or is some of the data unnecessary?
The alternative approach is to preserve the entire pyramidal structure and leave whatever consumes the data in charge of accessing a sub-region of the image.

Hmm, I’ve no idea where this invalid tile size comes from.

Having troubleshot your example, the core issue is that cropping a pyramidal image simply does not work out of the box with bfconvert. The crop option was implemented for a very different use case with homogeneous image dimensions. Applying it to pyramidal images with arbitrary crop regions and arbitrary downsampling between resolution levels quickly runs into computation and rounding challenges.
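
As a rough illustration of the arithmetic, here is a small Python sketch using only numbers from the logs above. It is one reading of the logged values, not a description of how ImageConverter is implemented: it assumes tiles are stepped from the crop origin in tile-width increments, and that the full-resolution crop origin is reused unchanged on series 1 (width 38816). The function names and the floor-division rounding in scale_crop are illustrative choices, not Bio-Formats API.

```python
# Sizes (width, height) of series 0 and series 1, copied from the showinf output.
FULL_SIZE = (77632, 137424)
LEVEL1_SIZE = (38816, 68712)
# x, y, width, height from the failing -crop argument.
CROP = (24064, 0, 45392, 27760)


def first_overrunning_tile_x(crop_x, tile_w, level_width):
    """Step through tile columns starting at crop_x and return the x offset
    of the first tile whose right edge passes level_width.
    Assumption: tiles advance by tile_w from the crop origin."""
    x = crop_x
    while x + tile_w <= level_width:
        x += tile_w
    return x


def scale_crop(crop, full_size, level_size):
    """Scale a full-resolution crop to another level using floor division.
    Flooring is only one possible rounding policy."""
    x, y, w, h = crop
    full_w, full_h = full_size
    level_w, level_h = level_size
    return (x * level_w // full_w, y * level_h // full_h,
            w * level_w // full_w, h * level_h // full_h)


print(first_overrunning_tile_x(CROP[0], 1024, LEVEL1_SIZE[0]))
print(first_overrunning_tile_x(CROP[0], 512, LEVEL1_SIZE[0]))
print(scale_crop(CROP, FULL_SIZE, LEVEL1_SIZE))
```

Under those assumptions, both tile widths give 38400, the same x reported in the two "Invalid tile size" exceptions. The last line shows what the crop would look like if it were scaled to series 1 instead; for levels whose downsampling factor is not an exact power of two, the choice of rounding starts to matter.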

I have opened a GitHub issue to discuss what the behavior should be and how to implement more informative error messages preventing some combinations of workflows.

Another issue, which is probably that I’m missing a command-line switch, is that on those occasions where it works, I just get a single resolution image TIFF, not a pyramidal TIFF, at least as far as OpenSlide believes, and I lose the associated image and a fair bit of the original metadata.

I think the primary issue here is that bfconvert uses the file extension to select the writer. In this case, 1266_15_HE.tif selects the plain TIFF writer, which writes a format with no support for pyramidal levels or rich metadata.

Any thoughts are gratefully appreciated!

To achieve your original task, you will most likely need a two-step conversion:

  • first convert and crop the largest resolution of the original pyramid only using -series 0, using OME-TIFF as the output format to preserve the original metadata
  • then convert the cropped OME-TIFF into a pyramidal OME-TIFF by recomputing the resolution levels

bfconvert -crop 24064,0,45392,27760 -series 0 "1266 HE.tif" 1266_HE_fullscale.ome.tif
bfconvert -pyramid-resolutions 10 -pyramid-scale 2 1266_HE_fullscale.ome.tif 1266_HE.ome.tif

You should be able to check that the generated file has both metadata and pyramidal levels using the -noflat option of showinf:

showinf -noflat -nopix 1266_HE.ome.tif
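
Since you mentioned wanting to automate this over many regions, the two commands above could be wrapped in a short Python script. This is a minimal sketch, assuming bfconvert is on your PATH; the function names, the output naming scheme, and the region list are hypothetical, and only the bfconvert flags shown above are used.

```python
import subprocess
from pathlib import Path


def build_commands(source, crop, out_dir, resolutions=10, scale=2):
    """Return the two bfconvert command lines (crop, then pyramid) for one region.
    The output file names are an illustrative convention, not a requirement."""
    x, y, w, h = crop
    stem = f"{Path(source).stem.replace(' ', '_')}_{x}_{y}_{w}_{h}"
    full = Path(out_dir) / f"{stem}_fullscale.ome.tif"
    final = Path(out_dir) / f"{stem}.ome.tif"
    # Step 1: crop the largest resolution only and write OME-TIFF.
    crop_cmd = ["bfconvert", "-crop", f"{x},{y},{w},{h}", "-series", "0",
                str(source), str(full)]
    # Step 2: regenerate the pyramid levels from the cropped image.
    pyramid_cmd = ["bfconvert", "-pyramid-resolutions", str(resolutions),
                   "-pyramid-scale", str(scale), str(full), str(final)]
    return crop_cmd, pyramid_cmd


def convert_region(source, crop, out_dir):
    """Run both steps for one region; raises CalledProcessError if either fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(source, crop, out_dir):
        subprocess.run(cmd, check=True)


# Example usage (hypothetical region list):
# regions = [("1266 HE.tif", (24064, 0, 45392, 27760))]
# for source, crop in regions:
#     convert_region(source, crop, "cropped")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, such as "1266 HE.tif". The intermediate fullscale file can be deleted once the second step succeeds.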

Not sure if it is helpful to throw out another program, but you could use QuPath to open the whole slide image, indicate a region using an annotation, and then write out a pyramidal OME.tiff using the original pixel size, at least. “Original metadata” might be harder if that includes information other than channel names and pixel sizes, but if your primary purpose is picking out certain areas automatically and generating a pyramidal image as a result, using QuPath to determine the target automatically might be beneficial for a large number of images.

Thanks for the thought! We’ve already identified the regions, and we have about 500 of them, so I’d prefer not to have to do it by hand. If there’s an easy way to automate that, it would be great.

The other thing I’ve discovered is that using a plain bfconvert command to convert, say, a Ventana .bif to a plain .tiff increases the file size by a huge factor. I wonder if I’m doing something wrong?

Hi @juliang :smiley:

If that’s in reference to @Research_Associate’s QuPath suggestion, there’s a convert-ome subcommand for the QuPath command line which should support cropping.

If your regions are annotated already inside QuPath, you can also script export from QuPath with writeImageRegion – see https://qupath.readthedocs.io/en/latest/docs/advanced/exporting_images.html#images-regions

Using Run → Run for project within QuPath allows you to run this across multiple images, or you can do it from the command line as well.

If you need more control, there’s also an OMEPyramidWriter.Builder class that can take a load of options – including a region – to customize export.

One example of it in action here; if it might be useful, happy to help figure out how to customize a script depending upon how exactly your regions are defined.

In all cases, QuPath would be writing an OME-TIFF using Bio-Formats – just giving more/different options than bfconvert, and hiding some extra logic around parallelizing tile requests / generating pyramidal levels to make things (possibly) a bit easier.
