Correct way to convert SVS slides for Omero import

Hi,
I am going to import several hundred SVS slides into Omero. If I import them as they are, the macro image and the label are always imported and cannot be deleted (I could unlink them, but they would end up in the Orphaned dataset, which I would like to avoid if possible).
Following advice found in another thread, I tried converting them to TIFF with bfconvert and vips (which discards the macro and label), making several attempts with different parameters. However, when I imported the results into Omero, a background process ran for about 20-30 minutes per slide (on top of the time needed for the conversion itself).
So my question is: is there a specific way of preparing slides for import that avoids this long processing time? Alternatively, is there a way to actually delete the orphaned images? Thanks.

Hi @VDM , while there is no convenient way to remove the slide label image in OMERO, you could destructively remove it from the .svs file with anonymize-slide (GitHub - bgilbert/anonymize-slide: Delete the label from a whole-slide image). That will keep a blanked-out label image, but it solves the PHI issue that, I presume, you want to fix.
Cheers,
Damir

Dear Damir, thanks. I know the anonymize-slide script and have used it; however, it leaves the images in place, which is fine from a privacy point of view, but ideally I would like to have a “clean” dataset for students.

This certainly sounds suboptimal. Assuming a faithful conversion, the import should typically be as fast as importing the original SVS files. The first thing to verify is that the pyramidal levels are correctly generated and detected. If they are not, the server would start a background process to regenerate a pyramid, which could explain the timings mentioned above.

Using the Bio-Formats command-line tools, what does the following command

showinf -nopix -noflat /path/to/converted/file

return when used with one of your converted files?
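If Bio-Formats is not installed on the machine holding the files, a rough cross-check is to walk the TIFF directory (IFD) chain yourself. The sketch below uses only the Python standard library and is a stand-in under stated assumptions, not a replacement for showinf: it reads just ImageWidth (256), ImageLength (257) and SubIFDs (330), and `list_ifds` / `list_ifds_from_bytes` are hypothetical helper names. In an SVS file each pyramid level is a separate top-level IFD, whereas Bio-Formats-style pyramidal OME-TIFF usually stores the reduced levels as SubIFDs of the full-resolution IFD, so a converted file reporting a single IFD with `SubIFDs: 0` would suggest a single-resolution file.

```python
import io
import struct
import sys

# Integer types ImageWidth/ImageLength may use: SHORT, LONG, LONG8 (BigTIFF)
_INT_CODES = {3: "H", 4: "I", 16: "Q"}


def _read(fh, offset, fmt):
    """Unpack `fmt` (including its byte-order prefix) at absolute `offset`."""
    fh.seek(offset)
    return struct.unpack(fmt, fh.read(struct.calcsize(fmt)))


def list_ifds(fh):
    """Walk the top-level IFD chain of a classic TIFF or BigTIFF file object.

    For each IFD, report its offset, ImageWidth (256), ImageLength (257) and
    the number of SubIFDs (330).
    """
    order = {b"II": "<", b"MM": ">"}[_read(fh, 0, "2s")[0]]
    (version,) = _read(fh, 2, order + "H")
    if version == 42:  # classic TIFF: 2-byte tag count, 12-byte entries, 4-byte offsets
        count_fmt, entry_size, off_fmt, first = "H", 12, "I", 4
    elif version == 43:  # BigTIFF: 8-byte tag count, 20-byte entries, 8-byte offsets
        count_fmt, entry_size, off_fmt, first = "Q", 20, "Q", 8
    else:
        raise ValueError("not a TIFF file")
    count_size = struct.calcsize(order + count_fmt)
    off_size = struct.calcsize(order + off_fmt)

    ifds, seen = [], set()
    (offset,) = _read(fh, first, order + off_fmt)
    while offset and offset not in seen:  # stop at 0 and guard against loops
        seen.add(offset)
        (count,) = _read(fh, offset, order + count_fmt)
        info = {"offset": offset, "width": None, "height": None, "subifds": 0}
        for i in range(count):
            entry = offset + count_size + i * entry_size
            tag, typ = _read(fh, entry, order + "HH")
            (n,) = _read(fh, entry + 4, order + off_fmt)  # number of values
            value_pos = entry + 4 + off_size  # inline values are left-justified
            if tag in (256, 257) and typ in _INT_CODES:
                (value,) = _read(fh, value_pos, order + _INT_CODES[typ])
                info["width" if tag == 256 else "height"] = value
            elif tag == 330:
                info["subifds"] = n
        ifds.append(info)
        # the next-IFD pointer follows the last tag entry
        (offset,) = _read(fh, offset + count_size + count * entry_size, order + off_fmt)
    return ifds


def list_ifds_from_bytes(data):
    """Convenience wrapper for TIFF data already in memory."""
    return list_ifds(io.BytesIO(data))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for i, d in enumerate(list_ifds(fh)):
            print(f"IFD {i}: {d['width']} x {d['height']}, SubIFDs: {d['subifds']}")
```

Saved under a hypothetical name such as `list_ifds.py`, it would be run as `python list_ifds.py /path/to/converted/file`. It only seeks to the directory entries, so it does not need to read the multi-gigabyte pixel data.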

Thanks. I relaunched the conversion because I had already deleted the files:

Checking file format [OME-TIFF]
Initializing reader
OMETiffReader initializing ome/provabf.ome.tiff
Reading IFDs
Populating metadata
Initialization took 0.121s

Reading core metadata
filename = /mnt/nas/Datasets/anpat/ome/provabf.ome.tiff
Used files = [/mnt/nas/Datasets/anpat/ome/provabf.ome.tiff]
Series count = 1
Series #0 :
	Image count = 1
	RGB = true (3) 
	Interleaved = false
	Indexed = false (false color)
	Width = 57768
	Height = 38038
	SizeZ = 1
	SizeT = 1
	SizeC = 3 (effectively 1)
	Tile size = 240 x 240
	Thumbnail size = 128 x 84
	Endianness = intel (little)
	Dimension order = XYCZT (certain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = false
	-----
	Plane #0 <=> Z 0, C 0, T 0


Reading global metadata
BitsPerSample: 8
Comment: macro 1600x598
Compression: JPEG
ImageLength: 38038
ImageWidth: 57768
MetaDataPhotometricInterpretation: RGB
NewSubfileType: 0
NumberOfChannels: 3
PhotometricInterpretation: RGB
PlanarConfiguration: Chunky
SamplesPerPixel: 3
Series 1 58928x38138 [0,100 57768x38038] (240x240) JPEG/RGB Q: 70
Series 1 AppMag: 20
Series 1 DSR ID: XXXX
Series 1 Date: 10/31/19
Series 1 DisplayColor: 0
Series 1 Exposure Scale: 0.000001
Series 1 Exposure Time: 32
Series 1 Filename: XXXX
Series 1 Focus Offset: 0.000000
Series 1 ICC Profile: AT2
Series 1 ImageID: 5080
Series 1 Left: 18.637545
Series 1 LineAreaXOffset: 0.016775
Series 1 LineAreaYOffset: -0.003966
Series 1 LineCameraSkew: 0.000479
Series 1 MPP: 0.5004
Series 1 OriginalHeight: 38138
Series 1 OriginalWidth: 58928
Series 1 Parmset: Very Faintly Stained
Series 1 ScanScope ID: SS7391
Series 1 SessonMode: NR
Series 1 StripeWidth: 2032
Series 1 Time: 11:40:07
Series 1 Time Zone: GMT+01:00
Series 1 Title: XXXX
Series 1 Top: 21.322136
Series 1 User: 00000000-0000-0000-0000-000000000000
Series 2 58928x38138 [0,100 57768x38038] (240x240) - 14442x9509 JPEG/RGB Q: 85
Series 3 AppMag: 20
Series 3 DSR ID: aptx2560
Series 3 Date: 10/31/19
Series 3 DisplayColor: 0
Series 3 Exposure Scale: 0.000001
Series 3 Exposure Time: 32
Series 3 Filename: XXXX
Series 3 Focus Offset: 0.000000
Series 3 ICC Profile: AT2
Series 3 ImageID: 5080
Series 3 Left: 18.637545
Series 3 LineAreaXOffset: 0.016775
Series 3 LineAreaYOffset: -0.003966
Series 3 LineCameraSkew: 0.000479
Series 3 MPP: 0.5004
Series 3 OriginalHeight: 38138
Series 3 OriginalWidth: 58928
Series 3 Parmset: Very Faintly Stained
Series 3 ScanScope ID: SS7391
Series 3 SessonMode: NR
Series 3 StripeWidth: 2032
Series 3 Time: 11:40:07
Series 3 Time Zone: GMT+01:00
Series 3 Title: XXXX
Series 3 Top: 21.322136
Series 3 User: 00000000-0000-0000-0000-000000000000
Series 4 58928x38138 [0,100 57768x38038] (240x240) - 3610x2377 JPEG/RGB Q: 92
TileByteCounts: 4731
TileLength: 240
TileOffsets: 16
TileWidth: 240
YCbCrSubSampling: chroma image dimensions are half the luma image dimensions

Thanks Vincenzo,

Series #0 :
	Image count = 1
	RGB = true (3) 
	Interleaved = false
	Indexed = false (false color)
	Width = 57768
	Height = 38038

So this output indicates that the generated OME-TIFF file only contains a single-resolution, large RGB plane. For a multi-resolution RGB image, I would expect the same command to display something like

Series #0 :
	Resolutions = 7
		sizeX[0] = 57768
		sizeX[1] = 28884
		sizeX[2] = 14442
		sizeX[3] = 7221
		sizeX[4] = 3610
		sizeX[5] = 1805
		sizeX[6] = 902
	RGB = true (3) 
	Interleaved = false
	Indexed = false (false color)
	Width = 57768
	Height = 38038
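The sizeX values above follow a simple pattern: each level is half the previous one, rounded down. The SVS metadata in the earlier showinf output (14442x9509 and 3610x2377 for two of the reduced levels) fits the same rule. A minimal sketch, assuming power-of-two downsampling with floor rounding; the level count of 7 is taken from the example, not a rule any particular tool is guaranteed to follow:

```python
def pyramid_sizes(width, height, levels):
    """(width, height) of each level when every level halves the previous one,
    rounding down (x >> i is integer division by 2**i)."""
    return [(width >> i, height >> i) for i in range(levels)]


for level, (w, h) in enumerate(pyramid_sizes(57768, 38038, 7)):
    print(f"sizeX[{level}] = {w}  sizeY[{level}] = {h}")
```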

Which tool, version and command did you use to convert the file?

Honestly, I just followed some recipes I found without thinking much, which is the reason for my question:

bfconvert -series 0 input.svs output.ome.tif

and

vips copy input.svs output.tif[pyramid,subifd]

The latter gives more or less the same result (with less metadata, actually).

Hi @VDM , understood, and I agree with what you say about anonymize-slide. Then a bfconvert-type workflow is probably your best bet. One set of tools that does the same but tends to be much faster is the Glencoe tools: Converting Whole Slide Images to OME-TIFF: A New Workflow
I haven’t specifically tested it, but I think the --series 0 option to the bioformats2raw command will convert only the actual image and omit the label and macro images from the output.
Damir

For the Bio-Formats conversion utility, you will need to pass the -noflat option as well, i.e. the following command

bfconvert -series 0 -noflat input.svs output.ome.tif

will convert the first image of the SVS file along with all its pyramidal levels into a target OME-TIFF file.

As Damir mentioned, the pipeline described in Converting Whole Slide Images to OME-TIFF: A New Workflow is an alternative which has been specially designed for converting image files of this modality at scale.

For vips, my understanding from the release notes is that the library now has full support for writing pyramidal OME-TIFF, but someone from this community might be more knowledgeable about how to adjust your conversion command.

Thanks, I tested it and everything seems okay (without any other parameter the file size grows 10x, but I added the -compression JPEG option and the size is only slightly larger than the original).
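For several hundred slides, the working command can be wrapped in a small batch script. A minimal Python sketch, assuming bfconvert is on the PATH and using exactly the flags from this thread (`-series 0 -noflat -compression JPEG`); the function names and the `.ome.tif` naming scheme are hypothetical choices:

```python
import subprocess
from pathlib import Path


def bfconvert_command(src, dst, compression="JPEG"):
    """Build the bfconvert call from this thread: keep only series 0 (the main
    image), keep its pyramid levels (-noflat) and optionally re-compress."""
    cmd = ["bfconvert", "-series", "0", "-noflat"]
    if compression:
        cmd += ["-compression", compression]
    return cmd + [str(src), str(dst)]


def convert_folder(src_dir, dst_dir, dry_run=True):
    """Convert every *.svs in src_dir to <name>.ome.tif in dst_dir.

    Existing outputs are skipped, so an interrupted batch can be resumed.
    With dry_run=True the commands are only returned, not executed.
    """
    dst_dir = Path(dst_dir)
    commands = []
    for src in sorted(Path(src_dir).glob("*.svs")):
        dst = dst_dir / (src.stem + ".ome.tif")
        if dst.exists():
            continue  # already converted in an earlier run
        cmd = bfconvert_command(src, dst)
        commands.append(cmd)
        if not dry_run:
            dst_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Calling `convert_folder("svs", "ome")` first lists the commands without running anything; `dry_run=False` then runs them one after another, and skipping existing outputs avoids overwriting earlier results.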

Just keep in mind that the original file was apparently lossily compressed (probably JPEG?) and that decompressing and then recompressing with JPEG will not help the fidelity of your images. @petebankhead has a nice write-up on that: Files & file formats · Analyzing fluorescence microscopy images with ImageJ

Damir, I know. SVS tiles are JPEG compressed, and I wish I could avoid further compression. However, in this specific case the images are not meant for further processing, only for visualization (teaching/training), so I can accept some extra loss. Ideally, I would just delete the label and macro from the SVS files, but so far I have not found a way to do that.

Instead of converting the SVS file, try to hide the macro and label images from TIFF readers by patching the IFD chain in a copy of the SVS file, e.g.:

import struct
import tifffile

with tifffile.TiffFile('Copy of SVS file.svs', mode='r+b') as svs:
    fh = svs.filehandle
    tiff = svs.tiff
    assert svs.is_svs
    for i, page in enumerate(svs.pages):
        if 'label ' in page.description or 'macro ' in page.description:
            assert i > 0
            # seek to position where offset to label/macro page is stored
            previous_page = svs.pages[i - 1]
            fh.seek(previous_page.offset)
            tagno = struct.unpack(tiff.tagnoformat, fh.read(tiff.tagnosize))[0]
            offset = previous_page.offset + tiff.tagnosize + tagno * tiff.tagsize
            fh.seek(offset)
            # terminate IFD chain
            fh.write(struct.pack(tiff.offsetformat, 0))
            print(f'zeroed value {page.offset} @ {offset}')
            break
    else:
        print('no label or macro image found')
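The key step in the script is locating the next-IFD pointer: it sits right after the tag count (2 bytes in classic TIFF, 8 in BigTIFF) and the tag entries (12 or 20 bytes each), which is what `previous_page.offset + tiff.tagnosize + tagno * tiff.tagsize` computes. The same patch can be sketched with only the standard library. This is an illustrative re-implementation, not part of tifffile; `terminate_ifd_chain` is a hypothetical name, and unlike the script it takes the number of IFDs to keep instead of searching for the label/macro description. Run it on a copy of the slide:

```python
import io
import struct


def terminate_ifd_chain(fh, keep):
    """Zero the next-IFD pointer of IFD number keep-1 (0-based) in a writable
    classic TIFF or BigTIFF file object, so readers stop after `keep` IFDs.

    The pages after that point are only unlinked; their bytes stay in the file.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    fh.seek(0)
    order = {b"II": "<", b"MM": ">"}[fh.read(2)]
    (version,) = struct.unpack(order + "H", fh.read(2))
    if version == 42:  # classic: 2-byte tag count, 12-byte entries, 4-byte offsets
        count_fmt, entry_size, off_fmt, pointer_pos = "H", 12, "I", 4
    elif version == 43:  # BigTIFF: 8-byte tag count, 20-byte entries, 8-byte offsets
        count_fmt, entry_size, off_fmt, pointer_pos = "Q", 20, "Q", 8
    else:
        raise ValueError("not a TIFF file")
    count_size = struct.calcsize(order + count_fmt)
    off_size = struct.calcsize(order + off_fmt)
    for _ in range(keep):
        # follow the current pointer (header first, then each IFD's next pointer)
        fh.seek(pointer_pos)
        (ifd_offset,) = struct.unpack(order + off_fmt, fh.read(off_size))
        if ifd_offset == 0:
            raise ValueError(f"file has fewer than {keep} IFDs")
        fh.seek(ifd_offset)
        (n_tags,) = struct.unpack(order + count_fmt, fh.read(count_size))
        # next-IFD pointer = IFD offset + tag count field + all tag entries
        pointer_pos = ifd_offset + count_size + n_tags * entry_size
    fh.seek(pointer_pos)
    fh.write(struct.pack(order + off_fmt, 0))
    return pointer_pos


def terminate_ifd_chain_bytes(data, keep):
    """Return a patched copy of in-memory TIFF data."""
    buf = io.BytesIO(bytes(data))
    terminate_ifd_chain(buf, keep)
    return buf.getvalue()
```

On a file this would be used as `with open("copy.svs", "r+b") as fh: terminate_ifd_chain(fh, keep=n)`, where `n` is the number of pages before the first label or macro page. As with the script, the file size does not change.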

Thanks @cgohlke . However, this is what Benjamin Gilbert’s script already does, and it still leaves the images there, so each dataset is cluttered with three times as many images as needed. I suspect real removal requires recalculating and updating the offsets of the images stored after the ones to be deleted.
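That suspicion matches how TIFF works: IFDs, tiles and strips are located by absolute byte offsets, so cutting bytes out of the file means every offset pointing past the cut (next-IFD pointers, TileOffsets/StripOffsets entries, out-of-line tag values) must be shifted back. A sketch of just that arithmetic, with hypothetical function names; it is not a TIFF rewriter, and finding every offset field in a real file is the hard part it leaves out:

```python
def relocate_offset(offset, cut_start, cut_length):
    """Map an absolute file offset to its value after deleting the byte range
    [cut_start, cut_start + cut_length) from the file.

    Offsets before the cut are unchanged (this includes 0, the end-of-chain
    marker, whenever cut_start > 0); offsets after it move back by cut_length;
    offsets into the removed range have nothing left to point at.
    """
    if offset < cut_start:
        return offset
    if offset < cut_start + cut_length:
        raise ValueError(f"offset {offset} points into the removed range")
    return offset - cut_length


def relocate_all(offsets, cut_start, cut_length):
    """Apply relocate_offset to e.g. every TileOffsets entry of one page."""
    return [relocate_offset(o, cut_start, cut_length) for o in offsets]
```

In the special case where the removed pages' IFDs and image data are the last bytes of the file, nothing after the cut needs relocating, and truncating the file after terminating the IFD chain would be enough; whether that holds for a given SVS file has to be checked from its offsets.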

For me, Benjamin Gilbert’s script does not remove both the label and macro images, only the label image. There is also still the thumbnail image in the second page, but it’s not clear whether you want to remove that too (the SVS format requires it to be present).

Ideally I would “physically” remove both, if possible. However, something is strange. In both cases, when opening the slide in a viewer, I do not see the label at all (good).
When I import the slide processed with your script into Omero, I see 3 images, almost the same as with the Gilbert script. With the Gilbert script, one of them is the macro. Ideally, I would like to have only the slide :) . The other option I have is to “orphanize” the two extra images (after anonymization); they will still be available, but at least not inside the dataset.

Just tried with the latest Omero Docker image: I don’t think SVS files are imported correctly. The three images imported into Omero always come from the first IFD and the last two IFDs/pages, regardless of whether the last two IFDs actually contain the label and macro images or higher pyramid levels.

@cgohlke , thanks for testing. What you describe certainly does not sound like the expected behavior, and I could not find a record of a similar issue with SVS datasets in our tracking systems. Was this file generated by the acquisition software, or is it a secondary file created with a script similar to the one mentioned above? Would you be able to share a representative sample file (ideally under a permissive license) to help us track down the underlying issue?

I used CMU-1-JP2K-33005.svs and removed/hid the last two IFDs with the script above.

According to the Aperio SVS File Format specification, the label and macro images are optional:

“Optionally at the end of an SVS file there may be a slide label image, which is a low resolution
picture taken of the slide’s label, and/or a macro camera image, which is a low resolution picture
taken of the entire slide.” (Digital Slides and Third-Party Data Interchange, Aperio Technologies, Inc. December 9, 2008, page 14)
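Because label and macro pages are optional, a reader cannot assume the last two pages are always label and macro. A minimal heuristic sketch for identifying page roles, assuming the layout described in this thread (full-resolution image first, thumbnail second) and "label "/"macro " markers in the ImageDescription similar to the ones the tifffile script checks; `classify_svs_pages` is a hypothetical helper that takes the descriptions as plain strings (for example from tifffile’s `page.description`):

```python
def classify_svs_pages(descriptions):
    """Guess the role of each page of an Aperio SVS file (heuristic).

    Page 0 is taken as the full-resolution image and page 1 as the thumbnail;
    a page whose ImageDescription has a line starting with "label " or
    "macro " is treated as label or macro; anything else is assumed to be a
    reduced pyramid level.
    """
    roles = []
    for index, description in enumerate(descriptions):
        lines = description.splitlines()[1:]  # skip the "Aperio Image Library" line
        if any(line.startswith("label ") for line in lines):
            roles.append("label")
        elif any(line.startswith("macro ") for line in lines):
            roles.append("macro")
        elif index == 0:
            roles.append("baseline")
        elif index == 1:
            roles.append("thumbnail")
        else:
            roles.append("pyramid level")
    return roles
```

Running it before and after patching a copy shows whether the last two pages are really label and macro, or pyramid levels that a position-based reader would mislabel.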

I think the issue is not in the import (because the SVS files remain as they are) but more likely in the visualization, which perhaps expects SVS slides to always contain 3 images.
EDIT: I did a test.
I have a slide that is 7968x11292, with Layer 2 = 1992x2823 and thumbnail = 541x768 (plus label = 582x638 and macro = 1600x598).
After running the Gilbert script, what Iviewer (and Insight too) shows as the label is Layer 2, and the macro stays there as in the original.
After running @cgohlke’s script, Iviewer shows the thumbnail as the label and Layer 2 as the macro.