Multi-scale image labels v0.1

Summary:

This specification defines a convention for storing multiscale integer-valued labels, commonly used for storing a segmentation. Multiple such labeled images can be associated with a single image:

image/
│
├── 0-4                # Resolution levels of the primary image
│
└── labels             # Group container for the labeled images.
    │
    ├── original       # One or more groups which each represent
    ├── ..             # a multi-scale pyramid of integer values
    └── recalculated   # representing e.g. detected objects.

An application of this draft to the specification of IDR images stored in S3 is described at https://github.com/ome/omero-ms-zarr/blob/9ac27a9a1a91d2f19a4af21c30643c58c272f8aa/spec.md

“labels” group

In order to enable discovery, the well-known group “labels” within each image directory functions as a registry of all known image labels.

See color-key below

-{
-    "labels": [
!        "original",
!        "recalculated"
-    ]
-}
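
The discovery step described above can be sketched as follows. This is a minimal illustration, assuming the metadata is stored as JSON in a `.zattrs` file inside the “labels” group (the Zarr v2 convention). The helper name `list_label_groups` and the throwaway directory layout are hypothetical and not part of this specification or of any library.

```python
import json
import tempfile
from pathlib import Path


def list_label_groups(image_dir):
    """Return the label group names registered under <image_dir>/labels."""
    attrs_path = Path(image_dir) / "labels" / ".zattrs"
    if not attrs_path.exists():
        return []  # no registry, so no discoverable labels
    attrs = json.loads(attrs_path.read_text())
    return list(attrs.get("labels", []))


# Build a throwaway image directory that contains only the registry.
demo_root = Path(tempfile.mkdtemp()) / "image"
(demo_root / "labels").mkdir(parents=True)
(demo_root / "labels" / ".zattrs").write_text(
    json.dumps({"labels": ["original", "recalculated"]})
)

registered = list_label_groups(demo_root)
print(registered)
```

Each returned name would then be resolved relative to the “labels” group to open the corresponding labeled image.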

“image-label” group

Each image-label group is itself a multiscale image and should contain the “multiscales” metadata key. It is the presence of the “image-label” key, however, that identifies a group as a labeled image. To enable discovery, each such group should be registered with the “labels” group above. Additionally, labeled images should list their source image to enable a bidirectional link.

The primary additional metadata currently specified is the “colors” metadata key, an array in which each label value can be registered together with its color:

See color-key below

-{
-    "image-label": {
!        "version": "0.1",
!        "source": {
!            "image": "../.."
!        },
+        "colors": [
!            {
-                "label-value": 1,
-                "rgba": [128, 128, 128, 128]
!            }
-        ]
-    }
-}
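
As a rough sketch of how a client might consume this metadata, the snippet below turns the “colors” array into a lookup from label value to color. The function name `color_lookup` is hypothetical, and the check that each channel lies in 0-255 is an assumption based on the example values, not something this specification states.

```python
import json

image_label_attrs = json.loads("""
{
    "image-label": {
        "version": "0.1",
        "source": {"image": "../.."},
        "colors": [
            {"label-value": 1, "rgba": [128, 128, 128, 128]}
        ]
    }
}
""")


def color_lookup(attrs):
    """Map each registered label value to an (r, g, b, a) tuple."""
    image_label = attrs.get("image-label")
    if image_label is None:
        raise ValueError("not a labeled image: 'image-label' key is missing")
    lookup = {}
    for entry in image_label.get("colors", []):  # "colors" is optional (MAY)
        rgba = entry["rgba"]
        # Assumption: four channels, each an integer in the range 0-255.
        if len(rgba) != 4 or not all(0 <= c <= 255 for c in rgba):
            raise ValueError(f"invalid rgba for label {entry['label-value']}")
        lookup[entry["label-value"]] = tuple(rgba)
    return lookup


colors = color_lookup(image_label_attrs)
print(colors)
```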

Example workflow

If you would like to experiment with this specification, you can install the ome-zarr library via:

pip install ome-zarr==0.0.13

The library provides a napari plugin, which can optionally be activated via:

pip install ome-zarr[napari]==0.0.13

Sample data is available under the test-data subdirectory of the S3 bucket:

$ ome_zarr info https://s3.embassy.ebi.ac.uk/idr/zarr/test-data/labels-0.1.3-dev4/6001247.zarr/
https://s3.embassy.ebi.ac.uk/idr/zarr/test-data/labels-0.1.3-dev4/6001247.zarr/ [zgroup]
 - metadata
   - Multiscales
   - OMERO
 - data
   - (1, 2, 257, 210, 253)
https://s3.embassy.ebi.ac.uk/idr/zarr/test-data/labels-0.1.3-dev4/6001247.zarr/labels/ [zgroup] (hidden)
 - metadata
   - Labels
 - data
https://s3.embassy.ebi.ac.uk/idr/zarr/test-data/labels-0.1.3-dev4/6001247.zarr/labels/1/ [zgroup] (hidden)
 - metadata
   - Label
   - Multiscales
 - data
   - (1, 1, 257, 210, 253)
   - (1, 1, 257, 126, 105)
   - (1, 1, 257, 52, 63)
   - (1, 1, 257, 31, 26)
   - (1, 1, 257, 13, 15)

If you have existing masks in OMERO, you can export your image and masks using omero-cli-zarr:

$ pip install omero-cli-zarr
$ omero zarr export Image:6001240
$ omero zarr masks Image:6001240
$ ome_zarr info 6001240.zarr/
/tmp/6001240.zarr [zgroup]
 - metadata
   - Multiscales
   - OMERO
 - data
   - (1, 2, 236, 275, 271)
/tmp/6001240.zarr/labels [zgroup] (hidden)
 - metadata
   - Labels
 - data
/tmp/6001240.zarr/labels/0 [zgroup] (hidden)
 - metadata
   - Label
   - Multiscales
 - data
   - (1, 1, 236, 275, 271)
   - (1, 1, 236, 135, 137)
   - (1, 1, 236, 68, 67)

Design trade-offs

Two additional layouts were considered for the labeled data itself. The first was a split representation in which each label was a separate bitmask. This representation is still possible by using multiple labeled images. The other was a 6-dimensional bitmask structure. The benefit of both of these was the support of overlaps. The downside was that many implementations do not natively support a compact representation of bit arrays.
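
To make the trade-off concrete, the sketch below derives the split bitmask representation from a single integer-valued label image. It is illustrative only: the tiny 2D image, the function name `split_into_bitmasks`, and the treatment of 0 as background are all assumptions made for this example.

```python
# A tiny 2D label image: 0 is treated as background (an assumption),
# while 1 and 2 mark two detected objects.
labels = [
    [0, 1, 1, 0],
    [0, 1, 2, 2],
    [0, 0, 2, 2],
]


def split_into_bitmasks(label_image):
    """Return {label_value: boolean mask} for every non-zero label."""
    values = sorted({v for row in label_image for v in row if v != 0})
    return {
        value: [[pixel == value for pixel in row] for row in label_image]
        for value in values
    }


masks = split_into_bitmasks(labels)

# In one integer image every pixel holds exactly one value, so masks
# derived from it can never overlap.
overlap = any(
    masks[1][y][x] and masks[2][y][x]
    for y in range(len(labels))
    for x in range(len(labels[0]))
)
print(sorted(masks), overlap)
```

Independent bitmasks could mark the same pixel for several objects, which is the overlap support mentioned above. A single integer image cannot, but it maps directly onto ordinary integer arrays.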

For the metadata, a number of different configurations were also considered for describing each label value. The primary choice was between an array representation and two sparse representations: a dictionary, with the downside of requiring string keys, and a list of dictionaries, with the downside of possible redundancy. More details on this discussion are available under “Revamp color metadata” (#62).
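
The three candidate representations can be compared in a few lines. This is a sketch rather than a record of the actual discussion. The variable names are hypothetical, and treating “redundancy” as the possibility of listing the same label value twice is one interpretation, assumed here for illustration.

```python
import json

rgba = [128, 128, 128, 128]

# 1. Dense array: the list position is the label value, so unused values
#    below the largest label need placeholder entries.
dense = [None, rgba]

# 2. Dictionary: sparse, but JSON object keys are always strings, so an
#    integer key does not survive a round trip.
as_dict = {1: rgba}
round_tripped = json.loads(json.dumps(as_dict))

# 3. List of dictionaries: sparse with integer label values, but nothing
#    prevents the same label value from appearing twice.
as_list = [
    {"label-value": 1, "rgba": rgba},
    {"label-value": 1, "rgba": [255, 0, 0, 255]},
]
seen = [entry["label-value"] for entry in as_list]
duplicates = {v for v in seen if seen.count(v) > 1}

print(list(round_tripped), duplicates)
```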

Current limitations

Specification

  • Multi-channel labeled images are currently not supported. The colors metadata specification would need to be updated to support them.
  • The current assumption is that for every multiscale level in the image data a layer of equal size will be present in the labeled image.
  • Currently missing metadata:
    • label value for overlaps
    • source array (as opposed to group) of the segmentation
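
The level-matching assumption in the list above could be checked along these lines. The function name `levels_match`, the example shapes, and the choice to compare only the trailing three (spatial) axes are assumptions made for this sketch. The specification itself does not define how "equal size" is to be tested.

```python
def levels_match(image_shapes, label_shapes, spatial_dims=3):
    """Check that each image level has a label level of equal spatial size.

    Both arguments are lists of array shapes, ordered from the highest
    to the lowest resolution. Only the trailing `spatial_dims` axes are
    compared, so the channel axis is allowed to differ.
    """
    if len(label_shapes) < len(image_shapes):
        return False
    return all(
        img[-spatial_dims:] == lab[-spatial_dims:]
        for img, lab in zip(image_shapes, label_shapes)
    )


# Hypothetical 5D shapes, not taken from any real dataset.
image_levels = [(1, 2, 64, 256, 256), (1, 2, 64, 128, 128)]
good_labels = [(1, 1, 64, 256, 256), (1, 1, 64, 128, 128)]
short_labels = [(1, 1, 64, 256, 256)]

print(levels_match(image_levels, good_labels))
print(levels_match(image_levels, short_labels))
```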

Implementation

Color key

(according to https://www.ietf.org/rfc/rfc2119.txt):

- MUST     : If these values are not present, the multiscale series will not be detected.
! SHOULD   : Missing values may cause issues in future versions.
+ MAY      : Optional values which can be readily omitted.
# UNPARSED : When updating between versions, no transformation will be performed on these values.

Revision  Source      Date        Description
0.1.1     @joshmoore  2020.10.01  Migration to image.sc
0.1.0     @joshmoore  2020.09.16  Initial version on GitHub