IDR images - details and annotation

Dear IDR team,

I would like to have some clarification on the usage of IDR data.

I am currently looking at ProjectID 405:
idr0045-reichmann-zygotespindle/experimentA

I have a couple of questions with regard to IDR and how images are curated and stored.

  1. Do you store the raw microscopy images and add annotations as attributes?
  2. Do you also correct or amend image details?

The reason that I ask is with regard to the Z index and T index of imageID 3509536.

They are given as Z-sections:101 and Timepoints: 130.

I could not find any additional metadata giving the physical units of the Z sections or the time interval between timepoints.

Am I correct that I would have to refer back to the publication to obtain that detailed information?

On a side note, for images in SSBD, we only imported the raw microscopy images and added the annotations under attributes. We are currently looking at ways to include details of time and Z-units in the future.

Regards
Ken

Hi Ken,

  • Do you store the raw microscopy images and add annotations as attributes?

Yes, IDR uses the raw images as they were uploaded by the submitters and adds value on top of the images using annotations (tables and/or maps).

  • Do you also correct or amend image details?

On a few occasions we added or updated metadata based on the publication because it was missing from the original files.

The reason that I ask is with regard to the Z index and T index of imageID 3509536.
They are given as Z-sections:101 and Timepoints: 130.
I could not find any additional metadata giving the physical units of the Z sections or the time interval between timepoints.

Whenever possible, the metadata is read directly from the original file format by Bio-Formats and populated directly into the database - see https://idr.openmicroscopy.org/webclient/img_detail/4995045/ for an example where the physical sizes are read directly from the original files.
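To check programmatically which of these values made it into the database, something like the sketch below could be used. It assumes the image record follows the OMERO JSON API layout (an `/api/v0/m/images/<id>/` endpoint whose `Pixels` block holds `SizeZ`, `SizeT` and calibration entries such as `PhysicalSizeZ` and `TimeIncrement` as `Value`/`Unit` objects). The helper names are illustrative, and the field names should be verified against a live response.

```python
import json
from urllib.request import urlopen

# Assumed endpoint layout of the OMERO JSON API as exposed by IDR.
IDR_IMAGE_URL = "https://idr.openmicroscopy.org/api/v0/m/images/{image_id}/"

# Calibration fields that may be absent when the original files lack them.
CALIBRATION_KEYS = ("PhysicalSizeX", "PhysicalSizeY", "PhysicalSizeZ", "TimeIncrement")


def fetch_image_record(image_id):
    """Download the JSON record for one image (requires network access)."""
    with urlopen(IDR_IMAGE_URL.format(image_id=image_id)) as response:
        return json.load(response)["data"]


def summarize_pixels(image_record):
    """Return dimension sizes, known calibrations, and the missing ones."""
    pixels = image_record.get("Pixels", {})
    sizes = {key: pixels.get(key) for key in ("SizeZ", "SizeT")}
    calibrations = {}
    missing = []
    for key in CALIBRATION_KEYS:
        entry = pixels.get(key)
        if entry is None:
            missing.append(key)
        else:
            calibrations[key] = (entry.get("Value"), entry.get("Unit"))
    return {"sizes": sizes, "calibrations": calibrations, "missing": missing}
```

Calling `summarize_pixels(fetch_image_record(image_id))` lists under `missing` every calibration that was not present for that image, which are the values that would then have to come from the publication.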

In the case of idr0045 the original data was submitted as multi-TIFF files. Part of the image curation for this study went into combining the Z-stacks for each timepoint to publish a richer multi-Z time-lapse representation. However, the dimension metadata was not present in the original files and was not added when we released the data.

Am I correct that I would have to refer back to the publication to obtain that detailed information?

In the current state, yes, the publication is the reference point for this information, although we would like to start updating metadata like this more frequently to keep improving the reusability of these datasets.

Best,
Sebastien

Hi Sebastien,

Thanks for the clarification.

I think our plan is to keep the raw images as they are. We are discussing how to deal with additional metadata regarding details, dimensions, etc.

Best regards
Ken

Hi Ken,

Defining how to store the metadata is certainly one of the hard questions for any image resource. To summarize, IDR has made the following choices:

  • acquisition metadata should be stored using the corresponding elements in the database whenever these exist (dimensions, physical sizes, objectives…)
  • any additional metadata not captured in the data model should be represented as a combination of tables (bulk annotations) for a collection of images as well as namespaced map annotations on the individual images
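As a rough illustration of the second choice, the sketch below models a namespaced map annotation as ordered key-value pairs attached to one image, and flattens several of them into table rows for a collection. This is a plain-Python model rather than the OMERO API; the namespace string, class and function names are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical namespace; a real resource would define its own.
EXAMPLE_NAMESPACE = "example.org/acquisition/supplementary"


@dataclass
class MapAnnotation:
    """Key-value pairs attached to one image under a namespace."""
    image_id: int
    namespace: str
    pairs: list = field(default_factory=list)  # ordered (key, value) strings

    def add(self, key, value):
        """Append a pair; values are stored as strings."""
        self.pairs.append((key, str(value)))

    def get(self, key):
        """Return all values stored under a key (keys may repeat)."""
        return [v for k, v in self.pairs if k == key]


def annotations_to_table(annotations, columns):
    """Flatten per-image annotations into one row per image."""
    rows = []
    for ann in annotations:
        row = {"image_id": ann.image_id}
        for column in columns:
            values = ann.get(column)
            # Use the first value, or an empty cell when the key is absent.
            row[column] = values[0] if values else ""
        rows.append(row)
    return rows
```

The namespace is what lets a consumer tell curated acquisition details apart from other key-value pairs on the same image, while the table form gives the same information for a whole collection at once.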

Best,
Sebastien

Hi Sebastien,

What would IDR do when the acquisition metadata, i.e. the data stored as details of the raw images, is not consistent with the actual experimental conditions, e.g. when the Z and T values are the wrong way round?

Do you correct them and edit the details of each image data?
Or do you store the corrections separately as additional metadata under annotations?

Do you decide case by case or do you have a policy in place for that?

We are planning to leave the images alone and compensate for that by adding the correct details as additional metadata. However, we would like to hear how you manage those cases.

Best regards
Ken

To go through concrete examples: inconsistencies between the raw data and the experimental conditions are certainly something we are exposed to. In terms of the raw data, the most frequent cases we deal with are:

  • high-content screening data missing the metadata expressing the HCS layout (i.e. the organization of pixel data into Screen/Plate/Wells/Fields of View)
  • multi-dimensional images with inconsistent relationships (swapped dimensions, separated timepoints from a time-lapse experiment, …)

The general IDR policy is to try and publish imaging data in a representation that matches the publication (and hence the acquisition) as closely as possible.

In terms of internal process, resolving these inconsistencies is one of the goals of the validation phase, i.e. the step before we can plan the publication of the data. It usually involves some communication with the submitters first to make sure all the files are valid and have been uploaded.
When the metadata is truly missing, we use a few technical solutions that let us add value without modifying the raw data, e.g. using pattern files or companion OME files to combine raw files into a rich representation consistent with the acquisition.
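To give an idea of the pattern-file approach, here is a minimal sketch that builds the one-line content of a Bio-Formats pattern file joining one TIFF per timepoint, using the `<first-last>` range block syntax. The file naming (`embryo_T001.tif` and so on) and the helper names are hypothetical. How a block gets assigned to Z, T or C depends on the reader's axis guessing, so the `_T` prefix is an assumption to check against the Bio-Formats documentation.

```python
def numeric_block(first, last, width):
    """Build a zero-padded range block such as <001-130>."""
    if first > last:
        raise ValueError("first must not exceed last")
    return "<{:0{w}d}-{:0{w}d}>".format(first, last, w=width)


def build_pattern(prefix, first_t, last_t, width, suffix=".tif"):
    """Return a one-line pattern joining one file per timepoint."""
    return prefix + numeric_block(first_t, last_t, width) + suffix


def expand_pattern_names(prefix, first_t, last_t, width, suffix=".tif"):
    """List the file names the pattern is expected to match."""
    return [
        "{}{:0{w}d}{}".format(prefix, t, suffix, w=width)
        for t in range(first_t, last_t + 1)
    ]


# Hypothetical naming: one Z-stack TIFF per timepoint.
pattern_line = build_pattern("embryo_T", 1, 130, 3)
expected_files = expand_pattern_names("embryo_T", 1, 130, 3)
```

Writing `pattern_line` to a file with a `.pattern` extension next to the TIFFs leaves the raw files untouched. Comparing `expected_files` with the directory listing beforehand is a cheap way to catch missing timepoints during validation.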

Best,
Sebastien