Emergence of a standardized, common file format for cell and single-particle tracking data

This is a discussion thread that started from a tweet, turned into a post here at the suggestion of Jason Swedlow. Pinging @joshmoore @s.besson @assafzar @jrswedlow and friends.


Hello science twips, in particular people interested in #tracking in Life-Sciences. Have you noticed this preprint?

Community Standards for Open Cell Migration Data https://www.biorxiv.org/content/10.1101/803064v1

Among other things, they propose a standard file format to store tracks, including complex ones such as cell lineages with cell divisions, gaps in detection, complex points, etc.
Check this repo:

They have a working implementation, and several import filters for files produced by existing software packages. There is even one for #trackmate (I took this image from the TrackMate example folder).
[image: EH-N19hX0AIgZKI]

And the file format is, IMHO, well designed. There are tables for locations, track segments (called links), and tracks (collections of links). This will make it possible to harness the very different outputs of SPT and CT software.
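To make the three-table layout concrete, here is a minimal sketch in Python. The column names (`object_id`, `frame`, `x`, `y`, `link_id`, `track_id`) and the toy data are assumptions chosen for illustration, not the names used by the specification; the only point is that a track can be resolved to its detections by going through the links table.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names and toy data, for illustration only.
OBJECTS_CSV = """object_id,frame,x,y
0,0,10.0,10.0
1,1,11.0,10.5
2,2,12.0,11.0
3,3,12.5,12.0
4,3,11.5,10.0
"""

# One row per (link, object) pair: a link is a run of detections.
LINKS_CSV = """link_id,object_id
0,0
0,1
0,2
1,2
1,3
2,2
2,4
"""

# One row per (track, link) pair: a track is a collection of links.
TRACKS_CSV = """track_id,link_id
0,0
0,1
0,2
"""


def read_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def objects_per_track(links_text, tracks_text):
    """Resolve each track to the sorted object ids it contains."""
    link_objects = defaultdict(list)
    for row in read_rows(links_text):
        link_objects[int(row["link_id"])].append(int(row["object_id"]))

    track_objects = defaultdict(set)
    for row in read_rows(tracks_text):
        link_id = int(row["link_id"])
        track_objects[int(row["track_id"])].update(link_objects[link_id])

    return {t: sorted(objs) for t, objs in track_objects.items()}


track_members = objects_per_track(LINKS_CSV, TRACKS_CSV)
print(track_members)
```

In this toy example, one mother segment (link 0) and two daughter segments (links 1 and 2) all belong to the same track, which is how a lineage with one division could be laid out under these assumptions.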

So I am wondering: should we put an export filter for this file format in #trackmate and #mamut? Should we make one for #mastodon as well? So, a few quick questions:

  • Can we store extra object information in the shape of scalar real numbers? Like object mean intensity, estimated radius, etc.?
  • Can we store 2D object contours? How?
  • The same for 3D?
  • Can we store settings information? By this I mean the information that generated the tracks: a link to the image file, the parameters used for detection, etc.
  • What about performance in the main implementation? Do we have experience with large datasets, with e.g. 10^6 cells?

Because with this and interoperability, it would be plain awesome.


Replying here just to mention that it would be awesome to have the additional features that JY mentions (and I am totally biased, coz I asked him earlier whether that is possible in Mastodon).

Our application involves tracking cells, and objects within cells, in space and time (and getting shape information, intensity information, etc. out), so I’ll be watching this space very carefully :slight_smile:


It does seem that the real benefit of a common format comes once you get into all this additional data. Correct me if I’m wrong, but although it is mentioned in the preprint as a goal, none of this is laid out in the spec, is it? At least I didn’t see it in my quick skim. Would be great, though.

I think that this information can be accessed via the object ID. However, I was only involved in the controlled vocabulary and minimal reporting requirements parts, so I am not certain of this at all.

Perhaps Simone Leo, Philippe Roudot, @markkitt, Paola Masuzzo, @joshmoore, @s.besson, Alejandra Gonzalez-Beltran, and Gwendolien Sergeant, who as I recall took part in the definition and implementation of biotracks, could provide more detailed and reliable information. I’ll try to get their attention to your post :slight_smile:


Thanks for the ping, Assaf.

We worked on the specification at a CMSO meetup after a migration Gordon conference in Galveston, Texas in 2017. Philippe Roudot and I specifically contributed from our background with u-track from the Danuser and Jaqaman labs. I also have a background in cell lineage tracing from my PhD with Gürol Süel using software from Michael Elowitz.

We wanted to accommodate many kinds of tracks and ensured support for splitting and merging behaviors as well as gap closing. This was meant to cover both single particle tracking and cell lineage tracing.
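As a rough illustration of how gaps and branching could be read back out of such tables, here is a small Python sketch. It assumes (and this is only an assumption, please check the specification) that a link is a set of detections, that a gap shows up as non-consecutive frames inside a link, and that a split or merge shows up as a detection shared by more than one link. The data and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: object_id -> frame, and (link_id, object_id) rows.
frames = {0: 0, 1: 1, 2: 3, 3: 4, 4: 4}
link_rows = [(0, 0), (0, 1), (0, 2), (1, 2), (1, 3), (2, 2), (2, 4)]


def find_gaps(frames, link_rows):
    """Return (link_id, frame_before, frame_after) for each skipped stretch."""
    frames_by_link = defaultdict(list)
    for link_id, object_id in link_rows:
        frames_by_link[link_id].append(frames[object_id])

    gaps = []
    for link_id, link_frames in frames_by_link.items():
        link_frames.sort()
        for before, after in zip(link_frames, link_frames[1:]):
            if after - before > 1:  # at least one frame without detection
                gaps.append((link_id, before, after))
    return gaps


def find_branch_points(link_rows):
    """Return object ids shared by several links (candidate splits or merges)."""
    counts = Counter(object_id for _, object_id in link_rows)
    return sorted(oid for oid, n in counts.items() if n > 1)


gaps = find_gaps(frames, link_rows)
branch_points = find_branch_points(link_rows)
print(gaps, branch_points)
```

Whether a shared detection is a split or a merge would then depend on whether it sits at the start or at the end of the links that share it.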

The objects table is meant to be general but annotatable through the object_id via additional tables. The idea is that a general software package that understands the common specification should be able to do some basic plotting and analysis without necessarily being aware of the full context, such as whether the object is a cell or a molecule, or of its specific properties.

We expected that distinct communities would want additional information, and we encourage those communities to define standards as a superset of this biotracks standard, primarily by adding tables that can be saved as CSV files. Our focus here was to produce a minimal but extensible specification.
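For the scalar features asked about at the top of the thread (mean intensity, estimated radius, etc.), an extra table keyed on the object id could look like the sketch below. The table name, its columns (`mean_intensity`, `radius`) and the `annotate` helper are hypothetical and not part of the specification; the sketch only shows the join on `object_id`.

```python
import csv
import io

# Hypothetical core table and community-defined extension table.
OBJECTS_CSV = """object_id,frame,x,y
0,0,10.0,10.0
1,1,11.0,10.5
"""

FEATURES_CSV = """object_id,mean_intensity,radius
0,152.3,4.1
1,149.8,4.3
"""


def load_by_id(text):
    """Index CSV rows by their integer object_id."""
    return {int(r["object_id"]): r for r in csv.DictReader(io.StringIO(text))}


def annotate(objects_text, features_text):
    """Attach extra scalar features to each object via object_id."""
    objects = load_by_id(objects_text)
    features = load_by_id(features_text)
    merged = {}
    for oid, obj in objects.items():
        extra = {
            key: float(value)
            for key, value in features.get(oid, {}).items()
            if key != "object_id"
        }
        merged[oid] = {
            "frame": int(obj["frame"]),
            "x": float(obj["x"]),
            "y": float(obj["y"]),
            **extra,
        }
    return merged


annotated = annotate(OBJECTS_CSV, FEATURES_CSV)
print(annotated[1])
```

A reader that only knows the core tables can ignore the features file entirely and still plot the tracks, which is the point of keeping the extension in a separate table.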

This is mainly intended as an interchange format, but we anticipated that it would also have some impact on internal formats. The table design has some relational database influence.

The specification uses JSON and CSV and is a specialization of the Frictionless Data Tabular Data Package:

The specification itself is here:
http://cmso.science/Tracks/v0.1/

The objects and links table MUST be included. A tracks table MAY be included.
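A quick check of the MUST/MAY rule can be sketched as follows. A Data Package descriptor lists its files under `resources`, each with a `name` and a `path`; the resource names `objects`, `links` and `tracks`, the package name and the `check_resources` helper are assumptions made for this example.

```python
import json

# Hypothetical descriptor; resource names are assumed for illustration.
DESCRIPTOR = json.loads("""
{
  "name": "example-tracks",
  "resources": [
    {"name": "objects", "path": "objects.csv"},
    {"name": "links", "path": "links.csv"}
  ]
}
""")

REQUIRED = {"objects", "links"}  # MUST be included
OPTIONAL_TRACKS = "tracks"       # MAY be included


def check_resources(descriptor):
    """Report missing required tables and whether a tracks table is present."""
    names = {res.get("name") for res in descriptor.get("resources", [])}
    return {
        "missing": sorted(REQUIRED - names),
        "has_tracks": OPTIONAL_TRACKS in names,
    }


report = check_resources(DESCRIPTOR)
print(report)
```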


Does anyone want to discuss CMSO or biotracks on the side of the ASCB-EMBO 2019 meeting in DC?