QuPath output to Database?

Hey Everyone,

Just reaching out to see if there is support for or thoughts on a database for backing QuPath measurements.

-J

Thoughts yes, support (currently) no :slight_smile:

It’s already possible to read (some, RGB) images hosted by OMERO, and @melvingelbard is working on making this support much better – including 2-way exchange of annotations (at least compatible ones; not all annotations QuPath can make have a natural OMERO counterpart).

There are similar integrations with Google Cloud API and SlideScore via custom extensions (that we didn’t write & don’t maintain, so can’t comment on much).

However, if you mean storing all the objects related to an image in a database (e.g. a million cells & all their measurements) that’s not something we can do yet. Mostly because it isn’t something we’ve needed, but we always have one eye on the things we think we might need in the future.

For example, we have developed methods to easily convert QuPath objects to/from GeoJSON – which is widely supported in databases – as a step towards storing data in places other than local files written with Java serialization. Similarly, projects are implemented as an interface so that in the future they might potentially sit on top of a database rather than the local filesystem.
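To make the GeoJSON idea concrete, here is a minimal sketch of what one cell might look like as a GeoJSON Feature that a database could store. The function name `cell_to_feature` and the property names (`classification`, `measurements`) are illustrative assumptions, not QuPath's actual export schema.

```python
import json


def cell_to_feature(cell_id, polygon_xy, classification, measurements):
    """Build a GeoJSON Feature dict for one cell (hypothetical schema)."""
    ring = [list(p) for p in polygon_xy]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON polygon rings must be closed
    return {
        "type": "Feature",
        "id": cell_id,
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {
            # Assumed property names, chosen only for this example
            "classification": classification,
            "measurements": measurements,
        },
    }


feature = cell_to_feature(
    "cell-0001",
    [(10, 10), (20, 10), (20, 20), (10, 20)],
    "Tumor",
    {"Area": 100.0, "Mean intensity": 0.42},
)

# Round-trip through JSON text, as a document store or JSON column would
doc = json.dumps(feature)
restored = json.loads(doc)
```

Because the result is plain JSON, the same document could in principle go into a JSON column, a document store, or a file without any Java serialization involved.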

So I’d be interested to understand better what you have in mind.

I think that’s a great summary.

We are an OMERO shop but would prefer to save the annotations and ‘study’ metadata (study name, owner, share status, images, …) in a separate DB. The goal is collaborative method development and discoverability by anyone later on. Another good use of a DB-connected instance could be a job queue monitored by a ‘headless’ instance (started on a server, where it just runs the jobs).

GeoJSON is a good choice; I’ve thought about it for this use case. Postgres (with PostGIS) supports it well, and so does another DB I’m looking into, CouchDB. Couch is a pure JSON store that talks over an HTTP API (so no DB drivers are needed). You can create ‘views’ which could, for instance, accept the QuPath viewport and return just the GeoJSON coordinates that are visible. I’d envision that when QuPath loads a new tile, the associated coordinates could be recalled from the DB and displayed leaflet.js style.
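As a rough sketch of the viewport idea, the filter below returns only the GeoJSON features whose bounding box overlaps a viewport rectangle. It runs over an in-memory list purely as a stand-in for the database; a store with spatial indexing (PostGIS, for example) would do this filtering server-side. The names `square`, `bbox_of` and `features_in_viewport` are hypothetical.

```python
def square(feature_id, x, y, size):
    """Helper that makes a square polygon Feature for the example."""
    ring = [[x, y], [x + size, y], [x + size, y + size], [x, y + size], [x, y]]
    return {
        "type": "Feature",
        "id": feature_id,
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {},
    }


def bbox_of(feature):
    """Bounding box (min_x, min_y, max_x, max_y) of a polygon's outer ring."""
    ring = feature["geometry"]["coordinates"][0]
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def features_in_viewport(features, viewport):
    """Keep features whose bounding box overlaps the viewport rectangle."""
    vx0, vy0, vx1, vy1 = viewport
    hits = []
    for f in features:
        x0, y0, x1, y1 = bbox_of(f)
        if x0 <= vx1 and x1 >= vx0 and y0 <= vy1 and y1 >= vy0:
            hits.append(f)
    return hits


cells = [square("a", 0, 0, 10), square("b", 100, 100, 10), square("c", 45, 45, 10)]
visible = features_in_viewport(cells, (40, 40, 60, 60))
visible_ids = [f["id"] for f in visible]
```

A bounding-box test is only a coarse filter; an exact polygon intersection could follow if it matters for display.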

I wanted to try creating regions/classes in the image space and then running them through the Broad’s SKOPY tool. It’s essentially all the CP measurement types in one app and runs headless with the only input being the mask and raw channels. We are wiring this part of the workflow into a Pub/Sub system (to accept the throughput of streaming data) which is scooped into a DB automatically.

Sounds interesting!

Two thoughts that may or may not be relevant:

  • Generally, QuPath expects all objects relating to an image to be held in RAM. To make that feasible with easily a million objects running on a standard laptop, objects are pretty well stripped-down to their essentials. So for interactive analysis, the ability to request objects for the viewport won’t help much with our current design – we’d need to rethink things quite thoroughly to benefit from that. In the near future, it’ll be much easier if we can just request all the data from the database.
  • The actual display of the objects in the viewer is all generated dynamically based on tiles and an in-memory spatial cache. If QuPath is only providing the viewer, we could potentially create a custom overlay that uses the database rather than its usual spatial cache. The caveat I can think of is that, if you want to view all objects at a low magnification, you might need to request them anyway and so don’t gain much.
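To illustrate the second point, here is a minimal sketch of a tile-keyed cache that only asks a backing store for a tile's objects the first time the tile is needed. It is not how QuPath's spatial cache is implemented; `TileObjectCache`, `fetch_tile` and the 512-pixel tile size are assumptions made for the example, and `fake_database` stands in for a real query.

```python
class TileObjectCache:
    """Caches objects per tile; fetch_tile(tx, ty) stands in for a DB query."""

    def __init__(self, fetch_tile, tile_size=512):
        self.fetch_tile = fetch_tile
        self.tile_size = tile_size
        self._tiles = {}
        self.fetch_count = 0  # number of tile requests sent to the store

    def objects_for_region(self, x0, y0, x1, y1):
        """Return objects for every tile touched by the requested region."""
        ts = self.tile_size
        found = []
        for tx in range(int(x0 // ts), int(x1 // ts) + 1):
            for ty in range(int(y0 // ts), int(y1 // ts) + 1):
                key = (tx, ty)
                if key not in self._tiles:
                    self._tiles[key] = self.fetch_tile(tx, ty)
                    self.fetch_count += 1
                found.extend(self._tiles[key])
        return found


def fake_database(tx, ty):
    # Pretend each tile holds exactly one object
    return [f"obj-{tx}-{ty}"]


cache = TileObjectCache(fake_database)
first = cache.objects_for_region(0, 0, 600, 100)   # touches 2 tiles
again = cache.objects_for_region(0, 0, 600, 100)   # served from the cache
fetches_after_repeat = cache.fetch_count
whole = cache.objects_for_region(0, 0, 2047, 2047)  # low-magnification view
```

The last call shows the caveat mentioned above: a zoomed-out view touches every tile, so nearly everything ends up being requested anyway.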

Where do you see QuPath fitting in?

Sounds good – I don’t want to let ‘perfection’ get in the way of progress.

A big win would be to have the studies/images/method details + metadata stored centrally.

Re:

  • holding the objects in RAM - sure; as long as there is an understood source of truth that is kept current, no worries about where the data is replicated to,
  • viewports and object presentation - not sure what the performance hit is per number of objects. Most of the implementations I’ve seen need to show progressively less detail when more objects are on screen. Not sure of the best method to do this (or whether it’s even needed), but keeping the system happy and fast by showing less detail when zoomed out could be an option. It certainly won’t go to waste if it’s part of the initial architecture.
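One simple way to show less detail when zoomed out can be sketched like this: below a chosen downsample the full polygons are returned, and above it each polygon is replaced by a single point. The function name `simplify_for_zoom`, the threshold of 4.0, and the use of a plain vertex average (not a true area centroid) are all assumptions made for illustration.

```python
def simplify_for_zoom(features, downsample, full_detail_below=4.0):
    """Return full polygons when zoomed in, one point per object when zoomed out."""
    if downsample < full_detail_below:
        return features
    simplified = []
    for f in features:
        # Drop the closing vertex so it is not counted twice in the average
        ring = f["geometry"]["coordinates"][0][:-1]
        cx = sum(p[0] for p in ring) / len(ring)
        cy = sum(p[1] for p in ring) / len(ring)
        simplified.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [cx, cy]},
            "properties": f.get("properties", {}),
        })
    return simplified


cell = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]],
    },
    "properties": {"classification": "Tumor"},
}

zoomed_in = simplify_for_zoom([cell], downsample=1.0)
zoomed_out = simplify_for_zoom([cell], downsample=16.0)
```

Other options, such as subsampling objects or aggregating counts per tile, would fit the same interface.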