Exporting annotations - discussion about format

Ok, thanks for the explanation. Of course I see your point that you don’t want to increase the maintenance effort on the developer side. But from my perspective I would then not use QuPath for pure annotation tasks:

  • Just from experience, involving scripting (even if it’s just supposed to be drag and drop) will complicate the usage for many potential users.
  • This leads to different, non-standard ways of doing the export, each with its own corner cases.
  • Going via scripts shifts the maintenance effort to the provider of the scripts and, even worse, decouples it from changes in the QuPath API, which means we might need to specify exactly which QuPath version to use for which script.

Also, I see that creating ground-truth annotations is not the primary purpose of QuPath, so this is fine. But I think if you want it to be a use case, not relying on scripting to get the labels out would increase its usefulness for this task a lot.

@constantinpape fair enough, as you say, ground truth annotation isn’t the goal of QuPath – it’s just one of the things it can be used for.

I’d be interested to know what other software you find more suitable that meets all your requirements. If it offers a standard export format that works for large images, becomes widely used, and resolves the issues I referred to, then it might be worthwhile adding built-in support for that format to QuPath as well :slight_smile:

Unfortunately I don’t know any such software yet, that’s why I wanted to explore using QuPath.
In the past, I have mostly used paintera, but it’s designed for a very different use-case (3d annotations in Electron Microscopy).

However, I have started to work more on high-throughput 2d imaging data recently and I am still exploring which tools to use for annotation tasks. Right now, I mostly use my own quickly hacked together tools based on napari, but that obviously has the drawback that shipping this to users is not so easy + it’s not well documented and I also don’t want to be in the business of maintaining it.

In terms of data formats I also don’t have that much experience with large 2d image data. But coming from the volumetric world, the preferred solution would be a format specification built on top of hdf5, or even better zarr (see for example this discussion). For our recent project we quickly cooked up something hdf5-based ourselves due to time constraints (see here), but it would of course be much better to have a common data model.
I will think about this more and see if I find anything matching this.

I am a bit familiar with some of the Zarr work through CZI EOSS and looking forward to using it more. However, the file format is just one of the considerations… more fundamental is the question of what precisely should be represented in the ground truth.

For the kind described here, I can think of three main representations:

  • vector-based (e.g. GeoJSON)
  • labelled image (limited to 1 class per pixel)
  • multichannel binary (potentially multiple classes per pixel)
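
To make the difference between these three concrete, here is a minimal sketch in plain Python (lists rather than NumPy arrays, so it has no dependencies). The two overlapping rectangles, the class names and the `classification` property name are all made up for illustration and are not taken from any particular tool’s export.

```python
import json

# Two hypothetical rectangular annotations: (class name, x0, y0, x1, y1), end-exclusive.
annotations = [("tumor", 0, 0, 3, 3), ("stroma", 2, 2, 5, 5)]
class_ids = {"tumor": 1, "stroma": 2}  # 0 is reserved for background
width, height = 5, 5

def to_geojson(name, x0, y0, x1, y1):
    """Vector form: a GeoJSON Feature holding one closed polygon ring."""
    ring = [[x0, y0], [x1, y0], [x1, y1], [x0, y1], [x0, y0]]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"classification": name},  # property name is an assumption
    }

features = [to_geojson(*a) for a in annotations]

# Labelled image: one integer per pixel, so later annotations overwrite earlier ones.
labels = [[0] * width for _ in range(height)]
# Multichannel binary: one 0/1 mask per class, so overlaps are preserved.
masks = {name: [[0] * width for _ in range(height)] for name in class_ids}

for name, x0, y0, x1, y1 in annotations:
    for y in range(y0, y1):
        for x in range(x0, x1):
            labels[y][x] = class_ids[name]
            masks[name][y][x] = 1

print(json.dumps(features[0]["geometry"]))
print(labels[2][2], masks["tumor"][2][2], masks["stroma"][2][2])
```

Pixel (2, 2) lies inside both rectangles: the labelled image can only record one of the two classes there (whichever was drawn last), while the vector and multichannel forms keep both.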

Each of these includes a multiplicity of design decisions. For anything large and 2D, a pyramidal image is also probably needed - but that raises questions of interpolation when downsampling that are much more important for labelled images than ‘raw’ image data. But very often it is not necessary to represent labelled data for the entire image or at full resolution anyway… and usually one would rather export cropped and/or downsampled regions. For these it is very often preferable to have a ‘simple’ format (e.g. PNG, TIFF) to maximize compatibility with potential downstream tools – but one still needs flexibility.
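
The interpolation point can be illustrated with a small sketch, again in plain Python. It assumes a toy 4x4 labelled image and a downsample factor of 2; the `mean` and `majority` reducers are just two possible choices, not what any specific software does.

```python
from collections import Counter

def downsample(image, factor, reduce_block):
    """Downsample a 2D list by applying reduce_block to each factor x factor block."""
    out = []
    for y in range(0, len(image), factor):
        row = []
        for x in range(0, len(image[0]), factor):
            block = [image[yy][xx]
                     for yy in range(y, min(y + factor, len(image)))
                     for xx in range(x, min(x + factor, len(image[0])))]
            row.append(reduce_block(block))
        out.append(row)
    return out

def mean(block):
    # Fine for intensities, but label values are categories, not quantities.
    return round(sum(block) / len(block))

def majority(block):
    # Most frequent label in the block; ties resolve to the label seen first.
    return Counter(block).most_common(1)[0][0]

# Labels 1 and 3 meet inside the top-left 2x2 block; label 2 is never annotated.
labels = [
    [1, 3, 3, 3],
    [1, 3, 3, 3],
    [1, 1, 3, 3],
    [1, 1, 3, 3],
]

print(downsample(labels, 2, mean))
print(downsample(labels, 2, majority))
```

Averaging the top-left block produces the value 2, a class that nobody annotated, whereas the majority vote only ever returns labels that exist in the block. For ‘raw’ intensities the averaged value would be perfectly reasonable, which is why the choice matters so much more for labels.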

Over the past few years, I have seen and responded to many posts asking how to export annotations in QuPath. The vast majority seem to be written on the assumption that there is one ‘natural’ way to do the export. And yet upon further investigation, almost everyone asking the question has a different (and incompatible) view of what that ‘natural’ method of export is :slight_smile:

Maybe the community will come up with a way of representing annotated image data that meets (almost) everyone’s needs with few required parameters for customization – however I’m not aware of anything that does this yet.

For now, I don’t think it’s possible for any software to handle even all of the most common use cases, or at least not to do so with a user-friendly GUI. The number of options that it would need to support is sufficiently large that it would be far more awkward and error-prone to export via such a GUI than to use a simple script.

Which brings us back to why QuPath uses scripts for now… but I stress not the ones posted above in this thread (which were needed in the past), but rather the simpler ones here – which are part of the ‘official’ documentation, and can be easily customized and batch-processed.

By all means, don’t use QuPath if you don’t want to – but I don’t yet see any one form of export that is commonly-accepted enough to be worthwhile implementing in the software as a single GUI command. That sounds like a maintenance headache on the QuPath side, and potentially encourages people to start storing annotations in a way that becomes a pain for everyone in the future.

Without a wider discussion, I think it would be wrong to unilaterally implement one single ‘easy’ method of annotation export within QuPath – since this would probably just end up adding to the mess.

I would however be very happy for QuPath to be part of a community-wide standardization of annotations, if it becomes widely adopted as an annotation tool.

Yes, I agree that for specifying a label format these are all important considerations, and this certainly needs to be discussed with multiple stakeholders and will take time to flesh out.
Just to add one more thing to the option of using zarr: there is a proposal for a multi-scale representation in zarr that I think is very relevant in this context:


Also, using a format that chunks and compresses image data (like zarr or hdf5) usually removes the need to export crops, because it does not incur any large overhead to export the whole image and then just load the bounding boxes of interest later.
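
As a rough sketch of why this works, the following plain-Python snippet counts how many chunks a reader would have to touch to load one bounding box. The image size, chunk size and bounding box are hypothetical numbers, and real zarr or hdf5 libraries do this bookkeeping internally; nothing here calls their APIs.

```python
import math

def chunks_for_region(x, y, w, h, chunk):
    """Return (column, row) indices of the square chunks that overlap a pixel region."""
    first_col, last_col = x // chunk, (x + w - 1) // chunk
    first_row, last_row = y // chunk, (y + h - 1) // chunk
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]

# Hypothetical whole-slide label image stored in 512 x 512 chunks.
image_width, image_height, chunk = 100_000, 80_000, 512
total_chunks = math.ceil(image_width / chunk) * math.ceil(image_height / chunk)

# A 1024 x 1024 bounding box of interest somewhere inside the image.
needed = chunks_for_region(x=40_000, y=30_000, w=1024, h=1024, chunk=chunk)
print(len(needed), "of", total_chunks, "chunks need to be read")
```

Only the chunks overlapping the box are decompressed, so the cost of loading a region depends on the region size rather than on the size of the whole image.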

Maybe my formulation in the beginning was a bit harsh; it’s just that we have a project with maybe up to a couple dozen annotators in mind, and for this I would really want a tool where the annotation task is absolutely straightforward and integrated in the GUI - anything else puts a huge burden on whoever needs to support so many people.
And from my experience, everything involving scripting is just more opaque for a lot of users.

Otherwise, I heard good feedback from someone using QuPath for 2d annotations and it’s certainly a good option for a smaller annotation effort. But in my opinion, it will probably not scale well to larger efforts given the current set-up.

Yes, but my point is that in my experience working with large 2D images for some years now, for very many applications this is not required or desired – and it restricts the downstream usefulness of the annotations to tools that support these formats.

The situation may change as this work continues, but as you mention: the zarr work is a proposal at this time. And as I mention, file format is just one consideration.

With the scripts QuPath supports, you can customize the export in many ways… one of which is to use a pyramidal OME-TIFF if you really want. That can easily be replaced with other formats. But you can also switch between labelled and multichannel binary images, and a whole array of other things.

Exactly! You have a project where you would like some kind of easy GUI-based export of annotations… but I’m not at all convinced that a) you have a clear definition of how this export should look, b) it would be as useful as you believe, c) it is worth the effort to develop in QuPath at this stage, d) you’ve actually used QuPath yourself before opining on what it needs to do to be acceptable to you.

It’s entirely possible that your annotators might not even need to export - just zip up their project files and return them to you. And in the future OMERO integration might even be able to avoid that step… otherwise, your annotators might also not mind having to run a script as much as you think, if you send them the script and clear instructions.

But in any case, the ability to write scripts and extensions in QuPath exists for exactly this purpose: when a small number of people want to do something that is so specialized that it really isn’t worth implementing in the GUI of the main application.

Note that I don’t mean ‘other people won’t want to export annotations’ – some certainly will – but they won’t want to export in exactly the same way. I can say this with some confidence because the software has been downloaded > 90,000 times and I’ve read about 1,000 user discussions about it online, in addition to many emails. I know users are diverse and want different things!

And I also know about the ‘huge burden on whoever needs to support so many people’ you mention :slight_smile: QuPath has recently doubled to have two developers, but the other one is nicer and so wouldn’t reply like this :wink:

A final thought: some years ago, I had no real clue what went into developing and maintaining an open source project. Now that I’ve been doing it for a while, I’ve come to see the need for scripting as a kind of helpful natural selection: if someone won’t spend 5 minutes learning how little effort it takes to run a script, then they haven’t really earned the right to benefit from the many thousands of hours of work that go into writing the software that runs that script.

Don’t get me wrong: I think user-friendliness is extremely important. If you ever do use QuPath, you might see that in some places. But that doesn’t mean spoon-feeding users everything; there need to be tradeoffs. By discouraging demanding users who won’t put in even the tiniest bit of well-documented effort to solve their own problem, it means more time is available to work on the projects and support users who are more engaged… so in that way it’s not a bad thing at all. In fact, it’s actively helpful for the project!

PS. I write all this not to be argumentative, but because your posts have been a bit provocative and I find it quite fun to think about these things. I don’t mean these words very seriously. We might very well add a GUI-based annotation export if we can think of a way to do it that would be worthwhile, or help someone write an extension if they want to :slight_smile:

Adding something perhaps slightly less valuable than 2 cents: Though I don’t doubt in the least Pete’s experiences with a plethora of different user requests, I’d certainly be interested in at least trying for an early version of a common spec to see if we can get some consensus. Worst case, there are a few more scripts which can work toward that format. Best case, the fact that there is some common agreement, regardless of how preliminary, starts to reduce the variety in user expectations.

Just a few clarifications from my side.

a) Yes, for my project I know exactly what this should look like: an image representation of multi-label annotations (full image), let’s say as 16 bit tiff. This would cover all my use cases and I think also the majority of use cases for people who need training data for neural networks.
I see the point, though, that users with other use cases in mind might need a different export option, and so would demand that other options be added to the GUI. That’s why I think the discussion of a more general label format is interesting.
b) That depends on the perspective: useful for me and the applications I have in mind: definitely.
useful for the general QuPath user base: I cannot judge this.
c) Sure, that’s definitely something I cannot judge either.
d) Again, I am exploring which tool to use for a project with several users who would also be new to the tool. And not having a GUI function for the basic functionality I would need for this is a pretty strong criterion; especially because most other tools I know do support this, e.g. ilastik, LabKit (but to be fair have other limitations which don’t make it very suitable for the project at hand).

Re scripting: I am not saying it’s a bad thing, quite the opposite. But it’s not something I would expect people to do the first time they use a tool (which is just the setting we have; I cannot do anything about it, and I need to keep the effort as low as possible for both the annotators and me, or whoever else coordinates the annotations).

Ok, I am trying to formulate this nicely, but on this point I really disagree. Yes, you put a lot of effort into developing the software, I don’t want to diminish this. And I also don’t question that you are the best person to judge which functionality is worth implementing in QuPath overall.
But I also know what I need for my use case and what would make a tool a good fit or not.
I think QuPath checks a lot of these boxes, but in the case at hand relying on scripting for label export is a malus and is not about being lazy.

Sure, no hard feelings. In any case, I think we know both positions now well enough, so from my side we don’t need to discuss this here much further ;).
If you do want to implement a GUI for this at some point or would like to continue the discussion on a more general label format elsewhere, I am happy to give feedback.

@joshmoore sounds good to me! Perhaps with a connection to https://bioimage.io

I presume you mean of the full-resolution data, although it’s not totally clear to me if you mean a labelled image or a multichannel binary image.

For the majority of cases I’ve seen, and for the projects where I’ve been involved, the training data is desired in a ‘standard’ file format because there is no need whatsoever for a full-resolution representation of the original image. Much easier to work with manageable TIFFs, PNGs or even JPEGs. Sometimes boundaries need to be exported separately, with a variable line thickness. Sometimes I want to enforce a standard output size (e.g. 512x512 pixels), sometimes I want to adapt the size to the annotation size.
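
Two of these options, a fixed output size versus a region adapted to the annotation, might be sketched as follows in plain Python. Bounding boxes are `(x, y, width, height)` in pixels; the function names, the 512 pixel size and the 16 pixel padding are illustrative assumptions, not QuPath behaviour.

```python
def fixed_size_window(bbox, size, image_width, image_height):
    """Return (x, y, size, size) centred on bbox and shifted to stay inside the image."""
    x, y, w, h = bbox
    cx, cy = x + w / 2, y + h / 2
    left = int(round(cx - size / 2))
    top = int(round(cy - size / 2))
    # Shift the window back inside the image; assumes the image is at least size x size.
    left = max(0, min(left, image_width - size))
    top = max(0, min(top, image_height - size))
    return left, top, size, size

def padded_window(bbox, padding, image_width, image_height):
    """Return the bbox grown by padding pixels on each side, clipped to the image."""
    x, y, w, h = bbox
    left, top = max(0, x - padding), max(0, y - padding)
    right = min(image_width, x + w + padding)
    bottom = min(image_height, y + h + padding)
    return left, top, right - left, bottom - top

annotation_bbox = (100, 50, 200, 120)  # hypothetical annotation near the image corner
print(fixed_size_window(annotation_bbox, 512, 4000, 3000))
print(padded_window(annotation_bbox, 16, 4000, 3000))
```

Even this tiny example already needs decisions (what to do at the image border, how much context to include) that different projects answer differently.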

For vector-based export, sometimes the coordinates are requested in pixels and sometimes in microns, sometimes the origin is different (top left, bottom left, even the image center).
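
A minimal sketch of those coordinate variations, in plain Python. It assumes the input is in pixels with a top-left origin and y pointing down; the function name, the `origin` keywords and the 0.5 µm pixel size are hypothetical choices made for the example.

```python
def convert_point(x_px, y_px, image_width, image_height,
                  pixel_size_um=1.0, origin="top-left"):
    """Convert a pixel coordinate (top-left origin, y down) to another convention."""
    if origin == "top-left":
        x, y = x_px, y_px
    elif origin == "bottom-left":
        x, y = x_px, image_height - y_px  # flip so that y increases upwards
    elif origin == "center":
        x, y = x_px - image_width / 2, y_px - image_height / 2
    else:
        raise ValueError(f"unknown origin: {origin}")
    # With pixel_size_um=1.0 the result stays in pixel units.
    return x * pixel_size_um, y * pixel_size_um

# The same vertex of one annotation, in a hypothetical 1000 x 800 pixel image.
for origin in ("top-left", "bottom-left", "center"):
    print(origin, convert_point(100, 200, 1000, 800, pixel_size_um=0.5, origin=origin))
```

The same vertex comes out as three different coordinate pairs, so a vector export is only unambiguous if the units and origin are recorded alongside it.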

Not totally sure you’ve understood what’s possible. You previously wrote:

If you’re post-processing with a script anyway, why not have a QuPath project as that format? It sounds like there is no need whatsoever for your annotators to do the original export. I cannot understand why this is an inflexible requirement on your side.

All you need is that, at some point in your pipeline, you can find someone who can bear to look at 10 lines of code and press “Run”… or even have someone write an extension or startup script to do exactly the same thing but by clicking a button instead.

For me, this is a far better strategy anyway and the one I regularly use, since that means whoever is working on the neural network side can customize the export however they want. Also, QuPath’s data files may very well be much smaller than the exported labelled images.

From mine too :slight_smile:

BTW I was curious, so I checked https://imagej.net/Labkit

  • Labkit’s file format for Labelings is *.labeling. It works great for very large files with very few labels. (This file format is likely to be improved and changed in the future.)
  • The labeling can be saved and opened as *.tif as well. (This is a good option for images that are not too big, and it can be used by any other tool.)

Just to clarify that one point: Yes, it’s likely that there will be some post-processing of the annotations necessary. But it would be much more flexible to do this after the export to some common image format. (again, let’s say tiff and at full resolution so that the content is not ambiguous):

  • It can be run centrally, no need to distribute it to users; which means changes can be made easily without the need to distribute another version of the script, e.g. for catching corner cases or if we see that we actually need some different kind of post-processing.
  • Could be written in any language and is not restricted to Groovy; from my side, I can write a script to do the post-processing in Python with skimage in seconds, whereas I have no idea how long it would take me in Groovy and its image processing toolkits, which I am not familiar with at all.

The .labeling is just a json based file that stores coordinates; I found working with this quite convenient.

In the scenario I describe, the export is run centrally. The users don’t have to do it. They zip up their projects and return them to you.

In this case they don’t run a script – in fact, they don’t even have to not run a script, because they don’t export at all. They do even less than pressing a button.

Advantages:

  • Even easier to make changes centrally - including after the event, since all original annotations are preserved in context. You do the export for them… using (possibly adapted) scripts that I’ve already written for you, calling helper classes that I’ve also written for you :slight_smile:
  • Far lower risk of losing information.
  • Avoids issues such as using a different mapping from labels to classifications (e.g. because one image includes classes that are not otherwise represented – or perhaps later in the project one discovers that new classifications should be added, or some classifications may be merged)
  • Allows users to include extra information, e.g. descriptions of particular annotations in the ‘Name’ or ‘Description’ fields
  • Smaller files to send.
  • Certainly not more - and potentially even less - work than having users exporting to a common image format then post-processing that image format.
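
The label-mapping issue in the list above can be shown with a short plain-Python sketch. The class names, the "order of first appearance" rule and the tiny 2x2 label image are all hypothetical; the point is only what happens when each image gets its own mapping.

```python
def per_image_mapping(class_names):
    """Assign integer labels in order of first appearance within one image."""
    mapping = {}
    for name in class_names:
        mapping.setdefault(name, len(mapping) + 1)
    return mapping

def relabel(label_image, old_to_new):
    """Rewrite a labelled image (2D list) using an old-label -> new-label table."""
    return [[old_to_new.get(value, value) for value in row] for row in label_image]

image_a_classes = ["tumor", "stroma", "necrosis"]
image_b_classes = ["stroma", "tumor"]  # no necrosis, and drawn in a different order

map_a = per_image_mapping(image_a_classes)
map_b = per_image_mapping(image_b_classes)
print(map_a["tumor"], map_b["tumor"])  # same class, different integers

# A project-wide mapping, fixed once and applied to every image at export time.
project_mapping = {"tumor": 1, "stroma": 2, "necrosis": 3}

# An image already exported with its own mapping has to be relabelled afterwards.
old_to_new = {map_b[name]: project_mapping[name] for name in image_b_classes}
exported_b = [[0, 1], [2, 2]]
print(relabel(exported_b, old_to_new))
```

If the original annotations are kept with their class names, the mapping can be chosen (or changed) centrally at export time, and the relabelling step is never needed.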

I presume you must already have the original images, so they wouldn’t be included.

QuPath projects store a reference to the original image, but QuPath has a wizard to quickly update the paths in the project to wherever the corresponding images are stored on your computer (including with an automatic directory search if needed). If your images are in OMERO, you might not even need to do that.

Still the case. No reason whatsoever to do your processing in the Groovy script. Templates are all online. Run the one you want. Adapt it if you need to.

Even I would most likely do any substantial postprocessing in Python. Not worth the effort to learn to do it in Groovy. Groovy is only for the export, and only about 5-15 lines of code (depending upon how much you want to customize).

Fine, but it would appear not supported elsewhere – and even unlikely to be supported indefinitely in Labkit. Therefore: either a maintenance burden for the developer, or an instability problem for the user.

When you write ‘especially because most other tools I know do support this, e.g. ilastik, LabKit’ it is useful to be clear that they are not using standardized formats; anyone who uses the annotations downstream has to adapt to the formats chosen by these software applications.

Which revives the importance and interestingness of seeing to what extent this can be standardized, à la @joshmoore

Ok, I understand what you are proposing now. That indeed makes a lot of sense.
(I still think that having an easy GUI for this would be quite valuable eventually, because there would be no need to use QuPath on the developer side at all, but for the project I have in mind that would not be a big issue.)

Fully agree.

I agree once the standardization work has been done, and we have a really clear definition of how export should be performed with a minimal set of parameters to support a wide range of use cases and downstream tools.

Until then… scripts! For QuPath anyway.

Sounds good!
Until then, I might come up with a few questions on how to set up a dev env with QuPath, but I will have a look at the documentation first.

I think this topic is super important! And in fact I would agree with both @constantinpape and @petebankhead arguments:

  • Having an intuitive (non-scripting) GUI-based label export option would be extremely valuable, especially in the context of routine facility workflows.

  • As every downstream method might have different requirements the burden of standardisation, implementation, and maintenance of a general label export should not rest with the main QuPath developers.

So maybe having a dedicated plugin/extension that implements a specific export format is a good compromise satisfying both requirements. This way, it could ideally be tailored to the downstream task by the respective developers and yet still be installed easily via simple distribution of a jar file. I don’t know, though, how easy it would be to create a label export extension in QuPath for dense label mask output, e.g. for StarDist (@oburri and @romainGuiet maybe know more).

Hi @thread!

My opinion on this is based only on our (@oburri and I) limited experience with small annotation campaigns (10-30 images) of 2D datasets for subsequent training of StarDist. For these projects, our users took care of drawing the annotations, and we took care of exporting the labels and training the StarDist model.

We found QuPath convenient for the task because our users were already familiar with the software and its interface, as well as with the use of scripts in QuPath.

Furthermore, with the QuPath project architecture it is simple to keep the dataset and the annotations together. We realized that getting annotations is a simple task BUT getting GOOD annotations is often an iterative process. Indeed, we often have to ask our users to refine their annotations to make them more “consistent”. Again, with QuPath this has been easy enough that our users were not discouraged (and having pre-existing results to prove the increase in accuracy with some good annotations was a motivation boost for them).

About exporting label images, as mentioned before it’s usually a task we take care of. So having a script to do it was not a limiting step so far.

About creating a dedicated extension to do the job, we didn’t feel the need as it was mainly “exploratory” so far (few users). Maybe we need more projects to secure the whole workflow, but I’m not entirely sure it’s necessary at this stage. (Plus I have no experience yet with creating QuPath extensions.) Maybe one could think of having an ImageJ plugin and calling it from QuPath?

Best,

Romain

I like the idea of a vanilla IJ1 plugin. This way it can work in QuPath if people need it, and people already annotating things using ImageJ (I recall @superresolusian using ImageJ for annotating) could use it directly.

Another <2 cent contribution. I stand with @petebankhead regarding the flexibility of scripts in this domain that is in constant movement and of which we are on the far upper branches.

Another idea is to see whether the people already hosting this kind of data (such as https://www.kaggle.com/, or perhaps https://github.com/wkentaro/labelme/ ) are already working on or considering a standardization effort much more downstream (down-branch?) than us, so that we can be happy to be early adopters of a new format rather than rattle our own brains?
