QuPath projects & version control

This is what I do too when training classifiers, and it got me wondering whether being able to create subfolders in a QuPath project would make things less messy in the filesystem by separating training from the actual analysis. That would be especially useful if you have to optimise some parameters and/or train multiple classifiers (e.g. estimating stain vectors + a pixel classifier for tissue detection + an object classifier for cell detection)!

I’ve been wondering the same… which is part of the reason projects in v0.2.0 changed so much from v0.1.2. At some point, a way to checkpoint/version projects would be helpful.

Potentially this might come in the future, but as you say… it is messy when doing it manually, and risks being similarly messy (or dangerous) if QuPath does it for you. So it doesn’t exist yet.


It doesn’t have to be done automatically by QuPath, just having a manual option to create subfolders/subprojects in a QuPath project may be a good starting point.

Yes, it would still need to be coded though… I’m imagining a way for QuPath to copy the files/folders recursively, and then to switch between them as required. And a way to present to the user the available checkpoints. And checks to try to avoid disastrously overwriting something.

Sounds relatively straightforward, but it’s always a lot of work to change anything so core without breaking other stuff… and not just to develop it, but also to support it. It didn’t make it into v0.2.0, but hopefully it will be in a future version.

I see, I slightly misunderstood your previous post.

It does seem like a challenge to implement that. Hopefully you can find inspiration from how other programs do this!


Do you know of any good examples we should look at?

Not specifically for image analysis programs, but I find the way Microsoft Office 365 handles version control really useful.

Then again, they are a mega-company with immense cloud computing/server powers doing a lot of the muscle work, so this example may not be particularly helpful for QuPath as a standalone program… Maybe others can pitch in with their experience using other software.

Personally, this sketch/mockup shows what I wish I could do with subfolders/subprojects:


Actually… you can duplicate images in a project, and you can add metadata tags. Then you can sort them by the tags. Using that could kind of replicate the design you show already I think…

Documenting this is on our todo list (unless @melvingelbard has already done it, and I just can’t find the link), but most of what you need is accessible by right-clicking on project entries. Basically, it’s ‘Add metadata…’ and ‘Sort by…’. Note that you can add metadata to more than one image at a time.


Oooooooh it does kind of work! However, I could not find an edit metadata option to remove or change them in case I made a typo or just want to clean things up a bit.

Edit: I found Edit metadata but it will not let me remove headers. Polishing this function would be great!


I think it would be pretty nice to have a right-click (and maybe scriptable) “Create backup” for a given image that combines “Duplicate image” with automatically adding several pieces of metadata (such as the date/time, the number of duplicates already made of that image, etc.). Maybe in some far-down-the-line version, something similar for scripts would be possible (for those not using IntelliJ). Boy has version control there been… fun, especially when working on scripts cooperatively with others remotely!

I am pretty sure Pete brought this up somewhere before (maybe a workshop?), but a quick way of backing up a project is right-clicking on the folder and sending it to a zip file (at least on Windows). That way you not only have the metadata for the file creation date, but can also rename the zip file as part of the process without changing the name of the project folder.
This is slightly more space intensive if you store WSI within the project folder… :slight_smile:
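For anyone who prefers to script the zip step, here is a minimal Python sketch; it is not a QuPath feature. It assumes the project folder is the one containing project.qpproj, and the function name backup_project and the timestamp-plus-label naming scheme are hypothetical choices for illustration. Because it zips the whole folder, any images stored inside the project get copied too.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_project(project_dir, backup_dir, label=None):
    """Zip a project folder into backup_dir under a timestamped name.

    Hypothetical helper: project_dir is assumed to be the folder that
    contains project.qpproj. The project folder itself is never renamed
    or modified; only the archive name carries the timestamp and label.
    """
    project_dir = Path(project_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    name = f"{project_dir.name}_{stamp}"
    if label:
        name += f"_{label}"

    # make_archive appends ".zip" itself and returns the archive path.
    # root_dir/base_dir make entries start with the project folder name,
    # e.g. "MyProject/project.qpproj".
    archive = shutil.make_archive(
        str(backup_dir / name),
        "zip",
        root_dir=project_dir.parent,
        base_dir=project_dir.name,
    )
    return Path(archive)
```

Putting the label in the archive name rather than renaming the project folder keeps the live project untouched, which matches the manual approach described above.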

Yup, the zipping is what I usually do.

For extensive versioning, might git with Large File Storage (LFS) be a sensible way to go? It seems safer to use something well established rather than a quick solution coded in QuPath, but I’m not git-savvy enough to know if there are clear flaws with this suggestion.

And I am not sure how many users would be git-savvy enough to use something like that rather than something built in! Or would even be aware that it would be a good idea.

Of course, if it were hidden in a right click menu, I am not sure how many users would find it anyway, so maybe that is not the best solution either? Not sure :frowning:

How would this work? Will git’s version control be integrated into QuPath and the user provides the storage space (e.g. local hard drive, cloud storage)?

I don’t think many people are aware of the version control in Microsoft Office 365 right now either, but that doesn’t really affect their main use of it. It would probably come down to whether it is worth the time to implement it for the power users and hope others will find a use for it.

Not the way I’m imagining it currently – it would just be regular git commands entirely outside of QuPath :slight_smile:
But many of the relevant files are text files (e.g. .qpproj, scripts, classifiers) and Git LFS offers some support for the larger data files.

Might be implementable with JGit but I really don’t know.

Indeed, but there are existing workarounds separate from QuPath (e.g. manually zip folders, work in DropBox/OneDrive/Google Drive, use system backups). Every day spent working on implementing some specific feature in QuPath is a day spent not doing/implementing something else…


Of course more things can be implemented with more people working on the software – so please share this far and wide!


I’ve been using this in a proper project, and while it works great for organisation, sort by metadata is not available in the Run for project dialog, where you have to select the images to include when running a script on a batch of images.

It gets a bit messy when there are images that I added after the initial project setup, especially when there are some images that require rescanning, etc.

Is it possible to implement folder/subfolder sorting in the image selection window for Run for project?

Hi, I just want to check you’ve seen that there is a filter box below the left list, which can be used to quickly identify images that contain specific text in their name (only). Is this enough?

If not, the same filter box could potentially be made more powerful and include tags but I’m not sure exactly how that should look – it would take a bit of thought to figure out how to avoid confusion, especially if a lot of tags are used.

Admittedly I had not noticed the filter box, but I just tried it and it still picks up my training images, because I simply appended a description to the original image names (e.g. NP10-01_1_H&E.ndpi --> NP10-01_1_H&E.ndpi pixel classifier training).

In the above example, using the search term “_1_H&E.ndpi” to exclude other stains would still include the training images in the filtered list. While this significantly decreases the number of images to scroll through and include in the batch script, some sort of metadata or tag filter would be useful.

Maybe filtering by metadata key:value? e.g. filtering for images with metadata key “stain” and value “H&E” would mean inputting “stain:H&E” in the filter box (or a separate filter box for metadata related searches/filter).

This could be made more powerful by using AND/OR terms to combine search terms, either narrowing the list to images that match several metadata values (AND) or merging two sets of results (OR), e.g. “stain:H&E AND folder:analysis” as a search/filter term.
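To make the proposed syntax a bit more concrete, here is a hypothetical Python sketch of how such a filter could be evaluated; it is not how QuPath’s filter box works today. It assumes the “key:value” syntax suggested above, treats AND as binding more tightly than OR (no parentheses), compares metadata values case-insensitively, and falls back to a plain name substring match for terms without a colon, similar to the existing filter box. ImageEntry, matches and filter_entries are illustrative names, not QuPath API.

```python
from dataclasses import dataclass, field


@dataclass
class ImageEntry:
    """Hypothetical stand-in for a project entry: a name plus metadata."""
    name: str
    metadata: dict = field(default_factory=dict)


def _term_matches(term, entry):
    # "key:value" -> metadata value must equal value (case-insensitive).
    # Plain text  -> substring match on the image name.
    if ":" in term:
        key, value = term.split(":", 1)
        actual = entry.metadata.get(key.strip())
        return actual is not None and str(actual).lower() == value.strip().lower()
    return term.lower() in entry.name.lower()


def matches(query, entry):
    """Evaluate e.g. 'stain:H&E AND folder:analysis' against one entry.

    Assumed grammar: OR-separated groups of AND-separated terms,
    so 'a AND b OR c' means '(a AND b) OR c'.
    """
    for group in query.split(" OR "):
        terms = [t.strip() for t in group.split(" AND ") if t.strip()]
        if terms and all(_term_matches(t, entry) for t in terms):
            return True
    return False


def filter_entries(query, entries):
    """Return the entries matching the query, preserving their order."""
    return [e for e in entries if matches(query, e)]
```

In this sketch, the plain filter “_1_H&E.ndpi” still returns both NP10-01_1_H&E.ndpi and its “pixel classifier training” copy, whereas “stain:H&E AND folder:analysis” returns only the original, provided the images have been tagged with those metadata keys.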
