GitHub-like site for image data

Hi all,

I’m wondering why there seems to be no website like GitHub for microscopy image data rather than software repos.
My background is in software; I’ve been helping out some bio labs with image analysis, and I was surprised not to find an “internet’s front page for cell biologists”. I know OMERO / IDR (https://idr.openmicroscopy.org) exists, but it feels more like the Library of Congress than GitHub - it has far more emphasis on curation and less on workflows.

As a newcomer to the field, did I miss some other site?

2 Likes

Hi @gkk,

Welcome to the forum!

Four examples are mentioned here:

it has far more emphasis on curation and less on workflows.

I’m not sure what you mean here. What is a “workflow” of images? Data should be curated and should not change. Right? Please do clarify / correct me if you have something else in mind.

John

2 Likes

I think it is because storing images is expensive compared to storing source code.

Public databases like IDR and SSBD:database (http://ssbd.qbic.riken.jp/) require a lot of support in curation, organisation and funding to maintain. There are now BioImageArchive (https://www.ebi.ac.uk/bioimage-archive/) and SSBD:repository (http://ssbd.qbic.riken.jp/repository/), which allow you to archive your images there. I believe they are finding it hard to get research funding to support their activities. The other problem is uploading images that run to tens or hundreds of GB, or even TB, to their services. Many of their images were sent to them on physical hard disks.

For a GitHub-type service to let users upload images and to support image processing and other computational activities (the so-called workflow), it would need either more bandwidth so that images can flow back and forth between users and the image database, or more cloud-based compute resources close to the online storage so that the images can be processed there. That is why you are not finding such services yet.
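To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. The dataset sizes, link speeds and the 80% efficiency factor are illustrative assumptions of mine, not measurements from any of the services mentioned.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move `size_gb` gigabytes over a link.

    `link_mbps` is the nominal link speed in megabits per second.
    `efficiency` is an assumed fraction of that speed actually achieved.
    """
    size_megabits = size_gb * 1000 * 8  # decimal GB -> megabytes -> megabits
    seconds = size_megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical dataset sizes (GB) and link speeds (Mbit/s).
    for size_gb in (10, 100, 1_000, 10_000):
        for link_mbps in (100, 1_000):
            hours = transfer_hours(size_gb, link_mbps)
            print(f"{size_gb:>6} GB over {link_mbps:>4} Mbit/s: about {hours:.1f} h")
```

Under these assumptions a terabyte-scale dataset takes on the order of a day over a 100 Mbit/s link, which may be part of why shipping hard disks is still common.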

Both SSBD and IDR allow you to process their image data online through their APIs, but not many users make use of those services.


As OMERO/IDR and others move towards S3-type object storage for images, maybe a GitHub-type service will become more viable in the near future.

2 Likes

Hi John! Thanks for the pointers!

I agree with Jean in the other thread

In my view, Figshare and Zenodo are pretty much dumps

None of these would work if you wanted to upload, e.g., a confocal stack.
When I mentioned workflows, I meant the lab’s actual life: you do an experiment and want to show your colleagues the results long before you’re ready to publish. Especially during COVID-19, there’s a need to do that online, and the state of the art seems to be sharing multi-gigabyte zip archives on GDrive or Dropbox without any preview function. So I started to wonder whether I had missed some neat tool people use.

1 Like

Depends on what you mean by “work”. It would totally “work” in the sense that the data would be there and available. The downside of a “dump” is that it does not force the uploader to curate the data. If important context is elsewhere, then it seems a reasonable choice to me.

I recently used Figshare to “dump” sample data for a tutorial. The information needed to use it is in webpages and code, which Figshare links to.

John

Hi @gkk,

Thanks for kicking off this thread!

My opinion is that one of the top reasons this doesn’t exist is the lack of a common way to structure the data. You can read some of the background under:

That’s a great quote. I’ll take that as a compliment! :wink:

Agreed. And especially moving them around, but with object storage, I’m beginning to have hopes. (See below)

I think this is also a part of the formatting issue. One other location that I need to add to the “Where to publish a new dataset?” thread is https://open.quiltdata.com/. I know @heeler et al. at Allen have had success there.

What we’re working toward, though, @gkk, is the ability to put your images anywhere and then have them accessible. See:

My hope then is that the BioImage Archive (mentioned in the other thread) would provide this type of remote access, at scale.

Happy to go into more detail if you’re interested.
~J

2 Likes