How to avoid image data mishandling and misconduct? A commentary

Dear Colleagues,

Together with Simon F Nørrelykke, I wrote a commentary on how to avoid mishandling and misconduct in image handling and analysis:

http://bit.ly/Reproducible-BIAS

We argue that ruling with do’s and don’ts is a bad idea, and suggest that we should instead focus on the reproducibility of image data handling and analysis.

As the commentary does not have any comment section, please use this thread for any comments and questions!

Kota

19 Likes

Hi Kota & Simon,
I wish FigureJ by Edda Zinck and myself were also suggested as a tool to achieve reproducibility by linking to original data and automatically recording processing steps involved in figure creation.
But this is definitely a very important and nice commentary. It will be most useful for anyone handling and publishing image data, thank you both!

Jerome

5 Likes

This goes in the right direction. The section on workflows should have mentioned workflow specification languages and workflow management systems. There are also repositories for workflows that make the workflows discoverable. Don’t advocate using Zenodo, especially when specialized public repositories exist. I would suggest NEUBIAS looks at other communities such as ELIXIR to see how they tackle the reproducibility issue. It seems to me that the bioimage analyst community is trying hard to reinvent the wheel. I understand that the field’s heavy reliance on proprietary software and manual interventions creates some challenges but there are ways forward. As a community of customers you could make vendors understand your needs and as a community of developers join existing efforts to patch these when they fall short for your needs.

5 Likes

Hi Jerome,

Sorry for not being able to cover all the nice tools like FigureJ that assist pipelining from raw data to the figure. We chose to explain it without any plugin, using only the built-in ImageJ macro language. In the near future, it might be good to compile a web page listing tools that assist such pipelining on various platforms.

Kota

2 Likes

Hi @jkh1

thanks for your valuable comments - it would be great if you could link each of those resources so we all can see what we have missed in our article!

Kota

2 Likes

Thanks Kota, totally fair :+1:

2 Likes

There’s a lot out there in particular originating from bioinformatics. Here are some links that I hope can get you started:

In terms of workflow specification, EOSC-Life promotes the use of the Common Workflow Language (CWL) but there’s also the Workflow Description Language (WDL).
To share workflows, EOSC-Life is building the WorkflowHub as an evolution of previous efforts (i.e. myExperiment).

I’ve been promoting the use of Galaxy for some image processing tasks (see for example this post). Galaxy also has its interactive environment which allows interactive applications (such as notebooks) to interact with the Galaxy history.

I am aware that existing solutions may not cover all imaging-related use cases but I believe building on existing work should be the way forward instead of starting from scratch.

3 Likes

Hi @Kota

Excellent paper. I’m glad you brought up the “PSF Volume” example. I’ve done a lot of work with PSFs and deconvolution, and this is a very important example.

However I had a small problem reproducing it. I get “PSF BW” (Born and Wolf), instead of “PSF RW”. Any idea how I can get it to output Richards and Wolf instead of Born and Wolf? In one sense this isn’t a big deal, as I had some other observations, and they would be valid with both “Richards and Wolf” and “Born and Wolf”. On the other hand it would be nice to be able to repeat your script exactly, otherwise I’d have to make changes to the PSF_Panelling script (because it expects the name to be “PSF RW”).

I looked in the config and could not find any variable defining PSF type.

I also tried rerunning PSF generator explicitly with Richards and Wolf several times, just in case it saves the last run PSF model somewhere.

Below is a screen shot in case that provides any clue…

2 Likes

Hi @jkh1

Thanks a lot for the links. There will be no conclusion in the following answer to you but I will explain my view.

For securing the reproducibility of used methods, we recommended publishing a “workflow package” with a set of:

  1. Computer Codes in GitHub
  2. Sample Image Data in Zenodo
  3. Workflow description (text describing how to combine 1 and 2)

This is probably the easiest way, though already challenging for many, to secure reproducibility so that the analysis details can be checked. We also mentioned that using Docker containers is probably the way to go (but it is too much technology to learn for general life science researchers).
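
As a rough illustration (not taken from the article), the three parts of such a workflow package could be recorded in a small text manifest. The Python sketch below is only one possible layout: the class name `WorkflowPackage`, the field names, the repository URL, and the DOI are all hypothetical placeholders.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class WorkflowPackage:
    """The three parts of a workflow package (field names are hypothetical)."""
    code_repository: str   # 1. where the code lives, e.g. a GitHub URL
    sample_data: str       # 2. where the sample images live, e.g. a Zenodo DOI
    description_file: str  # 3. text describing how to combine 1 and 2

    def missing_parts(self):
        """Return the names of any parts that were left empty."""
        return [name for name, value in asdict(self).items() if not value.strip()]


# Placeholder values for illustration only; they do not point to real resources.
package = WorkflowPackage(
    code_repository="https://github.com/example-lab/example-workflow",
    sample_data="doi:10.5281/zenodo.0000000",
    description_file="WORKFLOW.md",
)

# A plain-text (JSON) manifest that can be versioned together with the code.
manifest_text = json.dumps(asdict(package), indent=2)
print(manifest_text)
```

Calling `package.missing_parts()` before publishing gives a quick check that none of the three parts was forgotten.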

I think this latter direction, the use of Docker containers for reproducible workflows, is also mentioned in the first reference you cited, so it is a common denominator with our recommendation.

At the moment, I think choosing one of the specific workflow management systems listed in your link, instead of using Docker, is probably difficult for the bioimage analysis community, simply because the immediate advantage over Docker is not really visible.

Concerning the “workflow language”: this relates specifically to “Box 4. Workflow description” in our article and the workflow diagram shown in that box. CWL could be recommended for creating this diagram, but for that it would be useful if it could be compiled locally from a simple markdown file (as in Jekyll or Hugo). I think usability will directly affect popularity. If you could organize an implementation of CWL in the Fiji script editor, people in the bioimage analysis community might start using it.

Concerning the use of Galaxy for bioimage analysis, I feel that there are not many practical example workflows there for attracting people to start using it. How about implementing in Galaxy some of the moderately complex image analysis workflows that you were actually involved in, for example Neumann et al. 2010, or the more recent and complex Alladin et al. 2020? I am sure it would be a good demonstration.

2 Likes

Thanks a lot @bnorthan !

This is a placeholder: I will check the mistake I probably made and the information missing from the script, and will come back to you soon.

2 Likes

Oh, yes, I’d be keen on seeing an example of a moderately complex workflow “from raw data to plot” in Galaxy.

So far, what comes closest to the idea of a single shareable workflow (ready to be reproduced by any life science researcher without coding knowledge) is #knime in my opinion. It’s lacking the “easy-to-deploy on distributed computing” part, unfortunately…

4 Likes

Hi @Kota , Hi @simonfn,

Thank you for this Commentary article! I fully recognized myself in the intro:

> As image analysts at two major imaging facilities, we are regularly asked to replicate the typically vague methods in published papers and find this task ranges from straight-forward, over pleasantly challenging, to impossible.

I really like:

  • The simple examples you take to explain basic mistakes (like the classic single channel segment & measure)
  • The idea of having more detailed & fully reproducible workflows,
  • Engaging NEUBIAS members in reviewing the Image Analysis sections of articles.

BUT while I understand that “full reproducibility” can’t be achieved using closed-source software, I feel it is a bit unfair to have the categories “Fully (The scientific approach)” and “Largely (Well intended, could do better)” discriminated because of the use of commercial software. Transferred to the microscopy world, it would mean that if you have a brand-new commercial microscope, it’s not “The scientific approach” either, because others haven’t bought it yet! Moreover, many commercial software companies offer free trials. So I feel the question is more about the “openness” of implementations and methods than about commercial versus freely available.

And now for something completely different: from a biologist’s perspective, I would add that in many cases scientists manage to come up with a working image analysis workflow but don’t feel confident writing a methods section to explain it (for example, why a Gaussian filter was applied at this step rather than the next one). Maybe we can open a discussion about the CRAPL license (The CRAPL: An academic-strength open source license).

Finally, a discussion or recommendation about choosing license would be nice (Zenodo asks for it).

Cheers,

Romain

4 Likes

> Concerning the use of Galaxy for bioimage analysis, I feel that there are not many practical example workflows there for attracting people to start using it.

It’s still early days but have you looked at our Galaxy training example? We segment and extract features from nuclei then segment and extract features from nucleoli while keeping the parent-child relationship between nucleus and nucleoli. Our real project does this for multiple genome-scale RNAi screens. We’re still analyzing the data but the idea is that the rest will go in a notebook in the Galaxy interactive environment.
We’re also implementing the CellProfiler modules to enable tracking.

> For securing the reproducibility of used methods, we recommended publishing a “workflow package”

This sounds similar to Research Object crate (RO-crate) and in particular Workflow RO-crate.

Also note that containers and workflow management systems are not mutually exclusive and the idea is generally not to package a whole workflow into a container. A workflow management system can execute containerized steps but this is also not a requirement. Anyway, at this stage I am more advocating the adoption of a workflow description language as a standard for reporting image processing/analysis workflows. My point was just these things already exist so no need to re-invent them.

3 Likes

Wouldn’t that hinder optimal application of GPU-accelerated image processing?

1 Like

As usual, it depends on the task and the workflow. A workflow management system will manage job dependencies and scheduling as well as resource provisioning for each task. If everything is in one container, you don’t get these benefits. On the other hand, a very linear workflow is not much different from a single job so could be packaged as one although here also resource provisioning could benefit from more granularity.
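
To make the point about job dependencies and scheduling concrete, here is a toy Python sketch. It is not how Galaxy or any particular workflow management system is implemented; it only uses the standard-library `graphlib` module (Python 3.9+) to show how declared dependencies determine which steps can run together. The step names are hypothetical, loosely modelled on the nuclei/nucleoli example mentioned earlier in this thread.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (hypothetical step names).
dependencies = {
    "segment_nuclei": set(),
    "nuclei_features": {"segment_nuclei"},
    "segment_nucleoli": {"segment_nuclei"},
    "nucleoli_features": {"segment_nucleoli"},
    "plot": {"nuclei_features", "nucleoli_features"},
}


def schedule(deps):
    """Group steps into batches; steps within one batch do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose inputs are complete
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished so later steps unlock
    return batches


batches = schedule(dependencies)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {', '.join(batch)}")
```

Here `nuclei_features` and `segment_nucleoli` land in the same batch, so a scheduler could run them in parallel and give each its own resources (for example, a GPU for one step only). That granularity is lost when the whole workflow is packaged as a single job.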

2 Likes

Hi all, just to get some “educated” feedback: for our “industry” software platform ZEN core, we already offer a dedicated module called GxP, which basically ensures that there is a record of “everything” you did that cannot be modified afterwards; this is required to sell such software in many different industries.

Such tools obviously cannot prevent the user from doing “silly things”, but at least everything is transparent and traceable, which is the first step towards what we all want and need: reproducible image processing pipelines.

Do you think such a module would also be beneficial for software like ZEN blue, which is mainly used in research and imaging facilities?

Happy to get your feedback!

4 Likes

> a discussion or recommendation about choosing license would be nice

It seems there’s a consensus around using the CC-BY license for research data. OpenAIRE has some pointers on the matter.
EUDAT also has a license selector tool that you can run in your browser.

3 Likes

I think there should be at least several published research examples with bioimage analysis (I mean, we do not know how well Galaxy can perform with the current spec). In addition, here is an idea (I don’t know if it already exists): it would be great if there were a Fiji script editor inside Galaxy, so that users only need to define the input, the output, and the macro or script to run.

I will take a look to gauge how much extra work is required using this scheme.

If the workflow is written in a different workflow, it’s likely that those authors will not reassemble the same steps in a different framework… so I think it’s important to enable packaging the whole. This is, partially, because what we emphasized in the paper is the “codes as reproducible documentation” of methods.

Yes, thank you.

1 Like

Hi @sebi06

Exactly, and this had better be in a text format rather than binary, as explained in the article: “code as documentation”. We take it as a form of narrative of methods.
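
As a minimal sketch of what a text-format record of processing steps could look like when later changes must be detectable, the Python example below keeps one JSON line per step and chains the lines with SHA-256 digests. This is an assumption-laden illustration, not a description of how the GxP module or any vendor software works; the function names and the example actions are hypothetical.

```python
import hashlib
import json


def _digest(action, parameters, previous):
    """Hash one step together with the digest of the step before it."""
    payload = json.dumps(
        {"action": action, "parameters": parameters, "previous": previous},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(log_lines, action, parameters):
    """Append one processing step as a human-readable JSON line."""
    previous = json.loads(log_lines[-1])["digest"] if log_lines else ""
    record = {
        "action": action,
        "parameters": parameters,
        "previous": previous,
        "digest": _digest(action, parameters, previous),
    }
    log_lines.append(json.dumps(record, sort_keys=True))


def verify(log_lines):
    """Return True if every line still matches its digest and its predecessor."""
    previous = ""
    for line in log_lines:
        record = json.loads(line)
        expected = _digest(record["action"], record["parameters"], previous)
        if record["previous"] != previous or record["digest"] != expected:
            return False
        previous = record["digest"]
    return True


log = []
append_record(log, "gaussian_blur", {"sigma": 2.0})  # hypothetical steps
append_record(log, "threshold", {"method": "Otsu"})
print(verify(log))  # True

tampered = list(log)
tampered[0] = tampered[0].replace("2.0", "3.0")  # edit a parameter after the fact
print(verify(tampered))  # False
```

Note that this only detects edits; someone who recomputes every digest could still rewrite the log, so a real audit trail would additionally need signatures or write-once storage. The log itself stays plain text, so it can be read as a narrative of the methods.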