This thread collects answers to the questions we had during the NEUBIAS Academy@HOME webinar “In defence of the scientific integrity of image data and analysis” on 23rd of April, 2020. If you have more questions, please feel free to post in this thread. Questions are numbered, so do not forget to indicate that number if you want to continue the discussion on a certain topic.
Slides are available here: PDF version of the slide
The video is on YouTube: https://bit.ly/KotaInDefence
I thank the moderators of the webinar: Marion Louveaux (@MarionLouveaux), Rocco D’Antuono (@RoccoDAntuono) & Julien Colombelli. Some of the questions were answered by these great guys, and are in the text below.
Table of Contents
- “Bioimage Analysis”
- Image Processing and Analysis
- Mistakes and Misconducts
- Publishing Bioimage Analysis Methods
- Avoiding Misconducts
Q1: What part of the “image analysis” field is not transferable to “bioimage analysis”?
A: Almost all of the knowledge and techniques of image analysis in the computational sciences are transferable, but they need to be applied in a different context. For both swimming and free-climbing you use your muscles, but each requires a different way of using them and a different learning sequence. One can be very good at swimming, but that does not mean one can also do free-climbing without learning it.
Q2: Can you alter the contrast in the right way? Or you just need to keep the contrast that you obtained from the microscopy? What if you change the contrast in an equal way for all the images?
A: Indeed, the most important aspect is to describe every step that you apply to the raw images in Materials & Methods (which is very often not done, or not done properly). Contrast enhancement becomes critical to justify when comparing images and conditions. Beyond linear adjustments, you can also apply non-linear adjustments (e.g. gamma), as long as the workflow is described in the paper.
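As an illustration of what “describing the workflow” can look like in practice, here is a minimal sketch in Python with NumPy. The pixel values, the function names and the parameter values (`low`, `high`, `gamma`) are all hypothetical; the point is only that the same explicit, recorded parameters are applied to every image being compared.

```python
import numpy as np

def adjust_linear(img, low, high):
    """Linearly map [low, high] to [0, 1]; values outside are clipped."""
    scaled = (img.astype(np.float64) - low) / (high - low)
    return np.clip(scaled, 0.0, 1.0)

def adjust_gamma(img01, gamma):
    """Non-linear (gamma) adjustment of an image already scaled to [0, 1]."""
    return np.power(img01, gamma)

# Hypothetical raw pixel values (e.g. from a 12-bit camera)
raw = np.array([[100, 500], [2000, 4000]], dtype=np.uint16)

# The parameters that would be reported in Materials & Methods (assumed values)
params = {"low": 100, "high": 4000, "gamma": 0.5}

linear = adjust_linear(raw, params["low"], params["high"])
display = adjust_gamma(linear, params["gamma"])
print(params)
print(display)
```

Keeping the parameters in one place (here the `params` dictionary) makes it easy to copy them into the methods section or to submit the script itself with the paper.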
Q3: What is an acceptable Brightness/contrast enhancement for fluorescent images?
A: As I explained, this depends on the goal of the analysis that you are doing. There is no universal answer for this. In my view, any enhancement is acceptable as long as it is described in a reproducible way. It may be criticized during peer review, in which case it should be redone properly and resubmitted as an update.
Q4: If you report the ‘beautification’ you do in the methods or somewhere within the paper, is it still acceptable to do, if it does not change the conclusions or data analysis?
A: In my opinion, any beautified image can be submitted as long as that image processing is explicitly stated in a reproducible manner. It can be in the methods section, supplementary text or in a script that is submitted together with the paper.
Q5: If you threshold different images with the same equation, you will obtain a different threshold value for each image, as opposed to thresholding all the images with the same number. Which would be more appropriate?
A: The answer depends on the goal you want to achieve. The most important point is the validation: are you really segmenting the edge of the structure you want to segment? Compare the results quantitatively against the ground truth. The better the validation result, the stronger the case for adopting that protocol.
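To make the validation step concrete, the sketch below compares a fixed threshold value with a per-image threshold computed by the same equation, and scores both against a known ground truth. It is only an illustration under strong assumptions: the two images are idealized, noise-free and synthetic, Otsu's method stands in for “the same equation”, the fixed value 150 is arbitrary, and intersection-over-union (IoU) is just one possible validation score.

```python
import numpy as np

def otsu_threshold(img, nbins=256):
    """Threshold that maximizes the between-class variance (Otsu's method)."""
    counts, edges = np.histogram(img, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(counts)                      # pixel count at or below each bin
    w1 = np.cumsum(counts[::-1])[::-1]          # pixel count at or above each bin
    m0 = np.cumsum(counts * centers) / np.maximum(w0, 1)
    m1 = (np.cumsum((counts * centers)[::-1]) / np.maximum(w1[::-1], 1))[::-1]
    between = w0[:-1] * w1[1:] * (m0[:-1] - m1[1:]) ** 2
    return centers[np.argmax(between)]

def iou(pred, truth):
    """Intersection over union of two boolean masks."""
    return np.logical_and(pred, truth).sum() / np.logical_or(pred, truth).sum()

# Ground truth: a 4x4 square object in an 8x8 field
truth = np.zeros((8, 8), dtype=bool)
truth[2:6, 2:6] = True

# Same structure, but image_b is uniformly brighter (e.g. more autofluorescence)
image_a = np.where(truth, 200.0, 100.0)
image_b = np.where(truth, 260.0, 160.0)

fixed = 150.0  # arbitrary fixed threshold value
scores = {}
for name, img in [("a", image_a), ("b", image_b)]:
    scores[name] = {
        "fixed": iou(img > fixed, truth),
        "otsu": iou(img > otsu_threshold(img), truth),
    }
print(scores)
```

In this toy case the fixed value fails on the brighter image while the per-image threshold does not, but with real data the outcome can go either way, which is why the comparison against ground truth is the deciding step.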
Q6: I understood why doing particle segmentation using intensity and then using that ROI for measuring intensity is “wrong” - but how do you do it “correctly”?
A: Using a single channel both for the segmentation and for measuring the intensity of the segmented area can easily yield wrong results, as I explained. You need a second channel with another marker: one channel for segmentation and another, independent channel for measuring the quantity of protein.
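A minimal sketch of this two-channel approach, in Python with NumPy. The arrays `marker` and `protein` are made-up pixel values, and the threshold of 50 is an assumption chosen only so that the example runs; in a real workflow the segmentation step would be whatever validated method suits the marker channel.

```python
import numpy as np

# Hypothetical segmentation channel (structural marker)
marker = np.array([[10, 10, 10, 10],
                   [10, 90, 95, 10],
                   [10, 92, 88, 10],
                   [10, 10, 10, 10]], dtype=float)

# Hypothetical measurement channel (protein of interest), acquired independently
protein = np.array([[5, 6, 5, 4],
                    [5, 40, 42, 6],
                    [4, 38, 40, 5],
                    [5, 5, 6, 4]], dtype=float)

# The mask comes only from the marker channel (threshold value is an assumption)
mask = marker > 50

# The intensity is measured only in the other channel
mean_inside = protein[mask].mean()
mean_background = protein[~mask].mean()
print(mean_inside, mean_background)
```

Because the mask never looks at the `protein` values, the measured intensity is not biased by the criterion that defined the region.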
Q7: For whole slide tissue analysis, most of the time two adjacent tissue sections do not have similar autofluorescence, so threshold on one will not work the same way on the second section. How to overcome this problem during analysis?
A: You probably need to preprocess the images before you stitch them… or there may be several other ways to do that with better outcomes. I feel that this question is beyond the scope of my talk. Please post it as an independent question in the forum.
Q8: What is the general guideline when changing the bit-depth of images? Especially when using ImageJ? When is it ok to go from 16 to 8-bit for example and when not? And what one keeps in mind when doing such a conversion?
A: The best guideline is to think carefully before you decide on the procedure, taking into account that the conversion can be done in two ways in ImageJ: with scaling (normalization) of the pixel values, or without scaling (values above 255 all become 255, while values of 255 or below stay the same). Which of the two you take depends on the goal of your workflow. In general, as I said, there is no guideline that you can simply follow. You need to study the principle and think scientifically.
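The difference between the two conversions can be seen on a handful of pixel values. The sketch below is written in Python with NumPy and is not ImageJ's own implementation: the function names are hypothetical, and the scaled version assumes the image minimum and maximum are used as the scaling range and that the image is not constant.

```python
import numpy as np

def to_8bit_scaled(img16, low=None, high=None):
    """Map [low, high] linearly onto [0, 255] (defaults: image min and max)."""
    low = img16.min() if low is None else low
    high = img16.max() if high is None else high
    scaled = (img16.astype(np.float64) - low) / (high - low) * 255.0
    return np.clip(np.round(scaled), 0, 255).astype(np.uint8)

def to_8bit_clipped(img16):
    """No scaling: values above 255 are set to 255, others are unchanged."""
    return np.clip(img16, 0, 255).astype(np.uint8)

# Hypothetical 16-bit pixel values
img16 = np.array([0, 100, 255, 256, 1000, 4000], dtype=np.uint16)

scaled = to_8bit_scaled(img16)
clipped = to_8bit_clipped(img16)
print(scaled)
print(clipped)
```

With scaling, relative differences are kept but absolute values change and nearby values can merge; without scaling, low values are kept exactly but everything above 255 is saturated. Which loss is tolerable depends on what is measured afterwards.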
Q9: I’m still thinking about the slide on thresholding a 3D PSF. What is the problem, as long as you apply the same treatment, to compare images acquired the same way?
A: The PSF is an optical artifact, so its measured volume does not reflect the actual volume of the structure, which is supposed to be smaller than the spatial resolution of the measurement. You can measure the volume, but we know that it is not the volume of a certain biological structure.
Q10: What do you think could be a problem specific to using machine learning/ deep learning in bioimage analysis?
A: One clear point is that it can become heavily biased by human recognition.
Q11: Is it reasonable to characterize level 3 as a mistake?
A: I think this depends on each case, but by definition, if this is a mistake, it is not categorized as level 3. (see Table 1 in https://doi.org/10.15252/embj.201570080)
Q12: But who is finding “such mistakes”? Some guys are playing around with papers to find them?
A: The people primarily responsible for finding such mistakes are the researcher’s boss and/or coauthors, but the responsibility may extend to colleagues who discuss the methods and results with the researcher, and eventually to reviewers.
Q13: What’s your opinion about: being a bioImage Analyst should I be responsible for warranting that the users do not make this kind of mistakes, like writing paper methods for them?
A: You probably do not need to write the method section of a manuscript for a person you taught in a course, nor for the people you advised individually. But for an analysis that you took part in, I think you do have a responsibility to write the method section and become a coauthor.
Q14: What do you think of the issue that the images chosen for publication are often not really representative: the prettiest images are chosen, but they might not necessarily be representative. How do you address this? While you do show the quantification of ALL images you took for a study and its conclusions, how do you choose the one or two figures you will include?
A: I think images shown in figures are there to give inspiration and hints, and are not scientific results in themselves. Only after some quantitative analysis is done can we draw scientific conclusions. In this sense, calling an image “representative” requires careful evaluation by the author that it matches those analysis results. Having said that, it’s also possible just to show any image and freely tell other people about the impression we have of the pattern appearing in it. We need to draw a clear line between the analysis of images as treatment of multidimensional data, and the impression we have of the image.
Q15: It seems that to avoid misconduct you have to show the original data without image analysis in the supplementary and the analyzed image in the main figure. Do you think this could help to be more transparent in the publication? Obviously, if you are guilty of misconduct, you will find a way to manipulate anyway until someone finds your “mistakes”.
A: You could also promote the raw image to the main figure: a side-by-side comparison with the processing result is more obvious and promotes transparency. In general, the trend is now to share the raw data as a best practice and to follow the FAIR principles: make the data Findable, Accessible, Interoperable, Reusable. The ultimate goal of being transparent is to provide the community with the data so that anyone can re-process and re-analyze them, both for transparency AND to potentially enable other scientists to extract new results. Complying with the FAIR principles is soon to become (and in some journals already is) a condition for publication.
Q16: On slide 79- code. I agree with submitting the code, and this ensures things are reproducible, but how does this ensure you did things right if you are not required to submit the dataset on which you applied such code? Isn’t there still room for integrity breaches?
A: In that slide, I also added a step to upload the original dataset for evaluating the image analysis method. This can be ZENODO, or another data repository where we can be relatively sure that the uploaded data will be kept in the future.
Q17: Are there any bioimage repositories, or any kind of bioimage libraries available, so people can add their data and make it more accessible?
A: I recommend ZENODO at the moment, though it’s not specific to image data in life sciences.
Q18: Are people aware of BioStudies (at EMBL-EBI), which archives all data belonging to a study? https://www.ebi.ac.uk/biostudies/
Q to Q: I rather want to ask: does this archive allow storing of huge image data as well?
Q19: I’m new to Bioimage analysis. Where do research groups provide the workflow or MATLAB code used for their bioimage analysis in their publications? I only find generic statements like: “Images were converted to single-tiff images and analyzed with MatLab (The MathWorks).”
A: In general, code can be submitted to public code repositories such as GitHub. To avoid the disappearance of the files, it is better to mint a DOI for the code package (or single file). I showed how to do this on slide No. 79. Other than GitHub, several other code repositories are appearing that are intended more specifically for supplementary materials to publications, but judging from the number of submissions, the activity still seems to be low. Let’s see.
Q20: Many researchers use non-open source image analysis tools that may have ‘proprietary’ (i.e. hidden) algorithms. How do you report those? (this is comparable to a lot of proprietary reagents inside commercial molecular kits). Or how do you advocate for commercial companies to be more transparent about them? I am asking as a member of a histopathology core that performs image analysis also. thanks!
A: I think some software tools that are provided as commercial products tend to hide the source code, so we cannot really verify the details of the computational procedure. This is not really good for evaluating the method. In principle, this way of providing scientific software and adding to its value by hiding the code will gradually fade away over time. For now, with those “hidden” tools, what we can do is ensure the reproducibility of methods by asking for a complete description of the procedure, especially including the version number of the tool that was used. If this is not done and the evaluation is hindered, reviewers should ask for a revision that includes more information. If the reviewing is hindered by the availability of the tool, the reviewer had better ask the publisher to provide that tool for verification.
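For the script-based parts of a workflow, the version information can be collected automatically and saved next to the results. The following is a minimal sketch using only the Python standard library (Python 3.8 or later); the function name `describe_environment` and the package names passed to it are hypothetical, and versions of closed-source desktop tools would still have to be noted by hand.

```python
import json
import platform
from importlib import metadata

def describe_environment(packages):
    """Collect version information to report alongside an analysis."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }

# Package names are placeholders; list the ones your analysis actually uses
report = describe_environment(["numpy", "a-package-that-does-not-exist"])
print(json.dumps(report, indent=2))
```

The resulting JSON can be stored with the analysis output and quoted in the methods section.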
Q21: How can we integrate this policy of documenting everything that we do with the new methods of image analysis using Machine learning/Deep learning?
A: As long as the methods and results can be reproduced by others based on the description given by the authors, we can evaluate them. I think this is the same with machine learning-based methods. If a certain machine learning strategy leaves no room for this, that strategy is not really scientific.
Q22: If the data is manipulated then how do these articles pass the ‘review’ stage?
A: In my opinion, reviewing is not perfect, especially because the number of reviewers is not sufficient these days to cover all technological details.
Q23: Do journals have software to check for image manipulations? They have plagiarism checks for text… And does something like this exist for reviewers, so that abnormalities suggesting image manipulation can be detected BEFORE the paper is published?
A: Each publisher has its own internal policy about checking image manipulations. Some do it very seriously, even having a specific position for that purpose, while the others are very sloppy. There seems to be no correlation with the journal impact factor at the moment. Commercial services are appearing that assist these forensic studies.
Q24: If reviewers are not trained in image analysis - which might be the case, wouldn’t it be an idea that just as journals have graphic artists, they should also have image analysts? I just wonder how to overcome the issue that reviewers might miss things due to a lack of widespread knowledge on the specifics of image analysis.
A: I asked several publishers, but in many cases their budget does not allow them to hire professionals in specific technological domains. I think the number of reviewers should be increased from the current 2-3 people. A reviewing team can include peer reviewers not only from the biological topic but also from technological topics, to cover the expertise required to review that paper. Maybe we are more concerned with “rapid cycle publishing” than with the quality that needs to be achieved.
Q25: Do you see a possibility for the publishers to have support for image analysis that would both help and secure the correctness of the data and analysis.
A: Editors in publishers are pretty busy and editorial offices are not that well-off - we probably should solve this issue as a community, that includes scientists, publishers, and funders.
Q26: Do you think scientific journals should ‘rethink’ their journal format? Maybe trying to emphasize a lot more on how data were acquired and analyzed? sort of having fewer results but of more quality and reproducibility…
A: Yes indeed.
Q27: Reproducibility may also not save the soul from the devil. There is also an issue about reproducibility in Science, not always related to misconduct. So, what could be an additional strategy to track that?
A: In my opinion, which rather is my general view about how people behave, we cannot avoid some people selling their soul to the devil. Some minor fraction of people do that, and for efficient exchange of information and knowledge, I think it should be taken into account as an unavoidable element in scientific communication.
Q28: Shouldn’t we address the issue at the level of image acquisition? Manufacturers need to be more forthcoming and considerate with regard to today’s topic. I should have been more specific: I meant Manufacturers = microscope/camera/other acquisition device manufacturers
A: Yes, it also matters indeed.
Q29: Would you consider writing a set of guidelines for writing methods sections and terms to use for image analysis?
A: I might have forgotten to clearly state that guidelines are not functional in scientific research. One needs to learn bioimage analysis just like measurement methods in other fields of science, e.g. analytical chemistry or physics. If we start writing “The Guide Line for Image Analysis in Life Sciences”, I think it will be like an encyclopedia. Learning bioimage analysis as a scientific method is probably the most efficient way, rather than searching for do’s and don’ts in that encyclopedia (we might then need a “bioimage analysis lawyer”).
Q30: Did they run a deconvolution on a sum intensity projection? How does that work?
A: (This question probably came from the slide where I showed an example of a “detailed” image analysis method description that is not detailed enough.) I do not know, because it’s not written. Please ask the authors of the paper and share their answer with us…
Comment1 from the audience: Developers should as well be concerned with the user interface and usability
A: If we refer to the web development world, backend engineers and frontend designers have different roles. Maybe developers do not necessarily have the role of UI design, but the design can be discussed with bioimage analysts with regard to accessibility and usability.
Comment2 from the audience: With ImageJ you can as well document your plot creations through scripts
A: Yes, indeed, but the plotting function does not produce beautiful plots (I would say they are practically usable for plotting and checking the distribution of numbers, but design-wise not so beautiful), so it is better to use ggplot2 or other libraries in R or Python.