Which approach for image classification with two possible outputs (positive/negative)?

Hello everybody,

I am new to this forum and hope that I am doing this correctly.

In about a month I will be starting my masters thesis where I’ll be working in a biotechnology lab.
One task I was given is to classify microscope images of cells: do these cells contain so-called “inclusion bodies” (white circular shapes inside the cells, which should look somewhat like the picture linked below) or not? If a picture contains inclusion bodies, the image should be classified as “positive”, otherwise as “negative”. For a start, it doesn’t have to quantify the number of positive cells yet.

Inclusion bodies are basically aggregated (not fully folded/unfolded) proteins that can form under bacterial overexpression of genes/stress conditions. Theoretically they can be stained with reporter gene proteins so that under the fluorescence microscope they appear colored (e.g. green). However, we will first try to classify without staining, so that the image will appear more or less colorless.

According to my supervisor, choosing Python as a programming language would be a good start (the main reason is that my colleagues will all be Python users). Besides that, I got familiar with Python recently and would really enjoy using it. However, I have never done such a thing and would have to start from zero.

Getting back to my question(s): what would be the best approach to develop such a classifier? Is Python even a good approach? Do Python modules such as OpenCV do the job or would they only complicate the process? Are machine learning methods suitable for such a task (unfortunately I don’t know the exact amount of data available)? Are there open-source programs that easily accomplish such tasks, or is the programming approach a better option? The classification should be reproducible, since cells do not always look the same, let alone when they are of a different genus/type. However, industry standards do not apply. We just need a high-throughput method to automate classification.

I would appreciate some suggestions, since you people are experienced with image analysis. If you aren’t fully sure it would also be helpful if you could throw out some ideas (e.g. approaches such as “Mexican Hat Algorithm”).

Thank you and kind regards,

Inclusion bodies example: https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41598-018-24070-2/MediaObjects/41598_2018_24070_Fig2_HTML.jpg?as=webp

I suspect a pixel classifier (Weka, Ilastik) would be quicker, but in general anything that highlights bright regions surrounded by dark ones, possibly with a size threshold, would probably work.
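
To make the "light surrounded by dark, with a size threshold" idea concrete, here is a minimal sketch using a Laplacian-of-Gaussian filter (the "Mexican hat" mentioned in the question) from scipy.ndimage. The function name and the values for sigma, the response threshold and the area limits are assumptions for illustration only. They would need tuning on real images, and this is not a validated method.

```python
import numpy as np
from scipy import ndimage as ndi


def has_bright_blobs(image, sigma=2.0, response_threshold=0.05,
                     min_area=5, max_area=200):
    """Return (is_positive, n_blobs) for a 2-D grayscale image.

    All parameter defaults are illustrative guesses, not tuned values.
    """
    img = np.asarray(image, dtype=float)

    # Rescale to [0, 1] so the threshold depends less on bit depth.
    span = img.max() - img.min()
    if span == 0:
        return False, 0  # completely flat image, nothing to detect
    img = (img - img.min()) / span

    # The Laplacian of Gaussian is negative at bright spots on a darker
    # background, so negate it to get a positive response there.
    response = -ndi.gaussian_laplace(img, sigma=sigma)

    # Connected regions of strong response are blob candidates.
    candidates = response > response_threshold
    labels, n_candidates = ndi.label(candidates)
    if n_candidates == 0:
        return False, 0

    # Size threshold: pixel count per labelled region (label 0 is background).
    areas = np.bincount(labels.ravel())[1:]
    kept = int(np.count_nonzero((areas >= min_area) & (areas <= max_area)))
    return kept > 0, kept
```

Calling `has_bright_blobs(img)` on a 2-D NumPy array gives an image-level positive/negative call plus a rough blob count. Note that the min-max rescaling is sensitive to single hot pixels, so a percentile-based rescaling may be a better choice on noisy data.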

One thing that makes me nervous about any such analysis is that all of the fixed variables can be heavily influenced by how in focus the image is. If you have samples like that taken by three different people, depending on your training set or regularization, you might want three different classifiers. I see scientists arguing about how to take phase contrast or other general colorless brightfield images all the time =/ Some focus planes look more like “cells in focus” while others emphasize the borders more strongly.
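
One way to keep an eye on the focus issue is to compute a simple sharpness score per image and flag outliers before classifying. The sketch below uses the variance of the Laplacian, a common focus heuristic. The function name is made up for this example, and which score range counts as "in focus" is something you would have to determine from your own images.

```python
import numpy as np
from scipy import ndimage as ndi


def focus_score(image):
    """Variance of the Laplacian: higher usually means sharper.

    The score depends on intensity scale and image content, so only
    compare images acquired with the same settings.
    """
    img = np.asarray(image, dtype=float)
    return float(ndi.laplace(img).var())
```

Comparing `focus_score` across a batch (for example per operator or per imaging session) can show whether one set of images was taken at a noticeably different focal plane.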

If you want something more robust, I would go for the fluorescence staining that you mentioned. It’s probably doable as is, but definitely spend more time double-checking your images as time passes and other people potentially do the imaging.


Python is indeed a good choice because of its rich set of image processing libraries.
KNIME also has a number of classification algorithms that are easy to try out of the box, and many example workflows are provided.
OpenCV has cascade classifiers… that could probably be used, even though deep learning approaches are now more popular.

If I were you I would start with Ilastik, or YAPIC, which uses Ilastik with some deep learning. Those are pixel classifiers, so the output is a segmentation mask, which is probably what you want if you have a picture with several cells from the 2 categories.
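
If you end up with segmentation masks, turning them into per-cell calls can be fairly short in Python. The sketch below assumes you already have two arrays of the same shape: a hypothetical `cell_labels` image (0 for background, 1..N for individual cells) and a boolean `inclusion_mask` from the pixel classifier. Both names, the function name and the `min_pixels` cutoff are assumptions for illustration, and how you export such arrays depends on the tool you use.

```python
import numpy as np


def positive_cells(cell_labels, inclusion_mask, min_pixels=3):
    """Return the labels of cells containing at least `min_pixels`
    inclusion-body pixels (min_pixels is assumed to be >= 1)."""
    cell_labels = np.asarray(cell_labels)
    inclusion_mask = np.asarray(inclusion_mask, dtype=bool)
    n_cells = int(cell_labels.max())

    # Count inclusion pixels per cell label; index 0 is background.
    counts = np.bincount(cell_labels[inclusion_mask], minlength=n_cells + 1)

    # Skip the background entry and shift indices back to cell labels.
    return np.flatnonzero(counts[1:] >= min_pixels) + 1
```

An image could then be called "positive" if `positive_cells(...)` returns at least one label, and the length of the result gives a first rough count of positive cells.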