Methodology question: What is the difference in principle or parameters between different DIA software for tissue segmentation and cell segmentation?

Dear Pete,

Hello,

Thank you and your team for your excellent work on digital image analysis (DIA).
I am working on a methodology comparison of different DIA software. In my opinion, the most important functions of DIA software are tissue segmentation and cell segmentation (including cell detection and phenotyping). Other functions, such as spatial analysis or measuring mean fluorescence intensity, all build on these. Many excellent DIA software packages are based on machine learning, a branch of artificial intelligence (AI), so I would like to know:
1. What is the main difference between DIA software such as HALO, Inform, FCS Express and Fiji in tissue segmentation and cell segmentation? Is it the algorithm, or the number of parameters selected for the analysis?
2. For QuPath, I read your excellent paper “QuPath: Open source software for digital pathology image analysis”. My understanding is that tissue segmentation is based on a watershed strategy and cell detection is based on a random trees classifier. I don’t know if this is right. Also, how many parameters did you select for the analysis?
3. Although different algorithms may give the same results, can we say that convolutional neural networks (CNNs) are the trend for DIA software among all machine learning algorithms, and that they may give more accurate results in a shorter time?

I have been using DIA software to analyse multiplex IF images for just one year, and my theoretical knowledge of machine learning is very limited. My questions may be basic, but I have read a lot of papers without finding the answers. Could you please give me some help? Thank you for your patience.

I look forward to your reply.

Mengfei Wang
Dalian Medical University, China.

Hi,

Would you be willing to convert this question to a public message on the forum, rather than a private message to me only? This would help to receive answers from more people who are familiar with more software.

I don’t think it’s possible to give an easy answer to this. I have not used HALO, Inform or FCS Express myself to make a comparison – but the software applications are all quite different.

There is no tissue segmentation or cell segmentation algorithm in Fiji. But Fiji can be used to develop and apply tissue or cell segmentation algorithms in an endless variety of different ways (e.g. through custom plugins or macros).

This is often the case with image analysis: many software applications (including Fiji and QuPath) provide the user with different tools that they can use in different ways to solve problems such as tissue and cell segmentation.

No, QuPath’s built-in cell detection is a custom algorithm that uses a combination of image processing techniques. This includes a watershed transform, but there are many other steps involved.

Cell classification in QuPath can use a random trees classifier, but it can use other classifier types as well.

Tissue segmentation in QuPath can also be done in different ways, primarily using thresholding and/or machine learning, e.g.

I personally wouldn’t say that myself. Certainly CNNs are increasingly popular, and for many difficult applications using a CNN-based approach can give much higher accuracy than any other current method. However, this crucially depends upon the problem to which they are being applied, and the way in which they are used.

It is typically much slower to get results when creating a new model using CNNs. Depending upon how the model is created, results could be much better using CNNs than using other techniques… or much worse. Evaluation is difficult because a method (including CNNs) might be very accurate on one dataset, but completely fail on data from a different source (i.e. it doesn’t generalize).

In my opinion, some applications don’t justify the use of CNNs because they can be solved effectively by other approaches.

Dear Pete,

Thank you for your reply.

I would be glad to convert this question to a public message on the forum. I tried, but could not find a way to convert it.

I read the link to the QuPath website, and then read the user manuals of several image analysis software packages. I made a list of the differences between them. Would you mind giving me some advice on the items I have chosen for the comparison: have I missed any important items, and are any items not essential? Could you also please check whether the description of QuPath’s characteristics is right? Thank you for your patience. I look forward to your reply.

Mengfei Wang

Thanks @Wang_Mengfei I have converted the message to a public topic now and added keywords.

There are some inaccuracies in the ‘QuPath’ column (the ‘P’ should be a capital letter):

  • File type
    • Bio-Formats supports many more formats than listed.
    • QuPath does not support iSyntax
  • ROI/WSI
    • I’m not sure what you mean by ‘splicing function’
  • Parameters selected to do tissue/cell detection
    • QuPath can use many more / fewer than 12 features
  • Unmixing
    • QuPath can separate up to 3 stains in brightfield with colour deconvolution. For most ‘standard’ IF images, computational unmixing isn’t needed.
  • Spatial analysis
    • May not require scripts – it depends what exactly needs to be done. There are a few options in the Analyze → Spatial analysis menu.
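
As a rough illustration of what colour deconvolution does for a brightfield image, here is a minimal Python sketch for a single RGB pixel. It is not QuPath’s implementation: the stain vectors in `STAINS`, the white background value of 255 and all function names are assumptions chosen for the example. The idea is to convert intensities to optical densities and then solve a 3×3 linear system for the amount of each of (at most) three stains.

```python
import math

# Illustrative stain vectors (relative optical density in R, G, B).
# Assumed example values for haematoxylin, eosin and DAB, not QuPath's defaults.
STAINS = [
    (0.65, 0.70, 0.29),  # haematoxylin
    (0.07, 0.99, 0.11),  # eosin
    (0.27, 0.57, 0.78),  # DAB
]


def normalize(v):
    """Scale a stain vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def det3(m):
    """Determinant of a 3x3 matrix given as nested sequences."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def rgb_to_od(rgb, background=255.0):
    """Optical density per channel: OD = -log10(I / I0). Values are clamped to avoid log(0)."""
    return tuple(-math.log10(max(c, 1.0) / background) for c in rgb)


def od_to_rgb(od, background=255.0):
    """Inverse of rgb_to_od (without clamping or rounding)."""
    return tuple(background * 10 ** (-x) for x in od)


def deconvolve(rgb, stains=STAINS):
    """Estimate the amount of each of three stains in one RGB pixel."""
    s = [normalize(v) for v in stains]
    od = rgb_to_od(rgb)
    # Solve od[k] = sum_j conc[j] * s[j][k] with Cramer's rule.
    a = [[s[j][k] for j in range(3)] for k in range(3)]
    d = det3(a)
    concs = []
    for j in range(3):
        aj = [row[:] for row in a]
        for k in range(3):
            aj[k][j] = od[k]
        concs.append(det3(aj) / d)
    return concs


# Build a synthetic pixel from known stain amounts, then recover them.
unit = [normalize(v) for v in STAINS]
true_amounts = (0.8, 0.3, 0.0)
od = [sum(true_amounts[j] * unit[j][k] for j in range(3)) for k in range(3)]
print(deconvolve(od_to_rgb(od)))
```

Because the system is 3×3, at most three stains can be separated this way, which matches the limit mentioned above.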

I think comparing software is difficult in general – particularly when the software is quite complex, and can be used in many different ways. Can you say what the aim of your paper is? What kind of people will read it, and what should they learn?

Thank you for your reply.

It’s true that the comparison is difficult. On the one hand, the configurations differ between these software packages; on the other hand, the results of digital image analysis (DIA) depend on the user to a great extent.

More and more DIA software based on machine learning is being developed by researchers, and my teacher also asked me whether other software, such as StrataQuest, is more powerful than Inform, which we use now. I also saw the same question on the QuPath forum. To me, this suggests that the people asking do not know the basic and special functions of the different software, nor the principles for choosing software and the analytical thinking behind DIA.

Thank you for your questions, which made me think more deeply. Here are my answers:

  1. The aim of my paper:
  • Establish principles for choosing DIA software for personalized DIA. As you said earlier, the choice mainly depends on the aim of the research, but there is usually more than one option.

  • Identify the similarities and differences between software in basic functions, such as cell segmentation and tissue segmentation.

  • Identify the special functions of different software, such as the two-dimensional flow cytometry image conversion and tSNE in StrataQuest and FCS Express.

  • Find more user-friendly ways to do the same analysis by combining two or three kinds of software, for example phenotyping with Inform, and with Inform combined with FCS Express.

  2. What kind of people will read it:
  • People who have not used DIA software before, and beginners.

  • Experienced users, who can get the same or better results in a more user-friendly way, or get more information from existing data by combining different software.

  3. What they should learn:
  • There is more than one choice of software. The aim of the research is the most important thing: choose based on that aim and on the differences between DIA software. Before the analysis, anticipate the worst results given the features of the chosen software, and consider whether there is any way to improve them.

  • More user-friendly ways to do the same analysis, and applications of new DIA methods.

These are all my thoughts, and I consider myself a beginner in DIA too. I think my starting point is practical, as it is based on the question from my tutor and on this forum. Thank you for your questions; I look forward to your reply.

I also hope more people can give me some advice, which would mean my questions are worth discussing.

Thank you.

Not really on the topic of QuPath, but…

So much this. And, though usually fast once built, they can also be much slower to build in the first place, especially if you need to generate the ground truth annotations.

Reminds me of one term I have seen thrown around describing most current deep learning methods: “Brittle.” Mostly due to that overfitting. GANs help, but nothing has really worked to fix that brittleness.


In the unmixing tab bright field is spelled bright “filed.”

Thank you for your reply. @Research_Associate

“Brittle” is a new concept for me, and I will see whether there is a visual way to show it in my paper.

Sorry for the spelling mistakes; I will be more careful next time.

Somewhat of an old article, but this might help explain: https://www.nature.com/articles/d41586-019-03013-5

An interesting article. I think applying AI in pathology is easier than in other areas, such as face recognition or stop-sign recognition, especially when the application is limited to one or a few diseases in the same kind of tissue. It is as if we chose a fixed rotation angle for the stop signs to be analysed. But when a single algorithm has to recognise all kinds of diseases and tissues, for example stromal regions in different kinds of tumor, it may run into trouble too. It also made me realise that, with the technology we have now, there may be no solution for some mistakes, or no way to improve accuracy, in some personalized analyses.
Thank you for sharing.

I don’t think I agree, although I’m not sure which is more difficult. I think pathology poses many very very difficult problems – arguably much more difficult than most other areas.

The challenge for image analysis / AI in pathology is usually to quantify quite subtle features in huge images. This can require integrating information at multiple scales: for example, if the local context is needed to identify if an area of ‘tumor’ is invasive or not.

Then there is enormous variation in appearance based upon the underlying biology, the sample (including angle at which it was cut, thickness), staining, scanning and image compression. There may be different diseases at different stages. Developing methods capable of finding the necessary (often) subtle features while tolerating the huge differences that aren’t relevant to diagnosis is very hard.

For H&E (as an example), overfitting is a big risk because all the images are basically a kind of pink and purple, and therefore the model may learn that ‘darker means disease’ – which is a rule that could work in the training images, but totally fail elsewhere.
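
To make the ‘darker means disease’ risk concrete, here is a toy Python simulation. Everything in it is an assumption made up for illustration (the intensity values, the noise level, the size of the staining shift and the function names); it is not based on real slides. A single intensity threshold is fitted on one batch and then applied to a batch from the same source and to a batch that is stained darker overall.

```python
import random


def make_dataset(rng, n, stain_shift=0.0):
    """Simulate mean tile intensity (0 = dark, 1 = bright) with a disease label.

    Diseased tiles are darker on average; stain_shift moves the whole batch,
    mimicking a lab whose staining is darker or lighter overall.
    """
    data = []
    for _ in range(n):
        diseased = rng.random() < 0.5
        base = 0.35 if diseased else 0.65
        data.append((base + stain_shift + rng.gauss(0, 0.05), diseased))
    return data


def accuracy(data, threshold):
    """Fraction of tiles for which 'darker than threshold' matches the label."""
    return sum((x < threshold) == label for x, label in data) / len(data)


def fit_threshold(data):
    """Pick the cut-off that maximises accuracy on the given (training) data."""
    return max(sorted(x for x, _ in data), key=lambda t: accuracy(data, t))


rng = random.Random(0)
train = make_dataset(rng, 500)
same_source = make_dataset(rng, 500)
other_source = make_dataset(rng, 500, stain_shift=-0.3)  # darker staining

threshold = fit_threshold(train)
print("training batch:   ", accuracy(train, threshold))
print("same source:      ", accuracy(same_source, threshold))
print("different source: ", accuracy(other_source, threshold))
```

In this toy setup the rule should look excellent on the first two batches and drop to roughly chance level on the darker batch, because nearly every tile there falls below the learned cut-off.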

In addition, it’s generally easier to get photos of crowds, cars, cats or whatever else might be used to train a CNN. Obtaining patient samples is more difficult, with the result that there is less variation to train on. Obtaining ground truth is also much more difficult, often requiring expert knowledge, whereas labelling photographs can be crowd-sourced much more easily. Then, to scale up the result to a general-purpose method, it requires careful thought to reduce biases (e.g. not working with too many images from a single source).

Then there is the consideration that some diseases that may be evident could be very rare – but are important to identify. There are mimics that an inexperienced person or AI would misidentify. And the appearance of the tissue may change in new ways in response to new treatments.

Finally, the importance of getting the right result for a medical application can be so much higher.

All of these things make pathology image analysis (and some other kinds of biological and biomedical image analysis) really hard. They are also the reason why most software in the field needs to be very flexible – and is therefore difficult to compare. The software provides core tools to work with the images, but these tools can be used in a multitude of different ways depending upon the application.

Dear Pete,

After reading your reply, I agree with your opinion. I still have several questions about overfitting. I found a figure that makes it easy to understand the meaning of underfitting, ‘just right’ and overfitting.


1. If a model should include 4 parameters but I used only 2 parameters to classify, and it works well on the training set but badly on another set, is that overfitting or underfitting?
2. Can we decide that a model is overfitting purely from its results on the training set and on another set? For example, good results on the training set but bad results on the validation set would be overfitting, and bad results on both would be underfitting.
3. In image analysis, suppose we train on tumor regions. Does overfitting mean that additional regions are wrongly recognised as tumor on another set? Or does overfitting just mean inaccurate results, which may include both extra wrong tumor regions and missed true tumor regions?
4. These are my tissue segmentation results on pancreatic cancer (H&E) from Inform, showing tumor (red), stroma (green) and other (blue). The model works well on the training set but badly on another set, because of differences in H&E staining. Does this mean overfitting?

  5. For overfitting, given the limitations of DIA software, we can take the following steps to get better results:
  • Improve slide preparation and staining quality, to reduce noise and increase the consistency of the images.

  • Increase the number of images in the training set.

  • Decrease the number of parameters. (I am not sure.)
    I think these are the solutions available to users to avoid overfitting and get better results. Are there any other solutions, for users or for developers of DIA software?
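
To make the rule of thumb in question 2 concrete, it could be written as the following small Python sketch. The function name `diagnose_fit` and the cut-offs `good` and `max_gap` are arbitrary assumptions for illustration, not established standards, and the scores are assumed to be accuracies between 0 and 1.

```python
def diagnose_fit(train_score, validation_score, good=0.9, max_gap=0.1):
    """Rough label from training and validation accuracy.

    `good` and `max_gap` are arbitrary example thresholds (assumptions).
    """
    if train_score < good:
        # The model cannot even describe the data it was trained on.
        return "underfitting"
    if train_score - validation_score > max_gap:
        # Good on training data, clearly worse on held-out data.
        return "overfitting"
    return "reasonable fit"


print(diagnose_fit(0.98, 0.60))
print(diagnose_fit(0.60, 0.58))
print(diagnose_fit(0.95, 0.93))
```

A large gap only shows that the model does not transfer to the other set. If that set comes from a different source (for example, different H&E staining, as in question 4), the gap may reflect the difference between the datasets as much as the number of parameters.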

I look forward to your reply.