How to assess cell agglomeration?

I’m looking for a set of features that can help quantify the agglomeration of cells in microscopy images. I’m using a scientific Python pipeline (numpy, scipy, opencv, skimage, etc.) for image analysis.
I assume that this aggregation-like feature should be based on:

  • spatial distribution
  • distance/“closeness” among cells
  • size of cells

The uploaded image shows several microscope slide images, each with an associated qualitative measure of agglomeration (based on personal intuition).
I’m not a biologist and have no background in biology; I’m an undergraduate computer science student writing my thesis and looking for literature or any other kind of suggestion. For now, I’ve found this paper to be a good starting point.

Neat! Welcome!

In CellProfiler you can find snippets of code in the MeasureObjectNeighbors module that capture many elements of cell organization like the ones you’re looking for. Take a look at its code and help sections, and let us know if you have any detailed questions!
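
As a rough illustration of the kind of neighbor statistics such a module works with, here is a minimal sketch (not CellProfiler's code) that computes, from a binary cell mask, each object's centroid distance to its nearest neighbor and the number of other objects whose centroids lie within a given radius. The function name `neighbor_stats` and the `radius` parameter (in pixels) are hypothetical, and it assumes touching cells have already been split into separate connected components.

```python
import numpy as np
from scipy.spatial import cKDTree
from skimage.measure import label, regionprops


def neighbor_stats(mask, radius):
    """Nearest-neighbor distance and neighbor count for each object in a binary mask.

    Hypothetical helper, not CellProfiler's implementation. Distances are between
    object centroids, in pixels; `radius` is the neighborhood radius in pixels.
    """
    labels = label(mask)  # each connected component becomes one object
    centroids = np.array([r.centroid for r in regionprops(labels)])
    if len(centroids) < 2:
        return np.array([]), np.array([], dtype=int)
    tree = cKDTree(centroids)
    # k=2: the first hit is the object itself (distance 0), the second its nearest neighbor
    dists, _ = tree.query(centroids, k=2)
    nearest = dists[:, 1]
    # query_ball_point includes the object itself, so subtract 1 from each count
    hits = tree.query_ball_point(centroids, r=radius)
    counts = np.array([len(ix) - 1 for ix in hits])
    return nearest, counts
```

Per-image summaries, such as the median nearest-neighbor distance or the fraction of cells with at least one neighbor within `radius`, could then be compared across images.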


I agree that some sort of distance transform would be useful if you can segment the objects… though it looks like not all of those pictures show the same things? At least the staining clearly varies. Segmentation might be difficult, especially for the Very High image. That one looks messy.
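
A minimal sketch of one way to use a distance transform for spacing, assuming a binary mask (`True` = cell) is already available: the Euclidean distance transform of the background gives every background pixel its distance to the nearest cell, and the mean or median of those values summarizes how much empty space surrounds the cells. The function name is hypothetical, and this measures gaps between cell boundaries rather than between centroids.

```python
import numpy as np
from scipy import ndimage as ndi


def background_gap_stats(mask):
    """Mean and median distance (pixels) from background pixels to the nearest cell pixel.

    `mask` is a hypothetical binary segmentation (True = cell). Larger values
    suggest more spread-out cells; smaller values, tighter packing.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any() or mask.all():
        raise ValueError("mask needs both cell and background pixels")
    # distance_transform_edt measures distance to the nearest zero, so invert the
    # mask: each background pixel gets its distance to the nearest cell pixel
    dist = ndi.distance_transform_edt(~mask)
    bg = dist[~mask]
    return float(bg.mean()), float(np.median(bg))
```

This value also depends on the number of cells and on the image size, so images with very different fields of view are not directly comparable.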

For clarity, how would you rate a sample with 10 small cells all touching at the center of the image? Versus three large cells touching at each of the four corners? Or 5 large cells touching in the center?

How important are your three features relative to each other? Is the cell-size metric simply there to prevent 5 small touching cells from getting a higher aggregation score than 5 large touching cells?

Will all of your images cover the same field of view (µm²)?


I am sure there is some kind of global measure that can be a good indicator of the spatial distribution without actually segmenting the cells. Some kind of “entropy” measure.

Maybe even the simple variance of pixel intensity over the whole image could already distinguish those images to some extent.
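
A minimal segmentation-free sketch of both suggestions, using `skimage.measure.shannon_entropy` for the histogram entropy and NumPy for the variance (the function name is hypothetical). One caveat: the histogram entropy ignores where pixels are, so shuffling all pixels leaves it unchanged; if spatial structure matters, a local-entropy map such as `skimage.filters.rank.entropy` or a texture-based entropy is closer to the idea.

```python
import numpy as np
from skimage.measure import shannon_entropy


def global_intensity_features(gray):
    """Segmentation-free global descriptors of a grayscale image.

    Returns the Shannon entropy of the intensity values (in bits) and the
    intensity variance. Both summarize the intensity distribution only,
    not where the cells are.
    """
    gray = np.asarray(gray)
    return {
        "entropy": shannon_entropy(gray, base=2),
        "variance": float(gray.var()),
    }
```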

For comparison, I often use a similar method to evaluate the degree of blur in images, by computing the variance of an edge map such as Sobel. The higher the variance, the more diversity, i.e. the more edges and the sharper the image.
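
A minimal sketch of the edge-map variance, assuming a grayscale image and using `skimage.filters.sobel` for the gradient magnitude (the function name is hypothetical):

```python
import numpy as np
from skimage import filters


def edge_variance(gray):
    """Variance of the Sobel gradient-magnitude map of a grayscale image.

    Higher values mean more and stronger edges (in a blur context: sharper images).
    """
    gray = np.asarray(gray, dtype=float)
    edges = filters.sobel(gray)  # gradient magnitude, same shape as the input
    return float(edges.var())
```

Because the variance scales with image contrast, images should be normalized the same way before their values are compared.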

I would also try the various image features in KNIME, especially the texture-related ones. There is actually even one called entropy :stuck_out_tongue:

Good luck!


Thanks, all, for the enlightening answers! Apologies for the late reply; I took some time to run some experiments before coming back.

Thanks to @Research_Associate: the questions you posed were very helpful in clarifying my thoughts. I realized that the feature I’m trying to measure can be assessed from multiple points of view, and that some visual features are not very relevant to it.

Thanks to @AnneCarpenter for suggesting the CellProfiler module; I drew inspiration from it for my “agglomeration measure”.

Thanks to @LThomas for the entropy measure. Your edge-detection approach also seems promising, but it needs further investigation on my part, which I can’t do right now because of deadlines.

Anyway, the steps to compute the agglomeration feature are:

  • “cell” segmentation (classic watershed pipeline: mean-shift smoothing -> Otsu thresholding -> distance transform -> local maxima as watershed markers -> watershed segmentation -> label filtering by size to remove small objects such as dust and lens smudges)

  • approximate each labeled region with a disk of the same area, centered at the region’s centroid

  • compute the “intersection map”: a binary image whose pixels are set to 1 where two disks overlap and 0 (black) otherwise

  • the agglomeration measure (alpha in the uploaded image) is defined as:
    alpha = (# of intersection pixels) / (# of region pixels) * entropy
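
The steps above could be sketched roughly as follows with scipy and skimage. This is a hedged reconstruction, not the poster's code: a Gaussian blur stands in for mean-shift smoothing (OpenCV's `cv2.pyrMeanShiftFiltering` would be the closer match), cells are assumed brighter than the background (invert brightfield MGG images first), `min_size` and `min_distance` are hypothetical parameters, "region pixels" is read as all pixels belonging to segmented objects, and the entropy weight is passed in rather than computed.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage import filters, measure, segmentation
from skimage.feature import peak_local_max


def segment_cells(gray, min_size=50, min_distance=5):
    """Distance-transform watershed; assumes cells are brighter than the background."""
    # Gaussian blur as a stand-in for mean-shift smoothing (assumption)
    smoothed = filters.gaussian(gray, sigma=1)
    mask = smoothed > filters.threshold_otsu(smoothed)
    dist = ndi.distance_transform_edt(mask)
    # one marker per local maximum of the distance map, searched per connected component
    coords = peak_local_max(dist, min_distance=min_distance,
                            labels=measure.label(mask), exclude_border=False)
    markers = np.zeros(dist.shape, dtype=int)
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    labels = segmentation.watershed(-dist, markers, mask=mask)
    # drop objects smaller than min_size pixels (dust, smudges)
    sizes = np.bincount(labels.ravel())
    too_small = sizes < min_size
    too_small[0] = False  # never touch the background label
    labels[too_small[labels]] = 0
    return labels


def equivalent_disk_overlap(labels):
    """Binary map of pixels covered by at least two equal-area disks at region centroids."""
    rr, cc = np.indices(labels.shape)
    coverage = np.zeros(labels.shape, dtype=int)  # number of disks covering each pixel
    for region in measure.regionprops(labels):
        r0, c0 = region.centroid
        radius = np.sqrt(region.area / np.pi)  # disk with the same area as the region
        coverage += (rr - r0) ** 2 + (cc - c0) ** 2 <= radius ** 2
    return coverage >= 2


def agglomeration_alpha(labels, entropy):
    """(# intersection pixels) / (# pixels in segmented regions) * entropy."""
    region_pixels = np.count_nonzero(labels)
    if region_pixels == 0:
        return 0.0
    intersection_pixels = np.count_nonzero(equivalent_disk_overlap(labels))
    return float(intersection_pixels / region_pixels * entropy)
```

Disks are clipped at the image border here; whether the original implementation did the same is not stated.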

The entropy term (Haralick’s entropy) is used as a weighting factor because I noticed that it gives a good measure of “messiness”. The measure lacks some nice mathematical properties: it is only bounded from below (it is 0 when every cell is separated), has no clear upper bound, and is not normalized.
For now this definition is good enough for me, but further investigation is needed to obtain a more robust measure of “cellular agglomerativity”. Any suggestions to improve the measure, or anything else, are welcome!


Just a quick question: if I understand correctly, does this mean you are (deliberately or not) weighting parallel-oriented non-circular cells much more heavily than circular cells of any kind, or than cells touching end to end (which would have 0 overlap after being converted into circles)? And that perfectly circular cells will almost never contribute, no matter how closely packed they are?
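
The geometry behind this question can be checked with a small calculation. Two equal-area cells produce overlapping equivalent disks only if their centroid distance d is smaller than twice the equivalent radius r = sqrt(A/π). Two touching circles have d = 2r exactly, so they never overlap. For two w × L rectangles (L the long side), side by side gives d = w and end to end gives d = L, so the disks overlap side by side whenever w < 4L/π (always, since w ≤ L), but end to end only when L < 4w/π. The helpers below are a hypothetical sketch of that test.

```python
import math


def equivalent_radius(area):
    """Radius of the disk with the same area as a cell."""
    return math.sqrt(area / math.pi)


def equivalent_disks_overlap(area, centroid_distance):
    """Two equal-area cells' equivalent disks overlap iff d < 2r."""
    return centroid_distance < 2 * equivalent_radius(area)


# Two touching circular cells: centroid distance is exactly 2r, so no overlap
area_circle = math.pi * 5.0 ** 2
print(equivalent_disks_overlap(area_circle, 2 * equivalent_radius(area_circle)))  # False

# Two 4 x 20 rectangular cells (area 80, equivalent radius ~5.05)
print(equivalent_disks_overlap(80, 4))   # side by side, d = 4: True
print(equivalent_disks_overlap(80, 20))  # end to end, d = 20: False
```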

Actually, one follow-on thought: I suspect this method will make segmentation errors where watershed fails to separate two cells much more heavily weighted, due to the size of the circle created. That is not a huge problem in and of itself over many images, unless you have multiple cell types… and one of those cell types tends to be separated less easily by the algorithm. The less well-separated cell type will then contribute more heavily to the measurements.


Let me clarify the working context a bit. The microscope slides contain nasal mucosa cytotypes (essentially: ciliated, muciparous, neutrophils, eosinophils, lymphocytes, mast cells). I have two datasets of microscope slide images: one contains images of slides processed with direct smearing (SM) and MGG staining, the other of slides processed with cytocentrifugation (CYT) and MGG staining. My task is to objectively compare the visual similarity of the two datasets across multiple visual features (spatial distribution, chromaticity, texture, etc.). What’s the purpose? These images are processed automatically by Python code that extracts cells from the images and classifies them, using a CNN, into the cytotypes listed above. So by measuring the spatial distribution, I can assess the performance of the extraction process on the SM and CYT datasets.

From a quick subjective analysis, I noticed that:

  1. The number of cells contained in the CYT images is much lower than in the SM images.

  2. The dispersion in the CYT images is much higher than in the SM images, i.e. the distance between cells is larger.

For (1), I’ve already found and computed a measure (directly on pixels, without any segmentation) that seems pretty robust. For (2), the measure is the agglomeration measure we’re discussing, which from a conceptual point of view is fuzzier and more complex to define.

Now, to answer your question: as you said, I think that’s true; non-circular cells have more impact due to the equivalent-disk approximation. I don’t understand the second part of your answer, though: the segmentation step is executed before the “disk approximation” step, so “I suspect this method will make segmentation errors where watershed fails to separate two cells much more heavily weighted, due to the size of the circle created.” seems wrong to me, or maybe I don’t understand it.

The second part was just an extension of the first. If you have two cells that are not separated, they will generally appear as one larger, less circular object. So when you create the circle for those two (one) non-separated cells, it will have a large impact on your value for that image. There will always be errors in segmentation, but if this happens more often for one cell type, you should expect those cells to contribute more heavily to your total value (and therefore to weight images with those cell types more heavily).
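
A small worked example of this effect, under a hypothetical geometry: two touching circular cells of radius r, plus a third cell of radius r whose center lies at distance D from the pair's midpoint, perpendicular to the pair's axis. Correctly segmented, the neighbor overlaps a disk only if sqrt(D² + r²) < 2r, i.e. D < √3·r. If watershed merges the pair, the merged object has area 2πr², an equivalent radius of √2·r and its centroid at the midpoint, so the neighbor overlaps when D < (1 + √2)·r. Any neighbor with √3·r ≤ D < (1 + √2)·r therefore adds intersection pixels only because of the merge.

```python
import math


def disks_overlap(r1, r2, d):
    """Disks with radii r1, r2 and centers d apart overlap iff d < r1 + r2."""
    return d < r1 + r2


def neighbor_overlap(r, D):
    """Hypothetical scenario: two touching circular cells of radius r, plus a third
    cell of radius r centered at distance D from the pair's midpoint, perpendicular
    to the pair's axis.

    Returns (overlap if the pair is segmented correctly,
             overlap if watershed merges the pair into one object).
    """
    # Correct segmentation: the neighbor is hypot(D, r) away from each cell's center
    correct = disks_overlap(r, r, math.hypot(D, r))
    # Merged object: area 2*pi*r^2 -> equivalent radius sqrt(2)*r, centroid at the midpoint
    merged = disks_overlap(math.sqrt(2) * r, r, D)
    return correct, merged


print(neighbor_overlap(5.0, 8.0))   # (True, True): close neighbor overlaps either way
print(neighbor_overlap(5.0, 10.0))  # (False, True): overlap caused only by the merge
print(neighbor_overlap(5.0, 13.0))  # (False, False)
```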

If that kind of error doesn’t really impact your final answer, then no worries! If your populations are similar enough, it probably won’t. If your two methods give you different proportions of cells though, it might. Something to watch out for.

OK, it’s much clearer now. From the papers I’ve read, it seems that the processing techniques should not change the proportions of cell types on the slides. Unfortunately, the distribution of cell types (regardless of the processing method used) is not uniform; for example, lymphocytes and mast cells are much less likely to be found, so your point could apply. Thanks for pointing out this issue; I’ll mention it in the measure description.
