P(enriched) scoring

Hi there,
Can you please refer me to a better explanation of what p(enriched) scoring is? I find it hard to understand from your CPA user manual (p. 25).
Thank you,

Hi Noga,

Our enrichment score was first developed here: pnas.org/content/106/6/1826.long and Figure 2 there gives some overview. Paraphrasing, it is a likelihood/probability that a sample (e.g. well) is significantly enriched for the positive phenotype. We use it primarily as a method to rank samples, i.e. a “hit-list”.

Here’s the issue with scoring samples, as I understand it: the simplest approach would be to count (# Pos cells) / (# Total cells) for each sample. However, a “counting” problem arises if one sample has only a handful of cells (e.g. 1 out of 5 positive) while another has hundreds or more (100 out of 500). Both have the same ratio (0.2), but the latter is very likely more significant, and the ratio alone would not tell us that. So we have to move to estimating probability distributions, which is where our beta-binomial model comes in.
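To make the counting problem concrete, here is a minimal sketch in Python. It is not the CPA implementation: it assumes a flat Beta(1, 1) prior on each sample's positive rate (the real model estimates its priors differently), and the function name `prob_rate_exceeds` and the baseline rate of 0.1 are hypothetical choices for illustration only. It asks, for each sample, how probable it is that the true positive rate is above the baseline.

```python
from math import comb


def prob_rate_exceeds(n_pos, n_total, baseline):
    """P(true positive rate > baseline) given n_pos positives out of n_total cells.

    Assumes a flat Beta(1, 1) prior, so the posterior is
    Beta(n_pos + 1, n_total - n_pos + 1). For integer parameters the
    Beta tail probability equals a binomial CDF:
        P(rate > t) = P(Binomial(n_total + 1, t) <= n_pos)
    which lets us compute it exactly with the standard library.
    """
    m = n_total + 1
    return sum(
        comb(m, i) * baseline**i * (1 - baseline) ** (m - i)
        for i in range(n_pos + 1)
    )


BASELINE = 0.1  # hypothetical background positive rate, for illustration

small = prob_rate_exceeds(1, 5, BASELINE)      # 1 of 5 cells positive
large = prob_rate_exceeds(100, 500, BASELINE)  # 100 of 500 cells positive

print(f"ratio 1/5     = {1 / 5:.2f}, P(rate > baseline) = {small:.4f}")
print(f"ratio 100/500 = {100 / 500:.2f}, P(rate > baseline) = {large:.4f}")
```

Both samples have the same raw ratio, but the probability for the 500-cell sample comes out much closer to 1 than for the 5-cell sample, which is the kind of difference a ranking score needs to capture.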

I do see that our manual (p. 25 of cellprofiler.org/linked_files/Do … manual.pdf) is possibly confusing (though it is better than it used to be!). Note that the question of scoring samples is still an open problem to my knowledge, so feel free to rank your samples how you like, but this is our most rigorous attempt. Does that help?

