I am working on a python script to process output data from a CellProfiler pipeline. The pipeline filters cells into 3 different categories based on rules decided by the FastGentleBoosting algorithm in CPA. The python script will then (among other things) collect the number of cells in each category from the output spreadsheet. I also want to find the Enrichment score for each sample, to use it as a hit marker. I have not found a way to get this score from the CellProfiler pipeline, so I have tried to implement the Enrichment score calculations from CPA (found in github) into my python script. I have included all the necessary functions from all the necessary .py files.
My issue is that when I try to fit the beta-binomial, I don’t get an
alpha value (only ‘nan’). I am quite sure that the reason for the issue is the data in the
counts variable. In
counts = groupedKeysAndCounts[:, -nClasses:]
groupedKeysAndCounts = np.array([list(k) + vals.tolist() for k, vals in dm.SumToGroup(imData, group).items()], dtype=object)
I am not sure how the data in
groupedKeysAndCounts is organized, so I suspect that the beta-binomial fit function requires the data in the
counts variable to be organized in some specific way.
In my python script, I have collected the number of cells in each category for each sample. As an example, I could in one sample have 178 cells in Category 1, 82 in Category 2 and 102 in Category 3. I would then expect that I could organize the
counts variable like this:
counts = [178, 82, 102]
However using this list does not work, as there will be issues with calculating the value for the
s variable in the dirichlet_moment_match function (
m2ok - aok*aok is 0 and
s becomes ‘inf’.
Could you help me with any suggestions on what data should be included in the
counts variable for me to get the Enrichment score of each category? Also, if you think that the issue is coming from somewhere else, please let me know. Any help is appreciated!