Seeking help for young scientist, Part II - CPA


Per our recent CP thread ‘Seeking help for young scientist (Part I)’, with Mark’s help we were able to get CP working well enough that we felt CPA would likely be able to reliably classify, count and measure cell coverage (i.e. an indicator of biomass). We installed CPA this weekend using SQLite and ran a training set with four bins: one bin for each algal species and one bin to collect background noise artifacts. The results were spectacular. Almost 100% accuracy on one of the species and on garbage collection, and 90% accuracy on the remaining two species. We were absolutely thrilled. It takes about three seconds to score a data set that would otherwise take 30 min of tedious hemocytometer work!

However, we ran into a snag when we tried to run our first non-training data set. Unless we’re missing something, it seems that CPA is designed primarily to address the scenario where the researcher has a large volume of images that comprise a single experiment, say 25,000 images, and where the procedure is to create a training set specifically for that experiment. Here the return on creating the training set is very high.

However, Mikaela’s scenario is quite the opposite. Mikaela is withdrawing samples, for cell classification, counting and biomass measurements, from a continuously running bioreactor. She will not have 25,000 images in an experiment but rather 1,000 experiments of 25 images (literally). Unfortunately we discovered that we can’t bring training set rules into new experiment runs.

There’s probably a way to merge training sets into experiment data sets using MySQL, but we don’t readily have access to the kind of DB skill we would probably need to install and configure MySQL. SQLite we found very easy to use.

In researching this issue on the forum we read that there might be a bleeding edge version of CPA available that allows for cut and paste of rules from a training set into experiment runs. This sounds like by far the easiest solution. There was a note that this approach currently does not deal with differences in classes between the training set and experiment runs, but in our case the bins will never change, so this would not be an issue.

Is there any chance we could try the version of CPA that has the rules cut and paste capability? Any other thoughts on how to approach this issue?

Finally, thanks for making this amazing application available. Besides the enormous time saving potential - it is quite something to watch this machine learning application in action.

Chuck Preston
b/o Mikaela.

Hi again,

We don’t have a bleeding edge version available as we do for CP, but we can build one for you. What OS are you using?


Hi Mark,

In reviewing my daughter’s recent post to this forum I see that I missed your response to our previous enquiry about a ‘bleeding edge version’. Sorry about that, and thanks for the offer to compile something for Mikaela. As it turns out we took the plunge and installed a MySQL DB at home. While that was a significant undertaking in itself, it turned out to be a really good move, offering many advantages over the SQLite approach.

As noted in Mikaela’s recent post, during the experimental phase of her project she has been classifying and counting up to 30,000 cells per day. To date, using borrowed lab equipment and the CP/CPA software she has collected data on over 1.25 million individual cells. Quite impressive I think for a basement lab setup. (btw - Mikaela will be including the Broad Institute in her project acknowledgements :wink: )

thx again for your assistance.