So I’m the grade 9 young scientist my dad posted about previously, and I just wanted to say thanks for looking into my question.
It was very helpful, and I’m now counting/classifying up to 30,000 cells a day across 7 bioreactors! It’s really great to have this software freely available.
Also, I was just wondering about a couple of things:
First, what do the rules actually mean and how can they be interpreted? It’s not relevant to my study but I’m just curious.
Second, is there anyone else you are aware of who is using this software to count/classify algae cells to study population dynamics in algal systems? Just wondering if this is a novel application of the software.
You’re very welcome. It sounds like you’re doing some awesome stuff. I’d be happy to answer your question about the rules in Classifier.
First of all, the rules are called “weak-learners” and are part of the gentle boosting algorithm, which you can search for if you want. We print out the rules in the following format:
feature > threshold, a, b
where a and b are each a list of weights, one per class.
You can read this as: IF (feature > threshold) then apply weights from “a”, otherwise apply weights from “b”.
A weak-learner is a rule that tells you a little something about your classes… the idea being that if you take a bunch of weak-learners together then you will get a strong-learner which is one that can distinguish between your classes much better.
For example, a 2-class Classifier might have a rule like this:
The “a” part tells us that the learner is 99.9% sure that nuclei larger than 20 do belong in class 1, and 99.9% sure that nuclei larger than 20 don’t (because it’s negative) belong in class 2.
The “b” part tells us that the learner is 99.9% sure that nuclei smaller than 20 don’t (because it’s negative) belong in class 1, and 99.9% sure that nuclei smaller than 20 do (because it’s positive) belong in class 2.
So you can see, there are 3 parts, (1) the condition (feature > threshold) (2) a: the weights to give the classes if the condition is true and (3) b: the weights to give the classes if the condition is false.
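To make the three parts concrete, here is a minimal Python sketch of evaluating a single weak-learner. The function name `apply_weak_learner` and the weight values (0.999 and -0.999, chosen to match the "99.9% sure" description above) are illustrative assumptions, not code or output from Classifier itself.

```python
def apply_weak_learner(value, threshold, a, b):
    """Return weight list a if the condition (value > threshold) holds, else b."""
    return a if value > threshold else b


# Hypothetical 2-class rule: nuclei size > 20
a = [0.999, -0.999]   # weights used when the condition is true
b = [-0.999, 0.999]   # weights used when the condition is false

big_nucleus = apply_weak_learner(25, 20, a, b)    # condition true, picks a
small_nucleus = apply_weak_learner(12, 20, a, b)  # condition false, picks b

print("size 25:", big_nucleus)
print("size 12:", small_nucleus)
```

Each position in the returned list is the vote for one class: a positive number counts toward that class and a negative number counts against it.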
Once you understand how a single learner can be used to classify an object, understanding how many learners can be combined together is pretty easy. All we do is evaluate all of the rules and collect the vectors (a or b) that result from the evaluations. We then sum all the vectors up and the index position of the maximum component of the vector is taken as the class.
Here’s an example:
Suppose we have the following object data:
object_number, nuclei_area, nuclei_intensity, cell_area, cell_intensity
1, 10, 10, 20, 1
2, 3, 0, 4, 1
3, 6, 1, 7, 8
and the following rules:
nuclei_area > 5, [1, -1], [-1, 1]
nuclei_intensity > 0, [-0.2, 0.2], [0.2, -0.2]
cell_area > 7, [-0.9, 0.9], [0.9, -0.9]
classifying object #1:
nuclei_area: 10 > 5 ==> [1, -1]
nuclei_intensity: 10 > 0 ==> [-0.2, 0.2]
cell_area: 20 > 7 ==> [-0.9, 0.9]
[1, -1] + [-0.2, 0.2] + [-0.9, 0.9] = [-0.1, 0.1]
max([-0.1, 0.1]) = 0.1 ==> class 2 (since 0.1 came from the 2nd component)
So this object falls in class 2 according to our classifier. If you want to try classifying objects 2 and 3, I’d be happy to check it for you.
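If you would rather check your hand calculations with a script, the procedure above can be sketched in a few lines of Python. This is only an illustration of the sum-and-take-the-maximum step using the example data and rules from this post; the names `objects`, `rules`, and `classify` are made up for the sketch and are not part of CPA.

```python
# Example object data from the table above, keyed by object_number.
objects = {
    1: {"nuclei_area": 10, "nuclei_intensity": 10, "cell_area": 20, "cell_intensity": 1},
    2: {"nuclei_area": 3, "nuclei_intensity": 0, "cell_area": 4, "cell_intensity": 1},
    3: {"nuclei_area": 6, "nuclei_intensity": 1, "cell_area": 7, "cell_intensity": 8},
}

# Each rule: (feature, threshold, a, b), with a and b as per-class weights.
rules = [
    ("nuclei_area", 5, [1, -1], [-1, 1]),
    ("nuclei_intensity", 0, [-0.2, 0.2], [0.2, -0.2]),
    ("cell_area", 7, [-0.9, 0.9], [0.9, -0.9]),
]


def classify(obj, rules):
    """Sum the weight lists chosen by every rule; return (class number, totals)."""
    totals = [0.0] * len(rules[0][2])
    for feature, threshold, a, b in rules:
        weights = a if obj[feature] > threshold else b
        totals = [t + w for t, w in zip(totals, weights)]
    # Classes are numbered from 1, so add 1 to the index of the largest total.
    return totals.index(max(totals)) + 1, totals


for number, obj in objects.items():
    cls, totals = classify(obj, rules)
    print(f"object {number}: totals {[round(t, 3) for t in totals]} -> class {cls}")
```

The totals are rounded only for printing, since adding decimals like 0.2 and 0.9 in floating point can leave tiny rounding errors.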
As for your second question, I haven’t heard of anyone using CPA for this sort of thing, so yes, it definitely does sound new (and exciting)!
Thanks so much for getting back to me on how the rules work. Timing is perfect. My project won a gold medal in the regionals this year and I’m off this morning to the national competition. My father helped out explaining the vector math, so it’s really quite understandable how the rules are applied. (I guess generating the rules is a bit more complicated.) Anyway, it’s really going to be great to have a better understanding of the classification mechanics for the judging.
btw - I expect to be using CPA again next year as I am considering changing my research focus from algal bio-fuel to neurology - maybe even with C. elegans. So likely I will have more questions on CPA in the future. Hope you don’t mind.
Glad to help, Mikaela!
First of all, congratulations on your fantastic accomplishment! I think I speak for everyone in our group when I say I am very impressed and humbled by your passion for science. The only thing we like more than helping people to do interesting science with our software is encouraging aspiring scientists to see how cool science can really be.
I’m glad my explanation came at a good time for you. We’d be happy to answer any questions you have in the future. Keep it up!
The Canada Wide Science Fair was held last week, where Mikaela’s project “Population Dynamics in Algal Bioreactors” was entered. As you know, the use of CPA was an important component of her work. Over the course of the project over 1.25 million algal cells were classified and counted! Mikaela is not keen on talking about her accomplishments but I thought you and your team would like to know how things turned out in the competition.
It was actually quite a remarkable week. Amongst quite a collection of awards and scholarships, Mikaela picked up a second Platinum medal. While the event records are a bit murky going back, the organizers think that in the 50-year history of the competition Mikaela may be only the second person to be honored with a Platinum twice!
In addition, the CBC network did an interview with our girls (both our daughters competed, actually, in different age categories with different projects) about their awards and their basement lab. Here’s the link to the television interview.
Thanks again to the team for all your support.
b/o Mikaela Preston
Hello, and thanks for the update, Chuck!
I just watched the video and must say I’m very excited for you and your family. I’ve forwarded your message to the rest of the gang here, as I’m sure they’ll be equally amazed. Keep up the great work!