Load different object classes into the Classifier

Hi,

I am currently working with objects in two different channels (green and blue) to segment normal cells (blue channel) and parasites (green channel). To segment them, I run IdentifyPrimAutomatic on each channel independently. The problem is that I would like to load both sets of objects (normal cells identified in the blue channel and parasites identified in the green channel) together into CellProfiler Analyst to distinguish between them, but it seems you can only load one kind of “primary objects” into it. Is there a way to combine different object classes into a single primary object class with CellProfiler? If not, is there a way to tell the ExportToDatabase module to load all the different classes into the database?

I know that trying to classify what is already classified with CellProfiler may seem weird but I’d rather have the classification performed by a learning machine than by manually adjusting the parameters for each channel.

Thank you very much for your help!

Juan

Hi Juan,

You’ve stumbled onto one of the limitations of CPA that we have yet to build in the flexibility to handle gracefully. Still, depending on a few conditions, I may be able to help you out… a question first.

What kind of biological relationship are we looking at between these two object types? ie: Could it be reduced to a parent-child(ren) relationship? Supposing you had parasites within the cells and outside of them, then you could use the relate module to establish this relationship (it will compute a “parent” column for the child objects, and a “parent” and “agg_XXX” columns for the parent objects where agg_XXX is an aggregate (mean, median, etc) measurement of column XXX of the children).

On the other hand, if you want to preserve the individuality of the objects, there is no easy way to do it. Basically you’d 1) make sure your object measurements are the same for both object types, 2) use ExportToDatabase twice to create separate object tables for the two object types, then 3) manually merge those object tables.

-Adam

Hi Adam,

Thanks for your quick reply! I cannot use the parent/child relationship as it only stores the average and standard deviation of the measurements of all the children, and I need the individual ones in order to classify them.

I ran my pipeline on my cells using two IdentifyPrimAutomatic modules, one for each class. CellProfiler stores the last run of the module as the primary objects, so I ran it twice, swapping the place of the modules so that ExportToDatabase would take a different primary object on each run. What I find weird is that when I compare the two per_object.CSV tables, they contain the same number of objects, even though they have different values, so I think that means they store the measurements from both objects. The .SQL file also stores the columns for measurements of the two classes (e.g. Mean_Hepatocytes_Zernike_0_0 FLOAT NOT NULL, and Mean_Parasites_Zernike_0_0 FLOAT NOT NULL,) and their position. So, despite being different objects, all my info is there.

The problem comes when loading them to Analyst. In the properties file you only provide one set of object coordinates (eg. cell_x_loc = Hepatocytes_Location_Center_X, cell_y_loc = Hepatocytes_Location_Center_Y) so the other class is not loaded to the classifier. Is there any way to overcome this?

Thanks,
Juan

Hi Juan,

I’m tempted to just answer your question and send you on a quest to get your data into a form that you can bring into Classifier, but I’m worried that this could be a waste of your efforts since I don’t completely understand what it is you are trying to do. I’ve spoken with a couple of our image assay developers (IADs) about your query and they seem to think that you could tweak your CP pipeline to do what you need. (e.g. one suggested over-segmenting your images and using area-based scoring)

2 questions

  1. What exactly are you trying to do?
    Do you ultimately want to count these 2 types of objects?
    To classify each type into sub-types?
    It sounds like you want to classify each object in general as either Hepatocyte or Parasite… but this is confusing since you said that they are present in separate channels.

  2. Could you send me an image set (1 image of each channel), so I can run it by our IADs and see what they think you should do?

Hi Adam,

Sorry, I should have explained the problem earlier. I have images of liver cells: the blue stain marks hepatocytes while the green stain marks parasites, which I also want to classify as hypnozoites (small bright dots) or merozoites (big bright green cells). I segment each of the three kinds of cells using preprocessing and IdentifyPrimAutomatic, so I have three different sets of cells. What I want to know is the number of cells of each class in my images.

I am not a biologist; I currently work in a machine learning group that is partly focused on the use of classifiers. We want to classify a really large set of images daily, so using your classifier is really important for us. Relying on manual adjustment of the parameters alone would not do the trick. That’s why, despite segmenting the objects with CellProfiler (with a broad definition of the parameters for every object type, so that we are sure every object is captured even though we get a lot of ‘junk’ along with them), we will need further classification with Analyst 2.0 to set better classification rules than the ones I can adjust manually and to count the number of objects in the 3 different classes. I hope this explains my problem.

I attach 3 images: one for the blue channel (I just need to segment the round cells; the really bright objects and the big ones are not of interest to me); one for the green channel (where there is one big parasite and three small parasites or dots; the rest is junk); and a color image so you can see how the whole thing looks. There is little to no relation between the information in one channel and the other, only a bit for big parasites.

Hope this helps you understand better my problem. Thank you very very much for your help!

Juan






Hi Juan,

When CellProfiler finds more than one primary object in your pipeline, it will export them all, with the number of rows (objects) for each image being equal to the largest object count among them. The object types with fewer objects are padded with zeros. So when you say that the number of objects is the same both ways you run the pipeline, that’s because CellProfiler is padding with zeros, so there really aren’t the same number of objects, and you’ll see that if you look at your data.

For example, in this screen we identified the well edge in every image, so there is just one “well object” but many cells:

+-------------+--------------+------------------------+------------------------------------+
| ImageNumber | ObjectNumber | Well_Location_Center_X | DsRedCellsIDPrim_Location_Center_X |
+-------------+--------------+------------------------+------------------------------------+
|        1776 |            1 |                209.441 |                               40.6 |
|        1776 |            2 |                      0 |                            140.615 |
|        1776 |            3 |                      0 |                            177.333 |
|        1776 |            4 |                      0 |                            205.179 |
|        1776 |            5 |                      0 |                            264.977 |
+-------------+--------------+------------------------+------------------------------------+
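
To make the padding concrete, here is a minimal Python sketch (not CellProfiler or CPA code) that counts the real objects of each type in the rows above. Treating a location of exactly 0 as padding is an assumption; it would miscount an object whose center really sits at x = 0.

```python
# Rows copied from the example table above:
# (ImageNumber, ObjectNumber, Well_Location_Center_X, DsRedCellsIDPrim_Location_Center_X)
rows = [
    (1776, 1, 209.441, 40.6),
    (1776, 2, 0, 140.615),
    (1776, 3, 0, 177.333),
    (1776, 4, 0, 205.179),
    (1776, 5, 0, 264.977),
]

def count_real_objects(rows, column_index):
    """Count rows whose value in the given column is not the zero padding."""
    return sum(1 for row in rows if row[column_index] != 0)

well_count = count_real_objects(rows, 2)  # wells: only the first row is real
cell_count = count_real_objects(rows, 3)  # cells: every row is real
print(well_count, cell_count)
```

Both object types occupy five rows in the table, but only one of the well rows carries a real measurement.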

To do what you suggest, I think you are going to have to trick CPA into thinking all your objects are the same primary object, since right now CPA only handles one primary object (it handles secondary and tertiary objects well, but your objects do not colocalize so I understand why you cannot use Relate or IdSecondary).

You could run the analysis twice, identifying only the parasites the first time, and only the hepatocytes the second, making sure to call them both ‘Cells’ (or whatever you would like to call them) in each case, and making the same measurements both times, and simply concatenate the tables. In your properties file you would then specify Cells_Location_Center_X & Y for the cell location. If you want to keep the ImageNumbers the same for both the hepatocytes and the parasites (which I assume you would, so you could count up the total in each image) you’ll need to add an additional ‘TableNumber’ column when you merge the two tables, because each image must be uniquely specified. In this case you simply add the table_id = TableNumber definition to your properties file.

Running the analysis on both cell types simultaneously would involve a great deal more database manipulation so I would suggest the above method. Since you’re running them separately, you have to make all the same measurements- so you’ll have to measure blue intensity in the parasite for example, even though you don’t expect there to be any, because that will be an important feature to measure for the hepatocytes and vice versa.

-Kate

Thanks for the help, Kate!

Juan, does Kate’s suggestion make sense to you? We wrote a wizard in Python that can be used to merge tables if they have the same columns. With the help of this wizard (or with some basic SQL if you prefer) Kate’s suggestion would probably be the most simple and consistent way of doing this.

Also, I forgot to ask, what version of CPA/Classifier are you using? Also, are you using MySQL or SQLite?

Let me know if you need any more help,
Adam

Hi Dan and Kate,

Thanks for all the help. Right now I’m using CPAnalyst 1.0 since I’m running it on a linux server and the IT guys haven’t installed the SVN binaries for me yet, but will switch to v2.0 as soon as that happens. I am using MySQL for the database.

The solution you provided looks good. I already combined the data into one object table using MySQL (it took me a while since I’m quite new to MySQL, but it was a good training experience). What I did was create a copy table for the first Per_Object, load the data from the second identification run into the copy, then define a TableNumber column with a different default value for each object table, change the primary key to include TableNumber, and finally merge both tables into a new copy table. I think this worked.

Now I’m quite confused about what I should do with the V1 properties file. The changes I think should be made are:

Insert table_id = TableNumber somewhere in the file.

Change classifier_per_object_ignore_substr = ImageNumber, ObjectNumber
to classifier_per_object_ignore_substr = ImageNumber, ObjectNumber, TableNumber

Should I update this? objectCount = Image_Count_Hepatocytes

Thank you very much for your help. I’ve never seen such efficient online support as yours!

Nice! Sounds like things are starting to move along.

Just so you know, we recently released CPA2.0 on the website, so you don’t need to install the developers version unless of course you are interested in tracking the bleeding edge. I strongly suggest using the new version since the Classifier has many improvements over CPA1.0.

As for the merging, it sounds like you’ve done the job right. You should be able to do something like:

CREATE TABLE IF NOT EXISTS tableC LIKE tableA;
ALTER TABLE tableC ADD COLUMN TableNumber INT;
ALTER TABLE tableC DROP PRIMARY KEY;
-- For an object table the key must also include ObjectNumber;
-- for an image table use (TableNumber, ImageNumber).
ALTER TABLE tableC ADD PRIMARY KEY (TableNumber, ImageNumber, ObjectNumber);
INSERT INTO tableC SELECT *, 0 FROM tableA;
INSERT INTO tableC SELECT *, 1 FROM tableB;
ALTER TABLE tableC MODIFY COLUMN TableNumber INT FIRST;

Note that it’s critical here that your tables A and B have identical columns in the same order, otherwise the insert statements could mix up your data.
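
One way to check this before merging is to compare the column lists of the two tables. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database so that it runs anywhere, and the table names (tableA, tableB, tableX) and the single measurement column are placeholders. On MySQL the column list would come from something like SHOW COLUMNS FROM tableA instead of the SQLite PRAGMA used here.

```python
import sqlite3

def column_names(conn, table):
    """Return the column names of a table in their declared order (SQLite)."""
    # PRAGMA table_info yields one row per column; index 1 holds the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def same_columns(conn, table_a, table_b):
    """True only if both tables have the same columns in the same order."""
    return column_names(conn, table_a) == column_names(conn, table_b)

# Placeholder tables standing in for the two per-object tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (ImageNumber INT, ObjectNumber INT, Cells_Location_Center_X FLOAT)")
conn.execute("CREATE TABLE tableB (ImageNumber INT, ObjectNumber INT, Cells_Location_Center_X FLOAT)")
# Same columns as tableA but in a different order: SELECT * would mix up the data.
conn.execute("CREATE TABLE tableX (ObjectNumber INT, ImageNumber INT, Cells_Location_Center_X FLOAT)")

ok = same_columns(conn, "tableA", "tableB")
swapped = same_columns(conn, "tableA", "tableX")
print(ok, swapped)
```

If the check fails, listing the columns explicitly in the INSERT ... SELECT statements (rather than using SELECT *) avoids relying on column order.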

You are right about the props file changes. As for the object count, this field is only necessary in CPA1.0. I’m not exactly sure where this is used since objects should be counted from linking the object table to the image table in classifier. Let me know if you want me to look into it… and I’ll pull out the somewhat-dusty code.

Glad to be of help,
Adam

Yes, things are moving!

I need to use the developers version since I am working on a linux (Fedora) server and I see no release version of CPAnalyst 2.0 for linux in your downloads section. Will you release it in the future? I will move to CPAnalyst Developer’s as soon as I can but this problem needs to be solved in that version as well so…

I checked whether the columns of the two analyses were the same, and they are. Objects from the first analysis were assigned TableNumber=0 and objects from the second analysis got TableNumber=1. I then merged the two tables and moved the TableNumber column to the first position (thx for the tip, didn’t know I had to do that). I also checked the columns in this new table and things are still OK. I also checked some of the data from both analyses in this new table and things look OK: no empty rows or duplicates. So, everything’s OK with the MySQL part of the job.

The problem comes with the properties file. I don’t know how to tell CPAnalyst to take TableNumber as part of the ID of my objects so that it does not find any duplicates. This is what I get when I try to grab cells (numbers > 103 are part of the second analysis):

Grabbing 24 random cells from image number 1 in table Per_Object3
Encountered 1 duplicate objects.
Encountered 2 duplicate objects.
Encountered 3 duplicate objects.
Select DISTINCT Per_Object3.Hepatocytes_Location_Center_X,Per_Object3.Hepatocytes_Location_Center_Y FROM Per_Object3 WHERE (ImageNumber=1 and ObjectNumber=40)
Select DISTINCT Per_Object3.Hepatocytes_Location_Center_X,Per_Object3.Hepatocytes_Location_Center_Y FROM Per_Object3 WHERE (ImageNumber=1 and ObjectNumber=94)
Select DISTINCT Per_Object3.Hepatocytes_Location_Center_X,Per_Object3.Hepatocytes_Location_Center_Y FROM Per_Object3 WHERE (ImageNumber=1 and ObjectNumber=80)

************* More grabbing logs**************

Select DISTINCT Per_Object3.Hepatocytes_Location_Center_X,Per_Object3.Hepatocytes_Location_Center_Y FROM Per_Object3 WHERE (ImageNumber=1 and ObjectNumber=104)
java.sql.SQLException: Illegal operation on empty result set.
at com.mysql.jdbc.ResultSet.checkRowPos(ResultSet.java:659)
at com.mysql.jdbc.ResultSet.getInt(ResultSet.java:2241)
at jCPAnalyst.ShowInfo.MetaInfo.findXYCellLocations(MetaInfo.java:96)

***************** More Java stuff******************

I enclose the full error log along with my properties file. In the properties file I changed the Per_Object table to Per_Object3 (the merged one), added table_id = TableNumber, and added TableNumber to the ignore list of the classifier. Is there any way I can tell the Classifier to take TableNumber as an identifier for every object, as ImageNumber and ObjectNumber already are?

Thanks for all!
Juan
propsANDlog.zip (1.66 KB)

Okay, I have a bit of bad news. Kate and I are so used to working with 2.0 properties files that we forgot that the old props files (read by CPA1.0) don’t support a table_id column. I looked at your props file, and it looks good except for that one part, but this means that the way you are merging tables won’t work. In fact, looking at it a bit closer, I realize that this won’t quite work as-is for CPA2.0 either because you need to merge the Image table with itself.

Here’s why:

The table_id and image_id settings in CPA2.0 refer to columns that should be found in BOTH the image and object tables. The PAIR of these columns is what lets us link one table to the other to determine which image an object belongs to, and which objects belong to which image. It appears that you are only merging the object tables, which means that Per_Object3 will have the columns TableNumber & ImageNumber while Per_Image will only have the ImageNumber column. This means you need to perform the same merge step on the Image table even though it would mean producing completely redundant data.
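
To illustrate the linking, here is a small Python sketch using the built-in sqlite3 module. It is not how CPA itself queries the database; the tables are stripped down to their key columns and the row values are made up. It only shows that once both tables carry TableNumber and ImageNumber, objects can be matched to images on that pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Per_Image  (TableNumber INT, ImageNumber INT,
                         PRIMARY KEY (TableNumber, ImageNumber));
CREATE TABLE Per_Object (TableNumber INT, ImageNumber INT, ObjectNumber INT,
                         PRIMARY KEY (TableNumber, ImageNumber, ObjectNumber));
-- The same image appears once per source table (redundant but required).
INSERT INTO Per_Image VALUES (0, 1), (1, 1);
-- Made-up data: run 0 found 3 objects in image 1, run 1 found 2.
INSERT INTO Per_Object VALUES (0, 1, 1), (0, 1, 2), (0, 1, 3), (1, 1, 1), (1, 1, 2);
""")

# Join on the PAIR of key columns to count objects per image key.
counts = conn.execute("""
    SELECT i.TableNumber, i.ImageNumber, COUNT(o.ObjectNumber)
    FROM Per_Image AS i
    LEFT JOIN Per_Object AS o
      ON o.TableNumber = i.TableNumber AND o.ImageNumber = i.ImageNumber
    GROUP BY i.TableNumber, i.ImageNumber
    ORDER BY i.TableNumber, i.ImageNumber
""").fetchall()
print(counts)
```

The result has one row per (TableNumber, ImageNumber) pair, so the same physical image is reported twice, once per analysis run.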

So, you could do this yourself, or in the CPA2.0 SVN repo, there is a file called CreateTableMasterWizard.py that does all of this merging for you to produce valid tables for CPA2.0… The catch, in either case, is that if you are reporting enrichments from classifier on a per-image basis, they will be confused by the fact that each image is represented twice, each time by a different image key. If you aggregate on a per-well basis, then it should solve this problem as far as object counts and enrichments go, but it’s less than ideal.

Just to clarify, the table_id is meant to be used when merging multiple tables that represent separate IMAGE sets, not separate objects from the same images. You may need to use CreateTableMasterWizard.py for that in the future if you continue to get more images that you need to run through your pipelines, but not now.

I’ll leave it to you for what to do from this point. One option, if you can’t wait for CPA2.0 to get set up is to merge the object tables by renumbering the ObjectNumbers.
i.e.

  1. duplicate per_object1 to master_per_object
  2. iterate through per_object2 and add each object row from this table into master_per_object with all of the same values EXCEPT the ObjectNumber value, which must be incremented by the highest ObjectNumber that per_object1 contributed for the current ImageNumber (so the offset stays fixed while you insert).
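
The two steps above could be sketched roughly as follows. This is a Python illustration using the built-in sqlite3 module and an in-memory database, not a tested CPA tool: the single measurement column Cells_Location_Center_X and the sample rows are placeholders, and a real per-object table would need every measurement column listed in the INSERT. The offset is taken from per_object1 rather than from the growing master table, so it does not change while rows are being added.

```python
import sqlite3

def merge_with_renumbering(conn):
    """Copy per_object1, then append per_object2 with shifted ObjectNumbers."""
    # Step 1: duplicate per_object1 to master_per_object.
    conn.execute("CREATE TABLE master_per_object AS SELECT * FROM per_object1")
    # Step 2: append per_object2, offsetting ObjectNumber by the highest
    # ObjectNumber that per_object1 has for the same ImageNumber
    # (0 if per_object1 has no objects in that image).
    conn.execute("""
        INSERT INTO master_per_object (ImageNumber, ObjectNumber, Cells_Location_Center_X)
        SELECT p2.ImageNumber,
               p2.ObjectNumber + COALESCE(
                   (SELECT MAX(p1.ObjectNumber) FROM per_object1 AS p1
                    WHERE p1.ImageNumber = p2.ImageNumber), 0),
               p2.Cells_Location_Center_X
        FROM per_object2 AS p2
    """)

# Placeholder data standing in for the two analysis runs.
conn = sqlite3.connect(":memory:")
for name in ("per_object1", "per_object2"):
    conn.execute(f"CREATE TABLE {name} (ImageNumber INT, ObjectNumber INT, Cells_Location_Center_X FLOAT)")
conn.executemany("INSERT INTO per_object1 VALUES (?, ?, ?)",
                 [(1, 1, 10.0), (1, 2, 20.0), (2, 1, 30.0)])
conn.executemany("INSERT INTO per_object2 VALUES (?, ?, ?)",
                 [(1, 1, 40.0), (2, 1, 50.0), (2, 2, 60.0), (3, 1, 70.0)])

merge_with_renumbering(conn)
merged = conn.execute(
    "SELECT ImageNumber, ObjectNumber, Cells_Location_Center_X "
    "FROM master_per_object ORDER BY ImageNumber, ObjectNumber").fetchall()
print(merged)
```

After the merge, every (ImageNumber, ObjectNumber) pair is unique, so the table can be used without a TableNumber column.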

I hope this all makes sense.

Hi Adam,

It works! The way you told me to do it at the bottom totally works. I did it manually this time but I think it won’t be too hard to translate the lines I wrote into a script to perform it automatically, so when I get zounds of images I can merge the tables easily. Since I normally program in C++, I will do it in MySQL++, but in case you want it, there is a guy here who could translate it to Python and give it to you in case somebody wants to do weird processing combinations like mine, lol. I’ll keep you informed.

Anyway, thank you very much for your help. You are really helpful and fast. Keep up the good work!

Cheers,
Juan

That’s great! And thanks for the compliments. I’ll gladly accept any code you share since there’s a good chance we’ll use it at some point.

Happy analysis!
Adam