CPA on multiple classes and interpreting saved training sets

Hello,

We are working on segmenting nuclei and tissue regions from histology slides. We would like to use CPA to create manually labeled training sets on both of these (so cells could be subtype 1, 2, 3, etc., and tissue regions could be of subtypes a, b, c, etc.), because CPA’s drag-and-drop-into-bins functionality is really convenient. However, when I create a single object table, CPA seems to arbitrarily choose one object type (either nuclei or tissue) to display, and shows no thumbnails of the other object type. To get around this, I created one table for each object, and manually changed the properties file to read from one or the other object type. This seems to be working fine, but is there a better solution?

In CPA, when I click on save training set, I obtain a file which reads something like

label positive negative
positive 1 83 1192 606
positive 2 108 865 537
positive 2 43 730 230
positive 2 1 119 9
positive 2 164 876 903
negative 2 148 357 767
negative 1 23 622 200

Could you explain what the different columns mean, please? Also, since my objects come from two different types (tissue and nuclei), will there be an overlap in object ID? I would like to extract out the relevant measures for each object in the training set, so that we can construct our own classifiers for them.

Thank you very much!

Pang Wei

Hello,

The issue you describe in the beginning of your message sounds like it’s due to non-unique object-ids in your database. Since it sounds like the objects are mutually exclusive, you’ll have to keep doing what you are doing and keep them in separate per-object tables. Out of curiosity, how did you produce the table in the first place? I’m not sure I know how CellProfiler would allow you to produce a single table with multiple object types.

Typically, before people take their data into CPA, they use ExportToDatabase to export a flat per-object table, which compresses all objects (e.g. Nuclei, Cellbody, speckles) into a single row with the metrics in each column (e.g. nuclei_intensity, cellbody_intensity, speckles_intensity).

As for the training set, sorry for the confusion… the “label positive negative” line just lets Classifier know that you have 2 classes called “positive” and “negative” (in case one of the classes is empty). Each remaining row describes one labeled object, with the fields in this order: class, ImageNumber, ObjectNumber, X_coord, and Y_coord.
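If it helps, here is a minimal Python sketch of reading a file in that layout. It assumes whitespace-separated fields and one object per line, as in the example you pasted; the names `LabeledObject` and `parse_training_set` are made up for illustration and are not part of CPA.

```python
from collections import Counter
from typing import NamedTuple


class LabeledObject(NamedTuple):
    """One row of the training set (illustrative type, not a CPA class)."""
    label: str
    image_number: int
    object_number: int
    x: float
    y: float


def parse_training_set(text):
    """Return (class_names, labeled_objects) parsed from training-set text."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or rows[0][0] != "label":
        raise ValueError("expected a header line starting with 'label'")
    class_names = rows[0][1:]
    objects = []
    for fields in rows[1:]:
        # Field order as described above: class, ImageNumber, ObjectNumber, X, Y.
        label, image_number, object_number, x, y = fields
        objects.append(
            LabeledObject(label, int(image_number), int(object_number),
                          float(x), float(y))
        )
    return class_names, objects


SAMPLE = """\
label positive negative
positive 1 83 1192 606
positive 2 108 865 537
positive 2 43 730 230
positive 2 1 119 9
positive 2 164 876 903
negative 2 148 357 767
negative 1 23 622 200
"""

class_names, objects = parse_training_set(SAMPLE)
counts = Counter(obj.label for obj in objects)
```

The pair (`image_number`, `object_number`) is what you would use to look each object up in the per-object table.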

It’s still not 100% clear to me how your per-object table is structured… if these two object types you’re describing are mutually exclusive (ie: nuclei are NOT part of the tissue), then you’ll have to keep them separate in CPA, and yes, the ObjectNumbers will clash. Otherwise, this may be a good question to pose on the CP forum to see if they can help you configure ExportToDatabase to get the data in the right format.
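To make the clash concrete, here is a rough sketch using an in-memory SQLite database. The table names (`per_nuclei`, `per_tissue`), the column names, and the `Area` values are all hypothetical stand-ins; your real ExportToDatabase tables will have different names and many more measurement columns. The point is only that the same (ImageNumber, ObjectNumber) pair can exist in both tables, so each training set has to be looked up in the table it was made from.

```python
import sqlite3

ALLOWED_TABLES = ("per_nuclei", "per_tissue")  # hypothetical table names

conn = sqlite3.connect(":memory:")
for table in ALLOWED_TABLES:
    conn.execute(
        f"CREATE TABLE {table} ("
        "ImageNumber INTEGER, ObjectNumber INTEGER, Area REAL, "
        "PRIMARY KEY (ImageNumber, ObjectNumber))"
    )

# Made-up measurements; note that object (1, 83) exists in BOTH tables.
conn.executemany("INSERT INTO per_nuclei VALUES (?, ?, ?)",
                 [(1, 83, 52.0), (2, 108, 47.5)])
conn.executemany("INSERT INTO per_tissue VALUES (?, ?, ?)",
                 [(1, 83, 9100.0), (2, 1, 12050.0)])


def fetch_measurements(conn, table, keys):
    """Map each (ImageNumber, ObjectNumber) key to its row, or None if absent."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table}")
    query = (f"SELECT Area FROM {table} "
             "WHERE ImageNumber = ? AND ObjectNumber = ?")
    return {key: conn.execute(query, key).fetchone() for key in keys}


nuclei = fetch_measurements(conn, "per_nuclei", [(1, 83), (2, 1)])
tissue = fetch_measurements(conn, "per_tissue", [(1, 83), (2, 1)])
```

Here `nuclei[(1, 83)]` and `tissue[(1, 83)]` are different objects with different measurements, which is why the object type has to travel with the key when you build your own classifier inputs.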

Cheers,
Adam

Ah, the ExportToDB setting I was referring to is “single object table”… this will merge your objects into the same row… again, I’m not sure if this makes sense or not with your data.

Hi Adam,

Thanks for the reply! My two object types are not mutually exclusive (tissue regions are studded with nuclei). Some nuclei may not be in tissue regions, and some tissue regions may not contain any nuclei; however, for each tissue region, I would like to get the IDs of all the nuclei that are within it. I’m thinking that relateObjects does this?
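To illustrate the lookup I have in mind, here is a small Python sketch. It assumes each nucleus row carries the number of the tissue region that contains it, with 0 meaning "not inside any region"; that parent column and the example rows are hypothetical, and whether relateObjects produces exactly this is part of what I am asking.

```python
from collections import defaultdict

# Hypothetical per-nucleus rows: (ImageNumber, nucleus ObjectNumber, parent tissue number).
# A parent of 0 is assumed to mean the nucleus lies outside every tissue region.
nuclei_rows = [
    (1, 1, 2),
    (1, 2, 2),
    (1, 3, 0),
    (2, 1, 1),
]


def nuclei_by_tissue(rows):
    """Group nucleus numbers by (ImageNumber, tissue ObjectNumber)."""
    grouped = defaultdict(list)
    for image_number, nucleus_number, parent in rows:
        if parent == 0:
            continue  # nucleus is not inside a tissue region
        grouped[(image_number, parent)].append(nucleus_number)
    return dict(grouped)


lookup = nuclei_by_tissue(nuclei_rows)
```

Tissue regions without nuclei simply never appear as keys in `lookup`.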

ExportToDB has an option that says “Create one table per object or a single object table?”, which I set to create one table per object. It would be nice if we could get it in the same table, but I can’t seem to get CPA to work with it. When you put a single object table into CPA, what is the expected functionality of the training set labeller? Is it supposed to pull all the different objects randomly to form the thumbnails?

Thanks,
Pang Wei

Hrm, I think you’d better post on the CP help forum about your case… I’m not sure if relateObjects is the right module, nor am I sure what to expect from a flattened per-object table for objects that aren’t in a hierarchy.

As for your second question about the training set labeller, I’m not sure exactly what you’re asking but let me say this. All objects are treated equally since they all come from the same table and should all have the same measurement columns. There is no way to hierarchically classify objects without first separating the major classes into separate object tables and then classifying those tables on their subclasses.

Basically, if you flatten your objects into a single table and you fetch N random objects from the whole database, you can expect to get thumbnails for nuclei and thumbnails for tissue regions. But Classifier doesn’t know the difference between those two types of things, so you’d have to teach it how to discern tissue from nuclei just the same as you’d teach it to discern types of nuclei. That is, you could have 4 bins (Nuclei_A, Nuclei_B, Tissue_A, and Tissue_B) and train Classifier to distinguish those things from each other, but they must be mutually exclusive.

Looking back, you said “My two object types are not mutually exclusive”… since Classifier can only differentiate mutually exclusive object types, I think you have to keep your objects in two separate tables and classify their subtypes independently.