Manual Feature Selection


Is there a way to manually (pre-)select the features that the classifier can use for classification? In the CellProfiler pipeline, I am computing a lot of features, but for testing purposes I want to manually limit the variety of features that the CPA classifier can use.

Thanks a lot and a Happy Holiday season,

Hi Stefan,

Not as such, but you can select measurement columns to ignore by using the classifier_ignore_columns field.

For example:

classifier_ignore_columns = WellID, Meta_.*, .*_Position

This will ignore any column named “WellID”, any columns that start with “Meta_”, and any columns that end in “_Position”. So at the very least, you can trim down the list of measurements to consider.
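If you want to preview which columns a set of terms would drop before editing anything, here is a minimal Python sketch. It is not CPA's own code: the function name and the column names are made up for illustration, and it assumes each term is a regular expression that has to match the whole column name, which is consistent with the "start with" / "end in" behaviour described above.

```python
import re

def kept_columns(columns, ignore_patterns):
    """Return the columns that match none of the ignore patterns."""
    compiled = [re.compile(p) for p in ignore_patterns]
    # Assumption: a term must cover the whole column name (re.fullmatch),
    # so "Meta_.*" means "starts with Meta_" and ".*_Position" means
    # "ends in _Position".
    return [c for c in columns if not any(rx.fullmatch(c) for rx in compiled)]

# Terms from the example above.
ignore = ["WellID", "Meta_.*", ".*_Position"]

# Hypothetical per-object column names, for illustration only.
columns = [
    "WellID",
    "Meta_Plate",
    "Nuclei_X_Position",
    "Nuclei_AreaShape_Area",
    "Nuclei_Intensity_MeanIntensity_DNA",
]

remaining = kept_columns(columns, ignore)
print(remaining)
```

With these made-up names, only the two Nuclei measurement columns are left for the classifier.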


To clarify, since we get this question a lot: Mark is referring to changing the properties file, which is the small text file you load when starting up CellProfiler Analyst. Adding more terms to the “classifier_ignore_columns” entry makes Classifier ignore the features whose column names match those terms (terms can be plain names or regular expressions). You will need to restart CellProfiler Analyst for the changes to take effect.

I wanted to update this forum post, because this approach is very useful when you want to ignore an entire channel in an analysis. Let’s say we have an experiment with staining for DNA, Actin, and Tubulin. I want to do machine learning using Classifier, but using just the DNA channel. Therefore, I add terms to the properties file as below (Cytoplasm_.*, .*_CorrTub.*, .*_CorrActin.*, Cells_.*). Note that I’m excluding not just the stain names (which will exclude all intensity features using those channels) but also the compartment names Cytoplasm and Cells, because I defined those compartments using the actin stain.


# ==== Excluded Columns ====
# DB Columns the classifier should exclude

classifier_ignore_columns = TableNumber, ImageNumber, ObjectNumber, Image_Metadata_.*, .*Location.*, .*ObjectNumber.*, .*Parent.*, .*Children.*, Cytoplasm_.*, .*_CorrTub.*, .*_CorrActin.*, Cells_.*
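
To sanity-check a list like this before restarting CPA, something along the following lines could help. It is only a sketch under a few assumptions: the helper names are hypothetical, the column names are invented for a DNA/Actin/Tubulin experiment, and each term is treated as a regular expression that must match the entire column name. It pulls the classifier_ignore_columns entry out of the properties text and reports which columns would be kept and which ignored.

```python
import re

# Excerpt of a properties file, pasted in as text for the check.
PROPERTIES_TEXT = """
# ==== Excluded Columns ====
# DB Columns the classifier should exclude
classifier_ignore_columns = TableNumber, ImageNumber, ObjectNumber, Image_Metadata_.*, .*Location.*, .*ObjectNumber.*, .*Parent.*, .*Children.*, Cytoplasm_.*, .*_CorrTub.*, .*_CorrActin.*, Cells_.*
"""

def read_ignore_patterns(text):
    """Return the comma-separated terms of classifier_ignore_columns."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines without a key = value pair
        key, value = line.split("=", 1)
        if key.strip() == "classifier_ignore_columns":
            return [term.strip() for term in value.split(",") if term.strip()]
    return []

def split_columns(columns, patterns):
    """Split columns into (kept, ignored), assuming whole-name regex matches."""
    kept, ignored = [], []
    for col in columns:
        hit = any(re.fullmatch(p, col) for p in patterns)
        (ignored if hit else kept).append(col)
    return kept, ignored

# Hypothetical column names, for illustration only.
columns = [
    "ImageNumber",
    "Nuclei_Location_Center_X",
    "Nuclei_AreaShape_Area",
    "Nuclei_Intensity_MeanIntensity_CorrDNA",
    "Nuclei_Intensity_MeanIntensity_CorrActin",
    "Nuclei_Intensity_MeanIntensity_CorrTub",
    "Cytoplasm_AreaShape_Area",
    "Cells_Intensity_MeanIntensity_CorrDNA",
]

kept, ignored = split_columns(columns, read_ignore_patterns(PROPERTIES_TEXT))
print("kept:", kept)
print("ignored:", ignored)
```

With these invented names, only the Nuclei shape feature and the Nuclei DNA intensity feature are kept. Note that the Cells DNA intensity column is dropped too, because the whole Cells compartment is excluded.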