Does CellProfiler need its Data Tools anymore?

CellProfiler has a menu item called “Data Tools”. It also has a category of modules called “Data Tools”, which is essentially the same list (except for modules like TrackObjects and EditObjectsManually, where other categories make more sense).

These are described as follows:

The Data Tools menu provides tools to allow you to plot, view, export
or perform specialized analyses on your measurements.
Each data tool has a corresponding module with the same name and
functionality. The difference between the data tool and the module is that
the data tool takes a CellProfiler output file (i.e., a .mat or .h5 file) as input,
which contains measurements from a previously completed analysis run.
In contrast, a module uses measurements received from the upstream
modules during an in-progress analysis run.
Opening a data tool will present a prompt in which the user is asked
to provide the location of the output file. Once specified, the user is then
prompted to enter the desired settings. The settings behave identically as
those from the corresponding module.

Here are the modules:

And here are the menu items, which again can be run after a processing run is complete:

To some degree, if we keep one (the modules), keeping the other (the data tools) isn’t much extra work or code maintenance, so we’d likely want both to stay or both to go.

These tools are a nice feature, but we wonder how frequently they are used. And even if they are used often, how onerous are the alternatives? Those are:

  1. for most tools (Histogram, Scatterplot, DensityPlot, CalculateMath, CalculateStatistics, FlagImage) run the analysis in Excel (or, if you’re a database user, use more sophisticated plotting software/libraries)
  2. for some tools (Histogram, Scatterplot, DensityPlot, Platemap) download CellProfiler Analyst and run analysis there.
  3. MergeOutputFiles? Not sure about alternatives for this one
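
To give a feel for how onerous the "plotting library" route is, here is a minimal sketch of a histogram built from a per-object spreadsheet export. It assumes the measurements were exported to CSV and uses pandas and NumPy; the inline data and the column name AreaShape_Area are stand-ins for illustration, not taken from any particular pipeline.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for a per-object CSV export; in practice this would be
# pd.read_csv("<path to your exported file>"). Column names are assumptions.
csv_text = """ImageNumber,ObjectNumber,AreaShape_Area
1,1,120
1,2,135
1,3,310
2,1,98
2,2,150
2,3,142
"""
nuclei = pd.read_csv(io.StringIO(csv_text))

# Bin the measurement into 4 equal-width bins and print a text histogram.
counts, edges = np.histogram(nuclei["AreaShape_Area"], bins=4)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:7.1f} to {right:7.1f}: {'#' * int(n)} ({int(n)})")
```

If matplotlib is installed, `nuclei["AreaShape_Area"].plot.hist(bins=4)` should draw the same thing as a figure.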

Some tools have no (good) alternative and would likely need to stay:

  1. TrackObjects - can be finicky to set parameters, necessitating multiple iterations - but since it runs only on the last image set it would be insane to run the entire pipeline on an entire image series each time just to tweak the last step.
  2. CalculateStatistics (unless we add it to CPA)

Some tools are used very frequently in pipelines as modules, so may as well stay as data tools in the menu, like:

  1. CalculateMath is very often used in a pipeline to calculate per-cell ratios so would likely need to stay
  2. DisplayDataOnImage is very useful for debugging a pipeline so makes sense to have it in a pipeline
  3. EditObjectsManually
  4. FlagImage, I think?
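
For comparison, this is roughly what a per-cell ratio looks like when computed after the fact on exported measurements instead of inside the pipeline. It is only a sketch under assumptions: the data is inline, and the two intensity column names and the output name Ratio_GFP_DNA are hypothetical.

```python
import io

import pandas as pd

# Stand-in for a per-object CSV export; column names are assumptions.
csv_text = """ImageNumber,ObjectNumber,Intensity_MeanIntensity_GFP,Intensity_MeanIntensity_DNA
1,1,0.20,0.40
1,2,0.30,0.60
2,1,0.10,0.50
"""
cells = pd.read_csv(io.StringIO(csv_text))

# One ratio per cell: numerator measurement divided by denominator measurement.
cells["Ratio_GFP_DNA"] = (
    cells["Intensity_MeanIntensity_GFP"] / cells["Intensity_MeanIntensity_DNA"]
)
print(cells[["ImageNumber", "ObjectNumber", "Ratio_GFP_DNA"]])
```

The catch is that a ratio computed this way is not available to downstream modules during the run, which is one reason to keep the module in the pipeline.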

My straw man thought is that we should delete these modules and menu-item data tools:
DisplayHistogram, DisplayScatterplot, DisplayDensityPlot, DisplayPlatemap
… but leave the rest as is.

Some nice things about these particular tools that we would lose with this approach:
a. Nice to watch data displays being produced during processing, to check on results and identify problems before finishing the full set.
b. Nice to not have to install and learn CellProfiler Analyst if you’re only looking at a little bit of data/a few analyses.

Perhaps these niceties outweigh the burden of keeping these tools/modules around. Perhaps there are more compelling reasons than I’ve listed to keep them around.

I’d love to hear others’ thoughts!

Interesting discussion!

I’d remove the category entirely since the modules don’t accomplish their aim of replacing outside tooling in a satisfying manner. A reasonable replacement is documentation for using CellProfiler with third-party applications (e.g. Excel) and libraries (e.g. Pandas). CellProfiler Analyst too.

A few notes about specific modules:

  • Conceptually, “TrackObjects” has utility. However, I’d like to see it receive far more attention and moved into a standard module category.

  • “CalculateMath” is a bizarre module from a functionality perspective. I also don’t understand its name, since “calculate” and “math” mean much the same thing.

  • “DisplayDataOnImage” is useful because a number of modules lack useful diagnostics in their visualization. I think it can be replaced by addressing these shortcomings.

  • “EditObjectsManually” is very useful but it’s too complicated from a usability perspective. I think it can be replaced with one or more modules that replicate common “EditObjectsManually” workflows.

  • “FlagImage” should be replaced with whatever people use for downstream analysis.
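
As a rough illustration of flagging in downstream analysis, the sketch below marks images with too few objects in a per-image table using pandas. The inline data, the Count_Nuclei column, the MIN_NUCLEI threshold and the Flag_LowCellCount name are all assumptions made for the example.

```python
import io

import pandas as pd

# Stand-in for a per-image CSV export; column names are assumptions.
csv_text = """ImageNumber,Count_Nuclei
1,42
2,3
3,57
"""
images = pd.read_csv(io.StringIO(csv_text))

MIN_NUCLEI = 10  # hypothetical quality-control threshold

# 1 = flagged (too few nuclei), 0 = passes.
images["Flag_LowCellCount"] = (images["Count_Nuclei"] < MIN_NUCLEI).astype(int)
flagged = images.loc[images["Flag_LowCellCount"] == 1, "ImageNumber"].tolist()
print("Flagged images:", flagged)
```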