Converting Example Pipeline to run headless

Hello,

I’ve been working on a CellProfiler build for the cluster I’m working with (@allen_goodman might recognize me), and I’ve finally got a “working” build set up in a virtual environment that I’m currently trying to test to see if it’s functioning as intended. I’ve downloaded several of the example pipelines from the website, and I’m having some issues getting them to execute in headless mode. What I’ve tried so far is installing a local version to my desktop/laptop (from the git repo so as to be as close as possible to the cluster install), and importing those example pipelines, appending the CreateBatchFiles module to the end, and creating the .h5 file. However, I can’t actually execute the command generated by

cellprofiler --get-batch-commands Batch_data.h5

anywhere, including on the local machine where I generated the file. For example, using the basic Human Cells pipeline from the website, the output of the above command is this:

CellProfiler -c -r -p Batch_data.h5 -g ImageNumber=1

This is despite the images not appearing to be grouped (but I’ll get to that in a second). If I run that command, either locally or on the cluster (with the amended paths), I get something like this:

Version: 2016-05-13T20:00:34 9cad7ea / 20160513200034
Pipeline saved with CellProfiler version 20160503134940
Times reported are CPU times for each module, not wall-clock time
Uncaught exception in CellProfiler.py
Traceback (most recent call last):
  File "/home/at237/CellProfiler_test/CPvenv/src/cellprofiler-master/cellprofiler/__main__.py", line 251, in main
    run_pipeline_headless(options, args)
  File "/home/at237/CellProfiler_test/CPvenv/src/cellprofiler-master/cellprofiler/__main__.py", line 898, in run_pipeline_headless
    initial_measurements=initial_measurements)
  File "/home/at237/CellProfiler_test/CPvenv/src/cellprofiler-master/cellprofiler/pipeline.py", line 1684, in run
    initial_measurements=measurements):
  File "/home/at237/CellProfiler_test/CPvenv/src/cellprofiler-master/cellprofiler/pipeline.py", line 1796, in run_with_yield
    in group(workspace):
  File "/home/at237/CellProfiler_test/CPvenv/src/cellprofiler-master/cellprofiler/pipeline.py", line 1714, in group
    ", ".join(grouping.keys()), ", ".join(keys)))
ValueError: The grouping keys specified on the command line (ImageNumber) must be the same as those defined by the modules in the pipeline ()

If I omit the -g flag, I get a long dump of Java exceptions (which I can paste if requested). I’ve tried changing the settings in the Groups input module and what I pass to -g (somewhat blindly, I admit), but I can’t seem to generate a working command. Any ideas?

Hi, just bumping this. Could anyone give me a hand? I feel like I’m really close here.

As an alternative to using the CreateBatchFiles module, you could try creating a shell script that submits a batch of CellProfiler commands. The set of images can be split up using the -f and -l options (first and last image set) or by groups with -g. Use the ExportToDatabase module in your pipeline to write data to a MySQL-compatible database, so all of your output will be collected in a single location.
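For illustration, here is a rough sketch of how such a batch of commands could be generated in Python. It assumes the total number of image sets is known in advance. The function name batch_commands, the pipeline name my_pipeline.cppipe, the output directory, and the batch size are all placeholders, and each printed command would still need to be wrapped in whatever submission command your scheduler uses.

```python
import shlex


def batch_commands(pipeline, n_image_sets, batch_size, output_dir):
    """Return one headless CellProfiler command per slice of image sets.

    Image sets are numbered from 1; each command covers the inclusive
    range given by -f (first image set) and -l (last image set).
    """
    commands = []
    for first in range(1, n_image_sets + 1, batch_size):
        last = min(first + batch_size - 1, n_image_sets)
        args = [
            "cellprofiler", "-c", "-r",
            "-p", pipeline,
            "-f", str(first),
            "-l", str(last),
            "-o", output_dir,
        ]
        # Quote each argument so paths with spaces survive the shell.
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


if __name__ == "__main__":
    # Placeholder values: 10 image sets, 4 per job.
    for cmd in batch_commands("my_pipeline.cppipe", 10, 4, "output"):
        print(cmd)
```

With the placeholder values this yields three commands covering image sets 1-4, 5-8, and 9-10.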

Would you be able to give me an example (say, with one of the example pipelines on the website)? Ideally, I’d like to not need to use the GUI at all, as that won’t be an option on the cluster.

For what it’s worth, I get the following when I try to run the pipeline as is without CreateBatchFiles:

$ cellprofiler -c -r -p ExampleHuman.cppipe
Version: 2016-05-13T20:00:34 9cad7ea / 20160513200034
Pipeline saved with CellProfiler version 20140723174500
CP-JAVA 10:09:02.975 [Thread-0] WARN  o.c.imageset.ChannelFilter - Empty image set list: no images passed the filtering criteria.
Uncaught exception in CellProfiler.py
Traceback (most recent call last):
  File "/home/at237/CellProfiler_test/CPvenv/src/cellprofiler-master/cellprofiler/__main__.py", line 251, in main
    run_pipeline_headless(options, args)
  File "/home/at237/CellProfiler_test/CPvenv/src/cellprofiler-master/cellprofiler/__main__.py", line 917, in run_pipeline_headless
    return exit_code
UnboundLocalError: local variable 'exit_code' referenced before assignment

I’m having the same problem with the exit_code when running cellprofiler from a virtualenv. CellProfiler does not exit until the hard runtime limit on our cluster kills the processes. The good news is that it seems to run fine and return the output. The bad news is that it’s really wasteful.

Just in case someone has a similar issue, I’ll post my solution:

It’s not going to fix @alextruong’s ‘Empty image set list’ issue, but as far as I can tell, the error
UnboundLocalError: local variable 'exit_code' referenced before assignment
is caused by run_pipeline_headless() in __main__.py.

The exit_code variable is only assigned inside an if statement:

    if options.done_file is not None:
        if measurements is not None and measurements.has_feature(cellprofiler.measurement.EXPERIMENT, cellprofiler.pipeline.EXIT_STATUS):
            done_text = measurements.get_experiment_measurement(cellprofiler.pipeline.EXIT_STATUS)

            exit_code = (0 if done_text == "Complete" else -1)
        else:
            done_text = "Failure"

            exit_code = -1

So I can only assume that options.done_file IS None in this case (don’t know why that is or what is causing it).

A simple bodge is to add exit_code = 0 at the top level of run_pipeline_headless (outside the if statement), which has fixed the issue in my case, so the processes exit properly on the cluster.

I’m guessing there’s a reason not to do this, and something terrible might happen, so be warned.
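To make the failure mode concrete, here is a stripped-down stand-in for the logic above. It is not the actual CellProfiler source: the two function names and their arguments are made up for the illustration, and done_file simply plays the role of options.done_file.

```python
def exit_code_original(done_file, done_text):
    # Same shape as the quoted snippet: exit_code is only bound inside
    # the `if`, so done_file=None leaves it unassigned at the return.
    if done_file is not None:
        exit_code = 0 if done_text == "Complete" else -1
    return exit_code


def exit_code_patched(done_file, done_text):
    # The workaround: bind a default before the conditional.
    # Caveat: with no done file this reports success unconditionally.
    exit_code = 0
    if done_file is not None:
        exit_code = 0 if done_text == "Complete" else -1
    return exit_code


if __name__ == "__main__":
    try:
        exit_code_original(None, "Complete")
    except UnboundLocalError as error:
        print("original:", type(error).__name__)
    print("patched:", exit_code_patched(None, "Complete"))
```

The caveat in the comment is probably the main reason to be careful: with the default in place, a run that fails without a done file would still report exit code 0.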

So is that just because of an oversight when cleaning up the process? Good to know that it’s not as bad as it looks…hopefully. That might be a reason why my process doesn’t die as well, and I have to ctrl+c to kill it manually after seeing that exit code error. Just wondering, @SJWarch, did you test with the example data sets? Did the human set, for example, execute successfully?

I didn’t use the ExampleHuman pipeline. I found it less confusing to make my own pipeline with a handful of images and the LoadData module. I then created a .csv file (image_set.csv) with the path name and file name of each image as seen by the cluster, e.g.

and ran:
cellprofiler -r -c -p test_pipeline.cppipe --data-file=image_set.csv -o exports/cp_test_output

Seems to work fine (apart from the bodged exit_code), though when using the CreateBatchFiles module I had to change the output of --get-batch-commands to say “cellprofiler” rather than “CellProfiler”.
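In case it helps anyone, a file like image_set.csv could be generated with a few lines of Python. This is only a sketch: it assumes LoadData's Image_PathName_<name> / Image_FileName_<name> column naming, and the channel name DNA, the directory, the file names, and both function names are placeholders rather than what I actually used.

```python
import csv


def image_set_rows(image_dir, filenames, channel="DNA"):
    """Build the header and one row per image for a LoadData-style CSV.

    `channel` is a placeholder; it should match the image name that the
    pipeline expects.
    """
    header = ["Image_PathName_" + channel, "Image_FileName_" + channel]
    return [header] + [[image_dir, name] for name in filenames]


def write_image_set_csv(csv_path, rows):
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


# Example usage (paths must be as seen from the cluster nodes):
# rows = image_set_rows("/path/on/cluster/images", ["img_001.tif", "img_002.tif"])
# write_image_set_csv("image_set.csv", rows)
```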

@SJWarch Would you please submit your exit-code fix as a pull request on GitHub? Your solution should be implemented in future releases of CellProfiler. If you need instructions on how to submit a pull request, please let me know.

@alextruong Were you eventually able to successfully run CP on your cluster?

Ah, I forgot to reply to this. Yes, I was able to several times, each time with git clones of the entire repo. The current version on our cluster now (in a sort of soft-launch state) is from a clone from roughly 11:30am ET this morning.

Some quirks, while I’m here: I tested (again) with ExampleHumanImages from the examples page on the website, and the output files differ from those I got on my local machine. Presumably this is due to floating point shenanigans. To get the pipeline to execute at all, I needed to add the CreateBatchFiles module to the example pipeline in the GUI and then run the resulting batch file (via cellprofiler -c -r -p Batch_data.h5). It does not work when I simply try to execute the pipeline file. Also, I was still able to summon the GUI (via X11), but it does not exit cleanly: the GUI window freezes, and the terminal window shows the following:

INFO:cellprofiler.gui.moduleview:Exiting the pipeline validation thread

but doesn’t actually close until I ctrl+c. Otherwise, it’s pretty functional.