CellProfiler install from source and run headless to solve batch processing memory errors

Hi CellProfiler Users,

I want to run CP headless from the command line so that I can run one job at a time. Memory accumulating across batch jobs causes CP to fail even on machines with 128 GB of RAM (one worker or many yields the same failure). Running one job at a time lets the memory be cleared before the next job starts. This is essentially the same strategy as running on a cluster.

It has been challenging to install a working version of CP using the recipes located here: https://github.com/CellProfiler/CellProfiler/wiki

I went through all the different instructions (Windows, Mac and various flavors of Ubuntu) but was unable to get any of them working. Since my original post, I handed this off to someone more capable than myself, and we now have a working version on Ubuntu. I stand by my original post: the instructions for Ubuntu 9, 14 and 16 don’t work as stated. I reattempted the Homebrew version for Mac yesterday; it fails during the CP install with an error related to wxPython. I didn’t retry the Conda version for Windows.

What finally worked was running the Ubuntu 16.04 commands on a clean install of Ubuntu 18.04 LTS on an Amazon EC2 instance.

These are the commands we used for the install:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install build-essential cython git libmysqlclient-dev libhdf5-dev libxml2-dev libxslt1-dev openjdk-8-jdk python-dev python-pip python-h5py python-matplotlib python-mysqldb python-scipy python-numpy python-pytest python-vigra python-wxgtk3.0 python-zmq

git clone https://github.com/CellProfiler/CellProfiler.git
cd CellProfiler
git checkout stable
pip install --user --editable . --process-dependency-links
python -m site --user-base
export PATH="$HOME/.local/bin:$PATH"

This installs CP version 2.2. As far as I am aware, version 3 is not available for this type of setup (very sad; eagerly awaiting version 4… see CellProfiler 4.0.0 timeline and roadmap). I use the CP 2 GUI on a different system to make a pipeline text file (not a project) and then copy that file to the Ubuntu instance. The pipeline looks the same as it would if it were going to be run with the GUI. The image files are also copied to the Ubuntu instance, and a .csv list of the image files is created for CP to reference.
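For example, a list like that could be generated with a small Python 3 script such as the minimal sketch below (Ubuntu 18.04 ships Python 3, separate from the Python 2 environment CP 2.2 runs in). `write_file_list`, the example paths and the `.tif`/`.tiff` filter are hypothetical; adjust them to your data. It writes one absolute image path per line with no header row, matching the file described below. Python's `utf-8` codec writes no byte-order mark, unlike Excel's “CSV UTF-8” format.

```python
from pathlib import Path

# Hypothetical: image extensions to include; adjust to match your data.
IMAGE_EXTENSIONS = {".tif", ".tiff"}


def write_file_list(image_dir, list_path):
    """Write one absolute image path per line, with no header row."""
    images = sorted(
        p.resolve()
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    # "utf-8" (not "utf-8-sig") writes no byte-order mark at the start.
    Path(list_path).write_text(
        "".join(str(p) + "\n" for p in images), encoding="utf-8"
    )
    return images
```

Usage, with hypothetical paths: `write_file_list("/data/images/set1", "/data/lists/set1.csv")`.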

This is a screenshot of the image list .csv file (opened in TextEdit as “plain text”, not “Rich Text”; Excel works as well if you save it as a “Comma Separated Values” file, not UTF-8 CSV). Note that there is no header row and each line gives the full path to an image:

Then a command like this, which uses the “file-list” functionality, can be run:

python cellprofiler -r -c -p pipelinePath --file-list=csvPath --jvm-heap-size 32000m -o outputPath

The “--jvm-heap-size” argument takes care of memory for the large whole-slide images I want to process, which were the cause of the original memory issues. To run a batch of images, I have a Python script that makes an output folder and an image file list for each set of images and generates a command like the one above for each image set. I output the results to spreadsheets, and after the batch is done I use another program to consolidate the results from many image sets into one folder and spreadsheet.
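The batch workflow above might look roughly like the following minimal Python 3 sketch. It is not the author's script and not part of CellProfiler. It assumes each image set sits in its own subfolder under one image root; the function names, the `file_list.csv` name, the `.tif`/`.tiff` filter and the exported table name (`Image.csv` in the example call) are all hypothetical. The CellProfiler flags are only the ones from the command above. Because each set runs in its own process and the loop waits for it to exit, that process's memory, including the JVM heap, is returned to the operating system before the next set starts.

```python
import csv
import subprocess
from pathlib import Path

# Hypothetical: image extensions to include in each set's file list.
IMAGE_EXTENSIONS = {".tif", ".tiff"}
# Launcher as written in the command above; adjust if your install differs.
LAUNCHER = ["python", "cellprofiler"]


def build_command(pipeline, file_list, output_dir, heap="32000m"):
    """Headless CellProfiler command for one image set (flags from the post)."""
    return LAUNCHER + [
        "-r", "-c",
        "-p", str(pipeline),
        "--file-list=" + str(file_list),
        "--jvm-heap-size", heap,
        "-o", str(output_dir),
    ]


def run_batch(image_root, pipeline, output_root, runner=subprocess.run):
    """Run one CellProfiler process per image set, strictly one at a time.

    Returns the names of the image sets whose process exited non-zero.
    """
    failures = []
    for set_dir in sorted(p for p in Path(image_root).iterdir() if p.is_dir()):
        out_dir = Path(output_root) / set_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        images = sorted(
            p.resolve()
            for p in set_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
        # One absolute path per line, no header row.
        file_list = out_dir / "file_list.csv"
        file_list.write_text("".join(str(p) + "\n" for p in images))
        # runner blocks until the CellProfiler process has exited.
        result = runner(build_command(pipeline, file_list, out_dir))
        if result.returncode != 0:
            failures.append(set_dir.name)
    return failures


def consolidate(output_root, table_name, dest):
    """Merge one exported table from every image-set folder into one CSV.

    Assumes all sets used the same pipeline, so their headers match.
    A leading "ImageSet" column records which folder each row came from.
    Returns the number of data rows written.
    """
    header, rows = None, []
    for set_dir in sorted(p for p in Path(output_root).iterdir() if p.is_dir()):
        table = set_dir / table_name
        if not table.is_file():
            continue
        with open(table, newline="") as handle:
            reader = csv.reader(handle)
            table_header = next(reader, None)
            if table_header is None:
                continue  # empty file, nothing to merge
            if header is None:
                header = ["ImageSet"] + table_header
            rows.extend([set_dir.name] + row for row in reader)
    with open(dest, "w", newline="") as handle:
        writer = csv.writer(handle)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

Usage, with hypothetical paths: `run_batch("/data/images", "/data/pipeline.cppipe", "/data/output")`, then `consolidate("/data/output", "Image.csv", "/data/all_Image.csv")`, where `Image.csv` stands in for whatever file name your ExportToSpreadsheet module writes. The `runner` parameter only exists so the loop can be tried with a stand-in before pointing it at a real install.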

Enjoy! CP version 2 doesn’t have all the functionality of version 3, but it does work and is worth the effort.

Can you run CP from a Python script now, or have you found other ways to run CP from the command line? Could you share? I ran into the same difficulty as you and just don’t know how to do it!

Yes, I did figure it out. See above.


Thank you very much! It helps me a lot!