CellProfiler 4 csv --data-file format

hi all

I’m trying to run CP4 from the command line. I have a Docker container that works well, and I’m passing a pipeline file (attached) that should load TIFF files using the LoadData module.
The command I use is:

cellprofiler --run \
--run-headless \
--pipeline plugins/segmentation_load_data.cppipe \
--data-file list_files.csv \
--plugins-directory plugins/cp_plugins \
--output-directory results/CP4_test_res \
--log-level DEBUG

the csv file is attached too.

basically markers are analyzed separately and I need to call each marker at different points in the pipeline.

I used the Image_Filename_xxx format, where xxx is the name of the marker like Image_Filename_Pankeratin or Image_Filename_CD8a to indicate Pankeratin or CD8a markers, respectively.

then I used 0 and 1 to indicate what marker is what and the GUI version actually does pass the correct name to the correct module.

however when I run it from command line, what I get is the following error:
FileNotFoundError: [Errno 2] The file, /mnt/lustre/users/SBata/0 , does not exist.

so it seems to me that CP is not using the 0s and 1s to select markers, but it’s taking it as a path and looking for that folder.

I looked at a few posts here but I could not find any answer.
any help would be great.


segmentation_load_data.cppipe (61.9 KB)
list_files.csv (1.3 KB)

Yeah, that’s unfortunately not how the LoadData CSV is designed to be used. I’m really surprised it works in GUI mode!

CellProfiler definitely expects the FileName columns to contain file paths, so that’s how your data should be structured. In general, one row in the CSV should equal one image set.
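A quick way to catch this before launching a headless run is to check that every FileName entry in the CSV resolves to a file on disk. The sketch below is only an illustration, not part of CellProfiler: `find_missing_files` is a hypothetical helper, and it assumes headers of the form `FileName_<marker>` / `PathName_<marker>` (optionally prefixed with `Image_`).

```python
import csv
import os


def find_missing_files(csv_path):
    """Return (row_number, column, resolved_path) for each FileName entry not found on disk."""
    missing = []
    with open(csv_path, newline="") as handle:
        for row_number, row in enumerate(csv.DictReader(handle), start=1):
            for column, value in row.items():
                if column is None:
                    continue  # extra cells beyond the header row
                # Assumption: headers look like FileName_X or Image_FileName_X.
                name = column[len("Image_"):] if column.startswith("Image_") else column
                if not name.startswith("FileName_"):
                    continue
                marker = name[len("FileName_"):]
                # Use the matching PathName column if present, else the bare file name.
                folder = (
                    row.get("PathName_" + marker)
                    or row.get("Image_PathName_" + marker)
                    or ""
                )
                full_path = os.path.join(folder, value or "")
                if not os.path.isfile(full_path):
                    missing.append((row_number, column, full_path))
    return missing
```

A value such as `0` in a FileName column would show up in the returned list, since no file by that name exists in the image folder.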

Sorry for the inconvenience!

I did not run the GUI mode, I only saw that it recognizes the marker’s name after the second _ sign

so, ok, I get that it needs a full path in the FileName, but you mention that each row is an image set. here I put the image…do you mean that each row should point to a folder with all images like in an ROI?


Then how do I specify the individual images? Sorry if it’s a basic question, I can’t get it to work.

I mean, if your pipeline loads a dozen images (presumably different channels of the same field) over the course of the pipeline, your CSV should have a dozen FileName (and optionally, a dozen PathName) columns, i.e. FileName_Stain1, FileName_Stain2, etc. All the channels from one field should be in one row. The next row has all the channels of the next field, and so on.
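As a rough sketch of that layout, the following builds one row per field with a FileName and PathName column for each marker. It assumes, purely for illustration, that files are named `<field>_<marker>.tiff` (for example `roi1_CD8a.tiff`); the helper names are hypothetical, and the marker names would need to match the image names the pipeline's LoadData module expects.

```python
import csv
import os
from collections import defaultdict


def build_image_sets(filenames, folder, markers):
    """Group files by field and return one dict (CSV row) per complete image set."""
    by_field = defaultdict(dict)
    for filename in filenames:
        stem, _ext = os.path.splitext(filename)
        # Assumption: the marker is the part after the last underscore.
        field, _, marker = stem.rpartition("_")
        if marker in markers:
            by_field[field][marker] = filename

    rows = []
    for field in sorted(by_field):
        channels = by_field[field]
        if set(channels) != set(markers):
            continue  # skip fields that are missing a channel
        row = {}
        for marker in markers:
            row["FileName_" + marker] = channels[marker]
            row["PathName_" + marker] = folder
        rows.append(row)
    return rows


def write_load_data_csv(rows, csv_path):
    """Write the rows with a header line, one image set per line."""
    if not rows:
        raise ValueError("no complete image sets to write")
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

For two markers, each line of the resulting file then carries four values: the file name and folder for the first marker, followed by the file name and folder for the second.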

Is that more clear?