Struggling to implement LoadData in headless mode

New to CellProfiler but really enjoying it. I’m using it with worms, and the WormToolbox is running well.

I’ve been using the GUI v4.1.3 on macOS and it’s been working flawlessly. I am now working on implementing the headless version to run it at our high-throughput computing center.

For testing purposes, I created a simple pipeline that includes LoadData, ImageMath, and SaveImages. I created a temporary directory on my Desktop and set the default input and output folders in the pipeline to /Users/njwheeler/Desktop/temp_root/input/ and /Users/njwheeler/Desktop/temp_root/output/, respectively. In CreateBatchFiles, I set the local root path to /Users/njwheeler/Desktop/temp_root/ and the cluster root path to /. In LoadData, the module is directed to the Default Input Folder to find the CSV and the images. I’ve attached the CSV.

On the HTC, we will use Docker, and I am testing the image on my local computer. I’ve attached the Dockerfile and conda_env.yml. The cellprofiler command runs inside the Docker image, but I can’t seem to load the actual images with the LoadData module. The command I run is cellprofiler -c -r -p pipelines/Batch_data_minimal.h5 -L 10. I’ve attached the full error message.

At first I thought maybe my CSV was improperly formatted or I had an issue with the path translation between local/batch, and I suppose that still could be the case. However, in the log there is a line that says: Getting image reader for: RawImage, None, file:/input/20210521-p01-KJG_A01.TIF. That image exists at that path, which suggests that it’s finding the image but is unable to read it. RawImage is the image name (the header of the CSV is Image_FileName_RawImage). I’m not sure what None refers to; perhaps that is part of my problem.

My Docker setup could be wrong, but I got the conda env directly from the CP GitHub.

I’m at a loss. Let me know if there is any missing information.

Batch_data_full.h5.zip (29.5 KB)
images_metadata_small.csv (449 Bytes)
conda_env.yml (494 Bytes)
Dockerfile.zip (768 Bytes)
error.txt (6.2 KB)

Sorry to hear you’re running into issues! Your Dockerfile and conda_env seem reasonable, though since we build our Docker without conda I can’t say for certain that there isn’t an issue there.

I’m not actually entirely certain that LoadData and CreateBatchFiles play nicely together; at least, I can’t recall ever having used them together before. CreateBatchFiles was really designed for situations where you don’t have a CSV and instead have a dragged-and-dropped set of images.

Is there a reason to make a batch file, rather than just passing a CSV with container-correct paths in the Image_PathName_RawImage column into the container? If that works, it might avoid the issue, though on our side we should probably dig into it regardless to confirm whether those two modules can ever be used together.
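
For what it's worth, a CSV like that can be put together with a few lines of Python. This is only a minimal sketch under some assumptions: the function name write_loaddata_csv, the /input container folder, and the .TIF filter are made up for illustration, and the headers simply follow the Image_FileName_RawImage / Image_PathName_RawImage naming already used in this thread.

```python
import csv
from pathlib import Path


def write_loaddata_csv(local_dir, container_dir, csv_path,
                       image_name="RawImage", suffix=".TIF"):
    """Write a LoadData-style CSV listing the images found in local_dir.

    The PathName column holds container_dir (where the images will be
    mounted inside the container), not the local directory.
    All names here are hypothetical helpers, not CellProfiler API.
    """
    local_dir = Path(local_dir)
    # Case-insensitive match on the extension; sorted for a stable row order.
    files = sorted(
        p.name for p in local_dir.iterdir()
        if p.is_file() and p.suffix.lower() == suffix.lower()
    )
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([f"Image_FileName_{image_name}",
                         f"Image_PathName_{image_name}"])
        for name in files:
            writer.writerow([name, container_dir])
    return files
```

Something like write_loaddata_csv("/Users/njwheeler/Desktop/temp_root/input", "/input", "images.csv") would then list the local files while recording /input as their folder, assuming that is where the images end up inside the container.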

Thanks so much for your response!

I must have missed part of the docs, but I thought a batch file was required to run headless. Can I pass a .cpproj or .cppipe instead of the .h5?

You’ll want to use a cppipe instead of a cpproj, but yes, totally fine to pass in a pipeline rather than a batch file!
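
As a rough sketch of what that call can look like when scripted, here is a small helper that assembles the arguments. The helper name headless_command and the example paths are assumptions for illustration; -c, -r, -p and -L are the flags from your original command, --data-file points LoadData at a CSV, and -i / -o set the default input and output folders.

```python
def headless_command(pipeline, data_file=None, input_dir=None,
                     output_dir=None, log_level=None):
    """Build the argument list for a headless CellProfiler run.

    Hypothetical helper: it only assembles arguments, it does not run anything.
    """
    # -c: run without the GUI, -r: run the pipeline, -p: pipeline file
    cmd = ["cellprofiler", "-c", "-r", "-p", pipeline]
    if data_file:
        cmd += ["--data-file", data_file]   # CSV read by LoadData
    if input_dir:
        cmd += ["-i", input_dir]            # default input folder
    if output_dir:
        cmd += ["-o", output_dir]           # default output folder
    if log_level is not None:
        cmd += ["-L", str(log_level)]       # logging verbosity
    return cmd
```

The resulting list can be handed to subprocess.run(cmd, check=True) inside the container, with the .cppipe in place of the .h5 batch file.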

In case you haven’t seen it, here’s a link to our guide to getting started with headless CellProfiler; it has further resources such as a YouTube demo linked within.

Thanks for that. I’ve been primarily referencing these docs and didn’t know about the GitHub Wiki. Thanks for directing me there, hopefully I can figure it out from here!

Again, thanks for your help.

Ok, so I’m still not figuring it out.

I’ve tried all the ways you outlined in your video. I created a simple pipeline that inverts the images and then saves them. To load the images, I’ve tried the following:

  1. Populate Images, Metadata, and NamesAndTypes with the local data I want, export the image set listing, and edit the CSV to contain the container-correct-paths. Add the LoadData module, save the .cppipe, and run the .cppipe with --data-file directing the program to the CSV.
  2. Populate Images, Metadata, and NamesAndTypes with the data I want, add CreateBatchFiles and adjust the mappings from the local root to the container root. Create the .h5 and run it in the container.
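
For reference, the CSV edit in approach 1 amounts to something like the sketch below. I did the edit by hand, so this is an illustration rather than the exact procedure; the function name remap_pathname_columns is hypothetical, and it only touches columns whose header contains PathName, so any other path-bearing columns in the export would need the same treatment.

```python
import csv


def remap_pathname_columns(src_csv, dst_csv, local_root, container_root):
    """Copy src_csv to dst_csv, swapping local_root for container_root at the
    start of every value in a column whose header contains 'PathName'.

    Hypothetical helper for illustration; returns the columns it rewrote.
    """
    with open(src_csv, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    path_columns = [name for name in fieldnames if "PathName" in name]
    for row in rows:
        for column in path_columns:
            value = row[column] or ""  # short rows come back as None
            if value.startswith(local_root):
                row[column] = container_root + value[len(local_root):]

    with open(dst_csv, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path_columns
```

With local_root set to /Users/njwheeler/Desktop/temp_root/ and container_root set to /, a value of /Users/njwheeler/Desktop/temp_root/input becomes /input, which mirrors the local-to-cluster mapping entered in CreateBatchFiles.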

With either approach I get essentially the same error while loading the first image. In approach #1, it’s Error detected during run of module LoadData, while in approach #2 it’s Error detected during run of module NamesAndTypes. As I said in the first post, I am fairly certain the paths are correct given the line in the log: Getting image reader for: RawImage, None, file:/input/20210521-p01-KJG_A01.TIF, which is the correct path in the container.

I’m going to try using your Dockerfile…

So that intermediate “None” is in what’s supposed to be the path. Can you share your edited CSV? Do you have both FileName and PathName columns, and do both have some information in them?
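
If it helps, a quick check along these lines, run inside the container so the paths resolve, would answer both questions at once. It is just a sketch: check_loaddata_csv is a made-up helper, and the column names assume the Image_FileName_RawImage / Image_PathName_RawImage headers mentioned earlier in the thread.

```python
import csv
import os


def check_loaddata_csv(csv_path, image_name="RawImage"):
    """Return a list of human-readable problems found in a LoadData CSV.

    Hypothetical helper: checks that both columns exist, that neither is
    empty on any row, and that PathName/FileName points at a real file.
    """
    file_col = f"Image_FileName_{image_name}"
    path_col = f"Image_PathName_{image_name}"
    problems = []
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        headers = reader.fieldnames or []
        for col in (file_col, path_col):
            if col not in headers:
                problems.append(f"missing column: {col}")
        if problems:
            return problems
        # Row numbers are approximate if a field spans several lines.
        for line_no, row in enumerate(reader, start=2):  # header is line 1
            file_name = (row[file_col] or "").strip()
            path_name = (row[path_col] or "").strip()
            if not file_name or not path_name:
                problems.append(f"line {line_no}: empty FileName or PathName")
                continue
            full_path = os.path.join(path_name, file_name)
            if not os.path.isfile(full_path):
                problems.append(f"line {line_no}: no file at {full_path}")
    return problems
```

An empty list means every row has both values and the joined path exists; it says nothing about whether the file can actually be decoded.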

Yes, each column is populated. CSV and the pipeline are attached.

images_metadata_small.csv (1.4 KB)
batch_project.cppipe (10.2 KB)