Usage of Batch_data.mat


I have some questions regarding Batch_data.mat.

Basically, I will just write what I think is happening, and it would be nice if you could confirm :smile:

My understanding is that this file contains:

i) the cellprofiler pipeline
ii) the filenames and paths to all the images to be analysed

Regarding ii): this saves LoadImages from having to walk through all the directories and files to find the image sets to be analysed (as specified by the -f and -l options).

In fact, I am writing this because we ran CellProfiler on the cluster without using the Batch_data.mat file (which works as well), but it got extremely slow. We think the reason was that LoadImages had to retrieve the list of all images for each job that was started, thereby overloading our file system.

We hope this is now solved by using Batch_data.mat as the input pipeline instead of the .cp file, because Batch_data.mat already contains a list of the files to be analysed.

Is this interpretation correct?

Thanks so much,

Hi Tischi,

Your description is a good guess, and indeed the batch file does operate in this fashion :smiley: Both LoadImages and LoadData know what to do in this case, since the image set list already exists. The other item of note is that the pipeline contains the default input/output folders remapped as per your settings in CreateBatchFiles.

Another workaround for your issue is to use LoadData on a CSV of your file list, which is much quicker than traversing the file system for every job. You can use a simple pipeline with LoadImages and ExportToSpreadsheet to generate this list as an output, and then use that list as the input to LoadData.
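If you would rather build that CSV outside CellProfiler, here is a rough Python sketch that walks the image tree once and writes the list to disk. This is not the LoadImages/ExportToSpreadsheet route described above, just an alternative way to get a similar file. The channel name "DNA", the extension list, and the function names are made up for illustration. The header names (Image_FileName_<name> and Image_PathName_<name>) are what I would expect LoadData to look for, but please treat that as an assumption and check it against the LoadData help for your version.

```python
import csv
import os

# Hypothetical set of extensions; adjust to whatever your microscope writes.
IMAGE_EXTENSIONS = (".tif", ".tiff", ".png", ".jpg")


def build_file_list(root, channel="DNA", extensions=IMAGE_EXTENSIONS):
    """Walk `root` once and return one row (dict) per image file found."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a reproducible order
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                rows.append({
                    # Assumed LoadData header convention; verify before use.
                    "Image_FileName_" + channel: name,
                    "Image_PathName_" + channel: dirpath,
                })
    return rows


def write_file_list(rows, csv_path, channel="DNA"):
    """Write the rows to `csv_path` with a header line."""
    fieldnames = ["Image_FileName_" + channel, "Image_PathName_" + channel]
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

You would run this once (for example on the head node), calling build_file_list on your image folder and passing the result to write_file_list, so that the cluster jobs read the CSV instead of each one scanning the directories again.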