Combine batch files?

Hi folks,

I’ve been experiencing some problems generating batch files for a large data set to be analyzed on our computing cluster. I seem to be able to generate batch files for subsets of the folders of images, but when I inspect those batch files, all of them number the images involved beginning from 1. If I cannot solve the batch file generation problem for very large data sets, can you suggest a way to combine the output from the smaller batches that avoids collisions between the image numbers?

Thanks,

Lee.

If you specify ranges of image sets on the command line (-f and -l switches), it should give you ranges of image sets in your database and CSV files that match those on the command line. Is this what you’re doing?
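For illustration, here is a minimal Python sketch of how one might split a run into first/last image-set ranges to pass to those switches. The function name `image_set_ranges`, the total of 25000 image sets, and the chunk size of 10000 are all made up for the example, and the printed `-f N -l M` form is only an assumption about how the switches are written on the command line; check it against your own job scripts.

```python
def image_set_ranges(total, chunk_size):
    """Yield (first, last) pairs, 1-based and inclusive, that cover 1..total."""
    if total < 0 or chunk_size < 1:
        raise ValueError("total must be >= 0 and chunk_size must be >= 1")
    for first in range(1, total + 1, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield first, min(first + chunk_size - 1, total)


if __name__ == "__main__":
    # Hypothetical numbers: 25000 image sets, 10000 per cluster job.
    for first, last in image_set_ranges(25000, 10000):
        print(f"-f {first} -l {last}")
```

Each pair would go to one cluster job, so the ranges never overlap and together cover the whole run.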

Hi Lee,

Yes, this is how I’ve been submitting my jobs to our cluster. But suppose that to cover one screen’s worth of data (say 25000 images), I have two batch files representing the same pipeline but over two image sets partitioned from the screen: the first covers 15000 images, the second covers the remaining 10000. If I inspect both batch files, each numbers the images in its own batch from 1 up to the number of images in that batch. So setting the -f and -l arguments for the second batch file that represents images 15001 - 25000 will fail, since it sees the same images as numbered from 1 - 10000. Unless I write that output (both .tiff files and .csv files produced by ExportToDatabase) to a different directory and do some post-processing, I don’t see how this could work.

This isn’t such a big issue now that it looks as if I can produce one batch file for the entire screen, but thanks for addressing this all the same :smile:

Lee.

Is there a reason you need to split them into two separate batches? My default response is to wonder whether it’s really necessary and, if so, whether it’s a shortcoming we should address in CP or something external that we need to work around.

Are you using LoadImages or LoadData to load the images? If the former, you will probably have to do post-processing, though we should consider making it possible to start at an arbitrary image number. If you are using LoadData, then combining the two batches together and selecting subsets using the -f/-l switches should solve the problem.
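If the LoadData route applies, combining the batches could be as simple as concatenating the input CSV files. A rough Python sketch follows, assuming that every file has the same header row and that image sets are numbered in row order; the function name `combine_loaddata_csvs` and any file names you pass in are hypothetical.

```python
import csv


def combine_loaddata_csvs(input_paths, output_path):
    """Concatenate CSV files that share one header row.

    Returns the number of data rows written to output_path.
    """
    header = None
    n_rows = 0
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in input_paths:
            with open(path, newline="") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to copy
                if header is None:
                    # First non-empty file defines the header.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path} has a different header row")
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows
```

With a 15000-row file followed by a 10000-row file, the rows from the second file would land at positions 15001 to 25000 in the combined file, which is the range the -f/-l switches would then need to select.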

Post-processing may not be that difficult, depending on what sort of measurements you are taking. If there’s no cross-image measurement (tracking, for instance), then I believe the only measurement that actually cares about the image number is ImageNumber itself (but one should verify). If this is the case, it’s fairly easy to adjust the data afterward.
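As a rough idea of what that adjustment might look like, here is a Python sketch that adds a fixed offset to the ImageNumber column of a CSV file. It assumes the exported file has a header row with a column literally named `ImageNumber` holding integers, and that no other column depends on the image number; the function name `offset_image_numbers` and the file paths are hypothetical.

```python
import csv


def offset_image_numbers(in_path, out_path, offset, column="ImageNumber"):
    """Copy a CSV file, adding `offset` to every value in `column`.

    Returns the number of data rows written.
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"no {column!r} column in {in_path}")
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        n_rows = 0
        for row in reader:
            # Shift the image number; every other column is copied unchanged.
            row[column] = str(int(row[column]) + offset)
            writer.writerow(row)
            n_rows += 1
    return n_rows
```

For the second batch in the example above, the offset would be 15000, so its images 1 - 10000 become 15001 - 25000. Any output image files that embed the image number in their names would need a matching rename, which this sketch does not attempt.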