Hi everyone
I’ve been experimenting with processing large images, cell by cell, with multiple GPUs. After some discussions on Gitter I’ve implemented a design suggested by @tpietzsch with additional input from @hanslovsky
The idea is to store pending jobs in a BlockingQueue
with capacity numGPUs
. For each GPU a thread is started to monitor the queue. I’ve implemented the monitor loop in a class called GPUQueueMonitor
public class GPUQueueMonitor extends Thread {
private boolean stop;
private BlockingQueue<FutureTask> queue;
private int deviceNum;
public GPUQueueMonitor(BlockingQueue<FutureTask> queue, int deviceNum) {
this.queue = queue;
this.deviceNum = deviceNum;
}
public void run() {
// set device num for this thread
YacuDecuRichardsonLucyWrapper.setDevice(deviceNum);
while (!stop) {
try {
FutureTask job = queue.take();
job.run(); // will "complete" the future
System.out.println("finished decon on GPU " + deviceNum);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
public void setStop(boolean stop) {
this.stop = stop;
}
}
The queue, CellLoader
and DiskCachedImage
are setup as follows
int numGPUs = 4;
// create a queue to hold the GPU jobs, max size of queue is the number of GPUS
BlockingQueue<FutureTask> queue = new ArrayBlockingQueue<FutureTask>(numGPUs);
// create a list of GPU monitors. This class will loop and wait for GPU jobs
ArrayList<GPUQueueMonitor> monitors = new ArrayList<GPUQueueMonitor>();
// for each GPU create and add a GPU monitor
for (int i = 0; i < numGPUs; i++) {
GPUQueueMonitor looper = new GPUQueueMonitor(queue, i + 1);
monitors.add(looper);
looper.start();
}
// create a loader, the loader doesn't do the work, but defines a lambda
// and puts the lambda in the queue
CellLoader loader = (SingleCellArrayImg img) -> {
FutureTask future = new FutureTask(() -> {
deconvolver.compute(imgF, img);
}, "finished");
queue.put(future); // blocks until space is free in queue
future.get(); // blocks until GPU thread has processed the job
};
// create a new image using the disk cache factory. Pass it the loader defined
// above
DiskCachedCellImg<FloatType, RandomAccessibleInterval<FloatType>> out = (DiskCachedCellImg) factory.create(
new long[] { imgF.dimension(0), imgF.dimension(1), imgF.dimension(2) }, new FloatType(), loader,
options().initializeCellsAsDirty(true));
My initial testing indicates this is working. Though more testing is needed, especially to confirm that the setGPUDevice
works as expected with java threads (that is device number is bound to each java thread).
One small issue. It seems that when creating a RandomAccess
to the DiskCachedImg
a synchronized access is done in GuardedStrongRefLoaderRemoverCache
. So the first cell is always processed serially, while the remaining cells are processed in parallel (assuming the loader
is triggered in parallel).
This is a work in progress. If anyone sees flaws or has suggestions let me know.
Thanks