Real-time processing

Hi, further to the question I asked a couple of days ago, I’ve realised that saving an image stack file and analysing it after the experiment will be unnecessarily cumbersome, as I need >>100,000 images per experiment. When I tried this, I ran into problems with the circular buffer filling up, which crashes the acquisition. I had set the circular buffer size to 10GB, but it didn’t seem to allocate that much memory, as it crashed before reaching this size despite the computer having 32GB of RAM.

To reduce the memory footprint, I want to collect images (roughly 64x64 16-bit px) at ~1600 fps and process them to find the centre of mass. I only need to record the centre of mass and a precise time (which can be calculated as Nico explained in my previous question), so each image can be discarded after the centre of mass calculation. ImageJ can calculate the centre of mass with the Measure tool, but I’ve not been able to find the source code which performs this action; the closest I’ve come is Analyzer.java.
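
For clarity, what I mean by centre of mass is the usual intensity-weighted definition, with $I(x,y)$ the pixel value:

$$x_c = \frac{\sum_{x,y} x\, I(x,y)}{\sum_{x,y} I(x,y)}, \qquad y_c = \frac{\sum_{x,y} y\, I(x,y)}{\sum_{x,y} I(x,y)}$$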

Is it possible to process the images from the circular buffer and then discard them? How can I ensure this happens fast enough that the buffer does not overflow? Otherwise, how can I troubleshoot the buffer not being allocated at the size I set?

Hi Will,

To analyse in Java, you need to remove the images from the circular buffer. For your case you likely just need a Beanshell script, which will be able to use ImageJ’s functions and plugins. Have you tried the record function?

Secondly, did you make sure that ImageJ also has its memory limit set to >10GB? The circular buffer data is copied from C++ land to Java land, where it counts against ImageJ’s memory, so ImageJ also needs to have that set correctly.

Hope this helps,
Pedro

It would be a bit of work, but if you’re on MM2 you could create a pipeline plugin that runs in parallel with acquisition. Essentially you still stream the images to disk normally, and the pipeline reads and processes images from the datastore whenever it can; see the sketch below.
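
Very roughly, the reading side could look something like this (a Beanshell sketch only, assuming the MM2 AcquisitionManager and Datastore APIs; the tail of the acquisition and error handling are omitted):

// Stream to disk via a normal acquisition, processing frames as they appear
store = mm.acquisitions().runAcquisitionNonblocking();
builder = mm.data().getCoordsBuilder().z(0).channel(0).stagePosition(0);
int processed = 0;
while (mm.acquisitions().isAcquisitionRunning()) {
	coord = builder.time(processed).build();
	if (store.hasImage(coord)) {
		img = store.getImage(coord);
		// centre-of-mass calculation here; the image itself stays in the datastore
		processed++;
	}
}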

Thanks for your answer. I did ensure the Java memory was large enough (ImageJ -> Edit -> Options -> Memory and threads) - 10GB for this, and also 10GB for the circular buffer/sequence buffer (to use the term from the Micro-Manager documentation). The total system memory is 32GB.

During the acquisition, I have seen the memory usage for the circular buffer increasing linearly with time, which suggests it is not removing the images from the buffer, probably because of the high data rate (at 1600 fps, 16-bit 64x64 ROI frames are 8kB each, i.e. ~13MB/s).

If I use a Beanshell script, Micro-Manager will still need to run the acquisition and copy the images from C++ land to Java land, right? Does this mean I need to fix my acquisition problem before I move on?

I have not used the record function - is it there to speed up the process of writing ImageJ macros?

I’m in the lab right now, trying this acquisition again. I ensured that both the ImageJ memory and the sequence buffer memory were set to 10GB; however, it filled the sequence buffer within 30,000 images. The preview window says each image is 7kB, so I calculate that the sequence buffer has only been allocated ~200MB. The acquisition is now running very slowly, with the sequence buffer usage close to 100%. I’ve attached two pictures with both the sequence buffer monitor and the ImageJ memory monitor visible: one from early in the acquisition while it was running at the correct speed, and one from later while the acquisition was running very slowly.

After I took the second screenshot, I checked the system memory usage, which was 11GB. It is now up to 13GB, but I would expect Micro-Manager to request memory for the whole sequence buffer before starting an acquisition, which means it is definitely not using the 10GB I have told it to.

Should I submit an issue on GitHub about Micro-Manager not allocating the sequence buffer size I have set?

Hi @Will_Hardiman,

I am not absolutely sure what is happening, but I suspect that you are being bitten by the metadata handling in MM. I ran into related issues when setting up a system to acquire images at 1 kfps with a Ximea camera, and I remember changing the display in 2.0 so that it no longer tries to show every image, only the most recent one (which made for a significant UI improvement).
One good test is to reproduce the problem with the DemoCamera. You can set its image size and exposure time to match your camera, and, to make sure that generating the images does not take CPU time, set the “FastImage” property to “1”. I suspect that you will see the same outcome; if not, the root problem lies in your camera’s device adapter code.
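
For instance, something along these lines from the script panel (a sketch, assuming the demo config’s camera is labelled “Camera”; the size property names are from the DemoCamera adapter):

mmc.setProperty("Camera", "OnCameraCCDXSize", "64"); // shrink the sensor to the real ROI
mmc.setProperty("Camera", "OnCameraCCDYSize", "64");
mmc.setProperty("Camera", "FastImage", "1"); // reuse the same image; no CPU spent generating frames
mmc.setExposure(0.625); // 1/1600 s, to mimic 1600 fps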

I cannot remember exactly how the metadata are handled in the source code (@marktsuchida, if you recall, can you explain?), but I do know that the metadata coming out of the Core contain the complete state of the system, so somewhere along the way a complete copy of the cached system state is inserted (which adds a lot of overhead that is detrimental to your experiment). The best solution is likely to run the whole thing in a script, something along the lines of:

n = 1000;
mmc.startSequenceAcquisition(n, 0.0, true);
for (int i = 0; i < n && mmc.isSequenceRunning(); i++) {
	while (mmc.getRemainingImageCount() == 0) {;} // busy-wait for the next frame
	img = mmc.popNextImage(); // pops the image off the circular buffer, freeing its slot
	// get centroid here and log
}
if (mmc.isSequenceRunning()) {
	mm.scripter().println("Something not right");
} else {
	mm.scripter().println("Done");
}

Acquisition speed

Thanks for your answer, very helpful! It seems you were correct about the metadata - I had the same problem using multi-D acquisition with the test camera; however, using the script panel I am able to acquire at full speed with either the test camera or my actual camera.

For processing the images, I think I will have to learn to use the API through MATLAB, as I have a MATLAB script that processes about 3000x faster than what I wrote in Beanshell. In Beanshell I have an object “mmc”, which is the Micro-Manager Core, and an object “mm”, which is the Micro-Manager Studio. In MATLAB I can do

import mmcorej.*;
mmc = CMMCore;

to access the Core. How do I create the Micro-Manager Studio object? I believe I need it to access the setROI function.

Sequence buffer

Now that I can acquire the images, I still have the problem of the sequence buffer not being allocated properly. I use the Sequence Buffer Monitor tool to see how many frames there is space for, and I have tested acquisitions running past this number of frames to confirm that the acquisition does halt when the tool says the buffer is full.

With an ROI of 64x64px (8kB per frame), I have tried setting the sequence buffer to 10GB, 20GB, or 1GB, but each time it allocates enough for 100,000 frames. If I set it to 100MB, it allocates enough for 12,800 frames. When I reset the ROI to the full field of view (1920x1080px, ~4MB per frame), the sequence buffer allocation scales linearly with my setting up to 20GB (e.g. 10GB shows space for 2,528 frames, 20GB shows space for 5,056 frames).

By my reckoning, the memory required for the number of frames it says it can fit is less than the memory I tell it to allocate. It does, however, seem to scale the allocation with my setting up to a maximum of 100,000 frames, beyond which a bug prevents it from either allocating more or using the memory correctly.
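
For anyone reproducing this, the script-panel equivalent of what I am doing through the GUI should be something like the following (if I have the Core calls right):

mmc.setCircularBufferMemoryFootprint(10240); // request 10 GB; the argument is in MB
mmc.initializeCircularBuffer();
print("Capacity: " + mmc.getBufferTotalCapacity() + " frames");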

Should I raise an issue on GitHub regarding the sequence buffer?

Can you share the Beanshell code you used to find the center of mass? There is no reason that Java code would be slower than MATLAB (in fact, it is usually the other way around), but I can imagine that iterating over pixels in Beanshell is slow. I am happy to look for a faster alternative that is already on your classpath (you have the ImageJ, imglib2, and BoofCV image processing libraries available). Going the MATLAB route will make things difficult (based on questions by others), and impossible to share with non-MATLAB users.

As for the circular buffer size, it looks like you found a hardcoded limit defined in MMCore/CircularBuffer.cpp:

const unsigned long maxCBSize = 100000;    //a reasonable limit to circular buffer size

// set a reasonable limit to circular buffer capacity 
if (cbSize > maxCBSize)
     cbSize = maxCBSize;

I am not sure what the reasons behind this limit were. It would indeed be good to open a ticket on GitHub. Changing this number (or removing it) is easy enough; the main question is whether there should be a limit at all, and if so, what it should be.

Thanks for the assistance! I’ve opened a GitHub issue regarding the circular buffer; hopefully I’ve worded it in a way that is persuasive and detailed.

The code I wrote to find the centre of mass is below. It’s fairly similar to the code used by ShortStatistics within ImageJ, but, as you say, iterating over pixels is slow. I’ve not tested whether ImageJ’s implementation is faster, though.

double XSum = 0; // use doubles so the final division is not integer division
double YSum = 0;
double PSum = 0;
for (int row = 0; row < Size[1]; row++)
{
	for (int col = 0; col < Size[0]; col++)
	{
		int v = img[row * Size[0] + col] & 0xffff; // unsigned 16-bit pixel value
		XSum += col * v; // col indexes x, row indexes y
		YSum += row * v;
		PSum += v;
	}
}
Centres[i][0] = XSum/PSum;
Centres[i][1] = YSum/PSum;

You say the imglib2 and BoofCV libraries are available on my classpath - how do I access them?

Also, I’m going to need to write out the processed data, and I’ve not yet found a way to do so. Should I use Beanshell natively, or a function from ImageJ/elsewhere on the classpath?

Edit: I’ve found how to save an object using a FileOutputStream, but I would need to convert my float array to a byte array.

Edit2: I think I’ve sorted writing the (now double instead of float) array to disk and loading it in MATLAB.

Iterating in Beanshell is slow, since Beanshell itself is doing a lot of work rather than running optimized compiled code. I would expect it to be much faster to call compiled code than to run the loop in Beanshell itself. I wrote the following script, which calls ImageJ’s ShortStatistics class; it should be orders of magnitude faster than the Beanshell code:

import ij.process.ShortStatistics;
import ij.process.ImageStatistics;

// Grab a test image from the active display's datastore
dv = mm.displays().getActiveDataViewer();
ds = dv.getDatastore();
cb = mm.data().getCoordsBuilder();
coord = cb.c(0).t(0).p(0).z(0).build();
// for a tagged image, use instead: img = mm.data().convertTaggedImage(taggedImg);
img = ds.getImage(coord);
ip = mm.data().ij().createProcessor(img);
ss = new ShortStatistics(ip, ij.process.ImageStatistics.CENTROID, null);
mm.scripter().message("Centroid. x: " + ss.xCentroid + ", y: " + ss.yCentroid);

Opening a file and writing to it from Beanshell can be done with boilerplate Java, along the lines of the sketch below.
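
A minimal sketch, assuming XCentres is a double[] of results and the file name is arbitrary (DataOutputStream writes big-endian IEEE doubles, which MATLAB can read with fopen(path, 'r', 'ieee-be') followed by fread(fid, 'double')):

import java.io.*;

dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream("centres.dat")));
for (int i = 0; i < XCentres.length; i++) {
	dos.writeDouble(XCentres[i]); // 8 bytes per value, big-endian
}
dos.close();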

That makes sense. I’ve adapted what you wrote, and it can now process each frame in ~0.8ms, a 40x improvement; that puts processing at ~70% of the acquisition speed. I wasn’t able to make it work using the datastore, but I don’t want to save the images anyway, just the centre coordinates.

Would there be a significant speed improvement if I used popNextImage instead of popNextTaggedImage? I tried to use it, but didn’t have the metadata needed to create the Image object.

// Start by finding and entering bead location in px
//                { X  , Y   };
Centre = new int[]{ 960, 470 };
n = 1600; // number of frames
Size = new int[]{ 64, 64};

// Import processors for finding CofM
import ij.process.ShortStatistics;
import ij.process.ImageStatistics;

// Allocate some variables
XCentres = new double[n];
YCentres = new double[n];
// Set the ROI
java.awt.Rectangle ROI = new Rectangle(Centre[0]-Size[0]/2, Centre[1]-Size[1]/2, Size[0], Size[1]);
mm.setROI(ROI);
// Set exposure
mmc.setExposure(0.2);
// Start acquiring and process frames as they arrive
startTime = System.nanoTime();
mmc.startSequenceAcquisition(n, 0.0, true);
for (int i = 0; i < n; i++) {
	while (mmc.getRemainingImageCount() == 0) {;} // While there are no frames, wait
	if (i == 0 || i == n/2) {print("T = " + (System.nanoTime() - startTime)/1e6 + "ms");}
	tagged = mmc.popNextTaggedImage();
	img = mm.data().convertTaggedImage(tagged);
	ip = mm.data().ij().createProcessor(img);
	ss = new ShortStatistics(ip, ij.process.ImageStatistics.CENTER_OF_MASS, null);
	XCentres[i] = ss.xCenterOfMass;
	YCentres[i] = ss.yCenterOfMass;
}
print("T = " + (System.nanoTime() - startTime)/1e6 + "ms");
print(mmc.getRemainingImageCount());
mmc.stopSequenceAcquisition();
if (mmc.isSequenceRunning()) {
	print("Something not right");
} else {
	print("Done");
}

Great!

Yes, the datastore was only there as an easy way for me to get an image into the code.

I do believe that you will get some speed up using popNextImage() rather than popNextTaggedImage(). The trick then becomes how to convert that pixel buffer into an ImageJ ImageProcessor. If you know that it is a ShortProcessor, that is actually quite easy. The following runs for me, and I am curious what timings you get on your system:

// Import processors for finding CofM
import ij.process.ShortStatistics;
import ij.process.ImageStatistics;
import ij.process.ShortProcessor;

// Allocate some variables
n = 1600; // number of frames to acquire (needed by the arrays and startSequenceAcquisition below)
XCentres = new double[n];
YCentres = new double[n];

// Set exposure
mmc.setExposure(0.2);
// Start acquiring and process frames as they arrive
startTime = System.nanoTime();
mmc.startSequenceAcquisition(n, 0.0, true);
int i = 0;
int width = mmc.getImageWidth();
int height = mmc.getImageHeight();
ip = new ShortProcessor(width, height);
while (mmc.getRemainingImageCount() > 0 || mmc.isSequenceRunning(mmc.getCameraDevice())) {
	if (mmc.getRemainingImageCount() > 0) {
		//if (i == 0 || i == n/2) {print("T = " + (System.nanoTime() - startTime)/1e6 + "ms");}
		pixels = mmc.popNextImage();
		ip.setPixels(pixels);
		ss = new ShortStatistics(ip, ij.process.ImageStatistics.CENTER_OF_MASS, null);
		XCentres[i] = ss.xCenterOfMass;
		YCentres[i] = ss.yCenterOfMass;
		i++;
	}
}
mm.scripter().message("Duration: " + (System.nanoTime() - startTime)/1e6 + "ms");
print(mmc.getRemainingImageCount());
mmc.stopSequenceAcquisition();
mm.scripter().message("Done");

“Some speed up” - it now processes quicker than it acquires, meaning the buffer never fills. Running the processing at the same time as the acquisition takes 60.9s for 100k images; if I instead acquire the images and then process them, it takes 60.9s to acquire plus 9.5s to process.

I’ve realised it would be useful for me to display an image every few seconds so I can see that I still have a bead held in the optical trap. I’ve implemented this using a RAM datastore, and scripted it to automatically save the datastore at the end of the acquisition. This hasn’t impacted the time it takes to run a 100k image acquisition.

If you wish to include my script with the others in the github repo or on micro-manager.org, please do. Thanks again for all the help!

Edit: I realised I had forgotten to calculate timings. Since I pop a tagged image every saveInterval frames, I record the camera timestamp (“ElapsedTime-ms”) of the first and last tagged images and store the average time per frame across that interval. It’s not perfect, but it’s a good start.

/**
 * Using sequence acquisition and real-time processing to track the location of a particle
 */
// You need to manually create a folder, then write the path here
folderPath = "E:/Will/beads/2020_11_09_2um/";
// Find and enter bead location in px
//                { X  , Y   };
Centre = new int[]{ 960, 470 };
int n = 10001; // number of frames
Size = new int[]{ 64, 64}; // Image size in px - choosing more than 64 rows will slow acquisition

// Import processors for finding CofM
import ij.process.ShortStatistics;
import ij.process.ImageStatistics;
import ij.process.ShortProcessor;

// Define two functions for saving the data
byte[] doubleToByteArray( double[] i, int len )
{ // Converts double array to byte array. Needs to be given the length of the array
 ByteArrayOutputStream bos = new ByteArrayOutputStream();
 DataOutputStream dos = new DataOutputStream(bos);
 for (int idx = 0; idx < len; idx++){
 	dos.writeDouble(i[idx]);
 }
 dos.flush();
 return bos.toByteArray();
}

void writeDoubleArrayToFile(double[] array, int len, String path)
{ // Writes double array to file as byte array. Needs to be given the length of the array
file = new File(path);
if (!file.exists()){
	file.createNewFile(); // (new FileOutputStream(file) would also create it)
}
fos = new FileOutputStream(file);
bytesArray = doubleToByteArray(array, len);
fos.write(bytesArray);
fos.flush();
fos.close();
}
// Allocate some variables
double XSum, YSum, PSum;
XCentres = new double[n];
YCentres = new double[n];
Times = new double[n];
int i = 0;
int iSave = 0;
int saveInterval = 10000; // How often to save an image (in frames)
double firstTime = 0, lastTime = 0;
java.awt.Rectangle ROI = new Rectangle(Centre[0]-Size[0]/2, Centre[1]-Size[1]/2, Size[0], Size[1]);
// Check saveInterval is sensible compared to n
if (saveInterval >= n){
	print("number of frames, n = " + n + " saveInterval = " + saveInterval);
	throw new Exception("saveInterval must be less than number of frames otherwise timing calculation doesn't work");
}
// Set the ROI and check it has been set
mm.setROI(ROI);
int width = mmc.getImageWidth();
int height = mmc.getImageHeight();
print("w = " + width + " h = " + height);
// Set exposure
mmc.setExposure(0.2);
// Prepare an image processor
ip = new ShortProcessor(width, height);
// Create a Datastore for the images to be stored in, in RAM.
store = mm.data().createRAMDatastore();
saveMode = org.micromanager.data.Datastore.SaveMode.valueOf("MULTIPAGE_TIFF");
// Create a display to show images as they are acquired.
mm.displays().createDisplay(store);
// Set up a Coords.CoordsBuilder for applying coordinates to tagged images.
builder = mm.data().getCoordsBuilder().z(0).channel(0).stagePosition(0);
// Start acquiring and process frames as they arrive
mmc.stopSequenceAcquisition(); // Stop previous acquisition in case it is running
startTime = System.nanoTime();
mmc.startSequenceAcquisition(n, 0.0, true);
print("T = " + (System.nanoTime() - startTime)/1e6 + "ms");
img = null; // will hold the most recent tagged image (for its timestamps)
while (mmc.getRemainingImageCount() > 0 || mmc.isSequenceRunning(mmc.getCameraDevice())) {
   if (mmc.getRemainingImageCount() > 0) 
   {
   	// Get the next frame and calculate the centre of mass
   	// (pixels is assigned in both branches below, so no pre-allocation is needed)
   	if (i%saveInterval == 0)
   	{ // Every [saveInterval] frames, get the metadata and display the image
   		img = mmc.popNextTaggedImage();
   		image = mm.data().convertTaggedImage(img,
         	builder.time(iSave).build(), null);
      	store.putImage(image);
      	pixels = image.getRawPixels();
      	iSave++;
      	if (i == 0) {firstTime = img.tags.getDouble("ElapsedTime-ms");}
   	} else
   	{ // The rest of the time, just get the pixel data
			pixels = mmc.popNextImage();
   	}
		ip.setPixels(pixels);
		ss = new ShortStatistics(ip, ij.process.ImageStatistics.CENTER_OF_MASS, null);
		XCentres[i] = ss.xCenterOfMass;
		YCentres[i] = ss.yCenterOfMass;
		if (i%saveInterval == 0) // Every [saveInterval] frames, print this frame's info
		{
			print("i " + i + " X = " + XCentres[i] + " Y = " + YCentres[i]); 
			print("Remaining: " + mmc.getRemainingImageCount());
			print("Acquired: " + (i + mmc.getRemainingImageCount()));
			print("T = " + (System.nanoTime() - startTime)/1e6 + "ms");
		}
		i++;
   }
}
lastTime = img.tags.getDouble("ElapsedTime-ms");
if (iSave > 1) {
	frameTime = (lastTime - firstTime)/((iSave-1)*saveInterval);
	for (int i = 0; i < n; i++){
		Times[i] = i * frameTime;}
}
print("Completed at " + (System.nanoTime() - startTime)/1e9 + "s");
mmc.stopSequenceAcquisition();
writeDoubleArrayToFile(XCentres, n, folderPath + "XCentres.dat");
writeDoubleArrayToFile(YCentres, n, folderPath + "YCentres.dat");
writeDoubleArrayToFile(Times, n, folderPath + "Times.dat");
store.save(saveMode, folderPath+"images_and_metadata");

Awesome!

I think this clearly demonstrates that MM is quite capable of going >1 kfps as long as the metadata are discarded (i.e. metadata handling probably takes ~1ms per image). We clearly have work to do on optimizing metadata processing.

For now, having your script here is good enough. If you do not mind, I may add it to the repository. I am also thinking about the upcoming workshop “Scripting Micro-Manager”; this could be a very cool example. Do you have a dataset somewhere that I could use for that purpose (it does not have to be all 100,000 ;)?

You’re welcome to use my script as you see fit - it only works because of the help I’ve received from you, and if I can give back to the community it’s good to do so.

I do have such a dataset; this one has ~25k images, as well as the metadata, as I saved it a few weeks ago using Multi-D Acquisition. The pixel size was saved wrongly; it should be 0.07um/px. Also, the timing data is subject to the points you raised here.

I’m uploading the data to my Google Drive, it should be available once it’s ready: https://drive.google.com/drive/folders/1kDiJIGjDvHf3YdO-oVhMQLAvLQRS_LJL?usp=sharing

I’ve now acquired a few datasets using this script, and I’m comparing the position data with the few frames (1 in 10k) which I keep. The centre of mass function from ImageJ just doesn’t find the centres accurately.

I can recreate this behaviour with my own centre of mass algorithm in MATLAB, and I find that if I cast to double, normalise, then square the pixel values, the algorithm becomes accurate. That this simple background suppression works suggests that ImageJ’s centre of mass calculation is being skewed by the background; a sketch of the working method is below.
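
In Beanshell terms, the MATLAB processing I describe is roughly the following (my own variable names, assuming pixels, width, and height as in the earlier scripts):

// Normalise pixel values to [0, 1], then square them to suppress the background
double min = 65535, max = 0;
for (int k = 0; k < width * height; k++) {
	double v = pixels[k] & 0xffff; // unsigned 16-bit value
	if (v < min) { min = v; }
	if (v > max) { max = v; }
}
double xSum = 0, ySum = 0, pSum = 0;
for (int row = 0; row < height; row++) {
	for (int col = 0; col < width; col++) {
		double v = ((pixels[row * width + col] & 0xffff) - min) / (max - min);
		v = v * v; // squaring weights bright bead pixels far more than the background
		xSum += col * v;
		ySum += row * v;
		pSum += v;
	}
}
double xCentre = xSum / pSum;
double yCentre = ySum / pSum;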

To overcome this, I could use phase contrast microscopy with my existing code, but I don’t think I have phase rings for the microscope. Instead, I intend to use the thresholding/masking which is built into the ImageProcessor/ImageStatistics classes. Do you have any advice on how to use these?

If the method you describe works, it sounds simplest to code it up in Java (possibly as a static method) and get it onto the classpath by adding it as a jar to the ImageJ plugins directory (ImageJ loads jars in this directory at runtime and puts them on the classpath; there may be specific rules that it checks for, so look into what is needed for an ImageJ plugin). Or you can try the route you describe (I would need to dig into the particulars, just like you).

I decided to start with the thresholding method because I’m not confident writing Java. I’ve written the code below, but it doesn’t seem to do anything different from when I don’t set the threshold.

The method is: get the pixel values, set the ImageProcessor to the new pixels, set the ImageProcessor to auto-threshold, then create a ShortStatistics to find the centre of mass. The ImageProcessor seems to be able to calculate threshold limits, but these are not passed through to the ShortStatistics. I think the cause is that the ImageProcessor fails to return a mask array, instead returning null.

I’m going to make a general ImageJ post to try and get help from someone with specific ImageJ experience as that seems to be where the issue lies. Thanks for all your help thus far!

pixels = mmc.popNextImage();
ip.setPixels(pixels);
ip.setAutoThreshold("Otsu dark");
ss = new ShortStatistics(ip, ij.process.ImageStatistics.CENTER_OF_MASS, null);

a = ip.getMaskArray();
print(a); // null
print("IP thresh " + ip.minThreshold + " " ip.maxThreshold); // Prints limits with sensible values
print("SS thresh " + ss.lowerThreshold + " " + ss.upperThreshold); // Prints NaN for both limits

Hi Will,

I am very much with you on not wishing to deal with the complexities of Java image processing libraries (including ImageJ). Since you know what you want to do, but writing loops in Beanshell is too slow, I would write your analysis directly in Java, have it included on the classpath (by placing it in the ImageJ plugins directory), and then call that function from Beanshell.

So, the function could look something like:

package org.will.hardiman;

import java.awt.geom.Point2D;

public class SimpleImageMath {
   public static Point2D.Double getCentroid(Object pixels, int width, int height) {
      short[] px = (short[]) pixels; // raw buffer from popNextImage on a 16-bit camera
      // do your calculations with px here and return the result
      return new Point2D.Double(0.0, 0.0); // placeholder
   }
}

and then in Beanshell do:

import org.will.hardiman.SimpleImageMath;

centroid = SimpleImageMath.getCentroid(pixels, width, height);

That way you can do the exact calculations that you know to work, and by doing them in Java rather than Beanshell it should be very fast.