Background thresholding behaves unexpectedly

I was trying to use the Background Per Object thresholding method in an IdentifyPrimaryObjects module, based on a cropped image using objects from a previous IdentifyPrimaryObjects module. The first identified yeast cells, and the current module was to find the DAPI-stained nuclei inside each cell. There was one cell with a particularly dim and small nucleus that I noticed wasn’t being picked up, despite being a blob of 0.02-0.05 intensities in a sea of 0-0.02. It will only pick it up if I set the threshold modifier to ~0.1.

Wondering why it would do this, I looked into the source code, which I assume is up-to-date (svn.broadinstitute.org/CellProf … reshold.py) in the get_background_threshold procedure and discovered that it wasn’t behaving as documented. The documentation says that it clips above 0.95. However, what it actually does is clip both above 0.98 and below 0.02 by setting them to 0. I couldn’t find the code for the ndimage.histogram function, but I assume it creates exclusive minimum/inclusive maximum bins, so that these “0” values are then discarded as they don’t fall within any of the bins. But my poor dim cell needs those lower values :smile: .

Another part of this code that I wasn’t sure about is how the cutoff was calculated. I would have assumed you would want to find the maximum value of the bin containing the mode, which would be “(index + 1) / nbins” (assuming the histogram is a 0-indexed array). However, the cutoff is currently given by “index / (nbins - 1)”, which returns a value slightly lower. Is this intentional?

I also see that it is limiting itself to 256 bins, which seems to presuppose 8-bit images? I have 12-bit images, so if the threshold determination is only done with 8-bit color resolution, there’s a danger of it over- or under-shooting objects in 12-bit images.

Finally (whew!), I see that the real difference between Background and Robust Background is that the former ostensibly uses “2 * Mode”, while the latter uses “Mean + 2 * SD”. In that case, assuming it is decided to change Background’s cutoff to just be the upper 5%, why not make the lower cutoff in Robust Background be a user-editable parameter?

I’ve attached a paired DIC and DAPI image with the pipeline I’ve been using and a reference JPEG indicating the cell I have been focusing on, in case you want to try it yourselves.

-Nick

EDIT: Corrected wrong variable name.
BackgroundTest.7z (1.39 MB)

Well, first of all, congratulations for making it through my code. I hope it wasn’t too painful.

The code only works if the image intensities are normalized to the range 0-1 (or if almost all pixel intensities fall between .02 and .98). Your case is perfectly valid and should be made to work; the edged image has markedly lower intensities than the original image. I am planning to change the code to calculate the minimum and maximum intensities in the image and cut off below min + .02 * (max-min) and above min + .98 * (max - min). Also, the histogram should be scaled to the minimum and maximum values within the image (+/- the .02 on the edges) in order to capture the widest amount of dynamic range. We’ll change the documentation accordingly.

Regarding the cutoff, the index can range from 0 to 255 (it can’t make it to 256), so 255/255 is the maximum and the subtraction is correct.

I believe that we use 255 bins because there is a tradeoff between number of counts per bin and the number of bins. The choice of bin will have less statistical significance as the number of bins increases, but, as you point out, the possible accuracy is lower with a lower number of bins.

If you want a work-around, you might insert a RescaleImage module between EnhanceEdges and ApplyThreshold which would boost the intensity so that the background method would work properly.

I’ll post here when I’ve modified threshold.py.

–Lee

Thanks for the explanation! Scaling the thresholds sounds like a good idea.

Forgot to post after I updated threshold.py - you should be able to review the changes now. They weren’t merged into the current release, but you’ll pick them up if you’re running from source.

http://crucible.broadinstitute.org/browse/CellProfiler/trunk/CellProfiler/cellprofiler/cpmath/threshold.py?r=10248

Took me a bit to get the run-from-source working. That change does help, but it didn’t entirely fix it because it still sets an arbitrary minimum cutoff at ~0.02. (If cutoff = 0, (0 + 0.02) / 1.04 = 0.19).

Additionally, I believe I was right about obtaining the cutoff from the histogram. The ndimage.histogram(array, min, max, nbins) function generates a 0-indexed array containing nbins elements, with the minimum value for each bin given by the formula “(index / nbins) * (max - min) + min”. For ndimage.histogram(array, 0, 1, 256), Bin 0 contains values from 0 to (but not including) 0.00390625 and Bin 255 contains 0.99609375 to (but not including) 1.0. The only remaining decision is whether it is better to go with a value at or below the mode (index) or a value above the mode (index + 1).

I believe the following revision of the get_background_threshold code works using the latter choice and uses robust_min as the minimum cutoff:

[code]def get_background_threshold(image, mask = None):
""“Get threshold based on the mode of the image
The threshold is calculated by calculating the mode and multiplying by
2 (an arbitrary empirical factor). The user will presumably adjust the
multiplication factor as needed.”""
cropped_image = np.array(image.flat) if mask is None else image[mask]
if np.product(cropped_image.shape)==0:
return 0
img_min = np.min(cropped_image)
img_max = np.max(cropped_image)
if img_min == img_max:
return cropped_image[0]

# Only do the histogram between values a bit removed from saturation
robust_min = 0.02 * (img_max - img_min) + img_min
robust_max = 0.98 * (img_max - img_min) + img_min
nbins = 256
cropped_image = cropped_image[np.logical_and(cropped_image > robust_min,
                                             cropped_image < robust_max)]
if len(cropped_image) == 0:
    return robust_min

h = scipy.ndimage.histogram(cropped_image, robust_min, robust_max, nbins)
index = np.argmax(h)
cutoff = float(index + 1) / float(nbins) * (robust_max - robust_min) + robust_min

return cutoff * 2[/code]

-Nick