Hello there, I have posted this question to other stats forums and haven’t got any responses, so I figured I would post here. Even if you could point me in the right direction or tell me I’m completely wrong, I would appreciate it immensely.
I have a repository full of old macro images of nuclear fuel rod pellets at varying points in an experiment. My goal is to run a particle analysis on these images and use the total count from the analysis to compute the porosity of a pellet. This is only a proof of concept, so there are some assumptions at play. Currently, I am getting the count and transforming it through this equation:

where area is the size of the image I have taken. The problem with these old images is that they have been stitched together, and there is significant white-space (that shouldn't be there) throughout the photograph. So I cannot possibly run Analyze Particles on the whole image at once.
I have thus resorted to randomly sampling from decent portions of the image to get an idea of the general count/area for the image. This is where my problem gets complicated.
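To make the sampling step concrete, here is a rough numpy sketch of how patches could be drawn while rejecting ones dominated by stitching white-space (the near-white cutoff of 250 and the 5% tolerance are assumptions for illustration, not values from my actual script):

```python
import numpy as np

def sample_patch(image, patch_size, rng, max_white_frac=0.05, max_tries=100):
    """Draw a random patch, rejecting ones dominated by stitching white-space."""
    h, w = image.shape
    ph, pw = patch_size
    for _ in range(max_tries):
        y = rng.integers(0, h - ph + 1)
        x = rng.integers(0, w - pw + 1)
        patch = image[y:y + ph, x:x + pw]
        # Treat near-saturated pixels as stitching white-space (assumed 8-bit)
        white_frac = np.mean(patch >= 250)
        if white_frac <= max_white_frac:
            return patch, (x, y)
    raise RuntimeError("no acceptable patch found")

rng = np.random.default_rng(0)
toy = rng.integers(0, 200, size=(500, 500), dtype=np.uint8)  # toy image, no white-space
patch, (x, y) = sample_patch(toy, (64, 64), rng)
```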
I have narrowed my thresholding method down to two local methods, Phansalkar and Sauvola. I have noticed that Sauvola gives me, on average, more conservative (read: lower) counts, while Phansalkar gives me more outrageous counts. I have also tried playing around with parameter 2 of both methods, with each setting generating different results.
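For reference, Sauvola's threshold is T = m · (1 + k · (s/r − 1)), where m and s are the local mean and standard deviation in a window around the pixel, and r (the dynamic range of the standard deviation) is what I believe ImageJ exposes as parameter 2 for Sauvola. A naive pure-numpy version, just to show how the parameters move the threshold:

```python
import numpy as np

def sauvola_threshold(img, window=15, k=0.2, r=128.0):
    """Per-pixel Sauvola threshold: T = m * (1 + k * (s / r - 1))."""
    pad = window // 2
    padded = np.pad(img.astype(float), pad, mode="reflect")
    h, w = img.shape
    T = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            win = padded[i:i + window, j:j + window]
            m, s = win.mean(), win.std()
            T[i, j] = m * (1 + k * (s / r - 1))
    return T

rng = np.random.default_rng(1)
patch = rng.integers(0, 256, size=(32, 32)).astype(float)

# Lowering r (or raising k) shifts the threshold, and hence the particle count
binary = patch > sauvola_threshold(patch, k=0.2, r=128.0)
```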
Currently, I have a script that follows this algorithm, generating about 3,500 observations for 1 macro image:

- Randomly sample an image from a specified (x, y) bound on the photograph; save the sample image.
- Threshold the image at the default Phansalkar settings.
- Analyze particles; save the results.
- Load the non-thresholded sample image.
- Threshold the image at the default Sauvola settings.
- Analyze particles; save the results.
- Repeat the process on the same image at varying parameter settings for Phansalkar and Sauvola: parameter 2 = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8] and [125, 126, 128], respectively.
- Repeat for 100 different sample images within the (x, y) bound, keeping the area constant.
- Repeat for 5 different (x, y) bounds.
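In pseudocode-ish Python, the bookkeeping for one macro image might look like the sketch below. The `count_particles` stub stands in for the ImageJ threshold + Analyze Particles step, and all concrete numbers (patch size, toy image) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def count_particles(patch, method, param2):
    """Stub for the ImageJ threshold + Analyze Particles step.

    The real pipeline thresholds `patch` with `method` at `param2` and
    counts blobs; this returns a dummy count so the loop is runnable.
    """
    return int(rng.poisson(25))

PARAM2 = {"Phansalkar": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
          "Sauvola": [125, 126, 128]}

image = rng.integers(0, 256, size=(200, 200))
observations = []
for bound in range(5):                     # 5 different (x, y) bounds
    for sample in range(100):              # 100 sample images per bound
        y, x = rng.integers(0, 136, size=2)
        patch = image[y:y + 64, x:x + 64]  # area held constant
        for method, values in PARAM2.items():
            for p2 in values:
                observations.append({
                    "bound": bound, "sample": sample,
                    "x": int(x), "y": int(y),
                    "method": method, "param2": p2,
                    "count": count_particles(patch, method, p2),
                })
```

Keeping bound, position, method, and parameter as columns like this is what makes the regression idea at the end possible.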
This process is just for 1 macro photo. I have several similar photos from varying time periods of the experiment, so I'm forecasting that, when all is said and done, I will have around 1 million observations (although the observations will be dependent, since I am using each sample image multiple times).
Currently, my distribution of counts looks something like this:

where the counts are bimodal, centering around ~18 and ~34 particles per sample image.
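One way to formalize the bimodality is a two-component Poisson mixture fit by EM; if the two fitted means line up with the two thresholding methods, the bimodality is methodological rather than physical. A small numpy sketch on toy data (the log y! term is the same for both components, so it cancels in the responsibilities and is omitted):

```python
import numpy as np

def poisson_mixture_em(y, n_iter=300):
    """Fit a two-component Poisson mixture by EM; returns (means, weight)."""
    y = np.asarray(y, dtype=float)
    lam = np.array([np.percentile(y, 25), np.percentile(y, 75)])  # crude start
    pi = 0.5
    for _ in range(n_iter):
        # E-step: responsibilities, on the log scale for numerical stability
        log_r = np.stack([np.log(pi) + y * np.log(lam[0]) - lam[0],
                          np.log(1.0 - pi) + y * np.log(lam[1]) - lam[1]], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means and mixing weight
        lam = (r * y[:, None]).sum(axis=0) / r.sum(axis=0)
        pi = r[:, 0].mean()
    return lam, pi

rng = np.random.default_rng(3)
counts = np.concatenate([rng.poisson(18, 400), rng.poisson(34, 400)])  # toy bimodal data
lam, pi = poisson_mixture_em(counts)
```

On the real data I would fit this separately per method and parameter setting to see whether the modes track the threshold settings.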
My goal is to determine the most likely count of particles so that I can compute particle density and use credible intervals (or confidence intervals) in a regression at some point in the future.
I am not much of an image processor, and I am only a statistician in training. My gut tells me that we need to model not only the uncertainty of the particle count at different positions in the photo (representing different positions on the fuel pellet), but also the uncertainty of the measurements given by ImageJ.
Am I taking this too far? Should I just pick a thresholding parameter and stick with it? I could just average the counts and get a point estimate, but I need a way to convey uncertainty about that estimate.
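If I did go the simple route, a nonparametric bootstrap would at least attach an interval to the averaged count, though resampling rows independently ignores the dependence between observations from the same sample image (a block bootstrap over images would be more honest). A sketch on toy counts:

```python
import numpy as np

rng = np.random.default_rng(4)
counts = np.concatenate([rng.poisson(18, 300), rng.poisson(34, 300)])  # toy stand-in

# Percentile bootstrap CI for the mean count
boot_means = np.array([
    rng.choice(counts, size=counts.size, replace=True).mean()
    for _ in range(2000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
```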
My idea was to use a Poisson or negative binomial regression model to control for position, parameter, and method, but I am so steeped in observations that I feel I have lost sight of what I'm really trying to do: determine the most likely count of particles at any given place on the fuel pellet.
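To sketch that regression idea, here is a minimal Poisson regression (log link) fit by IRLS in plain numpy, on toy data with a single method indicator. A real analysis would use a GLM package, add position and parameter terms, and probably prefer the negative binomial to absorb overdispersion:

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    """Poisson regression (log link) via iteratively reweighted least squares."""
    # Start from a least-squares fit on log(y + 0.5) so Newton steps converge fast
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        # Newton step: (X' W X)^{-1} X' (y - mu), with W = diag(mu) for Poisson
        beta += np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
    return beta

rng = np.random.default_rng(5)
method = rng.integers(0, 2, size=1000)             # 0 = Sauvola, 1 = Phansalkar (toy coding)
y = rng.poisson(np.where(method == 1, 34.0, 18.0))
X = np.column_stack([np.ones(1000), method])
beta = poisson_irls(X, y.astype(float))
rates = np.exp(beta)  # rates[0] ~ baseline count, rates[1] ~ rate ratio between methods
```

The exponentiated coefficients then read directly as a baseline count and multiplicative effects of method, parameter, and position.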
I would really appreciate any insight, advice, or industry standards here. I’m sure this problem (uncertainty in thresholding) has been thought about before.