Meaning of distance_threshold variable in PixelSpotDecoder

When looking at the PixelSpotDecoder example (essentially processing MERFISH data), how was the distance threshold value calculated? I would have expected it to be slightly above 1, since a code with one error would have a distance of 1 from a known codeword, and should be assignable to that code. But a threshold of 0.5176 would exclude a code with a single error.

Hi, great question! I believe a heuristic method was used to determine the threshold value in this example. For every spot, the distance to the nearest codeword was calculated, the distances were plotted in a histogram, and a threshold was chosen from that distribution. The threshold was then validated by comparing the decoded results to those from the MERFISH group.
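As a rough illustration of that heuristic, here is a minimal NumPy sketch. It is not the code used for the example: the function name `nearest_codeword_distances`, the two toy 4-bit codewords, and the noise level are all made up, and I am assuming L2 normalization of both the traces and the codewords.

```python
import numpy as np

def nearest_codeword_distances(intensities, codebook):
    """Distance from each L2-normalized trace to its nearest L2-normalized codeword."""
    def unit(x):
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        # Leave all-zero rows as zeros instead of dividing by zero.
        return x / np.where(norms == 0, 1.0, norms)

    traces = unit(np.asarray(intensities, dtype=float))
    codes = unit(np.asarray(codebook, dtype=float))
    # Pairwise Euclidean distances, shape (n_traces, n_codewords).
    dists = np.linalg.norm(traces[:, None, :] - codes[None, :, :], axis=2)
    return dists.min(axis=1)

# Toy data (made up for illustration): two 4-bit codewords and noisy traces.
rng = np.random.default_rng(0)
codebook = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])
clean = codebook[rng.integers(0, 2, size=200)]
noisy = np.clip(clean + rng.normal(0, 0.2, size=clean.shape), 0, None)

distances = nearest_codeword_distances(noisy, codebook)
# Histogram of distances; a threshold would be picked by inspecting this.
counts, bin_edges = np.histogram(distances, bins=20, range=(0.0, np.sqrt(2)))
```

Plotting `counts` against `bin_edges` (for example with matplotlib) gives the kind of histogram described above. With real data you would look for the point where the well-matched population tails off.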

Your reasoning makes sense, and if you have some MERFISH data you can try it. But I think the threshold may end up being less than 1 because the spot intensity vector and the codeword vector are both normalized before the Euclidean distance between them is measured (see the norm_order parameter). On top of that, an error can still have background signal in the correct fluorescence channel (e.g. equal autofluorescence in all channels), so it might be necessary to make the threshold more stringent.
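To make the normalization point concrete, here is a small sketch of the arithmetic. The assumptions are mine and not taken from the example: a hypothetical 16-bit codeword with four "on" bits, idealized binary intensities with no background, and L2 normalization (what I would expect norm_order=2 to mean).

```python
import numpy as np

def unit(v):
    """Scale a vector to unit L2 length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Hypothetical 16-bit codeword with four "on" bits (an assumption for illustration).
codeword = np.zeros(16)
codeword[[0, 3, 7, 12]] = 1

dropout = codeword.copy()
dropout[0] = 0   # 1 -> 0 error: one "on" bit is missing

extra = codeword.copy()
extra[1] = 1     # 0 -> 1 error: one "off" bit lights up

# Distances after normalizing both vectors to unit length.
dropout_distance = np.linalg.norm(unit(dropout) - unit(codeword))
extra_distance = np.linalg.norm(unit(extra) - unit(codeword))

# Distance without normalization, for comparison.
unnormalized_distance = np.linalg.norm(dropout - codeword)

print(round(dropout_distance, 4), round(extra_distance, 4), unnormalized_distance)
```

Without normalization the single-bit error sits at distance 1, as you expected. With normalization the missing-bit case comes out at about 0.5176 and the extra-bit case at about 0.4595, so under these assumptions a threshold of 0.5176 would still admit single-bit errors. That the number matches is suggestive, but it does not confirm how the value in the example was actually chosen.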

Thank you for your response, Matt! I have found that if I increase the threshold, the number of identified spots increases by an order of magnitude, and the number of “blank” barcodes identified increases as well.

A side question, what does the “ConnectedComponentDecodingResult” output of the PixelSpotDecoder run function mean? It looks like it is a map of pixels that are identified as spots. Is this correct?

The docstring for CombineAdjacentFeatures.run() has this description of ConnectedComponentDecodingResult:

ConnectedComponentDecodingResult :
    NamedTuple containing :
        region_properties :
            the properties of each connected component, in the same order as the IntensityTable
        label_image : np.ndarray
            an image where all pixels of a connected component share the same integer ID
        decoded_image : np.ndarray
            Image whose pixels correspond to the targets that the given position in the ImageStack decodes to.

In case that isn’t clear, region_properties is the output from skimage.measure.regionprops, label_image is the output of skimage.measure.label(decoded_image), and decoded_image is an image where every pixel value maps to a gene target. Let me know if you have any more questions.
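Here is a toy sketch of how those three pieces relate, using scikit-image directly. The 4x5 `decoded_image` and its target IDs are made up, and this only mirrors the description above rather than reproducing what starfish does internally.

```python
import numpy as np
from skimage.measure import label, regionprops

# Toy decoded image (made up): 0 = background, 1 and 2 = two gene targets.
decoded_image = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 0, 2, 2],
    [0, 0, 0, 2, 2],
    [1, 0, 0, 0, 0],
])

# Adjacent pixels with the same non-zero value get the same integer label.
label_image = label(decoded_image)

# One RegionProperties object per connected component, in label order.
region_properties = regionprops(label_image)

# Area (pixel count) of each component.
areas = [int(region.area) for region in region_properties]

# Target of each component, read back from any pixel it covers.
targets = [int(decoded_image[tuple(region.coords[0])]) for region in region_properties]

for region, target in zip(region_properties, targets):
    print(region.label, target, int(region.area), region.centroid)
```

Note that the two separate patches of target 1 become two different components in `label_image`, so one target can map to many spots.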

Ah I see, interesting. Will have to look into this a bit more.

Thank you again for your help!