Unable to retrieve values from DecodedIntensityTable

Hello Again,

After getting a DecodedIntensityTable from my data, I am unable to get information out of that object with either .to_decoded_dataframe() or AssignTargets.Label().run(), and the error messages I am getting do not help me locate the source of the problem.

I can provide additional code but by and large, the operations used are similar to those in the User Guide.


With [truncated] output:

<xarray.DecodedIntensityTable (features: 14854, c: 12, r: 5)>
array([ removed ])
    radius             (features) float64 4.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 4.0
    z                  (features) int64 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0
    y                  (features) int64 2034 2022 2002 1995 1983 ... 6 3 3 0 0
    x                  (features) int64 1946 2025 1966 1993 242 ... 596 43 423 8
  * r                  (r) int64 0 1 2 3 4
  * c                  (c) int64 0 1 2 3 4 5 6 7 8 9 10 11
    xc                 (features) float64 0.9507 0.9893 ... 0.2066 0.003908
    yc                 (features) float64 0.9936 0.9878 0.978 ... 0.0 0.0
    zc                 (features) float64 0.05 0.05 0.05 0.05 ... 0.05 0.05 0.05
    target             (features) <U13 'Cnn3' 'Limd1' ... 'Fubp3' 'Polr3e'
    distance           (features) float64 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
    passes_thresholds  (features) bool True True True True ... True True True
    cell_id            (features) <U8 'nan' 'nan' 'nan' ... 'nan' 'nan' 'nan'
Dimensions without coordinates: features
    starfish:  {"log": [{"method": "Reduce", "arguments": {"dims": "\"{<Axes....
KeyError                                  Traceback (most recent call last)
<ipython-input-30-0ffa0da52e8c> in <module>
      1 print(decoded_filtered)
----> 2 print(decoded_filtered.to_decoded_dataframe())

/usr/local/lib/python3.6/dist-packages/starfish/core/intensity_table/decoded_intensity_table.py in to_decoded_dataframe(self)
    102         df = self.to_features_dataframe()
    103         pixel_coordinates = pd.Index([Axes.X, Axes.Y, Axes.ZPLANE])
--> 104         df = df.drop(pixel_coordinates.intersection(df.columns), axis=1).drop(Features.AXIS, axis=1)
    105         return DecodedSpots(df)

/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3995             level=level,
   3996             inplace=inplace,
-> 3997             errors=errors,
   3998         )

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3934         for axis, labels in axes.items():
   3935             if labels is not None:
-> 3936                 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   3938         if inplace:

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in _drop_axis(self, labels, axis, level, errors)
   3968                 new_axis = axis.drop(labels, level=level, errors=errors)
   3969             else:
-> 3970                 new_axis = axis.drop(labels, errors=errors)
   3971             result = self.reindex(**{axis_name: new_axis})

/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in drop(self, labels, errors)
   5016         if mask.any():
   5017             if errors != "ignore":
-> 5018                 raise KeyError(f"{labels[mask]} not found in axis")
   5019             indexer = indexer[~mask]
   5020         return self.delete(indexer)

KeyError: "['features'] not found in axis"

And after generating a BinaryMaskCollection with cell labels from segmentation, cell ids cannot be added to the DecodedIntensityTable.

assigner = AssignTargets.Label()
labeled = assigner.run(area_filt, decoded_filtered)

with output:

<starfish.core.morphology.binary_mask.binary_mask.BinaryMaskCollection object at 0x7f701dbb7470>
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/xarray/core/dataarray.py in _getitem_coord(self, key)
    628         try:
--> 629             var = self._coords[key]
    630         except KeyError:

KeyError: 'spot_id'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-25-81259438a325> in <module>
      1 assigner = AssignTargets.Label()
----> 2 labeled = assigner.run(area_filt, decoded_filtered)
      3 labeled_filtered = labeled[labeled.cell_id != 'nan']

/usr/local/lib/python3.6/dist-packages/starfish/core/pipeline/algorithmbase.py in helper(*args, **kwargs)
     21         @functools.wraps(func)
     22         def helper(*args, **kwargs):
---> 23             result = func(*args, **kwargs)
     24             if result is not None:
     25                 method_class_str = str(args[0].__class__)

/usr/local/lib/python3.6/dist-packages/starfish/core/spots/AssignTargets/label.py in run(self, masks, decoded_intensity_table, verbose, in_place)
     95         """
---> 96         return self._assign(masks, decoded_intensity_table, in_place=in_place)

/usr/local/lib/python3.6/dist-packages/starfish/core/spots/AssignTargets/label.py in _assign(masks, decoded_intensities, in_place)
     61                 selectors['z'] = in_bbox.z
     62             in_mask = mask.sel(**selectors)
---> 63             spot_ids = in_bbox[Features.SPOT_ID][in_mask.values]
     64             decoded_intensities[Features.CELL_ID].loc[
     65                 decoded_intensities[Features.SPOT_ID].isin(spot_ids)] = mask.name

/usr/local/lib/python3.6/dist-packages/xarray/core/dataarray.py in __getitem__(self, key)
    638     def __getitem__(self, key: Any) -> "DataArray":
    639         if isinstance(key, str):
--> 640             return self._getitem_coord(key)
    641         else:
    642             # xarray-style array indexing

/usr/local/lib/python3.6/dist-packages/xarray/core/dataarray.py in _getitem_coord(self, key)
    631             dim_sizes = dict(zip(self.dims, self.shape))
    632             _, key, var = _get_virtual_variable(
--> 633                 self._coords, key, self._level_coords, dim_sizes
    634             )

/usr/local/lib/python3.6/dist-packages/xarray/core/dataset.py in _get_virtual_variable(variables, key, level_vars, dim_sizes)
    169         ref_var = dim_var.to_index_variable().get_level_variable(ref_name)
    170     else:
--> 171         ref_var = variables[ref_name]
    173     if var_name is None:

KeyError: 'spot_id'

It looks like your DecodedIntensityTable is missing the dimension coordinate “features”, which is why you are getting these errors. This was probably removed during an upstream filtering step.
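To illustrate with a toy example (a stand-in DataFrame, not your actual data): per your traceback, to_decoded_dataframe() ends by dropping the "features" column (Features.AXIS), and pandas raises exactly this KeyError when that column is absent:

```python
import pandas as pd

# Toy stand-in for the features dataframe: no "features" column, because the
# coordinate was missing from the table it was built from.
df = pd.DataFrame({"target": ["Cnn3"], "distance": [0.0]})
try:
    df.drop("features", axis=1)  # what to_decoded_dataframe() does internally
except KeyError as e:
    print(e)  # "['features'] not found in axis"
```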

Just to provide some additional info: DecodedIntensityTable is a subclass of xarray's DataArray and inherits its properties. One of those properties is the set of coordinates that label the array's dimensions. A DecodedIntensityTable has three dimensions (features, r, and c) and three dimension coordinates that label them; dimension coordinates are marked with an * in the output of print(decoded_filtered).
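Here is a toy DataArray (not real starfish data) showing the difference between a dimension and a dimension coordinate:

```python
import numpy as np
import xarray as xr

# Toy array with the same dims as a DecodedIntensityTable: (features, c, r)
da = xr.DataArray(
    np.zeros((4, 2, 3)),
    dims=("features", "c", "r"),
    coords={"c": [0, 1], "r": [0, 1, 2]},
)
print("features" in da.coords)  # False: the dimension exists but has no coordinate
da = da.assign_coords(features=np.arange(da.sizes["features"]))
print("features" in da.coords)  # True: now it is a dimension coordinate
```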

By the way, your data has 12 channels and 5 rounds of imaging. Are those swapped by accident?

I understand that this error seems to come from the 'features' dimension not being labeled, but I'm not sure why the coordinate isn't present in the DecodedIntensityTable after generating it with starfish.

The number of channels and rounds is correct, I have verified them against the image files.

The general steps I took to make the DecodedIntensityTable are as follows:

  1. Load experiment with starfish.core.experiment.experiment.Experiment.from_json
  2. Flatten images with .reduce({Axes.ZPLANE}, func="max")
  3. Clip and scale values to same range with starfish.image.Filter.Clip
  4. Find spots with starfish.spots.FindSpots.BlobDetector
  5. Decode with starfish.spots.DecodeSpots.PerRoundMaxChannel

Attempting to run .to_decoded_dataframe() on the direct output from the decoder results in the above error. Why would this be happening?

It is very odd that there is no features coordinate in your DecodedIntensityTable. Can you print the IntensityTable that is returned by BlobDetector and see if that has a features dimension coordinate?

Also, it would be helpful to have the actual code you are running, because every function has arguments that can change the behavior. I see that you had issues with BlobDetector before; how did you work around that?

The immediate result from BlobDetector is a SpotFindingResults object (containing a dict of PerImageSliceSpotResults), not an IntensityTable. I never fully resolved the prior issues with BlobDetector, and the out-of-memory errors only occurred on the example dataset; once I moved on to our actual data, I stopped experiencing that issue.

Code provided below. In this dataset we only have one fov, so the dicts aren’t strictly necessary.

exp = starfish.core.experiment.experiment.Experiment.from_json(convloc+"/experiment.json") 
flatImgs = {}
for key in exp.keys():
    flatImgs[key] = exp[key].get_image('primary').reduce({Axes.ZPLANE}, func="max")

clip = starfish.image.Filter.Clip(p_max=99.9, is_volume=True, level_method=Levels.SCALE_BY_CHUNK)
for key in flatImgs.keys():
    clip.run(flatImgs[key], in_place=True)

spots = {}
bd = starfish.spots.FindSpots.BlobDetector(
    min_sigma=(0.5, 0.5, 0.5),
    max_sigma=(8, 8, 8),
)
for key in flatImgs.keys():
    spots[key] = bd.run(image_stack=flatImgs[key])

decoded = {}
decoder = starfish.spots.DecodeSpots.PerRoundMaxChannel(
    codebook=exp.codebook,
)
for key, img in flatImgs.items():
    decoded[key] = decoder.run(spots=spots[key])

decoded_filtered = decoded["fov_000"].loc[decoded["fov_000"][Features.PASSES_THRESHOLDS]]
decoded_filtered = decoded_filtered[decoded_filtered.target != 'nan']

Sorry, you're right; it is a SpotFindingResults object. Please use the following code to build spot traces into an IntensityTable and then check for the features coordinate:

from starfish.core.spots.DecodeSpots.trace_builders import build_spot_traces_exact_match, \
    build_traces_sequential, build_traces_nearest_neighbors

intensity_table = build_traces_nearest_neighbors(spot_finding_results, search_radius=7)

If the features coordinate is missing, then I think the issue is due to the nearest neighbor trace building strategy. That option is extremely fickle and your data may not be compatible (e.g. cycles not aligned well enough or spot density too high). If you’ve already tried to align your data, then you may have to create a synthetic reference image and use the BlobDetector and PerRoundMaxChannel with the reference image. (See: TraceBuildingStrategies Tutorial).

After running that, it does appear that the features coordinate is also missing when building traces from the spot_finding_results.

intensity_table = build_traces_nearest_neighbors(spots['fov_000'], search_radius=7)
<xarray.IntensityTable (features: 251965, c: 12, r: 5)>
array([[[       nan, 0.12654503,        nan,        nan, 0.11891207],
        [       nan,        nan,        nan,        nan,        nan],
        [       nan,        nan,        nan,        nan,        nan],
        [       nan,        nan, 0.14580153,        nan,        nan],
        [       nan,        nan,        nan,        nan,        nan],
        [       nan,        nan,        nan,        nan,        nan]],

       [[       nan, 0.13655092,        nan,        nan, 0.17918257],
        [       nan,        nan,        nan,        nan,        nan],
        [       nan,        nan,        nan,        nan,        nan],
        [0.30462518,        nan, 0.13893129,        nan,        nan],
        [       nan,        nan,        nan,        nan,        nan],
        [       nan,        nan,        nan,        nan,        nan]],

       [[       nan, 0.13301943,        nan,        nan,        nan],
        [       nan,        nan,        nan,        nan,        nan],
        [0.20973782,        nan,        nan,        nan,        nan],
        [       nan,        nan,        nan,        nan,        nan],
        [       nan,        nan, 0.35006496,        nan, 0.43534485],
        [       nan, 0.18075012,        nan,        nan,        nan]],

       [[       nan,        nan,        nan,        nan,        nan],
        [       nan,        nan,        nan,        nan,        nan],
        [       nan,        nan,        nan,        nan,        nan],
        [       nan,        nan,        nan,        nan,        nan],
        [       nan,        nan,        nan,        nan,        nan],
        [       nan, 0.32037959,        nan,        nan,        nan]],

       [[       nan,        nan,        nan,        nan,        nan],
        [       nan,        nan,        nan,        nan, 0.25541127],
        [       nan,        nan,        nan,        nan,        nan],
        [       nan,        nan,        nan,        nan,        nan],
        [       nan,        nan,        nan,        nan,        nan],
        [       nan, 0.21147764,        nan,        nan,        nan]]])
    radius   (features) float64 1.0 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 4.0
    z        (features) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
    y        (features) int64 2044 2044 2044 2044 2044 2044 2044 ... 1 0 0 0 0 0
    x        (features) int64 2019 2016 2012 2007 1999 ... 607 595 457 423 8
  * r        (r) int64 0 1 2 3 4
  * c        (c) int64 0 1 2 3 4 5 6 7 8 9 10 11
Dimensions without coordinates: features

I also ran this with search_radius from 0 to 9, and had the same outcome. (Would running a sweep with the radius value for DecodeSpots.PerRoundMaxChannel.run make a difference?)

I’m double-checking with my collaborators, but I believe that these images have already been aligned, so the issue must be due to high spot density.

Am I correct in understanding that the reference image will be used to define the spot locations for every image, and ‘matching’ these spots will consist of checking for which spots pass threshold checks? Would it be reasonable to construct a reference image from the projected maximum of all images, and if not, how should it be constructed?

Unlikely, in my opinion.

It may also be a bug with the nearest neighbor trace builder.

Yes, it will use the reference image to define the spot locations. But instead of 'matching' spots, it will simply record the intensity value at each of those locations in all the other images. PerRoundMaxChannel will then assign each round a 'digit' based on the channel with the maximum intensity, to construct a 'barcode'.
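As a toy numpy sketch of that per-round max-channel step (illustrative only, not the starfish implementation):

```python
import numpy as np

# Toy intensity trace for one spot: rows are rounds, columns are channels.
trace = np.array([
    [0.1, 0.9, 0.2],   # round 0: channel 1 is brightest
    [0.8, 0.1, 0.1],   # round 1: channel 0 is brightest
    [0.2, 0.3, 0.7],   # round 2: channel 2 is brightest
])
barcode = trace.argmax(axis=1)  # one "digit" per round
print(barcode)  # [1 0 2]
```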

Yes, that is exactly what I would recommend. Of course, this works much better if your images are aligned before you do the projection.
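A quick numpy sketch of that projection, with made-up array shapes standing in for your 5 rounds and 12 channels (starfish's ImageStack.reduce over the round and channel axes with func="max" does the equivalent on real data):

```python
import numpy as np

# Toy stack: (rounds, channels, y, x); real data would come from the ImageStack.
rng = np.random.default_rng(0)
stack = rng.random((5, 12, 32, 32))
reference = stack.max(axis=(0, 1))  # max-project over rounds and channels
print(reference.shape)  # (32, 32)
```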

Creating the reference image to run the spot detector seems to have done the trick, and to_expression_matrix() gives the expected table. Thanks!

I don't know if it's a better idea to continue this here or start a new thread, but I've encountered a new problem with some of the same steps in this analysis. Right now I believe the current spot decoding method is losing many transcripts, and this dataset would perform better if TraceBuildingStrategies.NEAREST_NEIGHBOR could be used.

Here is a zoomed example comparing the results I got with starfish to those of a collaborator. These are the locations of all the transcripts that were hot in this round and channel.

I think the main issue here lies with decoding, because when I compare the initial output from spot detection to the final expected transcripts, the detected spots appear much closer to the expected quantity/density.

The algorithm our collaborator described to arrive at their set of transcripts matches what I would think the nearest neighbor method is:

  1. Filter for points within the cell segmentation mask
  2. Colocalize each point within a radius (either 2.4 or 1.1 px)
  3. Find all possible combinations to a barcode, use distance as a tiebreaker if needed.
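For concreteness, here is a toy sketch of how I picture step 3 (made-up 2-D spot coordinates, two rounds, and one of the radii above), where every cross-round combination within the radius is enumerated and the tightest one wins:

```python
import itertools

import numpy as np

# Toy spots per round, as (y, x) coordinates.
rounds = [
    np.array([[10.0, 10.0], [40.0, 40.0]]),  # round 0
    np.array([[10.5, 10.2], [60.0, 60.0]]),  # round 1
]
radius = 2.4
best = None
for combo in itertools.product(*rounds):
    pts = np.stack(combo)
    # Spread: max distance of any spot in the combination from its centroid.
    spread = np.linalg.norm(pts - pts.mean(axis=0), axis=1).max()
    if spread <= radius and (best is None or spread < best[0]):
        best = (spread, combo)  # distance acts as the tiebreaker
# best pairs (10.0, 10.0) with (10.5, 10.2); the far-apart spots are rejected
```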

How should I proceed with decoding this dataset? Is there something else I can do to get better results out of exact match, or can I troubleshoot the “spot_id” error from PerRoundMaxChannel?

The current implementation of the nearest neighbor method in starfish is similar to, but not exactly, what you described. In starfish, you choose an anchor round, and then the closest spot within the search radius is picked from each other round. It does not consider all possible barcodes that could be constructed within the search radius, and its performance has not been validated.
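To make that concrete, a toy numpy sketch of the anchor-round strategy (made-up coordinates; not the actual starfish code):

```python
import numpy as np

# Toy (y, x) spots: two in the anchor round, two in one other round.
anchor = np.array([[10.0, 10.0], [50.0, 50.0]])
other = np.array([[11.0, 10.5], [80.0, 80.0]])
radius = 3.0

# Pairwise distances, shape (n_anchor, n_other).
d = np.linalg.norm(anchor[:, None, :] - other[None, :, :], axis=-1)
nearest = d.argmin(axis=1)  # closest spot in the other round per anchor spot
matched = d[np.arange(len(anchor)), nearest] <= radius
# matched -> [True, False]: the second anchor spot has no partner in range
```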

I have submitted a pull request that should let you try the nearest neighbor method. If you don't want to wait for it to be merged, you can install starfish from GitHub and add the couple of lines of code that are in the pull request. If that doesn't give better results, I suggest filing a feature request or bug report on GitHub.