I am trying to follow the Assigning Spots To Cells tutorial in order to add the area of each segmented cell to my expression matrix. In this example, it looks like the number of masks is the same as the number of cells in the expression matrix. However, after running to_expression_matrix my expression matrix only contains cell_ids that were assigned at least one spot, while the BinaryMaskCollection produced by Segment.Watershed contains all cell_ids. I therefore get an error that the BinaryMaskCollection length and the expression matrix dimensions don’t match. I have not been able to find a description of or a way to subset the masks vector for just the cell_ids preserved in the expression matrix. Is this possible? Thank you so much.
I believe the cell_ids in the expression matrix still match the BinaryMaskCollection (i.e. if the cells were originally labeled 0-24, even if only 10 cells have RNA spots, those 10 cells keep their original labels from the 0-24 range and are not relabeled 0-9). If this assumption is not true, let me know and we can find another solution.
I think the best way to handle cases where the expression matrix dropped cells with zero RNA spots is to add an if condition that checks whether each mask's cell_id is in the expression matrix. For example:
# get array of cell_ids in expression matrix
cell_ids = mat['cell_id']
# keep only the areas of masks whose name appears among those cell_ids
mat[Features.AREA] = (Features.CELLS, [mask.data.sum() for _, mask in masks if mask.name in cell_ids])
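One thing to keep in mind is that the filtered list comes out in mask order, so it only lines up if the expression matrix lists its cells in that same order. Here is a minimal sketch of an order-independent variant that looks up each area by name. It does not use starfish at all: `Mask` and `areas_for_cells` are hypothetical stand-ins I made up for illustration, assuming only that each mask has a `name` and a boolean `data` array as in the snippet above.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for a segmentation mask: a name plus a boolean pixel array.
Mask = namedtuple("Mask", ["name", "data"])


def areas_for_cells(masks, cell_ids):
    """Return one area (pixel count) per cell_id, in the order of cell_ids."""
    # Map every mask name to its area, including cells with no spots.
    area_by_name = {m.name: int(m.data.sum()) for m in masks}
    # Look up only the cells that survived in the expression matrix.
    return [area_by_name[cid] for cid in cell_ids]


masks = [
    Mask("000", np.ones((2, 2), dtype=bool)),  # 4 pixels
    Mask("001", np.ones((3, 1), dtype=bool)),  # 3 pixels
    Mask("002", np.ones((5, 5), dtype=bool)),  # 25 pixels, no spots assigned
    Mask("005", np.ones((1, 2), dtype=bool)),  # 2 pixels
]
cell_ids = ["000", "001", "005"]  # non-contiguous, as in the expression matrix

areas = areas_for_cells(masks, cell_ids)
print(areas)  # [4, 3, 2]
```

A cell_id with no matching mask raises a KeyError here, which is probably what you want, since it would mean the labels no longer match.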
Thanks so much for your reply.
The cells in my expression matrix do still have their original cell IDs. The list is just not contiguous, because cells that did not have a spot assigned were removed, i.e. the cell_ids list looks something like 000, 001, 005, 006, 010, 012, 015, etc.
So your addition of checking each mask.name against the cell_ids did the trick for skipping the missing cell_ids. Thank you!