Apply mask to all channels

Hello - I’m pretty new to image analysis and enjoying learning my way around skimage. I think what I’m trying to do is pretty simple, but I’m getting caught up trying to apply an image mask to all channels of a 3-band image and then perform operations on the masked data. Unfortunately I can’t share the image, and it’s in a proprietary format stored as bands x lines x line samples. I have a working reader and a function that extracts the image data and returns it as rows x cols x channels; the image plots fine and I think the extraction is good. I also have four binary masks, each using a specific threshold. I want to apply each mask in turn to all channels of a series of images. I have the code below working, but it’s not particularly efficient.

I’m not exactly clear on what happens when the mask is applied - I’m assuming that the masked areas are converted to ‘0’. I don’t want the masked pixels to be included in the flattened arrays I’m feeding into Pandas. Is there a way of converting these to masked arrays? I was also thinking of adding a constant to all image data values, applying the mask, writing all non-zero elements of the masked image into new arrays and then subtracting the constant - but that is going to be really inefficient with large data sizes.

Thank you for any help and suggestions!

Sample code

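import os

import numpy as np
import pandas as pd
import pdr  # assumed: the Planetary Data Reader package, inferred from pdr.read below
import skimage.io

fnamelist = ['/path/to/PDS/IMG/image.IMG']

mask = skimage.io.imread('/path/to/mask.png')
mask = mask[:,:,0] #use only one channel of this file for the mask
mask[:10] = 177 #set the first 10 lines of the mask to 177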


def IMG_to_skimage(fname, **kwargs):
    """
    Input a .IMG file with an attached label and output a lines x line samples x bands np array in radiance values
    """
    data = pdr.read(fname)
    im =  data['IMAGE']
    rad = data['LABEL']['PROCESSING_PARMS']['RADIANCE_SCALING_FACTOR']
    im_rad = ((im[0]*rad[0]), (im[1]*rad[1]), (im[2]*rad[2]))
    return np.dstack((im_rad[0], im_rad[1], im_rad[2]))

columns = ['name', 'rad', 'band', 'source']
main_data = pd.DataFrame(columns=columns)
for fname in fnamelist:
    img_data = pd.DataFrame(columns=columns)
    image = IMG_to_skimage(fname)
    image = image[0:1200, 0:1600, :] #y extent x extent to make input files (1232x1600) match mask dimensions     
    name = os.path.basename(fname)
    
    mask_dict = {108:'sky', 177:'ground', 0:'mount', 155:'dune'} #threshold values for the 4 masks to be created from the mask image file
    for k, v in mask_dict.items():
        binary_mask = mask == k
        for i in range(3):
            data = image[:,:,i]
            masked_data = image[:,:,i] * binary_mask #apply the binary mask to each channel in the input image
            
            i_df = pd.DataFrame(columns=columns)
            i_data2 = masked_data.flatten() # I don't want the masked pixels to be included as '0' here
            i_df['rad'] = i_data2
            i_df['band'] = i # record each channel
            i_df['source'] = v # record mask
            # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
            img_data = pd.concat([img_data, i_df])
            img_data['name'] = name #record input image

    main_data = pd.concat([main_data, img_data])

grouped = main_data.groupby(['source', 'band'])
ax = grouped.boxplot(subplots=False, rot=90, fontsize=12, showfliers=False)

Background

I need to collect statistics on specific image areas after applying a mask for each band of the input images. Once masked, the image itself is no longer useful as an image (the images are detector noise); I’m only interested in the data values from each target area, which is why I’ve been flattening them into vectors to feed into tidy data frames for downstream analysis. Open to all alternative suggestions here :slight_smile:

Analysis goals

Goals: 1) calculate descriptive stats from the values in each unmasked area (i.e. mean pixel value, standard deviation, inter-quartile range); 2) visualize the stats as a grouped box plot and histogram for each masked area and each channel.
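
A minimal sketch of goal 1, assuming the values for one region and one band have already been pulled out as a 1-D array. The `values` array is made-up data, not taken from the real images:

```python
import numpy as np

# Hypothetical radiance values for one region in one band (made-up data).
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

mean = values.mean()
std = values.std(ddof=1)  # sample standard deviation, the convention pandas uses
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1  # inter-quartile range

print(mean, std, iqr)
```

Once the values are in a tidy data frame, a groupby followed by describe() reports the same mean, standard deviation and quartiles per group.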

Challenges

  • I think I’m confusing some terminology. After applying a mask I end up with part of the original area ‘black’; I’m referring to these as the masked pixels (I think they are converted to 0s?). The remaining area of unmasked pixels is the area I’m interested in calculating statistics for.
  • I keep ending up with all the masked pixels as ‘0’, which is skewing the stats I’m trying to calculate. I would like to either work with masked arrays or delete the masked pixels, changing the shape of the arrays.
  • What I have tried already: re-writing non-zero elements into new arrays.
  • Software packages and/or plugins I have tried: I think working with the numpy.ma module might be helpful, but I’m struggling to apply it.

Update

I made a few changes with numpy.ma and the code now produces what’s expected. I’m just not sure I understand why masked numpy arrays made a difference.

I definitely had masked and unmasked pixels mixed up:

binary_mask = mask != k #changed from binary_mask = mask == k

This works:

masked_data = ma.masked_array(image[:,:,i], mask=binary_mask)

And this doesn’t, and I’m not sure why:

masked_data = image[:,:,i] * binary_mask

When working with the masked arrays, I thought that downstream operations would only operate on valid data but:

masked_data.flatten()

produced a 1D array with all elements

While this yielded what I was looking for, a 1D array with only the valid elements:

masked_data.compressed()

The code below now produces the expected output, but I’m not sure I understand the difference between element-wise multiplication using image[:,:,i] * binary_mask and numpy.ma.masked_array.

Thanks :slight_smile:

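import os

import numpy as np
import numpy.ma as ma
import pandas as pd
import pdr  # assumed: the Planetary Data Reader package, inferred from pdr.read below
import skimage.io

fnamelist = ['/path/to/PDS/IMG/image.IMG']

mask = skimage.io.imread('/path/to/mask.png')
mask = mask[:,:,0] #use only one channel of this file for the mask
mask[:10] = 177 #set the first 10 lines of the mask to 177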


def IMG_to_skimage(fname, **kwargs):
    """
    Input a .IMG file with an attached label and output a lines x line samples x bands np array in radiance values
    """
    data = pdr.read(fname)
    im =  data['IMAGE']
    rad = data['LABEL']['PROCESSING_PARMS']['RADIANCE_SCALING_FACTOR']
    im_rad = ((im[0]*rad[0]), (im[1]*rad[1]), (im[2]*rad[2]))
    return np.dstack((im_rad[0], im_rad[1], im_rad[2]))

columns = ['name', 'rad', 'band', 'source']
main_data = pd.DataFrame(columns=columns)
for fname in fnamelist:
    img_data = pd.DataFrame(columns=columns)
    image = IMG_to_skimage(fname)
    image = image[0:1200, 0:1600, :] #y extent x extent to make input files match mask dimensions
    
    name = os.path.basename(fname)
    
    mask_dict = {108:'sky', 177:'ground', 0:'mount', 255:'dune'}
    for k, v in mask_dict.items():
        binary_mask = mask != k
        for i in range(3):
            data = image[:,:,i]
            masked_data = ma.masked_array(image[:,:,i], mask=binary_mask)
            
            i_df = pd.DataFrame(columns=columns)
            i_data2 = masked_data.compressed() #just get the valid data in the masked data as a 1D array
            i_df['rad'] = i_data2
            i_df['band'] = i
            i_df['source'] = v
            # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
            img_data = pd.concat([img_data, i_df])
            img_data['name'] = name

    main_data = pd.concat([main_data, img_data])
        
            
main_data.groupby(['name', 'source', 'band']).describe()
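
One possible way to make this faster, sketched on made-up data rather than the real images: build one frame per region, collect the frames in a list, and concatenate once at the end instead of growing a data frame inside the loops. The `image`, `mask`, `mask_dict` and `name` values below are invented stand-ins, and the 2-D comparison `mask == k` is used to pull all bands of a region in one step.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the real inputs: a 2 x 2 image with 3 bands and a label mask.
image = np.arange(12, dtype=float).reshape(2, 2, 3)
mask = np.array([[108, 108], [177, 177]])
mask_dict = {108: 'sky', 177: 'ground'}
name = 'image.IMG'

frames = []
for k, v in mask_dict.items():
    pixels = image[mask == k]  # shape (n_pixels, n_bands), only pixels in this region
    n_pixels, n_bands = pixels.shape
    frames.append(pd.DataFrame({
        'name': name,
        'rad': pixels.ravel(),  # pixel-major order: band 0, 1, 2, then the next pixel
        'band': np.tile(np.arange(n_bands), n_pixels),
        'source': v,
    }))

main_data = pd.concat(frames, ignore_index=True)  # a single concat at the end
means = main_data.groupby(['source', 'band'])['rad'].mean()
print(means)
```

Growing a data frame inside a loop copies the accumulated rows on every iteration, so a single concat tends to scale better as the number of images and pixels grows.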

Hi @Hsapers ,

Welcome to the forum!

Looks like you solved your own problem, so feel free to mark your reply as a solution if you’re content with it.

I’ll try to briefly answer your outstanding questions on this part:

element-wise multiplication

You’re correct that the first example (element-wise multiplication) “replaces elements outside the mask with zeros”, because it multiplies every value by either a zero or a one.
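
A small illustration with made-up numbers (not the data from the post): multiplying by a boolean mask keeps the array the same shape and leaves zeros wherever the mask is False, so those zeros are counted in any statistic computed afterwards.

```python
import numpy as np

band = np.array([[5.0, 7.0], [9.0, 11.0]])
binary_mask = np.array([[True, False], [True, False]])  # True = pixel of interest

multiplied = band * binary_mask  # False acts as 0, True as 1

print(multiplied)                # zeros where the mask is False
print(multiplied.mean())         # the zeros pull the mean down
print(band[binary_mask].mean())  # mean over the pixels of interest only
```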

an alternative with indexing

Another way (I think) you could have gotten the result you want is with boolean indexing (read more here). Here’s an example:

In [17]: x
Out[17]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [18]: mask
Out[18]:
array([[ True,  True, False],
       [ True,  True,  True],
       [ True, False,  True]])

In [19]: x[mask]
Out[19]: array([0, 1, 3, 4, 5, 6, 8])
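
The same idea extends to applying one 2-D mask to every channel at once. A sketch with a made-up 2 x 2 image with 3 channels (the array names are just for illustration): indexing a rows x cols x channels array with a rows x cols boolean mask returns one row per selected pixel and one column per channel.

```python
import numpy as np

# Made-up 2 x 2 image with 3 channels, laid out rows x cols x channels.
image = np.arange(12).reshape(2, 2, 3)
region = np.array([[True, False], [False, True]])  # 2-D mask, True = keep

selected = image[region]  # shape (n_selected_pixels, n_channels)

print(selected.shape)
print(selected[:, 0])  # values of channel 0 inside the region
```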

MaskedArray

MaskedArray explicitly keeps track of which data are invalid, whereas if you multiply, there’s no way to tell the difference between a “real” zero and a zero that stands for “missing data”. It’s described pretty well here, but the relevant part is:

A subclass of ndarray designed to manipulate numerical arrays with missing data.

An instance of MaskedArray can be thought as the combination of several elements:

  • The data, as a regular numpy.ndarray of any shape or datatype (the data).
  • A boolean mask with the same shape as the data, where a True value indicates that the corresponding element of the data is invalid. The special value nomask is also acceptable for arrays without named fields, and indicates that no data is invalid.
  • A fill_value, a value that may be used to replace the invalid entries in order to return a standard numpy.ndarray.
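
Those three pieces can be inspected directly. A minimal sketch with made-up values:

```python
import numpy as np
import numpy.ma as ma

data = np.array([1.5, 2.5, 3.5])
masked = ma.masked_array(data, mask=[False, True, False], fill_value=-1.0)

print(masked.data)        # underlying ndarray, the masked value is still stored
print(masked.mask)        # True where the element is invalid
print(masked.fill_value)  # value used when converting back to a plain ndarray
print(masked.filled())    # plain ndarray with the invalid entry replaced
```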

And as you found, compressed() gives you a flat 1d array of the valid elements.
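
To make the flatten() versus compressed() difference concrete, here is a tiny made-up example: flatten() keeps every element and stays a masked array, while compressed() drops the masked elements and returns a plain ndarray.

```python
import numpy as np
import numpy.ma as ma

data = np.array([[10, 20], [30, 40]])
invalid = np.array([[False, True], [False, False]])  # True marks a masked (invalid) pixel

masked_data = ma.masked_array(data, mask=invalid)

flat = masked_data.flatten()      # still a MaskedArray with 4 elements, one of them masked
valid = masked_data.compressed()  # plain ndarray holding only the 3 valid elements

print(flat.size, valid.size)
print(masked_data.mean())  # reductions skip the masked element
```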

Hopefully that clarifies things, but do ask follow up questions if you have them,
John


Thank you @bogovicj for the excellent explanations and references!