Load only part of an image

What do you all find to be the best way to load only part of an image into a NumPy array? I would generally prefer to do this using scikit-image, but I can’t find a way to do this from the API reference or from browsing the io plugins source code. So I generally use Pillow, something like this:

import numpy as np
from PIL import Image

with Image.open('multipage.tif') as pil_image:
    pil_image.seek(10)  # move to the plane I want (frames are 0-indexed)
    # crop box is (left, upper, right, lower) in pixels
    np_image = np.array(pil_image.crop((0, 0, 100, 100)))

Am I totally missing something in scikit-image? Is there a better way to do this?

scikit-image provides a submodule skimage.io that handles all the intricacies of converting PIL images. It is not a straightforward matter. We also include the tifffile library, which handles multipage TIFs.

Here’s an example:

from skimage import io
img = io.imread('my_image.tif', img_num=10)

You can select between the two backends explicitly, if you prefer:

io.imread(..., plugin='tifffile')
io.imread(..., plugin='pil')

But is there a way to read only part of an image? For example, suppose I have an image that is 50K pixels by 50K pixels. I generally won’t want to load that all into memory, especially if I am only interested in a subregion. What would be the most scikit way to do that?
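For a sense of scale, here is a quick back-of-the-envelope calculation. It assumes 8 bits per sample; the 50K × 50K size comes from the question, while the RGB variant and the 1,000 × 1,000 tile are illustrative choices, not details from the thread.

```python
def image_nbytes(height, width, channels=1, bytes_per_sample=1):
    """Bytes needed to hold an uncompressed image in memory."""
    return height * width * channels * bytes_per_sample

full_gray = image_nbytes(50_000, 50_000)              # 8-bit grayscale
full_rgb = image_nbytes(50_000, 50_000, channels=3)   # 8-bit RGB, e.g. a slide scan
tile_rgb = image_nbytes(1_000, 1_000, channels=3)     # a 1000 x 1000 RGB subregion

print(f"full grayscale: {full_gray / 1e9:.1f} GB")  # 2.5 GB
print(f"full RGB:       {full_rgb / 1e9:.1f} GB")   # 7.5 GB
print(f"RGB tile:       {tile_rgb / 1e6:.1f} MB")   # 3.0 MB
```

Reading a tile instead of the whole image cuts the memory needed by more than three orders of magnitude, which is why a subregion reader matters here.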


We don’t have a way of doing that in scikit-image, nor have I heard of a generic way of doing this. This is one of the things that the N5 format is aiming to address. Perhaps raising an issue in github.com/dask/dask-image would be a fruitful approach, but I don’t know. As far as I know it only supports N5.

If the images are stored as TIFF, you may be able to get at them using the tifffile Python library and its memory-mapping feature. This lets you index just the pixel range of interest, so only that area of the image is loaded. However, memory mapping doesn’t work with compressed (or tiled) files, so the entire image has to be stored uncompressed on disk, which unfortunately is not ideal.

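import numpy as np
import tifffile

im = tifffile.imread('myimage.tif')            # reads the whole image into memory
mm = tifffile.memmap('myimage.tif', mode='r')  # maps the file; pixels are read on access

# Both give the same pixels for the top-left 100 x 100 corner
print(np.array_equal(im[0:100, 0:100], mm[0:100, 0:100]))  # True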
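The snippet above needs an existing `myimage.tif`. Below is a self-contained sketch of the same idea that writes its own small synthetic image. It assumes a recent tifffile that provides `imwrite` (older releases called it `imsave`). The file is written uncompressed and untiled, which is what `tifffile.memmap` requires.

```python
import os
import tempfile

import numpy as np
import tifffile

# Synthetic test image written as a plain, uncompressed TIFF
data = np.arange(500 * 400, dtype=np.uint16).reshape(500, 400)
path = os.path.join(tempfile.mkdtemp(), "myimage.tif")
tifffile.imwrite(path, data)

# Open the file as a read-only numpy.memmap; no pixel data is copied yet
mm = tifffile.memmap(path, mode="r")

# np.array(...) copies only this 100 x 100 subregion into RAM
region = np.array(mm[100:200, 50:150])

print(region.shape)                                   # (100, 100)
print(np.array_equal(region, data[100:200, 50:150]))  # True
del mm  # drop the memory map so the file can be released
```

For a compressed or tiled slide scan, `tifffile.memmap` refuses to map the data, so this only helps once the image is stored as described above.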

That does depend on the file format: with “raw binary data” formats such as TIF, BMP, or RAW, it is possible to read only certain areas. Some unstructured databases and HDF5 (.h5) files can offer a similar feature.
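For headerless raw data, plain NumPy can do this with `np.memmap`. The sketch below uses a hypothetical `image.raw` file holding a row-major `uint8` image whose shape is known in advance, and it writes that synthetic file itself. For a real format you would also need the header size (`offset=`), the dtype, and the byte order.

```python
import os
import tempfile

import numpy as np

# Hypothetical layout: headerless, row-major uint8 image of known shape
height, width = 2000, 3000
path = os.path.join(tempfile.mkdtemp(), "image.raw")
full = (np.arange(height * width) % 251).astype(np.uint8).reshape(height, width)
full.tofile(path)  # raw bytes, no header
del full

# Map the file without reading it; offset= would skip a header if there were one
mm = np.memmap(path, dtype=np.uint8, mode="r", shape=(height, width), offset=0)

# Copying the window pages in only the parts of the file holding these rows
window = np.array(mm[500:600, 1000:1100])
print(window.shape, window.dtype)  # (100, 100) uint8
```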

If I keep a close eye on memory usage during an Image.crop(), it usually doesn’t jump the way it does when you load the entire image into a NumPy array, even transiently, so I am assuming the compression on these TIFFs (slide scans) is amenable to accessing only part of the file at a time.

Actually, the TIFFs I have been working with are not compressed, never mind! Memory is more of an issue here than storage, which I have a ton of.
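One way to check this kind of memory observation from Python is the standard-library `tracemalloc` module. The sketch below uses a synthetic raw file, not the slide scans or Pillow from this thread. It relies on NumPy reporting its array buffers to `tracemalloc`. Memory-mapped file pages are not counted, so only the copied window shows up for the partial read.

```python
import os
import tempfile
import tracemalloc

import numpy as np

# Synthetic, headerless 4000 x 4000 uint8 image (16 MB) on disk
shape = (4000, 4000)
path = os.path.join(tempfile.mkdtemp(), "big.raw")
np.zeros(shape, dtype=np.uint8).tofile(path)

def peak_bytes(func):
    """Run func() and return the peak memory traced by tracemalloc."""
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

# Full read: allocates a buffer for the whole image
full_peak = peak_bytes(lambda: np.fromfile(path, dtype=np.uint8).reshape(shape))

# Partial read: map the file, then copy only a 100 x 100 window
def read_window():
    mm = np.memmap(path, dtype=np.uint8, mode="r", shape=shape)
    return np.array(mm[:100, :100])

window_peak = peak_bytes(read_window)

print(f"full read:   {full_peak / 1e6:.1f} MB")
print(f"window read: {window_peak / 1e6:.3f} MB")
```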

The Bio-Formats API has format-independent support for opening subregions of planes.

Would it help for me to whip up an example? If so: in Python using pyimagej? Or is an ImageJ-centric solution OK?


@ctrueden it would be very useful for me to see this done with pyimagej. The OP is about reading it into a NumPy array, so although an ImageJ-centric post would also be useful, it would not answer the question in this thread. =)


@jni I finally took the first step and wrote a simple Python script that calls SCIFIO to open image data as numpy arrays:

It does not open subregions of planes yet. (The SCIFIO API can do that, but it is more complicated.)


Just to explain the previous post a little: it uses SCIFIO at a “low level”, reading raw bytes, with the intent of minimizing data copying or reordering. There is also a high-level way to use ImageJ+SCIFIO to open NumPy arrays, which is far simpler:

import sys

try:
    import imagej
except ImportError:
    raise ImportError("""This example uses ImageJ but pyimagej is not
    installed. To install try 'conda install pyimagej'.""")

print('--> Initializing imagej')
ij = imagej.init('sc.fiji:fiji')  # Fiji includes Bio-Formats.

# Open each file passed on the command line
for path in sys.argv[1:]:
    print('--> Reading {}'.format(path))

    dataset = ij.io().open(path)      # ImageJ/SCIFIO reads the file
    image = ij.py.from_java(dataset)  # convert the Java image to a Python array
    # ... do something with the numpy array ...

See also this thread:

But I still haven’t hammered out a quick example of loading subregions using either of these approaches. Will follow up here later if I do.
