How to accomplish super fast rendering in Napari

Hi everyone,

I would like to ask your advice on how to accomplish the following task in Napari in the most efficient way :slight_smile: :

  • During an experiment, a machine records images of size 64x10240 (uint16). The number of recorded images depends on the duration
    of the experiment: it could be hundreds, thousands, tens of thousands, or hundreds of thousands of images. All the recorded images are stored in a specific folder as follows:
/experiment1/image000000000.tif
/experiment1/image000000001.tif
/experiment1/image000000002.tif
/experiment1/image000000003.tif
/experiment1/image000000004.tif
.
.
.
/experiment1/image000000100.tif
.
.
.
/experiment1/image000001000.tif
.
.
.

I would like to address the following use-cases, if possible, using Napari:

  1. Right after the experiment is finished, I would like to be able to visualize, as fast as possible, the whole dataset to check that everything was acquired as
    expected during the experiment (visual inspection to detect errors). Even though the machine acquires one image ( 64x10240 (uint16) ) at a time, rendering one single image
    is not particularly interesting (the resolution of the image is 0.1 mm, therefore 64 rows represent 6.4 mm, but we are interested in seeing “big objects”, bigger than 6.4 mm).
    Therefore, if possible, I would like to:
  • Render not one, but a group of images at a time, for example render 10 images at a time, which would correspond to 64 mm, and explore them using a slider
    (each time I move the slider I would render a single “medium” image of size 640 x 10240 instead of a small one of size 64 x 10240).
    I have tried the following beginner approach, but Napari shows me only one single image at a time:
import napari
from dask_image.imread import imread

images = imread( "/experiment1/*.tif" )  # images -> dask.array<concatenate, shape=(398, 64, 10240), #dtype=uint16, chunksize=(1, 64, 10240), chunktype=numpy.ndarray>

viewer = napari.Viewer()
viewer.add_image(images)

When I executed this code, Napari showed me only a single image at a time and let me explore the stack using a slider (the slider goes from 0 to 397, because
shape=(398, 64, 10240)).

What would you suggest I change if I want to visualize, let’s say, 10 images at a time (that is, a single image of size 640 x 10240) instead of a single image of size
64 x 10240?
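
Something along these lines is what I imagine (an untested sketch; it trims the stack to a multiple of 10 frames so the dask reshape works):

import napari
from dask_image.imread import imread

images = imread("/experiment1/*.tif")         # shape (398, 64, 10240)
n = (images.shape[0] // 10) * 10              # keep a multiple of 10 frames
grouped = images[:n].reshape(-1, 640, 10240)  # shape (39, 640, 10240)

viewer = napari.Viewer()
viewer.add_image(grouped)  # the slider now advances 10 frames at a time

But I do not know if this is the right approach.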

  • Instead of exploring groups of images at a time, I would like to be able to assemble and visualize all the images as a single big image. Let’s say during the
    experiment I capture 1000 images of size 64x10240; I would like to assemble them into a single big image of size 64000 x 10240.
    I can accomplish this by executing the following code, but it is extremely slow (around 21 seconds using a standard Napari installation, and around 11 seconds when
    using export NAPARI_ASYNC=1):
import napari
from dask_image.imread import imread

images = imread( "/experiment1/*.tif" ).reshape(-1, 10240) # images -> dask.array<getitem, 
#shape=(3184, 1280), dtype=uint16, chunksize=(8, 1280), chunktype=numpy.ndarray>

viewer = napari.Viewer()
viewer.add_image(images)

What would you suggest so that I can visualize all those images as a single big image and render it as fast as possible?

  • As a small variant of the previous case, I would like to visualize all those images as a single big image, but this time showing the user
    how the canvas gets painted. Does Napari wait until it receives all the dask array chunks and then render the full image in one go, or is it possible in Napari to render each chunk
    as soon as it is available in VRAM?

Thank you very much for all your kind help! :wink:


Creating a single large multiscale image and then having napari render that seems like the simplest thing, if that gives you a good enough user experience. At that point, napari knows nothing about your data; napari is just viewing a big image like any other.

I don’t know if there’s a canonical “create multiscale” routine somewhere. We do it in the async code by just calling ndi.zoom over and over:
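
Roughly like this (a minimal sketch, not the actual napari internals; it assumes scipy and a 2D numpy image):

import numpy as np
import scipy.ndimage as ndi

def make_pyramid(image: np.ndarray, levels: int = 4) -> list:
    # Build a simple multiscale pyramid by repeated 2x downsampling.
    pyramid = [image]
    for _ in range(levels - 1):
        # order=1 (bilinear) is cheap and adequate for visualization.
        pyramid.append(ndi.zoom(pyramid[-1], 0.5, order=1))
    return pyramid

# napari accepts the list of levels directly, largest first:
# viewer.add_image(make_pyramid(big_image), multiscale=True)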

From my experience viewing large datasets off SSD, the existing multiscale support worked fine. It was fast, without NAPARI_ASYNC or NAPARI_OCTREE. Those modes are almost exclusively about remote data or computed data, where the latency is high.

If viewing a big multiscale image does NOT give you the user experience you want, then maybe you have to get into writing custom code that knows about your data. But that’s probably a lot more work.


Try dask’s hstack and specify contrast_limits, e.g. for a sequence of 12-bit, grayscale, single page TIFF files:

import napari
import dask.array
from tifffile import imread

# Lazily read the first page of every TIFF through a zarr store.
store = imread('path_to/*.tif', key=0, aszarr=True)
data = dask.array.from_zarr(store)
# Stack every 10 consecutive frames vertically: e.g. (N, 64, 10240)
# becomes (N//10, 640, 10240), so each slider position shows 10 frames.
data = dask.array.hstack(tuple(data[i::10] for i in range(10)))

with napari.gui_qt():
    viewer = napari.Viewer()
    viewer.add_image(data, contrast_limits=[0, 2048])

store.close()

Dear @pwinston, @cgohlke,

thank you very much for your kind replies, I am learning a lot from your suggestions! :wink:
Regarding the very slow rendering time that I reported before, I found that the issue is related to dask_image: if I create the dask array using tifffile, the rendering is much faster (I have opened an issue in dask_image; in case you are interested in taking a look, you can find it here: dask_image imread performance issue · Issue #181 · dask/dask-image · GitHub).

Following your advice, I decided to take a step back and focus on a small test example: visualizing all those images as a single big image by concatenating them, e.g., concatenating 398 images of size 64x10240 in order to get a single big image of size 25472 x 10240. While experimenting
with the environment variable NAPARI_ASYNC, I found some intriguing results:

import glob
import tifffile
import dask.array as da
import napari

%gui qt

path_images = "/experiment1"  # folder containing the recorded TIFFs
all_images = sorted(glob.glob(f"{path_images}/*.tif"))
# Read the images lazily through a zarr store, then concatenate the
# 398 frames of shape (64, 10240) into one (25472, 10240) array.
store = tifffile.imread(all_images, key=0, aszarr=True)
using_tifffile_zarr = da.concatenate(da.from_zarr(store))

viewer = napari.Viewer()

Using NAPARI_ASYNC=0

%%time
viewer.add_image(using_tifffile_zarr, contrast_limits=[0, 65535])

Elapsed time: 1.7 seconds


Using NAPARI_ASYNC=1

%%time
viewer.add_image(using_tifffile_zarr, contrast_limits=[0, 65535])

Elapsed time: 48 milliseconds


  • Do you know why the reported elapsed times differ so much? Is this expected?

  • Even though the elapsed times are different, qualitatively both approaches take more or less
    the same amount of time to fully show the image in the Napari canvas. Do you know why this happens?

  • When executing viewer.add_image(using_tifffile_zarr, contrast_limits=[0, 65535]), Napari shows me the image only when it is completely rendered in the canvas, which can take approx. 1.8 seconds. Instead of waiting 1.8 seconds until the image appears on the canvas, is it possible to show the chunks being rendered incrementally, so that the user does not have to stare at an empty canvas for 1.8 seconds?

Thank you very much for all your help ;).

I think you are just seeing that with NAPARI_ASYNC=1 the load does not happen upfront in the GUI thread. The initial add_image is much faster with NAPARI_ASYNC=1 because it didn’t actually load the data.

But when you do view the data, the load happens in a worker thread. The goal of async isn’t to make loading faster, it’s to prevent the GUI thread from blocking while the load is in progress. Before, while the load was happening, you were “locked out” from interacting with the GUI, and you’d see the “spinning wheel of death” on Mac if the load took long enough.

With async the GUI should remain usable during the load. This means no spinning wheel, and it means you can interrupt the load for example by advancing to the next slice.

When you say “is it possible to show how the chunks are rendered in an incremental way” I’m not sure what chunks you mean exactly?

NAPARI_ASYNC=1 uses the ChunkLoader, but the only “chunks” are really full image layers. So if you have 3 image layers those will be three “chunks”, but within one layer nothing is broken down spatially into chunks.

What NAPARI_OCTREE=1 adds to asynchronous loading is the concept of spatial chunks. Since we only support 2D images today, the “chunks” are really “tiles”. Those can be loaded and rendered independently of each other. Have you tried NAPARI_OCTREE=1?

While NAPARI_OCTREE does work in certain cases, it’s not been tested widely at all. So I’d not be surprised if it doesn’t work in your case. But I think that’s ultimately what you want to use.


Thank you very much @pwinston for your kind reply! :wink:.
Please excuse me, the Napari Async and Octree features are very powerful, but due to my beginner level, it is taking me more time than expected to fully understand and use them efficiently :relaxed:. Going back to the basic test example I described before (loading a set of 398 images, each one of size 64x10240, then concatenating them into a “big” image of size 25472 x 10240 and finally passing it to napari for rendering), what I mean by “chunk” is each one of the 398 images; the “big” image of size 25472x10240 is therefore formed by 398 chunks of size 64x10240. In this case, if I understood you correctly, both “standard” napari and “async” napari will use dask to load the data (each one of the 398 images) in parallel, so both versions will take the same time; the only difference is that “async” napari will not load the data in the GUI thread, whereas “standard” napari will use the GUI thread, hence blocking fluid interactivity. Am I right?

Thank you very much for this nice explanation.

So, according to this, my big array of size 25472x10240 is not broken down spatially into chunks, even though it was formed by concatenating 398 smaller images. Therefore, if I want to accomplish the “visual effect” of showing how the chunks (each one of the 398 images in my test example) are rendered incrementally, I should avoid creating the big concatenated array of size 25472x10240, and instead pass each one of them to napari as an independent image layer… but in this case all 398 images would be plotted on top of each other, instead of one below the other, right? In your example (Napari Async Image Loading - 2020-08-03 - YouTube), you render all 16 layers on top of each other, but you use additive blending, hence you can see the visual effect of incremental rendering… in my case, I would need to place each of the 398 layers (each one of size 64x10240) one below the other to see the incremental rendering effect, for example with per-layer offsets as in the sketch below. Do you know if this is possible in napari?
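
Something like the following is what I have in mind (an untested sketch; the translate offsets are my guess at how to place each layer one below the other):

import glob
import napari
from tifffile import imread

viewer = napari.Viewer()
for i, path in enumerate(sorted(glob.glob("/experiment1/*.tif"))):
    # Shift layer i down by i * 64 rows so the frames tile vertically.
    viewer.add_image(imread(path), translate=(i * 64, 0),
                     contrast_limits=[0, 65535])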

Regarding your comment about octree:

Can I use the octree in my basic test example, even though I do not have a multiscale pyramidal image? In this basic example I only have 398 “normal” images, but I want to visualize them as a whole (as one concatenated big image).

Please excuse all these beginner questions :relaxed:; even though it is challenging for me to understand the proper way to use the async and octree features, I won’t give up :nerd_face:, I really want to learn how to use these powerful napari features!

Once again, thank you very much for your kind attention! :wink:

Yes, async just moves the loading from the GUI thread to a worker thread, so it’s not faster. It’s useful, though, so that napari does not “hang” when loading remote data. That was the main problem async was trying to solve: remote data. For local data, the current non-async code is quite usable; for remote data it is not.

Creating 398 layers will almost certainly not work. Even 16 layers are kind of pushing it. There are significant per-layer costs.

I think your best bet to view a 25472x10240 image is to create a multiscale pyramid in a “chunked” format like zarr. This won’t be more than 20-30 lines of code, but it will get your data into a form that napari can render.

If the data is local, you should be able to view it fine without NAPARI_OCTREE. I have viewed much bigger datasets than that and off SSD they render great without the octree. Super fast really. The octree is mostly for remote data or high latency data, where each load takes a long time.

Long long term I could imagine that OctreeImage becomes the new Image class and we use it for everything, local and remote, small and big. But we are pretty far from that point.

I don’t have good example code for creating a multiscale zarr file. Typically napari just reads those files; we don’t create them. But I imagine there must be good examples around. I’d maybe post a new topic, just “how can I create a multiscale zarr file”. That seems like the best route to me. At the very least try it and see what it looks like, then decide if you need to do more.
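
As a starting point, the rough shape of it might look like this (an untested sketch, assuming using_tifffile_zarr is the (25472, 10240) dask array from your earlier snippet; the zarr file name is made up for illustration):

import numpy as np
import dask.array as da
import napari

levels = 5
level = using_tifffile_zarr  # the (25472, 10240) uint16 dask array
for i in range(levels):
    # Write this level as a separate chunked array inside the zarr group.
    level.rechunk((512, 512)).to_zarr("experiment1_pyramid.zarr",
                                      component=str(i))
    # Downsample 2x along both axes by averaging, for the next level.
    level = da.coarsen(np.mean, level, {0: 2, 1: 2},
                       trim_excess=True).astype(level.dtype)

# napari takes the pyramid as a list of arrays, largest level first:
pyramid = [da.from_zarr("experiment1_pyramid.zarr", component=str(i))
           for i in range(levels)]
viewer = napari.Viewer()
viewer.add_image(pyramid, multiscale=True, contrast_limits=[0, 65535])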


Thank you very much for your kind reply @pwinston!
By the way, your videos are a great resource for learning the internals
of Napari; if it is possible for you, please continue with the nice series :wink: