CNN prediction on Large images

Hi, I trained a CNN for image classification on training images of shape (41, 41, 7), and the actual images on which the prediction is to be applied have shape (2048, 2048, 100). I create image patches and record the (x, y) coordinate of each patch, as I need that for overlaying the results in the end. The prediction runs on the GPU, but only for a single patch at a time. Is there a way to accelerate this process, so that I can run multiple patches at once, remember the startX and startY index of each patch, get the prediction for each patch, and then pool the results so that each patch prediction goes to its proper place?

In other words for people doing CNN for image classification, how do you guys apply the prediction of CNN on the real microscopy images which are 10 to 20 fold bigger in size?

The programming language is Python, with TensorFlow and Keras.

@frauzufall @fjug @ctrueden


I think this is missing some information to answer the question properly:
Are you really doing Image classification, i.e. predicting a value (corresponding to some class label) for a given patch?
Or are you doing Image segmentation, i.e. predicting a value for each pixel in a given patch?

If you are doing segmentation, you are (hopefully) using a fully convolutional network. In this case you can increase the input patch shape to the largest shape that still fits into your GPU memory to speed up the prediction. Note that prediction needs less memory than training, so your patches can be way larger than (41, 41, 7).
Also, it might be a good idea to use some overlap or halo to reduce artifacts at the patch boundaries.
This might be useful:

If you are doing classification, you could stack multiple patches into a batch and present the network with an (N, 41, 41, 7) input, where N is the number of patches in the batch. You will need to define some mapping from batch id (the index of a patch within the batch) to patch position to keep the spatial information.
This might be useful:

Note that the image classification approach might not be suited for many cases, because the prediction can depend on the patch size / how objects of interest are covered by patches. Imagine a single object of interest being at the corner of 4 patches.
This really depends on the application though.
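
The batching idea above (stacking patches into an (N, 41, 41, 7) array and keeping a mapping back to patch positions) could be sketched roughly as follows. This is a minimal NumPy illustration, not the poster's code: `extract_patches`, `predict_in_batches` and `dummy_predict` are hypothetical names, and `dummy_predict` merely stands in for a trained model's `model.predict`.

```python
import numpy as np

def extract_patches(image, patch=41, stride=41):
    """Return patches of shape (N, patch, patch, C) and their (start_y, start_x) positions."""
    h, w = image.shape[:2]
    coords = [(y, x)
              for y in range(0, h - patch + 1, stride)
              for x in range(0, w - patch + 1, stride)]
    patches = np.stack([image[y:y + patch, x:x + patch] for y, x in coords])
    return patches, np.array(coords)

def predict_in_batches(patches, predict_fn, batch_size=256):
    """Run predict_fn on chunks of patches; row i of the output belongs to patches[i]."""
    outputs = [predict_fn(patches[i:i + batch_size])
               for i in range(0, len(patches), batch_size)]
    return np.concatenate(outputs, axis=0)

def dummy_predict(batch):
    # Hypothetical stand-in for model.predict(batch): fake two-class probabilities
    mean = batch.mean(axis=(1, 2, 3))
    p = 1.0 / (1.0 + np.exp(-mean))
    return np.stack([1.0 - p, p], axis=1)

rng = np.random.default_rng(0)
image = rng.normal(size=(205, 164, 7)).astype(np.float32)  # small toy image

patches, coords = extract_patches(image)
probs = predict_in_batches(patches, dummy_predict, batch_size=8)

# Overlay: write each patch's predicted class back at its stored position.
label_map = np.full(image.shape[:2], -1, dtype=np.int64)
for (y, x), cls in zip(coords, probs.argmax(axis=1)):
    label_map[y:y + 41, x:x + 41] = cls
```

Because the order of the rows is preserved, `coords[i]` is the position of the patch that produced `probs[i]`. With Keras, `model.predict` also accepts a `batch_size` argument, so the whole patch array can usually be passed in one call if it fits in host memory.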


Thank you very much for this wonderful answer. Yes, I am doing image classification on patches of the image, and I am aware that FCNs are used for segmentation and CNNs for classification. I think the second part of your answer, stacking multiple batches and presenting the network with an (N, 41, 41, 7) input, is what I was looking for.

To give you an idea of the sort of problem I am trying to solve: imagine you have RGB images of cats and dogs, say 41 by 41 pixels, and train a CNN on them to do the classification. Now I present the network with a 500 by 500 pixel image containing 10 dogs and 10 cats. I make a window of 41 by 41 pixels, stride it in (x, y), and try to get the regions in the image where there are cats and dogs. This gives multiple rectangles in the regions where the network finds cats or dogs, but then I can do non-maximum suppression to narrow down the rectangles I get.
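
For readers unfamiliar with the suppression step mentioned above, here is a minimal sketch of greedy non-maximum suppression in NumPy. It is one common variant, not necessarily the one used in this thread; the function name, the box format `[x1, y1, x2, y2]` and the default IoU threshold are assumptions made for the example.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.3):
    """Greedy NMS. boxes: (N, 4) as [x1, y1, x2, y2]; returns indices of kept boxes."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # highest score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Intersection of the best box with every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Drop boxes that overlap the best box too strongly
        order = rest[iou <= iou_threshold]
    return keep

# Two heavily overlapping 41 x 41 windows and one separate window
boxes = [[0, 0, 41, 41], [5, 5, 46, 46], [100, 100, 141, 141]]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

In this toy case the second window overlaps the first with an IoU of about 0.63, so it is suppressed and `kept` contains the indices of the first and third windows.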

So yeah I will implement the second part of your answer. Thanks a lot again.


Yes, that worked, with the mapping of batch id to patch position. Thanks for speeding it up. Cheers.

@constantinpape and others

I’m struggling with the same problem, but in the context of whole slide image segmentation (10k x 10k pixels or so), where I want to recognize tissue regions at multiple resolutions (general tumor vs normal, and smaller tissue regions like immune aggregates/TLS) using a U-Net (an adaptation of this notebook).

How do you suggest approaching this task? Should I split the slide into 512x512 pixel tiles with overlap? You mentioned maximizing the patch size such that it still fits in the GPU; how do I do that? And how do I combine the patches where they overlap? Do I need to get probabilities and assign each pixel in the overlap the class with the highest probability in either patch? Is there any package to perform patching and merging effectively?

Thank you

Sure that would work.

Feed in patches of increasing size, 512 x 512, 768 x 768, 1024 x 1024, …
And see at which size it runs out of memory.
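
The trial-and-error procedure described above could be automated along these lines. This is only a sketch: `largest_working_size` and `fake_predict` are hypothetical names, and the out-of-memory condition is simulated with Python's `MemoryError`. With TensorFlow one would presumably pass `tf.errors.ResourceExhaustedError` as `oom_errors` and `model.predict` as `predict_fn`; that part is an assumption and is not tested here.

```python
import numpy as np

def largest_working_size(predict_fn, sizes, channels=1, oom_errors=(MemoryError,)):
    """Try square patches of increasing size; return the last size that did not run out of memory."""
    best = None
    for size in sizes:
        dummy = np.zeros((1, size, size, channels), dtype=np.float32)
        try:
            predict_fn(dummy)
        except oom_errors:
            break  # this size no longer fits, stop searching
        best = size
    return best

def fake_predict(batch):
    # Hypothetical stand-in for model.predict that "runs out of memory" above 1024 pixels
    if batch.shape[1] > 1024:
        raise MemoryError("simulated out-of-memory")
    return batch

best = largest_working_size(fake_predict, sizes=[512, 768, 1024, 1280, 1536])
```

The candidate sizes here are multiples of 256 so that they stay divisible by the downsampling factor of a typical U-Net; adjust them to whatever your network requires.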

Yes, assigning probabilities and then merging for the overlaps is an option.
This is very application dependent. You also mentioned that you predict things at different resolutions; this is certainly something you need to take into account here as well. I am not aware of any general-purpose package for this; the problem is that there are always custom things necessary for a given task.

In a very simple scenario (e.g. single resolution U-Net) you can use the tile_iterator function from csbdeep:

import numpy as np
from csbdeep.internals.predict import tile_iterator

img = np.random.randint(0, 100, (2048, 2048))
result = np.empty_like(img)

for tile, s_src, s_dst in tile_iterator(img, n_tiles=(4, 4), block_sizes=(32, 32), n_block_overlaps=(2, 2)):
    # do something with the tile, e.g. model.predict()
    tile_processed = 2 * tile
    # store the result at the correct position:
    # s_src crops the overlap from the tile, s_dst is the target region in the full image
    result[s_dst] = tile_processed[s_src]


It assumes that your input image dimensions are compatible with the network (e.g. divisible by block_sizes=(32,32), depending on your network). It creates the given number of tiles with the specified overlap and lets you stitch the processed tiles back together.


Thank you @constantinpape and @kapoorlab

Another related question, which I had overlooked until now that I’m getting to the actual patching and analysis part: how can I handle un-annotated areas? I am working with huge tissues, and it’s impossible to annotate all or even the vast majority of the tissue, so I have annotations from a pathologist who has annotated selected tissue structures of interest to us. That means that many of the tiles I’m generating contain a region that’s not annotated.

Would it affect the U-Net negatively by indirectly indicating that the un-annotated area is negative to one category or another, although it’s not negative? How do I handle this important case? Does it make sense to mask the image to only the annotated parts, such that un-annotated regions are black?


First of all, I am not quite sure what task you want to solve here.

  • Do you have a patch classification task? I.e. do you want to classify image patches as belonging to a specific kind of tissue? If so, a U-Net is not the right model, it is a segmentation model. Instead, you should use a classification model like ResNet, VGG, etc.
  • Or do you have a segmentation task? I.e. you want to segment individual pixels as belonging to one tissue type or the other. Then a U-Net is the model of choice.

In training, you should present your model only with data for which you have annotations. In prediction, you can of course use it for the full images, whether they are annotated or not.
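
One possible way to act on this advice is to keep only those training tiles that are mostly covered by annotations. The sketch below is an assumption-laden illustration rather than a recommendation from the thread: it supposes that the label image marks un-annotated pixels with a sentinel value (`UNLABELED = -1`), and both the function `annotated_tiles` and the 50% threshold are hypothetical choices.

```python
import numpy as np

UNLABELED = -1  # assumed marker for pixels the pathologist did not annotate

def annotated_tiles(labels, tile=512, min_fraction=0.5):
    """Return (y, x) starts of non-overlapping tiles whose annotated fraction is >= min_fraction."""
    h, w = labels.shape
    keep = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            frac = np.mean(labels[y:y + tile, x:x + tile] != UNLABELED)
            if frac >= min_fraction:
                keep.append((y, x))
    return keep

# Toy label image: one fully annotated tile, one partially annotated tile
labels = np.full((1024, 1024), UNLABELED, dtype=np.int64)
labels[0:512, 0:512] = 1        # fully annotated
labels[512:700, 512:1024] = 0   # about 37% of this tile is annotated
train_tiles = annotated_tiles(labels, tile=512, min_fraction=0.5)
```

Here only the fully annotated tile passes the threshold. How to treat the remaining un-annotated pixels inside a kept tile is a separate design decision that this sketch does not address.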

Just a side note, but maybe still useful: I successfully used MightyMosaic to apply U-Nets to large images.

Thank you all.
You might want to look at my post here inquiring about hyperparameters and network architecture selection for which I would appreciate your advice.

I see, so you are working on a segmentation task. In that case averaging the predictions over patches is fairly easy: just predict with overlapping patches and take the average of probabilities on the overlap.
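
The averaging described above can be sketched with two accumulators, one for the summed probabilities and one counting how many tiles covered each pixel. This is a minimal NumPy version under stated assumptions: `tile_starts`, `predict_averaged` and `dummy_predict` are hypothetical names, and `predict_fn` is assumed to map an (h, w) tile to per-pixel class probabilities of shape (h, w, n_classes), which a real U-Net wrapper would have to provide.

```python
import numpy as np

def tile_starts(length, tile, stride):
    """Start offsets so that tiles of size `tile` cover [0, length), the last one flush with the end."""
    starts = list(range(0, max(length - tile, 0) + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts

def predict_averaged(image, predict_fn, n_classes, tile=512, overlap=64):
    """Predict on overlapping tiles and average the probabilities where tiles overlap."""
    h, w = image.shape[:2]
    stride = tile - overlap
    prob_sum = np.zeros((h, w, n_classes), dtype=np.float64)
    count = np.zeros((h, w, 1), dtype=np.float64)
    for y in tile_starts(h, tile, stride):
        for x in tile_starts(w, tile, stride):
            probs = predict_fn(image[y:y + tile, x:x + tile])
            prob_sum[y:y + tile, x:x + tile] += probs
            count[y:y + tile, x:x + tile] += 1
    return prob_sum / count

def dummy_predict(tile):
    # Hypothetical stand-in for a segmentation model: two-class probabilities from intensity
    p = np.clip(tile, 0.0, 1.0)
    return np.stack([1.0 - p, p], axis=-1)

rng = np.random.default_rng(0)
image = rng.random((300, 260))
avg_probs = predict_averaged(image, dummy_predict, n_classes=2, tile=128, overlap=32)
segmentation = avg_probs.argmax(axis=-1)  # final per-pixel class
```

A plain average weights every tile equally; weighting predictions near tile centers more strongly is a frequently used refinement, but it is not shown here.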

Another comment: you might find libraries designed for medical segmentation tasks more suitable than generic segmentation models: