Following this big data release from Recursion Pharmaceuticals (~450 GB of image data from a SARS-CoV-2 screen, available for download as a zip at the bottom of the linked page), I thought it might be fun to try browsing the data with napari without having to download and unzip the whole thing.
In my ideal world I would be able to point something like dask-image at the zip, give it some information about where to find which images, and get back a dask array that I could lazily index into.
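A minimal sketch of that idea, using `dask.delayed` and `dask.array.from_delayed` directly rather than dask-image. The names `lazy_stack` and `read_image` are hypothetical and only for illustration: `read_image` is assumed to be any function that takes a file name inside the zip and returns a 2D NumPy array, and every image is assumed to share one shape and dtype.

```python
import dask
import dask.array as da
import numpy as np


def lazy_stack(names, read_image, shape=(1024, 1024), dtype=np.uint8):
    """Build a lazy (n_images, height, width) dask array.

    names      -- file names inside the archive (assumed known up front)
    read_image -- hypothetical callable: name -> 2D NumPy array
    shape      -- assumed shape shared by every image
    dtype      -- assumed dtype shared by every image
    """
    # Wrapping the reader in dask.delayed defers each call until the
    # corresponding plane is actually computed.
    lazy_reader = dask.delayed(read_image)
    planes = [
        da.from_delayed(lazy_reader(name), shape=shape, dtype=dtype)
        for name in names
    ]
    # One chunk per image, stacked along a new leading axis.
    return da.stack(planes)
```

With an array like this, indexing `stack[i]` and computing it should only call `read_image` for image `i`, so the cost per plane is whatever one remote read costs.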
I found this Python remote zip library (pip installable) that lets me read individual parts of the zip file, but it is still quite slow (>10 s). Here is my example of getting just one 1024x1024, 8-bit PNG:
    from remotezip import RemoteZip

    path = 'https://storage.googleapis.com/rxrx/RxRx19a/RxRx19a-images.zip'
    file = 'RxRx19a/images/HRCE-1/Plate1/AA02_s2_w2.png'

    with RemoteZip(path) as zip:
        image_data = zip.read(file)
Note that to convert this to an actual image you have to do:
    from napari import view_image
    from imageio import imread

    image = imread(image_data)
    view_image(image)
which looks nice, but it is too slow to think about putting everything inside a dask array and scrolling through it.
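One way to see where the time goes is to time opening the archive separately from reading a single member. This is only a sketch: `time_open_and_read` is a hypothetical helper, and it assumes that whatever `open_archive` returns behaves like a `zipfile.ZipFile`, meaning a context manager with a `read` method.

```python
import time


def time_open_and_read(open_archive, member):
    """Return (seconds_to_open, seconds_to_read, data) for one member.

    open_archive -- zero-argument callable returning a ZipFile-like
                    context manager (hypothetical; e.g. a lambda that
                    constructs the remote archive object)
    member       -- name of the file inside the archive
    """
    t0 = time.perf_counter()
    with open_archive() as archive:
        # Time spent just opening the archive.
        t1 = time.perf_counter()
        data = archive.read(member)
        # Time spent fetching and decompressing the one member.
        t2 = time.perf_counter()
    return t1 - t0, t2 - t1, data
```

For the example above this could be called as `time_open_and_read(lambda: RemoteZip(path), file)`. If most of the time turns out to be spent opening the archive, keeping a single handle open across many reads might help; if it is spent in the read itself, that would not.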
Curious whether anyone has tried something like this before and has any ideas.