NGFF Status Update (May 2021)

Hello NGFF’ers

I realize that it’s been quite some time since there’s been an #ome-ngff status update. (Perhaps the lightweight updates in NGFF 2020 Newsfeed - #14 by joshmoore were a good idea…) I’ve captured most of the highlights of 2021 to date below, though I’ve likely left something out. If anyone has anything to add or ask, please feel free.

Pre-print

Of course, the largest portion of sweat and tears over the last 5 months went into the bioRxiv preprint. (tweet in case you missed it.) We’ve just begun revisions, so if you have thoughts, now and here are as good a time & place for feedback as any.

(@markkitt, to answer your question since I’m not on Disqus: the filesystem was the default for root block devices on EC2: terraform file)

Nested storage (v0.2)

One of the rabbit holes we went down in trying to get the preprint completed was “nested chunks” for Zarr. The details aren’t terribly important, but basically it took about two months to specify whether Zarr implementations should use a “/” or a “.” in chunk file names, in order to speed up access for large volumes. In the end, it led to v0.2 of OME-NGFF, which is now released at http://ngff.openmicroscopy.org/0.2.
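For the curious, here’s a minimal sketch of the difference using zarr-python (assuming a version ≥ 2.8, where the dimension_separator keyword landed; the path and shapes are just illustrative):

import numpy as np
import zarr

# With the default "." separator a chunk lands at "image.zarr/0.0.0";
# with "/" (the v0.2 nested layout) it becomes "image.zarr/0/0/0".
z = zarr.open(
    "image.zarr", mode="w",
    shape=(1, 512, 512), chunks=(1, 256, 256),
    dtype="uint16", dimension_separator="/",
)
z[:] = np.random.randint(0, 2**16, size=z.shape, dtype=z.dtype)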

Axes (v0.3)

Ongoing now and ready for feedback (!) is the specification of axes information, led by @constantinpape in https://github.com/ome/ngff/pull/46. This should hopefully make #ome-ngff compatible with xarray (“N-D labeled arrays and datasets in Python”) and begin further steps toward N-dimensionalification.
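The PR is still in flight, so take this with a grain of salt, but the rough shape under discussion is an “axes” list in the multiscales metadata, one name per array dimension. Something like (paths and names purely illustrative):

import zarr

root = zarr.open_group("image.zarr", mode="a")
root.attrs["multiscales"] = [{
    "version": "0.3",
    # one axis name per dimension, in the same order as the array's shape
    "axes": ["t", "c", "z", "y", "x"],
    "datasets": [{"path": "0"}, {"path": "1"}],
}]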

Other active issues

There are a couple of other specification issues which are being moved forward right now, for example, around Polygons and Meshes thanks to @glyg, @Anatole_Chessel, et al. Anyone who is working on a spec and needs help/support/feedback, please speak up.

NetCDF support (4.8.0)

In other news (again, in case you missed it), NetCDF 4.8.0 (written in C) has added support for Zarr as a backend alongside HDF5 (tweet). This should help ensure broad support for the format moving forward.
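If you want to poke at it, NCZarr stores are addressed with fragment-style URLs per the 4.8.0 docs. A sketch from Python (untested on my side, and assuming your libnetcdf was built at ≥ 4.8.0 with NCZarr enabled; path and variable names are illustrative):

import netCDF4

# "mode=nczarr,file" selects the Zarr backend with directory-tree storage.
ds = netCDF4.Dataset("file:///tmp/example.zarr#mode=nczarr,file", "w")
ds.createDimension("x", 512)
data = ds.createVariable("data", "u2", ("x",))
ds.close()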

Meetings

Finally, it’s past time for another community call. The last one was on Feb. 23rd. However, with the upcoming OME community meeting, I don’t foresee myself organizing a full day of calls within the next 3 weeks. It might be that something more focused on ongoing specs would still be useful; if someone would like to take the lead on that, I’ll happily attend. Otherwise, I’d propose we try to chat in early summer. Thoughts & suggestions welcome (as always).

~Josh


Thanks for clarifying. I’m going to guess that it’s not NTFS and is probably either ext4 or Elastic File System running on cloud-backed storage?

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/RootDeviceStorage.html#choose-an-ami-by-root-device

Awesome, thanks for the update @joshmoore!! I’m excited to make sure that napari can read/write the data from the new spec, including axes and, when it comes, polygons/meshes. Definitely let me know if we can assist.


It was intended to look as “local” (or non-cloud) as possible. Spinning up instances to do some reporting:

/dev/nvme0n1p1 on / type ext4 (rw,relatime,discard)
sysbench output:
ubuntu@ip-10-0-1-88:~$ sysbench fileio --file-test-mode=rndrd run
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Extra file open flags: (none)
128 files, 16MiB each
2GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random read test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      309419.86
    writes/s:                     0.00
    fsyncs/s:                     0.00

Throughput:
    read, MiB/s:                  4834.69
    written, MiB/s:               0.00

General statistics:
    total time:                          10.0001s
    total number of events:              3094738

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                   13.58
         95th percentile:                        0.00
         sum:                                 9358.26

Threads fairness:
    events (avg/stddev):           3094738.0000/0.00
    execution time (avg/stddev):   9.3583/0.00
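(Sanity check: the throughput line is just reads/s times the 16 KiB block size.)

reads_per_s = 309419.86
block_kib = 16
print(reads_per_s * block_kib / 1024)  # ~4834.69 MiB/s, matching the report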

(and for extra credit)


ubuntu@ip-10-0-1-212:~$ sysbench --test=fileio --file-test-mode=seqwr run
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Extra file open flags: (none)
128 files, 16MiB each
2GiB total file size
Block size 16KiB
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing sequential write (creation) test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      0.00
    writes/s:                     2594.36
    fsyncs/s:                     3323.38

Throughput:
    read, MiB/s:                  0.00
    written, MiB/s:               40.54

General statistics:
    total time:                          10.0203s
    total number of events:              59178

Latency (ms):
         min:                                    0.00
         avg:                                    0.17
         max:                                  169.39
         95th percentile:                        0.01
         sum:                                 9999.89

Threads fairness:
    events (avg/stddev):           59178.0000/0.00
    execution time (avg/stddev):   9.9999/0.00
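If I’m reading sysbench’s behavior right, fsyncs outnumber writes here because every 100th request fsyncs all 128 test files:

writes_per_s = 2594.36
n_files = 128
fsync_every = 100
print(writes_per_s / fsync_every * n_files)  # ~3320.8/s vs. the reported 3323.38
# The throughput line checks out too: 2594.36 writes/s * 16 KiB ≈ 40.54 MiB/s.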
