BigDataViewer Hdf5 Performance

bigdataviewer

#1

Hello @hanslovsky, @tpietzsch

I have a question about Hdf5 (I am afraid it is rather an observation than a question…).

As far as I know, even the read methods in the Java Hdf5 libraries are public static synchronized, implying that only one read can run at a time in the whole JVM. Is that correct?

Is it then also correct to say that, while Bdv is fetching data from different resolution layers, there is one huge queue of requests that all want to access the one synchronized Hdf5 read method?
That would also mean that any other thread in the same JVM that wants to read Hdf5 files ends up in the same waiting queue, which could effectively be blocked by what is going on in Bdv.

And I guess that’s why you like n5?

Would be great if you could briefly comment on whether my understanding is correct.

Best, Christian


#2

I am not too familiar with the HDF5 Java libraries. All I know is that parallel writing is non-trivial with HDF5 (not only the Java bindings), which was one of the reasons for N5.

Pinging @axtimwalde for comment.


#3

Here is some info: https://support.hdfgroup.org/ftp/HDF5/releases/HDF-JAVA/hdfjni-3.3.2/src/hdfjava-3.3.2-javadoc/hdf5_java_doc/hdf/hdf5lib/H5.html

And one example from that page:

public synchronized static native int H5Dread(int fid, int filetype, int memtype, int memspace, Object data);
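
To illustrate what that signature implies: a `synchronized static` method locks on the class object, so all threads in one JVM take turns calling it. Below is a minimal sketch of that behaviour. `fakeRead` is a hypothetical stand-in for a native call such as `H5Dread` (it only sleeps, no HDF5 is involved), and the class and method names are made up for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class StaticSyncDemo {
    // Number of threads currently inside fakeRead, and the highest value seen.
    static final AtomicInteger active = new AtomicInteger();
    static final AtomicInteger maxActive = new AtomicInteger();

    // Hypothetical stand-in for a read like H5.H5Dread: static + synchronized
    // means the lock is StaticSyncDemo.class, shared by every caller in the JVM.
    static synchronized void fakeRead() throws InterruptedException {
        int now = active.incrementAndGet();
        maxActive.accumulateAndGet(now, Math::max);
        Thread.sleep(5); // pretend to do I/O
        active.decrementAndGet();
    }

    // Runs nThreads workers that each call fakeRead readsPerThread times and
    // returns the maximum number of calls that were ever in flight at once.
    static int runReaders(int nThreads, int readsPerThread) {
        active.set(0);
        maxActive.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (int t = 0; t < nThreads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < readsPerThread; i++) {
                    try {
                        fakeRead();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return maxActive.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent reads: " + runReaders(4, 5));
    }
}
```

However many threads are used, the reported maximum stays at 1, which is the queueing effect described in #1.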

#4

Christian, I think you are correct, unfortunately…


#5

@tpietzsch
This made me think of BigDataServer: do you think it conceptually makes sense to fire up a new BigDataServer for each client? That way they would all have their own JVM and would not block each other’s Hdf5 read methods.


#6

I think bad things happen because HDF5 itself doesn’t want to be read in parallel by default. The library has to be very explicitly compiled in parallel mode with one of a particular set of libraries. Serial HDF5 libraries will take a filesystem-level lock to prevent multiple processes from reading.


#7

…really? that would be a problem!
do you know why one would do this?
I understand for writing, but reading?


#8

When I last tried, I think there was a read-only flag that might allow parallel reads, but the filesystem in use disallowed file locks, so it didn’t work as expected. HDF5 is best thought of as a hierarchical filesystem. The locks are likely there to protect some global state in the library.


#9

I did some more digging as a sanity check. Multi-threaded reads to a single file (or even multiple files) require the thread-safe compilation. Reads are still serial due to global structures in the library. Fully independent processes can read from the same file as long as nothing is writing. “Parallel HDF5” via MPI is incompatible with the thread-safe version.

So in theory spawning fully independent JVMs might work, though I saw some references on Stack Overflow to fork() not helping.
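
For what it's worth, a rough sketch of the "one JVM per reader" idea using plain `ProcessBuilder`. Everything here is an assumption for illustration: the class `JvmPerReader`, the reader main class name you pass in, and the idea that the child takes the file path as its only argument. It is not how BigDataServer works today, and whether file locking gets in the way on a given filesystem is untested.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class JvmPerReader {
    // Builds the command line for a child JVM that reuses the parent's
    // java binary and classpath. mainClass is a hypothetical reader class
    // that opens the given HDF5 file read-only.
    static List<String> buildCommand(String mainClass, String hdf5Path) {
        List<String> cmd = new ArrayList<>();
        cmd.add(Paths.get(System.getProperty("java.home"), "bin", "java").toString());
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        cmd.add(hdf5Path);
        return cmd;
    }

    // Starts the child JVM. Each child loads its own copy of the HDF5
    // bindings, so the static synchronized lock is not shared between them.
    static Process launch(String mainClass, String hdf5Path) throws IOException {
        return new ProcessBuilder(buildCommand(mainClass, hdf5Path))
                .inheritIO() // forward the child's stdout/stderr to the parent
                .start();
    }
}
```

The trade-off is that every child JVM has its own heap and cache, so nothing is shared between readers apart from the file on disk.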