Future development of OME Files C++ in 2019

With version 0.6.0 of OME Files released, it’s time to look ahead to the next releases, and plan their development. The development resources for OME Files are somewhat restricted at present; my full-time job involves some imaging-related work, but is more focused upon instrumentation. As a consequence, development will be at a slower pace than before. As for the 0.6.0 release, I am willing to do work on OME Files on a contractual basis, albeit with some limitations upon the number of hours I can dedicate per week. However, there is certainly scope for other contributors to get involved, in ways ranging from creating issues, to submitting merge requests, to having direct access to the GitLab repositories.

New features and improvements

New features, and improvements to existing features, will need to be driven by user requirements, to be sure they meet real needs and focus development upon the most important priorities for our userbase. Most of the codebase could be improved in various ways, so I would very much like to hear from each end user about what your priorities are, so we can take them into consideration. While public discussion is welcome, if you would prefer to keep it confidential, please feel free to email me in confidence.

Some examples, in no order of preference:

  • API improvements for ease of use
  • Creation of DLLs on Windows, by removing use of STL from the exported interfaces
  • Replacement of the nD pixel buffer with an easier to use and more performant alternative
  • Extensions to the model API to allow transparent serialisation and storage of user- and application-specific metadata in the OME data model as custom annotations
  • Replacement of the model and metadata interfaces with smaller, simpler and easier to use interfaces
  • Replacement of the OME-XML model object backend with alternative backends, to improve scalability and performance, which could include SQLite, HDF5 or other file formats
  • Replacement of the XSLT transforms, either with XSLT2 (QtXmlPatterns) or some other transform mechanism, such as direct DOM manipulation, or database upgrades in the case of SQLite
  • Further replacement of Boost with small header-only libraries like fmt (see below)
  • Further use of Qt5 to implement core functionality (see below)
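To make the DLL item above more concrete, here is a minimal sketch of what an exported class without STL types in its interface might look like. Everything in it is hypothetical: the `EXAMPLE_API`, `EXAMPLE_BUILD_DLL` and `EXAMPLE_USE_DLL` macros and the `ImageName` class are invented for illustration and are not part of OME Files. The idea is that only built-in types cross the DLL boundary, while `std::string` stays hidden behind an opaque implementation pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical export macro; empty on non-Windows platforms.
#if defined(_WIN32) && defined(EXAMPLE_BUILD_DLL)
#  define EXAMPLE_API __declspec(dllexport)
#elif defined(_WIN32) && defined(EXAMPLE_USE_DLL)
#  define EXAMPLE_API __declspec(dllimport)
#else
#  define EXAMPLE_API
#endif

namespace example {

// Exported interface: only built-in types and an opaque pointer.
class EXAMPLE_API ImageName
{
public:
  explicit ImageName(const char* name);
  ~ImageName();
  ImageName(const ImageName&) = delete;
  ImageName& operator=(const ImageName&) = delete;

  const char* c_str() const noexcept;  // valid while this object lives
  std::size_t size() const noexcept;

private:
  struct Impl;   // defined inside the library only
  Impl* impl_;
};

// Implementation; in a real library this would live in a .cpp file
// compiled into the DLL, so std::string never appears in the ABI.
struct ImageName::Impl
{
  std::string name;
};

ImageName::ImageName(const char* name) : impl_(new Impl{name ? name : ""}) {}
ImageName::~ImageName() { delete impl_; }
const char* ImageName::c_str() const noexcept { return impl_->name.c_str(); }
std::size_t ImageName::size() const noexcept { return impl_->name.size(); }

// Usage example; call from main() or a unit test.
inline void demo()
{
  ImageName name("plate-1");
  assert(name.size() == 7);
  assert(std::strcmp(name.c_str(), "plate-1") == 0);
}

}  // namespace example
```

The trade-off is an extra allocation and indirection per object, in exchange for an interface that does not depend on which standard library build the caller uses.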

Use of third-party libraries

From its inception, OME Files has been portable across operating systems and compilers. In order to port the core Bio-Formats interfaces and implementation from Java, we had to use third-party libraries to provide needed functionality, from XML and XSLT processing, to using smart-pointers to approximate the semantics of Java object lifetime and garbage collection. To provide this functionality, we initially adopted:

  • Boost (smart pointers, threads, regular expressions, filesystem abstraction, options parsing, logging, arrays, tuples etc.)
  • Xerces-C++ (XML DOM)
  • Xalan-C++ (XSLT transforms)
  • libtiff (TIFF file format support)

Xerces and Xalan

One medium-term concern is that the extensive use of XML and XSLT technologies from the early 2000s locks us into specific implementations which are largely unmaintained and have no viable replacement. The OME data model (ome-xml library) makes heavy use of many XML features, including namespaces, validation and transforms, and uses quite a few esoteric features which only the Xerces and Xalan libraries provide for C++; other XML and XSLT libraries only provide a subset of the required features. Xerces-C++ and Xalan-C++ are of particular concern, due to lack of active maintenance for a long period, but the same concerns also apply on the Java side. I joined the Apache project to maintain Xerces-C++ and add CMake support, and a few weeks ago I joined the Xalan-C++ PMC with the aim of adding outstanding bugfixes, cleaning it up to work on modern systems, adding a CMake build and making a new release. You can see the progress on GitHub. However, even with this maintenance work, support for these technologies is waning, and we might want to consider alternatives both on the C++ and Java side.

The work done for 0.6.0 allows the replacement of Xerces and Xalan with QtXml and QtXmlPatterns. However, this comes at a cost of not supporting offline validation, and not supporting the Xalan-specific XSLT1 transforms ome-xml provides. I would be interested in opinions regarding the best direction to take the library to allow use of old versions of the OME data model, whilst not being tied to obsolete libraries which many people would rather avoid.

Boost

With the requirement of C++11 and now C++14, many of the Boost libraries we used have been replaced by the C++ standard library equivalents (smart pointers, threads, arrays, tuples, regex, type_traits). With C++17, even more libraries can be replaced (filesystem and variant are the main ones). This leaves a small number of header-only libraries, and program_options, which could also be replaced, to drop Boost entirely.

Is continuing in this direction, with the aim of removing use of all Boost libraries, acceptable to all?
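As a rough illustration of what such a replacement looks like in practice, here is a minimal sketch of `std::variant` and `std::visit` doing the job that `boost::variant` and `boost::apply_visitor` would otherwise do. The `PixelData` type and `byte_size` function are hypothetical and invented for this example; they are not the OME Files pixel buffer API. It requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <variant>
#include <vector>

namespace example {

// Hypothetical pixel storage: one alternative per pixel type.
using PixelData = std::variant<std::vector<std::uint8_t>,
                               std::vector<std::uint16_t>,
                               std::vector<float>>;

// Size of the stored pixels in bytes, whichever alternative is active.
// std::visit takes the place of boost::apply_visitor.
inline std::size_t byte_size(const PixelData& data)
{
  return std::visit(
      [](const auto& pixels) -> std::size_t {
        using Pixel = typename std::decay_t<decltype(pixels)>::value_type;
        return pixels.size() * sizeof(Pixel);
      },
      data);
}

// Usage example; call from main() or a unit test.
inline void demo()
{
  PixelData data = std::vector<std::uint16_t>(4);  // four 16-bit pixels
  assert(byte_size(data) == 8);
  data = std::vector<float>(3);                    // now three floats
  assert(byte_size(data) == 3 * sizeof(float));
}

}  // namespace example
```

A generic lambda replaces the separate visitor class usually written for `boost::static_visitor`, which is one of the ways the standard library version tends to be shorter.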

Qt

Currently, Qt is entirely optional. Only the bare-bones image viewer (ome-files-view) requires it. However, for 0.6.0 we did start to permit optional use of Qt in the core libraries to replace some usage of Boost, Xerces and Xalan.

My understanding is that quite a few of our end users are already using Qt in their applications. Depending upon the portability requirements of the different projects using OME Files, we could consider using Qt more. Or, we could keep the core interfaces using Standard C++ to the maximum extent possible. I would certainly appreciate feedback.

Simplifying filesystem support

Three filesystem variants are supported at present. In order of preference, these are std::filesystem (C++17 and later), Boost.Filesystem, and a set of Qt5 Core wrappers which mimic the interface of the former two.

In the long term, it is expected that std::filesystem will become the only supported choice, since it’s part of the standard library. However, while C++14 continues to be supported, Boost.Filesystem is needed, or the Qt5 wrappers where Boost cannot be used. At present, all three may still be required. However, understanding the exact usage requirements may allow the test matrix to be simplified. Part of this consideration is whether Boost or Qt5 is preferable as a dependency. If every user is already using Qt5, that makes the decision simpler.

Note that the next version of Xcode on MacOS will support std::filesystem, which will make it supported on all platforms.
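For readers unfamiliar with how this kind of selection is typically done, here is a minimal sketch of a compile-time switch between std::filesystem and Boost.Filesystem, following the order of preference described above. It is not the actual OME Files implementation: the `example` namespace, the `EXAMPLE_USE_BOOST_FILESYSTEM` override and the `EXAMPLE_HAVE_STD_FILESYSTEM` macro are all made up for illustration, and the Qt5 wrappers are left out.

```cpp
#include <cassert>
#include <string>

// Prefer std::filesystem when the header exists and the library
// reports support for it; EXAMPLE_USE_BOOST_FILESYSTEM is a
// hypothetical override to force the Boost implementation.
#if !defined(EXAMPLE_USE_BOOST_FILESYSTEM) && defined(__has_include)
#  if __has_include(<filesystem>)
#    include <filesystem>
#    if defined(__cpp_lib_filesystem)
#      define EXAMPLE_HAVE_STD_FILESYSTEM 1
#    endif
#  endif
#endif

#if defined(EXAMPLE_HAVE_STD_FILESYSTEM)
namespace example { namespace fs = std::filesystem; }
#else
#  include <boost/filesystem.hpp>
namespace example { namespace fs = boost::filesystem; }
#endif

namespace example {

// Code written against the fs alias compiles unchanged with either
// implementation, since the two path classes share this interface.
inline std::string extension_of(const std::string& filename)
{
  return fs::path(filename).extension().string();
}

// Usage example; call from main() or a unit test.
inline void demo()
{
  assert(extension_of("image.ome.tiff") == ".tiff");
}

}  // namespace example
```

The cost of a switch like this is that every selectable implementation has to be built and tested, which is where the pressure on the test matrix comes from.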

Current infrastructure

A large part of the work for 0.6.0 was acquiring and setting up the hardware needed for continuous integration builds. Its scope and coverage now exceed the old and retired Jenkins-based infrastructure, but there are limits to how wide that scope can be. The current set of platforms comprises:

  • FreeBSD (Bare metal and VMware virtual machines; used for testing LLVM clang++ and libc++)
  • Linux (Ubuntu and CentOS Docker images on Digital Ocean; used for testing a variety of GCC versions)
  • MacOS (Bare metal and VMware virtual machines; used for testing the current stable release of Xcode)
  • Windows (Bare metal and VMware virtual machines; used for testing Visual Studio 2015, 2017 and 2019)

Build rationalisation

Due to resource limitations, the current set of supported platforms and configurations on each platform isn’t sustainable. Each platform, and each configuration on each platform, requires time to maintain and test. On the CI side, each of these requires infrastructure to support building and testing, which has a maintenance overhead as well as a bandwidth overhead uploading and downloading artefacts. It takes several hours to build and test a single merge request.

As an example, the ome-model and ome-files-cpp build pipelines for the 0.6.0 release builds show the set of platforms and configurations currently tested.

I would like to rationalise the configurations to reduce this maintenance cost, while still continuing to support the same set of platforms. I would like to propose the following changes for the next OME Files release:

Dropping support for Python 2

Currently, the ome-model and ome-files-py components, plus all components building Sphinx documentation, support building with both Python 2 and Python 3, including CI testing with both Python versions.

All supported platforms have Python 3 available, and Python 3 is tested on every platform. Python 2 is tested on a subset. I have verified that the code generation is byte-for-byte identical between the two, and we have been defaulting to Python 3 for all CI builds for almost a year now.

Unless there are any major objections, I would like to drop the CI builds for Python 2, and end support for Python 2.

Dropping support for Visual Studio 2015

Visual Studio 2015, 2017 and 2019 are currently supported. However, maintaining and testing on many Visual Studio versions is quite costly, and two is about the limit in practice. I would suggest that we drop VS2015 in favour of the more recent versions.

Both VS2017 and VS2019 support easier installation in Windows docker containers, and are compatible with the VS2015 toolset. The CI builds can be switched over to docker images should we drop VS2015 (currently all three are running on the host system, a VMware virtual machine).

Any feedback, public or private, would be greatly appreciated.

Kind regards,
Roger

Dear @rleigh-codelibre, thanks for your great work and for the shout-out to the community. It's highly appreciated.

About your individual topics:

  • Xerces and Xalan:
    The strong dependency on Xerces and Xalan has always been a bit worrying to me. I understand that XML is a powerful metadata storage container within OME Files. But the inherited cost in terms of build dependencies and code complexity seems relatively high to me, since we personally do not make much use of the additional features that XML provides.
    For this reason I personally would prefer a solution with “cheaper” complexity if possible. We do not make much use of XML in our applications, so any alternative with lower complexity seems like an improvement for us.
    A dependency on Qt would not come as a good alternative for us just for practical reasons: Qt is a huge library that has a very high build time, big library file size and a not completely free license for commercial use. In my eyes these disadvantages outweigh the benefits of the better maintained XML support.
  • Boost
    Personally I do not mind the Boost dependency. However, I also do not see any issues with removing Boost in the long run. To me personally this is a low priority, mostly because removing Boost adds the new requirement of C++17 for certain aspects like std::filesystem support. C++17 is (as of now) not yet our standard on all platforms. As an alternative, I have seen projects that allow switching transparently between Boost and C++17 implementations using a compiler define. While I have no personal experience with this, it seemed like a low-effort solution.
  • Qt
    Personally I would prefer to keep Qt optional.
  • Dropping support for Visual Studio 2015
    Is the test data publicly available? If yes, then it may be an option to switch tests for less common platforms to community-driven public infrastructure. For example, Microsoft recently opened their Azure DevOps pipelines with 3x 10 parallel runners for open source GitHub projects, for MacOS, Linux and MSVC. As long as all sources and test data are openly available, it would be quite possible to leave it to the community to drive the testing of less common platforms. Basically, if anybody needs this support they can drive it themselves, without blocking your progress too much.

Just my two cents. Thanks a lot for your great work again!

Would libxml2/libxslt work? Docs claim that they compile in Windows, and there are C++ wrappers for ease of use.