Hi all,
I have just finished a ‘Dimensionality Reduction’ plugin (GitHub repo) for ImageJ/Fiji, which hopefully some of you will find interesting and useful. The plugin captures data from an open image stack or a folder of images and applies one of three dimensionality reduction techniques (PCA, t-SNE, or UMAP) to project the high-dimensional data into a lower-dimensional (2D) space, which is then plotted on an ImageJ scatter plot. Under the hood, the plugin uses two really awesome libraries: for t-SNE, Leif Jonsson’s pure Java implementation of van der Maaten and Hinton’s t-SNE algorithm; and for UMAP, Jesse Paquette’s (of tag.bio) Java implementation, based on the reference Python implementation. Both are distributed under open-source licences, so even if my implementation doesn’t suit you, perhaps their libraries can find a place in your own projects!
Using the ‘Dimensionality Reduction’ plugin on an image stack or folder of images
The plugin can be called from the GUI menu ‘Plugins>Dimensionality Reduction>…’,
which opens a dialogue box where the user can set some key parameters:
(UMAP dialogue example)
Or by macro, with the following:
run("PCA");
//or
//run t-SNE with default parameters:
run("t-SNE");
//or specifying (example) parameters:
run("t-SNE", "parallel initial_dims=30 perplexity=50 max_iterations=1000");
//or
//run UMAP with default parameters:
run("UMAP");
//or specifying (example) parameters:
run("UMAP", "n_threads=8 n_nearest=20 metric=manhattan");
In this version I haven’t incorporated all the parameter options offered by the parent libraries, but I feel I have covered many of the most important ones.
As an example, running UMAP on a subset (total=24,754) of handwritten MNIST digits (0-3) in an image stack of this kind:
results in this projection:
We can also specify a ‘label’ file upon calling the plugin, to colour the datapoints by ground-truth (or whatever), with the following macro command:
run("UMAP", "label_path=[C:/Users/Antinos/Documents/My_label_file.csv]");
//omitting other specified parameters for clarity of the example
The .csv label file should contain a single column of labels under a single column header, with the labels in the same order as the images in the stack.
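If you need to generate a label file of that shape, here is a minimal Python sketch (not part of the plugin, just one way to do it). The header name `label` and the example labels are placeholders I made up; I am only assuming that the file needs one header row followed by one label per row.

```python
import csv
import io


def write_label_csv(labels, handle, header="label"):
    """Write one header row, then one label per row, in stack order."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow([header])  # single column header (name is a placeholder)
    for label in labels:
        writer.writerow([label])  # one label per image, same order as the stack


# Hypothetical labels for a five-slice stack, listed in slice order.
example_labels = [0, 1, 1, 3, 2]

# Written to an in-memory buffer here; pass an open file handle instead
# (opened with newline="") to produce a real .csv file.
buffer = io.StringIO()
write_label_csv(example_labels, buffer)
label_csv_text = buffer.getvalue()
```

The important part is the ordering: row k of the file (after the header) should describe image k of the stack.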
With a label file specified, this plot is produced:
Using the plugin on non-image data
Images are just ordered arrays of data, so these dimensionality reduction techniques apply to non-image data just as well. For convenience, and because it is an inherently nice way to handle data, I recommend encoding non-image data (e.g. a numerical results table, microarray or RNA-seq data) in an ImageJ image stack before calling the plugin as usual.
To get you started I have also included a convenience function, ‘Results to Stack’, with the plugin, which can be called via the GUI:
Or by macro:
run("Results to Stack");
This pulls ONLY NUMERICAL data from an ImageJ results table and builds a stack of N slices, each an n×1 image, where N is the number of samples (one slice per sample) and n is the number of dimensions. Maybe counter-intuitively, the table should be ordered with rows as dimensions and columns as samples.
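Since that orientation trips people up, here is a small Python sketch of the layout (purely illustrative, it does not reproduce the plugin's internals). The table values and function names are made up; the only thing taken from the description above is that rows are dimensions, columns are samples, and each sample ends up as its own n-element slice.

```python
def transpose(table):
    """Swap rows and columns of a rectangular list-of-lists table."""
    return [list(column) for column in zip(*table)]


def table_to_slices(table):
    """Split a dimensions-by-samples table into one vector per sample.

    Each returned vector has n entries (one per dimension), mirroring
    the N slices of n x 1 pixels described above.
    """
    return transpose(table)


# Hypothetical data in the more common orientation:
# 2 samples (rows) measured on 3 dimensions (columns).
samples_as_rows = [
    [1.0, 2.0, 3.0],  # sample A
    [4.0, 5.0, 6.0],  # sample B
]

# Flip it so rows = dimensions and columns = samples, as the table should be.
results_table = transpose(samples_as_rows)

# One n-element vector per sample, i.e. one slice per sample.
slices = table_to_slices(results_table)
```

So if your data has one sample per row, transpose it before loading it into the results table.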
As an example, loading RNA-seq data for 837 single cells from the GTEx project (GSE45878), describing the expression of 22,704 genes (I may have trimmed the original set a little), generates this very odd image stack:
From which you can plot the following (example using UMAP):
Conclusion and installation
I won’t make this post too much longer. Feel free to download the plugin from my Google Drive; I have also included some test image stacks and accompanying label.csv files to play with. To install the plugin, copy ‘Dimensionality_Reduction-1.0.0.jar’ to your Fiji plugins folder.
This plugin will also work well with my recently created ‘Cluster My Data’ plugin:
Some of the other plugin parameters I haven’t mentioned include:
- t-SNE ‘parallel’: as you might guess, this runs the plugin in a multi-threaded fashion.
- UMAP ‘metric=’: allows the user to pick between metrics to measure distance in the input space, including:
- euclidean (default)
- manhattan
- chebyshev
- minkowski
- canberra
- braycurtis
- cosine
- correlation
- haversine
- hamming
- jaccard
- dice
- russelrao
- kulsinski
- rogerstanimoto
- sokalmichener
- sokalsneath
- yule
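To give a feel for how the choice of metric changes what ‘distance’ means, here is a short Python sketch of four of the metrics above using their textbook definitions. These are not taken from the UMAP library, whose implementations may differ in details such as edge-case handling; the example vectors are made up.

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Largest absolute difference along any single dimension."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    """Cosine distance: 1 minus the cosine of the angle between the vectors.

    Undefined for zero-length vectors (this sketch does not guard against that).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# The same pair of points gives a different distance under each metric.
p = [0.0, 0.0]
q = [3.0, 4.0]
d_euclidean = euclidean(p, q)
d_manhattan = manhattan(p, q)
d_chebyshev = chebyshev(p, q)

# Cosine distance ignores magnitude and only looks at direction.
d_cosine_orthogonal = cosine([1.0, 0.0], [0.0, 2.0])
d_cosine_parallel = cosine([1.0, 1.0], [3.0, 3.0])
```

In practice it is worth trying a couple of metrics on your own data, since the resulting projections can look quite different.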
It feels like I have been spamming this forum with posts recently (at least by my standards), so apologies for that and for this essay of a post.
Kind regards.