Dimensionality Reduction (UMAP, t-SNE, and PCA) plugin for ImageJ/Fiji

Hi all,

I have just finished a ‘Dimensionality Reduction’ plugin (GitHub repo) for ImageJ/Fiji, which hopefully some of you will find interesting and useful. The plugin captures data from an open image stack or folder of images and performs one of three dimensionality reduction techniques (PCA, t-SNE, or UMAP) to project the high-dimensional data into a lower-dimensional (2D) space, which is then plotted on an ImageJ scatter plot. Under the hood, the plugin uses two really awesome libraries: for t-SNE, Leif Jonsson’s pure Java implementation of van der Maaten and Hinton’s t-SNE algorithm; and for UMAP, Jesse Paquette’s (of tag.bio) Java implementation, which is based on the reference Python implementation. Both are distributed under open-source licences, so even if my implementation doesn’t suit you, perhaps their libraries can find a place in your respective projects!


Using the ‘Dimensionality Reduction’ plugin on an image stack or folder of images


The plugin can be called from a GUI dropdown ‘Plugins>Dimensionality Reduction>…’
GUI_dropdown

This presents a dialogue box that allows the user to select some key parameters:
UMAP_options_dialogue (UMAP dialogue example)

Or by macro, with the following:

run("PCA");

//or

//run t-SNE with default parameters:
run("t-SNE");
//or specifying (example) parameters:
run("t-SNE", "parallel initial_dims=30 perplexity=50 max_iterations=1000");

//or

//run UMAP with default parameters:
run("UMAP");
//or specifying (example) parameters:
run("UMAP", "n_threads=8 n_nearest=20 metric=manhattan");

In this version I haven’t incorporated all the parameter options offered by the parent libraries, but I feel I have covered many of the most important ones.

As an example, running UMAP on a subset (total=24,754) of ‘handwritten’ MNIST digits (0-3) in an image stack of this kind:
0-3 mini subset
results in this projection:

Which nicely represents the 4 groups of numbers!
We can also specify a ‘label’ file upon calling the plugin, to colour the datapoints by ground-truth (or whatever), with the following macro command:

run("UMAP", "label_path=[C:/Users/Antinos/Documents/My_label_file.csv]");
//omitting other specified parameters for clarity of the example

The .csv label file should be structured as a single column of labels, ordered to match the image stack, under a single column header.
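For anyone generating label files programmatically, here is a minimal Python sketch of that layout: one header line followed by one label per line, in stack order. The header text `label` and the helper name `write_labels` are placeholders of mine, not something the plugin requires as far as I know.

```python
import csv
import io


def write_labels(labels, handle, header="label"):
    """Write a single-column CSV: one header row, then one label per row.

    The order of `labels` must match the slice order of the image stack.
    The header name is an arbitrary placeholder.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow([header])
    for label in labels:
        writer.writerow([label])


# Example: four slices showing the digits 0, 1, 1 and 3, in stack order.
buffer = io.StringIO()
write_labels([0, 1, 1, 3], buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a handle opened with `open(path, "w", newline="")`.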

With a label file specified, this plot is produced:


Using the plugin on non-image data

Images are just ordered arrays of data, so it makes sense that these dimensionality reduction techniques can be applied to non-image data just as well as to images. For convenience, but also because it is an inherently nice way to handle data, I recommend encoding non-image data (e.g. a numerical results table, microarray or RNA-seq data, or whatever) in an ImageJ image stack before calling the plugin as usual.

To get you started I have also included a convenience function ‘Results to Stack’ with the plugin, which can be called via the GUI:
GUI_dropdown_Results-to-Stack

Or by macro:

run("Results to Stack");

This pulls ONLY NUMERICAL data from an ImageJ results table to build a stack of N images, each n×1 pixels, where N is the number of samples (slices) and n is the number of dimensions. Maybe counter-intuitively, the table should be ordered with rows as dimensions and columns as samples.
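Many tables arrive the other way round (one row per sample), so a transpose is needed before loading them into the results table. A minimal Python sketch, assuming the data is already in memory as a list of equal-length numeric rows; the function name is hypothetical:

```python
def to_rows_as_dimensions(samples):
    """Transpose a sample-per-row table into the dimension-per-row layout.

    `samples` is a list of equal-length lists, one inner list per sample.
    The result has one row per dimension and one column per sample.
    """
    lengths = {len(row) for row in samples}
    if len(lengths) > 1:
        raise ValueError("all samples must have the same number of dimensions")
    return [list(column) for column in zip(*samples)]


# Two samples with three dimensions each...
samples = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0]]
# ...become three rows (dimensions) by two columns (samples).
table = to_rows_as_dimensions(samples)
print(table)
```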

Anyway, as an example, adding RNA-seq data of 837 single cells from the GTEx project (GSE45878), describing the expression of 22,704 genes (I may have trimmed the original set a little), you can generate this very odd image stack:


From which you can plot the following (example using UMAP):


Conclusion and installation

I won’t make this post too much longer. Feel free to download the plugin from my Google Drive. I have also included some test image stacks and accompanying label.csv files to play with. To install the plugin, copy ‘Dimensionality_Reduction-1.0.0.jar’ to your Fiji plugins folder.

This plugin will also work well with my recently created ‘Cluster My Data’ plugin:

Some of the other plugin parameters I haven’t mentioned include:

  • t-SNE ‘parallel’: as you might guess, this runs the plugin in a multi-threaded fashion.
  • UMAP ‘metric=’: allows the user to pick between metrics to measure distance in the input space, including:
    • euclidean (default)
    • manhattan
    • chebyshev
    • minkowski
    • canberra
    • braycurtis
    • cosine
    • correlation
    • haversine
    • hamming
    • jaccard
    • dice
    • russelrao
    • kulsinski
    • rogerstanimoto
    • sokalmichener
    • sokalsneath
    • yule
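To give a feel for how the choice of metric changes what counts as "close", here is a short Python sketch of three of the listed metrics using their textbook definitions. This is not the UMAP library's code, just an illustration on a pair of flattened samples.

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of the absolute differences."""
    return sum(abs(p - q) for p, q in zip(a, b))


def chebyshev(a, b):
    """Largest absolute difference along any single dimension."""
    return max(abs(p - q) for p, q in zip(a, b))


a = [0.0, 0.0]
b = [3.0, 4.0]
print(euclidean(a, b))  # 5.0
print(manhattan(a, b))  # 7.0
print(chebyshev(a, b))  # 4.0
```

The same pair of points sits at three different distances, which is why the resulting projection can change noticeably with the metric.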

It feels like I have been spamming this forum with posts recently (at least by my standards), so apologies for that and for this essay of a post.

Kind regards.

Hello Antinos,
Thanks for the detailed post, it’s very interesting!
I have played a bit with dimensionality reduction in the TensorFlow Embedding Projector, but having an equivalent in ImageJ/Fiji could be handy too!

I am not actively using it these days, but do you have the source code in a GitHub repository or similar? It would help others work out how to use it in custom scripts or plugins, in addition to the macro language.
With GitHub you can also add some documentation to the project, receive external contributions, and even add a DOI to the repository with Zenodo if you want to be credited for this work :wink:

I think dimensionality reduction is still underestimated in the life-science community; hopefully, with such user-friendly contributions, it will spread!

Hi Antinos,

this looks like a great tool! Thank you!
I also think that you should put it on GitHub. It makes feedback much easier.
Just a small question about practical application.

Aim: UMAP from cells of a multiplex IF image (e.g. 2000 x 2000 px)
Method:

  • detect cells
  • create individual images for each cell
  • create stack from all cell images
  • run dimensionality reduction

Would this be the correct approach? Since the cells have different sizes, I would have to pad them with black (0) to give all images the same dimensions. How does a black background (value 0) influence the dimensionality reduction? For instance, does a cell without padding end up in the same cluster as the same cell padded by 30 px with 0?

Thank you again and best regards,
Mario

Hi @LThomas and @marioK,
Thanks for your messages. I was planning to add the source to GitHub (probably today). The Java files are also available in the jar, which you can access by unzipping it.
I agree that DR has a tonne of potential. I think it has already found its way into many awesome tools mentioned on this forum (e.g. Cytomap, QuPath etc.), but I thought there was a little gap for a native ImageJ implementation.

Mario, you are definitely on the right track with your proposed workflow. However, for multiplexed IF you’d have to think carefully about what constitutes your multidimensional data. Raw pixel values may be appropriate to feed into a DR pipeline, but morphometric and fluorescence measures (similar to what is displayed in a Results table) may be more appropriate. The sky is the limit really. The original datasets can certainly be multimodal (e.g. size, colour, species etc.), but the easiest applications are on high-dimensional data of a single type (e.g. gene expression). There are some data-shaping considerations (mean centring etc.) that are more applicable to one approach than to the other.
One thing to keep in mind is that DR gives you the opportunity to create an unsupervised profile/fingerprint of a sample based on the underlying structure of its constituent parts (not a formal technical description!). Where you go from there, and how you use it to find sub-groups or to compare one sample set with another, is up to you.
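As a concrete picture of the mean centring mentioned above, here is a minimal Python sketch that subtracts each dimension's mean across samples. It is a generic preprocessing step shown for illustration; whether the plugin applies anything like it internally is not something this sketch assumes.

```python
def mean_centre(samples):
    """Subtract the per-dimension mean from every sample.

    `samples` is a list of equal-length lists, one inner list per sample.
    After centring, each dimension has a mean of zero across the samples.
    """
    count = len(samples)
    means = [sum(column) / count for column in zip(*samples)]
    return [[value - mean for value, mean in zip(row, means)]
            for row in samples]


# Two samples, two dimensions on very different scales.
samples = [[1.0, 10.0],
           [3.0, 30.0]]
print(mean_centre(samples))  # [[-1.0, -10.0], [1.0, 10.0]]
```

Note that centring removes the offset of each dimension but not its scale, so mixed measurement types (size, intensity and so on) may need rescaling as well.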

Kind regards.

@antinos
Yes, my first approach was creating a result table with suitable measurements for each cell.
Unfortunately, the “Multi Measure” function of the ROI Manager performs the measurements per channel and appends the results. Hence I end up with multiple entries for the same cell, one for each channel.
Of course it is no problem to clean up this table, but it defeats the purpose of trying out your ImageJ implementation.
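For reference, the clean-up amounts to pivoting the appended rows into one row per cell. A rough Python sketch, assuming the table has been exported as rows with `Cell`, `Channel` and `Mean` fields; those column names and the function name are hypothetical and will differ from the real Multi Measure output:

```python
def pivot_by_cell(rows, measure="Mean"):
    """Collapse one-row-per-cell-per-channel data into one entry per cell.

    Each input row is a dict with 'Cell', 'Channel' and the measured value
    (hypothetical column names). The output maps each cell to a dict with
    one key per channel, e.g. 'Mean_ch1', 'Mean_ch2'.
    """
    wide = {}
    for row in rows:
        per_cell = wide.setdefault(row["Cell"], {})
        per_cell[f"{measure}_ch{row['Channel']}"] = row[measure]
    return wide


# Results appended channel by channel: two cells, two channels.
rows = [
    {"Cell": 1, "Channel": 1, "Mean": 10.0},
    {"Cell": 2, "Channel": 1, "Mean": 20.0},
    {"Cell": 1, "Channel": 2, "Mean": 5.0},
    {"Cell": 2, "Channel": 2, "Mean": 7.0},
]
print(pivot_by_cell(rows))
```

Each cell then becomes a single sample whose dimensions are the per-channel measurements.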

Best regards,
Mario

Hi @marioK,

To answer some of your questions directly, which could also help others start to play with this: when DR is applied directly to images, the images are normally a lot smaller than 2000x2000 px. Think about whether you need that number of dimensions (4,000,000 per image) to encode data that is applicable to the model… quite often you can get away with a heavily downsampled version (but please feel free to test the hypothesis!). Depending on the underlying structures you are attempting to capture, you may even be able to go down to the 200x200 or even 20x20 pixel level.
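To show how quickly the dimension count drops, here is a small Python sketch of downsampling by block averaging. It is one simple way to shrink an image, chosen for illustration; in ImageJ you would more likely resize the stack before running the plugin.

```python
def downsample(image, factor):
    """Shrink a 2D image by averaging non-overlapping factor x factor blocks.

    `image` is a list of equal-length rows. Both dimensions must be
    divisible by `factor`.
    """
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be divisible by factor")
    result = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [image[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


# A 4x4 image (16 dimensions) becomes 2x2 (4 dimensions).
image = [[1, 1, 2, 2],
         [1, 1, 2, 2],
         [3, 3, 4, 4],
         [3, 3, 4, 4]]
print(downsample(image, 2))  # [[1.0, 2.0], [3.0, 4.0]]
```

By the same arithmetic, a factor of 10 takes a 2000x2000 image from 4,000,000 dimensions to 40,000, and a factor of 100 takes it to 400.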
As for padding, cell size may actually be an important source of variance across the sample sets (helping to distinguish one cell population from another in the lower-dimensional space)! This is where the ‘data-shaping’ that I mentioned in my last post comes in. The background should be made as uniform as possible if you don’t want it to become an important weighting factor in the model, and the cells should be in the same place (normally centred in the image). Cell shape has its own intrinsic noise, unless you also take the time to orient each cell individually along some logical axis, but some of that will average out with larger N (usually the more the merrier for DR). There may also be more subtle advice for this kind of implementation available in the literature. Black pixels shouldn’t intrinsically affect any of the DR methods available in my plugin, but it’s an interesting question. Perhaps a noise fill would be better… I may test this for fun later.
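One narrow part of the padding question can be checked on paper. If two equal-sized crops are centred on the same zero-filled canvas, the padding pixels are identical in both and contribute nothing to the Euclidean distance between them. The Python sketch below illustrates only that case, with hypothetical helper names; it says nothing about cells of different sizes, other metrics, or how any particular DR method reacts.

```python
import math


def pad_centred(image, size, fill=0):
    """Place `image` in the centre of a size x size canvas filled with `fill`."""
    height, width = len(image), len(image[0])
    if height > size or width > size:
        raise ValueError("image is larger than the requested canvas")
    top = (size - height) // 2
    left = (size - width) // 2
    canvas = [[fill] * size for _ in range(size)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            canvas[top + y][left + x] = value
    return canvas


def flatten(image):
    """Turn a 2D image into a 1D sample vector, row by row."""
    return [value for row in image for value in row]


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


cell_a = [[1, 2], [3, 4]]
cell_b = [[1, 2], [3, 8]]

raw = euclidean(flatten(cell_a), flatten(cell_b))
padded = euclidean(flatten(pad_centred(cell_a, 4)),
                   flatten(pad_centred(cell_b, 4)))
print(raw, padded)  # 4.0 4.0
```

An unpadded crop and its padded copy have different numbers of dimensions, so they cannot sit in the same stack at all; every sample has to be brought to one common canvas size first.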

Kind regards.

I detected by chance a minor main menu glitch after installation in my environment where I just recently allowed the addition of entries in the main menubar. Your plugin produced a menu without name - last three lines.

However this is visible in ImageJ, too, but not directly in the main menu.

You defined a main menu in the plugins.config file without a name. You see this if you open Plugins->Utilities->Control Panel (missing name under Help).

So the correct main menu config would be:

Plugins>Dimensionality Reduction, "PCA no options", dimred.gui.pca_gui
Plugins>Dimensionality Reduction, "t-SNE with options", dimred.gui.tsne_gui
Plugins>Dimensionality Reduction, "UMAP with options", dimred.gui.umap_gui
Plugins>Dimensionality Reduction, "-" null
Plugins>Dimensionality Reduction, "Results to Stack", dimred.gui.results_to_stack

Dimensionality Reduction, "PCA", dimred.Pca_
Dimensionality Reduction, "t-SNE", dimred.Tsne_
Dimensionality Reduction, "UMAP", dimred.Umap_

Oh dear,
This is when you all find out how improvised my methods are! You are very right, and you can actually find the links by clicking just to the right of ‘Help’ in the main bar! :sweat_smile:
I thought I had found a way to link the display names (PCA, t-SNE, and UMAP) with the appropriate classes (allowing ImageJ to populate the macro autocomplete etc) without creating menu entries or needing to refactor my build. There’s probably some other way to handle this in the config file, otherwise I’ll look over the issue later and sort it out on the source end. This is an artefact of adding the gui classes and links as an afterthought.

Thanks for spotting the issue!

Ok, it was an easy fix. Just deleted the preceding comma from the last three config file entries:

Plugins>Dimensionality Reduction, "PCA no options", dimred.gui.pca_gui
Plugins>Dimensionality Reduction, "t-SNE with options", dimred.gui.tsne_gui
Plugins>Dimensionality Reduction, "UMAP with options", dimred.gui.umap_gui
Plugins>Dimensionality Reduction, "-" null
Plugins>Dimensionality Reduction, "Results to Stack", dimred.gui.results_to_stack

"PCA", dimred.Pca_
"t-SNE", dimred.Tsne_
"UMAP", dimred.Umap_

Sorry, I’m just working some of this out as I go along.
The updated file is now in the Google Drive folder.

Kind regards.

Not trying to bump this. Just added the requested GitHub repo link as an edit to the beginning of the main post.
