Uniform Manifold Approximation and Projection (UMAP)

version 4.2.1 (3.78 MB) by Stephen Meehan
An algorithm for manifold learning and dimension reduction.

4.2K Downloads

Updated Thu, 20 Oct 2022 06:55:43 +0000

Given a set of high-dimensional data, run_umap.m produces a lower-dimensional representation of the data for purposes of data visualization and exploration. See the comments at the top of the file run_umap.m for documentation and many examples of how to use this code.
The UMAP algorithm is the invention of Leland McInnes, John Healy, and James Melville. See their original paper for a long-form description (https://arxiv.org/pdf/1802.03426.pdf). Also see the documentation for the original Python implementation (https://umap-learn.readthedocs.io/en/latest/index.html).
This MATLAB implementation follows a very similar structure to the Python implementation from 2019, and many of the function descriptions are nearly identical.
Here are some additional tools we have added to our implementation:
1) The ability to detect clusters in the low-dimensional output of UMAP. As the clustering method, we invoke either DBM (described at https://www.hindawi.com/journals/abi/2009/686759/) or DBSCAN (available in MATLAB R2019a and later through the Statistics and Machine Learning Toolbox).
2) Visual and computational tools for data group comparisons. Data groups can be defined either by running clustering on the data islands resulting from UMAP's reduction or by external classification labels. To compare groups, we use a change quantification metric (QFMatch), which detects similarity in both mass and distance (described at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5818510/), as well as an F-score for measuring overlap when the groups are different classifications of the same data. For visualizing data groups, we provide a dendrogram (described as QF-tree at https://www.nature.com/articles/s42003-019-0467-6), a multidimensional scaling view, and sortable tables that show each data group's similarity, overlap, false positive rate, and false negative rate. The documentation in run_umap.m and UMAP_extra_results.m describes these and additional related tools.
3) A PredictionAdjudicator feature that helps determine how well one classification’s subsets predict another’s.
4) A complementary independent classifier named "exhaustive projection pursuit" (EPP) that generates labels both for supervising UMAP and for classification comparison research. EPP is described at https://onedrive.live.com/?authkey=%21ALyGEpe8AqP2sMQ&cid=FFEEA79AC523CD46&id=FFEEA79AC523CD46%21209192&parId=FFEEA79AC523CD46%21204865&o=OneUp.
5) The ability to use neural networks either from MATLAB's "fitcnet" function or the Python package TensorFlow to learn from a training data set and provide a classification on new data to either compare against or merge with UMAP classification.
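To make the F-score overlap from item 2 concrete, here is a minimal sketch in Python. It is not taken from this package: the function names, the toy group names, and the row indices are all hypothetical, and it assumes the standard F1 definition (the harmonic mean of precision and recall) applied to two groupings of the same rows. The formulas used inside run_umap.m may differ in detail.

```python
def f_score(group_a, group_b):
    """F1 overlap between two groups given as collections of row indices.

    With precision = |A and B| / |A| and recall = |A and B| / |B|,
    the harmonic mean simplifies to 2 * |A and B| / (|A| + |B|).
    """
    a, b = set(group_a), set(group_b)
    if not a or not b:
        return 0.0  # an empty group overlaps with nothing
    shared = len(a & b)
    return 2.0 * shared / (len(a) + len(b))


def best_matches(clusters, labels):
    """For each cluster, return the label group with the highest F1 overlap."""
    result = {}
    for c_name, c_rows in clusters.items():
        scores = {l_name: f_score(c_rows, l_rows)
                  for l_name, l_rows in labels.items()}
        best = max(scores, key=scores.get)
        result[c_name] = (best, scores[best])
    return result


# Hypothetical toy data: the same 10 rows grouped two different ways,
# once by clustering and once by external classification labels.
clusters = {"cluster_1": [0, 1, 2, 3], "cluster_2": [4, 5, 6, 7, 8, 9]}
labels = {"T cells": [0, 1, 2, 4], "B cells": [3, 5, 6, 7, 8, 9]}

matches = best_matches(clusters, labels)
for name, (label, score) in matches.items():
    print(f"{name} best matches {label} with F-score {score:.3f}")
```

A score of 1 means the two groups contain exactly the same rows, and a score of 0 means they share none. This is why the F-score only applies when both groupings classify the same data, whereas comparing groups across different data sets calls for a metric such as QFMatch.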
Without the aid of any compression, this MATLAB UMAP implementation tends to be faster than the current Python implementation (version 0.5.2 of umap-learn). Due to File Exchange requirements, we supply only the C++ source code for the MEX modules we use to accelerate the computations. Running the command "run_umap" without arguments lets you choose between downloading the compiled MEX files immediately and building them yourself from the C++ source code with the build script that we provide. See the comments on the fast_approximation argument in run_umap.m for further speedups. As examples 13 to 15 show, you can measure the speed difference between the two implementations on your own computer by setting the 'python' argument to true.
The Bioinformatics Toolbox is required to change the 'qf_tree' argument, which is optional.
This implementation is a work in progress. It has been looked over by Leland McInnes, who in 2019 described it as "a fairly faithful direct translation of the original Python code". We hope to continue improving it in the future.
Provided by the Herzenberg Lab at Stanford University.
In our latest version, we add interoperability with FlowJo, a widely used analysis app for flow cytometry distributed by BD Life Sciences. FlowJo had its beginnings at the Herzenberg Lab, so we are pleased to bridge everything in this UMAP package with FlowJo. You can supervise UMAP with population definitions made in a FlowJo workspace. Moreover, you can export UMAP regions of interest back into FlowJo workspaces.
We appreciate any and all help in finding bugs. Our priority has been determining whether our concepts, namely UMAP supervised templates and exhaustive projection pursuit, are suitable for research publications in flow cytometry.

Cite As

Connor Meehan, Jonathan Ebrahimian, Wayne Moore, and Stephen Meehan (2022). Uniform Manifold Approximation and Projection (UMAP) (https://www.mathworks.com/matlabcentral/fileexchange/71902), MATLAB Central File Exchange.

MATLAB Release Compatibility
Created with R2021b
Compatible with R2017a to R2021b
Platform Compatibility
Windows macOS Linux
Acknowledgements

Inspired: CytoMAP
