Bioinformatics Toolbox
Read, analyze, and visualize genomic and proteomic data
Bioinformatics Toolbox™ provides algorithms and apps for Next Generation Sequencing (NGS), microarray analysis, mass spectrometry, and gene ontology. Using toolbox functions, you can read genomic and proteomic data from standard file formats such as SAM, FASTA, CEL, and CDF, as well as from online databases such as the NCBI Gene Expression Omnibus and GenBank®. You can explore and visualize this data with sequence browsers, spatial heatmaps, and clustergrams. The toolbox also provides statistical techniques for detecting peaks, imputing values for missing data, and selecting features.
You can combine toolbox functions to support common bioinformatics workflows. You can use ChIP-Seq data to identify transcription factors; analyze RNA-Seq data to identify differentially expressed genes; identify copy number variants and SNPs in microarray data; and classify protein profiles using mass spectrometry data.
Bioinformatics Toolbox provides algorithms and visualization techniques for Next Generation Sequencing analysis. The toolbox enables you to analyze whole genomes while performing calculations at a base pair level of resolution. You can use the NGS browser to visualize and investigate short-read alignments using either single-end or paired-end short reads. You can also build custom analysis routines.
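To illustrate what a calculation at base pair resolution can look like, here is a minimal Python sketch (not toolbox code) that computes per-base coverage from short-read alignments. The function name `per_base_coverage` and the `(start, length)` read tuples are assumptions made for this illustration; start positions are 1-based.

```python
def per_base_coverage(reads, ref_length):
    """Return a list where entry i is the number of reads covering base i + 1.

    reads: iterable of (start, length) tuples with 1-based start positions.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (ref_length + 1)
    for start, length in reads:
        lo = max(start - 1, 0)                    # convert to 0-based index
        hi = min(start - 1 + length, ref_length)  # clip to the reference end
        if lo >= hi:
            continue                              # read lies outside the reference
        diff[lo] += 1
        diff[hi] -= 1
    # The running sum of the difference array is the coverage at each base.
    coverage, running = [], 0
    for i in range(ref_length):
        running += diff[i]
        coverage.append(running)
    return coverage


reads = [(1, 4), (3, 4), (8, 5)]
print(per_base_coverage(reads, 10))  # [1, 1, 2, 2, 1, 1, 0, 1, 1, 1]
```

The difference-array approach touches each read once, so the cost grows with the number of reads plus the reference length rather than with their product.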
The data sets used in Next Generation Sequencing analysis are often too large to fit into physical memory. Bioinformatics Toolbox provides specialized data containers that enable you to analyze entire genomes.
The BioIndexedFile object lets you access the contents of text files containing nonuniform-sized entries such as sequences, annotations, and cross references to the data set. You can generate these objects from tables, flat files, or application-specific formats such as SAM, FASTA, and FASTQ.
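The general idea behind indexed access to a file with entries of varying size can be sketched in Python as follows. This is not how BioIndexedFile is implemented (the source does not describe its internals); it is a minimal illustration that records the byte offset of each FASTA entry so that one entry can be read without loading the whole file. The names `build_fasta_index`, `read_entry`, and `demo` are hypothetical.

```python
import os
import tempfile


def build_fasta_index(path):
    """Map each FASTA entry name to the byte offset where the entry starts."""
    index = {}
    with open(path, "rb") as handle:
        offset = 0
        for line in handle:
            if line.startswith(b">"):
                # Use the first word after ">" as the entry name.
                name = line[1:].split()[0].decode("ascii")
                index[name] = offset
            offset += len(line)
    return index


def read_entry(path, index, name):
    """Seek directly to one entry and return its sequence as a string."""
    parts = []
    with open(path, "rb") as handle:
        handle.seek(index[name])
        handle.readline()                  # skip the header line
        for line in handle:
            if line.startswith(b">"):
                break                      # the next entry begins here
            parts.append(line.strip().decode("ascii"))
    return "".join(parts)


def demo():
    """Write a tiny two-entry FASTA file, index it, and fetch the second entry."""
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False,
                                     newline="\n") as tmp:
        tmp.write(">seq1 first\nACGT\nAC\n>seq2\nGGGTTTAAA\n")
        path = tmp.name
    try:
        index = build_fasta_index(path)
        return index, read_entry(path, index, "seq2")
    finally:
        os.remove(path)


index, sequence = demo()
print(index, sequence)
```

Only the index (one integer per entry) has to stay in memory, which is what makes this pattern workable for files larger than physical memory.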
The BioMap class stores information from short-read sequences, including sequence headers, read sequences, quality scores, and data about alignment and mapping to a single reference sequence. You can use object properties and methods to explore, access, filter, and manipulate the data contained in a BioMap object.
You can use several methods for normalizing microarray data, including lowess, global mean, median absolute deviation (MAD), and quantile normalization. You can apply these methods to the entire microarray chip or to specific regions or blocks. Filtering and imputation functions let you clean raw data before running analysis and visualization routines.
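As an illustration of one of these methods, the following Python sketch shows the basic idea of quantile normalization: every array (column) is forced to share the same distribution by replacing each value with the mean of the values that hold the same rank across all arrays. This is a simplified version written for this page, not the toolbox routine; the function name `quantile_normalize` is an assumption, and tied values are not averaged.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length columns (one per array)."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")
    # Mean of the k-th smallest value across all columns, for each rank k.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[k] for col in sorted_cols) / len(columns)
                  for k in range(n)]
    normalized = []
    for col in columns:
        # Indices of this column's values in ascending order.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]      # value at rank k gets the k-th mean
        normalized.append(out)
    return normalized


arrays = [[5, 2, 3], [4, 1, 6]]
print(quantile_normalize(arrays))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```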
Bioinformatics Toolbox lets you perform background adjustments and calculate gene (probe set) expression values from Affymetrix® microarray probe-level data using Robust Multi-Array Average (RMA) and GC Robust Multi-Array Average (GCRMA) procedures. You can apply circular binary segmentation to array CGH data and estimate the false discovery rate of multiple hypotheses testing of gene expression data from a microarray experiment. You can also perform rank-invariant set normalization on either probe intensities for multiple Affymetrix CEL files or gene expression values from two different experimental conditions.
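The text above does not say which false discovery rate procedure is used. As one widely known example, the Benjamini-Hochberg adjustment can be sketched in a few lines of Python. The function name `benjamini_hochberg` is hypothetical, and this sketch should not be read as the toolbox's own method.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(pvals))  # approximately [0.02, 0.04, 0.04, 0.02]
```

A gene is then called significant at FDR level q when its adjusted p-value is at most q.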
Specialized routines for visualizing microarray data include volcano plots, box plots, loglog plots, I-R plots, and spatial heat maps of the microarray. You can also visualize ideograms with G-banding patterns.
Using routines from Statistics and Machine Learning Toolbox™, you can classify your results, perform hierarchical and K-means clustering, and represent your microarray data in statistical visualizations, such as 2D clustergrams with optimal leaf ordering, heat maps, principal component plots, and classification trees.
Bioinformatics Toolbox provides a set of functions for mass spectrometry data analysis. These functions enable preprocessing, classification, and marker identification from SELDI, MALDI, LC/MS, and GC/MS data. Preprocessing functions include baseline correction, smoothing, calibration, and resampling. You can align raw spectra data using the M/Z axis and perform retention-time alignment on LC/MS and GC/MS data. You can plot multiple spectra simultaneously.
You can smooth, align, and normalize spectra and then use classification and statistical learning tools to create classifiers and identify potential biomarkers.
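Two of the preprocessing steps named above, smoothing and baseline correction, can be sketched in Python as follows. These are deliberately simple stand-ins (a centered moving average and a sliding-window minimum baseline) chosen for illustration; the source does not specify which algorithms the toolbox uses, and the function names and default window sizes are assumptions.

```python
def moving_average(intensities, window=3):
    """Smooth a spectrum with a centered moving average (window must be odd)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(intensities)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)  # window shrinks at edges
        smoothed.append(sum(intensities[lo:hi]) / (hi - lo))
    return smoothed


def subtract_baseline(intensities, window=5):
    """Estimate the baseline as a sliding-window minimum and subtract it."""
    half = window // 2
    n = len(intensities)
    corrected = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        corrected.append(intensities[i] - min(intensities[lo:hi]))
    return corrected


spectrum = [10, 11, 30, 12, 10]
print(subtract_baseline(spectrum))        # [0, 1, 20, 2, 0]
print(moving_average([1, 2, 3, 4, 5]))    # [1.5, 2.0, 3.0, 4.0, 4.5]
```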
Bioinformatics Toolbox enables you to apply basic graph theory to sparse matrices. You can create, view, and manipulate graphs such as interaction maps, hierarchy plots, and pathways. You can determine and view shortest paths in graphs, test for cycles in directed graphs, and find isomorphism between two graphs.
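To make the shortest-path idea concrete, here is a minimal Python sketch of Dijkstra's algorithm on a sparse graph stored as a dictionary of neighbors. It is an illustration only and not the toolbox's implementation; the function name `shortest_path`, the dict-of-dicts representation, and the example interaction map are assumptions. Edge weights must be non-negative.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative weights.

    Returns (distance, path), or (float("inf"), []) if target is unreachable.
    """
    dist = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue                      # stale heap entry
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return float("inf"), []
    # Rebuild the path by walking the predecessor links backwards.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return dist[target], path[::-1]


# Hypothetical interaction map: only nonzero edges are stored (sparse).
graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
print(shortest_path(graph, "A", "D"))  # (4.0, ['A', 'B', 'C', 'D'])
```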
Bioinformatics Toolbox provides functions that build on the classification and statistical learning algorithms in Statistics and Machine Learning Toolbox.
Bioinformatics Toolbox enables you to access the Gene Ontology database from within MATLAB®, parse gene ontology annotated files, and obtain subsets of the ontology such as ancestors, descendants, or relatives.
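Obtaining ancestors or descendants amounts to traversing a directed acyclic graph of terms. The Python sketch below shows the idea on a toy ontology; the term identifiers (`T1` to `T4`) and the function names are made up for illustration and do not correspond to real Gene Ontology terms or to toolbox functions.

```python
def ancestors(parents, term):
    """Return every term reachable from `term` by following parent links."""
    found = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current in found:
            continue                      # already visited via another path
        found.add(current)
        stack.extend(parents.get(current, ()))
    return found


def invert(parents):
    """Turn a child -> parents mapping into a parent -> children mapping."""
    children = {}
    for child, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, set()).add(child)
    return children


def descendants(parents, term):
    """Descendants are the ancestors in the inverted graph."""
    return ancestors(invert(parents), term)


# Toy ontology: T4 has two parents, which share the root T1.
parents = {"T4": ["T2", "T3"], "T2": ["T1"], "T3": ["T1"], "T1": []}
print(sorted(ancestors(parents, "T4")))    # ['T1', 'T2', 'T3']
print(sorted(descendants(parents, "T1")))  # ['T2', 'T3', 'T4']
```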
The toolbox provides functions, objects, and methods for sequence analysis, including pairwise sequence, sequence profile, and multiple sequence alignment.
The toolbox lets you manipulate and analyze your sequences to gain a deeper understanding of your data.
The toolbox enables you to visualize sequences and alignments. You can view linear or circular maps of sequences annotated with GenBank features. You can visualize secondary structure diagrams of an RNA sequence. Interactive viewers let you explore and modify pairwise and multiple sequence alignments.
The toolbox enables you to create and edit phylogenetic trees. You can calculate pairwise distances between aligned or unaligned nucleotide or amino acid sequences using a broad range of similarity metrics such as Jukes-Cantor, p-distance, alignment-score, or a user-defined distance method. Phylogenetic trees are constructed using hierarchical linkage with a variety of techniques, including neighbor joining, single and complete linkage, and Unweighted Pair Group Method with Arithmetic Mean (UPGMA).
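Two of the distance metrics named above have simple closed forms. For aligned nucleotide sequences, the p-distance is the fraction of sites that differ, and the Jukes-Cantor distance corrects it for multiple substitutions as d = -3/4 ln(1 - 4p/3). The Python sketch below illustrates both; the function names and the choice to skip gapped columns are assumptions of this example, not a description of the toolbox's behavior.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor distance for nucleotides: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq1, seq2)
    if p == 0:
        return 0.0
    if p >= 0.75:
        return float("inf")               # correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


a, b = "ACGTACGT", "ACGTACGA"
print(p_distance(a, b))               # 0.125
print(round(jukes_cantor(a, b), 4))   # 0.1367
```

The corrected distance is always at least as large as the p-distance, and the gap between them widens as sequences diverge.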
The toolbox supports weighting and rerooting trees, calculating subtrees, and calculating the canonical form of trees. The phylogenetic tree viewer lets you prune, reorder, and rename branches; explore distances; and read or write Newick-formatted files. You can also use the annotation tools in MATLAB to create presentation-quality trees.
The toolbox provides protein sequence analysis techniques, including routines for calculating properties of a peptide sequence such as atomic composition, isoelectric point, and molecular weight. You can determine the amino acid composition of protein sequences, cleave a protein with an enzyme, and create backbone plots and Ramachandran plots of PDB data. You can use the Sequence Tool to view the properties of an amino acid sequence or use the Molecule Viewer to display and manipulate 3D molecular structures.
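Amino acid composition and enzymatic cleavage are straightforward to illustrate. The Python sketch below counts residues and applies the commonly cited trypsin rule (cut after K or R unless the next residue is P). It is a simplified illustration with hypothetical function names, not the toolbox's cleavage routine, and the example peptide is made up.

```python
from collections import Counter


def amino_acid_composition(sequence):
    """Count how many times each residue occurs in a protein sequence."""
    return dict(Counter(sequence.upper()))


def trypsin_cleave(sequence):
    """Split after K or R, except when the next residue is P."""
    sequence = sequence.upper()
    if not sequence:
        return []
    fragments, start = [], 0
    for i in range(len(sequence) - 1):    # no cut after the final residue
        if sequence[i] in ("K", "R") and sequence[i + 1] != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])    # remainder after the last cut
    return fragments


peptide = "MKWVTFRPLLK"
print(trypsin_cleave(peptide))            # ['MK', 'WVTFRPLLK']
print(amino_acid_composition("MKK"))      # {'M': 1, 'K': 2}
```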
You can access standard file formats for biological data, online databases, and Web sites.
MATLAB provides tools that let you turn your data analysis program into a customized software application. These include development tools for building user interfaces, a visual integrated development environment, and a profiler. MATLAB application deployment products let you integrate your MATLAB algorithms with existing C, C++, and Java™ applications, deploy the developed algorithms and custom interfaces as standalone applications, convert MATLAB algorithms into Microsoft® .NET or COM components that can be accessed from any COM-based application, and create Microsoft Excel® add-ins.
You can integrate MATLAB with commonly used bioinformatics tools such as BioPerl, SOAP-based Web services, and COM plug-ins.