
bwaindex

Create BWA indices from reference sequence

Since R2020b

Description

bwaindex(referenceFile) creates BWA index files for the reference sequence in referenceFile [1][2]. By default, the function writes the index files to the same directory as referenceFile.

The index files are in the AMB, ANN, BWT, PAC, and SA file formats.

bwaindex requires the BWA Support Package for Bioinformatics Toolbox™. If the support package is not installed, then the function provides a download link. For details, see Bioinformatics Toolbox Software Support Packages.
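A quick way to confirm that indexing produced all five files is to build the expected file names and check for them. The following is a minimal sketch, not part of the toolbox. It assumes the default prefix (the FASTA file name), the lowercase extensions that BWA writes, and that the check runs in the folder containing the reference file.

```matlab
% Sketch: list the index files expected next to a reference FASTA file.
refFile = "Dmel_chr4.fa";                         % reference file used in the examples on this page
indexExts = [".amb" ".ann" ".bwt" ".pac" ".sa"];  % assumption: lowercase extensions written by BWA
expectedFiles = refFile + indexExts;              % default prefix is the FASTA file name

% Report which index files are present in the current folder.
isPresent = isfile(expectedFiles);
for k = 1:numel(expectedFiles)
    fprintf("%-20s %s\n", expectedFiles(k), string(isPresent(k)));
end

if ~all(isPresent)
    disp("Some index files are missing; run bwaindex(refFile) first.");
end
```

If you set a custom prefix, replace refFile in the sum with that prefix.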


bwaindex(referenceFile,indexOptions) uses additional options specified by indexOptions.


bwaindex(referenceFile,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, bwaindex(referenceFile,'Algorithm','is') specifies the linear-time algorithm.


Examples


This example requires the BWA Support Package for Bioinformatics Toolbox™. If the support package is not installed, the software provides a download link. For details, see Bioinformatics Toolbox Software Support Packages.

Build a set of index files for the Drosophila genome. This example uses the reference sequence Dmel_chr4.fa, provided with the toolbox. The 'Prefix' argument sets the prefix of the output index files and can include a file path. For this example, define the prefix as Dmel_chr4 and save the index files in the current directory.

bwaindex('Dmel_chr4.fa','Prefix','./Dmel_chr4');

As an alternative to specifying name-value pair arguments, you can use the BWAIndexOptions object to specify the indexing options.

indexOpt = BWAIndexOptions;
indexOpt.Prefix = './Dmel_chr4';
indexOpt.Algorithm = 'bwtsw';
bwaindex('Dmel_chr4.fa',indexOpt);

Once the index files are ready, map the read sequences to the reference using bwamem. Two paired-end read input files are provided with the toolbox. Using name-value pair arguments, you can specify different alignment options, such as the number of parallel threads to use.

bwamem('Dmel_chr4','SRR6008575_10k_1.fq','SRR6008575_10k_2.fq','SRR6008575_10k_chr4.sam','NumThreads',4);

Alternatively, you can use a BWAMEMOptions object to specify the alignment options.

alignOpt = BWAMEMOptions;
alignOpt.NumThreads = 4;
bwamem('Dmel_chr4','SRR6008575_10k_1.fq','SRR6008575_10k_2.fq','SRR6008575_10k_chr4.sam',alignOpt);

Input Arguments


referenceFile

Reference file name, specified as a character vector or string. The file must be a FASTA-formatted file with the reference sequence information for indexing.

Data Types: char | string

indexOptions

Additional options for indexing, specified as a BWAIndexOptions object, character vector, or string. The character vector or string must be in the bwa index native syntax (prefixed by a dash). If you specify a BWAIndexOptions object, the function uses only those properties that are set or modified.

Data Types: char | string
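When the options are given in native syntax, they go into a single character vector. The following sketch assumes the bwa index flags -a (algorithm) and -p (output prefix), and mirrors the 'Algorithm' and 'Prefix' settings from the example above. The guard around the call is only there so that the snippet does nothing when the function or the reference file is unavailable.

```matlab
% Build a native-syntax option string.
% Assumption: -a selects the algorithm and -p sets the output prefix in bwa index.
algorithmName = 'bwtsw';
indexPrefix = './Dmel_chr4';
nativeOpts = sprintf('-a %s -p %s', algorithmName, indexPrefix);
disp(nativeOpts)

% Run the indexing only if the function and the reference file can be found.
refFile = 'Dmel_chr4.fa';
if exist('bwaindex', 'file') && exist(refFile, 'file')
    bwaindex(refFile, nativeOpts);
end
```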

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: bwaindex(referenceFile,'Algorithm','bwtsw') specifies to use the BWT-SW algorithm.

Algorithm

Algorithm to construct the BWT (Burrows-Wheeler transform) index, specified as a character vector or string. Options are:

  • 'is' — Linear-time algorithm. The memory requirement for using this option is 5.37 times the size of the database. You cannot use this option if your database is larger than 2 GB.

  • 'bwtsw' — BWT-SW algorithm.

The default algorithm is chosen automatically based on the size of the reference genome.
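To get a feel for the limits listed above, the following sketch estimates the memory needed by 'is' and suggests an algorithm for a given reference size. It is illustrative only and is not how bwaindex makes its automatic choice. It assumes that the file size in bytes is a reasonable proxy for the database size and that 2 GB means 2*1024^3 bytes. The helper names pickAlgorithm and estimateIsMemory are hypothetical.

```matlab
isMemoryFactor = 5.37;        % memory multiple stated for the 'is' option
isSizeLimit = 2 * 1024^3;     % assumption: 2 GB counted in 2^30-byte gigabytes
algorithms = ["is" "bwtsw"];

% Suggest 'bwtsw' when the size exceeds the 'is' limit, otherwise 'is'.
pickAlgorithm = @(nBytes) algorithms(1 + (nBytes > isSizeLimit));
% Rough memory estimate for 'is', in bytes.
estimateIsMemory = @(nBytes) isMemoryFactor * nBytes;

% Use the real file size if the reference is in the current folder;
% otherwise fall back to a hypothetical size.
refBytes = 1.3e6;
info = dir('Dmel_chr4.fa');
if ~isempty(info)
    refBytes = info(1).bytes;
end

fprintf("Suggested algorithm: %s\n", pickAlgorithm(refBytes));
fprintf("Estimated memory for 'is': %.1f MB\n", estimateIsMemory(refBytes) / 1024^2);
```

In most cases you can leave 'Algorithm' unset and rely on the automatic default; an estimate like this is mainly useful for planning memory on a shared machine.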

Data Types: char | string

BlockSize

Number of bases processed per batch in the bwtsw algorithm, specified as a positive scalar.

Data Types: double

ExtraCommand

Additional commands, specified as a character vector or string.

The commands must be in the native syntax (prefixed by one or two dashes). Use this option to apply undocumented flags and flags without corresponding MATLAB® properties.

Example: 'ExtraCommand','-6'

Data Types: char | string

IncludeAll

Flag to include all available options with the corresponding default values when converting to the original options syntax, specified as true or false.

The original (native) syntax is prefixed by one or two dashes. By default, the function converts only the specified options. If the value is true, the software converts all available options, with default values for unspecified options, to the original syntax.

Note

If you set IncludeAll to true, the software converts all available properties, using default values for unspecified properties. The only exception is when the default value of a property is NaN, Inf, [], '', or "". In this case, the software does not translate the corresponding property.

Example: 'IncludeAll',true

Data Types: logical

Prefix

Prefix for the output index files, specified as a character vector or string. You can specify only the prefix or a file path and prefix. The default value is the same as the input FASTA file name.

Example: 'Prefix','D:/ngs/GRCh38_p12'

Data Types: char | string
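When the prefix includes a file path, building it with fullfile keeps the code portable across platforms. The following sketch writes the index files to a separate folder. The folder name bwa_index_demo is hypothetical, and creating the folder first reflects the assumption that the indexing step does not create missing folders.

```matlab
% Choose an output folder and make sure it exists.
outDir = fullfile(tempdir, 'bwa_index_demo');   % hypothetical output folder
if ~isfolder(outDir)
    mkdir(outDir);
end

% Combine the folder and the prefix into one 'Prefix' value.
indexPrefix = fullfile(outDir, 'Dmel_chr4');

% Run the indexing only if the function and the reference file can be found.
refFile = 'Dmel_chr4.fa';
if exist('bwaindex', 'file') && exist(refFile, 'file')
    bwaindex(refFile, 'Prefix', indexPrefix);
    % List the files written with this prefix.
    files = dir([indexPrefix '.*']);
    disp({files.name}');
end
```

The same indexPrefix value can then be passed as the first argument to bwamem.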

References

[1] Li, Heng, and Richard Durbin. “Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform.” Bioinformatics 25, no. 14 (July 15, 2009): 1754–60. https://doi.org/10.1093/bioinformatics/btp324.

[2] Li, Heng, and Richard Durbin. “Fast and Accurate Long-Read Alignment with Burrows–Wheeler Transform.” Bioinformatics 26, no. 5 (March 1, 2010): 589–95. https://doi.org/10.1093/bioinformatics/btp698.

Version History

Introduced in R2020b