bioinfo.pipeline.block.Bowtie2
Bioinformatics pipeline block to align sequencing reads to reference sequences
Since R2023a
Description
A Bowtie2 block enables you to map sequencing reads to reference sequences.
The block requires the Bowtie 2 Support Package for Bioinformatics Toolbox™. If this support package is not installed, then a download link is provided. For details, see Bioinformatics Toolbox Software Support Packages.
Creation
Syntax
b = bioinfo.pipeline.block.Bowtie2
b = bioinfo.pipeline.block.Bowtie2(options)
b = bioinfo.pipeline.block.Bowtie2(OutFilename=fileName)
b = bioinfo.pipeline.block.Bowtie2(Name=Value)
Description
b = bioinfo.pipeline.block.Bowtie2 creates a Bowtie2 block.
b = bioinfo.pipeline.block.Bowtie2(options) also specifies additional alignment options.
b = bioinfo.pipeline.block.Bowtie2(OutFilename=fileName) also specifies the output file name.
b = bioinfo.pipeline.block.Bowtie2(Name=Value) specifies additional options as the property names and values of a Bowtie2AlignOptions object. This object is set as the value of the Options property of the block. For example, bt2Block = bioinfo.pipeline.block.Bowtie2(Trim5=10) sets the Trim5 property of the object to trim 10 residues from the 5' end.
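The four creation syntaxes can be compared side by side in a short sketch. It assumes the Bowtie 2 support package is installed, that Bowtie2AlignOptions can be created with its default constructor and then edited, and that the output name trimmed_reads.sam is purely illustrative.

```matlab
% Minimal sketch of the four documented creation syntaxes.
% Requires Bioinformatics Toolbox and the Bowtie 2 Support Package.

% 1) Default block: default Bowtie2AlignOptions, output file "Aligned.sam"
b1 = bioinfo.pipeline.block.Bowtie2;

% 2) Alignment options supplied as a Bowtie2AlignOptions object
opts = Bowtie2AlignOptions;
opts.Trim5 = 10;                 % trim 10 residues from the 5' end
b2 = bioinfo.pipeline.block.Bowtie2(opts);

% 3) Custom output file name (the name must end in .sam)
b3 = bioinfo.pipeline.block.Bowtie2(OutFilename="trimmed_reads.sam");

% 4) Bowtie2AlignOptions properties given directly as Name=Value pairs
b4 = bioinfo.pipeline.block.Bowtie2(Trim5=10, Trim3=6);
```

Syntaxes 2 and 4 produce equivalent trimming settings; the Name=Value form is shorter when only a few properties differ from their defaults.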
Input Arguments
fileName — Output file name
string | character vector
Output file name, specified as a string or character vector. The file name must end with the .sam extension. The block saves the mapping results to this file.
Data Types: char | string
options — Bowtie2 options
Bowtie2AlignOptions object | string | character vector
Bowtie2 options, specified as a Bowtie2AlignOptions object, string, or character vector. If you specify a string or character vector, it must use the native bowtie2 option syntax (options prefixed by one or two dashes) [1].
Data Types: char | string
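When options is a string or character vector, it is written exactly as on the bowtie2 command line. The sketch below uses the standard bowtie2 flags --trim5 and --trim3; it assumes the block converts the text into a Bowtie2AlignOptions object, which matches how the Options property is documented.

```matlab
% Options given in native bowtie2 command-line syntax.
% --trim5 and --trim3 are standard bowtie2 flags that trim bases from
% the 5' and 3' ends of each read before alignment.
bt2FromString = bioinfo.pipeline.block.Bowtie2("--trim5 10 --trim3 6");

% A character vector with the same content is also accepted.
bt2FromChar = bioinfo.pipeline.block.Bowtie2('--trim5 10 --trim3 6');

% Inspect the options object stored on the block.
disp(bt2FromString.Options)
```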
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: bt2Block = bioinfo.pipeline.block.Bowtie2(Trim3=6) trims 6 residues from the 3' end.
Note
The following list of arguments is partial. For the complete list, see the properties of the Bowtie2AlignOptions object.
AllowDovetail — Flag to allow dovetail configurations
false or 0 (default) | true or 1
Flag to allow dovetail configurations, specified as 1 (true) or 0 (false). This property specifies whether the alignment of one mate can extend past the beginning of the alignment of the other mate and still be considered concordant. This property applies to paired-end reads only.
Data Types: double | logical
AmbiguousPenalty — Penalty for positions with ambiguous characters
1 (default) | nonnegative integer
Penalty for positions with ambiguous characters in the read sequence, reference sequence, or both, specified as a nonnegative integer.
Data Types: double
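Both name-value arguments above end up on the block's Bowtie2AlignOptions object. A minimal sketch, assuming paired-end input will be connected later (AllowDovetail has no effect on single-end reads); the penalty value 2 is arbitrary.

```matlab
% AllowDovetail=true lets one mate extend past the start of the other
% and still count as concordant; AmbiguousPenalty raises the cost of
% positions with ambiguous characters from the default of 1 to 2.
bt2Block = bioinfo.pipeline.block.Bowtie2(AllowDovetail=true, AmbiguousPenalty=2);

% Both values are stored on the Bowtie2AlignOptions object in Options.
disp(bt2Block.Options.AllowDovetail)
disp(bt2Block.Options.AmbiguousPenalty)
```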
Properties
ErrorHandler — Function to handle errors from run method
function handle
Function to handle errors from the run method of the block, specified as a function handle. The handle specifies the function to call if the run method encounters an error within a pipeline. For the pipeline to continue after a block fails, ErrorHandler must return a structure that is compatible with the output ports of the block. The error handling function is called with these two inputs:
1. A structure with these fields:
- identifier — Identifier of the error that occurred
- message — Text of the error message
- index — Linear index indicating which block process failed in the parallel run. By default, the index is 1 because there is only one run per block. For details on how block inputs can be split across different dimensions for multiple run calls, see Bioinformatics Pipeline SplitDimension.
2. The input structure passed to the run method when it fails.
Data Types: function_handle
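An error handler receives the two documented inputs and returns a structure shaped like the block outputs. In this sketch the handler name logBowtie2Failure is hypothetical, and the placeholder value for SAMFile is an assumption: whether a missing string is acceptable depends on the blocks downstream of the Bowtie2 block in your pipeline.

```matlab
% logBowtie2Failure.m (hypothetical handler name; save on the MATLAB path)
function outStruct = logBowtie2Failure(errInfo, ~)
    % errInfo carries the documented fields identifier, message, and index.
    % The second input (the input structure passed to run) is ignored here.
    warning("Bowtie2 run %d failed (%s): %s", ...
        errInfo.index, errInfo.identifier, errInfo.message);

    % Return a structure with the block's only output port, SAMFile.
    % ASSUMPTION: a missing string is an acceptable placeholder for the
    % downstream blocks in your pipeline; replace it with a value they handle.
    outStruct = struct("SAMFile", string(missing));
end
```

To attach the handler, assign its handle to the property, for example bt2Block.ErrorHandler = @logBowtie2Failure;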
Inputs — Input ports
structure
This property is read-only.
Input ports of the block, specified as a structure. The field names of the structure are the names of the block input ports, and the field values are bioinfo.pipeline.Input objects. These objects describe the input port behaviors. The input port names are the expected field names of the input structure that you pass to the block run method.
The Bowtie2 block Inputs structure has the following fields:
- IndexBaseName — Base name of the reference index files. The index files are in the BT2 or BT2L format. For example, if your index files are Dmel_chr4.1.bt2 and Dmel_chr4.2.bt2, specify IndexBaseName as "Dmel_chr4". This input is required.
- Reads1Files — Names of FASTQ files for the first mate reads or single-end reads. For paired-end data, sequences in Reads1Files must correspond file-for-file and read-for-read to sequences in Reads2Files. This input is required.
- Reads2Files — Names of FASTQ files for the second mate reads for paired-end data. This input is optional.
The default value for each of these inputs is a bioinfo.pipeline.datatypes.Unset object, which means that the input value is not set yet.
Data Types: struct
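Because the input port names double as the field names of the run input structure, a block can also be run on its own. This sketch assumes the run(block, inStruct) calling form and that the toolbox sample data (Dmel_chr4 index and SRR6008575_10k_1.fq) is on the MATLAB path, as in the example later on this page.

```matlab
% Run a Bowtie2 block on its own, outside a Pipeline, using the
% emptyInputs and run object functions.
b = bioinfo.pipeline.block.Bowtie2;

% Start from a structure whose fields match the input port names
% (IndexBaseName, Reads1Files, Reads2Files).
inStruct = emptyInputs(b);
inStruct.IndexBaseName = "Dmel_chr4";                  % index shipped with the toolbox
inStruct.Reads1Files = which("SRR6008575_10k_1.fq");   % single-end reads
% Reads2Files is optional and is left at its default (unset) value.

outStruct = run(b, inStruct);   % assumed calling form: run(block, inStruct)
disp(outStruct.SAMFile)
```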
Outputs — Output ports
structure
This property is read-only.
Output ports of the block, specified as a structure. The field names of the structure are the names of the block output ports, and the field values are bioinfo.pipeline.Output objects. These objects describe the output port behaviors. The field names of the output structure returned by the block run method are the same as the output port names.
The Bowtie2 block Outputs structure has one field, SAMFile.
Data Types: struct
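The port names can be read directly from the two read-only structures, which is handy when writing connect calls. A short sketch; the display order of the input port names is not guaranteed here.

```matlab
% List the input and output port names of a Bowtie2 block.
b = bioinfo.pipeline.block.Bowtie2;
inputPorts = string(fieldnames(b.Inputs))';    % contains IndexBaseName, Reads1Files, Reads2Files
outputPorts = string(fieldnames(b.Outputs))';  % contains SAMFile
disp(inputPorts)
disp(outputPorts)

% Each output field value is a bioinfo.pipeline.Output object.
disp(class(b.Outputs.SAMFile))
```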
Options — Bowtie2 options
Bowtie2AlignOptions object (default)
Bowtie2 options, specified as a Bowtie2AlignOptions object. The default value is a default Bowtie2AlignOptions object.
OutFilename — Output file name
"Aligned.sam" (default) | string
Output file name, specified as a string. By default, the block writes the mapping results to a file named Aligned.sam.
Data Types: string
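The output name can be set when the block is created or changed later. The sketch assumes OutFilename is writable after construction, since it is not documented as read-only; the file names are illustrative.

```matlab
% Name the SAM output file at creation time...
b = bioinfo.pipeline.block.Bowtie2(OutFilename="SRR6008575_chr4.sam");
disp(b.OutFilename)

% ...or change it afterward (assumes the property is writable).
b.OutFilename = "SRR6008575_chr4_run2.sam";
disp(b.OutFilename)
```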
Object Functions
compile | Perform block-specific additional checks and validations
copy | Copy array of handle objects
emptyInputs | Create input structure for use with run method
eval | Evaluate block object
run | Run block object
Examples
Align Reads Using Bowtie 2
Import the pipeline and block objects needed for the example.
import bioinfo.pipeline.block.*
import bioinfo.pipeline.Pipeline
Create a FileChooser block to select a read file provided with the toolbox.
FC = FileChooser(which("SRR6008575_10k_1.fq"));
Create a Bowtie2 block and a pipeline object.
B = Bowtie2;
P = Pipeline;
Add blocks to the pipeline.
addBlock(P, [FC B]);
Set the IndexBaseName input port value to "Dmel_chr4", which is the base name of the index files for Drosophila melanogaster chromosome 4 provided with the toolbox.
B.Inputs.IndexBaseName.Value = "Dmel_chr4";
Connect the blocks.
connect(P, FC, B, ["Files", "Reads1Files"]);
Run the pipeline.
run(P);
R = results(P,B)
R = struct with fields:
SAMFile: [1×1 bioinfo.pipeline.datatype.File]
Call unwrap to see the location of the output file.
unwrap(R.SAMFile)
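The same pipeline can align paired-end reads by feeding the second mate file into the optional Reads2Files port. This sketch assumes the toolbox also ships the mate file SRR6008575_10k_2.fq; the output name paired.sam is illustrative.

```matlab
import bioinfo.pipeline.block.*
import bioinfo.pipeline.Pipeline

% One FileChooser per mate file (the _2 file name is an assumption).
FC1 = FileChooser(which("SRR6008575_10k_1.fq"));
FC2 = FileChooser(which("SRR6008575_10k_2.fq"));
B = Bowtie2(OutFilename="paired.sam");
P = Pipeline;
addBlock(P, [FC1 FC2 B]);
B.Inputs.IndexBaseName.Value = "Dmel_chr4";

% Mate 1 reads feed Reads1Files; mate 2 reads feed Reads2Files.
connect(P, FC1, B, ["Files", "Reads1Files"]);
connect(P, FC2, B, ["Files", "Reads2Files"]);

run(P);
R = results(P, B);
unwrap(R.SAMFile)
```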
Fetch Parallel-Running Block Results from Bioinformatics Pipeline
Import the pipeline and block objects needed for the example.
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*
Create a pipeline.
P = Pipeline;
A FileChooser block can take the URL of a remote file as input and download the file to make it available to downstream blocks. Download the file Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz, which contains human reference genome chromosome 19 in FASTA format.
chr19url = "http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz";
fileChooserBlock1 = FileChooser(chr19url);
Create a UserFunction block to unzip the downloaded reference genome file using the gunzip function. When you create the block, you can specify the function to call and set the input and output port names that map to the input and output arguments of that function, respectively. In this example, name the input port "ZippedFilenames" and the output port "UnzippedFilenames".
gunzipUserFunctionBlock = UserFunction(@gunzip,RequiredArguments="ZippedFilenames",OutputArguments="UnzippedFilenames");
The reference genome file must be indexed before reads can be aligned to it. To generate the index files, create a Bowtie2Build block.
bowtie2BuildBlock = Bowtie2Build;
Add the blocks.
addBlock(P,[fileChooserBlock1,gunzipUserFunctionBlock,bowtie2BuildBlock]);
Connect the output port "Files" of fileChooserBlock1 to the input port "ZippedFilenames" of gunzipUserFunctionBlock. Also connect the output port "UnzippedFilenames" of gunzipUserFunctionBlock to the input port "ReferenceFASTAFiles" of bowtie2BuildBlock.
connect(P,fileChooserBlock1,gunzipUserFunctionBlock,["Files","ZippedFilenames"]);
connect(P,gunzipUserFunctionBlock,bowtie2BuildBlock,["UnzippedFilenames","ReferenceFASTAFiles"]);
Create blocks for downloading RNA-seq data.
adrenal_1_url = "https://usegalaxy.org/dataset/display?dataset_id=d44d2a324474d1aa&to_ext=fq";
adrenal_2_url = "https://usegalaxy.org/dataset/display?dataset_id=d08360a1c0ffdc62&to_ext=fq";
brain_1_url = "https://usegalaxy.org/dataset/display?dataset_id=f187acb8015d6c7f&to_ext=fq";
brain_2_url = "https://usegalaxy.org/dataset/display?dataset_id=08c45996966d7ded&to_ext=fq";
% First mate files of both samples in one block, second mate files in another
fileChooserBlock2 = FileChooser([brain_1_url;adrenal_1_url]);
fileChooserBlock3 = FileChooser([brain_2_url;adrenal_2_url]);
Create a Bowtie2 block for mapping reads.
bowtie2Block = Bowtie2;
Add blocks to the pipeline.
addBlock(P,[fileChooserBlock2,fileChooserBlock3,bowtie2Block]);
Connect the blocks.
connect(P,bowtie2BuildBlock,bowtie2Block,["IndexBaseName","IndexBaseName"]);
connect(P,fileChooserBlock2,bowtie2Block,["Files","Reads1Files"]);
connect(P,fileChooserBlock3,bowtie2Block,["Files","Reads2Files"]);
Run the pipeline in parallel.
run(P,UseParallel=true);
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 4 workers.
If you try to get the block results while the pipeline is still running, you get an incomplete result.
bt2Results = results(P,bowtie2Block)
bt2Results = Incomplete pipeline result.
Use fetchResults to wait for the blocks that are running in parallel to complete and then get the results.
bt2Results = fetchResults(P,bowtie2Block)
bt2Results = struct with fields:
SAMFile: [1×1 bioinfo.pipeline.datatype.File]
Tip: Use the unwrap method to see the location of the output file. For example, unwrap(bt2Results.SAMFile) shows the location of the SAM file.
Alternatively, you can use these two commands instead of fetchResults.
wait(P,bowtie2Block);
bt2Results = results(P,bowtie2Block);
References
[1] Langmead, Ben, and Steven L Salzberg. “Fast Gapped-Read Alignment with Bowtie 2.” Nature Methods 9, no. 4 (April 2012): 357–59. https://doi.org/10.1038/nmeth.1923.
Version History
Introduced in R2023a