run

Run pipeline

Since R2023a

Description

run(pipeline) runs the pipeline. All required input ports in the pipeline must be satisfied before you run it.

run(pipeline,inputStruct) runs the pipeline using the structure inputStruct as input. Passing an input structure is one of three ways to satisfy input ports: the field names of inputStruct are matched to the names of unconnected input ports in the pipeline.

run(___,Name=Value) uses additional options specified by one or more name-value arguments for any of the above syntaxes.

Examples

Run Pipeline to Plot Sequence Quality Data

Import the Pipeline and block objects needed for the example.

import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*

Create a pipeline.

qcpipeline = Pipeline;

Select an input FASTQ file using a FileChooser block.

fastqfile = FileChooser(which("SRR005164_1_50.fastq"));

Create a SeqFilter block.

sequencefilter = SeqFilter;

Define the filtering threshold value. Specifically, filter out sequences with a total of more than 10 low-quality bases, where a base is considered a low-quality base if its quality score is less than 20.

sequencefilter.Options.Threshold = [10 20];

Add the blocks to the pipeline.

addBlock(qcpipeline,[fastqfile,sequencefilter]);

Connect the output of the first block to the input of the second block. To do so, first check the input and output port names of the corresponding blocks.

View the output ports (Outputs) of the first block and the input ports (Inputs) of the second block.

fastqfile.Outputs
ans = struct with fields:
    Files: [1×1 bioinfo.pipeline.Output]

sequencefilter.Inputs
ans = struct with fields:
    FASTQFiles: [1×1 bioinfo.pipeline.Input]

Connect the Files output port of the fastqfile block to the FASTQFiles input port of the sequencefilter block.

connect(qcpipeline,fastqfile,sequencefilter,["Files","FASTQFiles"]);

Next, create a UserFunction block that calls the seqqcplot function to plot the quality data of the filtered sequences. In this case, inputFile is the name of the required argument of seqqcplot. The required argument name can be anything, as long as it is a valid variable name.

qcplot = UserFunction("seqqcplot",RequiredArguments="inputFile",OutputArguments="figureHandle");

Alternatively, you can use dot notation to set up the UserFunction block.

qcplot = UserFunction;
qcplot.RequiredArguments = "inputFile";
qcplot.Function = "seqqcplot";
qcplot.OutputArguments = "figureHandle";

Add the block.

addBlock(qcpipeline,qcplot);

Check the port names of the sequencefilter and qcplot blocks.

sequencefilter.Outputs
ans = struct with fields:
    FilteredFASTQFiles: [1×1 bioinfo.pipeline.Output]
         NumFilteredIn: [1×1 bioinfo.pipeline.Output]
        NumFilteredOut: [1×1 bioinfo.pipeline.Output]

qcplot.Inputs
ans = struct with fields:
    inputFile: [1×1 bioinfo.pipeline.Input]

Connect the FilteredFASTQFiles port of the sequencefilter block to the inputFile port of the qcplot block.

connect(qcpipeline,sequencefilter,qcplot,["FilteredFASTQFiles","inputFile"]);

Run the pipeline to plot the sequence quality data.

run(qcpipeline);

[Figure: seqqcplot quality plots of the filtered sequence data]

Run Pipeline Using Input Structure

Import the Pipeline and block objects needed for the example.

import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*

Create a pipeline.

P = Pipeline;

Create a Bowtie2Build block to build index files for the reference genome.

bowtie2build = Bowtie2Build;

Create a Bowtie2 block to map the read sequences to the reference sequence.

bowtie2 = Bowtie2;

Add the blocks to the pipeline.

addBlock(P,[bowtie2build,bowtie2],["bowtie2build","bowtie2"]);

Get the names of all required input ports in the pipeline that must be set or connected. IndexBaseName is an input port of both the bowtie2build and bowtie2 blocks, Reads1Files is an input port of the bowtie2 block, and ReferenceFASTAFiles is an input port of the bowtie2build block.

portnames = inputNames(P)
portnames = 1×3 string
    "IndexBaseName"    "Reads1Files"    "ReferenceFASTAFiles"

Some blocks have optional input ports. To see the names of these ports, set IncludeOptional=true. For instance, the Bowtie2 block has an optional input port (Reads2Files) that accepts files for the second mate reads when you have paired-end read data.

allportnames = inputNames(P,IncludeOptional=true)
allportnames = 1×4 string
    "IndexBaseName"    "Reads1Files"    "Reads2Files"    "ReferenceFASTAFiles"

Create an input structure to set the input port values of the bowtie2 and bowtie2build blocks. Specifically, set IndexBaseName to "Dmel_chr4", which is the base name of the reference index files for the Drosophila genome. Set Reads1Files to "SRR6008575_10k_1.fq" and Reads2Files to "SRR6008575_10k_2.fq". Set ReferenceFASTAFiles to "Dmel_chr4.fa". These files are provided with the toolbox.

inputStruct.IndexBaseName = "Dmel_chr4";
inputStruct.Reads1Files   = "SRR6008575_10k_1.fq";
inputStruct.Reads2Files   = "SRR6008575_10k_2.fq";
inputStruct.ReferenceFASTAFiles = "Dmel_chr4.fa";

Optionally, compile the pipeline to check whether the input structure is set up correctly. Compilation also happens automatically when you run the pipeline.

compile(P,inputStruct);

Run the pipeline using the structure as an input.

run(P,inputStruct);

Wait for the pipeline to finish running, and then get the results of the bowtie2 block.

wait(P);
mappedFile = results(P,bowtie2)
mappedFile = struct with fields:
    SAMFile: [1×1 bioinfo.pipeline.datatype.File]

The Bowtie2 block generates a SAM file that contains the mapped results. To see the location of the file, use unwrap.

unwrap(mappedFile.SAMFile)

Input Arguments

pipeline

Bioinformatics pipeline, specified as a bioinfo.pipeline.Pipeline object.

inputStruct

Input structure to satisfy unconnected input ports, specified as a structure.

The field names of inputStruct must match the names of unconnected ports in the pipeline.

Tip

Use inputNames to get the list of names for all unconnected input ports and use them as field names in inputStruct.

Data Types: struct
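As a companion to the tip above, the following sketch checks an input structure against inputNames before calling run. It assumes the pipeline P and the structure inputStruct from the Bowtie2 example are in the workspace; the variable names requiredPorts, allPorts, missing, and extra are illustrative only.

```matlab
% Assumes P and inputStruct from the Bowtie2 example are in the workspace.
requiredPorts = inputNames(P);                        % required unconnected ports
allPorts      = inputNames(P,IncludeOptional=true);   % required plus optional ports
fields        = string(fieldnames(inputStruct));      % field names as a string column

% Required ports that inputStruct does not set
missing = setdiff(requiredPorts,fields);
% Fields that do not match any unconnected input port
extra = setdiff(fields,allPorts);

if ~isempty(missing)
    error('inputStruct is missing fields: %s', strjoin(missing,", "));
end
if ~isempty(extra)
    error('inputStruct has fields that match no input port: %s', strjoin(extra,", "));
end
```

Checking against IncludeOptional=true for the extra fields avoids flagging optional ports such as Reads2Files.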

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: run(pipeline,UseParallel=true) runs the pipeline in parallel.

Location to store the pipeline results, specified as a character vector or string scalar. The default location is the PipelineResults folder within the current working directory (pwd). In the PipelineResults folder, the results of each block are stored in a separate subfolder named after the block.

If you rerun the pipeline with the same results directory, what happens to the existing results depends on RunMode:

  • When RunMode is "Minimal" (default), existing results are reused unless the block has become stale.

  • When RunMode is "Full", existing results are always overwritten.

Data Types: char | string
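A minimal sketch for listing the per-block result subfolders, assuming the pipeline ran with the default results location and that the current folder has not changed since then. The variable names are illustrative.

```matlab
% Default results location: PipelineResults inside the current folder
resultsDir = fullfile(pwd,"PipelineResults");
entries = dir(resultsDir);

% Keep only real subfolders (skip the "." and ".." entries)
isSub = [entries.isdir] & ~ismember({entries.name},{'.','..'});
blockFolders = string({entries(isSub).name})   % one subfolder per block
```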

Information to print to the MATLAB® command line while the pipeline is running, specified as one of the following:

  • "Off" or 0 — Display no messages.

  • "Error" or 1 — Display only error messages.

  • "Warn" or 2 — Display warnings and errors.

  • "Info" or 3 — Display warnings, errors, and pipeline run progress information.

  • "Debug" or 4 — Display more detailed debugging information.

Data Types: double | char | string

UseParallel

Flag to run the pipeline in parallel, specified as a numeric or logical 1 (true) or 0 (false). Running in parallel requires Parallel Computing Toolbox™.

Note

Only process-based parallel pools are supported; thread-based pools are not.

Data Types: double | logical
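Because only process-based pools are supported, one way to prepare a parallel run is to make sure a process-based pool is open first. This is a sketch, assuming Parallel Computing Toolbox is installed and that P and inputStruct from the Bowtie2 example are in the workspace.

```matlab
% Replace a thread-based pool, if one is open, with a process-based pool.
pool = gcp("nocreate");
if ~isempty(pool) && isa(pool,"parallel.ThreadPool")
    delete(pool);              % thread-based pools are not supported by run
    pool = [];
end
if isempty(pool)
    parpool("Processes");      % start a process-based pool
end

run(P,inputStruct,UseParallel=true);
wait(P);                       % wait for the pipeline to finish
```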

RunMode

Run mode of the pipeline, specified as one of the following:

  • "Minimal" — The pipeline runs only the blocks for which one of the following statements is true:

    • The block has not been run before or its results have been deleted.

    • You have modified the block since the last time it ran.

    • The input data to the block, including new runtime inputs, has changed since the last run.

    • The block has one or more upstream blocks that have run since the block last ran.

    Tip

    If you specify a subset of blocks to run using the To, From, and Only name-value arguments, these rules apply only to the selected blocks. Use the default run mode "Minimal" whenever possible: skipping up-to-date blocks can save significant time, especially when the pipeline has long-running blocks that do not need to rerun.

  • "Full" — The pipeline runs all blocks even if they have previously computed results.

Data Types: char | string
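The following sketch contrasts the two run modes. It assumes qcpipeline and sequencefilter from the seqqcplot example are in the workspace and that the pipeline has already run once; the new threshold [5 20] is a hypothetical value chosen only to modify the block.

```matlab
% Modify one block so that it becomes stale.
sequencefilter.Options.Threshold = [5 20];

% "Minimal" (default): by the rules above, sequencefilter reruns because it was
% modified, qcplot reruns because an upstream block ran, and fastqfile is skipped.
run(qcpipeline);

% "Full": every block reruns and existing results are overwritten.
run(qcpipeline,RunMode="Full");
```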

From

Starting blocks when you run the pipeline, specified as a bioinfo.pipeline.Block object or vector of block objects. You can also specify a character vector, string scalar, string vector, or cell array of character vectors representing block names. By default, the pipeline runs every block that needs to be run as defined by the Minimal run mode.

If you specify this argument, the pipeline starts running from the specified blocks and all the downstream blocks.

If you specify both To and From blocks, there must be at least one block between the blocks specified by From and To.

You cannot use this argument together with the Only name-value argument.
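A short sketch of From, assuming qcpipeline, sequencefilter, and qcplot from the seqqcplot example are in the workspace. In that pipeline, fastqfile feeds sequencefilter, which feeds qcplot.

```matlab
% Start at sequencefilter; qcplot (downstream) is also selected,
% while fastqfile (upstream) is not run.
run(qcpipeline,From=sequencefilter);

% Combined with RunMode="Full", every selected block reruns even if up to date.
run(qcpipeline,From=sequencefilter,RunMode="Full");
```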

To

Ending blocks when you run the pipeline, specified as a bioinfo.pipeline.Block object, vector of block objects, character vector, string scalar, string vector, or cell array of character vectors representing block names. By default, the pipeline runs every block that needs to be run as defined by the Minimal run mode.

If you specify this argument, the pipeline runs all the upstream blocks and stops at the specified blocks.

If you specify both To and From blocks, there must be at least one block between the blocks specified by From and To.

You cannot use this argument together with the Only name-value argument.
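A short sketch of To, alone and together with From, assuming qcpipeline, sequencefilter, and qcplot from the seqqcplot example are in the workspace. The run-mode rules still decide which selected blocks actually execute.

```matlab
% Select fastqfile and sequencefilter, and stop before qcplot (no plot is drawn).
run(qcpipeline,To=sequencefilter);

% From and To together: select sequencefilter through qcplot.
run(qcpipeline,From=sequencefilter,To=qcplot);
```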

Only

Blocks to run exclusively, specified as a bioinfo.pipeline.Block object, vector of block objects, character vector, string scalar, string vector, or cell array of character vectors representing block names. By default, the pipeline runs every block that needs to be run as defined by the Minimal run mode.

If you specify this argument, the pipeline runs only the specified blocks.

You cannot use this argument together with the To or From name-value arguments.
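A short sketch of Only, assuming qcpipeline, sequencefilter, and qcplot from the seqqcplot example are in the workspace and that sequencefilter has already produced results in an earlier run, which qcplot presumably uses as its input.

```matlab
% Run only the qcplot block.
run(qcpipeline,Only=qcplot);

% Several blocks can be given as a vector of block objects.
run(qcpipeline,Only=[sequencefilter,qcplot]);
```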

SaveResults

Blocks whose results are saved to MAT-files, specified as a bioinfo.pipeline.Block object, vector of block objects, character vector, string scalar, string vector, or cell array of character vectors representing block names.

By default (SaveResults="-all"), the results of every block are saved to a MAT-file in that block's folder.
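A short sketch that limits MAT-file output to a single block by name. It assumes P, inputStruct, and bowtie2 from the Bowtie2 example are in the workspace; "bowtie2" is the block name assigned by addBlock in that example.

```matlab
% Save MAT-file results only for the block named "bowtie2".
run(P,inputStruct,SaveResults="bowtie2");
wait(P);                            % wait for the pipeline to finish

mappedFile = results(P,bowtie2);    % results of the block whose output was saved
```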

More About

Satisfy Input Ports

All required input ports of every block in a pipeline must be satisfied before you can run the pipeline.

To satisfy an input port, you must do one of the following:

  • Connect it to an output port of another block.

  • Set the Value property of the input port, that is, myBlock.Inputs.PortName.Value. For example, to specify a BAM file as the input to a BamSort block named bamsortBlock, set bamsortBlock.Inputs.BAMFile.Value = "ex1.bam".

  • Pass in an input structure by calling run(pipeline,inputStruct), where each field name of inputStruct matches an input port name and each field value is the value for that port.
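Connecting ports is shown in the examples on this page. The following sketch contrasts the other two approaches using a BamSort block. It assumes that BAMFile is the only required input port of BamSort and that ex1.bam is on the MATLAB path; the variable names P1, P2, sortA, and sortB are illustrative.

```matlab
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*

% Set the port value directly on the block.
P1 = Pipeline;
sortA = BamSort;
addBlock(P1,sortA);
sortA.Inputs.BAMFile.Value = "ex1.bam";
run(P1);

% Leave the port unset and pass an input structure to run instead.
P2 = Pipeline;
sortB = BamSort;
addBlock(P2,sortB);
inputStruct = struct("BAMFile","ex1.bam");   % field name matches the port name
run(P2,inputStruct);
```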

Version History

Introduced in R2023a