Split Input SAM Files and Assemble Transcriptomes Using Bioinformatics Pipeline
Import the pipeline and block objects needed for the example.
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*
Create a pipeline.
P = Pipeline
P = Pipeline with properties:
Blocks: [0×1 bioinfo.pipeline.Block]
BlockNames: [0×1 string]
Use a FileChooser block to select the provided SAM files. The files contain aligned reads for Mycoplasma pneumoniae from two samples.
fileChooserBlock = FileChooser([which("Myco_1_1.sam"); which("Myco_1_2.sam")]);
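The which function returns an empty character vector when it cannot find a file on the MATLAB path, so a missing sample file would otherwise reach FileChooser as an empty entry. The following is a minimal sketch of an optional sanity check; the variable names samNames, samPaths, and isMissing are illustrative and not part of the original example.

```matlab
% Optional check (illustrative, not part of the original example):
% confirm that which() locates each SAM file before using the paths.
samNames = ["Myco_1_1.sam"; "Myco_1_2.sam"];

% which returns '' for a file that is not on the MATLAB path
samPaths = arrayfun(@(f) which(f), samNames, UniformOutput=false);  % 2-by-1 cell of char vectors
isMissing = cellfun(@isempty, samPaths);                            % 2-by-1 logical

if any(isMissing)
    % Transpose to a row so strjoin receives a 1-by-N array
    warning("SAM file(s) not found on the MATLAB path: %s", ...
        strjoin(samNames(isMissing).', ", "));
end
```

The check only warns, so the rest of the example is unchanged. Remove it if you already know the files are on the path.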
Create a Cufflinks block.
cufflinksBlock = Cufflinks;
Add the blocks to the pipeline.
addBlock(P,[fileChooserBlock,cufflinksBlock]);
Connect the blocks.
connect(P,fileChooserBlock,cufflinksBlock,["Files","GenomicAlignmentFiles"]);
Set SplitDimension to 1 for the GenomicAlignmentFiles input port. The value 1 corresponds to the row dimension of the input, so the Cufflinks block runs separately on each SAM file (Myco_1_1.sam and Myco_1_2.sam).
cufflinksBlock.Inputs.GenomicAlignmentFiles.SplitDimension = 1;
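To picture what splitting along dimension 1 means, the following plain-MATLAB sketch divides a 2-by-1 array of file names into per-run inputs. It is only an analogy and is not how the pipeline implements splitting internally. The string array stands in for the FileChooser output, which is actually an array of File objects. With SplitDimension set to 1, each row becomes one independent run. With the default empty value, a single run receives the whole array.

```matlab
% Analogy only: mimic how a 2-by-1 input is divided into runs.
% This is not the pipeline's internal implementation.
files = ["Myco_1_1.sam"; "Myco_1_2.sam"];   % stand-in for the 2-by-1 FileChooser output

% SplitDimension = 1: split along rows, so each run gets one element
splitRuns = num2cell(files, 2);             % 2-by-1 cell array, one row per cell
for k = 1:numel(splitRuns)
    fprintf("Run %d processes %s\n", k, splitRuns{k});
end

% SplitDimension = [] (default): no split, one run gets the whole array
unsplitRuns = {files};
fprintf("Unsplit: %d run receives %d files\n", ...
    numel(unsplitRuns), numel(unsplitRuns{1}));
```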
Run the pipeline. The pipeline runs the Cufflinks block twice, independently, and generates a set of four files for each SAM file.
run(P);
Get the block results.
cufflinksResults = results(P,cufflinksBlock)
cufflinksResults = struct with fields:
TranscriptsGTFFile: [2×1 bioinfo.pipeline.datatype.File]
IsoformsFPKMFile: [2×1 bioinfo.pipeline.datatype.File]
GenesFPKMFile: [2×1 bioinfo.pipeline.datatype.File]
SkippedTranscriptsGTFFile: [2×1 bioinfo.pipeline.datatype.File]
Use the process table to check the total number of runs for each block. Cufflinks ran two times independently.
t = processTable(P,Expanded=true);
Set SplitDimension to empty [] (the default). In this case, the pipeline does not split the input files and runs Cufflinks only once for both SAM files, processing the SAM files one after another.
cufflinksBlock.Inputs.GenomicAlignmentFiles.SplitDimension = [];  % default: do not split the input
deleteResults(P,IncludeFiles=true);  % remove the previous results, including their output files
run(P);
cufflinksResults = results(P,cufflinksBlock)
cufflinksResults = struct with fields:
TranscriptsGTFFile: [2×1 bioinfo.pipeline.datatype.File]
IsoformsFPKMFile: [2×1 bioinfo.pipeline.datatype.File]
GenesFPKMFile: [2×1 bioinfo.pipeline.datatype.File]
SkippedTranscriptsGTFFile: [2×1 bioinfo.pipeline.datatype.File]
Check the process table, which confirms that Cufflinks ran only once.
t2 = processTable(P,Expanded=true);
Tip: You can speed up the pipeline run by setting UseParallel=true if you have Parallel Computing Toolbox™. The pipeline can then schedule independent block executions on parallel pool workers.
run(P,UseParallel=true)
See Also
bioinfo.pipeline.Pipeline | bioinfo.pipeline.block.Cufflinks | SplitDimension