Introduction to Workflow Managers
Workflow Management Systems (WfMS), such as Snakemake, Galaxy, and Nextflow have been developed specifically to manage computational data-analysis workflows.
Benefits:
- Run time management: Management of program execution on the operating system and splitting tasks and data to run at the same time in a process called parallelisation.
- Software management: Use of technology like containers, such as Docker or Singularity, that packages up code and all its dependencies so the application runs reliably from one computing environment to another.
- Portability & Interoperability: Workflows written on one system can be run on another computing infrastructure e.g., local computer, compute cluster, or cloud infrastructure.
- Reproducibility: The use of software management systems and a pipeline specification means that the workflow will produce the same results when re-run, including on different computing platforms.
- Reentrancy: Continuous checkpoints allow workflows to resume from the last successfully executed steps.
Nextflow
Basic concepts of Nextflow: 8.1 Nextflow Basic Concepts.
Nextflow workflows have three main parts:
- Processes - describe independent tasks to be run with defined input and output. A process script can be written in any scripting language that can be executed by the Linux platform (Bash, Perl, Ruby, Python, etc.).
- Channels - used to manipulate the flow of data from one process to the next.
- Workflows - set of instructions for programs involved in the pipeline. A single pipeline can contain many subworkflows.
nf-core
An organization that creates and maintains a curated set of open-source analysis pipelines built using Nextflow. For example, this is a workflow built for rnaseq:
Snakemake
Similar to Nextflow, the Snakemake workflow management system is a tool for creating reproducible and scalable data analyses. The main difference is that workflows are described via a human readable, Python based language. They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition. Finally, Snakemake workflows can entail a description of required software, which will be automatically deployed to any execution environment.
Resources:
- University of Ghana documentation
- Intro to Snakemake