Analysis
One of the core offerings of Flow is its ability to manage and run a suite of Nextflow pipelines. Here we will explore how those pipelines are represented, and how they are executed.
Pipelines
The pipeline model represents a single, distinct, named Nextflow pipeline. Different instances of Flow will have different pipelines installed. The key properties of pipelines are:
- Name
- name
- Type
- string
- Description
- The overall name for the pipeline, as it is presented to the user. 
 
- Name
- is_nfcore
- Type
- boolean
- Description
- Whether or not this should be presented as an nf-core pipeline 
 
- Name
- is_demultiplex
- Type
- boolean
- Description
- Whether or not the pipeline demultiplexes samples. When true, this tells Flow to perform additional steps when processing the pipeline outputs to ensure samples are created. 
 
- Name
- imports_samples
- Type
- boolean
- Description
- Whether or not the pipeline imports samples. When true, this tells Flow to perform additional steps when processing the pipeline outputs to ensure samples are created. 
 
There are two ways that pipelines are organised. The first is via pipeline categories and pipeline subcategories. Each pipeline model has a many-to-one relationship with a pipeline subcategory model, which in turn has a many-to-one relationship with a pipeline category model. These determine how the pipelines are presented in the frontend, and each has a description of what the category/subcategory represents.
The second is with the pipeline repo model. Each pipeline must be associated with a git repo, with a distinct URL that Flow can pull from. Pipeline repos and categories/subcategories are orthogonal - multiple pipelines from a single repo can be in different categories, and the pipelines in a single category or subcategory can be from different repos.
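The two organising schemes can be pictured with a small sketch. This is not Flow's actual model code: the class names, the `url` and `description` fields, and the example pipelines are assumptions made for illustration, and only the relationships (pipeline to subcategory to category, and pipeline to repo) come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineCategory:
    name: str
    description: str


@dataclass(frozen=True)
class PipelineSubcategory:
    name: str
    description: str
    category: PipelineCategory  # many subcategories -> one category


@dataclass(frozen=True)
class PipelineRepo:
    url: str  # the git URL that would be pulled from


@dataclass(frozen=True)
class Pipeline:
    name: str
    subcategory: PipelineSubcategory  # many pipelines -> one subcategory
    repo: PipelineRepo                # every pipeline belongs to a repo


def group_names(pipelines, key):
    """Group pipeline names by an arbitrary key function."""
    groups = {}
    for pipeline in pipelines:
        groups.setdefault(key(pipeline), []).append(pipeline.name)
    return groups


# Hypothetical example data.
rna = PipelineCategory("RNA", "Pipelines that analyse RNA data")
utils = PipelineCategory("Utilities", "Helper pipelines")
quant = PipelineSubcategory("Quantification", "Expression quantification", rna)
prep = PipelineSubcategory("Preparation", "Data preparation", utils)
repo_a = PipelineRepo("https://example.org/repo-a.git")
repo_b = PipelineRepo("https://example.org/repo-b.git")

pipelines = [
    Pipeline("RNA-Seq", quant, repo_a),
    Pipeline("Genome Prep", prep, repo_a),  # same repo, different category
    Pipeline("Quick Quant", quant, repo_b),  # same subcategory, different repo
]

by_category = group_names(pipelines, lambda p: p.subcategory.category.name)
by_repo = group_names(pipelines, lambda p: p.repo.url)
```

Grouping the same three pipelines by category and by repo gives different partitions, which is what "orthogonal" means here.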
Pipeline versions
A single pipeline will have one or more pipeline versions - these refer to specific commits in the repo, and contain the paths to the actual files that should be run. When you run a pipeline, you are running a specific pipeline version. The key properties of pipeline versions are:
- Name
- name
- Type
- string
- Description
- The name of the pipeline version as presented to the user. 
 
- Name
- git
- Type
- string
- Description
- The git commit to use for this version - this can be a commit hash, a branch, or a tag. 
 
- Name
- private
- Type
- bool
- Description
- If private, only admin users will be able to see and access this pipeline. Useful for testing pipeline integrations with Flow. 
 
- Name
- active
- Type
- bool
- Description
- Only active pipeline versions can be run. Typically the most recent versions will be active, while older versions will be disabled by setting them to be inactive. 
 
- Name
- description
- Type
- string
- Description
- A brief description of what the pipeline does (which may change in different versions). 
 
- Name
- long_description
- Type
- string
- Description
- A more in-depth description of what the pipeline does (which also may change in different versions). 
 
- Name
- created
- Type
- int
- Description
- The timestamp of the pipeline version's creation, which is used to order the versions. 
 
- Name
- path
- Type
- string
- Description
- The path to the main pipeline .nf file (relative to the repo root). This is the file that is actually passed to nextflow run.
 
- Name
- schema_path
- Type
- string
- Description
- The path to the Flow schema file (relative to the repo root). This is a JSON file which describes the inputs and outputs of the pipeline. 
 
- Name
- config_paths
- Type
- string
- Description
- A comma separated list of paths to any additional config files within the repo that should be applied when running. These are in addition to the global config files run for every pipeline. 
 
- Name
- copy_full
- Type
- bool
- Description
- For some pipelines, there may be large, non-committed files in the repo that are needed for running. When this is true, the pipeline repo will be copied over for every run, instead of just being recreated from the .git directory.
 
- Name
- upstream_pipeline_versions
- Type
- [ID]
- Description
- For pipelines which can take fileset preparations as inputs, this defines the pipeline versions whose executions can be used. 
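
As a rough illustration of how path and config_paths relate to a Nextflow invocation, the sketch below splits the comma separated config_paths string and appends one `-c` option per file. The function names are hypothetical, and the exact command Flow builds (including how global config files and parameters are passed) is not described here, so treat this as an assumption rather than Flow's real behaviour.

```python
import shlex


def split_config_paths(config_paths):
    """Split a comma separated string of paths, ignoring blanks and spaces."""
    return [part.strip() for part in config_paths.split(",") if part.strip()]


def build_command(path, config_paths=""):
    """Assemble an illustrative `nextflow run` command line.

    `path` and `config_paths` mirror the pipeline version properties of the
    same names; both are taken as relative to the repo root.
    """
    args = ["nextflow", "run", path]
    for config in split_config_paths(config_paths):
        # Nextflow's -c option adds a config file to the configuration set.
        args += ["-c", config]
    return shlex.join(args)


command = build_command("main.nf", "conf/a.config, conf/b.config")
```

Using `shlex.join` keeps the command safe to display or log even if a path contains spaces.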
 
Executions
When you run a pipeline, you create an execution object. This represents the running of a single pipeline version.
Flow uses Celery to manage the execution of pipelines. When a run is submitted, an execution object is created and a response is returned to the user giving the ID of the new object. Meanwhile, the job is added to the Celery queue. Once Celery selects the job, it submits the Nextflow run in an install-specific way (different institutes will have different established systems for submitting such jobs) and then watches the execution output, populating the database with objects that represent whatever the run creates.
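The submit-then-queue pattern described above can be sketched as follows. To stay self-contained, this uses a standard-library `queue.Queue` and a dictionary as stand-ins for the Celery queue and the database; the function names, the `state` field, and its values are all hypothetical and not part of Flow.

```python
import itertools
import queue
from dataclasses import dataclass


@dataclass
class Execution:
    id: int
    pipeline_version: str
    state: str = "queued"  # hypothetical bookkeeping field


_ids = itertools.count(1)
job_queue = queue.Queue()  # stand-in for the Celery queue
executions = {}            # stand-in for the database


def submit_run(pipeline_version):
    """Request/response side: create the object, enqueue, reply at once."""
    execution = Execution(id=next(_ids), pipeline_version=pipeline_version)
    executions[execution.id] = execution
    job_queue.put(execution.id)
    return {"execution_id": execution.id}


def work_once(launch):
    """Worker side: take one job and launch it in an install-specific way.

    `launch` is any callable that runs the job and returns an exit code.
    """
    execution = executions[job_queue.get()]
    execution.state = "running"
    exit_code = launch(execution)
    execution.state = "finished" if exit_code == 0 else "failed"
    return execution


response = submit_run("example-version")
state_after_submit = executions[response["execution_id"]].state
completed = work_once(lambda execution: 0)  # pretend the run succeeded
```

The caller receives the execution ID before any work has started, which is why an execution can exist for some time before its Nextflow job begins.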
The key properties of executions are:
- Name
- identifier
- Type
- string
- Description
- The human readable name generated by Nextflow, typically two random words joined by an underscore. 
 
- Name
- pid
- Type
- string
- Description
- The PID of the main Nextflow process on the server (the process executions will have their own PIDs). 
 
- Name
- dependent
- Type
- bool
- Description
- Whether or not permissions applying to any samples or projects the execution is in should apply to this execution. 
 
- Name
- private
- Type
- bool
- Description
- If false, anybody will be able to view the execution, even users who are not signed in (provided they can access the instance of Flow).
 
- Name
- resequence_samples
- Type
- bool
- Description
- Whether or not any samples created by the execution should be merged into existing samples where possible. 
 
- Name
- command
- Type
- string
- Description
- The full command-line command that was used to run this execution. 
 
- Name
- params
- Type
- string
- Description
- A JSON string containing the simple parameters passed to this execution. 
 
- Name
- data_params
- Type
- string
- Description
- A JSON string containing the data IDs passed as parameters to this execution. 
 
- Name
- sample_params
- Type
- string
- Description
- A JSON string containing the sample IDs passed as parameters to this execution, along with any additional columns of data. 
 
- Name
- nextflow_version
- Type
- string
- Description
- The version of Nextflow this execution was run with. 
 
- Name
- stdout
- Type
- string
- Description
- The full stdout produced by the run. 
 
- Name
- stderr
- Type
- string
- Description
- The full stderr produced by the run. 
 
- Name
- exit_code
- Type
- int
- Description
- The system exit code returned - 0 generally means it ran without issue. 
 
- Name
- status
- Type
- string
- Description
- The Nextflow reported status of the execution. 
 
- Name
- created
- Type
- int
- Description
- The timestamp for the initial creation of the execution in the request/response loop that submitted it. 
 
- Name
- task_started
- Type
- int
- Description
- The timestamp for the start of the Celery process that submitted the execution. This may be a few seconds after created, or it may be many hours, depending on the Celery queue.
 
- Name
- started
- Type
- int
- Description
- The timestamp for the start of the Nextflow job itself - typically milliseconds after task_started.
 
- Name
- finished
- Type
- int
- Description
- The timestamp for the end of the Nextflow job. 
 
- Name
- task_finished
- Type
- int
- Description
- The timestamp for the end of the Celery process that submitted the execution. This may be some time after the Nextflow process itself ended, if there is a lot of post-processing to do.
 
- Name
- owner
- Type
- ID
- Description
- The user who owns the execution. 
 
- Name
- creator
- Type
- ID
- Description
- The user who originally ran the execution. 
 
- Name
- group_owner
- Type
- ID
- Description
- The group who owns the execution. 
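
The five timestamps above divide an execution's lifetime into distinct phases. A minimal sketch of that arithmetic follows; the function name and the phase labels are made up for illustration, and the example values are arbitrary. The result is in whatever unit the timestamps themselves use, which is not stated here.

```python
def timing_breakdown(created, task_started, started, finished, task_finished):
    """Return the time spent in each phase of an execution."""
    return {
        # Waiting in the Celery queue before a worker picked the job up.
        "queue_wait": task_started - created,
        # Gap between the Celery task starting and Nextflow starting.
        "launch_delay": started - task_started,
        # Time the Nextflow job itself was running.
        "nextflow_runtime": finished - started,
        # Post-processing done after Nextflow ended.
        "post_processing": task_finished - finished,
    }


phases = timing_breakdown(
    created=1000,
    task_started=1030,
    started=1030,
    finished=4630,
    task_finished=4700,
)
total = sum(phases.values())
```

The phases always sum to task_finished minus created, so a long total with a short nextflow_runtime points at the queue or at post-processing rather than at the pipeline.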
 
Process Executions
An execution will have zero or more process executions. Nextflow pipelines work by chaining together multiple processes which typically (and in Flow, always) run in their own containerised environment. The process execution model represents each of these for a given execution. The key attributes are:
- Name
- name
- Type
- string
- Description
- The full name of the process - typically the process name with some argument in parentheses at the end. 
 
- Name
- process_name
- Type
- string
- Description
- The name of the process itself, without any distinguishing arguments. 
 
- Name
- identifier
- Type
- string
- Description
- The Nextflow-generated identifier for the process. 
 
- Name
- started
- Type
- int
- Description
- The timestamp for the start of the process execution. 
 
- Name
- finished
- Type
- int
- Description
- The timestamp for the end of the process execution. 
 
- Name
- stdout
- Type
- string
- Description
- The full stdout produced by the process execution. 
 
- Name
- stderr
- Type
- string
- Description
- The full stderr produced by the process execution. 
 
- Name
- bash
- Type
- string
- Description
- The bash script generated by Nextflow for this process execution. 
 
- Name
- exit_code
- Type
- int
- Description
- The system exit code returned - 0 generally means it ran without issue. 
 
- Name
- status
- Type
- string
- Description
- The Nextflow reported status of the process execution.
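
To illustrate the relationship between name and process_name, the sketch below strips a trailing parenthesised argument from a full process name. The function and the example names are hypothetical; how Flow itself derives process_name is not described here, and this simple pattern assumes the argument contains no nested parentheses.

```python
import re

# Matches one trailing "(...)" group, plus any whitespace around it.
_TRAILING_ARGUMENT = re.compile(r"\s*\(([^()]*)\)\s*$")


def split_process_name(name):
    """Split a full process name into (process_name, argument).

    The argument is None when the name has no trailing parentheses.
    """
    match = _TRAILING_ARGUMENT.search(name)
    if match is None:
        return name.strip(), None
    return name[:match.start()].strip(), match.group(1)


with_argument = split_process_name("FASTQC (sample1)")
without_argument = split_process_name("MULTIQC")
```

Several process executions in one execution can therefore share a process_name while having distinct name values, one per argument.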