Samples

Samples are one of the most important ways of organising data in Flow. Conceptually, a sample represents a biological sample - or more specifically, the data associated with that biological sample.

The sample model

The only required data for a sample is its initial raw data - the reads files obtained from the sequencer. These are associated with a sample by being part of a fileset, which is itself associated with a sample. Each sample can therefore have multiple filesets associated with it, which can represent resequencing of the original biological sample.

An unanalysed sample will therefore only have this raw data. When an execution is run, Flow looks at the samples that the input data is associated with (either because it is the raw data of that sample, or because it is associated through one of the mechanisms outlined here). If they are all identified with one sample (or no sample), the execution is made 'part of' that sample in a many-to-one relationship. All the output data of that execution is then also considered associated with the sample.

Process executions can become associated with a sample in the same way. In an execution that takes multiple samples as inputs, the execution as a whole won't be associated with any one sample, but most of the data produced will still be associated with one of the input samples.
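
The association rule described above can be sketched in Python. This is only an illustration under stated assumptions, not Flow's actual implementation: the function name resolve_sample and the idea of passing in one optional sample ID per input file are hypothetical, and the sketch reads "(or no sample)" as meaning that inputs without a sample do not prevent the association.

```python
from typing import Iterable, Optional


def resolve_sample(input_sample_ids: Iterable[Optional[str]]) -> Optional[str]:
    """Return the ID of the sample an execution would belong to, or None.

    Each element is the sample ID one input file is associated with,
    or None if that file has no sample (hypothetical representation).
    """
    # Ignore inputs with no sample; collect the distinct samples that remain.
    distinct = {sample_id for sample_id in input_sample_ids if sample_id is not None}
    # Exactly one distinct sample: the execution becomes part of that sample.
    if len(distinct) == 1:
        return next(iter(distinct))
    # No samples at all, or several different samples: no single owning sample.
    return None
```

An execution whose inputs come from two different samples returns None here, which matches the multi-sample case where only individual process executions end up associated with a sample.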

A sample has the following attributes:

name (string): The sample's human-readable name.
private (boolean): If false, anybody will be able to view the sample, even users who are not signed in (provided they can access the instance of Flow). Note that Flow requires certain criteria to be met before a sample can be made public, to protect the quality of the global public dataset.
created (int): The timestamp for the creation of the sample.
owner (ID): The user who owns the sample.
creator (ID): The user who originally created the sample.
group_owner (ID): The group that owns the sample.

Samples can be created in three ways: by uploading demultiplexed reads files, by running a pipeline marked as a demultiplexing pipeline, or by running a pipeline marked as a sample importing pipeline. In the latter two cases, Flow has built-in logic for extracting the metadata from the files of the execution; in the first case, the metadata is provided directly by the user.

Sample metadata

Most of the attributes of sample objects are 'metadata' - distinguishing features of the original biological sample. Most of these are text, but some of them are other objects, including two that only exist as sample metadata - the sample source (a cell or tissue type that the sample was derived from) and the sample purification target (the target protein for purification). Each of these has the following attributes:

name (string): The name of the source/target.
user (ID): Optionally, the source/target can be associated with a specific user. If so, it is 'unvalidated' and visible only to that user - essentially a user contribution. Otherwise it is a public, Flow-wide term.
created (int): The timestamp for the creation of the source/target.
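
The visibility rule for sources and targets can be illustrated with a short Python sketch. The class name MetadataTerm and the helper functions are hypothetical; only the three field names come from the attribute list above, and representing IDs as strings is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataTerm:
    """A sample source or purification target (hypothetical class name)."""
    name: str
    created: int                # creation timestamp
    user: Optional[str] = None  # owning user's ID, or None for a Flow-wide term


def is_validated(term: MetadataTerm) -> bool:
    # Terms tied to a user are 'unvalidated' user contributions.
    return term.user is None


def visible_to(term: MetadataTerm, viewer_id: Optional[str]) -> bool:
    # Flow-wide terms are visible to everyone; user terms only to their user.
    return term.user is None or term.user == viewer_id
```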

The other sample metadata attributes are:

scientist (string): The name of the researcher who prepared the original biological sample.
pi (string): The PI of the lab that prepared the original biological sample.
organisation (string): The organisation that prepared the original biological sample.
purification_agent (string): The antibody used in sample preparation.
experimental_method (string): This adds more specific detail to the sample category.
condition (string): The experimental condition of the sample.
sequencer (string): The sequencing equipment used to generate the data.
comments (string): Any additional comments.
five_prime_barcode_sequence (string): The 5' barcode sequence of the sample.
three_prime_barcode_sequence (string): The 3' barcode sequence of the sample.
three_prime_adapter_name (string): The 3' barcode adapter name of the sample.
three_prime_adapter_sequence (string): The 3' barcode adapter sequence of the sample.
rt_primer (string): The reverse transcription primer.
read1_primer (string): The read 1 primer sequence.
read2_primer (string): The read 2 primer sequence.
umi_barcode_sequence (string): The UMI barcode sequence.
umi_separator (string): The UMI separator string in the reads file.
source_text (string): Any qualifying text to go with the sample source.
purification_target_text (string): Any qualifying text to go with the sample purification target.
geo (string): The GEO accession of the sample.
ena (string): The ENA accession of the sample.
pubmed (string): The PubMed ID associated with the sample.
organism (ID): The organism the sample is associated with.

Sample types

Different pipelines may wish to define their own sample types for the purposes of filtering. Often these will correspond to a particular kind of experimental technique or a particular kind of analysis that has to be performed on them. For this, admins can define custom sample types - in this regard they work much like data types:

id (string): A unique string, which is how the type will be referred to in pipeline schema.
name (string): The name of the sample type.
description (string): A free text description of what the sample type represents.

One important additional distinction between sample types and data types is that some sample types can specify additional metadata that samples of that type can have. This is implemented via metadata attributes and metadata options.

A metadata attribute is associated with one sample type, and specifies an additional attribute that users can populate:

id (string): A unique string, which is how the attribute will appear in the metadata itself.
name (string): The human-readable name of the attribute.
description (string): A free text description of what the attribute represents.
required (boolean): Whether this attribute is required when creating a sample. You can also use an existing metadata attribute as the ID and set this to true to make an otherwise optional attribute required.

Generally these custom attributes are free text. Where you want to restrict an attribute to a set of values to choose from, metadata options provide this (multiple options can be associated with one attribute):

value (string): A string representing some value to pick from a dropdown.

These custom values are stored in a JSON field of samples called type_specific_metadata.
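
To make the relationship between metadata attributes, metadata options and type_specific_metadata concrete, here is a minimal Python sketch. It is an assumption about how such values could be checked, not a description of Flow's own validation: the class MetadataAttribute, the function validate_type_specific_metadata, and the example attribute IDs strandedness and batch are all hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetadataAttribute:
    """A custom attribute defined for one sample type (hypothetical class)."""
    id: str
    name: str
    required: bool = False
    # Allowed values taken from metadata options; empty means free text.
    options: List[str] = field(default_factory=list)


def validate_type_specific_metadata(
    attributes: List[MetadataAttribute], metadata: Dict[str, str]
) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    errors = []
    for attribute in attributes:
        value = metadata.get(attribute.id, "")
        if value == "":
            # A missing value only matters when the attribute is required.
            if attribute.required:
                errors.append(f"{attribute.id}: required")
            continue
        # Attributes with options accept only one of the listed values.
        if attribute.options and value not in attribute.options:
            errors.append(f"{attribute.id}: '{value}' is not an allowed option")
    return errors


# Hypothetical attributes for an illustrative sample type.
attributes = [
    MetadataAttribute(id="strandedness", name="Strandedness", required=True,
                      options=["forward", "reverse", "unstranded"]),
    MetadataAttribute(id="batch", name="Batch"),
]
metadata = {"strandedness": "reverse", "batch": "2024-A"}
problems = validate_type_specific_metadata(attributes, metadata)
# The values keyed by attribute ID, serialised as a JSON object.
type_specific_metadata = json.dumps(metadata)
```

Keying the JSON object by attribute ID follows the statement that the ID is how the attribute appears in the metadata itself.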

Projects

Samples are organised into projects. What a project represents for a given group or organisation varies, but typically they represent a single research question. A project has a one-to-many relationship with samples.

Just as an execution is assigned to a sample if all its input data belongs to a single sample, it can also be assigned to a project if all its input data belongs to a single project. The executions of a project are therefore all of its samples' executions, plus its directly contained executions.
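
The way a project's executions are assembled can be sketched as follows. The Sample and Project classes here are simplified stand-ins that hold execution IDs as strings, and the function name all_project_executions is hypothetical; none of this reflects Flow's internal data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    name: str
    executions: List[str] = field(default_factory=list)  # execution IDs


@dataclass
class Project:
    name: str
    samples: List[Sample] = field(default_factory=list)
    # Executions assigned to the project itself rather than to one sample.
    executions: List[str] = field(default_factory=list)


def all_project_executions(project: Project) -> List[str]:
    """Combine the samples' executions with the directly contained ones."""
    sample_executions = [e for s in project.samples for e in s.executions]
    ordered: List[str] = []
    seen = set()
    # Keep first-seen order and drop any execution ID listed more than once.
    for execution_id in sample_executions + project.executions:
        if execution_id not in seen:
            seen.add(execution_id)
            ordered.append(execution_id)
    return ordered
```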

A project has the following attributes:

name (string): The project's name.
private (boolean): If false, anybody will be able to view the project, even users who are not signed in (provided they can access the instance of Flow). All of its samples must be eligible to be public for a project to be made public.
created (int): The timestamp for the creation of the project.
owner (ID): The user who owns the project.
creator (ID): The user who originally created the project.
group_owner (ID): The group that owns the project.

Papers

Projects can have zero or more papers associated with them. These are not set directly, but are determined from the PubMed IDs of the associated samples. Their attributes are:

id (string): The PubMed ID.
title (string): The full title of the paper.
year (int): The year of publication.
journal (string): The name of the journal the paper was published in.
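
Deriving a project's papers from its samples can be illustrated with a small Python sketch. It covers only the collection of distinct PubMed IDs; how Flow fills in the title, year and journal is not described here, so that step is left out. The function name project_paper_ids is hypothetical.

```python
from typing import Iterable, List, Optional


def project_paper_ids(sample_pubmed_ids: Iterable[Optional[str]]) -> List[str]:
    """Return the distinct PubMed IDs found on a project's samples.

    Each element is the pubmed attribute of one sample, which may be
    empty or missing when the sample has no associated paper.
    """
    distinct = set()
    for pubmed_id in sample_pubmed_ids:
        # Skip samples without a PubMed ID and ignore stray whitespace.
        if pubmed_id and pubmed_id.strip():
            distinct.add(pubmed_id.strip())
    # Sort for a stable, repeatable result.
    return sorted(distinct)
```

A project whose samples carry no PubMed IDs yields an empty list, matching the "zero or more papers" wording above.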