Samples

Samples are one of the most important ways of organising data in Flow. Conceptually, a sample represents a biological sample - or more specifically, the data associated with that biological sample.

The sample model

The only required data for a sample is its initial raw data - the reads files obtained from the sequencer. These are associated with a sample by being part of a fileset, which is itself associated with a sample. Each sample can therefore have multiple filesets associated with it, which can represent resequencing of the original biological sample.

An unanalysed sample will therefore only have this raw data. When an execution is run, Flow looks at the samples that the input data is associated with (either because it is the raw data of that sample, or because it is associated through one of the mechanisms outlined here) and if they are all identified with one sample (or no sample), the execution is made 'part of' that sample in a many-to-one relationship. All the output data of that execution is then also considered associated with the sample.

Process executions can become associated with a sample in the same way. So in an execution that takes multiple samples as inputs, the execution as a whole won't be associated with any one sample, but most of the data produced will still be associated with one of the input samples.
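The association rule described above can be sketched in a few lines of Python. This is only an illustration, not Flow's actual implementation: the function name and the use of None for "no sample" are assumptions, and it reads "one sample (or no sample)" as meaning that inputs without a sample do not prevent the association.

```python
from typing import Optional, Sequence


def sample_for_execution(input_sample_ids: Sequence[Optional[str]]) -> Optional[str]:
    """Return the one sample an execution would belong to, or None.

    Each entry is the sample ID of one piece of input data, or None if
    that input is not associated with any sample (hypothetical encoding).
    """
    # Ignore inputs with no sample, then see how many distinct samples remain.
    distinct = {sample_id for sample_id in input_sample_ids if sample_id is not None}
    if len(distinct) == 1:
        return next(iter(distinct))
    # Zero samples, or more than one: the execution as a whole has no sample.
    return None
```

With this sketch, inputs drawn from a single sample (plus any sample-less inputs) yield that sample, while inputs drawn from two different samples yield None.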

  • Name
    name
    Type
    string
    Description

    The sample's human readable name.

  • Name
    private
    Type
    boolean
    Description

    If false, anybody will be able to view the sample, even users not signed in (providing they can access the instance of Flow). Note that Flow requires certain criteria to be met before a sample can be made public, to protect the quality of the global public dataset.

  • Name
    created
    Type
    int
    Description

    The timestamp for the creation of the sample.

  • Name
    owner
    Type
    ID
    Description

    The user who owns the sample.

  • Name
    creator
    Type
    ID
    Description

    The user who originally created the sample.

  • Name
    group_owner
    Type
    ID
    Description

    The group who owns the sample.
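As a rough illustration, the core attributes above could be modelled as a Python dataclass. The class itself, the use of strings for IDs, and the helper function are assumptions made for this sketch; the documentation does not state how IDs are represented or which ownership fields may be empty.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical in-memory view of a sample's core attributes."""
    name: str          # human readable name
    private: bool      # if False, anybody who can reach the instance can view it
    created: int       # creation timestamp
    owner: str         # ID of the owning user (string IDs are an assumption)
    creator: str       # ID of the user who originally created the sample
    group_owner: str   # ID of the owning group


def is_publicly_viewable(sample: Sample) -> bool:
    """A sample is viewable without signing in only when it is not private."""
    return not sample.private
```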

Samples can be created in three ways: by uploading demultiplexed reads files, by running a pipeline marked as a demultiplexing pipeline, or by running a pipeline marked as a sample importing pipeline. In the first case the metadata is provided directly by the user; in the other two, Flow has built-in logic for extracting the metadata from the files of the execution.

Sample metadata

Most of the attributes of sample objects are 'metadata' - distinguishing features of the original biological sample. Most of these are text, but some of them are other objects, including two that only exist as sample metadata - the sample source (a cell or tissue type that the sample was derived from) and the sample purification target (the target protein for purification). Each of these has the following attributes:

  • Name
    name
    Type
    string
    Description

    The name of the source/target.

  • Name
    user
    Type
    ID
    Description

    Optionally, the source/target can be associated with a specific user. If so, it is 'unvalidated' and visible only to that user - essentially a user contribution. Otherwise it is a public, Flow-wide term.

  • Name
    created
    Type
    int
    Description

    The timestamp for the creation of the source/target.
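The visibility rule for sources and targets can be sketched as a simple filter. The Term class and the function name are hypothetical, and None is assumed to stand for "no associated user"; this is an illustration of the rule rather than Flow's own code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Term:
    """Hypothetical stand-in for a sample source or purification target."""
    name: str
    user: Optional[str]  # ID of the contributing user, or None for a Flow-wide term
    created: int


def visible_terms(terms: List[Term], viewer_id: Optional[str]) -> List[Term]:
    """Keep Flow-wide terms plus the viewer's own unvalidated terms."""
    return [
        term for term in terms
        if term.user is None or (viewer_id is not None and term.user == viewer_id)
    ]
```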

The other sample metadata attributes are:

  • Name
    scientist
    Type
    string
    Description

    The name of the researcher who prepared the original biological sample.

  • Name
    pi
    Type
    string
    Description

    The principal investigator (PI) of the lab that prepared the original biological sample.

  • Name
    organisation
    Type
    string
    Description

    The organisation that prepared the original biological sample.

  • Name
    purification_agent
    Type
    string
    Description

    The antibody used in sample preparation.

  • Name
    experimental_method
    Type
    string
    Description

    This adds more specific detail to the sample category.

  • Name
    condition
    Type
    string
    Description

    The experimental condition of the sample.

  • Name
    sequencer
    Type
    string
    Description

    The sequencing equipment used to generate the data.

  • Name
    comments
    Type
    string
    Description

    Any additional comments.

  • Name
    five_prime_barcode_sequence
    Type
    string
    Description

    The 5' barcode sequence of the sample.

  • Name
    three_prime_barcode_sequence
    Type
    string
    Description

    The 3' barcode sequence of the sample.

  • Name
    three_prime_adapter_name
    Type
    string
    Description

    The 3' barcode adapter name of the sample.

  • Name
    three_prime_adapter_sequence
    Type
    string
    Description

    The 3' barcode adapter sequence of the sample.

  • Name
    rt_primer
    Type
    string
    Description

    The reverse transcription primer.

  • Name
    read1_primer
    Type
    string
    Description

    The read 1 primer sequence.

  • Name
    read2_primer
    Type
    string
    Description

    The read 2 primer sequence.

  • Name
    umi_barcode_sequence
    Type
    string
    Description

    The UMI barcode sequence.

  • Name
    umi_separator
    Type
    string
    Description

    The UMI separator string in the reads file.

  • Name
    source_text
    Type
    string
    Description

    Any qualifying text to go with the sample source.

  • Name
    purification_target_text
    Type
    string
    Description

    Any qualifying text to go with the sample purification target.

  • Name
    geo
    Type
    string
    Description

    The GEO accession of the sample.

  • Name
    ena
    Type
    string
    Description

    The ENA accession of the sample.

  • Name
    pubmed
    Type
    string
    Description

    The PubMed ID associated with the sample.

  • Name
    organism
    Type
    ID
    Description

    The organism the sample is associated with.

Sample Types

Different pipelines may wish to define their own sample types for the purposes of filtering. Often these will correspond to a particular kind of experimental technique or a particular kind of analysis that has to be performed on them. For this, admins can define custom sample types - in this regard they work much like data types:

  • Name
    id
    Type
    string
    Description

    A unique string, which is how the type will be referred to in pipeline schema.

  • Name
    name
    Type
    string
    Description

    The name of the sample type.

  • Name
    description
    Type
    string
    Description

    A free text description of what the sample type represents.

One important additional distinction between sample types and data types is that some sample types can specify additional metadata that samples of that type can have. This is implemented via metadata attributes and metadata options.

A metadata attribute is associated with one sample type, and specifies an additional attribute that users can populate:

  • Name
    id
    Type
    string
    Description

    A unique string, which is how the attribute will appear in the metadata itself.

  • Name
    name
    Type
    string
    Description

    The human-readable name of the attribute.

  • Name
    description
    Type
    string
    Description

    A free text description of what the attribute represents.

  • Name
    required
    Type
    boolean
    Description

    Whether this attribute is required when creating a sample. You can also use an existing metadata attribute as the ID and set this to true to make an otherwise optional attribute required.

Generally these custom attributes are free text, but where you want to specify values to choose from, metadata options (multiple options are associated with one attribute) provide this:

  • Name
    value
    Type
    string
    Description

    A string representing some value to pick from a dropdown.

These custom values are stored in a JSON field of samples called type_specific_metadata.
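To make the relationship between attributes, options and the type_specific_metadata field more concrete, here is a minimal validation sketch. The class, the function, the error strings and the example attribute IDs are all hypothetical, and it assumes type_specific_metadata maps attribute IDs to string values; Flow's real checks may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetadataAttribute:
    """Hypothetical mirror of a metadata attribute and its options."""
    id: str
    name: str
    required: bool = False
    options: List[str] = field(default_factory=list)  # empty list means free text


def validate_type_specific_metadata(
    metadata: Dict[str, str], attributes: List[MetadataAttribute]
) -> List[str]:
    """Return a list of problems; an empty list means the metadata is acceptable."""
    errors = []
    for attribute in attributes:
        value = metadata.get(attribute.id)
        if not value:
            # Missing or empty values only matter for required attributes.
            if attribute.required:
                errors.append(f"{attribute.id}: required")
            continue
        # When options are defined, the value must be one of them.
        if attribute.options and value not in attribute.options:
            errors.append(f"{attribute.id}: not an allowed option")
    return errors
```

For example, an attribute with options "fresh" and "frozen" would accept {"tissue_prep": "fresh"} and reject {"tissue_prep": "dried"}, while an attribute without options accepts any text.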

Projects

Samples are organised into projects. What a project represents for a given group or organisation varies, but typically they represent a single research question. A project has a one-to-many relationship with samples.

Just as an execution is assigned to a sample if all its input data belongs to a single sample, it can also be assigned to a project if all the input data belongs to a single project. The executions of a project are therefore all of its samples' executions, plus its directly contained executions.

  • Name
    name
    Type
    string
    Description

    The project's name.

  • Name
    private
    Type
    boolean
    Description

    If false, anybody will be able to view the project, even users not signed in (providing they can access the instance of Flow). All of its samples must be eligible to be public for a project to be made public.

  • Name
    created
    Type
    int
    Description

    The timestamp for the creation of the project.

  • Name
    owner
    Type
    ID
    Description

    The user who owns the project.

  • Name
    creator
    Type
    ID
    Description

    The user who originally created the project.

  • Name
    group_owner
    Type
    ID
    Description

    The group who owns the project.
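The definition of a project's executions given earlier in this section (its samples' executions plus its directly contained executions) amounts to a set union. The sketch below assumes executions are identified by string IDs and that the caller already has them grouped by sample; the function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, Set


def project_executions(
    direct_execution_ids: Iterable[str],
    execution_ids_by_sample: Dict[str, Iterable[str]],
) -> Set[str]:
    """Combine a project's directly contained executions with those of its samples."""
    executions = set(direct_execution_ids)
    for sample_execution_ids in execution_ids_by_sample.values():
        executions.update(sample_execution_ids)
    return executions
```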

Papers

Projects can have zero or more papers associated with them. These are not set directly, but are determined from the PubMed IDs of the associated samples. Their attributes are:

  • Name
    id
    Type
    string
    Description

    The PubMed ID.

  • Name
    title
    Type
    string
    Description

    The full title of the paper.

  • Name
    year
    Type
    int
    Description

    The year of publication.

  • Name
    journal
    Type
    string
    Description

    The name of the journal the paper was published in.
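Since papers are derived from sample metadata rather than set directly, the set of paper IDs for a project can be thought of as the distinct, non-empty pubmed values of its samples. The sketch below assumes each sample is available as a mapping with an optional pubmed key; the function name is hypothetical, and looking up the title, year and journal is left out.

```python
from typing import Iterable, List, Mapping, Optional


def pubmed_ids_for_project(samples: Iterable[Mapping[str, Optional[str]]]) -> List[str]:
    """Collect the distinct PubMed IDs found on a project's samples."""
    ids = {sample.get("pubmed") for sample in samples}
    # Drop missing or empty values and return a stable, sorted list.
    return sorted(pubmed_id for pubmed_id in ids if pubmed_id)
```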
