Samples

Samples are one of the most important ways of organising data in Flow. Conceptually, a sample represents a biological sample - or more specifically, the data associated with that biological sample.

The sample model

The only required data for a sample is its initial raw data - the reads files obtained from the sequencer. These are associated with a sample by being part of a fileset, which is itself associated with a sample. Each sample can therefore have multiple filesets associated with it, which can represent resequencing of the original biological sample.

An unanalysed sample will therefore only have this raw data. When an execution is run, Flow looks at the samples that the input data is associated with (either because it is the raw data of that sample, or because it is associated through one of the mechanisms outlined here) and if they are all identified with one sample (or no sample), the execution is made 'part of' the sample in a many-to-one relationship. All the output data of that execution is then also considered associated with the sample.

Process executions can become associated with a sample in the same way - so in an execution that takes multiple samples as inputs, while the execution as a whole won't be associated with any one sample, most of the data produced will be associated with one of the input samples.
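The association rule described above can be sketched in a few lines of Python. This is an illustration only, not Flow's actual implementation: the function name `sample_for_execution` and the representation of inputs as a list of optional sample IDs are assumptions made for the example, as is the reading that inputs with no sample are simply ignored.

```python
from typing import Iterable, Optional


def sample_for_execution(input_sample_ids: Iterable[Optional[str]]) -> Optional[str]:
    """Return the ID of the sample an execution would be part of, or None.

    Hypothetical helper: each element is the ID of the sample that one
    input file is associated with, or None if that file has no sample.
    """
    # Assumption: inputs that are not associated with any sample are ignored.
    distinct = {sid for sid in input_sample_ids if sid is not None}
    # Exactly one distinct sample: the execution becomes part of it.
    if len(distinct) == 1:
        return next(iter(distinct))
    # No sample at all, or several different samples: no single sample applies.
    return None
```

For example, `sample_for_execution(["s1", "s1", None])` yields `"s1"`, while inputs drawn from two different samples yield `None`.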

  • Name
    name
    Type
    string
    Description

    The sample's human readable name.

  • Name
    private
    Type
    boolean
    Description

    If false, anybody will be able to view the sample, even users not signed in (providing they can access the instance of Flow). Note that Flow requires certain criteria to be met before a sample can be made public, to protect the quality of the global public dataset.

  • Name
    created
    Type
    int
    Description

    The timestamp for the creation of the sample.

  • Name
    owner
    Type
    ID
    Description

    The user who owns the sample.

  • Name
    creator
    Type
    ID
    Description

    The user who originally created the sample.

  • Name
    group_owner
    Type
    ID
    Description

    The group who owns the sample.
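As a rough sketch, the core attributes above could be held in a small Python dataclass. The class name `Sample`, the helper `viewable_without_login`, the use of strings for ID values, and the choice of which fields are optional are all assumptions for illustration; the source does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """Illustrative container for the core sample attributes listed above."""

    name: str                          # human readable name
    private: bool                      # False means anyone can view the sample
    created: int                       # creation timestamp
    # Assumption: IDs are modelled as strings and may be absent.
    creator: Optional[str] = None      # ID of the user who created the sample
    owner: Optional[str] = None        # ID of the user who owns the sample
    group_owner: Optional[str] = None  # ID of the group who owns the sample


def viewable_without_login(sample: Sample) -> bool:
    """A non-private sample can be viewed even by users who are not signed in."""
    return not sample.private
```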

Samples can be created by uploading demultiplexed reads files, by running a pipeline marked as a demultiplexing pipeline, or by running a pipeline marked as a sample importing pipeline. In the latter two cases, Flow has built-in logic for extracting the metadata from the files of the execution; in the first case the metadata is provided directly by the user.

Sample metadata

Most of the attributes of sample objects are 'metadata' - distinguishing features of the original biological sample. Most of these are text, but some of them are other objects, including two that only exist as sample metadata - the sample source (a cell or tissue type that the sample was derived from) and the sample purification target (the target protein for purification). Each of these has the following attributes:

  • Name
    name
    Type
    string
    Description

    The name of the source/target.

  • Name
    user
    Type
    ID
    Description

    Optionally, the source/target can be associated with a specific user. If so, they are 'unvalidated' and visible only to that user - essentially a user contribution. Otherwise they are public, Flow-wide terms.

  • Name
    created
    Type
    int
    Description

    The timestamp for the creation of the source/target.
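The visibility rule for sources and purification targets can be illustrated as follows. This is a minimal sketch under stated assumptions: `MetadataTerm`, `validated` and `visible_to` are hypothetical names, and user IDs are modelled as strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataTerm:
    """Illustrative stand-in for a sample source or purification target."""

    name: str
    created: int
    user: Optional[str] = None  # set only for user-contributed terms

    @property
    def validated(self) -> bool:
        # A term tied to a specific user is 'unvalidated'.
        return self.user is None


def visible_to(term: MetadataTerm, viewer_id: Optional[str]) -> bool:
    """Public terms are visible to everyone; others only to their user."""
    if term.user is None:
        return True
    return viewer_id == term.user
```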

The other sample metadata attributes are:

  • Name
    category
    Type
    string
    Description

    The sample type - RNA-Seq, scRNA-Seq, ChIP-Seq or CLIP. The value of this attribute can determine which metadata fields are mandatory.

  • Name
    scientist
    Type
    string
    Description

    The name of the researcher who prepared the original biological sample.

  • Name
    pi
    Type
    string
    Description

    The PI of the lab who prepared the original biological sample.

  • Name
    organisation
    Type
    string
    Description

    The organisation that prepared the original biological sample.

  • Name
    purification_agent
    Type
    string
    Description

    The antibody used in sample preparation.

  • Name
    experimental_method
    Type
    string
    Description

    This adds more specific detail to the sample category.

  • Name
    condition
    Type
    string
    Description

    The experimental condition of the sample.

  • Name
    sequencer
    Type
    string
    Description

    The sequencing equipment used to generate the data.

  • Name
    comments
    Type
    string
    Description

    Any additional comments.

  • Name
    five_prime_barcode_sequence
    Type
    string
    Description

    The 5' barcode sequence of the sample.

  • Name
    three_prime_barcode_sequence
    Type
    string
    Description

    The 3' barcode sequence of the sample.

  • Name
    three_prime_adapter_name
    Type
    string
    Description

    The 3' barcode adapter name of the sample.

  • Name
    three_prime_adapter_sequence
    Type
    string
    Description

    The 3' barcode adapter sequence of the sample.

  • Name
    rt_primer
    Type
    string
    Description

    The reverse transcription primer.

  • Name
    read1_primer
    Type
    string
    Description

    The read 1 primer sequence.

  • Name
    read2_primer
    Type
    string
    Description

    The read 2 primer sequence.

  • Name
    umi_barcode_sequence
    Type
    string
    Description

    The UMI barcode sequence.

  • Name
    umi_separator
    Type
    string
    Description

    The UMI separator string in the reads file.

  • Name
    strandedness
    Type
    string
    Description

    Only needed for some sample categories - must be "unstranded", "forward", "reverse" or "auto".

  • Name
    rna_selection_method
    Type
    string
    Description

    Only needed for some sample categories - must be "polya", "ribominus", or "targeted".

  • Name
    source_text
    Type
    string
    Description

    Any qualifying text to go with the sample source.

  • Name
    purification_target_text
    Type
    string
    Description

    Any qualifying text to go with the sample purification target.

  • Name
    geo
    Type
    string
    Description

    The GEO accession of the sample.

  • Name
    ena
    Type
    string
    Description

    The ENA accession of the sample.

  • Name
    pubmed
    Type
    string
    Description

    The Pubmed ID associated with the sample.

  • Name
    organism
    Type
    ID
    Description

    The organism the sample is associated with.
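Two of the fields above, `strandedness` and `rna_selection_method`, only accept a fixed set of values. A client could check these before submitting metadata, along the lines of the sketch below. The function name `invalid_fields` and the plain-dictionary representation of metadata are assumptions for the example; treating a missing or empty value as acceptable is also an assumption, since the fields are only needed for some sample categories.

```python
from typing import Dict, List, Optional

# Allowed values as stated in the attribute descriptions above.
ALLOWED_VALUES = {
    "strandedness": {"unstranded", "forward", "reverse", "auto"},
    "rna_selection_method": {"polya", "ribominus", "targeted"},
}


def invalid_fields(metadata: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of constrained fields whose value is not allowed."""
    bad = []
    for field, allowed in ALLOWED_VALUES.items():
        value = metadata.get(field)
        # Assumption: a missing or empty value is fine here.
        if value and value not in allowed:
            bad.append(field)
    return sorted(bad)
```

With this sketch, `invalid_fields({"strandedness": "sideways", "rna_selection_method": "polya"})` reports only `strandedness`.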

Projects

Samples are organised into projects. What a project represents for a given group or organisation varies, but typically they represent a single research question. A project has a one-to-many relationship with samples.

Just as an execution is assigned to a sample if all its input data belongs to a single sample, it can also be assigned to a project if all the input data belongs to a single project. The executions of a project are therefore all of its samples' executions, and its directly contained executions.
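The set of executions belonging to a project, as described above, is the union of two groups. A minimal sketch, assuming executions and samples are identified by string IDs and using the hypothetical function name `project_executions`:

```python
from typing import Dict, Iterable, Set


def project_executions(
    direct_executions: Iterable[str],
    executions_by_sample: Dict[str, Iterable[str]],
) -> Set[str]:
    """Combine a project's own executions with those of its samples.

    direct_executions: IDs of executions assigned directly to the project.
    executions_by_sample: maps each sample ID in the project to the IDs
    of the executions that are part of that sample.
    """
    combined = set(direct_executions)
    for execution_ids in executions_by_sample.values():
        combined.update(execution_ids)
    return combined
```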

  • Name
    name
    Type
    string
    Description

    The project's name.

  • Name
    description
    Type
    string
    Description

    The project's description - this should explain what the research aim is/was, and any other contextual information.

  • Name
    private
    Type
    boolean
    Description

    If false, anybody will be able to view the project, even users not signed in (providing they can access the instance of Flow). All of its samples must be eligible to be public for a project to be made public.

  • Name
    created
    Type
    int
    Description

    The timestamp for the creation of the project.

  • Name
    owner
    Type
    ID
    Description

    The user who owns the project.

  • Name
    creator
    Type
    ID
    Description

    The user who originally created the project.

  • Name
    group_owner
    Type
    ID
    Description

    The group who owns the project.

Papers

Projects can have zero or more papers associated with them. These are not set directly, but are determined from the Pubmed IDs of the associated samples. Their attributes are:

  • Name
    id
    Type
    string
    Description

    The Pubmed ID.

  • Name
    title
    Type
    string
    Description

    The full title of the paper.

  • Name
    year
    Type
    int
    Description

    The year of publication.

  • Name
    journal
    Type
    string
    Description

    The name of the journal the paper was published in.
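Because papers are derived from the Pubmed IDs of a project's samples rather than set directly, the list of paper IDs for a project can be thought of as the distinct, non-empty `pubmed` values of its samples. The sketch below illustrates that idea only; the function name `project_pubmed_ids` is hypothetical, and how Flow actually resolves each ID to a title, year and journal is not covered here.

```python
from typing import Iterable, List, Optional


def project_pubmed_ids(sample_pubmed_ids: Iterable[Optional[str]]) -> List[str]:
    """Return the distinct Pubmed IDs across samples, in first-seen order."""
    seen = set()
    ordered = []
    for pubmed_id in sample_pubmed_ids:
        # Skip samples that have no Pubmed ID recorded.
        if not pubmed_id:
            continue
        if pubmed_id not in seen:
            seen.add(pubmed_id)
            ordered.append(pubmed_id)
    return ordered
```

A project whose samples carry no Pubmed IDs would therefore have zero papers.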
