Organisms and Genomes

One of the most important ways of organising data in Flow is by the biological species it is associated with and the genome version it is aligned to.

Organisms

An organism is a specific biological species that a Flow instance has data for - different instances will have different organisms defined. Its key properties are:

  • Name
    id
    Type
    string
    Description

    A two-character identifier for the organism, such as "Hs". All Flow objects have IDs, but organisms are unique in having a meaningful string ID rather than a random integer.

  • Name
    name
    Type
    string
    Description

    The short, everyday name of the organism, such as "Human" or "Mouse".

  • Name
    latin_name
    Type
    string
    Description

    The full Latin name of the organism, such as "Homo sapiens" or "Mus musculus".

Organisms are Flow-wide, and public - they are created by admins and are available to every user.
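The three organism properties above can be pictured as a small record type. The following is a minimal sketch only, not Flow's actual implementation: the Organism class, the length check, the organisms_by_id lookup, and the "Mm" identifier for Mouse are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Organism:
    """Hypothetical stand-in for a Flow organism record."""
    id: str          # meaningful two-character string ID, e.g. "Hs"
    name: str        # short, everyday name
    latin_name: str  # full Latin name

    def __post_init__(self):
        # The docs describe the ID as a two-character identifier.
        if len(self.id) != 2:
            raise ValueError(f"organism id must be two characters: {self.id!r}")


# "Mm" is an assumed ID for Mouse; only "Hs" appears in the docs.
human = Organism(id="Hs", name="Human", latin_name="Homo sapiens")
mouse = Organism(id="Mm", name="Mouse", latin_name="Mus musculus")

# Because the ID is a meaningful string, it works as a readable lookup key.
organisms_by_id = {o.id: o for o in (human, mouse)}
print(organisms_by_id["Hs"].latin_name)
```

Keying the lookup on the string ID reflects the point made above: unlike other Flow objects, an organism can be referred to by an identifier a person can read.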

Genomes

In Flow, a genome refers to a specific assembly of an organism's genome released by some authoritative body. Every Flow genome must have a FASTA file and a GTF file, each of which is a one-to-one relationship with a data object. A genome can also have any number of other files, which are linked to it via a genome field on the data model, creating a one-to-many relationship between genome and data. Its key properties are:

  • Name
    name
    Type
    string
    Description

    The name of the genome release.

  • Name
    long_name
    Type
    string
    Description

    If the genome has a longer, more formal name, that can be represented with this field.

  • Name
    created
    Type
    int
    Description

    The timestamp for when the genome was released.

  • Name
    url
    Type
    string
    Description

    A URL to the release's official page, if any.

  • Name
    fasta
    Type
    ID
    Description

    The data object for the genome's FASTA file.

  • Name
    gtf
    Type
    ID
    Description

    The data object for the genome's GTF file.

  • Name
    organism
    Type
    ID
    Description

    The organism the genome is for.
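As a rough illustration of how these fields fit together, the sketch below models a genome as a record that points at one organism, one FASTA data object, and one GTF data object. It is not Flow's real data model: the Genome class, the genomes_by_organism helper, the integer data IDs, the "Mm" organism ID, and the created timestamps are placeholder assumptions. The assembly names are used only as familiar examples.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Genome:
    """Hypothetical stand-in for a Flow genome record."""
    name: str                        # name of the genome release
    organism: str                    # ID of the organism the genome is for
    fasta: int                       # ID of the FASTA data object (one-to-one)
    gtf: int                         # ID of the GTF data object (one-to-one)
    created: int                     # release timestamp (placeholder values below)
    long_name: Optional[str] = None  # only set if a more formal name exists
    url: Optional[str] = None        # only set if the release has an official page


def genomes_by_organism(genomes: List[Genome]) -> Dict[str, List[str]]:
    """Group genome names under the organism ID they belong to."""
    grouped = defaultdict(list)
    for genome in genomes:
        grouped[genome.organism].append(genome.name)
    return dict(grouped)


# All IDs and timestamps here are made up for the example.
genomes = [
    Genome(name="GRCh38", organism="Hs", fasta=101, gtf=102, created=1),
    Genome(name="GRCh37", organism="Hs", fasta=103, gtf=104, created=2),
    Genome(name="GRCm39", organism="Mm", fasta=105, gtf=106, created=3),
]

print(genomes_by_organism(genomes))
```

The grouping helper shows the shape of the relationship: one organism can have several genome releases, while each genome names exactly one organism.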

Some pipelines are tagged as genome preparation pipelines - they take a genome's data and generate useful indexes from it. As we have seen, these are pipelines where prepares_genome is set to True. Executions of these pipelines have a genome attribute that indicates the genome whose files they used, creating a one-to-many relationship between genomes and executions. Other pipelines use the files produced by these genome preparation executions, and their executions are likewise tagged with the original genome.
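The relationship between genomes, pipelines, and executions can be sketched as follows. This is an illustrative model under stated assumptions rather than Flow's API: the Pipeline and Execution classes, the preparation_executions helper, the pipeline names, and the use of a genome name (instead of a genome ID) on the execution are all hypothetical. Only the prepares_genome flag and the genome attribute come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pipeline:
    """Hypothetical pipeline record."""
    name: str
    prepares_genome: bool = False  # True for genome preparation pipelines


@dataclass
class Execution:
    """Hypothetical execution record."""
    id: int
    pipeline: Pipeline
    genome: Optional[str] = None  # genome whose files were used, if any


def preparation_executions(executions: List[Execution], genome: str) -> List[Execution]:
    """Return the genome preparation executions tagged with the given genome."""
    return [
        e for e in executions
        if e.genome == genome and e.pipeline.prepares_genome
    ]


# Pipeline names and execution IDs are invented for the example.
build_index = Pipeline(name="Build Index", prepares_genome=True)
align_reads = Pipeline(name="Align Reads")

executions = [
    Execution(id=1, pipeline=build_index, genome="GRCh38"),  # prepares the genome
    Execution(id=2, pipeline=align_reads, genome="GRCh38"),  # uses the prepared files
    Execution(id=3, pipeline=build_index, genome="GRCm39"),
    Execution(id=4, pipeline=align_reads),                   # no genome involved
]

print([e.id for e in preparation_executions(executions, "GRCh38")])
```

Both execution 1 and execution 2 carry the same genome tag, which mirrors the text: the downstream execution inherits the genome of the preparation execution whose files it used, and only the prepares_genome flag on the pipeline tells the two apart.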
