Organisms
One of the most important ways of organising data in Flow is by the biological species it is associated with, and by the genome version it is aligned to.
Organisms
An organism is a specific biological species that a Flow instance has data for; different instances will have different organisms defined. Its key properties are:
- id (string): A two-character identifier for the organism, such as "Hs". All Flow objects have IDs, but organisms are unique in having a meaningful string ID, rather than a random integer.
- name (string): The short, everyday name of the organism, such as "Human" or "Mouse".
- latin_name (string): The full Latin name of the organism, such as "Homo sapiens" or "Mus musculus".
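The three properties above can be pictured as a small record type. This is a minimal sketch, not Flow's real model: the Organism class, its validation, and the mouse ID "Mm" are all assumptions made for illustration. It shows why a meaningful two-character string ID is convenient, since it can be used directly as a lookup key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Organism:
    """Hypothetical in-memory model of a Flow organism (not Flow's actual class)."""
    id: str          # meaningful two-character string ID, e.g. "Hs"
    name: str        # short everyday name, e.g. "Human"
    latin_name: str  # full Latin name, e.g. "Homo sapiens"

    def __post_init__(self):
        # Enforce the two-character ID convention described in the text.
        if len(self.id) != 2:
            raise ValueError(f"organism id must be two characters, got {self.id!r}")


human = Organism(id="Hs", name="Human", latin_name="Homo sapiens")
# "Mm" is an assumed ID for mouse, used only for this example.
mouse = Organism(id="Mm", name="Mouse", latin_name="Mus musculus")

# Because the IDs are meaningful strings, they work naturally as dictionary keys.
organisms = {o.id: o for o in (human, mouse)}
print(organisms["Mm"].latin_name)  # Mus musculus
```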
Organisms are Flow-wide and public: they are created by admins and are available to every user.
Genomes
New releases of an organism's genome are published periodically. In Flow, each release is represented as a fileset, and the fileset's organism attribute associates it with an organism.
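As a rough illustration of the fileset-to-organism association, the sketch below models genome filesets as plain records and filters them by their organism attribute. The Fileset class, the genomes_for helper, and the example data are hypothetical; they are not part of Flow's API, and in Flow itself this lookup would be done by the server.

```python
from dataclasses import dataclass


@dataclass
class Fileset:
    """Hypothetical stand-in for a Flow fileset holding one genome release."""
    id: int
    name: str
    organism: str  # ID of the associated organism, e.g. "Hs"


# Example data only: two human genome releases and one mouse release.
genomes = [
    Fileset(id=1, name="GRCh37", organism="Hs"),
    Fileset(id=2, name="GRCh38", organism="Hs"),
    Fileset(id=3, name="GRCm39", organism="Mm"),
]


def genomes_for(organism_id, filesets):
    """Return the genome filesets whose organism attribute matches organism_id."""
    return [f for f in filesets if f.organism == organism_id]


print([f.name for f in genomes_for("Hs", genomes)])  # ['GRCh37', 'GRCh38']
```

Each organism can therefore have several genome filesets, one per release.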
Some pipelines are tagged by their schema files as genome preparation pipelines: they take the data of a genome and generate useful indexes from it.
Executions of these pipelines have a fileset attribute indicating the genome fileset whose files they used, creating a one-to-many relationship between filesets and executions.
Similarly, some pipelines use the files produced by these genome preparation executions, and their executions are also tagged with the original genome fileset.
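The one-to-many relationship between genome filesets and executions can be sketched as a simple grouping. This is a minimal illustration under stated assumptions: the Execution class, the pipeline names, and the executions_by_fileset helper are hypothetical, and only the idea of a fileset attribute on executions comes from the text above. Note that the downstream run keeps the original genome fileset tag rather than pointing at the preparation execution.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Execution:
    """Hypothetical stand-in for a Flow execution (not Flow's actual model)."""
    id: int
    pipeline: str            # hypothetical pipeline name
    fileset: Optional[int]   # ID of the genome fileset this execution is tagged with


executions = [
    # Genome preparation runs that build indexes from filesets 2 and 3.
    Execution(id=10, pipeline="prepare-genome", fileset=2),
    Execution(id=11, pipeline="prepare-genome", fileset=3),
    # A downstream run using the indexes from execution 10 keeps the original fileset tag.
    Execution(id=12, pipeline="align-reads", fileset=2),
    # An execution not associated with any genome fileset.
    Execution(id=13, pipeline="demultiplex", fileset=None),
]


def executions_by_fileset(execs):
    """Group execution IDs by genome fileset: one fileset maps to many executions."""
    groups = defaultdict(list)
    for e in execs:
        if e.fileset is not None:
            groups[e.fileset].append(e.id)
    return dict(groups)


print(executions_by_fileset(executions))  # {2: [10, 12], 3: [11]}
```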