Uploading Data with flowbio

flowbio allows you to upload data to a Flow instance - including specialised uploads such as demultiplexed sample files and multiplexed data.

Uploading Standard Data

A file can be uploaded using the client's upload_data method:

Upload standard data

data = client.upload_data("/path/to/file.fa")

For large data, you may wish to display a progress bar:

Upload standard data with a progress bar

data = client.upload_data("/path/to/file.fa", progress=True)

If you are experiencing network issues, you can instruct flowbio to retry any failed chunk upload - in this case up to a maximum of five times:

Upload standard data with retries

data = client.upload_data("/path/to/file.fa", retries=5)

The full arguments list:

  • path (string): The local path to the file to be uploaded.

  • chunk_size (int): Files are uploaded in chunks - this sets the size of those chunks in bytes (default 1,000,000).

  • progress (bool): Whether or not to display a progress bar (default False).

  • retries (int): How many times to re-attempt to upload a chunk before giving up (default 0).

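The arguments can also be combined in a single call. The sketch below is illustrative - the path, chunk size, and retry count are placeholders rather than recommendations:

Upload standard data with all options

data = client.upload_data(
    "/path/to/file.fa",
    chunk_size=5_000_000,  # upload in 5 MB chunks instead of the default 1 MB
    progress=True,         # display a progress bar
    retries=5,             # retry each failed chunk up to five times
)
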
Sample Upload

To upload the initial data for a sample, use the upload_sample method. Here you provide the name of the sample, one or two file paths (depending on whether the sample is single-end or paired-end), and a dictionary of sample metadata:

  • name (string): The name of the sample being created.

  • path1 (string): The local path to the initiating data to be uploaded.

  • path2 (string): If paired-end, the local path to the second initiating file (default None).

  • chunk_size (int): Files are uploaded in chunks - this sets the size of those chunks in bytes (default 1,000,000).

  • progress (bool): Whether or not to display a progress bar (default False).

  • retries (int): How many times to re-attempt to upload a chunk before giving up (default 0).

  • metadata (dict): Additional attributes for the sample.

Upload a sample

sample = client.upload_sample(
    "My Sample Name",
    "/path/to/reads1.fastq.gz",
    "/path/to/reads2.fastq.gz", # optional
    progress=True,
    retries=5,
    metadata={
        "category": "RNA-Seq",
        "strandedness": "unstranded",
    }
)

Metadata

The metadata is given as a Python dictionary of values. This is the full list of permitted attributes:

  • category (string): The sample's category. This can determine whether other fields are required.

  • organism (string): The ID (Hs, Mm etc.) of the organism the sample belongs to.

  • source (string): The name of the sample's cell type.

  • sourceText (string): Any additional text for the cell type.

  • purificationTarget (string): The name of the sample's purification target.

  • purificationTargetText (string): Any additional text for the purification target.

  • project (ID): The ID of the project to add the sample to.

  • scientist (string): The name of the person who performed the original experiment.

  • pi (string): The name of the PI for the original experiment.

  • organisation (string): The name of the organisation the sample was generated at.

  • purificationAgent (string): The purification agent used.

  • experimentalMethod (string): This adds more specific detail to the sample category.

  • condition (string): The experimental condition of the sample.

  • sequencer (string): The sequencing equipment used to generate the data.

  • comments (string): Any additional comments.

  • fivePrimeBarcodeSequence (string): The 5' barcode sequence of the sample.

  • threePrimeBarcodeSequence (string): The 3' barcode sequence of the sample.

  • threePrimeAdapterName (string): The 3' adapter name of the sample.

  • threePrimeAdapterSequence (string): The 3' adapter sequence of the sample.

  • rtPrimer (string): The reverse transcription primer.

  • read1Primer (string): The read 1 primer sequence.

  • read2Primer (string): The read 2 primer sequence.

  • umiBarcodeSequence (string): The UMI barcode sequence.

  • umiSeparator (string): The UMI separator string in the reads file.

  • strandedness (string): Only needed for some sample categories - must be "unstranded", "forward", "reverse" or "auto".

  • rnaSelectionMethod (string): Only needed for some sample categories - must be "polya", "ribominus", or "targeted".

  • geo (string): The GEO accession of the sample.

  • ena (string): The ENA accession of the sample.

  • pubmed (string): The PubMed ID associated with the sample.

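Putting the attributes together, a more fully annotated single-end upload might look like the sketch below. The attribute names come from the list above; every value is a placeholder rather than real experimental detail.

Upload a sample with richer metadata

sample = client.upload_sample(
    "My Sample Name",
    "/path/to/reads1.fastq.gz",
    progress=True,
    metadata={
        "category": "RNA-Seq",
        "organism": "Hs",                 # organism ID, e.g. Hs or Mm
        "scientist": "J. Smith",          # placeholder name
        "condition": "control",
        "sequencer": "Illumina NovaSeq",  # placeholder instrument
        "strandedness": "reverse",
        "rnaSelectionMethod": "polya",
        "comments": "Uploaded via flowbio",
    }
)
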
Multiplexed Data Upload

Multiplexed data also has its own specialised method, upload_multiplexed:

Upload a multiplexed FASTQ file

multiplexed = client.upload_multiplexed(
    "/path/to/reads.fastq.gz",
    progress=True,
    retries=5,
)

  • path (string): The local path to the multiplexed FASTQ file to be uploaded.

  • chunk_size (int): Files are uploaded in chunks - this sets the size of those chunks in bytes (default 1,000,000).

  • progress (bool): Whether or not to display a progress bar (default False).

  • retries (int): How many times to re-attempt to upload a chunk before giving up (default 0).

The associated annotation sheet is uploaded with upload_annotation:

Upload an annotation sheet

annotation = client.upload_annotation(
    "/path/to/annotation.csv",
    progress=True,
    retries=5,
)

  • path (string): The local path to the annotation sheet to be uploaded.

  • chunk_size (int): Files are uploaded in chunks - this sets the size of those chunks in bytes (default 1,000,000).

  • progress (bool): Whether or not to display a progress bar (default False).

  • retries (int): How many times to re-attempt to upload a chunk before giving up (default 0).

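Taken together, a typical multiplexed workflow uploads the reads file and its annotation sheet with the two calls above - a minimal sketch, with placeholder paths:

Upload multiplexed data and its annotation sheet

multiplexed = client.upload_multiplexed(
    "/path/to/reads.fastq.gz",
    progress=True,
)
annotation = client.upload_annotation(
    "/path/to/annotation.csv",
    progress=True,
)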