# Uploading Data with flowbio

flowbio allows you to upload data to a Flow instance, including specialised uploads such as those for demultiplexed sample files or multiplexed data.
## Uploading Standard Data

A file can be uploaded using the client's `upload_data` method:

```python
# Upload standard data
data = client.upload_data("/path/to/file.fa")
```
For large files, you may wish to display a progress bar:

```python
# Upload standard data with a progress bar
data = client.upload_data("/path/to/file.fa", progress=True)
```
If you are experiencing network issues, you can instruct flowbio to retry any failed chunk upload, in this case up to a maximum of five times:

```python
# Upload standard data with retries
data = client.upload_data("/path/to/file.fa", retries=5)
```
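Conceptually, per-chunk retrying amounts to a loop like the following simplified sketch. This is an illustration only, not flowbio's actual implementation, and the function name is our own:

```python
def upload_chunk_with_retries(send, chunk, retries=5):
    """Try send(chunk) up to 1 + retries times, re-raising the last error.

    Illustrative sketch of per-chunk retry behaviour; `send` stands in
    for whatever actually transmits one chunk to the server.
    """
    for attempt in range(retries + 1):
        try:
            return send(chunk)
        except OSError:
            # Give up only once all retry attempts are exhausted.
            if attempt == retries:
                raise
```

Because each chunk is retried independently, a transient network blip only costs re-sending one chunk rather than the whole file.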
The full list of arguments:

- `path` (string): The local path to the file to be uploaded.
- `chunk_size` (int): Files are uploaded in chunks; this sets the size of those chunks in bytes (default 1,000,000). Lowering this improves the reliability of the upload; increasing it reduces the overall time taken.
- `progress` (bool): Whether or not to display a progress bar (default `False`).
- `retries` (int): How many times to re-attempt uploading a chunk before giving up (default 0).
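To make the `chunk_size` trade-off concrete, here is a small illustrative calculation of how many chunked requests an upload involves (the helper is our own, not part of flowbio):

```python
import math

def chunk_count(file_size: int, chunk_size: int = 1_000_000) -> int:
    """Number of chunked requests needed to upload file_size bytes."""
    return math.ceil(file_size / chunk_size)

# A 3 GB file at the default chunk size needs 3,000 requests; halving
# chunk_size doubles the request count, but each failed chunk that has
# to be retried re-sends half as much data.
print(chunk_count(3_000_000_000))            # 3000
print(chunk_count(3_000_000_000, 500_000))   # 6000
```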
## Sample Upload

To upload the initial data for a sample, use the `upload_sample` method. Here you provide the name of the sample, at least one file path (one for single-end samples, two for paired-end), and a dictionary of sample metadata:
- `name` (string): The name of the sample being created.
- `path1` (string): The local path to the initiating data to be uploaded.
- `path2` (string): If paired-end, the local path to the second initiating file (default `None`).
- `chunk_size` (int): Files are uploaded in chunks; this sets the size of those chunks in bytes (default 1,000,000). Lowering this improves the reliability of the upload; increasing it reduces the overall time taken.
- `progress` (bool): Whether or not to display a progress bar (default `False`).
- `retries` (int): How many times to re-attempt uploading a chunk before giving up (default 0).
- `metadata` (dict): Additional attributes for the sample.
```python
# Upload a sample
import json

sample = client.upload_sample(
    "My Sample Name",
    "/path/to/reads1.fastq.gz",
    "/path/to/reads2.fastq.gz",  # optional
    progress=True,
    retries=5,
    metadata={
        "sample_type": "RNA-Seq",
        "organism": "Hs",
        "type_specific_metadata": json.dumps({
            "strandedness": "unstranded",
        }),
    },
)
```
### Metadata

The metadata is given as a Python dictionary of values. These are the full permitted attributes:
- `category` (string): The sample's category. This can determine whether other fields are required.
- `organism` (string): The ID (`Hs`, `Mm`, etc.) of the organism the sample belongs to.
- `source` (string): The name of the sample's cell type.
- `sourceText` (string): Any additional text for the cell type.
- `purificationTarget` (string): The name of the sample's purification target.
- `purificationTargetText` (string): Any additional text for the purification target.
- `project` (ID): The ID of the project to add the sample to.
- `scientist` (string): The name of the person who performed the original experiment.
- `pi` (string): The name of the PI for the original experiment.
- `organisation` (string): The name of the organisation the sample was generated at.
- `purificationAgent` (string): The purification agent used.
- `experimentalMethod` (string): Adds more specific detail to the sample category.
- `condition` (string): The experimental condition of the sample.
- `sequencer` (string): The sequencing equipment used to generate the data.
- `comments` (string): Any additional comments.
- `fivePrimeBarcodeSequence` (string): The 5' barcode sequence of the sample.
- `threePrimeBarcodeSequence` (string): The 3' barcode sequence of the sample.
- `threePrimeAdapterName` (string): The 3' adapter name of the sample.
- `threePrimeAdapterSequence` (string): The 3' adapter sequence of the sample.
- `rtPrimer` (string): The reverse transcription primer.
- `read1Primer` (string): The read 1 primer sequence.
- `read2Primer` (string): The read 2 primer sequence.
- `umiBarcodeSequence` (string): The UMI barcode sequence.
- `umiSeparator` (string): The UMI separator string in the reads file.
- `strandedness` (string): Only needed for some sample categories; must be `"unstranded"`, `"forward"`, `"reverse"`, or `"auto"`.
- `rnaSelectionMethod` (string): Only needed for some sample categories; must be `"polya"`, `"ribominus"`, or `"targeted"`.
- `geo` (string): The GEO accession of the sample.
- `ena` (string): The ENA accession of the sample.
- `pubmed` (string): The PubMed ID associated with the sample.
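Fields such as `strandedness` and `rnaSelectionMethod` only accept a fixed set of values, so it can be worth checking a metadata dictionary client-side before calling `upload_sample`. A minimal sketch; the helper and its name are our own, not part of flowbio:

```python
# Hypothetical pre-flight check; the field names and allowed values
# come from the metadata table above.
ALLOWED = {
    "strandedness": {"unstranded", "forward", "reverse", "auto"},
    "rnaSelectionMethod": {"polya", "ribominus", "targeted"},
}

def validate_metadata(metadata: dict) -> list:
    """Return a list of human-readable problems (empty if the dict looks fine)."""
    problems = []
    for field, allowed in ALLOWED.items():
        value = metadata.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field}={value!r} not in {sorted(allowed)}")
    return problems

print(validate_metadata({"strandedness": "reverse"}))    # []
print(validate_metadata({"strandedness": "backwards"}))  # one problem reported
```

Failing fast like this avoids waiting for a large upload to finish only to have the server reject the metadata.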
## Multiplexed Data Upload

Multiplexed data also has its own specialised method, `upload_multiplexed`:

```python
# Upload a multiplexed FASTQ file
multiplexed = client.upload_multiplexed(
    "/path/to/reads.fastq.gz",
    progress=True,
    retries=5,
)
```
- `path` (string): The local path to the multiplexed FASTQ file to be uploaded.
- `chunk_size` (int): Files are uploaded in chunks; this sets the size of those chunks in bytes (default 1,000,000). Lowering this improves the reliability of the upload; increasing it reduces the overall time taken.
- `progress` (bool): Whether or not to display a progress bar (default `False`).
- `retries` (int): How many times to re-attempt uploading a chunk before giving up (default 0).
The associated annotation sheet is uploaded with `upload_annotation`:

```python
# Upload an annotation sheet
annotation = client.upload_annotation(
    "/path/to/annotation.csv",
    progress=True,
    retries=5,
)
```
- `path` (string): The local path to the annotation sheet to be uploaded.
- `chunk_size` (int): Files are uploaded in chunks; this sets the size of those chunks in bytes (default 1,000,000). Lowering this improves the reliability of the upload; increasing it reduces the overall time taken.
- `progress` (bool): Whether or not to display a progress bar (default `False`).
- `retries` (int): How many times to re-attempt uploading a chunk before giving up (default 0).
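Since a multiplexed run generally needs both uploads, the two calls can be grouped into one small wrapper. This is an illustrative convenience function of our own, not part of the flowbio API:

```python
def upload_run(client, fastq_path, annotation_path, retries=5, progress=True):
    """Upload a multiplexed FASTQ and its annotation sheet with the same settings.

    Illustrative wrapper around the two documented flowbio methods, so that
    retry and progress settings stay consistent across both uploads.
    """
    multiplexed = client.upload_multiplexed(
        fastq_path, progress=progress, retries=retries
    )
    annotation = client.upload_annotation(
        annotation_path, progress=progress, retries=retries
    )
    return multiplexed, annotation
```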