# Uploading Data with flowbio

flowbio allows you to upload data to a Flow instance, including specialised uploads such as those for demultiplexed sample files or multiplexed data.
## Uploading Standard Data

A file can be uploaded using the client's `upload_data` method:

```python
# Upload standard data
data = client.upload_data("/path/to/file.fa")
```
For large files, you may wish to display a progress bar:

```python
# Upload standard data with a progress bar
data = client.upload_data("/path/to/file.fa", progress=True)
```
If you are experiencing network issues, you can instruct flowbio to retry any failed chunk upload, in this case up to a maximum of five times:

```python
# Upload standard data with retries
data = client.upload_data("/path/to/file.fa", retries=5)
```
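Conceptually, per-chunk retrying amounts to a loop like the following simplified sketch. This is an illustration only, not flowbio's actual implementation, and the function name is our own:

```python
def upload_chunk_with_retries(send, chunk, retries=5):
    """Try send(chunk) up to 1 + retries times, re-raising the last error.

    Illustrative sketch of per-chunk retry behaviour; `send` stands in
    for whatever actually transmits one chunk to the server.
    """
    for attempt in range(retries + 1):
        try:
            return send(chunk)
        except OSError:
            # Give up only once all retry attempts are exhausted.
            if attempt == retries:
                raise
```

Because each chunk is retried independently, a transient network blip only costs re-sending one chunk rather than the whole file.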
The full list of arguments:

- `path` (string): The local path to the file to be uploaded.
- `chunk_size` (int): Files are uploaded in chunks; this sets the size of those chunks in bytes (default 1,000,000). Lowering this improves the reliability of the upload; increasing it reduces the overall time taken.
- `progress` (bool): Whether or not to display a progress bar (default `False`).
- `retries` (int): How many times to re-attempt uploading a chunk before giving up (default 0).
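To make the `chunk_size` trade-off concrete, here is a small illustrative calculation of how many chunked requests an upload involves (the helper is our own, not part of flowbio):

```python
import math

def chunk_count(file_size: int, chunk_size: int = 1_000_000) -> int:
    """Number of chunked requests needed to upload file_size bytes."""
    return math.ceil(file_size / chunk_size)

# A 3 GB file at the default chunk size needs 3,000 requests; halving
# chunk_size doubles the request count, but each failed chunk that has
# to be retried re-sends half as much data.
print(chunk_count(3_000_000_000))            # 3000
print(chunk_count(3_000_000_000, 500_000))   # 6000
```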
## Sample Upload

To upload the initial data for a sample, use the `upload_sample` method. Here you provide the name of the sample, at least one file path (one for single-end samples, two for paired-end), and a dictionary of sample metadata:
- `name` (string): The name of the sample being created.
- `path1` (string): The local path to the initiating data to be uploaded.
- `path2` (string): If paired-end, the local path to the second initiating file (default `None`).
- `chunk_size` (int): Files are uploaded in chunks; this sets the size of those chunks in bytes (default 1,000,000). Lowering this improves the reliability of the upload; increasing it reduces the overall time taken.
- `progress` (bool): Whether or not to display a progress bar (default `False`).
- `retries` (int): How many times to re-attempt uploading a chunk before giving up (default 0).
- `metadata` (dict): Additional attributes for the sample.
```python
# Upload a sample
import json

sample = client.upload_sample(
    "My Sample Name",
    "/path/to/reads1.fastq.gz",
    "/path/to/reads2.fastq.gz",  # optional
    progress=True,
    retries=5,
    metadata={
        "sample_type": "RNA-Seq",
        "organism": "Hs",
        "type_specific_metadata": json.dumps({
            "strandedness": "unstranded",
        }),
    },
)
```
### Metadata

The metadata is given as a Python dictionary of values. These are the full permitted attributes:
- `category` (string): The sample's category. This can determine whether other fields are required.
- `organism` (string): The ID (`Hs`, `Mm`, etc.) of the organism the sample belongs to.
- `source` (string): The name of the sample's cell type.
- `sourceText` (string): Any additional text for the cell type.
- `purificationTarget` (string): The name of the sample's purification target.
- `purificationTargetText` (string): Any additional text for the purification target.
- `project` (ID): The ID of the project to add the sample to.
- `scientist` (string): The name of the person who performed the original experiment.
- `pi` (string): The name of the PI for the original experiment.
- `organisation` (string): The name of the organisation the sample was generated at.
- `purificationAgent` (string): The purification agent used.
- `experimentalMethod` (string): Adds more specific detail to the sample category.
- `condition` (string): The experimental condition of the sample.
- `sequencer` (string): The sequencing equipment used to generate the data.
- `comments` (string): Any additional comments.
- `fivePrimeBarcodeSequence` (string): The 5' barcode sequence of the sample.
- `threePrimeBarcodeSequence` (string): The 3' barcode sequence of the sample.
- `threePrimeAdapterName` (string): The 3' adapter name of the sample.
- `threePrimeAdapterSequence` (string): The 3' adapter sequence of the sample.
- `rtPrimer` (string): The reverse transcription primer.
- `read1Primer` (string): The read 1 primer sequence.
- `read2Primer` (string): The read 2 primer sequence.
- `umiBarcodeSequence` (string): The UMI barcode sequence.
- `umiSeparator` (string): The UMI separator string in the reads file.
- `strandedness` (string): Only needed for some sample categories; must be `"unstranded"`, `"forward"`, `"reverse"`, or `"auto"`.
- `rnaSelectionMethod` (string): Only needed for some sample categories; must be `"polya"`, `"ribominus"`, or `"targeted"`.
- `geo` (string): The GEO accession of the sample.
- `ena` (string): The ENA accession of the sample.
- `pubmed` (string): The PubMed ID associated with the sample.
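Fields such as `strandedness` and `rnaSelectionMethod` only accept a fixed set of values, so it can be worth checking a metadata dictionary client-side before calling `upload_sample`. A minimal sketch; the helper and its name are our own, not part of flowbio:

```python
# Hypothetical pre-flight check; the field names and allowed values
# come from the metadata table above.
ALLOWED = {
    "strandedness": {"unstranded", "forward", "reverse", "auto"},
    "rnaSelectionMethod": {"polya", "ribominus", "targeted"},
}

def validate_metadata(metadata: dict) -> list:
    """Return a list of human-readable problems (empty if the dict looks fine)."""
    problems = []
    for field, allowed in ALLOWED.items():
        value = metadata.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field}={value!r} not in {sorted(allowed)}")
    return problems

print(validate_metadata({"strandedness": "reverse"}))    # []
print(validate_metadata({"strandedness": "backwards"}))  # one problem reported
```

Failing fast like this avoids waiting for a large upload to finish only to have the server reject the metadata.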
## Multiplexed Data Upload

Multiplexed data also has its own specialised method, `upload_multiplexed`:

```python
# Upload a multiplexed FASTQ file
multiplexed = client.upload_multiplexed(
    "/path/to/reads.fastq.gz",
    progress=True,
    retries=5,
)
```
- `path` (string): The local path to the multiplexed FASTQ file to be uploaded.
- `chunk_size` (int): Files are uploaded in chunks; this sets the size of those chunks in bytes (default 1,000,000). Lowering this improves the reliability of the upload; increasing it reduces the overall time taken.
- `progress` (bool): Whether or not to display a progress bar (default `False`).
- `retries` (int): How many times to re-attempt uploading a chunk before giving up (default 0).
The associated annotation sheet is uploaded with `upload_annotation`:

```python
# Upload an annotation sheet
annotation = client.upload_annotation(
    "/path/to/annotation.csv",
    progress=True,
    retries=5,
)
```
- `path` (string): The local path to the annotation sheet to be uploaded.
- `chunk_size` (int): Files are uploaded in chunks; this sets the size of those chunks in bytes (default 1,000,000). Lowering this improves the reliability of the upload; increasing it reduces the overall time taken.
- `progress` (bool): Whether or not to display a progress bar (default `False`).
- `retries` (int): How many times to re-attempt uploading a chunk before giving up (default 0).
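Since a multiplexed run generally needs both uploads, the two calls can be grouped into one small wrapper. This is an illustrative convenience function of our own, not part of the flowbio API:

```python
def upload_run(client, fastq_path, annotation_path, retries=5, progress=True):
    """Upload a multiplexed FASTQ and its annotation sheet with the same settings.

    Illustrative wrapper around the two documented flowbio methods, so that
    retry and progress settings stay consistent across both uploads.
    """
    multiplexed = client.upload_multiplexed(
        fastq_path, progress=progress, retries=retries
    )
    annotation = client.upload_annotation(
        annotation_path, progress=progress, retries=retries
    )
    return multiplexed, annotation
```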