Uploading Data with flowbio
flowbio allows you to upload data to a Flow instance, including specialised uploads for demultiplexed sample files and for multiplexed data.
Uploading Standard Data
A file can be uploaded using the client's upload_data method:
Upload standard data
data = client.upload_data("/path/to/file.fa")
For large data, you may wish to display a progress bar:
Upload standard data with a progress bar
data = client.upload_data("/path/to/file.fa", progress=True)
If you are experiencing network issues, you can instruct flowbio to retry any failed chunk upload - in this case up to a maximum of five times:
Upload standard data with retries
data = client.upload_data("/path/to/file.fa", retries=5)
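The retry behaviour can be pictured with a small sketch. This is not flowbio's own implementation: send_chunk_with_retries, make_flaky_sender and the send callable are hypothetical names used only for illustration, and the sketch assumes that retries=5 means one initial attempt followed by up to five re-attempts.

```python
def send_chunk_with_retries(send, chunk, retries=0):
    """Call send(chunk), re-attempting up to `retries` times on failure.

    `send` is a hypothetical callable standing in for one chunk upload.
    Returns the number of attempts that were made.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            send(chunk)
            return attempts
        except OSError:
            # Give up once the initial attempt and every retry have failed.
            if attempts > retries:
                raise


def make_flaky_sender(failures):
    """Build a sender that fails `failures` times, then succeeds (demo only)."""
    state = {"remaining": failures}

    def send(chunk):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ConnectionError("simulated network failure")

    return send
```

Under these assumptions, a chunk that fails twice and then succeeds is sent in three attempts when retries=5, whereas with the default retries=0 the first failure is raised immediately.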
The full argument list:
- Name
path
- Type
- string
- Description
The local path to the file to be uploaded.
- Name
chunk_size
- Type
- int
- Description
Files are uploaded in chunks - this sets the size of those chunks in bytes (default 1,000,000). Lowering this improves the reliability of the upload; increasing it reduces the overall time taken.
- Name
progress
- Type
- bool
- Description
Whether or not to display a progress bar (default False).
- Name
retries
- Type
- int
- Description
How many times to re-attempt to upload a chunk before giving up (default 0).
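To get a feel for what chunk_size means in practice, the sketch below reads a file in fixed-size blocks and reports the block sizes. It only illustrates the general idea of chunked reading and is not how flowbio itself is implemented; iter_chunks and demo_chunk_sizes are hypothetical helper names.

```python
import os
import tempfile


def iter_chunks(path, chunk_size=1_000_000):
    """Yield successive blocks of at most `chunk_size` bytes from a file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def demo_chunk_sizes(total_bytes, chunk_size=1_000_000):
    """Write a temporary file of `total_bytes` and return its chunk sizes."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * total_bytes)
    try:
        return [len(chunk) for chunk in iter_chunks(tmp.name, chunk_size)]
    finally:
        os.remove(tmp.name)
```

With the default chunk size, a 2,500,000-byte file is read as three chunks, the last one being half-size. Lowering chunk_size to 500,000 gives five smaller chunks, so each individual request carries less data but more requests are needed.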
Sample Upload
To upload the initial data for a sample, use the upload_sample method. Here you provide the name of the sample, at least one file path (depending on whether the sample is single-end or paired-end), and then a dictionary of sample metadata:
- Name
name
- Type
- string
- Description
The name of the sample being created.
- Name
path1
- Type
- string
- Description
The local path to the initiating data to be uploaded.
- Name
path2
- Type
- string
- Description
If paired-end, the local path to the second initiating file (default None).
- Name
chunk_size
- Type
- int
- Description
Files are uploaded in chunks - this sets the size of those chunks in bytes (default 1,000,000). Lowering this improves the reliability of the upload; increasing it reduces the overall time taken.
- Name
progress
- Type
- bool
- Description
Whether or not to display a progress bar (default False).
- Name
retries
- Type
- int
- Description
How many times to re-attempt to upload a chunk before giving up (default 0).
- Name
metadata
- Type
- dict
- Description
Additional attributes for the sample.
Upload a sample
sample = client.upload_sample(
    "My Sample Name",
    "/path/to/reads1.fastq.gz",
    "/path/to/reads2.fastq.gz",  # optional
    progress=True,
    retries=5,
    metadata={
        "category": "RNA-Seq",
        "strandedness": "unstranded",
    }
)
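When several samples need uploading, the call above can be wrapped in a loop. The following is a minimal sketch rather than part of flowbio: upload_samples and the layout of the samples list are hypothetical, and it assumes client is an already authenticated flowbio client whose upload_sample method accepts the arguments documented above.

```python
def upload_samples(client, samples, **upload_options):
    """Upload each entry in `samples` with client.upload_sample.

    Each entry is a dict with "name", "path1" and "metadata" keys, plus
    "path2" for paired-end samples. `upload_options` (for example
    progress or retries) are passed through unchanged.
    """
    uploaded = []
    for entry in samples:
        paths = [entry["path1"]]
        if entry.get("path2"):  # only paired-end samples have a second file
            paths.append(entry["path2"])
        uploaded.append(
            client.upload_sample(
                entry["name"],
                *paths,
                metadata=entry["metadata"],
                **upload_options,
            )
        )
    return uploaded


# Hypothetical input: one single-end and one paired-end sample.
samples = [
    {
        "name": "Single-end sample",
        "path1": "/path/to/reads.fastq.gz",
        "metadata": {"category": "RNA-Seq", "strandedness": "unstranded"},
    },
    {
        "name": "Paired-end sample",
        "path1": "/path/to/reads1.fastq.gz",
        "path2": "/path/to/reads2.fastq.gz",
        "metadata": {"category": "RNA-Seq", "strandedness": "unstranded"},
    },
]
# uploaded = upload_samples(client, samples, progress=True, retries=5)
```

The second path is only passed when it is present, so single-end samples fall back to the default of None for path2.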
Metadata
The metadata is given as a Python dictionary of values. These are the full permitted attributes:
- Name
category
- Type
- string
- Description
The sample's category. This can determine whether other fields are required.
- Name
organism
- Type
- string
- Description
The ID (Hs, Mm, etc.) of the organism the sample belongs to.
- Name
source
- Type
- string
- Description
The name of the sample's cell type.
- Name
sourceText
- Type
- string
- Description
Any additional text for the cell type.
- Name
purificationTarget
- Type
- string
- Description
The name of the sample's purification target.
- Name
purificationTargetText
- Type
- string
- Description
Any additional text for the purification target.
- Name
project
- Type
- ID
- Description
The ID of the project to add the sample to.
- Name
scientist
- Type
- string
- Description
The name of the person who performed the original experiment.
- Name
pi
- Type
- string
- Description
The name of the PI for the original experiment.
- Name
organisation
- Type
- string
- Description
The name of the organisation the sample was generated at.
- Name
purificationAgent
- Type
- string
- Description
The purification agent used.
- Name
experimentalMethod
- Type
- string
- Description
This adds more specific detail to the sample category.
- Name
condition
- Type
- string
- Description
The experimental condition of the sample.
- Name
sequencer
- Type
- string
- Description
The sequencing equipment used to generate the data.
- Name
comments
- Type
- string
- Description
Any additional comments.
- Name
fivePrimeBarcodeSequence
- Type
- string
- Description
The 5' barcode sequence of the sample.
- Name
threePrimeBarcodeSequence
- Type
- string
- Description
The 3' barcode sequence of the sample.
- Name
threePrimeAdapterName
- Type
- string
- Description
The 3' barcode adapter name of the sample.
- Name
threePrimeAdapterSequence
- Type
- string
- Description
The 3' barcode adapter sequence of the sample.
- Name
rtPrimer
- Type
- string
- Description
The reverse transcription primer.
- Name
read1Primer
- Type
- string
- Description
The read 1 primer sequence.
- Name
read2Primer
- Type
- string
- Description
The read 2 primer sequence.
- Name
umiBarcodeSequence
- Type
- string
- Description
The UMI barcode sequence.
- Name
umiSeparator
- Type
- string
- Description
The UMI separator string in the reads file.
- Name
strandedness
- Type
- string
- Description
Only needed for some sample categories - must be "unstranded", "forward", "reverse" or "auto".
- Name
rnaSelectionMethod
- Type
- string
- Description
Only needed for some sample categories - must be "polya", "ribominus" or "targeted".
- Name
geo
- Type
- string
- Description
The GEO accession of the sample.
- Name
ena
- Type
- string
- Description
The ENA accession of the sample.
- Name
pubmed
- Type
- string
- Description
The PubMed ID associated with the sample.
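Because a misspelt attribute name or an invalid value would otherwise only surface during the upload, it can help to check the dictionary locally first. The sketch below is a hypothetical client-side check built from the attribute list above. It is not part of flowbio, and it does not know which fields a given category requires, so the server remains the final authority.

```python
PERMITTED_KEYS = {
    "category", "organism", "source", "sourceText", "purificationTarget",
    "purificationTargetText", "project", "scientist", "pi", "organisation",
    "purificationAgent", "experimentalMethod", "condition", "sequencer",
    "comments", "fivePrimeBarcodeSequence", "threePrimeBarcodeSequence",
    "threePrimeAdapterName", "threePrimeAdapterSequence", "rtPrimer",
    "read1Primer", "read2Primer", "umiBarcodeSequence", "umiSeparator",
    "strandedness", "rnaSelectionMethod", "geo", "ena", "pubmed",
}

# Attributes whose values are restricted to a fixed set of strings.
ALLOWED_VALUES = {
    "strandedness": ("unstranded", "forward", "reverse", "auto"),
    "rnaSelectionMethod": ("polya", "ribominus", "targeted"),
}


def check_metadata(metadata):
    """Return a list of problems found in a metadata dict (empty if none)."""
    problems = []
    for key in sorted(metadata):
        value = metadata[key]
        if key not in PERMITTED_KEYS:
            problems.append(f"unknown attribute: {key}")
        elif key in ALLOWED_VALUES and value not in ALLOWED_VALUES[key]:
            problems.append(f"invalid value for {key}: {value!r}")
    return problems
```

An empty list means no problems were spotted, so a call such as check_metadata({"category": "RNA-Seq", "strandedness": "unstranded"}) returns [].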
Multiplexed Data Upload
Multiplexed data also has its own specialised method, upload_multiplexed:
Upload a multiplexed FASTQ file
multiplexed = client.upload_multiplexed(
    "/path/to/reads.fastq.gz",
    progress=True,
    retries=5,
)
- Name
path
- Type
- string
- Description
The local path to the multiplexed FASTQ file to be uploaded.
- Name
chunk_size
- Type
- int
- Description
Files are uploaded in chunks - this sets the size of those chunks in bytes (default 1,000,000). Lowering this improves the reliability of the upload; increasing it reduces the overall time taken.
- Name
progress
- Type
- bool
- Description
Whether or not to display a progress bar (default False).
- Name
retries
- Type
- int
- Description
How many times to re-attempt to upload a chunk before giving up (default 0).
The associated annotation sheet is uploaded with upload_annotation:
Upload an annotation sheet
annotation = client.upload_annotation(
    "/path/to/annotation.csv",
    progress=True,
    retries=5,
)
- Name
path
- Type
- string
- Description
The local path to the annotation sheet to be uploaded.
- Name
chunk_size
- Type
- int
- Description
Files are uploaded in chunks - this sets the size of those chunks in bytes (default 1,000,000). Lowering this improves the reliability of the upload; increasing it reduces the overall time taken.
- Name
progress
- Type
- bool
- Description
Whether or not to display a progress bar (default False).
- Name
retries
- Type
- int
- Description
How many times to re-attempt to upload a chunk before giving up (default 0).
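Since a multiplexed file and its annotation sheet belong together, the two calls can be combined in one helper. This is a sketch under stated assumptions and not part of flowbio: upload_multiplexed_with_annotation is a hypothetical name, client is assumed to be an authenticated flowbio client, and the order of the two uploads simply follows the order in which the methods are presented here (whether Flow requires a particular order is not stated).

```python
import os


def upload_multiplexed_with_annotation(client, fastq_path, annotation_path,
                                       **upload_options):
    """Upload a multiplexed FASTQ file and its annotation sheet.

    Both paths are checked first, so a typo in the second path is caught
    before any data is sent. `upload_options` (for example progress or
    retries) are passed to both upload calls.
    """
    for path in (fastq_path, annotation_path):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"No such file: {path}")
    multiplexed = client.upload_multiplexed(fastq_path, **upload_options)
    annotation = client.upload_annotation(annotation_path, **upload_options)
    return multiplexed, annotation
```

A call such as upload_multiplexed_with_annotation(client, "/path/to/reads.fastq.gz", "/path/to/annotation.csv", progress=True, retries=5) then returns both results as a pair.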