bio: general tips
Substantial effort has been devoted to making the command line more error-tolerant and user friendly.
See the help
Each command, when invoked with no parameters, will produce help:
bio
All commands can read the standard input
bio fetch AF086833 KJ660346 > genome.gb
or if the file ids.txt
contains:
AF086833
KJ660346
you can run:
cat ids.txt | bio fetch > genome.gb
Some commands can create sequences on the fly
bio align GATCA GATTACA
or
bio fasta GATTACA --translate
Parameter order does not matter
You may write:
bio fasta --type CDS --end 10 genomes.gb
or:
cat genomes.gb | bio fasta --type CDS --end 10
both will work and produce the same results.
Parameter action
Each parameter will be applied sequentially in an internally determined order that makes the most sense:
bio fasta --type CDS --end 10 --translate genomes.gb
will produce the same results as:
bio fasta --translate --end 10 --type CDS genomes.gb
Both commands first select CDS
types, apply a slice on each sequence, and then use the translation operator.
Multiple accession numbers
Many commands allow using multiple accession numbers; in that case, the operations will take place sequentially on each.
bio fetch NC_045512 MN996532
or comma separated:
bio fetch NC_045512,MN996532
Parameter forms
You may use single or double dashes on parameters:
bio fasta genomes.gb --end 100
The command above is equivalent to:
bio fasta genomes.gb -end 100
The coordinate system is 1 based
Coordinates are one based (inclusive on both ends) identical to GFF coordinate formats.
bio fasta --start 10 --end 20 genomes.gb
The interval of 10 to 20 is 11 bases long! To make a single base long slice start and end on the same value:
bio fasta --start 10 --end 10 genomes.gb
Number formatting
Numbers for start and end coordinates may be written in human-friendly forms, like so:
bio fasta -start 1kb -end 2kb genomes.gb
accepted formats:
5000
5,000
5k
or5kb
5K
or5KB