abioscripts

abioscripts is a Java program aimed at facilitating phylogenetic analysis. Currently its main feature is the Sequence Concat function, which takes multiple files containing alignments of different regions and concatenates them into one file in Nexus and FASTA format.
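The idea behind concatenation can be sketched in a few lines of Java. This is a minimal illustration, not the abioscripts implementation. It assumes that each region is already parsed into a map from taxon name to aligned sequence, and that a taxon missing from a region is padded with '?' so every row ends up with the same length. The class and method names (ConcatSketch, concat, toFasta) are hypothetical, and only FASTA output is shown.

```java
import java.util.*;

/** Hypothetical sketch of alignment concatenation (not the abioscripts code). */
public class ConcatSketch {

    /** Joins regions taxon by taxon, padding taxa missing from a region with '?'. */
    static Map<String, String> concat(List<Map<String, String>> regions) {
        // Collect every taxon name, keeping first-seen order.
        Set<String> taxa = new LinkedHashSet<>();
        for (Map<String, String> region : regions) {
            taxa.addAll(region.keySet());
        }
        Map<String, StringBuilder> rows = new LinkedHashMap<>();
        for (String taxon : taxa) {
            rows.put(taxon, new StringBuilder());
        }
        for (Map<String, String> region : regions) {
            if (region.isEmpty()) {
                continue; // nothing to append for an empty alignment
            }
            // All sequences in one alignment are assumed to share the same length.
            int length = region.values().iterator().next().length();
            for (String taxon : taxa) {
                String seq = region.get(taxon);
                rows.get(taxon).append(seq != null ? seq : "?".repeat(length));
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        rows.forEach((taxon, sb) -> result.put(taxon, sb.toString()));
        return result;
    }

    /** Writes the concatenated alignment as FASTA text. */
    static String toFasta(Map<String, String> alignment) {
        StringBuilder out = new StringBuilder();
        alignment.forEach((taxon, seq) ->
                out.append('>').append(taxon).append('\n').append(seq).append('\n'));
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> atpA = new LinkedHashMap<>();
        atpA.put("Woodsia_ilvensis", "ACGT");
        atpA.put("Notholaena_grayi", "ACGA");
        Map<String, String> atpB = new LinkedHashMap<>();
        atpB.put("Woodsia_ilvensis", "TTG");
        System.out.print(toFasta(concat(List.of(atpA, atpB))));
    }
}
```

Padding with '?' (unknown) rather than '-' (gap) is a design choice made for this sketch; check the output of abioscripts itself to see which symbol it uses.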

The program is built around the Java Evolutionary Biology Library (JEBL): http://sourceforge.net/projects/jebl/

installation

1. download the .zip archive
2. extract the files abioscripts and abioscripts.jar into a folder of your choice (recommended: a bin folder listed in your PATH environment variable, e.g. /home/username/bin or /usr/bin/)

download here (Mac and Linux only; a Windows version will be available soon)

source code is available here

usage

usage: abioscripts [OPTIONS]
-debug,--debug Display debug info
-h,--help Display help message
-seqconcat,--seqconcat Concat sequence alignments
-topcount,--topologycount Count number of unique topologies in treefile(s)
example usage: abioscripts --help --seqconcat

--seqconcat option explained:
usage: abioscripts --seqconcat [OPTIONS] -f <alignment-filenames> -o <output file>
-ae,--applyexcludes Remove characters listed in alignment excludes block
-cd,--continueondupes Continues executing even if there are duplicate names (both before and after formatting names)
-d,--outputdir <arg> Output directory
-f,--filenames <arg> Space separated input filenames - nexus or fasta format
-h,--help Display help message
-lc,--leftcutposition <arg> Cuts sequence names after the nth occurrence of a space or underscore
-m,--minalign <arg> Minimum number of alignments a sequence name must appear in to be included in the concatenated alignment
-o,--outputfile <arg> Output file name
-oc,--outputclusterscript Also output files in prepared cluster subfolder including cluster-script
-os,--outputsingles Also output a separate file for each region
-r,--removefilter <arg> Comma separated sequence names to filter out, wildcard symbol * or ? can be used to do partial name match
-rx,--regexmatch <arg> Keeps the part of the name that matches the regular expression
-seqconcat,--seqconcat Concat sequence alignments
-woodsianameformatter Uses specialised formatter for Woodsia project sequences
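The --leftcutposition option and the duplicate check behind --continueondupes can be illustrated with a short Java sketch. It is not taken from abioscripts: it assumes that the cut keeps the text before the nth space or underscore (dropping the separator itself), that names with fewer separators are left unchanged, and that a "duplicate" is simply two input names that become identical after cutting. The names LeftCutSketch, leftCut and duplicatesAfterCut are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch of --leftcutposition style name cutting. */
public class LeftCutSketch {

    /**
     * Returns the part of the name before the n-th space or underscore.
     * Assumption: the separator is dropped; names with fewer than n
     * separators (or n <= 0) are returned unchanged.
     */
    static String leftCut(String name, int n) {
        int seen = 0;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == ' ' || c == '_') {
                seen++;
                if (seen == n) {
                    return name.substring(0, i);
                }
            }
        }
        return name;
    }

    /** Reports names that collide after cutting (sorted for stable output). */
    static Set<String> duplicatesAfterCut(List<String> names, int n) {
        Set<String> seen = new HashSet<>();
        Set<String> dupes = new TreeSet<>();
        for (String name : names) {
            String cut = leftCut(name, n);
            if (!seen.add(cut)) {
                dupes.add(cut); // a second name cut down to the same string
            }
        }
        return dupes;
    }

    public static void main(String[] args) {
        System.out.println(leftCut("Woodsia_ilvensis_123_atpA", 2));
        System.out.println(duplicatesAfterCut(
                List.of("Woodsia_ilvensis_1", "Woodsia_ilvensis_2", "Woodsia_obtusa_1"), 2));
    }
}
```

Collisions like the one in main are the kind of situation where, according to the help text, --continueondupes lets execution continue.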


example usage:

abioscripts --seqconcat --minalign 2 --leftcutposition 2 --removefilter zzz --filenames *.nex --outputfile U2_all_incl_matK
(This command concatenates all *.nex files in this directory. Taxa must be present in at least 2 alignments, taxon names are cut after the 2nd space or underscore, all taxa that include zzz in their names are removed, and the output is written to U2_all_incl_matK.)
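The --minalign rule in the example above amounts to counting, for each taxon, how many input alignments contain it. The Java sketch below shows that counting step only. It is an illustration, not the abioscripts code, and the order in which the real tool applies name cutting, filtering and this count is not documented here. MinAlignSketch and taxaInAtLeast are hypothetical names, and each alignment is reduced to its set of taxon names.

```java
import java.util.*;

/** Hypothetical sketch of the --minalign filter. */
public class MinAlignSketch {

    /** Keeps taxa that occur in at least minAlign of the given alignments. */
    static Set<String> taxaInAtLeast(List<Set<String>> alignments, int minAlign) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> taxa : alignments) {
            for (String taxon : taxa) {
                counts.merge(taxon, 1, Integer::sum); // one vote per alignment
            }
        }
        Set<String> kept = new TreeSet<>(); // sorted for stable output
        counts.forEach((taxon, count) -> {
            if (count >= minAlign) {
                kept.add(taxon);
            }
        });
        return kept;
    }

    public static void main(String[] args) {
        List<Set<String>> alignments = List.of(
                Set.of("A", "B", "C"), Set.of("A", "C"), Set.of("C"));
        System.out.println(taxaInAtLeast(alignments, 2));
    }
}
```

With --minalign 1, as in the next example, every taxon that appears anywhere is kept.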

abioscripts --seqconcat --minalign 1 --leftcutposition 2 --removefilter Notholaena_grayi Woodsia --filenames U2_atpA.nex U2_atpB.nex --outputfile U2_atpA_and_atpB
(This command concatenates U2_atpA.nex and U2_atpB.nex in this directory. Taxa must be present in at least 1 of the alignments, taxon names are cut after the 2nd space or underscore, all taxa that include Notholaena_grayi or Woodsia in their names are removed, and the output is written to U2_atpA_and_atpB.)
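According to the help text, --removefilter takes comma-separated names in which * and ? can be used for partial matches. The Java sketch below converts such patterns into regular expressions and drops matching names. It follows the help text (comma-separated, wildcard matching against the whole name); the examples above pass space-separated names and describe substring matches, so how the real tool parses and matches filters may differ. RemoveFilterSketch, wildcardToRegex and removeMatching are hypothetical names.

```java
import java.util.*;
import java.util.regex.Pattern;

/** Hypothetical sketch of a wildcard name filter like --removefilter. */
public class RemoveFilterSketch {

    /** Translates * (any run of characters) and ? (one character) into a regex. */
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Quote everything else so characters like '.' match literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns the names that match none of the comma-separated filters. */
    static List<String> removeMatching(List<String> names, String commaSeparatedFilters) {
        List<Pattern> patterns = new ArrayList<>();
        for (String filter : commaSeparatedFilters.split(",")) {
            if (!filter.isBlank()) {
                patterns.add(wildcardToRegex(filter.trim()));
            }
        }
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            boolean remove = patterns.stream().anyMatch(p -> p.matcher(name).matches());
            if (!remove) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Woodsia_ilvensis", "Notholaena_grayi", "Cheilanthes_lanosa");
        System.out.println(removeMatching(names, "Woodsia*,*grayi"));
    }
}
```

Under this sketch's whole-name matching, a pattern such as *zzz* would be needed to remove every name that merely contains zzz.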

abioscripts --seqconcat --regexmatch '.*?_\d+' --filenames U2_atpA.nex U2_atpB.nex --outputfile U2_atpA_and_atpB
(This command concatenates U2_atpA.nex and U2_atpB.nex in this directory and keeps only the part of each taxon name that matches the regular expression. In the case above, .*?_\d+ means as few characters as possible (.*?) up to an underscore (_) that is followed by one or more digits (\d+). The expression is quoted so that the shell neither strips the backslash nor treats * and ? as filename wildcards. Many web pages describe regular expressions (regex) in detail.)
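A minimal Java sketch of what "keeps the part of the name that matches" can look like, using java.util.regex with Matcher.find(), which searches for the first match anywhere in the name. Using find() rather than matches(), and leaving a name unchanged when nothing matches, are assumptions of this sketch, not documented abioscripts behavior. RegexMatchSketch and keepMatch are hypothetical names. Note the doubled backslash required inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of --regexmatch style name trimming. */
public class RegexMatchSketch {

    /** Returns the first match of the pattern in the name, or the name itself if none. */
    static String keepMatch(String name, Pattern pattern) {
        Matcher m = pattern.matcher(name);
        return m.find() ? m.group() : name;
    }

    public static void main(String[] args) {
        // Same expression as on the command line; "\\d" in Java source is the regex \d.
        Pattern p = Pattern.compile(".*?_\\d+");
        System.out.println(keepMatch("Woodsia_ilvensis_123_atpA", p)); // Woodsia_ilvensis_123
        System.out.println(keepMatch("Woodsia_ilvensis", p));          // Woodsia_ilvensis
    }
}
```

The lazy .*? stops at the first underscore that is followed by digits, so the trailing _atpA in the first name is discarded.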

 

contact:

Anders Larsson,
Dep. of Systematic Biology
Uppsala University
Sweden
email: anders.larsson [at] ebc.uu.se
Phone: +46 (0)18 471 2932
Fax: +46 (0)18 471 6457