|
LVB Manual – LVB phylogeny program, version 2.3

This manual was last updated on 27 July 2010.

COPYRIGHT

Part of this document is based on PHYLIP documentation (see ACKNOWLEDGEMENTS).

The PHYLIP component of this document: © Copyright 1986-2000 by the University of Washington. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.

The remainder of this document: © Copyright 2003-2010 by Daniel Barker. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.

lvb seeks parsimonious trees from an aligned nucleotide data matrix. It uses heuristic searches consisting of simulated annealing followed by hill-climbing. In contrast to the more usual heuristic searches used to find parsimonious trees (e.g. stepwise addition followed by hill-climbing), simulated annealing can 'jump out' of local optima. Especially with large, complex data matrices, the simulated annealing heuristic may run faster and/or find a shorter tree. LVB 2.3 itself decides how long to run, given the apparent complexity of the input, without user intervention.

CITING LVB
Please cite the following paper if you use LVB:

Barker, D. 2004. LVB: Parsimony and simulated annealing in the search for phylogenetic trees. Bioinformatics, 20, 274-275.

The following may also be relevant:

Barker, D. 1997. LVB 1.0: Reconstructing Evolution with Parsimony and Simulated Annealing (Edinburgh: Daniel Barker)

Barker, D. 1999. Simulated annealing in the Search for Phylogenetic Trees. PhD Thesis, University of Edinburgh.

RUNNING LVB
lvb is a command-line program. It reads the alignment file from the current directory (folder) and writes its main output to a file in the same directory. The user is prompted for the matrix format, the approximate time to run, the interpretation of gaps in the alignment and whether bootstrap replicates are required. Answers are entered using the keyboard. lvb logs progress information and errors to the screen.

Mac OS X
The OS X version of LVB runs on OS X 10.6 (Snow Leopard), running on 64-bit Intel hardware. After downloading, extract
If
Linux and UNIX
After downloading, compile lvb from the source code (see COMPILING LVB). Once this is done, it may be launched as for Mac OS X.

INPUT
Keyboard (standard input)
Keyboard input is case-independent. So, for example, where the instructions below suggest you type

Matrix format

lvb can read matrices in PHYLIP 3.6 interleaved or PHYLIP 3.6 sequential format. These are described in the section on infile.

When prompted for the data matrix format, type

Treatment of gaps
See the table under Bases for a list of base codes allowed by lvb. A gap represented by the letter ' ' '

When prompted for the treatment of '
'Fifth state' may give excessive weight to multi-site gaps, since each affected base position will be counted as one event.

Random number seed

When prompted for the random number seed, press

The default value is taken from the system clock and hence will vary from one analysis to the next, changing every second. The default is usually appropriate.

Bootstrapping
When prompted for the number of bootstrap replicates, enter the number of replicates required. If bootstrapping is not required, enter the number 0 or just press

lvb allows any number of replicates from 1 to 1000000 inclusive.

For each replicate, a bootstrap sample of sites in the alignment is generated and analyzed. For an alignment matrix of m sites, each bootstrap replicate contains m sites, randomly sampled with replacement from the originals. Compared to the original alignment, it is likely that some sites are left out, some are present once, and others are present twice or more. In lvb the probability of including a site is equal for all sites, irrespective of whether the site varies or is constant.

The most parsimonious tree(s) for each replicate are output. There will be at least one tree for each replicate. If the search for any replicate found more than one equally parsimonious tree, all are output and the number of trees will exceed the number of replicates. Generation of a consensus from all trees will over-represent those replicates for which more trees were found. If each bootstrap replicate finds a single tree, this is not an issue.

infile

The data matrix must be in a file called

Layout
The simplest type of data matrix file looks something like this:
The first line of the input file contains the number of sequences and the number of characters (sites). These are in free format, separated by blanks.

The information for each sequence follows, starting with a ten-character sequence name (which can include blanks and some punctuation marks), and continuing with the characters for that sequence. The name should come right at the start of the line, without any preceding blanks or tabs. It should be ten characters in length, filled out to the full ten characters by trailing blanks if shorter. Any printable ASCII/ISO character is allowed in the name, except for parentheses '(' and ')', square brackets '[' and ']', colon ':', semicolon ';' and comma ','. If you forget to extend the names to ten characters in length by blanks, an error message will result.

The biological characters (bases or gaps) are each a single ASCII character, sometimes separated by blanks.

The sequences can continue over multiple lines. When this is done the sequences must be either in interleaved format or sequential format. In sequential format all of one sequence is given, possibly on multiple lines, before the next starts. In interleaved format the first part of the file should contain the first part of each of the sequences, then possibly a line containing nothing but a carriage-return character, then the second part of each sequence, and so on. Only the first parts of the sequences should be preceded by names. The name must be on the same line as the first character of the data for that sequence.

Here is a hypothetical example of interleaved format:
while in sequential format the same sequences would be:
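To make the difference between the two layouts concrete, the following minimal Python sketch writes one small, made-up matrix in both formats, following the layout rules described above. The taxon names, the sequences and the function names (`pad_name`, `to_sequential`, `to_interleaved`) are assumptions for illustration only; they are not part of lvb or PHYLIP, and the block width of 10 sites is an arbitrary choice.

```python
def pad_name(name, width=10):
    """Pad a sequence name with trailing blanks to exactly `width` characters."""
    if len(name) > width:
        raise ValueError(f"name longer than {width} characters: {name!r}")
    return name.ljust(width)


def _header(matrix):
    """First line: number of sequences and number of sites, blank-separated."""
    lengths = {len(seq) for seq in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same number of sites")
    n_sites = lengths.pop()
    return f"{len(matrix)} {n_sites}", n_sites


def to_sequential(matrix, block=10):
    """All of one sequence is written (over several lines) before the next starts."""
    header, n_sites = _header(matrix)
    lines = [header]
    for name, seq in matrix.items():
        chunks = [seq[i:i + block] for i in range(0, n_sites, block)]
        lines.append(pad_name(name) + chunks[0])  # name shares a line with the first data
        lines.extend(chunks[1:])
    return "\n".join(lines) + "\n"


def to_interleaved(matrix, block=10):
    """The first part of every sequence, then the second part of every sequence, etc."""
    header, n_sites = _header(matrix)
    lines = [header]
    for start in range(0, n_sites, block):
        if start > 0:
            lines.append("")  # separator line with no blank characters on it
        for name, seq in matrix.items():
            prefix = pad_name(name) if start == 0 else ""  # names only on the first part
            lines.append(prefix + seq[start:start + block])
    return "\n".join(lines) + "\n"


# Made-up data: 3 sequences of 16 sites each.
matrix = {
    "Alpha": "AACGTGGCCAAATTGC",
    "Beta": "AAGGTCGCCAAACTGC",
    "Gamma x": "CATTTCGTCACAATGC",
}
sequential = to_sequential(matrix)
interleaved = to_interleaved(matrix)
print(sequential)
print(interleaved)
```

Both outputs start with the same header line and carry the same data; only the order of the lines after the first part of each sequence differs, which is why a file in one layout must not be read as the other.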
If each sequence only occupies one line in the matrix file, there is no difference between sequential and interleaved format and lvb can read the file in either way. Other than this special case, it is important not to read an interleaved matrix as sequential or a sequential matrix as interleaved. A

Note that a portion of a sequence like this:

is perfectly legal, assuming that the sequence name has gone before and is filled out to full length by blanks. The above digits and blanks will be ignored, the sequence being taken as starting at the first base symbol (in this case an A). This should enable you to use output from many multiple-sequence alignment programs with only minimal editing.

lvb may have difficulties with spaces at the end of lines. The symptoms of this problem are that lvb complains about a

In interleaved format the present version of lvb may sometimes have difficulties with the blank lines between groups of lines. If so, you might want to retype those lines, making sure that they contain only a carriage-return and no blank characters, or you may have to eliminate them altogether. The symptoms of this problem are that lvb complains that the sequences are not properly aligned, and you can find no other cause for this complaint.

Bases
The sequences may contain A's, G's, C's and T's (or U's, which lvb treats as equivalent to T's). Each ASCII character in the sequence must be one of the letters

These characters can be either upper or lower case, because the algorithms convert all input characters to upper case (which is how they are treated). The characters constitute the IUPAC (IUB) nucleic acid code plus some slight extensions. They enable input of nucleic acid sequences taking full account of any ambiguities in the sequence.

For further information on '
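As an illustration of how ambiguity codes of this kind can be interpreted, the minimal Python sketch below maps the standard IUPAC nucleotide symbols to the sets of bases they stand for, treats U as T and ignores case, as described above. It covers the standard IUPAC code only; the 'slight extensions' and the gap symbols accepted by lvb are not included, and the names `IUPAC` and `base_set` are hypothetical, not taken from the lvb source code.

```python
# Standard IUPAC nucleotide codes: each symbol stands for one or more bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def base_set(symbol):
    """Return the set of bases a single symbol may represent."""
    s = symbol.upper()  # input is case-independent
    if s == "U":        # U is treated as equivalent to T
        s = "T"
    if s not in IUPAC:
        raise ValueError(f"not a standard IUPAC nucleotide code: {symbol!r}")
    return frozenset(IUPAC[s])


print(sorted(base_set("r")))  # bases compatible with the purine code R
print(sorted(base_set("u")))  # U handled as T
```

Representing each symbol as a set makes ambiguity explicit: an unambiguous base is a set of size one, while N is compatible with every base.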
OUTPUT
Screen (standard output)
lvb logs its version, details of the analysis, indication of progress and any errors encountered to the standard output, which is usually the screen.

Without bootstrapping, the rearrangement number (iteration) of the search and the current tree length are logged every 10000 trees and every time the tree length changes. During simulated annealing, the tree length can go up as well as down. LVB keeps and outputs the shortest trees encountered at any point during the search. The length of this tree or trees is logged to the screen near the end of the analysis.

With bootstrapping, the replicate number is logged, along with the number of rearrangements tried, the number of trees found and the length of trees found for that replicate.

outtree

Without bootstrapping, the file

With bootstrapping,

Trees use a subset of the 'Newick standard' tree format. This is accepted by many other programs.

Trees may be converted to graphics files using the

Without bootstrapping, if more than one equally parsimonious tree is found, these may be combined in various ways using

Output trees are unrooted and branch lengths are not given. Trees may be rooted with the

COMPILING LVB
lvb is available at the LVB Web page as ready-to-run software for Mac OS X. For other platforms, or if you wish to modify the source code, you will have to compile lvb. It is written in ANSI C and is expected to compile and run on a variety of operating systems. However, before release it is currently only tested when compiled in 64-bit mode with the GNU C compiler for OS X (Intel CPU) and Linux (AMD CPU).

Assuming your system is UNIX-like, uses GNU

Unpacking the source code

Assuming

This gives you a main directory

Compiler options

By default, LVB is built using compiler options which make sense for GNU C (gcc). To use other compiler options, edit the file

Compilation
Now, assuming you begin in the
Results of the above commands are:
After changing the source code or

Documentation

The main documentation (i.e. this file) is

Internal documentation will be of interest to people who wish to modify or re-use the source code of LVB. During a successful build, documentation in

Documentation of PHYLIP code within LVB is given separately, in

BIOINFORMATICS APPLICATIONS
For automated use of lvb, a 'wrapper' in the Perl language may be used. This is

SUPPORT AND REGISTRATION

Please send questions and bug reports to: db60@st-and.ac.uk

To be placed on an email list to receive information on new versions, please email

ACKNOWLEDGEMENTS

lvb contains portions of PHYLIP 3.6a. This allows lvb to read PHYLIP-format matrix files. Also, most of the above documentation for infile is taken from the PHYLIP 3.6a manual. I wish to thank Joe Felsenstein for making PHYLIP freely available, and for advising on how to re-use it in lvb.

SEE ALSO
http://biology.st-andrews.ac.uk/cegg/lvb.aspx
http://evolution.genetics.washington.edu/phylip.html
http://phylogeny.arizona.edu/macclade/macclade.html
http://mesquiteproject.org/mesquite/mesquite.html
http://taxonomy.zoology.gla.ac.uk/software.html |