SeqVis: Visualization of compositional heterogeneity in large alignments of nucleotides

Joshua WK Ho, Cameron E Adams, Jie Bin Lew, Timothy J Matthews, Chiu Chin Ng, Arash Shahabi-Sirjani, Leng Hong Tan, Yu Zhao, Simon Easteal, Susan R Wilson, Lars S Jermiin.

News

31st October 2008
NEW! SeqVis Version 1.4 is released. A number of bugs in previous versions were fixed, and some functions were renamed.


4th June 2007
NEW! SeqVis Version 1.3 is released. The problem with reading FASTA files has been fixed.


30th May 2007
WARNING! A bug was detected in the FASTA file reading module in SeqVis. In some cases, SeqVis fails to signal a warning when special characters, such as the newline character '\n', are present. This may result in incorrect calculation of nucleotide frequencies. Therefore, users are advised to AVOID reading FASTA sequence files until a further announcement is made. The SeqVis team will attempt to rectify the problem as soon as possible.


NEW! SeqVis Version 1.2 is released. This version of SeqVis can visualize any data with four attributes that sum to one. Details are available in Feature.


SeqVis is published in Bioinformatics! Please cite our paper:

Ho JWK, Adams CE, Lew JB, Matthews TJ, Ng CC, Shahabi-Sirjani A, Tan LH, Zhao Y, Easteal S, Wilson SR, Jermiin LS (2006) SeqVis: Visualization of compositional heterogeneity in large alignments of nucleotides, Bioinformatics 22, 2162-2163

A detailed description of the program's features and how to use it is available from:

Jermiin LS, Ho JWK, Lau KW, Jayaswal V (2009). SeqVis: A tool for detecting compositional heterogeneity among aligned nucleotide sequences. Pp ???-???. In Bioinformatics for DNA sequence analysis (Ed. Posada D), Humana Press, Totowa, NJ. [Preprints are available from LSJ]

Introduction

SeqVis, a standalone Java application, is an interactive three-dimensional visualization tool for exploring compositional heterogeneity in large alignments of nucleotide sequences. Existing methods for assessing compositional heterogeneity among nucleotide sequences are either unreliable or computationally expensive for large alignments. SeqVis visualizes the nucleotide composition of each sequence as a point in a tetrahedron model (an extension of the de Finetti plot). The user-friendly features provided by SeqVis allow compositionally heterogeneous sequences to be identified visually. The use of SeqVis is illustrated by two real phylogenetic examples. The tool is freely downloadable here.


Fig. 1. Snapshot of SeqVis
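
The tetrahedron model described in the introduction can be illustrated with a minimal Python sketch. It is not SeqVis code: the function names, the assignment of nucleotides to vertices, and the choice to ignore gaps and ambiguity codes are all assumptions made for illustration. Each sequence is reduced to four frequencies that sum to one, and these are used as barycentric weights on the four vertices of a regular tetrahedron.

```python
from collections import Counter

# Hypothetical vertex assignment (an assumption, not taken from SeqVis):
# four alternating corners of a cube form a regular tetrahedron.
VERTICES = {
    "A": (1.0, 1.0, 1.0),
    "C": (1.0, -1.0, -1.0),
    "G": (-1.0, 1.0, -1.0),
    "T": (-1.0, -1.0, 1.0),
}


def nucleotide_frequencies(sequence):
    """Return the relative frequencies of A, C, G and T in a sequence.

    Assumption: gaps, ambiguity codes and whitespace are simply ignored.
    """
    counts = Counter(base for base in sequence.upper() if base in VERTICES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in VERTICES}


def tetrahedron_point(freqs):
    """Map four frequencies that sum to one to a 3D point (barycentric mix)."""
    return tuple(
        sum(freqs[base] * VERTICES[base][axis] for base in VERTICES)
        for axis in range(3)
    )
```

Under these assumptions, a sequence with equal frequencies of all four nucleotides falls at the centre of the tetrahedron, a sequence made of a single nucleotide falls on the corresponding vertex, and sequences with similar composition fall close together.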

Background

Compositional heterogeneity

Most phylogenetic methods assume that the sequences evolved under a single time-reversible Markov process (i.e., under homogeneous, stationary, and reversible conditions). Compositional heterogeneity (i.e., significant differences among sequences in the frequencies of the nucleotides A, T, G and C) in sequence data suggests that the sequences did not evolve under these conditions and that the phylogeny may therefore not be inferred accurately.

Existing methods for assessing compositional heterogeneity

Currently, there are four categories of methods for detecting compositional heterogeneity in alignments of nucleotides (Jermiin et al., 2004). The first category uses graphs or tables to visualize compositional heterogeneity. The other categories use test statistics to evaluate the data against expected distributions. Methods in the first category are of limited use because they are practical for only a few species, whereas those in the other categories are usually either statistically invalid or not widely adopted by the scientific community. Matched-pairs tests of homogeneity for analyzing aligned nucleotides were developed in response to the latter problem. These tests provide useful details about the underlying Markov process. However, for data containing many sequences, the results may be impractical to interpret.
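
As an illustration of a matched-pairs test, the sketch below computes Bowker's test of symmetry for one pair of aligned sequences. It is a minimal example under stated assumptions, not a description of how SeqVis or any particular package implements such tests: the function names are hypothetical, and sites containing gaps or ambiguity codes are skipped. The statistic is the sum of (n_ij - n_ji)^2 / (n_ij + n_ji) over the off-diagonal pairs of the 4 x 4 divergence matrix, with one degree of freedom for each pair that has a non-zero total.

```python
from itertools import combinations

BASES = "ACGT"


def divergence_matrix(seq_x, seq_y):
    """Count, site by site, how often base i in seq_x aligns with base j in seq_y.

    Assumption: sites where either sequence has a gap or ambiguity code are skipped.
    """
    if len(seq_x) != len(seq_y):
        raise ValueError("aligned sequences must have equal length")
    index = {base: i for i, base in enumerate(BASES)}
    matrix = [[0] * 4 for _ in range(4)]
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        if a in index and b in index:
            matrix[index[a]][index[b]] += 1
    return matrix


def bowker_symmetry(matrix):
    """Return (statistic, degrees of freedom) for Bowker's test of symmetry."""
    statistic, df = 0.0, 0
    for i, j in combinations(range(4), 2):
        total = matrix[i][j] + matrix[j][i]
        if total > 0:  # pairs with no observations contribute nothing
            statistic += (matrix[i][j] - matrix[j][i]) ** 2 / total
            df += 1
    return statistic, df
```

The statistic is compared with a chi-square distribution with the returned degrees of freedom. Because the test is applied to every pair, an alignment of n sequences produces n(n-1)/2 results (4950 for 100 sequences), which shows why the output becomes hard to interpret for large alignments.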


Jermiin, L.S. et al. (2004) The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. Syst. Biol., 53, 638-644.

© University of Sydney, 2004-2008. All rights reserved.