Washington University BLAST (WU BLAST) version 2.0 is a powerful software package for gene and protein identification, using sensitive, selective and rapid similarity searches of protein and nucleotide sequence databases. The feature list for WU BLAST 2.0 is long and continues to expand. Much of this is outlined below. A complete suite of search programs (blastp, blastn, blastx, tblastn and tblastx) is included in the package, along with database management and support programs that include nrdb, patdb, xdformat, xdget, seg, dust and xnu.
WU BLAST has been built to be the most trusted database search tool in your software toolbox, doing what you tell it, doing precisely what it says it's doing, and able to handle even your biggest jobs with aplomb. WU BLAST was built from the start to offer superior performance and flexibility. Its unique combination of features, sensitivity, speed and reliability is achieved through advanced algorithms, painstaking software coding, extensive error checks, and a superior design that anticipates future needs. To help users keep pace with the latest technology, every new release of WU BLAST has provided a high degree of backward compatibility.
WU BLAST is neither a re-hashed nor “Mac-ified” version of NCBI BLAST, although WU BLAST is in many ways easier to use. WU BLAST shares essentially no code with NCBI BLAST, except for some portions that both packages copied from the public domain ungapped BLAST 1.4 (W. Gish, unpublished). For more information about the lineage and history of WU BLAST development, please go here.
Information on licensing of WU BLAST 2.0 can be found here.
WU BLAST 2.0 is copyrighted and may not be sold, redistributed or modified in any form or by any means, without prior express written consent from the Office of Technology Management at Washington University in St. Louis.
DISCLAIMER: THIS SOFTWARE IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND.
Some key features of WU BLAST 2.0 are described below.
In support of the eXtended Database Format (XDF), a new database formatting tool named xdformat was introduced in the WU BLAST 2.0 package in 1999. The distinct features and advantages of using XDF and the xdformat program include:
*.xp? for peptide sequence databases and *.xn? for nucleotide sequence databases
(or *.x[np]? to encompass both peptide and nucleotide databases);
blastn "pri rod mam vrt htg" myquery.nt
CFTIME environment variable.
Dates will be reported in ISO 8601 format if CFTIME is set to '%Y-%m-%dT%H:%M:%S'.
Dates produced according to ISO 8601 are single tokens that can be compared lexicographically to immediately recognize their relative chronological order, without having to parse out and compare the individual date components (year, month, day, hour, etc.).
Dates reported by the xdformat program are also governed by CFTIME.
The format of many dates reported by the search programs for XDF databases is determined by the setting (if any) of CFTIME when the database was created or last modified by xdformat.
filter=<filter> specifications can be requested on the BLAST
command line.
Each filter is executed independently and their results are OR-ed at the end.
wordmask=<mask> option, where <mask> may be a classical filter program such as seg, xnu, or dust.
Whereas sequence filters convert certain letters in the query sequence into ambiguity codes (X for amino acid and N for nucleotide), word masks do not alter the sequence itself.
Instead, word masks cause certain portions of the query sequence to be skipped during the neighborhood word generation step of the BLAST algorithm.
This leaves the query sequence intact for generating comprehensive alignments seeded by neighborhood word hits involving more informative, unmasked regions of the sequence.
echofilter option to display the filtered segments in the search output.
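The lexicographic property of ISO 8601 dates described above for CFTIME can be checked with a short Python sketch. It applies the same strftime-style format string, '%Y-%m-%dT%H:%M:%S', to a few arbitrary timestamps and confirms that sorting the resulting strings as plain text gives the chronological order. The timestamps are illustrative assumptions, not WU BLAST output.

```python
from datetime import datetime

# The strftime-style format that the text gives for ISO 8601 output via CFTIME.
CFTIME_ISO8601 = "%Y-%m-%dT%H:%M:%S"

# Arbitrary example timestamps (assumptions for illustration only).
moments = [
    datetime(2004, 11, 15, 9, 30, 0),
    datetime(2002, 4, 3, 1, 25, 46),
    datetime(2004, 4, 29, 23, 59, 59),
]

stamps = [m.strftime(CFTIME_ISO8601) for m in moments]

# Sort once as plain text and once by actual time, then compare the orders.
sorted_as_text = sorted(stamps)
sorted_by_time = [m.strftime(CFTIME_ISO8601) for m in sorted(moments)]

print(sorted_as_text)
print(sorted_as_text == sorted_by_time)
```

Because every field is zero-padded and ordered from most to least significant, no date parsing is needed to compare two such tokens.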
Please send bug reports, questions, or suggestions to
The BLAST 2.0 package from Washington University includes the following data analysis and utility programs:
-s option of the program).
A Patricia tree is used by the program (hence its name),
automatically followed (when necessary) by one or more clean-up stages
that use finite state automata.
patdb, with its substring identification option,
may be most usefully applied to protein sequences,
which often differ only in their inclusion or exclusion
of the initiator methionine and other post-translational modifications.
When identification of perfect substrings is not desired,
the nrdb program is more practical than patdb
for processing nucleotide sequences,
because of data compression techniques effectively used
by nrdb that are not available in patdb.
If the gapped alignments are nice, but even more speed or lower memory use is desired, read how to make the programs fly.
1. One timed benchmark is a BLASTN comparison
of unmasked Arabidopsis thaliana chromosomes 2 and 4,
which are respectively 19.6 Mbp and 17.5 Mbp in length.
The computer used in this particular example
was a quad-processor Pentium III Xeon system
(550 MHz, 512 KB L2 cache per processor)
running the Mandrake Linux 2.4.22-10mdk kernel.
Using a single thread of execution (one processor),
BLASTN 2.0 [29-Apr-2004] required 10 minutes, 42 seconds elapsed (wall clock) time
(10 minutes, 38 seconds CPU time)
and approximately 850 MB of memory
to search both strands at once with the command:
wu-blastall -p blastn -d at.chr2 -i at.chr4 -Ff
The same job took 3 minutes, 00 seconds elapsed time to complete on a single 2.4 GHz AMD Opteron model 250 processor running the Linux 2.6.9 kernel.
2. Using the wu-blastall wrapper script and a single processor thread,
the 2.18 Mb genome of Neisseria meningitidis serogroup A
was compared against the 2.27 Mb genome of Neisseria meningitidis serogroup B
in less than 8 seconds elapsed and CPU time.
3. Using the wu-blastall wrapper script,
the entire 3.098 Gb human genome reference sequence
was compared to itself using BLASTN 2.0 [15-Nov-2004]
on a dual 900 MHz Itanium2 ("McKinley") processor computer system
with 10 GB memory, running Linux 2.4.18,
in less than 13 days, 2 hours elapsed time,
using two threads (both processors).
On a dual 2.4 GHz AMD Opteron system, the same search was completed
within 6 days, 21 hours elapsed time.
On a 3.0 GHz quad-core Mac Pro, the search was completed
within 4 days, 1 hour elapsed time.
In this study, the human genome sequence was
masked in advance for interspersed repeats and low-complexity regions.
Note that the Itanium system used here was one and a half years older than
the Opteron system, so these benchmarks should not be construed
as pitting two contemporary technologies against each other.
Note as well that the execution times reported here would be nearly halved,
with no loss of information,
had each pair of chromosomal sequences been compared only once
(e.g., comparing chromosome 1 against chromosome 2 but not chromosome 2 against chromosome 1).
When the seed word length was increased to 14 from its default of 11, the same whole human genome cross-comparison was completed in less than 19 hours, 26 minutes elapsed time on the dual Opteron system. This search required approximately 16 GB of memory. On a quad-processor 1.5 GHz Itanium2 system, the job was completed in under 8 hours, 54 minutes and required approximately 24 GB of memory.
The longest sequence in the 42-record human genome sequence data set was a 246 Mb contig for chromosome 1. It was compared using the default seed word length against the entire genome (including the chromosome 1 query itself) within 1 day, 2 hours elapsed time on the quad Itanium2 system; within 12 hours, 51 minutes elapsed time on the dual Opteron system; within 11 hours, 23 minutes using all four cores of a 2.5 GHz Quad-core PowerMac G5 (Processor Performance set to “Highest” in the Energy Saver system preferences); and within 21 hours, 14 minutes using a single core of a 3.0 GHz Quad-core Mac Pro, within 12 hours, 15 minutes using two cores (and 7 GB memory), and within 7 hours, 36 minutes using all four cores (and 12 GB memory).
4. Below are some sample WU BLAST 2.0 results
produced using default parameters,
with the addition of the often-recommended seg low-complexity filter
and the frequently used -postsw option of WU BLASTP 2.0.
The specific exceptions to the defaults are noted in each case.
Default parameters for NCBI blastall were also used,
with the exception of using -G7 -E2
to make the scoring system identical to the WU default
gap penalty of 9 for the first residue in a gap and 2 for subsequent residues
in the gap.
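The equivalence between the two gap-penalty conventions mentioned above can be checked arithmetically. The sketch below uses the WU convention as stated in the text (9 for the first residue in a gap, 2 for each subsequent residue) and assumes the usual NCBI convention in which a gap of length k costs G + k*E. The function names are hypothetical and exist only for this illustration.

```python
def wu_gap_cost(q, r, length):
    """WU convention: q for the first gap residue, r for each later one."""
    return q + r * (length - 1)


def ncbi_gap_cost(g, e, length):
    """Assumed NCBI convention: g to open a gap, plus e per gap residue."""
    return g + e * length


# Compare WU Q=9, R=2 with NCBI -G7 -E2 over a few gap lengths.
for k in range(1, 6):
    print(k, wu_gap_cost(9, 2, k), ncbi_gap_cost(7, 2, k))
```

Under these assumptions the two columns agree for every gap length, since 9 + 2(k - 1) = 7 + 2k.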
Descriptions of the command line options and parameters available in WU BLAST 2.0 are here.
As described below and elsewhere,
WU BLAST 2.0 supports several environment variables
to adapt its behavior to different computing environments:
BLASTDB, BLASTFILTER and BLASTMAT.
To support dual WU/NCBI BLAST installations,
WU BLAST also supports the environment variables
WUBLASTDB, WUBLASTFILTER and WUBLASTMAT,
with the WU versions of these variables taking precedence
over the corresponding non-WU versions when both are set.
In WU BLAST 2.0, the BLASTDB (or WUBLASTDB) environment variable
can be a list of one or more directory names in which the programs
are to look for database files.
In UNIX parlance, such an environment variable might be called a path
for the database files.
Directory names should be delimited from one another by a colon
(":") and listed in the order that they should be searched.
If the BLASTDB environment variable is not set, the programs use a default
path of ".:/usr/ncbi/blast/db",
such that the programs first look in the
current working directory (".") for the requested database
and then look in the /usr/ncbi/blast/db directory.
For backward compatibility with
programs that expect BLASTDB to be a single directory specification and
not a path, if the user has set a value for BLASTDB but omitted the current
working directory,
the version 2 programs will still look for database files
in the current working directory as a last resort.
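The search order described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the programs' actual implementation. The function name is hypothetical, empty path entries are simply dropped (an assumption), and the current working directory is recognized only when written literally as ".".

```python
# Default path given in the text for when BLASTDB is not set.
DEFAULT_BLASTDB = ".:/usr/ncbi/blast/db"


def database_search_dirs(blastdb=None):
    """Return directories in the order they would be searched, per the text."""
    value = blastdb if blastdb else DEFAULT_BLASTDB
    # Directory names are colon-delimited and searched in the order listed.
    dirs = [d for d in value.split(":") if d]
    # Backward compatibility: the current directory is tried as a last resort.
    if "." not in dirs:
        dirs.append(".")
    return dirs


print(database_search_dirs())
print(database_search_dirs("/data/db1:/data/db2"))
```

The directory names /data/db1 and /data/db2 are made-up examples.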
The BLASTFILTER (or WUBLASTFILTER) environment variable
can be set to the directory containing the filter programs,
such as seg and xnu.
The default directory for the filter programs is /usr/ncbi/blast/filter.
This usage is unchanged from version 1.4.
The BLASTMAT (or WUBLASTMAT) environment variable can be set to the parent
directory for all scoring matrix files.
The default directory for these files is /usr/ncbi/blast/matrix,
beneath which are nt and aa subdirectories for storing scoring matrix
files appropriate for nucleotide and amino acid alphabets.
This usage is unchanged from version 1.4.
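As a rough sketch of how the three settings resolve, the Python below applies the precedence rule (WU-prefixed variable first, then the plain variable, then the documented default) and derives the nt and aa matrix subdirectories. The function names are hypothetical, and treating a variable that is set to an empty string as "set" is an assumption.

```python
import os
import posixpath

# Defaults quoted in the text above.
DEFAULTS = {
    "BLASTDB": ".:/usr/ncbi/blast/db",
    "BLASTFILTER": "/usr/ncbi/blast/filter",
    "BLASTMAT": "/usr/ncbi/blast/matrix",
}


def effective_setting(name, environ=None):
    """Resolve a setting: WU-prefixed variable, then plain one, then default."""
    env = os.environ if environ is None else environ
    for key in ("WU" + name, name):
        if key in env:
            return env[key]
    return DEFAULTS[name]


def matrix_dir(alphabet, environ=None):
    """Return the matrix subdirectory for 'nt' or 'aa' scoring matrices."""
    if alphabet not in ("nt", "aa"):
        raise ValueError("alphabet must be 'nt' or 'aa'")
    return posixpath.join(effective_setting("BLASTMAT", environ), alphabet)


# Example with a made-up environment for a dual WU/NCBI installation.
example_env = {"BLASTMAT": "/opt/ncbi/matrix", "WUBLASTMAT": "/opt/wu/matrix"}
print(matrix_dir("aa", example_env))
print(effective_setting("BLASTFILTER", example_env))
```

The /opt paths are invented for the example. Passing a plain dictionary instead of the real environment keeps the sketch easy to test.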
For more information about environment variables, see the Installation instructions.
WU BLAST provides highly flexible means
for applying both “hard” and “soft” masks to a query sequence;
supports alternative, user-defined filter programs;
and allows the use of non-standard parameters to the standard filters.
The filter option (for hard masking) and the wordmask option (for soft masking)
provide the basic interface.
Multiple specifications of each type are acceptable
on the BLAST command line;
and individual filter and wordmask specifications may
consist of entire pipelines of commands.
For example, three filters are used in succession by this pipeline:
filter="myfilter1 | myfilter2 | myfilter3 -x5 -"
The first two filters in this case are expecting to read their input from UN*X standard input (also known as stdin), whereas myfilter3 apparently needs to be told (with the usual "-" or hyphen argument) to read data from stdin. The standard output (stdout) from myfilter1 will be read via stdin by myfilter2, which in turn processes the query before handing its results to myfilter3; finally, myfilter3 reports its results to stdout, which the BLAST program itself reads to obtain the fully masked sequence. The final output from the filter pipeline is expected by the BLAST program to be in FASTA format.
Instead of running all 3 filters in the above example as part of one pipeline, they could instead be specified as separate filter options like this:
filter=myfilter1 filter=myfilter2 filter="myfilter3 -x5 -"
The same choice of running as a pipeline or running separately is available for wordmasks, too. And of course the two approaches can be combined on the same command line. An advantage to using the pipeline approach is that all 3 filters in the example above may complete a little bit faster, because much of the I/O is avoided. Furthermore, when used in the pipeline, there's no requirement that the output from myfilter1 and myfilter2 actually be in FASTA format. Those two programs could potentially pass any information between themselves and to myfilter3. The only absolute requirement is that myfilter1 must read FASTA data from stdin and myfilter3 must output FASTA data (of the same length as the query!) to stdout.
It should be noted that with some filter programs,
passing the query sequence sequentially through
a pipeline of filters may yield
a different result than processing the query independently with each filter
and OR-ing the results.
The script seg+xnu included in the filter/ directory provides
an example with which to test this.
Specifying filter=seg+xnu on the BLAST command line
invokes a seg and xnu pipeline that is built into the search programs,
whereas specifying filter="seg+xnu -"
causes the seg+xnu script to be invoked on the query, which independently
executes seg and xnu, then ORs the separate results with pmerge.
(The echofilter option can be used to see the results of filtering displayed
in search program output.)
While the built-in seg+xnu pipeline is historically the way these two filters have
been implemented,
the latter interpretation, as illustrated by the seg+xnu script with pmerge,
may be more desirable.
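Why a pipeline and an independent-then-OR approach can disagree is easiest to see with toy filters. The Python sketch below uses two made-up masking rules that are not seg or xnu and do not imitate their algorithms. One masks runs of identical residues, the other masks windows with few distinct letters. Because the second rule sees X characters left by the first when run in a pipeline, it makes different decisions than when it sees the original query.

```python
def mask_runs(seq, min_run=4):
    """Toy filter A: mask runs of >= min_run identical residues with X."""
    out = list(seq)
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        if j - i >= min_run:
            out[i:j] = "X" * (j - i)
        i = j
    return "".join(out)


def mask_low_diversity(seq, window=4, max_distinct=2):
    """Toy filter B: mask windows containing at most max_distinct letters."""
    out = list(seq)
    for start in range(len(seq) - window + 1):
        if len(set(seq[start:start + window])) <= max_distinct:
            out[start:start + window] = "X" * window
    return "".join(out)


def or_masks(query, *masked):
    """Mask a position if any of the independently filtered copies masked it."""
    return "".join(
        "X" if any(m[i] == "X" for m in masked) else residue
        for i, residue in enumerate(query)
    )


query = "AAAAKAKE"  # arbitrary example sequence
pipelined = mask_low_diversity(mask_runs(query))
independent = or_masks(query, mask_runs(query), mask_low_diversity(query))

print(pipelined)    # XXXXXAKE
print(independent)  # XXXXXXXE
```

In this contrived case the pipeline masks less than the OR-ed result, because the X characters introduced by the first filter raise the apparent diversity seen by the second. Real filters may differ in either direction.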
WU BLAST is certainly not bug free, but historically bugs have been fixed typically within 24 hours of their being reported. However, the software is currently unsupported, with no known date for resumption of its support. The currently known bugs are:
-C X” option
of xdformat,
which will replace any offending letters like J with the letter X.
However, the -C option of xdformat itself has a bug,
which causes the residue after the substituted one to be deleted.
A better solution would be to pre-process the database with a shell script that
substitutes X for J, O and U in the sequence portion of FASTA records.
A short Perl script that makes even more intelligent substitutions is available here.
The best solution would of course be an updated BLAST package that natively supports
these amino acid codes.
mformat=7),
and the -novalidctxok option is specified,
and a query sequence is encountered that actually does not contain any valid contexts
(given the nature of the query and any other parameters that may have been specified),
then the search program aborts and contaminates the HTML output with non-conforming text.
altscore option may
crash blastn.
gapsepQmax and gapsepSmax parameters
were first announced as being “deprecated”,
settings of the hspsepQmax and hspsepSmax parameters
have been ineffective
(i.e., their settings have been ignored by the software),
if-and-only-if the nogaps option was also specified.
To avoid this bug when using the nogaps option,
specify the same values for the
gapsepQmax parameter
as one would specify for hspsepQmax;
and do the same for the gapsepSmax parameter
as for hspsepSmax.
Warnings will be issued regarding the deprecated nature of
gapsepQmax and gapsepSmax,
but these can be safely ignored.
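For the pre-processing workaround mentioned above (substituting X for J, O and U in the sequence portion of FASTA records), a minimal Python sketch might look as follows. It is not the Perl script referred to in the text and makes no attempt at more intelligent substitutions. The function name is hypothetical, and mapping lowercase j, o and u to lowercase x is an assumption.

```python
def replace_unsupported_residues(fasta_lines):
    """Yield FASTA lines with J, O and U replaced by X in sequence lines only."""
    # Assumption: lowercase residues are mapped to lowercase x.
    table = str.maketrans("JOUjou", "XXXxxx")
    for line in fasta_lines:
        if line.startswith(">"):
            # Header (definition) lines are passed through untouched.
            yield line
        else:
            yield line.translate(table)


# Made-up two-line record for illustration.
record = [">seq1 JOU example protein\n", "MKJLOU\n", "ACDEF\n"]
cleaned = list(replace_unsupported_residues(record))
print("".join(cleaned), end="")
```

Because the function works on any iterable of lines, it can be applied to an open file object and its output written to a new FASTA file before running xdformat. Sequence length is preserved, since each letter is replaced one-for-one.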
WU BLAST also has some characteristics or behaviors worth mentioning here that could trip up or confuse even the most knowledgeable of BLAST users. Any unexpected behavior might rightfully be construed as being a bug, so the following information is provided to help avoid the unexpected. If you should encounter problems or confusing areas other than those described below, or if you have questions or suggestions, please send them to
gapK, gapL and gapH command line options should be used to set them.
"+1,-3", {3,3} {3,2} {3,1}
"+1,-2", {2,2} {2,1} {1,1}
"+3,-5", {10,5} {6,3} {5,5}
"+4,-5", {10,5}
"+1,-1", {3,1} {2,1}
"+5,-4", {20,10} {10,10}
"+5,-11", {22,22} {22,11} {12,2} {11,11}
and for the Purine-Pyrimidine scoring matrix named “pupy”:
pupy =
{ 20, 10}
{ 10, 10}
Precomputed values for λ, K and H are available for protein-level searches with the following scoring matrix and gap penalty combinations (or gap penalty ranges for R) {Q, R}:
blosum50 =
{ 16, 1-4}
{ 15, 1-4,6,8}
{ 14, 1-5,8}
{ 13, 1-5,8}
{ 12, 2-5,7}
{ 11, 2-4,6,8}
{ 10, 2-6,8}
{ 9, 3-5,7}
{ 8, 4-8}
{ 7, 6,7}
blosum55 =
{ 16, 1-4}
{ 15, 1-4,5,6,8}
{ 14, 1-5,7}
{ 13, 2-5,8}
{ 12, 2-5,8}
{ 11, 2-6,8}
{ 10, 3-6,9}
{ 9, 3-5,7}
{ 8, 4-8}
{ 7, 7}
blosum62 =
{ 12, 1-3}
{ 11, 1-3}
{ 10, 1-4}
{ 9, 1-5}
{ 8, 2-7}
{ 7, 2-6}
{ 6, 3-5}
{ 5, 5}
blosum80 =
{ 12, 2-12}
{ 11, 2-11}
{ 10, 2-10}
{ 9, 3-9}
{ 8, 4-8}
{ 7, 5-7}
pam40 =
{ 12, 1,2,6}
{ 11, 1,2,7}
{ 10, 1-3,7}
{ 9, 1-3,6}
{ 8, 1-4}
{ 7, 1-4}
{ 6, 2-5}
{ 5, 2-5}
{ 4, 3,4}
pam120 =
{ 12, 1,2,4}
{ 11, 1-3}
{ 10, 1-3,5}
{ 9, 1-3,5}
{ 8, 1-4,6}
{ 7, 2-4,6}
{ 6, 2-5}
{ 5, 3-5}
pam250 =
{ 16, 1-4}
{ 15, 1-5}
{ 14, 1-6}
{ 13, 1-6}
{ 12, 2-7}
{ 11, 2-7}
{ 10, 3-8}
{ 9, 3-7}
{ 8, 5-7}
{ 7, 7}
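The {Q, R} notation used in the tables above, with ranges and comma-separated values for R, can be expanded mechanically. The Python sketch below parses an R specification such as "1-4,6,8" and checks whether a given combination is listed. Only the blosum62 rows are transcribed here, and the function names are hypothetical.

```python
def parse_r_spec(spec):
    """Expand an R specification like '1-4,6,8' into a set of integers."""
    values = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            values.update(range(int(low), int(high) + 1))
        else:
            values.add(int(part))
    return values


# blosum62 rows transcribed from the table above: Q -> R specification.
BLOSUM62 = {
    12: "1-3", 11: "1-3", 10: "1-4", 9: "1-5",
    8: "2-7", 7: "2-6", 6: "3-5", 5: "5",
}


def has_precomputed(table, q, r):
    """Return True if the {q, r} combination is listed in the table."""
    spec = table.get(q)
    return spec is not None and r in parse_r_spec(spec)


print(has_precomputed(BLOSUM62, 9, 2))
print(has_precomputed(BLOSUM62, 12, 4))
```

The other matrices could be added as further dictionaries in the same form.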
The computing platforms currently supported by BLAST 2.0 include the following:
*X64 is shorthand for the AMD “AMD64” and Intel “EM64T” microprocessor architectures or technologies, which support 64-bit virtual addressing and improved instruction sets. X64 binaries often provide significantly better performance than their 32-bit counterparts built for the legacy 32-bit architecture known as X86. The X64 architecture is sometimes referred to by another vendor-agnostic name, X86_64.
The list of supported platforms is subject to change without notice. It was last updated 28-Jan-2006.
Multiple processors (multithreading or parallel processing) are effectively and efficiently supported by WU BLAST on all of the above platforms. The software also supports large files (files greater than 2 GB in size) when the underlying operating system and file system support large files (which is typically the case these days).
WU BLAST was the only BLAST available for Mac OS X when Mac OS X became publicly available, and for months thereafter. Under Mac OS X, WU BLAST has been observed to be the only BLAST that runs faster on multiple G4 processors, conditions in which some implementations actually run slower and are unstable. There is no evidence that vector processing instructions (such as those provided by Velocity Engine) increase the speed of BLAST searching, but these instructions certainly can restrict the software to running only on certain computers. WU BLAST obtains its superior speed through painstaking optimizations, without using specialized instructions, so users can run WU BLAST even on a G3. Because the software employs a command line interface, it can also be run on the freely available OpenDarwin operating system.
Please refer to the README.html file
that comes bundled with the software
for more detailed and specific installation instructions.
Low-complexity sequence filters or masking programs
(e.g., seg, xnu and dust)
are bundled in WU BLAST software packages.
Whatever directory you install the filter programs in,
the BLASTFILTER environment variable should be set to point there.
In the absence of this environment variable being set,
the programs look for masking programs in /usr/ncbi/blast/filter.
NOTE: unlike the NCBI search programs,
WU BLAST does not employ sequence filtering by default.
Databases can be downloaded from any of the many sources available on the Internet. After downloading, the database files are typically uncompressed and processed into FASTA format, then into a BLAST-able database format. Included with WU BLAST software are several utility programs for converting text-based database files into FASTA format:
The NCBI software Toolbox also contains parsers, including one named asn2fast that can convert both nucleotide and peptide sequences in GenBank ASN.1 format into FASTA format files.
All of the above parsers can read from standard input (sometimes signified by a dash, “-”), so their input files can be maintained on disk in compressed format and dynamically zcat-ed or gunzip-ed directly into the parsers, thus saving the time and storage required for the uncompressed data. To specify standard input for a required input filename argument, some of these programs require that a double-dash (--) precede the single-dash. This double-dash signifies the end of the command line options and the start of the required arguments.
Once databases are in FASTA format, the xdformat program is used to convert them into a blastable eXtended Database Format. Terse usage instructions for this program can be obtained by invoking it without command line arguments. When producing a blastable database, xdformat creates 3 or 4 output files whose names by default are derived from the name of the input FASTA-format file. The output files are given distinct filename extensions and together comprise the blastable database. More information about blastable database file formats is available here.
The blastable database files can be placed anywhere,
but the BLASTDB environment variable
should point to their directory location.
If the BLASTDB environment variable is not set,
the programs look for databases in /usr/ncbi/blast/db
and in the current working directory.
On systems where NCBI BLAST will not be used,
databases can be maintained in multiple directories listed
in the BLASTDB environment variable,
delimiting the directory names with colons,
just as directory names are delimited
in the PATH environment variable used by UNIX command shells.
On multi-processor computer systems, the search programs will by default employ as many CPUs as are installed (or up to 4 CPUs in the case of BLASTN). Using too many processors — sometimes even two processors — can be inefficient or lead to prohibitive memory requirements. Depending on how many processors and how much memory are installed in your computer, you may want to wrap the search programs in a shell script that sets a lower number of CPUs via the cpus=# command line option. Another approach to changing the default number of CPUs follows below, for BLAST managers brandishing “root” or “SuperUser” privileges.
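The wrapper idea above is described in terms of a shell script. The same logic is sketched here in Python as one possible approach. It builds an argument list that caps cpus=# at a chosen limit. The positional order (program, database, query) follows the example command shown earlier in this document. The function name and the limit of 2 are assumptions, and nothing is executed.

```python
def capped_blast_argv(program, database, query, options=(), max_cpus=2):
    """Build a search command line whose cpus=# value never exceeds max_cpus."""
    kept = []
    requested = None
    for opt in options:
        if opt.startswith("cpus="):
            # Remember the caller's request but do not pass it through as-is.
            requested = int(opt.split("=", 1)[1])
        else:
            kept.append(opt)
    cpus = max_cpus if requested is None else min(requested, max_cpus)
    return [program, database, query, *kept, f"cpus={cpus}"]


argv = capped_blast_argv("blastn", "pri rod mam vrt htg", "myquery.nt")
print(argv)
```

The resulting list could be handed to subprocess.run on a system where the search programs are installed. As the text notes for wrappers in general, a user can bypass this by calling the programs directly.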
A sample file named sysblast.sample is bundled with the software,
to help in establishing system-wide configuration parameters
governing the behavior of BLAST processes.
When installed under the name /etc/sysblast,
The file /etc/sysblast resides in a directory
that is local to any given computer,
so parameter values can be configured differently for different computers,
even if the software itself is accessed from a shared disk partition.
sysblast is only effective if installed
in the /etc directory;
and the /etc directory should only be writable by “root”.
See the comments included in the sysblast.sample file
for further details.
Unlike the shell script wrapper approach described earlier,
limits set in /etc/sysblast cannot be easily
or unwittingly circumvented.
Citations or acknowledgements of WU-BLAST usage are greatly appreciated, as are any personal accounts of how the software is being used that you might wish to share. When URLs are acceptable, please cite with:
Gish, W. (1996-2004) http://blast.wustl.edu
When URLs are not acceptable, please use:
Gish, W., personal communication.
The WU-BLAST unified search program may also be referred to by the name BLASTA.
In scientific communications, it is important to report the program name, as well as the specific version(s) used. In the case of WU-BLAST or BLASTA, the version is a combination of the "2.0" moniker and the release date. The release date can be found on the first line of output, and it is the first date displayed. For example, consider this introductory line of output:
BLASTN 2.0MP-WashU [02-Apr-2002] [sol8-ultra-ILP32F64 2002-04-03T01:25:46]
In the above, the software release date is April 2, 2002, whereas the build date of the Solaris 8 UltraSPARC binary executable was April 3rd at 1:25 AM.
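When many result files need to be documented, the program name, version and release date can be pulled from the first line of output automatically. The Python sketch below parses the example banner shown above. The regular expression is inferred from that single example and is an assumption, so banners from other builds may need adjustments.

```python
import re
from datetime import datetime

# Pattern inferred from the one example banner in the text (an assumption).
BANNER = re.compile(
    r"^(?P<program>\S+)\s+(?P<version>\S+)\s+"
    r"\[(?P<release>[^\]]+)\]\s+"
    r"\[(?P<platform>\S+)\s+(?P<built>[^\]]+)\]"
)


def parse_banner(line):
    """Split a banner line into program, version, release and build details."""
    match = BANNER.match(line)
    if match is None:
        raise ValueError("unrecognized banner line")
    info = match.groupdict()
    # The build timestamp in the example is an ISO 8601 date and time.
    info["built"] = datetime.strptime(info["built"], "%Y-%m-%dT%H:%M:%S")
    return info


line = "BLASTN 2.0MP-WashU [02-Apr-2002] [sol8-ultra-ILP32F64 2002-04-03T01:25:46]"
info = parse_banner(line)
print(f"{info['program']} {info['version']} [{info['release']}]")
```

The release date is kept as the original string, since that is the form to report alongside the program name and version.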
WU BLAST 2.0 is the original gapped BLAST with statistics. It builds upon BLAST 1.4 written by Warren Gish in 1994, while a fellow at the NCBI. (See http://blast.wustl.edu/blast-1.4; Altschul et al., 1990; Gish and States, 1993). Both NCBI BLAST and WU BLAST 1.4 (but not 2.0) are in the public domain.
Development of BLAST version 2 with gapped alignments was begun by W. Gish as an independently funded research and development effort at Washington University in late 1994, where it continues as such today. WU BLAST 2.0 was initially released as free, copyrighted software in May 1996, before the NCBI expressed an interest in pursuing this area and 16 months ahead of the NCBI releasing its public domain gapped alignment tools in September 1997. Beginning in October 1997, in response to the NCBI release, more advanced versions of WU BLAST were made available only under the covenant of a license agreement. The last freely available (and now long obsolete) version of WU BLAST is 2.0a19, posted in February 1998.
Historical notes and additional citation information for some earlier versions of NCBI and WU BLAST include:
Altschul, SF, and W Gish (1996). Local alignment statistics. ed. R. Doolittle. Methods in Enzymology 266:460-80.
Altschul, SF, and DJ Lipman (1990). Protein database searches for multiple alignments. Proc. Natl. Acad. Sci. USA 87:5509-13.
Altschul, SF, Gish, W, Miller, W, Myers, EW, and DJ Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10.
Altschul, SF, Madden, TL, Schaffer, AA, Zhang, J, Zhang, Z, Miller, W, and DJ Lipman (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17):3389-402.
Claverie, JM, and DJ States (1993). Information enhancement methods for large scale sequence analysis. Computers in Chemistry 17:191-201.
Gish, W, and DJ States (1993). Identification of protein coding regions by database similarity search. Nature Genetics 3:266-72.
Hancock, JM, and JS Armstrong (1994). SIMPLE34: an improved and enhanced implementation for VAX and Sun computers of the SIMPLE algorithm for analysis of clustered repetitive motifs in nucleotide sequences. Comput. Appl. Biosci. 10:67-70.
Karlin, S, and SF Altschul (1990). Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. USA 87(6):2264-8.
Karlin, S, and SF Altschul (1993). Applications and statistics for multiple high-scoring segments in molecular sequences. Proc. Natl. Acad. Sci. USA 90:5873-7.
Pearson, WR, and DJ Lipman (1988). Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85(8):2444-8.
Smith, TF, and MS Waterman (1981). Identification of common molecular subsequences. J. Mol. Biol. 147:195-7.
States, DJ, and W Gish (1994). Combined use of sequence similarity and codon bias for coding region identification. J. Comp. Biol. 1:39-50.
Wootton, JC, and S Federhen (1993). Statistics of local complexity in amino acid sequences and sequence databases. Computers in Chemistry 17:149-63.
Wootton, JC, and S Federhen (1996). Analysis of compositionally biased regions in sequence databases. ed. R. Doolittle. Methods in Enzymology 266:554-71.
Zhang, Z, Schaffer, AA, Miller, W, Madden, TL, Lipman, DJ, Koonin, EV, and SF Altschul (1998). Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res. 26:3986-90.
Copyright © 2005 Warren R. Gish, Saint Louis, Missouri 63108 USA. All rights reserved.