Index of /network/experimental/unix

Icon   Name                    Last modified      Size  Description
[DIR]  Parent Directory                              -
[TXT]  INSTALL                 25-Apr-1995 21:07  4.3K
[TXT]  Makefile                10-Jul-1996 12:56  1.9K
[TXT]  blast.c                 05-Oct-1999 17:08   23K
[TXT]  blastd.README           14-Mar-1997 00:36  9.2K
[TXT]  blastd.c                21-Feb-1997 17:18   67K
[TXT]  blastsrv.h              25-Apr-1995 21:03  3.1K

Last Revision September 1994

Thank you for your interest in the NCBI's Experimental BLAST Network Service
for performing sequence similarity searches of standard peptide and nucleotide
sequence databases.  This service is available for use from potentially any
computer connected to the Internet/NSFNet.  Client interfaces may be readily
obtained for SunOS, Solaris, SGI IRIX, and other UNIX platforms; Macintosh with
MacTCP; and DEC VMS with some kinds of TCP/IP driver support.  Limited support
is available for MS-DOS (but not MS-Windows).

Please do not contact the NCBI about any problems that may arise with the
installation or operation of this experimental software, but contact the
author of the specific client program involved instead.

*** Before the client software will work, client computer Internet addresses
*** must be registered with the NCBI as described below.


*** The Experimental BLAST Network Service is provided free of charge.
It is a publicly shared resource; please treat it as such by limiting the
amount of production-level work that is asked of it. ***

Public-domain, UNIX-compatible source code for the BLAST application programs
themselves is available via anonymous FTP on ncbi.nlm.nih.gov (130.14.25.1)
beneath the /pub/blast directory.

Requests to use the network service should be sent via electronic mail or
printed on institutional letterhead and addressed to:

        Network BLAST Registration
        National Center for Biotechnology Information
        National Library of Medicine
        Building 38A, Room 8N-806
        8600 Rockville Pike
        Bethesda, MD 20894-0001
        (301) 496-2475
        FAX:  (301) 480-9241
        e-mail:  blast-help@ncbi.nlm.nih.gov

In the request, please describe the nature of the intended use of the service
(e.g., general molecular biology or genome studies), the approximate number of
users, and the estimated average number of searches that will be asked of the
system per day or per week.  Also provide the name, postal address, Internet
electronic mail address, and telephone number of the designated computer
system administrator(s) at your institution who will install the client
software, field questions about the service from its users, and inform the
local community of pending changes to the service.  The Internet address in
dotted-decimal notation (e.g., 130.14.25.1) of each computer that will need
access to the service must also be provided.  If all hosts on a subnet are
"trusted" and many of them will need access to the service, the entire subnet
may be registered to use the service in order to reduce the administrative
burden of maintaining a dynamic access control list.  A description of the
client computer hardware and operating systems would be helpful but is not
required.
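
The idea of registering individual hosts or a whole subnet can be illustrated
with a short sketch.  This is not the NCBI's access control code; the entry
list, the function names, and the 192.0.2.0/24 subnet (a range reserved for
documentation) are assumptions made for the example.

```python
import ipaddress

# Hypothetical registration entries: one single host and one whole subnet.
# These are illustrative values, not actual registrations.
REGISTERED = ["130.14.25.1", "192.0.2.0/24"]


def build_acl(entries):
    """Parse dotted-decimal hosts and subnets into network objects.

    A bare host address such as "130.14.25.1" becomes a /32 network,
    so hosts and subnets can be checked the same way.
    """
    return [ipaddress.ip_network(entry) for entry in entries]


def is_registered(client, acl):
    """Return True if the client address falls inside any registered entry."""
    address = ipaddress.ip_address(client)
    return any(address in network for network in acl)


acl = build_acl(REGISTERED)
print(is_registered("192.0.2.77", acl))    # host inside the registered subnet
print(is_registered("198.51.100.5", acl))  # host that was never registered
```

Registering the subnet once covers every host in it, which is why a single
subnet entry is less work to maintain than a list of individual addresses.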

**************
* Users of the experimental service should be aware that the software and the
* databases are evolving rapidly.  The current "experimental" service will
* be supplanted by a new form of the service within the coming months, and
* a transition period of approximately one year is expected, during which time
* both the experimental and the new services will be provided.  As with other
* NCBI services, an effort is made to support as broad a range of computing
* platforms as possible, but it is not possible at the present time to provide
* a list of the specific platforms that will be supported by client software
* developed by the NCBI to access the new BLAST network service.  Users should
* receive ample prior notice of any substantive changes to the service, via the
* introductory blast service output, network news, e-mail to the network
* service administrators, and in the free "NCBI News" newsletter.
**************

The programs currently available via the network service are:

  blastp  - protein query sequence vs. protein sequence database
  blastn  - nucleotide query sequence (both strands) vs. nt. sequence database
  blastx  - nt. query translated in 6 frames vs. protein database
  tblastn - protein query vs. nt. database translated in 6 frames
  blast3  - protein query vs. protein database, identifying 3-way alignments
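
To make "translated in 6 frames" concrete, the sketch below translates a
nucleotide sequence in the three forward frames and the three frames of its
reverse complement, using the standard genetic code.  It is an illustration
only, not the code used by blastx or tblastn; the function names and the
choice of "X" for codons containing ambiguous bases are assumptions.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons not in the table become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return translations keyed by frame: +1, +2, +3, -1, -2, -3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


for frame, peptide in sorted(six_frames("ATGGCC").items()):
    print(frame, peptide)
```

For the query ATGGCC, frame +1 reads ATG GCC and gives "MA", while frame -1
reads the reverse complement GGC CAT and gives "GH".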

Nearly all of the programs' command line parameters are user-modifiable through
the network service, although client software may limit or guide the user to
particular choices of parameters.  The main functional restriction between
using the network service and running the same programs locally is:  the client
is restricted to searching the databases and to using the substitution scoring
matrix files that reside on the server.

While one of the features of the BLAST algorithm is that, with an appropriate
choice of parameters, sensitivity can be sacrificed for an increase in speed,
the NCBI's default parameters for BLASTP have been chosen to achieve a high
degree of sensitivity (marginally significant matches are rarely missed);
good speed is nevertheless achieved by employing parallel processing.  The
default parameters for BLASTX and TBLASTN achieve a moderately high degree of
sensitivity (occasionally, matches which are only marginally significant will
be missed), while the default BLASTN parameters have been chosen exclusively
for speed (marginally significant matches are often missed while significant
matches are rarely missed).  Independently of the chosen level of sensitivity,
all of the BLAST programs retain their selective nature of only reporting the
subset of matches they find which satisfy a statistically determined cutoff
score.

At the time of this writing, the default cutoff score is chosen by the BLAST
programs so that about 10 matches are expected to satisfy it by chance alone;
it is therefore not surprising to find at least a few matches to almost any
query sequence when searching with the more sensitive programs (i.e., not
BLASTN), but the lowest-scoring matches reported may be statistically
insignificant.  To reduce
the amount of low-significance output produced, the cutoff score may be raised
directly via the S command line parameter or indirectly with the E (expect)
command line parameter.  Many uninteresting matches can often be eliminated
from the output by using query sequence filtering options.  Please refer to the
UNIX-style blast manual page included with the BLAST application program source
code for a description of these and other parameters.
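
The link between the E and S parameters can be sketched with the
Karlin-Altschul relation E = K * m * n * exp(-lambda * S), where m and n are
the query and database lengths.  This notice does not say how the programs
compute the cutoff, so treat the following as an illustration under that
assumption; the values of K, lambda, m, and n are placeholders, since the real
values depend on the scoring matrix, the query, and the database searched.

```python
import math


def expected_matches(score, m, n, lam, k):
    """Expected number of chance matches scoring at least `score`."""
    return k * m * n * math.exp(-lam * score)


def cutoff_for_expect(expect, m, n, lam, k):
    """Smallest integer score whose expected chance count is <= `expect`."""
    return math.ceil(math.log(k * m * n / expect) / lam)


# Placeholder values chosen only for illustration.
QUERY_LENGTH = 250             # m, residues in the query
DATABASE_LENGTH = 10_000_000   # n, residues in the database
LAMBDA = 0.318                 # scale parameter of the scoring system
K = 0.134                      # search-space constant of the scoring system

for expect in (10, 1, 0.01):
    s = cutoff_for_expect(expect, QUERY_LENGTH, DATABASE_LENGTH, LAMBDA, K)
    print(f"E = {expect:<5} ->  S = {s}")
```

Lowering E raises the cutoff score S, so fewer low-significance matches are
reported; setting S directly has the same effect without going through E.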

*** Because the format of output from the BLAST programs through the
experimental BLAST network service is subject to change in subtle and not so
subtle ways without notice, users are strongly discouraged from developing
specialized software to parse this output or from otherwise becoming too
dependent on its format.  On the other hand, the new BLAST service will provide
structured output in messages specified in ASN.1; users will be encouraged to
parse its output instead.  ***


The databases currently available via the network service are listed below.
For either peptide or nucleotide sequence database searches, the "nr" database
provides an efficient and effective way to search all of the component
databases.

PEPTIDE SEQUENCE DATABASES
  nr         - a non-redundant merger of the PDB, PIR, SWISS-PROT, GenPept,
               SWISS-PROT weekly update, and GenPept daily update, compiled
               approximately daily.  Only 100% identical sequences are merged
               into a single entry in the "nr" database with each of their
               associated description lines concatenated into one.
  pir        - the last quarterly release of the NBRF PIR protein database.
  swissprot  - the last quarterly release of the SWISS-PROT protein database.
  spupdate   - cumulative weekly update to the last major release of SWISS-PROT
  genpept    - coding sequence translations from the major release of GenBank(R)
  gpupdate   - coding sequence translations of the cumulative daily updates
               to the major release of GenBank(R)
  pdb        - sequences derived from the 3-D structures in the Brookhaven
               Protein Data Bank
  kabatpro   - Kabat's Sequences of Proteins of Immunological Interest
  tfd        - TFD transcription factor amino acid sequence database
  alu        - Translations of select Alu repeats from REPBASE

NUCLEOTIDE SEQUENCE DATABASES
  nr         - a non-redundant merger of PDB, GenBank(R), the GenBank(R)
               cumulative daily updates, EMBL Data Library, and the EMBL
               weekly update, compiled approximately daily.  Only 100%
               identical sequences are merged into a single entry with each
               of their associated description lines concatenated into one.
  genbank    - the last major release of GenBank(R)
  gbupdate   - cumulative daily updates to the GenBank(R) major release.
  embl       - the last quarterly release of the EMBL Data Library
  emblu      - the cumulative weekly updates to the EMBL Data Library
  pdb        - sequences derived from the 3-D structures in the Brookhaven
               Protein Data Bank
  repbase    - Human and other primate Alu repeats, Dr. Jerzy Jurka
  alu        - Select Alu repeats from REPBASE
  epd        - eukaryotic promoter database
  kabatnuc   - Kabat's Sequences of Nucleic Acid of Immunological Interest
  dbest      - Database of Expressed Sequence Tags, NCBI
  vector     - a vector subset of GenBank
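
The merging rule described for the two "nr" databases (only 100% identical
sequences share an entry, and their description lines are concatenated) can be
sketched as follows.  This is not the NCBI's build procedure; the record
layout, the function name, the separator string, and the sample records are
assumptions made for the example.

```python
# Hypothetical separator placed between concatenated description lines.
SEPARATOR = " | "


def merge_identical(records):
    """Merge records whose sequences are exactly identical.

    `records` is an iterable of (description, sequence) pairs.  Sequences
    that differ by even one residue are kept as separate entries.
    """
    merged = {}
    for description, sequence in records:
        merged.setdefault(sequence, []).append(description)
    # Dictionaries keep insertion order, so the entries come out in the
    # order in which each distinct sequence was first seen.
    return [
        (SEPARATOR.join(descriptions), sequence)
        for sequence, descriptions in merged.items()
    ]


# Made-up records standing in for entries from component databases.
records = [
    ("entry A (from pir)", "MKVLA"),
    ("entry B (from swissprot)", "MKVLA"),
    ("entry C (from genpept)", "MKVLAG"),
]

for description, sequence in merge_identical(records):
    print(description, sequence)
```

Entries A and B collapse into one record with both descriptions, while entry
C stays separate because its sequence is one residue longer.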

When publishing results of a BLAST search, please mention that the search was
performed at the NCBI using the GENINFO(R) Experimental BLAST Network Service
and cite the following publication for the algorithm employed by the programs
BLASTP, BLASTN and TBLASTN:

Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers and
David J. Lipman (1990).  Basic local alignment search tool.  J. Mol. 
Biol. 215:403-410. 

For BLASTX, please cite:

Gish, Warren and David J. States (1993).  Identification of protein coding
regions by database similarity search.  Nature Genetics 3:266-272.

And for the BLAST3 program, please cite:

Altschul, Stephen F. and David J. Lipman (1990).  Protein database
searches for multiple alignments.  Proc. Natl. Acad. Sci. USA 87:
5509-5513.