Several characteristics of a BLAST search will determine its heap memory requirements. As BLAST is currently implemented, memory requirements are significantly different from earlier versions of the software. Today, the memory required is often dominated by a linear function of the length of the query sequence (with a proportionality constant of at least 5 but potentially many fold larger than this), plus up to 2 times the length of the longest database sequence (to accommodate its full-length translation into protein in the TBLASTN and TBLASTX search modes). The database component should also be multiplied by the number of CPUs or threads employed.
For older, rote implementations of the “classical” ungapped BLAST algorithm (Altschul et al., 1990), the contributors to memory use include:
As an example, not counting the memory required to save intermediate results and for post-processing, the storage required (in bytes) for a classical BLASTN search is approximately: 5SQ + C[8S(Q+D) + D] + B, where S=1 or S=2 depending on whether one or both strands of the query are searched. In this example, B will be no more than (and often much less than) P4W. Using one processor or thread, a typical BLASTN search using both strands of the query and database will require at least 26Q + 17D + B bytes. If an additional processor or thread is used, the minimum memory requirement will increase by 16Q + 17D, for a total of 42Q + 34D + B bytes.
Memory requirements for the current BLAST implementation are often greatly reduced from the above.
Clearly, when long database sequences are involved,
memory requirements can be reduced
by limiting the number of threads employed (C) on multiprocessor systems.
The default behavior is to use all available processors
(one thread per processor)
in the case of BLASTP, BLASTX, TBLASTN and TBLASTX;
and up to 4 processors in the case of BLASTN.
This default behavior can be altered in a local file named /etc/sysblast.
(An example sysblast.sample file is provided in licensed BLAST 2.0
software distributions).
The most efficient use of central processing unit (CPU) resources is
perhaps most often obtained by limiting individual BLAST jobs
to using just a single thread (cpus=1),
so that the computational overhead of thread creation,
synchronization, thread-local memory allocation and destruction can be avoided.
Single-threading of BLAST jobs is less likely to be beneficial
when searches are performed only sporadically (as opposed to being executed
virtually non-stop in a high-throughput pipeline)
or if the database(s) being searched can not all be cached in memory.
File caching allows all CPUs to run virtually unhindered
without contending for services from slow I/O devices, such as locally
attached or network attached disk drives (see below).
If a multiprocessor computer system with n CPUs
is to run single-threaded BLAST jobs,
to avoid the inefficiency of letting the other n-1 CPUs sit idle,
some mechanism for running multiple, single-threaded BLAST jobs
should be employed — e.g., Sun Grid Engine.
Running many single-threaded BLAST jobs simultaneously is not always
practical, however.
A major component of BLAST memory requirements is correlated
with the length of the query sequence and is independent of the number
of threads employed.
In some memory limiting cases
— particularly when the queries are very long
and the database sequences are shorter —
allowing BLAST to use multiple threads may provide the most
efficient utilization of CPU resources.
When activated for a computer system, Intel HyperThreads typically appear like separate processors to application programs like BLAST. BLAST can spawn an additional thread of execution for each HyperThread. Use of HyperThreads often speeds up a search, but with the attendant increase in memory required for each additional thread.
Further memory savings can be had by requesting that just one strand be searched at a time, because the default behavior is to search both strands of the query and/or database. While it does not reduce heap storage requirements, processor address space can be conserved by using the -mmio option, which may then allow more of the address space to be utilized for heap memory allocation; the -mmio option is not recommended for general use, however, because memory-mapped I/O is faster than buffered I/O.
Sufficient real memory should be provided to the search programs that they can run without spilling over into virtual memory swap storage, as it can be disastrous to BLAST performance to be hitting disk.
Long queries and database sequences may exceed the limits of 32-bit virtual addressing and require 64-bit processing. On computing platforms that support 64-bit virtual addressing, WU BLAST often supports both 32-bit and 64-bit virtual addressing, in separately distributed software distributions. While 32-bit virtual addressing may at first seem out-dated and a waste of a 64-bit computer, if a given problem can be computed within 32-bit limits, less memory will be required and execution may proceed significantly faster than it would in 64-bit mode. Even when 32-bit virtual addressing will suffice, a 64-bit computer can be advantageous, in that the 64-bit system may be configurable with more memory in which to cache database files, more memory in which to utilize more threads on a given BLAST job, or more memory in which to run more jobs simultaneously.
For more information on 32-bit versus 64-bit computing and the proper configuration of operating systems to support large memory, please see http://blast.wustl.edu/blast/KernelTuning.html.
Beyond the above requirements for program storage, any additional memory available may improve BLAST performance, through caching of database files in what would otherwise be unused memory. When the same database(s) are to be searched repeatedly (e.g., by an automated analysis pipeline), caching of the database files avoids the latency of disk I/O and potential contension between different processes for the same disk resources.
Regardless of where they reside, on a local disk or over the network, files are usually cached as they are written or read by an operating system, in a FIFO (first in/first out) manner. If sufficient memory is only available to cache a subset of BLAST database files that are to be repeatedly searched, file caching will not improve throughput unless the job stream is restructured. The restructuring should be designed to search all queries against one cache-able database subset before proceeding to search all queries against the next cacheable subset, and so on, until all of the subsets have been searched. In this manner, the database files can be cached for every search, except of course for the first search against each subset which primes the cache.
How much additional memory is useful for file caching?
Typical BLAST searches involve a sequential search through an entire database.
Each search then requires that the entirety of the ntdb.xns or aadb.xps file be read, in addition to the associated .x[np]t file.
For any database hits, the associated .x[np]d file will be read to obtain
the sequence descriptions.
Sufficient memory should be available to cache the .x[np]s and .x[np]t files,
plus some additional memory for perhaps a few thousand database descriptions.
Adding memory is unlikely to improve performance
if there will still be insufficient memory with which to cache
the entire .x[np]s and .x[np]t files,
due to the FIFO nature of cache management.
File caching occurs automatically with contemporary operating systems, albeit with varying degrees of effectiveness. For instance, at least some versions of Linux on IA32 platforms seem to have difficulty caching "large" files (files on the order of 1 GB or larger). One should be wary of other jobs executing simultaneously with BLAST, as well, whose actions may purge the cache of BLAST database files. If other jobs besides BLAST are active on the computer system, sufficient memory should be provided for them to function within memory, as well, on top of the requirements for the BLAST algorithm and BLAST database file caching.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. 215:403-10.
Last updated: 2006-02-16
Copyright © 2005-2006 Warren R. Gish, Saint Louis, Missouri 63110 USA. All rights reserved.