Genes in an organism's DNA (its genome) encode information about proteins, the molecules that do most of a cell's work. A typical bacterial genome contains on the order of 5,000 genes. Mammalian genomes can contain tens of thousands of genes. For each genome sequenced, the challenge is to identify the protein components (the proteome) that are actively used under a given set of conditions. Fundamentally, sequence alignment is a sequence matching problem focused on unlocking the protein information embedded in the genetic code, making it possible to assemble a "tree of life" by comparing new sequences against all sequences from known organisms. However, the memory footprint of sequence data is growing more rapidly than per-node core memory. Despite years of research and development, high-performance sequence alignment applications either do not scale well, cannot accommodate very large databases in core, or require special hardware. We have developed a high-performance sequence alignment application, ScalaBLAST, which accommodates very large databases and scales linearly to thousands of processors on both distributed-memory and shared-memory architectures, a substantial improvement in scaling and portability over the current state of the art in high-performance sequence alignment. ScalaBLAST relies on a collection of techniques (distributing the target database over available memory, multilevel parallelism to exploit concurrency, parallel I/O, and latency hiding through data prefetching) to achieve high performance and scalability. This demonstrated approach of database sharing combined with effective task scheduling should have broad-ranging applications in other informatics-driven sciences.
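
To make two of the techniques named above concrete (distributing the target database over available memory, and hiding latency by prefetching), the following is a minimal single-process sketch in Python. It is not ScalaBLAST's implementation: the function names (`partition`, `fetch_shard`, `kmer_score`, `best_hit`, `search`) are hypothetical, the "nodes" are simulated by list slices, a background thread stands in for a remote memory read, and the scoring function is a toy shared k-mer count rather than the BLAST algorithm.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(database, n_nodes):
    """Split the target database into contiguous shards, one per simulated node."""
    size = max(1, -(-len(database) // n_nodes))  # ceiling division, at least 1
    return [database[i:i + size] for i in range(0, len(database), size)]


def fetch_shard(shards, idx):
    """Stand-in for a remote read; a real system would copy from another node's memory."""
    return list(shards[idx])


def kmer_score(query, target, k=3):
    """Toy similarity (assumption, not BLAST): distinct k-mers shared by both sequences."""
    q = {query[i:i + k] for i in range(len(query) - k + 1)}
    t = {target[i:i + k] for i in range(len(target) - k + 1)}
    return len(q & t)


def best_hit(query, shards):
    """Scan every shard, prefetching shard i+1 while shard i is being scored."""
    best = (-1, None)
    if not shards:
        return best
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_shard, shards, 0)
        for idx in range(len(shards)):
            current = pending.result()  # wait for the shard requested earlier
            if idx + 1 < len(shards):
                # Request the next shard before scoring, so the fetch overlaps the compute.
                pending = pool.submit(fetch_shard, shards, idx + 1)
            for target in current:
                score = kmer_score(query, target)
                if score > best[0]:
                    best = (score, target)
    return best


def search(queries, database, n_nodes=2):
    """Return the best-scoring target for each query over the sharded database."""
    shards = partition(database, n_nodes)
    return {q: best_hit(q, shards) for q in queries}
```

For example, `search(["MKTAYIAK"], ["MKTAYIAKQR", "GAVLIPFMW", "MKTAYGGGQR", "QQQQQQQQ"])` scans two shards of two sequences each and reports the first sequence as the best hit. The point of the sketch is the ordering inside `best_hit`: the request for the next shard is issued before the current shard is scored, so the transfer time can be hidden behind computation. The other techniques the abstract lists, multilevel parallelism and parallel I/O, are not modeled here.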