The distribution of all trinucleotide microsatellite sequences in the GenBank database was surveyed to provide insight into human genetic disease syndromes that result from microsatellite expansion. The motif (CAG)n is one of the most abundant microsatellite motifs in human GenBank DNA sequences and is the most abundant microsatellite found in exons. This abundance may explain why (CAG)n repeats are, thus far, the predominant microsatellites expanded in human genetic diseases. Surprisingly, (CAG)n microsatellites are excluded from intronic regions in a strand-specific fashion, possibly because of their similarity to the 3' consensus splice site, CAGG. A comparison of microsatellite positions in homologous human and rodent sequences indicates that some arrays are not extensively conserved over long periods of time, even when they form part of protein-coding sequences. The general lack of conservation of trinucleotide repeat loci among diverse mammals indicates that animal models for some human microsatellite expansion syndromes may be difficult to find. © 1994 Academic Press, Inc.
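A survey of this kind boils down to scanning sequences for tandem trinucleotide runs and tallying them by motif. The following is a minimal Python sketch of that idea, not the survey's actual procedure. It assumes a hypothetical threshold of at least five tandem copies (`MIN_UNITS`) and uses hypothetical helper names (`find_trinucleotide_repeats`, `rotation_class`, `tally`). Rotations of a motif (CAG, AGC, GCA) are merged because the phase at which a run happens to be read is arbitrary. Reverse complements are deliberately kept apart, so (CAG)n and (CTG)n can be compared strand by strand.

```python
import re
from collections import Counter

# Hypothetical threshold (an assumption, not taken from the survey):
# a run must contain at least this many tandem copies to be counted.
MIN_UNITS = 5


def find_trinucleotide_repeats(seq, min_units=MIN_UNITS):
    """Return (start, unit, n_units) for each tandem trinucleotide run.

    Runs are read on the given strand only, so strand information is kept.
    Homopolymer runs such as AAAAAA... are skipped.
    """
    seq = seq.upper()
    # A 3-base unit followed by at least (min_units - 1) exact copies of itself.
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_units - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # mononucleotide run, not a trinucleotide repeat
        hits.append((m.start(), unit, len(m.group(0)) // 3))
    return hits


def rotation_class(unit):
    """Lexicographically smallest rotation, e.g. CAG, AGC, GCA -> AGC.

    Only rotations are merged. Reverse complements are NOT, so CAG-type
    runs (class AGC) and CTG-type runs (class CTG) are tallied separately.
    """
    return min(unit[i:] + unit[:i] for i in range(3))


def tally(seq, min_units=MIN_UNITS):
    """Count repeat runs in seq by strand-specific rotation class."""
    return Counter(rotation_class(unit)
                   for _, unit, _ in find_trinucleotide_repeats(seq, min_units))


# Toy input: a (CAG)6 run followed by a (CTG)5 run on the same strand.
example = "TTCAGCAGCAGCAGCAGCAGTT" + "GGCTGCTGCTGCTGCTGAA"
print(find_trinucleotide_repeats(example))
print(tally(example))
```

In the example, the (CTG)n run is first read in the GCT phase. It is then filed under its rotation class CTG, separately from the CAG run's class AGC. Merging reverse complements as well would collapse both runs into a single class and erase the strand asymmetry described above.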