Objectives: Molecular studies of genetic polymorphisms are carried out for many applications, such as the study of genetic disorders in different populations, pharmacogenomics, genetic identification of ethnic groups for forensic and legal purposes, genetic identification of breed/stock in animals and plants for commercial purposes, and conservation of germplasm. Assuming a random sampling scheme, we address two questions: (A) What is the minimum sample size needed so that, with a prespecified probability, all alleles at a given locus (or haplotypes at a given set of loci) are detected? (B) What sample size is needed so that the allele frequency distribution at a given locus (or haplotype frequency distribution at a given set of loci) is estimated reliably within permissible error limits?

Methods: We used combinatorial probabilistic arguments and Monte Carlo simulations to answer these questions.

Results: We found that the minimum sample size required in case A depends mainly on the prespecified probability of detecting all alleles, while in case B it varies greatly with the permissible error in estimation (which in turn varies with the application). We obtained the minimum sample sizes for different degrees of polymorphism at a locus under both a high-stringency and a relaxed level of permissible error. We present a detailed sampling procedure for estimating allele frequencies at a given locus, which will be of use in practical applications.

Conclusion: Since the sample size required for reliable estimation of the allele frequency distribution increases with the number of alleles at the locus, there is a strong case for using biallelic markers (like single nucleotide polymorphisms) when the available sample size is about 800 or less.

Copyright (C) 2001 S. Karger AG, Basel.
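
Question (A) can be illustrated with a small calculation. The sketch below is not necessarily the authors' derivation. It is a standard inclusion-exclusion argument that assumes the allele frequencies are known and that gene copies are sampled independently at random, so that each diploid individual contributes two independent copies. The function names `prob_all_alleles_detected` and `min_copies` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def prob_all_alleles_detected(freqs, n_copies):
    """Probability that every allele is seen at least once among n_copies
    independently sampled gene copies (assumed sampling model).

    Inclusion-exclusion over the set of alleles that could be missing:
    P = sum over subsets S of (-1)^|S| * (1 - sum of freqs in S)^n_copies.
    """
    k = len(freqs)
    total = 0.0
    for r in range(k + 1):
        sign = -1.0 if r % 2 else 1.0
        for missing in combinations(freqs, r):
            # Clamp at zero to guard against tiny negative rounding errors.
            remaining = max(0.0, 1.0 - sum(missing))
            total += sign * remaining ** n_copies
    return total


def min_copies(freqs, target=0.95, max_copies=100_000):
    """Smallest number of gene copies for which all alleles are detected
    with probability at least `target` (simple linear search)."""
    for n in range(1, max_copies + 1):
        if prob_all_alleles_detected(freqs, n) >= target:
            return n
    raise ValueError("target probability not reached within max_copies")


if __name__ == "__main__":
    # Hypothetical biallelic locus with equal allele frequencies.
    copies = min_copies([0.5, 0.5], target=0.95)
    individuals = (copies + 1) // 2  # two gene copies per diploid individual
    print(copies, individuals)
```

For the equal-frequency biallelic example, the detection probability with n copies is 1 - 2(0.5)^n, which first reaches 0.95 at n = 6 gene copies, or 3 diploid individuals under the stated assumption. The subset enumeration grows as 2^k, so this sketch is practical only for loci with a modest number of alleles; rare alleles raise the required sample size sharply.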