Dysfunctions of the genes coding for the two chains of human type-I procollagen result in genetic disorders that affect the integrity of bone, ligaments, tendons, and other connective tissues. While the primary amino acid (aa) sequence of one of the two type-I subunits, proα2(I), has been derived in its entirety from the analysis of overlapping cDNAs, the sequence of the first 247 aa residues of the helical domain of the other polypeptide, proα1(I), had not yet been determined. To this end, we have sequenced nearly 4 kb of the human proα1(I) collagen gene and identified twelve open reading frames whose conceptual amino acid translation exhibits 95% homology to the first 247 aa of the rat α1(I) chain. Furthermore, with these and other data, some of which were previously unpublished, we have derived the complete sequence of the first 7618 bp of the gene. This region comprises the 25 exons encoding the N-terminal pre-propeptide and five of the eight cyanogen-bromide-derived peptides. This information should therefore be most useful for the characterization of molecular defects in individuals affected by various connective tissue disorders.