The complete nucleotide sequence of the celH gene of Clostridium thermocellum was determined. The open reading frame extended over a 2.7-kb DNA fragment and encoded a 900-amino acid (aa) protein (Mr 102301) which hydrolyzes carboxymethylcellulose, p-nitrophenyl-β-d-cellobioside, methylumbelliferyl-β-d-cellobioside, barley β-glucan, and larchwood xylan. The N terminus showed a typical signal peptide, and a cleavage site after Ser44 was predicted. Two Pro-Thr-Ser-rich regions divided the protein into three approximately equal domains. The central 328-aa region was similar to the N-terminal part, carrying the active site, of C. thermocellum endoglucanase E (EGE; 30.2%). The C terminus ended with two conserved 24-aa stretches showing close similarity with those previously described in EGA, EGB, EGD, EGE, EGX, and xylanase from C. thermocellum. Deletions of celH removing up to 327 codons from the 5′ end and up to 245 codons from the 3′ end of the coding sequence did not affect enzyme activity, confirming that the central domain was indeed responsible for catalytic activity. Production of truncated EGH in Escherichia coli was increased up to 120-fold by fusing fragments containing the 3′ portion of the gene with the start of lacZ′ present in pTZ19R. © 1990.
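The figures quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original work: the function names are hypothetical, and it assumes that the largest 5′ and 3′ deletions are considered together, which the abstract does not state was done in a single construct. It shows that 900 codons correspond to about 2.7 kb of coding sequence (stop codon excluded) and that the two deletion limits leave a central region of 328 residues, the same length as the central domain described.

```python
# Figures taken from the abstract.
PROTEIN_LENGTH_AA = 900           # length of the encoded protein
MAX_5PRIME_DELETION_CODONS = 327  # largest tolerated 5' deletion
MAX_3PRIME_DELETION_CODONS = 245  # largest tolerated 3' deletion


def orf_length_nt(n_codons: int) -> int:
    """Coding-sequence length in nucleotides, stop codon excluded."""
    return n_codons * 3


def retained_region(total_aa: int, del_5prime: int, del_3prime: int):
    """Return (first, last, length) of the residues left after removing
    del_5prime codons from the 5' end and del_3prime codons from the 3' end.

    Assumption: both deletions are applied to the same coding sequence.
    Residues are numbered from 1.
    """
    first = del_5prime + 1
    last = total_aa - del_3prime
    return first, last, last - first + 1


if __name__ == "__main__":
    print("ORF length (nt):", orf_length_nt(PROTEIN_LENGTH_AA))
    first, last, length = retained_region(
        PROTEIN_LENGTH_AA,
        MAX_5PRIME_DELETION_CODONS,
        MAX_3PRIME_DELETION_CODONS,
    )
    print(f"Retained residues: {first}-{last} ({length} aa)")
```

The residue numbering is illustrative only; the abstract gives the length of the central region but not its exact boundaries.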