454 antibody sequencing - Error characterization and correction

Cited by: 13
Authors
Prabakaran P. [1,2]
Streaker E. [1,2]
Chen W. [1]
Dimitrov D.S. [1]
Affiliations
[1] Protein Interactions Group, National Cancer Institute (NCI)-Frederick, National Institutes of Health (NIH), Frederick
[2] Basic Research Program, Science Applications International Corporation-Frederick, Inc., NCI-Frederick, Frederick
Funding
National Institutes of Health
Keywords
Single Nucleotide Substitution; Complementarity Determining Region; Antibody Repertoire; Substitution Error; User Sequence
DOI
10.1186/1756-0500-4-404
Abstract
Background: 454 sequencing is currently the method of choice for sequencing antibody repertoires and libraries. These contain large numbers (10^6 to 10^12) of different molecules with similar frameworks and variable regions, which makes sequencing errors difficult to identify. Identifying and correcting sequencing errors in such mixtures is especially important for exploring complex maturation pathways and for identifying putative germline predecessors of highly somatically mutated antibodies. To quantify and correct errors introduced during 454 antibody sequencing, we sequenced six antibodies at different known concentrations twice and compared the reads with the corresponding known sequences determined by standard Sanger sequencing.
Results: We found that 454 antibody sequencing could yield approximately 20% incorrect reads, mainly due to insertions at short homopolymer regions of 2-3 nucleotides and, to a lesser extent, to insertions, deletions and other variants at random sites. Error correction might reduce the proportion of erroneous reads to 5-10%. However, a certain number of errors, accounting for 4-8% of the total reads, could not be corrected unless sequencing was repeated several times, which may not be possible for large, diverse libraries and repertoires, including complete sets of antibodies (antibodyomes).
Conclusions: The experimental procedure used to assess 454 antibody sequencing errors reveals a high proportion (up to 20%) of incorrect reads. The errors can be reduced to 5-10% but not further, which suggests that caution is needed to avoid false discovery of antibody variants and diversity. © 2011 Dimitrov et al; licensee BioMed Central Ltd.
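The read-versus-reference comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline (the abstract does not name their tools). It assumes each 454 read has already been assigned to one of the known antibodies and compares it with that antibody's Sanger sequence using a unit-cost global alignment. A read counts as incorrect if it contains any substitution, insertion or deletion, and an insertion is flagged as "homopolymer" when the extra base lengthens a reference run of at least `min_run` identical bases. The names `ReadReport`, `align_ops`, `adjacent_run`, `classify` and `incorrect_fraction`, the `min_run=2` threshold, and the toy sequences are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical helpers illustrating a read-vs-Sanger-reference comparison;
# not the pipeline used in the paper.


@dataclass
class ReadReport:
    read_id: str
    substitutions: int = 0
    deletions: int = 0
    insertions: int = 0
    homopolymer_insertions: int = 0  # insertions lengthening a reference run of >= min_run bases

    @property
    def is_correct(self) -> bool:
        # A read counts as correct only if it matches the reference exactly.
        return self.substitutions == self.deletions == self.insertions == 0


def align_ops(ref: str, read: str) -> list[tuple[str, int, str]]:
    """Unit-cost global alignment (edit distance) with traceback.

    Returns (op, ref_pos, read_base) tuples in reference order, where op is
    'M' match, 'S' substitution, 'D' reference base missing from the read,
    or 'I' extra read base inserted before ref[ref_pos].
    """
    n, m = len(ref), len(read)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (ref[i - 1] != read[j - 1])
            dp[i][j] = min(diag, dp[i - 1][j] + 1, dp[i][j - 1] + 1)

    ops: list[tuple[str, int, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != read[j - 1]):
            ops.append(("M" if ref[i - 1] == read[j - 1] else "S", i - 1, read[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            ops.append(("I", i, read[j - 1]))
            j -= 1
        else:
            ops.append(("D", i - 1, ""))
            i -= 1
    return ops[::-1]


def adjacent_run(ref: str, pos: int, base: str) -> int:
    """Count consecutive `base` characters in ref on both sides of insertion point pos."""
    left = pos - 1
    while left >= 0 and ref[left] == base:
        left -= 1
    right = pos
    while right < len(ref) and ref[right] == base:
        right += 1
    return (pos - 1 - left) + (right - pos)


def classify(read_id: str, ref: str, read: str, min_run: int = 2) -> ReadReport:
    report = ReadReport(read_id)
    for op, pos, base in align_ops(ref, read):
        if op == "S":
            report.substitutions += 1
        elif op == "D":
            report.deletions += 1
        elif op == "I":
            report.insertions += 1
            # Assumed rule: the inserted base extends a reference homopolymer.
            if adjacent_run(ref, pos, base) >= min_run:
                report.homopolymer_insertions += 1
    return report


def incorrect_fraction(ref: str, reads: dict[str, str]) -> float:
    reports = [classify(rid, ref, seq) for rid, seq in reads.items()]
    return sum(not r.is_correct for r in reports) / len(reports)


# Toy data (hypothetical, not a real antibody sequence).
reference = "CAGGTGCAGCTGGTG"
reads = {
    "r1": "CAGGTGCAGCTGGTG",   # exact match
    "r2": "CAGGGTGCAGCTGGTG",  # extra G in the GG run
    "r3": "CAGGTGCATCTGGTG",   # G -> T substitution
    "r4": "CAGGTGCAGCTGGTG",   # exact match
}
print(incorrect_fraction(reference, reads))  # 0.5
print(classify("r2", reference, reads["r2"]))
```

On this toy input two of the four reads differ from the reference, so `incorrect_fraction` returns 0.5, and the `r2` error is counted as a homopolymer insertion because the extra G lands in a GG run. Because the homopolymer test counts the whole run around the insertion point, the result does not depend on where within the run the alignment places the extra base. A real analysis would also need read-to-antibody assignment and quality filtering, which this sketch omits.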