On the applications of multimedia processing to communications

Cited by: 26
Authors
Cox, RV [1]
Haskell, BG [1]
Lecun, Y [1]
Shahraray, B [1]
Rabiner, L [1]
Affiliation
[1] AT&T Bell Labs, Speech & Image Proc Serv Res Lab, Florham Pk, NJ 07932 USA
Keywords
AAC; access; agents; audio coding; cable modems; communications networks; content-based video sampling; document compression; fax coding; H.261; HDTV; image coding; image processing; JBIG; JPEG; media conversion; MPEG; multimedia; multimedia browsing; multimedia indexing; multimedia searching; optical character recognition; PAC; packet networks; perceptual coding; POTS telephony; quality of service; speech coding; speech compression; speech processing; speech recognition; speech synthesis; spoken language interface; spoken language understanding; standards; streaming; teleconferencing; video coding; video telephony;
DOI
10.1109/5.664272
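Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;
Abstract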
The challenge of multimedia processing is to provide services that seamlessly integrate text, sound, image, and video information and to do it in a way that preserves the ease of use and interactivity of conventional plain old telephone service (POTS) telephony, irrespective of the bandwidth or means of access of the connection to the service. To achieve this goal, there are a number of technological problems that must be considered, including: compression and coding of multimedia signals, including algorithmic issues, standards issues, and transmission issues; synthesis and recognition of multimedia signals, including speech, images, handwriting, and text; organization, storage, and retrieval of multimedia signals, including the appropriate method and speed of delivery (e.g., streaming versus full downloading), resolution (including layering or embedded versions of the signal), and quality of service, i.e., perceived quality of the resulting signal; access methods to the multimedia signal (i.e., matching the user to the machine), including spoken natural language interfaces, agent interfaces, and media conversion tools; searching (i.e., based on machine intelligence) by text, speech, and image queries; browsing (i.e., based on human intelligence) by accessing the text, by voice, or by indexed images. In each of these areas, a great deal of progress has been made in the past few years, driven in part by the relentless growth in multimedia personal computers and in part by the promise of broad-band access from the home and from wireless connections. Standards have also played a key role in driving new multimedia services, both on the POTS network and on the Internet. It is the purpose of this paper to review the status of the technology in each of the areas listed above and to illustrate current capabilities by describing several multimedia applications that have been implemented at AT&T Labs over the past several years.
Pages: 755-824
Page count: 70