Speech technologies and applications conference session - Speaker and company profiles

Presentation: Interactivity as the Key to Real-world Development 

Dr. Mark Seligman is founder and President of Spoken Translation, Inc. His background combines ivory tower and school of hard knocks: he is both a researcher (with a PhD in computational linguistics from UC Berkeley, granted in 1991) and a Silicon Valley veteran (with technical and managerial participation in four high-tech start-ups). His experience in industry began in 1983 as the founding software trainer at IntelliCorp, Inc., a forefront developer of artificial intelligence programming tools. He was lead trainer for seven years. In parallel, he pursued doctoral studies, with a dissertation on the automatic generation of multi-paragraph discourses. After graduation, he spent three years at ATR (Advanced Telecommunications Research) Institute International near Kyoto, where he studied numerous aspects of speech-to-speech translation. In 1997, after returning to the US, he proposed to CompuServe, Inc. that a “quick and dirty” speech translation system for English and French be created by adding dictation front ends and text-to-speech back ends to the company's experimental chat translation system. The result was the first speech translation system anywhere demonstrating broad coverage with acceptable quality. In 1998, Dr. Seligman became Manager of Technical Publications for Linguistic Products at Inxight Software, Inc., a Xerox spin-off aiming to commercialize linguistic and visualization programs developed at PARC. In early 2002, he left to establish Spoken Translation, Inc.

Spoken Translation, Inc. is a pioneering supplier of solutions for speech-to-speech translation and related components for the online world. Its flagship product provides an economical and convenient alternative to human interpreting. The company's interactive text translation products allow interactive monitoring and correction even by monolingual users, thus enabling higher output quality and increasing user confidence. Its innovative dictation facilities enable voice-driven rather than typed text entry with fully interactive self-correction, thus saving time and increasing productivity, and are completely server-based for ease of maintenance.

Presentation: Automatic Identification of Spoken Names and Addresses – and why we should abolish account numbers!

Melvyn Hunt has a doctorate in physics from Oxford University. In the 1970s, while with BNR (now Nortel) in Montreal, he pioneered the use of LDA for acoustic representations and MLLR for speaker adaptation, both techniques being now in widespread use. In the 1980s, his team at the National Research Council of Canada developed a system that allowed a helicopter to be piloted by voice. In 1990 as Chief Scientist for Marconi Speech & Information Systems, he headed the largest UK government Alvey project in speech technology, which resulted among other things in a telephone system giving callers access to Heathrow flight arrival and departure information and containing one of the earliest applications of barge-in. In 1992 he joined John Bridle to form Dragon Systems UK and subsequently Novauris UK.
Hunt has served on the IEEE Speech Technical Committee, on the editorial advisory boards of Voice+ magazine and Speech Communication and on numerous British and Canadian government and European Commission advisory panels concerning speech technology. His current research interests centre on noise-robust acoustic representations and the exploitation of phonetic knowledge in speech recognition, particularly phonotactics and syllable structure. The latter has been successfully applied in Novauris's name-and-address recognition technology, and patent protection is being sought.

Novauris is a UK-based company, in operation since March 2002, that develops and supplies advanced speech recognition technology. Although new, it is staffed by an experienced team of researchers. Its Chairman is James Baker, celebrated for introducing hidden Markov modelling to speech recognition, and cofounder of Dragon Systems, Inc, the pioneer and long-time market leader in general-purpose automatic dictation products. Its joint Managing Directors, Melvyn Hunt and John Bridle, previously headed Dragon Systems UK R&D Ltd, which developed in-car speech recognition now fitted in Jaguars and other cars. All the members of the Novauris team are former employees of Dragon Systems.

Presentation: Putting Speech Technologies in your Hands

Dr. Jordan Cohen is Chief Technology Officer of Voice Signal Technologies. His role is to guide Voice Signal's technical vision, investigate alliances with other technical or marketing organizations, analyze technical solutions relative to market opportunities, and assist with resource planning, recruiting, and intellectual property protection.

Prior to joining Voice Signal, Cohen was the Director of Business Relations at Dragon Systems in Newton, Mass., served on the staff of the Institute for Defense Analyses in Princeton, NJ, was active in speech recognition research at IBM, and developed speech and signals algorithms for the Department of Defense.

Dr. Cohen was the founder of a series of speech recognition workshops, which began at the Center for Aids to Industrial Productivity at Rutgers and are now an annual event at Johns Hopkins University. He serves on the AURORA subcommittee of the European Telecommunications Standards Institute, and is a member of the board of trustees of the International Computer Science Institute in Berkeley, California. He is also a member of the Acoustical Society of America, the IEEE, and the ESCA. Dr. Cohen holds a master's degree in Electrical Engineering from the University of Illinois, and a PhD in Linguistics from the University of Connecticut.

Voice Signal Technologies, based in Boston, MA, specializes in developing state-of-the-art speech interface solutions for mobile devices, including smartphones as well as traditional handsets. Founded in 1995, the company comprises a world-class team of speech scientists, engineers, and user interface specialists. It has received several honors for its embedded speech recognition technology, which is currently implemented on several mobile devices worldwide. The company continues to lead the market in introducing new speaker-independent applications and is teaming with the world's leading brand-name OEMs to make them available to end users who are seeking increased ease of use in their devices.

Presentation: Spoken Document Retrieval

Franciska de Jong has been full professor of language technology at the Computer Science Department of the University of Twente since 1992. She also works for TNO TPD, Delft.
She has a background in theoretical and computational linguistics. From 1985 to 1992 she worked as a research fellow at Philips Research on the machine translation project Rosetta. She has initiated and/or coordinated a number of projects focussing on cross-language retrieval and content-based multimedia indexing, among them the EU projects Twenty-One, Pop-Eye, Olive and ECHO. She was coordinator of the IST project MUMIS. From 2001 to 2002 she was a member of the Delos/NSF working group on Spoken Word Archives.

Within the Department of Computer Science at the University of Twente, research on natural language processing and human-computer interaction takes place within the so-called Parlevink group. This group has strong expertise in language and speech processing, multimedia retrieval and multimodal interaction. The group has participated in a large number of externally funded projects. Current EU-funded projects are M4 (IST-5FP), SAFIR and AMI (IST-6FP). CTIT participates in several international networks of excellence: Eunite (EUropean Network on Intelligent TEchnologies for Smart Adaptive Systems), ELSNET (Human Language Technology) and DELOS (Digital Libraries). Parlevink has organized a series of 25 international workshops on natural language processing, the so-called TWLT workshops.

Presentation: Multilingual and mixed-lingual TTS Applications

Simona Fina, Manager Linguistics at SVOX since March 2002, is responsible for the development of text analysis resources. From 1992 to 2000 she worked as a translator, beginning as a freelancer, then running her own translation company and language school. In 2000 she joined Sail Labs GmbH in Munich, a subsidiary of Lernout & Hauspie, where she worked in the Content Management department. She is currently completing her second master's degree, majoring in Theoretical Linguistics at the University of Munich, Germany.


SVOX AG was founded in April 2000 as a spin-off of the Swiss Federal Institute of Technology in Zurich (ETH Zurich). Its core product is text-to-speech software. The multiple award-winning SVOX technology is used in fields such as telecommunications, call centers, multimedia/Internet and embedded systems. SVOX already belongs to an elite group of global technology leaders. SVOX differentiates itself by offering customized text-to-speech. With SVOX's software architecture, customers are offered a text-to-speech engine adaptable to their technical and market needs. SVOX is a Swiss-based company, with representation in Germany, Austria and the USA. In addition to European markets, SVOX is also active in the US and Asian markets.

Presentation: How can the ASR Research Community help the ASR Industry to deliver best Products, Applications and Services

A graduate of the Ecole Polytechnique and the Ecole Nationale Supérieure des Télécommunications, with a PhD in speech and signal processing, Francis Charpentier has focused on speech technologies and speech applications from the beginning of his career. In 1981, he started at the CNET (France Télécom R&D) in the field of speech technology, where he invented the PSOLA technique, still widely used for delivering high-quality text-to-speech (TTS) synthesis. He afterwards spent five years with Cap Gemini as International Product Manager for speech applications, taking an active part in SUNDIAL (Speech Understanding and Dialogue), an ESPRIT European project on man-machine spoken dialogue. He then rejoined the CNET to manage industrial relations for the speech technologies developed there. In 1998, he was appointed head of the CNET laboratory Dialogues and interactions of Speech and Sound, with fifty research engineers in areas such as speech technologies, intelligent dialogue software, speech coding, sound and audio processing, and speech and audio applications development. In September 2000, with a group of colleagues from his laboratory, he co-founded TELISMA, where he currently serves as Chief Scientific Officer.

Telisma provides a scalable, carrier-grade ASR solution to telecommunications carriers who require superior performance, robustness and scalability. Telisma's product line can also be used for smaller-scale enterprise solutions, and it is currently available for a dozen European languages. New languages can be made available in a few months through a rapid and efficient product development process. Telisma also provides its customers with professional services to help in designing applications, with special attention to human factors and ASR reliability.

Presentation: Reaching New Horizons with Innovative Speech Technologies

Vincent Fontaine is co-founder and Chief Executive Officer of Babel Technologies (est. 1997). He holds a degree in Electrical Engineering from the Mons Institute of Technology. He started his career in speech technologies as a researcher in automatic speech recognition at the same Institute in 1992. He was involved in several European research projects such as Esprit Himarnnet (isolated word, speaker independent speech recognition over the phone), Sprach and THISL. In the course of these projects, he addressed various research aspects in speech recognition, such as feature extraction, noise robustness and acoustic modelling. From 1995 to 1997, he was in charge of the management of the Speech Recognition Group at the Mons Institute of Technology. Building on the R&D results and the knowledge gained by the university teams, he was one of the founding members of the spin-off called “Babel Technologies”, created to commercialize the speech signal processing activities. He has been leading and managing the company ever since, from the acquisition of the former Telia company “Infovox” in Sweden to its current, established role as one of the main players in the European speech industry.

Babel Technologies S.A. and Babel-Infovox AB form a corporate group designing, developing and licensing cutting-edge voice processing technologies. The companies are located in Mons, Belgium and Stockholm, Sweden, and are closely linked to university research centres such as Faculté Polytechnique de Mons, Multitel asbl and KTH Stockholm. The group comprises an award-winning team of engineers with comprehensive expertise in all aspects of speech processing, who have now launched a world-beating range of products in the areas of Automatic Speech Recognition and Text-To-Speech. Babel Technologies rapidly became a major player in the market thanks to its outstanding multi-lingual, multi-platform products. Babel Technologies collaborates with leading companies in the telecommunications, rehabilitation, multimedia, automotive, mobile devices and industry sectors, including Texas Instruments, Comverse, Franklin and Telia.
