Watch Google And Speech Technology
By Barry Welford
Article Date: 2007-12-20
Phonemes wanted - talk to Google
If you want confirmation that speech technology is the next big technical and economic opportunity, then keep an eye on Google.
This year it encouraged the formation of the Open Handset Alliance, which undermines the walled gardens created by the existing telecom companies. The result is a much more level and competitive playing field.
It is interesting to see how Google is now developing its own stake in what will be a highly profitable marketplace. Marissa Mayer, Google's vice president of Search Products & User Experience, revealed one part of the effort in an interview (Google wants your phonemes):
"You may have heard about our [directory assistance] 1-800-GOOG-411 service. The reason we really did it is because we need to build a great speech-to-text model.
"The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. … So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up, we can (understand) with high accuracy."
This approach is adopted because Google Is All About Large Amounts of Data. Peter Norvig, director of research at Google, believes the following:
"The way to get better understanding of text is through statistics rather than through handcrafted grammars and lexicons. The statistical approach is cheaper, faster, more robust, easier to internationalize, and so far more effective.
"We wanted speech technology that could serve as an interface for phones and also index audio text. After looking at the existing technology, we decided to build our own. We thought that, having the data and computational resources that we do, we could help advance the field. Currently, we are up to state-of-the-art with what we built on our own, and we have the computational infrastructure to improve further. As we get more data from more interaction with users and from uploaded videos, our systems will improve because the data trains the algorithms over time."
Google is certainly in a privileged position to gain access to large amounts of data that can be used to improve other services. However, it seems somewhat paradoxical to be using number crunching to better understand language and speech.
Others take a different view. For example, Powerset is building a consumer search engine based on breakthrough natural language processing technology licensed from PARC and developed internally. The search engine aims to leverage the structure and nuances of natural language to ultimately transform the way humans interact with computers.
It will be interesting to see which approach wins out.
Related: Can You Hear The Future?
About the Author:
Barry Welford, President of SMM Strategic Marketing Montreal, works with business owners and senior management on Internet Marketing strategy and action plans to grow their companies. He is a moderator at the Cre8asite Forums and writes on current issues on the Internet and on the Mobile Web in three blogs: BPWrap, StayGoLinks and The Other Bloke's Blog.