For sheer availability in executing trades, it's hard to beat the telephone. But Touch-Tone keypad codes are a drag for traders, and staffing a call center with enough customer service reps to accept trade orders is far too expensive for the companies.
Voice recognition to the rescue! Though only a promised technology for many years, voice recognition has at long last become sophisticated enough, at a reasonable enough price, to be usable by online trading firms.
Interactive Voice Response (IVR) systems have been in use for a while, mostly just recognizing the sounds produced by a Touch-Tone phone. But Charles Schwab, the first discount broker to introduce a voice system in 1996, ultimately found that people would simply refuse to use the keypad to translate letters (for example, a "D" would be button number 3). Instead they'd just hit the 0, to speak with a person. The cost of that kind of call can be up to $15 for the brokerage house. In contrast, Nuance Communications, a speech recognition firm from Menlo Park, Calif., contends that the cost of an equivalent call using its new voice technology is less than $1.
In order for voice recognition to get to a really usable stage, it had to be able to deal with "natural speech," instead of just processing simple responses to questions. Now both Applied Language Technologies (ALTech), out of Boston, and Nuance offer this advanced speech interactive capability. In fact, E*Trade has been using the full-blown capabilities from ALTech for about a year, and has proven that the system is able to process sentences, in varying structures. For instance, an investor can say, "Buy 500 shares of IBM at the market," or "Buy at the market IBM, 500 shares," and the system will understand.
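To make the idea of order-independent understanding concrete, here is a minimal Python sketch that works on already-transcribed text. It is not ALTech's or Nuance's method; the function name `parse_order`, the `KNOWN_SYMBOLS` set and the keyword patterns are all assumptions made up for illustration.

```python
import re

# Hypothetical symbol list, for illustration only.
KNOWN_SYMBOLS = {"IBM", "INTC"}


def parse_order(utterance):
    """Extract the pieces of a trade order from transcribed text, in any word order."""
    text = utterance.upper()
    order = {"action": None, "quantity": None, "symbol": None, "order_type": None}

    # The action keyword can appear anywhere in the sentence.
    action = re.search(r"\b(BUY|SELL)\b", text)
    if action:
        order["action"] = action.group(1).lower()

    # A number directly followed by the word "shares" is taken as the quantity.
    quantity = re.search(r"\b(\d+)\s+SHARES\b", text)
    if quantity:
        order["quantity"] = int(quantity.group(1))

    # The first word that matches a known symbol is taken as the stock.
    for token in re.findall(r"[A-Z]+", text):
        if token in KNOWN_SYMBOLS:
            order["symbol"] = token
            break

    # Assumption: only the phrase "at the market" is recognized as an order type.
    if re.search(r"\bAT THE MARKET\b", text):
        order["order_type"] = "market"

    return order


first = parse_order("Buy 500 shares of IBM at the market")
second = parse_order("Buy at the market IBM, 500 shares")
```

Both sentences from the example above come out as the same order, because each piece is searched for independently rather than expected at a fixed position.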
Another important feature for viable voice technology is the ability to support vocal interruptions by users. It is no longer necessary to wait until the computer voice gets all the way through its spiel and says, "push one now." Explains Stuart Patterson, president and chief executive officer of ALTech, "Interrupting the lady, the computer, is one of the key features to the system. Once you know the system, you can cut in and skip right ahead to what you want to do. The ability to interrupt is fundamental to the friendliness of this system."
But investors are not always familiar enough with the technology to know they can interrupt. Indeed, it is not uncommon for people to apologize or say, "No, that was my fault," to the computer. Even though experienced users will use the short cuts, research done by ALTech and universities indicates that people are still most comfortable with directed dialogue. So computer voice-prompts are still useful. Nevertheless, the systems generally instruct at the end how the user can skip directly to the desired function next time.
And in actual practice, voice recognition technology is still only able to handle clear statements in specific contexts. Explains Dave McGraw, chief executive officer of digiTRADE, a software developer and service bureau for financial services companies, "You can't just start rambling, you know," which might be the response when a first-time user is confronted with a computer asking, "What do you want?" To handle this, digiTRADE is first rolling out its ALTech technology just for quotes and balances, not trading. "Quotes are 95% of any IVR system today," says McGraw, so the majority of functionality is thus covered. "Also there are far fewer ways to say 'I'd like to get a quote,' than to explain what and how you'd like to buy something."
The practical aspects of implementing voice technology are not that complicated. The hardware consists of the same IVR voice processing platforms currently used for Touch-Tone services. Tools from ALTech and Nuance consist of libraries or vocabularies of words, and user-interface modules. ALTech's trading-specific vocabulary includes at least 30,000 words, and all the different ways to recognize a stock. For example, International Business Machines could be specified with all three words, or IBM, or just B, or Big Blue.
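The "many names, one stock" idea can be pictured as a simple lookup table. The sketch below is only an illustration under assumed names (`ALIASES`, `normalize`, `resolve`); it uses a subset of the aliases mentioned above and says nothing about how ALTech's vocabulary is actually stored.

```python
# Hypothetical alias table: each ticker symbol maps to the spoken names it may go by.
ALIASES = {
    "IBM": ["international business machines", "ibm", "big blue"],
}


def normalize(phrase):
    """Lower-case the phrase and collapse runs of whitespace."""
    return " ".join(phrase.lower().split())


def build_lookup(aliases):
    """Invert the alias table so that every spoken name points at its symbol."""
    lookup = {}
    for symbol, names in aliases.items():
        for name in names:
            lookup[normalize(name)] = symbol
    return lookup


def resolve(phrase, lookup):
    """Return the ticker symbol for a spoken name, or None if it is unknown."""
    return lookup.get(normalize(phrase))


LOOKUP = build_lookup(ALIASES)
print(resolve("Big Blue", LOOKUP))
print(resolve("International Business Machines", LOOKUP))
```

A production vocabulary would be far larger and would be matched against sounds rather than typed text, but the mapping from several names to one security is the same.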
Nor do brokerage firms have to reinvent the wheel. ALTech has dialogue modules, which are objects that package a user interface and a speech recognition capability, such as recognizing numbers, dates, times, zip codes, alpha numerics or continuous digits (as in account numbers). Building blocks smooth the development curve for a brokerage firm, so "they don't have to become speech recognition specialists," says ALTech's Patterson.
For connections to the databases and stock quotes, Nuance and ALTech offer professional services. "But that connectivity is the same as for many other systems," says Patterson. "For United Airlines, the speech system and the Internet system talk to the same server." In fact, the ubiquity of Internet connections eases the process, since companies have generally already connected their legacy databases to the Internet. Nuance's system runs on NT and Unix platforms. Its engine is written in C, and it runs in a client/server architecture.
Because a lot of the hardware needed to run voice recognition tends to already be in use at firms, deploying it can be quick. Nuance set up systems for UPS and Sears in three months. A natural language system can take longer, at six months. But that amount of time can be more about integrating the telephones, the hardware and the T1 lines, says Steve Ehrlich, vice president of marketing for Nuance. "At the end of the day, it is just writing an application like any other. The specialist speech portion is a very small part."
What's behind this nifty technology? Huge statistical models, says ALTech's Patterson, that crunch an enormous amount of data. The phonetic recognition engine captures a voice waveform, then segments it into phonemes (the smallest speech units that distinguish one word from another, e.g. "m" in mat and "b" in bat). Each phoneme is then compared with all the models the engine has of what phoneme waveforms look like. "Then it comes back and says 'I'm 90% confident that it is this phoneme,'" says Patterson. A few more steps combine the phonemes into a word and produce a confidence level for that word. The computer ends up with maybe a 90% confidence level that the waveform was "Intel," and an 80% confidence level that it was "Pintel," for example. All this in a matter of nanoseconds. Parameters then stipulate what happens next: if the engine is 100% confident, it accepts the word; if it is 80% to 90% confident, it asks the caller to confirm; and if it is below 80%, it asks the person to speak again.
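The accept, confirm or ask-again rule can be sketched in a few lines of Python. The function names and default thresholds here are assumptions, not vendor settings. The description above leaves the range between 90% and 100% unspecified, so this sketch treats everything from the confirm threshold up to (but not including) the accept threshold as "confirm."

```python
def best_hypothesis(hypotheses):
    """Pick the (word, confidence) pair with the highest confidence."""
    return max(hypotheses, key=lambda pair: pair[1])


def decide(confidence, accept_at=1.0, confirm_at=0.8):
    """Map a confidence score between 0.0 and 1.0 to the system's next action.

    The thresholds are illustrative defaults taken from the figures in the text.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= accept_at:
        return "accept"
    if confidence >= confirm_at:
        # Assumption: scores between the two thresholds all trigger a confirmation.
        return "confirm"
    return "reprompt"


word, score = best_hypothesis([("Intel", 0.9), ("Pintel", 0.8)])
print(word, decide(score))
```

With the example scores from the text, "Intel" wins at 90% and the caller would be asked to confirm it.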
Nuance's Ehrlich explains that it is not artificial intelligence, and Michael Welton, manager of international and IVR products with E*Trade, concurs, since the engine does not learn as it goes along. Rather, the information goes into certain slots in a template. In the context of ordering a trade, there are certain required pieces of information. And if someone does not state, say, the number of shares to be bought, then the system will know that slot is empty and will ask for that amount.
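The template idea lends itself to a short illustration. In this Python sketch the slot names and prompt wording are hypothetical; the point is only that the system checks which required slots are still empty and asks for the first one.

```python
# Hypothetical required slots for a trade order, in the order they are asked for.
REQUIRED_SLOTS = ("action", "symbol", "quantity", "order_type")

# Hypothetical prompt wording for each slot.
PROMPTS = {
    "action": "Do you want to buy or sell?",
    "symbol": "Which stock?",
    "quantity": "How many shares?",
    "order_type": "At the market, or at a limit price?",
}


def missing_slots(order):
    """List the required slots that the caller has not filled yet."""
    return [slot for slot in REQUIRED_SLOTS if order.get(slot) is None]


def next_prompt(order):
    """Return the question for the first empty slot, or None if the template is full."""
    missing = missing_slots(order)
    return PROMPTS[missing[0]] if missing else None


# The caller said "Buy IBM at the market" but gave no number of shares.
partial = {"action": "buy", "symbol": "IBM", "order_type": "market"}
print(next_prompt(partial))
```

Nothing here learns from one call to the next, which matches the point that the engine is filling a fixed template rather than adapting as it goes.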
One of the biggest benefits of such robust trading technology is the ability to handle the increasingly common spikes in trading volume. "People who want speed (as in a volatile day) will go to the IVR system," says E*Trade's Welton, since dialing an 800 number takes a matter of seconds, less time than logging onto the computer, going onto the Web and getting to the site. E*Trade's telephone system can handle over 1,000 calls simultaneously. Even the busiest times have not pushed above 500 calls at once. After the voice recognition system takes the order, if it fits certain parameters (for example, it is a market order and the customer has the shares in his account), the system will automatically route it to the market.
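The automatic routing check might look something like the following Python sketch. The function name and data shapes are hypothetical, and the treatment of buy orders is an assumption, since only the market-order and shares-in-account conditions are mentioned here.

```python
def can_auto_route(order, holdings):
    """Decide whether an order can go straight to the market without review.

    `order` is a dict with action, symbol, quantity and order_type.
    `holdings` maps ticker symbols to the number of shares in the account.
    """
    # Only market orders qualify for automatic routing.
    if order["order_type"] != "market":
        return False
    # A sell order requires that the customer actually holds enough shares.
    if order["action"] == "sell":
        return holdings.get(order["symbol"], 0) >= order["quantity"]
    # Assumption: buy orders pass here; a real system would also check buying power.
    return True


sell_order = {"action": "sell", "symbol": "IBM", "quantity": 500, "order_type": "market"}
print(can_auto_route(sell_order, {"IBM": 500}))
```

Orders that fail the check would presumably be handed off for review instead of being routed, though that hand-off is not described here.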
Of course, people who don't have a computer or Internet access, who don't like translating letters into Touch-Tone keypad codes, or who only have rotary dial service really like the voice recognition technology.
A drawback, however, is that other people can hear what you are saying when you trade. E*Trade finds that usage of its voice system falls off between 8:00 a.m. and 4:30 p.m. because, people explain, they are sitting in cubicles without privacy. To compensate, sometimes they use a combination of Touch-Tone codes and voice, says Welton, or speak very softly.
What about institutional trading? "It will apply where people are calling each other with repetitive, periodic requests," says ALTech's Patterson. The process will probably follow the pattern of Internet adoption. Institutions will realize it is possible to automate, but will question whether the technology is trustworthy and efficient. "As we convince people it is, I think it will become relied upon," he says. "I personally trust the system to do a wire transfer," since nothing happens until the order is confirmed a couple of times and a PIN is supplied.
Currently, ALTech covers the various flavors of North American English and North American Spanish. Soon to be released are German, U.K. and Australian English, Mandarin, Cantonese and French. Nuance also covers the North American accents and foreign-language English speakers, and has German, Japanese, Swedish and Canadian French on the way.
Dave McGraw at digiTRADE is optimistic about voice recognition. "This is the fourth front-end we've put on our system," he says. "And we think it may be the most successful we've ever done." Nevertheless, he says, from his daily meetings with brokerage companies, he can tell that "it hasn't sunken in to most of them yet that this is the next wave of applications."