Adding a Dash of SALT to Logistics

Speech technology developers cook up a new recipe for web applications.

A delivery driver can’t find his next stop. Pulling off the road, he reaches for a handheld device that is both a computer and a wireless phone. He speaks the customer’s address, and a map appears on screen with the destination highlighted in red. Realizing he’ll probably arrive late, the driver commands, “Call,” and the phone dials the customer’s number.

That’s the vision members of the SALT Forum paint when they talk about adding speech to the web. SALT (Speech Application Language Tags) is a markup language that lets users interact with web-based applications by talking and listening. It’s meant to create “multimodal applications,” adding a new option to the methods we commonly use for computer input and output—keyboards, mouse buttons, touch screens, graphics, and text displays.

One day, proponents say, you won’t struggle with tiny buttons to get stock quotes on your digital phone, or risk a fender bender while you search for the nearest gas station on your onboard computer: you’ll just ask.

The SALT Forum was founded in October 2001 by six companies—Intel, Microsoft, Philips, Cisco, Speechworks, and Comverse. Together with 50 other contributing firms, the group has developed a markup language for speech. SALT builds on HTML, XML, and other markup languages that define how web pages look and behave. The SALT Forum has submitted its specification to the World Wide Web Consortium (W3C) to have it established as a standard.

Input, Output, Control

In developing SALT, forum members focused on three things: speech input, speech output, and control functions such as placing calls and hanging up, says Rob Kassel, product manager for emerging technologies at Boston-based Speechworks. Good standards already exist for non-speech functions on the web, so there was no need to re-invent them, he says.

SALT can also supplement other programming languages and be used in non-web applications.
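Here’s a rough sketch of what that looks like in practice. The speech tags sit inside an ordinary HTML page; the tag and attribute names below follow the forum’s draft specification, while the page itself, the form field, and the grammar file are invented for illustration and may differ from the final standard.

<!-- A sketch only: SALT speech tags embedded in an ordinary HTML page.
     Tag names follow the SALT Forum's draft; the page, form field,
     and grammar file are invented for illustration. -->
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
  <body onload="askCity.Start()">
    <form id="directions">
      <!-- A normal visual text field; speech fills it in. -->
      <input id="city" name="city" type="text" />
    </form>

    <!-- Speech output: read a question aloud, then start listening. -->
    <salt:prompt id="askCity" oncomplete="getCity.Start()">
      Which city are you driving to?
    </salt:prompt>

    <!-- Speech input: recognize an answer against a grammar and copy
         the result into the text field above. -->
    <salt:listen id="getCity">
      <salt:grammar src="cities.grxml" />
      <salt:bind targetelement="city" value="//city" />
    </salt:listen>
  </body>
</html>

In a desktop browser, the user could simply type into the field; on a small device, the same page could rely on the prompt and listen tags alone. That is what the forum means by multimodal.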

The SALT Forum came together, Kassel says, because portable computing and communications technologies are merging, and the hardware born of this marriage requires a new kind of interface. A computer small and light enough to double as a cell phone won’t sport a conventional keyboard or screen. Speech, he says, offers a good alternative.

“When you want to call a phone number, you say, ‘Call phone number,’ and then say the number,” Kassel explains. “You could build a wrist phone that way,” Dick Tracy-style. “Or in more sophisticated applications, you could just say what you wanted to do, then you might get the response either by recordings or synthetic speech, or with visual displays.”
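Behind a command like “Call,” followed by a number, sits a grammar telling the recognizer exactly what to listen for. The sketch below uses the W3C grammar format that SALT tags can reference; the rule names and the spelled-out digits are purely illustrative.

<!-- Illustrative grammar for a "call <number>" command, written in the
     W3C Speech Recognition Grammar Specification (SRGS) XML format that
     a salt:grammar tag can point to. Rule names are invented. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="call">
  <rule id="call" scope="public">
    <item>call</item>
    <ruleref uri="#digits" />
  </rule>
  <rule id="digits">
    <!-- Seven to eleven spoken digits, e.g. "five five five ..." -->
    <item repeat="7-11">
      <one-of>
        <item>zero</item> <item>one</item> <item>two</item>
        <item>three</item> <item>four</item> <item>five</item>
        <item>six</item> <item>seven</item> <item>eight</item>
        <item>nine</item>
      </one-of>
    </item>
  </rule>
</grammar>

Once the digits are recognized, the application can hand them off to place the call.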

Open the Garage Door, Hal

“I’ve seen prototypes for cell phones from Philips that have no touch buttons,” says Tom Falk, director of marketing at Philips Speech Processing, Dallas. “You either touch the screen or speak into the phone.” Such phones, along with personal digital assistants and even appliances in the home, might one day take advantage of SALT to respond to voice commands.

But don’t look for a SALT-enabled microwave at the mall this weekend. “You won’t see any applications until next year,” Kassel says. And the first ones will probably be developed for telephone services, rather than for web access from portable devices.

What will these applications do? They could work like a system Speechworks built for Union Pacific Railroad with an earlier speech-based technology. Shippers dial an automated attendant when they need the railroad to retrieve empty freight cars from their facilities. The caller speaks details such as the account number and pickup time into the phone, and by analyzing the voice print, the system confirms that the caller is authorized to release the cars.

“They don’t have to type using the telephone keypad,” Kassel says. “They just say it. Then Union Pacific picks up the rail car as authorized.” Because it offers a new way to navigate the web, SALT could provide access to helpful information in venues where browsing wasn’t practical in the past.

Besides asking for directions, for instance, a truck driver in transit might use voice commands to get weather and traffic updates, Falk says.

Adding SALT to web-based management systems could also provide easier access to supply chain data, says Kassel. “Whether it’s over the phone, or in a true multimodal application, SALT provides an efficient way for companies to manage multiple sites, or tap into inventory databases.”

Also, Falk says, voice input might speed certain tasks. A courier who has just picked up a package could speak the transaction details into a portable device while walking back to the truck, rather than stopping to enter them by hand.

Some logistics managers have already figured that out. Several vendors offer speech-enabled terminals for use in warehouses, distribution centers, and sortation facilities. These wearable devices use speech output to direct users in their tasks. Rather than pressing keys or scanning bar codes, the user speaks into a headset microphone to provide data such as order numbers and quantities picked.

For example, Petco Animal Supplies recently implemented the Voice Logistics System from Voxware in its distribution center (DC) in Mira Loma, Calif.

Similarly, Corporate Express, a major distributor of business machines and supplies, is integrating the Talkman system from Vocollect with an automated order fulfillment system in 22 of its DCs.

Jim Logan, vice president of strategic initiatives at Vocollect in Pittsburgh, has been watching the development of SALT, just as he follows the evolution of many standards. SALT is well designed, but it’s merely “bringing the same low-level voice capabilities that have been available on PCs for a decade, and phones for years, to web applications,” he says.

SALT Understands Everyone

SALT addresses a different need than the technology in today’s voice-enabled logistics systems, says Ken Finkel, vice president of sales and business development at Voxware in Princeton, N.J. Because the speech tags were created for web sites and automated telephone systems, which serve large numbers of users, SALT applications must be speaker-independent, he says. That is, they must be able to understand nearly anyone.

“Speaker-independent solutions work well when there’s not too much noise or variation from the norm in terms of an individual’s speech patterns,” Finkel says.

But Voxware developed its voice-enabled systems with different conditions in mind. The voice recognition technology is tailored to the individual user: users “train” their portable units to understand their own speech, so the system can interpret any language, dialect, or accent and can cope with noisy industrial environments.

Beyond the DC, logistics is a largely untried market for speech technology. But it holds a great deal of potential, Kassel says. “Because of mobile workers with hands-busy, eyes-busy tasks, logistics is a natural place for speech technology to be deployed.”