Voice Biometrics
To verify someone's identity, experts recommend a three-fold test: "something you know, something you have, and something you are." Most computer applications, particularly those that work over the Internet, can only use "something you know" — a password.
In the November 2002 issue of Dr. Dobb's Journal (registration required), I show how to use VoiceXML to provide authentication through "biometrics": the measurement of "something you are."
The demonstration combines VoiceXML with standard web-based services. Together with a telephone call to your cell phone ("something you have") and a request for a password ("something you know"), voice biometrics can complete the security triad.
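As a rough illustration of the triad, the sketch below combines the three checks into one decision. It is not taken from the package: the class, the function names, and the 0.8 score threshold are assumptions made for this example, and the voice score stands in for whatever a voice-biometrics service would return.

```python
from dataclasses import dataclass
import hmac


@dataclass
class FactorResults:
    """Outcome of each check; all field names are illustrative."""
    password_ok: bool      # something you know
    phone_answered: bool   # something you have (call to the registered cell phone)
    voice_score: float     # something you are (similarity score, assumed 0.0 to 1.0)


def check_password(supplied: str, expected: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    # A real system would compare salted hashes, not plain-text passwords.
    return hmac.compare_digest(supplied.encode(), expected.encode())


def is_authenticated(results: FactorResults, voice_threshold: float = 0.8) -> bool:
    """Accept the caller only when all three factors pass."""
    return (
        results.password_ok
        and results.phone_answered
        and results.voice_score >= voice_threshold
    )
```

Requiring all three factors is the strictest policy; a deployment might instead fall back to a human agent when only the voice check fails.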
Download the source code (report broken links), which shows how to use VoiceXML for voice biometrics. This package uses publicly accessible VoiceXML servers, publicly accessible telephony servers to control the telephone network from the Internet, and publicly accessible voice biometrics, so you do not need any special hardware or software. At present, the package does not include documentation; see the Dr. Dobb's article.
Voice2IM: Speech Technologies and Instant Messaging
Voice2IM is a package that ties VoiceXML to Instant Messaging (IM) to produce a multimodal user interface: one that lets the user choose among different modes. The input modes are speech and text, and the output modes are voice and text.
Example: In this demonstration package, the user is a business traveler who calls a call center to change travel plans; the user also has a wireless device connected to the Internet (e.g., a Palm/cellphone combination). The automated call center sends voice and text to the user, and the user can either speak or write his choice — an integrated, multimodal experience.
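The idea in this example can be sketched in a few lines of Python. This is not the Voice2IM code: the sample flight options, the function names, and the small table of spoken numbers are invented for illustration. The same list of options feeds a short voice prompt and a longer text message, and the reply is accepted whether it arrives as recognized speech ("two") or as typed text ("2").

```python
OPTIONS = ["Depart at 9:15 AM", "Depart at 1:40 PM", "Depart at 6:05 PM"]  # invented sample data

# Words a speech recognizer might return for the first few choices (assumed).
SPOKEN_NUMBERS = {"one": 1, "two": 2, "three": 3}


def voice_prompt(options):
    """Short spoken prompt; the details go out as text instead of being read aloud."""
    return (
        f"I have sent {len(options)} flight options to your device. "
        "Say or type the number you want."
    )


def text_message(options):
    """Numbered list for the instant-messaging channel."""
    return "\n".join(f"{i}. {opt}" for i, opt in enumerate(options, start=1))


def parse_choice(reply, options):
    """Map a spoken word or typed digit to an option; return None if unrecognized."""
    word = reply.strip().lower()
    if word.isdecimal():
        index = int(word)
    else:
        index = SPOKEN_NUMBERS.get(word, 0)
    if 1 <= index <= len(options):
        return options[index - 1]
    return None
```

Keeping the option list in one place means the voice and text channels cannot drift apart, which matters when the caller answers in a different mode than the one the question arrived in.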
Multimodal interfaces are particularly useful for call centers. Agents (whether human or automated) are spared the task of reading long lists of information to callers, the user experience improves as the user receives complicated information in written form, transactions are more accurate, and the overall cost of the transaction drops.
This package uses publicly accessible VoiceXML servers, publicly accessible telephony servers to control the telephone network from the Internet, and publicly accessible Instant Messaging servers. You do not need your own servers (that is, you do not need specialized hardware or software).
Dr. Dobb's Journal published my article about this technology in the January 2004 issue. The article explains a bit more about multimodal systems and gives some brief hints about successful speech user interfaces.
Download Voice2IM (report broken links).
For the pedantic:
- The package includes VoiceXML scripts to place on the server and Python-based CGI scripts.
- The next release of this package, which is already running on this machine in the demo, includes some rudimentary security features, better Python scripts (better programming techniques), and other improvements. I will release it if there's demand.
- The package is largely self-explanatory if you understand VoiceXML, Python, CGI, and instant messaging. Otherwise, read the article in Dr. Dobb's or contact me for more information.
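For readers who have not seen VoiceXML and CGI used together, here is a minimal sketch of a Python CGI script that returns a VoiceXML document. It is not one of the package's scripts: the function names and the prompt text are made up for this example, and the script assumes a VoiceXML 2.0 server that fetches the document over HTTP.

```python
import sys
from xml.sax.saxutils import escape


def build_vxml(prompt_text: str) -> str:
    """Return a minimal VoiceXML 2.0 document that speaks one prompt."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        '  <form>\n'
        '    <block>\n'
        # escape() protects the markup from &, <, and > in the prompt text.
        f'      <prompt>{escape(prompt_text)}</prompt>\n'
        '    </block>\n'
        '  </form>\n'
        '</vxml>\n'
    )


def cgi_response(prompt_text: str) -> str:
    """A CGI program writes headers, a blank line, then the body to stdout."""
    return "Content-Type: application/voicexml+xml\r\n\r\n" + build_vxml(prompt_text)


if __name__ == "__main__":
    sys.stdout.write(cgi_response("Your flight has been changed. Details were sent as text."))
```

Some VoiceXML servers may expect `text/xml` instead of `application/voicexml+xml`, so treat the content type as something to check against your server's documentation.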
