Posts Tagged with 'voice-recognition'

Speech-to-Text: Dictation software for Mac OS X

A microphone

Speech-to-text software, sometimes known as dictation software, is something that lets you talk to the computer in some form and have the computer react appropriately to what you are saying. This is totally different to text-to-speech software, which is software that can read out text already in the computer.

Command and Control Software

There are two types of speech-to-text software available. One type is called "command and control" and it lets you speak commands to your computer to control it; hence the name. For example, a command that the computer understands might be, "go to the Apple website" or, "tell me the time". Each command is pre-programmed and the computer will only recognise those commands it's been programmed for; you can't use this software to write an email or use iChat for example.
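To make the "pre-programmed commands only" idea concrete, here is a minimal Python sketch of how a command-and-control program behaves once your speech has already been turned into words. It is not how any real product is implemented; the `COMMANDS` table, the `handle_phrase` function, and the action descriptions are all made up for illustration.

```python
# Hypothetical command table: each known phrase maps to one fixed action.
COMMANDS = {
    "go to the apple website": "open the Apple website in the browser",
    "tell me the time": "speak the current time",
}


def handle_phrase(phrase):
    """Return the action for a known command, or None for anything else."""
    # Normalise case and surrounding whitespace so "Tell me the time" matches.
    key = phrase.strip().lower()
    return COMMANDS.get(key)


print(handle_phrase("Tell me the time"))
print(handle_phrase("Dear Sam, thanks for your email"))
```

The second phrase is ordinary dictation rather than a listed command, so the lookup finds nothing. That is why this kind of software can't be used for writing an email.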

Command and control software for the Mac - known as "Speakable Items" (or sometimes, confusingly, "spoken commands") - is already built into every OS X computer, although most people don't know about it. You don't need to download, buy, or install anything to get this software to work, just a microphone that works with your computer. The main drawback is that the Speakable Items software is programmed for English with a standard American accent, and has significant trouble with any other accent. It doesn't function at all with languages other than English.

Some resources for getting you up and running with Speakable Items include:

Dictation Software

The other type of speech-to-text software is usually called "dictation" software. This is the type that lets you write an article like this one, type stuff to your friends in iChat, or type an email. The most common Windows software for speech-to-text dictation - you've probably heard of it - is Dragon NaturallySpeaking. There is only one dictation-capable speech-to-text program available for OS X which is being updated and developed, and it's MacSpeech Dictate. Dictate is the successor to a program named iListen which MacSpeech used to produce.

Like all dictation-capable speech-to-text products, MacSpeech Dictate works very well for some people and very badly for others. Whether it will work for you depends on many things including: how much effort you're willing to put into learning it, how good your microphone is, your age (speech-to-text usually works less well for children), how much your accent matches what the program expects, and whether your voice changes a lot through the day.

MacSpeech Dictate is also still fairly new software - it was only released on the 15th of February, 2008. In comparison, the premier speech recognition program for Windows is Dragon NaturallySpeaking, which has been in development since the 1980s[1].

When MacSpeech Dictate was originally released it had several major problems which made it unusable for people with disabilities, but most of these have now been resolved:

  • There were no good help functions inside the application - this was rectified in Dictate version 1.3
  • It didn't learn from corrections - this was rectified in Dictate version 1.2
  • Couldn't spell words out by voice - this was rectified in Dictate version 1.2
  • Couldn't request individual key presses (such as command-s or command-option-escape) by voice - this was rectified in Dictate version 1.3
  • Couldn't be taught new words, such as names or jargon specific to your profession - this was largely rectified in Dictate version 1.2, although some words still resist training
  • There was no way to control the mouse by voice - this was finally rectified in Dictate version 2.0.

I tried using the old iListen program a few years ago and could not get results that were useful; an on-screen keyboard was the best solution at the time. Although MacSpeech Dictate is in its early days as a program, its recognition of my particular voice is hugely better than iListen's was. This is not surprising though, as MacSpeech Dictate's speech recognition engine is based on the same engine used by Windows' Dragon NaturallySpeaking - widely recognised as the best consumer speech recognition available.

MacSpeech Dictate requires Intel-based Macintosh hardware and Mac OS X 10.5.6 (Leopard) or higher. Thirteen English dialects/accents are supported, with US and UK spelling options. These are:

  • US Spelling
    • American
    • American - Inland Northern
    • American - Southern
    • American - Teens
    • Australian
    • British
    • Indian
    • Latino
    • Southeast Asian
  • UK Spelling
    • Australian
    • British
    • Indian
    • Southeast Asian

Specialised versions - Dictate Medical and Dictate Legal - are available for dictating in these language areas, and Dictate International is now available and recognises speech in French, German, and Italian. MacSpeech have strongly hinted that Spanish language recognition is next on their agenda.

MacSpeech Dictate is a great program for dictation and some computer control, but it is not something that will let you control the computer completely "hands free". Quadriplegic users and others who need full computer control will need to supplement Dictate with a mouth stick and keyboard, or a program such as SwitchXS for switch access to functions not available by voice. I highly recommend Dictate though; it's part of my suite of accessibility technology and I use it whenever I am able to.

Website: [msd]

- Ricky Buchanan

[msddisclaim]

[msdbanner]

The Ultimate MacSpeech Dictate 1.5 Global Commands List

MacSpeech Dictate is a great program but learning so many commands at once can be intimidating. I've put together another document to help you learn and remember all the global commands found in Dictate version 1.5.*.

MacSpeech Dictate has two types of commands - global commands and application-specific commands. The global commands work in all programs and the application-specific commands work only in a single application, for example Mail, Safari, or iChat. This document is only concerned with the global commands, which you'll need to know best and are likely to use most often.
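If it helps to picture the two command scopes, here is a small Python sketch. The command phrases below are placeholders I invented for the example, not real Dictate commands (those are in the manual), and `is_available` is a made-up helper, not anything inside Dictate itself.

```python
# Placeholder command sets, invented for illustration only.
GLOBAL_COMMANDS = {"example global command"}
APP_COMMANDS = {
    "Mail": {"example mail command"},
    "Safari": {"example safari command"},
}


def is_available(command, active_app):
    """Return True if the command can be used while active_app is in front."""
    if command in GLOBAL_COMMANDS:
        return True  # global commands work in every program
    # Application-specific commands only work in their own application.
    return command in APP_COMMANDS.get(active_app, set())


print(is_available("example global command", "iChat"))
print(is_available("example safari command", "Mail"))
```

The global command is available even in an application with no commands of its own, while the Safari-only command is not available in Mail.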

These documents aren't in any way meant to replace the Dictate User's Manual - every Dictate user should absolutely read the manual, even if you're not "the manual reading type". Trust me, you'll get far better use of Dictate if you have read the manual! But nobody's memory is perfect, especially for a program with so many commands, so I've made this commands list to help you out.

The first "Global Commands List" I created, for MacSpeech Dictate 1.2.1, was three pages long - this new one contains fourteen full pages of commands! Dictate has really matured and grown in just a few versions. I've been through all of MacSpeech's available documentation and looked at the AppleScript commands within the program to pull this together. There's no hidden "behind the scenes" knowledge included here, but it took many hours and a lot of organisation to get all of these commands together and in one place in a useful format.

Instead of downloading this one directly, I'm asking you to sign up to download it. As soon as you've confirmed your subscription you'll be taken to a page containing the zipped PDF file ready to download:

Why sign up? I'll occasionally be sending you information about MacSpeech Dictate and the new MacSpeech Scribe, letting you know there's a new blog post on the topic, and telling you about important upgrades. If you aren't interested in the information you can always unsubscribe right away.

Once you've downloaded the list, I suggest you print it out and read through it, highlighting commands that you often forget or ones that you didn't know about but think you might find useful. This way you can find them quickly when you need them.

If you have any trouble signing up to receive the Ultimate MacSpeech Dictate 1.5 Global Commands List, please contact me and I'll happily help you out.

- Ricky Buchanan

Nuance Buys MacSpeech: What Now?

It's been announced that the MacSpeech company has been purchased by Nuance. Nuance are the company behind the Windows product "Dragon NaturallySpeaking" and other recent Dragon products for iPod Touch and iPhone.

So what does this mean for MacSpeech Dictate and the other MacSpeech products? Nuance are quick to assure us that "nothing will change in the near term" but I think things will change for the better. Nuance is a much bigger company than MacSpeech - the Windows market has always been more than ten times bigger than the Mac OS X market, and the company which is now Nuance has been around much longer than MacSpeech has. I don't know the number of employees that either Nuance or MacSpeech actually has, but I'd be willing to bet that Nuance has a lot more. And this will probably mean good things for MacSpeech's products: with more talent available to work on them, they can progress more quickly.

MacSpeech Dictate still has some major features missing, such as mouse control, but it's growing and maturing quickly. One big problem for people wanting to switch from using Windows with NaturallySpeaking to using OS X with Dictate is that the names of various commands are completely different between the two products. Actually, Dictate's command set has grown in a haphazard way and commands are difficult to memorise because different commands are constructed in different ways. I would think that with the acquisition of MacSpeech, there's the possibility of the Dictate command set becoming more like the NaturallySpeaking command set. This may be confusing for existing customers, but it would be a huge blessing for customers switching from Windows to Mac and, I think, in the long run it would be a good thing.

Nuance obviously thinks the OS X market for voice recognition is growing and viable, and they're willing to spend money to get into it. This might mean that MacSpeech products eventually cost the same as the equivalent NaturallySpeaking products - at the moment the Mac versions cost significantly more, despite having fewer features. Reducing the cost would not have been possible for MacSpeech alone, as they needed cash flow, but Nuance are in a stronger financial position and they're bigger so they can probably cope with a bumpier cash flow. Cheaper assistive technology is good for everybody, so I hope this one comes true!

I've got a MacSpeech Dictate-related free download coming up soon too - so regular readers stay tuned!

What else do you think will change, or hope will change, with Nuance's buyout of MacSpeech?

- Ricky Buchanan

New MacSpeech Scribe For Transcription

One of the major things that the MacSpeech Dictate family has been lacking is the ability to take pre-recorded files and convert them to text. Not any more: MacSpeech Scribe will do just that for you, with up to 99% accuracy.

MacSpeech Scribe will accept any file in one of these formats:

  • .wav
  • .aif or .aiff
  • .m4v, .mp4, or .m4a
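If you have a folder full of recordings and want to check which ones are worth trying, a few lines of Python can test the file names against the list above. This is only a sketch: `has_supported_extension` is a name I made up, and it looks at the extension only, so a match is no guarantee that Scribe will actually accept the audio inside the file.

```python
import os

# Extensions copied from the list of formats above.
SUPPORTED_EXTENSIONS = {".wav", ".aif", ".aiff", ".m4v", ".mp4", ".m4a"}


def has_supported_extension(filename):
    """Check only the file extension, ignoring upper/lower case."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in SUPPORTED_EXTENSIONS


for name in ["memo.WAV", "interview.mp3", "notes.m4a"]:
    print(name, has_supported_extension(name))
```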

Audio file quality will affect the quality of your recognition, of course, so using a certified recording device is recommended, but not required - anything that will produce the correct file format will work. At the moment, the iPhone, iPod Touch, and several Olympus digital voice recorders are the only devices certified, but I would expect MacSpeech to expand this range fairly quickly.

Recording a sound file to run through Scribe is pretty much like using MacSpeech Dictate itself, but without the ability to correct and train phrases as you go. If you want your transcribed document to include punctuation, you need to speak the punctuation signs into the recording, and you need to train MacSpeech Scribe to the voice of the person who recorded the audio file before it can transcribe.

So what are the limitations? Bear in mind that I have not had access to MacSpeech Scribe myself, but these are the limits that have been described by MacSpeech or can be inferred from the behaviour of other products in the MacSpeech family:

Photo of an iPhone in somebody's hand

MacSpeech Scribe lets you record sound on your iPhone, iPod Touch, or other recording device, then transcribes it when you're back at your computer.

  • You can only have one speaker per file, so MacSpeech Scribe will not be helpful for transcribing a meeting or class or any other situation where there is more than one speaker.
  • The program must be trained to the voice in the recording, so it's also unlikely to be useful for transcribing a speech or lecture unless the speaker is willing to spend some time with you creating a profile for MacSpeech Scribe.
  • Because of the need for punctuation to be spoken aloud, I am not sure if the accuracy would be adequate in a situation where punctuation was not spoken - from Scribe's perspective the text produced would be one really long paragraph.
  • We know from other MacSpeech products that the distance from mouth to microphone is very important for recognition, so I would think any speaker who is moving around would significantly degrade accuracy. If you need to record a speaker like this for MacSpeech Scribe's use I would suggest investing in a lapel microphone for your recorder.
  • Background noise or any other non-speech noise in your recording will also degrade accuracy. Get a directional microphone for your voice recorder so it only picks up your own voice, or dictate in a quiet place.
  • Changes in voice quality from emotion or emphasis also degrade recognition. MacSpeech Dictate, in my experience, does best with a very steady tone of voice - not a monotone but no getting excited or sad or speaking too fast or too slowly - so I would expect that MacSpeech Scribe is similar in this respect.

MacSpeech says:

"MacSpeech Scribe lets you easily add new words and acronyms, edit and navigate transcribed documents, and so much more. MacSpeech Scribe makes it easy to work with your transcribed document so you can create the perfect document for your needs."

which leaves me unsure if its editing abilities are the same as other MacSpeech products and, if they are, does it let you verbally add a word or phrase that was missed by the dictation engine? If so, what does Scribe not have that Dictate has? I'll have to get hold of it to clarify that one for you!

MacSpeech Scribe is available immediately, in English only, for all the dialects of English usually recognised by MacSpeech products. There is a special price of US$99 for currently registered MacSpeech Dictate 1.5 customers; the regular suggested retail price is US$149.

- Ricky Buchanan

Photo credit to Twon.

Dictation For Your iPhone/iPod Touch

Back in December Nuance, makers of the award-winning Dragon NaturallySpeaking, surprised everybody by releasing two apps for the iPhone - Dragon Dictation and Dragon Search. The former allows you to dictate text into an iPhone much like Dragon NaturallySpeaking for the PC and MacSpeech Dictate for the Mac. The latter allows you to do a variety of Internet searches using filters such as YouTube, Google, and Wikipedia using your voice. Both apps were received extremely well and were instantly considered must-haves for any iPhone user, especially considering they were both free for a limited time.

Unfortunately at the time of release they were not compatible with the iPod Touch and Nuance provided no real explanation for why this was so. Both apps require an Internet connection to function but since an iPod Touch can access the Internet via WiFi it was a mystery why they weren't compatible with the iPod Touch. Nuance received a lot of feedback about this and thankfully they responded rather quickly as both apps are now compatible with the iPod Touch and I couldn't be happier! Before the iPod Touch update was released I did get to try both of the apps on my brother's iPhone over the Christmas break and I was very impressed. With iPod Touch compatibility now added I've been able to extensively test these two apps so I thought I'd share my experiences with you.

Initiating dictation with both apps is done by simply tapping a large button in the center of the screen. Until the latest update you had to hit a "Done" button when you were finished dictating. But the new update adds a really cool feature where both apps automatically detect when you're done speaking and start processing your input without having to press the "Done" button at all. You can turn this feature off and on in the settings for both apps but I really don't see why anybody would want it off because it works so well.

Input screen for Dragon Dictation

Dragon Dictation is not going to replace Dragon NaturallySpeaking or MacSpeech Dictate any time soon but it does work astonishingly well. The way that Nuance achieved this is that all inputs are processed on their servers rather than the device itself, hence the need for an Internet connection. Somehow it only takes a few seconds for your inputs to be processed. All things considered the accuracy is pretty good but there are some mistakes here and there on occasion. However when you're done dictating you can bring up the touchscreen keyboard and make edits where necessary. You can also tap on words to bring up a contextual menu of other words to choose from that might fit better. I usually only bother making corrections if the transcription is really off or if I'm writing something really important.

Once you're ready to send the text that you just dictated you simply tap on the little "Send" button in the bottom right-hand corner of the screen at which point you're presented with some options. On an iPhone you can "send to email", "send it as text", or "copy to clipboard". On an iPod Touch you can do the same except for the texting part. Sending any dictations to email or for texting will open up the appropriate apps with the text inserted into the correct location. Then you only need to select a contact. The "copy to clipboard" button will allow you to theoretically paste what you just dictated into any other app that allows copy and paste, like Facebook and Twitter for example. There is a limit to the amount that you can dictate at once but you can keep stacking dictations on top of each other to create long emails or whatever. Basically a new dictation will pick up right from where the last dictation left off until you clear the screen.
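The "stacking" behaviour is easy to picture as a text buffer that keeps growing until you clear it. Here is a tiny Python model of it. The `DictationPad` class is entirely my own invention for illustration and has nothing to do with how the app is really written; joining chunks with a single space is also an assumption.

```python
class DictationPad:
    """Hypothetical model of the on-screen text area."""

    def __init__(self):
        self.text = ""

    def add_dictation(self, chunk):
        # Each new dictation continues where the previous one stopped.
        if self.text:
            self.text += " "  # assumed separator between dictations
        self.text += chunk.strip()

    def clear(self):
        # Clearing the screen starts a fresh piece of text.
        self.text = ""


pad = DictationPad()
pad.add_dictation("Hi Sam,")
pad.add_dictation("see you on Friday.")
print(pad.text)  # Hi Sam, see you on Friday.
```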

Options after you're done dictating

When using Dragon Search you simply speak whatever your search query is and a few seconds later the Google search results will appear on your screen. You can cycle through the different filters by scrolling through them at the top of the screen. So for example, let's say I say "the Rolling Stones". A few seconds later I'd see the Google search results for that query. If I then changed the filter to YouTube the screen would then present me with a list of YouTube videos that match that query. I could then tap on a video to play it. You can also open links and view them right within the Dragon Search app itself. You could also copy the current link to the clipboard or send it to the mobile Safari app. Once in the mobile Safari app you can then bookmark it, send the link to somebody, and whatever else you can do within the app. Since I usually prefer to view web pages on my big computer screen I'll often send links to myself from my iPod Touch, unless I'm already in front of my computer of course in which case I'll just be using Safari there.

Dragon Search

These apps have made a huge difference for me in a couple of ways. For one, if I need to send a quick email, update my Facebook status, or post something on Twitter I can now easily and quickly do this whether I'm in front of my computer or not. I'm also no longer confined to my computer room for anything involving dictation. In fact, the first half of this article was done from my bedroom with Dragon Dictation! Once again it's not quite as accurate as MacSpeech Dictate but it's definitely acceptable. As soon as I was up in front of my computer I simply opened the dictations that I emailed to myself and edited them for accuracy using Keystrokes. I rarely used the Google mobile app because typing on my iPod Touch is a real pain for me but that's no longer a problem thanks to Dragon Search. Now if I'm not in front of my computer doing an Internet search is only two taps away (open the app and tap to begin dictating my query). It's just so incredibly simple and useful!

Now there are a couple of caveats here. For one, you obviously have to be able to use an iPod Touch (or an iPhone), at least in a limited fashion, in order to use these apps. But since there is no pinching or any other complicated finger gestures required they are pretty easy to use. So if you currently can't use one of these devices at all these two apps won't change that. But if you can use these devices but have trouble with anything involving typing then you're in for a big surprise. Suddenly your iPod Touch or iPhone will become much more useful than it already is!

If you're going to use these apps with an iPod Touch you're going to need to get an external microphone because the iPod Touch (2nd & 3rd generation) doesn't have a microphone built in. I highly recommend the TouchMic Handsfree Lapel Microphone & Adapter. It's inexpensive, works really well, and it's small enough that you can mount it just about anywhere without it getting in the way. It uses the headphone jack on your iPod Touch but it has a headphone jack built right into it so you can still use headphones, earphones, or whatever simultaneously with it.

The final caveat is the Internet connection requirement. This won't be an issue with an iPhone but if you have an iPod Touch your usage of these two apps will be limited to wherever you are within range of a usable WiFi hotspot. In my case I have a WiFi network in my home and I'm there most of the time so it's not that big of an issue. However if you have an iPod Touch and aren't in range of a WiFi hotspot most of the time you might get frustrated - perhaps frustrated enough to get an iPhone. :-) I have to admit that these two apps are so incredibly useful I'm strongly considering getting an iPhone myself so I can use them anywhere. I'm just not that thrilled with paying for a data plan because I'm not sure the amount of time I'd want to do something Internet-related away from my home would justify the cost of a data plan. But this certainly has me thinking about it.

I'm not certain these two apps are intended for assistive technology users because anybody can use them. But nevertheless they are about the biggest assistive technology upgrades to the iPhone and iPod Touch that I've seen to date. As of this writing they are still free so check them out!

- Paul Natsch

[disclaim]