Posts Tagged with 'dictation'

Speech-to-Text: Dictation software for Mac OS X

A microphone

Speech-to-text software, sometimes known as dictation software, lets you talk to the computer and have it react appropriately to what you are saying. This is totally different to text-to-speech software, which is software that can read out text already in the computer.

Command and Control Software

There are two types of speech-to-text software available. One type is called "command and control" and it lets you speak commands to your computer to control it; hence the name. For example, a command that the computer understands might be, "go to the Apple website" or, "tell me the time". Each command is pre-programmed and the computer will only recognise those commands it's been programmed for; you can't use this software to write an email or use iChat for example.
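To make the "pre-programmed commands only" idea concrete, here is a minimal sketch in Python of how a command-and-control system can be thought of: a fixed table of phrases, each tied to one action. The phrases, function names, and return strings below are illustrative assumptions, not the actual Speakable Items vocabulary or implementation.

```python
from datetime import datetime
from typing import Callable, Dict, Optional


def tell_time() -> str:
    # Hypothetical action: report the current time as text.
    return "The time is " + datetime.now().strftime("%H:%M")


def open_apple_site() -> str:
    # Hypothetical action: a real system would launch a browser here.
    return "Opening http://www.apple.com/"


# The complete (made-up) vocabulary. Anything not listed is simply not understood.
COMMANDS: Dict[str, Callable[[], str]] = {
    "tell me the time": tell_time,
    "go to the apple website": open_apple_site,
}


def handle_utterance(heard: str) -> Optional[str]:
    """Run the action for a recognised phrase, or return None if it is unknown."""
    phrase = " ".join(heard.lower().split())  # normalise case and spacing
    action = COMMANDS.get(phrase)
    return action() if action else None
```

Calling `handle_utterance("Go to the Apple website")` runs the matching action, while free-form text such as the opening line of an email matches nothing and returns `None`. That is the practical difference between this kind of software and dictation software.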

Command and control software for the Mac - known as "Speakable Items" (or sometimes, confusingly, "spoken commands") - is already built into every OS X computer, although most people don't know about it. You don't need to download, buy, or install anything to get this software to work; all you need is a microphone that works with your computer. The main drawback is that the Speakable Items software is programmed for English with a standard American accent, and has significant trouble with any other accent. It doesn't function at all with languages other than English.

Some resources for getting you up and running with Speakable Items include:

Dictation Software

The other type of speech-to-text software is usually called "dictation" software. This is the type that lets you write an article like this one, type stuff to your friends in iChat, or type an email. The most common Windows software for speech-to-text dictation - you've probably heard of it - is Dragon NaturallySpeaking. There is only one dictation-capable speech-to-text program for OS X that is still being updated and developed, and it's MacSpeech Dictate. Dictate is the successor to a program named iListen, which MacSpeech used to produce.

Like all dictation-capable speech-to-text products, MacSpeech Dictate works very well for some people and very badly for others. Whether it will work for you depends on many things, including: how much effort you're willing to put into learning it, how good your microphone is, your age (speech-to-text usually works less well for children), how much your accent matches what the program expects, and whether your voice changes a lot through the day.

MacSpeech Dictate is also still fairly new software - it was only released on the 15th of February, 2008. In comparison, the premier speech recognition program for Windows is Dragon NaturallySpeaking, which has been in development since the 1980s[1].

When MacSpeech Dictate was originally released it had several major problems which made it unusable for people with disabilities, but most of these have now been resolved:

  • There were no good help functions inside the application - this was rectified in Dictate version 1.3
  • It didn't learn from corrections - this was rectified in Dictate version 1.2
  • Couldn't spell words out by voice - this was rectified in Dictate version 1.2
  • Couldn't request individual key presses (such as command-s or command-option-escape) by voice - this was rectified in Dictate version 1.3
  • Couldn't be taught new words, such as names or jargon specific to your profession - this was largely rectified in Dictate version 1.2, although some words still resist training
  • There was no way to control the mouse by voice - this was finally rectified in Dictate version 2.0.

I tried using the old iListen program a few years ago and could not get results that were useful; an on-screen keyboard was the best solution at the time. Although MacSpeech Dictate is in its early days as a program, its recognition of my particular voice is hugely better than iListen's was. This is not surprising though, as MacSpeech Dictate's speech recognition engine is based on the same engine used by Windows' Dragon NaturallySpeaking - widely recognised as the best consumer speech recognition available.

MacSpeech Dictate requires Intel-based Macintosh hardware and Mac OS X 10.5.6 (Leopard) or higher. Thirteen English dialects/accents are supported, split across US and UK spelling options. These are:

  • US Spelling
    • American
    • American - Inland Northern
    • American - Southern
    • American - Teens
    • Australian
    • British
    • Indian
    • Latino
    • Southeast Asian
  • UK Spelling
    • Australian
    • British
    • Indian
    • Southeast Asian

Specialised versions - Dictate Medical and Dictate Legal - are available for dictating in these language areas, and Dictate International is now available and recognises speech in French, German, and Italian. MacSpeech have strongly hinted that Spanish language recognition is next on their agenda.

MacSpeech Dictate is a great program for dictation and some computer control, but it is not something that will let you control the computer completely "hands free". Quadriplegic users and others who need full computer control will need to supplement Dictate with a mouth stick and keyboard, or with a program such as SwitchXS for switch access to functions not available by voice. I highly recommend Dictate though. It's part of my suite of accessibility technology and I use it whenever I am able to.

Website: [msd]

- Ricky Buchanan

Dictation For Your iPhone/iPod Touch

Back in December Nuance, makers of the award-winning Dragon NaturallySpeaking, surprised everybody by releasing two apps for the iPhone - Dragon Dictation and Dragon Search. The former allows you to dictate text into an iPhone much like Dragon NaturallySpeaking for the PC and MacSpeech Dictate for the Mac. The latter allows you to do a variety of Internet searches by voice, using filters such as YouTube, Google, and Wikipedia. Both apps were received extremely well and were instantly considered must-haves for any iPhone user, especially considering they were both free for a limited time.

Unfortunately at the time of release they were not compatible with the iPod Touch and Nuance provided no real explanation for why this was so. Both apps require an Internet connection to function but since an iPod Touch can access the Internet via WiFi it was a mystery why they weren't compatible with the iPod Touch. Nuance received a lot of feedback about this and thankfully they responded rather quickly as both apps are now compatible with the iPod Touch and I couldn't be happier! Before the iPod Touch update was released I did get to try both of the apps on my brother's iPhone over the Christmas break and I was very impressed. With iPod Touch compatibility now added I've been able to extensively test these two apps so I thought I'd share my experiences with you.

Initiating dictation with both apps is done by simply tapping a large button in the center of the screen. Until the latest update you had to hit a "Done" button when you were finished dictating. But the new update adds a really cool feature where both apps automatically detect when you're done speaking and start processing your input without having to press the "Done" button at all. You can turn this feature off and on in the settings for both apps but I really don't see why anybody would want it off because it works so well.

Input screen for Dragon Dictate

Dragon Dictation is not going to replace Dragon NaturallySpeaking or MacSpeech Dictate any time soon but it does work astonishingly well. Nuance achieved this by processing all inputs on their servers rather than on the device itself, hence the need for an Internet connection. Somehow it only takes a few seconds for your inputs to be processed. All things considered the accuracy is pretty good, but there are some mistakes here and there. However, when you're done dictating you can bring up the touchscreen keyboard and make edits where necessary. You can also tap on words to bring up a contextual menu of other words to choose from that might fit better. I usually only bother making corrections if the transcription is really off or if I'm writing something really important.

Once you're ready to send the text that you just dictated you simply tap on the little "Send" button in the bottom right-hand corner of the screen at which point you're presented with some options. On an iPhone you can "send to email", "send it as text", or "copy to clipboard". On an iPod Touch you can do the same except for the texting part. Sending any dictations to email or for texting will open up the appropriate apps with the text inserted into the correct location. Then you only need to select a contact. The "copy to clipboard" button will allow you to theoretically paste what you just dictated into any other app that allows copy and paste, like Facebook and Twitter for example. There is a limit to the amount that you can dictate at once but you can keep stacking dictations on top of each other to create long emails or whatever. Basically a new dictation will pick up right from where the last dictation left off until you clear the screen.

Options after you're done dictating

When using Dragon Search you simply speak whatever your search query is and a few seconds later the Google search results will appear on your screen. You can cycle through the different filters by scrolling through them at the top of the screen. So for example, let's say I say "the Rolling Stones". A few seconds later I'd see the Google search results for that query. If I then changed the filter to YouTube the screen would then present me with a list of YouTube videos that match that query. I could then tap on a video to play it. You can also open links and view them right within the Dragon Search app itself. You could also copy the current link to the clipboard or send it to the mobile Safari app. Once in the mobile Safari app you can then bookmark it, send the link to somebody, and whatever else you can do within the app. Since I usually prefer to view web pages on my big computer screen I'll often send links to myself from my iPod Touch, unless I'm already in front of my computer of course in which case I'll just be using Safari there.

Dragon Search

These apps have made a huge difference for me in a couple of ways. For one, if I need to send a quick email, update my Facebook status, or post something on Twitter I can now easily and quickly do this whether I'm in front of my computer or not. I'm also no longer confined to my computer room for anything involving dictation. In fact, the first half of this article was done from my bedroom with Dragon Dictation! Once again it's not quite as accurate as MacSpeech Dictate but it's definitely acceptable. As soon as I was up in front of my computer I simply opened the dictations that I emailed to myself and edited them for accuracy using Keystrokes. I rarely used the Google mobile app because typing on my iPod Touch is a real pain for me but that's no longer a problem thanks to Dragon Search. Now if I'm not in front of my computer doing an Internet search is only two taps away (open the app and tap to begin dictating my query). It's just so incredibly simple and useful!

Now there are a couple caveats here. For one, you have to obviously be able to use an iPod Touch (or an iPhone), at least in a limited fashion, in order to use these apps. But since there is no pinching or any other complicated finger gestures required they are pretty easy to use. So if you currently can't use one of these devices at all these two apps won't change that. But if you can use these devices but have trouble with anything involving typing then you're in for a big surprise. Suddenly your iPod Touch or iPhone will become much more useful than they already are!

If you're going to use these apps with an iPod Touch you're going to need to get an external microphone because the iPod Touch (2nd & 3rd generation) doesn't have a microphone built in. I highly recommend the TouchMic Handsfree Lapel Microphone & Adapter. It's inexpensive, works really well, and it's small enough that you can mount it just about anywhere without it getting in the way. It uses the headphone jack on your iPod Touch but it has a headphone jack built right into it so you can still use headphones, earphones, or whatever simultaneously with it.

The final caveat is the Internet connection requirement. This won't be an issue with an iPhone but if you have an iPod Touch your usage of these two apps will be limited to wherever you are within range of a usable WiFi hotspot. In my case I have a WiFi network in my home and I'm there most of the time so it's not that big of an issue. However if you have an iPod Touch and aren't in range of a WiFi hotspot most of the time you might get frustrated - perhaps frustrated enough to get an iPhone. :-) I have to admit that these two apps are so incredibly useful I'm strongly considering getting an iPhone myself so I can use them anywhere. I'm just not that thrilled with paying for a data plan because I'm not sure the amount of time I'd want to do something Internet-related away from my home would justify the cost. But this certainly has me thinking about it.

I'm not certain these two apps are intended for assistive technology users because anybody can use them. But nevertheless they are about the biggest assistive technology upgrades to the iPhone and iPod Touch that I've seen to date. As of this writing they are still free so check them out!

- Paul Natsch

Make Your Own MacSpeech Dictate Commands

Bakari, of the Mac Photography Tips blog, has made a great tutorial video which shows one way to create new commands for MacSpeech Dictate - using the "Menu Item" command type.

There are other types of MacSpeech Dictate commands which you can make that will let you do things that don't have a menu item, but this is one very simple way to create a new command without needing to know anything about programming or other complicated "geek stuff". I suggest you watch the movie in full screen mode so you can see what Bakari's doing:

[embed width="640" height="385"]http://www.youtube.com/watch?v=iaE4VzWUxZ8[/embed]

At the end of the video Bakari says that one of the limitations is that MacSpeech Dictate can't push the "Post" button in the Tweetie program because it has no menu item. However, as he hovered the mouse over that button I clearly saw the tooltip showing "⌘⏎", which are the symbols for the command and enter keys. MacSpeech Dictate can do keystroke combinations with the verbal command "Press the keys..." or "Press the key combo...", so if we tell MacSpeech Dictate "Press the keys command enter" or "Press the key combo command enter" then the tweet will be sent!

It's not as neat as making a new command named "send tweet" or similar, but it works. Note that a new AppleScript command could also be used, but this is a bit more "geeky" and requires a touch of programming, so I won't go into it here.

A note: I found that to use the "press the key combo" command it's important to say the whole command as if it's one phrase. My instinct is to say the command as two phrases, as if the punctuation were something like this:

Press the key combo, "Command Enter".

But this makes MacSpeech fail to recognise the command because I'm pronouncing it as two phrases where the program expects one. If I say it all as one phrase, more like this:

Press the key combo command enter.

Then the recognition is very good.

Do you have any programs you've made MacSpeech Dictate commands for, or that you'd like to make commands for and don't know how? Leave a comment and I might write about commands for your specific needs in future articles!

- Ricky Buchanan

Dictating Well: Principles From A Master

Guest Post by Colin Oberin.

[Ed: Colin Oberin has very kindly agreed to write about his knowledge of the art of dictation. His dictation was - and is - to a secretary taking shorthand or to a tape recorder for later transcription, but I believe many of the principles are the same as when dictating for a speech-to-text program such as MacSpeech Dictate or Dragon NaturallySpeaking. I have added notes where appropriate to explain how these ideas can be used specifically by MacSpeech Dictate users. - Ricky (pictured below with her Teddy)]

I have lost count of the many thousands of letters, memos etc. which I dictated in a career of nearly 50 years (and counting). However, I clearly remember the first one. Exactly 2 months after my sixteenth birthday I started work and during the very first week I was asked to write a letter to one of our customers. I remember sitting at my desk jotting a few words on a pad and then crossing them out again as I tried to work out how to write a professional sounding letter. At that moment the Sales Manager walked by and asked how I was getting on in my first week. He then asked what the sales staff had me doing. I naively said I had been asked to write this letter.

Portrait of Colin Oberin

Without hesitation the Sales Manager said he would teach me how to write a letter. I was instantly relieved - but then I found out what he had in mind. After sitting me down in his office the Sales Manager called in his secretary. In those days letters were typed on manual typewriters using carbon paper between the pages to generate a copy for the file. That was a specialist job and mistakes were hard to correct so it was important to get it right first time.

Most letters were dictated to a secretary (always female in those sexist days) who took down the letter in shorthand and then typed it up later. Letters could also be hand written and handed to the secretary to be typed up but I later discovered that this was frowned upon. The secretary was friendly and polite but old enough to be my mother and as she sat there with her pencil poised and shorthand pad balanced on her knee I was petrified. The Sales Manager asked me what I wanted to say in my letter. Not surprisingly I was dumbstruck. He then said: "This is how to do it" and promptly dictated the letter for me with no notes and no apparent preparation. He then told me that in future I was to dictate all my letters to one of the secretaries and that although he was happy for me to write my letters out by hand while I was still learning what to say, I should screw up the handwritten draft before I started dictating.

It was a tough initiation but a lesson which stood me in good stead for the rest of my career. I followed the advice and slowly improved to the point where, after a few months, I didn't need to write out my letters in full any more. I just jotted down a few points to guide my thinking and was then ready to start dictating. After a year or two I was able to dictate even complex letters without any written notes and only a rough outline in my head.

The aim of dictation is, of course, to convey verbally, clearly and concisely, the wording you want converted to written form. While I was learning the art of dictating to a secretary, the secretary would give me tips on how to dictate in a way which made it easier for her to understand what I wanted typed. If I was mumbling or not speaking clearly enough, or not explaining what punctuation I wanted, the secretary would remind me. In that way mistakes were minimised and accuracy improved. Today the dictation program on your computer won't give you that type of personalised guidance (or ask about your weekend) and it will not be effective in transcribing your dictation if you aren't dictating in a way the program can interpret accurately. Therefore you need to work out for yourself the technique which gets the best results from your program.

Colin's office desk circa 1978, with a 3 year old girl sitting at it.

Whether dictating for human or automatic transcription, the same principles apply and, based on my experience, attention to the following points should improve your dictation skills and hence the accuracy of the transcribed result:

  • Engage your brain before your mouth - knowing what you want to say before you start to speak is important. Correcting/changing the spoken word is easy in conversation but extremely difficult during dictation - better to get it right first time
  • Have a plan - if you are not experienced at dictating, learn the art. Start by writing out what you want to say and then reading it aloud to the dictation program. Then try dictating from memory what you have written rather than actually reading aloud and see if that works better. As your technique improves so will the accuracy of transcription by the program. With practice you should progress from writing out what you want to dictate to just jotting down an outline and ultimately to just making a written (or in time mental) note of points you want to cover before you start dictating. [Ed: For those of us using MacSpeech Dictate because of disabilities, writing out a draft won't be practical in most cases, but I find it still helps to think about what I'm going to dictate before starting. Figure out what comes first, what points I need to make, what comes last. I don't know if this is less effective than writing things out as I've never been able to try writing things out, but it works for me.]
  • Practice makes perfect - in dictation as in other things. If you're inexperienced try dictating into a recorder and then playing it back. Listen critically as if you had to write down what is said and see if you can improve clarity next time. [Ed: This is easily done when you already have a computer with a microphone. You can use a stand-alone program like Voice Candy, or simply use the "Press play" command in MacSpeech Dictate to have it play the audio of the most recently dictated phrase.]
  • Don't gabble - speaking slowly and clearly yields better results
  • Don't become a metronome - speaking in phrases, just as you would if giving a speech, is much more effective than s p e a k i n g s l o w l y a n d c l e a r l y b u t w i t h o u t a n y i n t o n a t i o n or p h r a s i n g.
  • Don't speak too softly - if the machine can't hear you it won't ask you to speak up
  • The program will record what you say - not what you meant to say, so try to speak clearly and only say what you want written.
  • Avoid fillers - you know when we are not sure what to say, you know, we fill the space with, umm, fillers. Avoid them or, you know, the program will, err, type them.
  • Use the pause function - if you are not sure what to say next then simply hit pause while you think before starting again. That sure beats having to go back and edit out all the "you know" type filler words later. [Ed: Using MacSpeech Dictate you could use "Go to sleep" to turn the microphone off temporarily while you think, then "Wake up" to turn the microphone back on for dictating again.]
  • If writer's block strikes dictate an instruction to yourself - such as "finish this paragraph later" - and proceed with the parts you can get done rather than sitting worrying about what to say next. You can always fix the order with judicious cutting and pasting during editing
  • Don't use truncated speech - written language differs from common speech and if "ya wanna look OK written" dictate that "you want your wording to look appropriate when written"
  • Remember to punctuate as you dictate - when you want a comma or a full stop inserted or a new paragraph started, include the instruction in your dictation. Various programs may handle this in different ways so take the time to learn what your program does and where possible change the settings to the version of punctuation (US English, Australian English etc.) you prefer [Ed: I believe that MacSpeech Dictate will adjust its expectations of punctuation names - such as "period" or "full stop" - according to the region set in the System Preferences "International" pane, Formats tab.]
  • Learn how to make a correction - either just say "Correct" then proceed by saying what you should have dictated and correct it later when editing or learn the options available in your program such as how to back space and over-record if your program allows this [Ed: With MacSpeech Dictate it's best if you make corrections by voice as you go, as the program will learn from your corrections. Use the "Show recognition window" command so you can see MacSpeech's other guesses for your phrases - then you can easily use the "Pick 1/2/3/etc." command to select an alternative, editing it if necessary. Watch the How to Use Phrase Training video if you aren't sure how this works.]
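If a few fillers do slip through despite the advice above, they can be tidied up after the fact. The following is a rough Python sketch, not a feature of MacSpeech Dictate or any other dictation program. The filler list and the function name `strip_fillers` are assumptions made for illustration. It only targets interjections such as "umm" and "err"; a phrase like "you know" is deliberately left alone because it also occurs in ordinary sentences.

```python
import re

# Assumed list of interjection-style fillers; extend it to suit your own habits.
FILLERS = ["umm", "um", "err", "er", "uh"]

# Match a filler as a whole word, plus any commas that set it off.
_FILLER_RE = re.compile(
    r",?\s*\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?",
    flags=re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove interjection fillers from a transcript and tidy the spacing."""
    cleaned = _FILLER_RE.sub("", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned)           # collapse doubled spaces
    cleaned = re.sub(r"\s+([.,;:!?])", r"\1", cleaned)  # no space before punctuation
    return cleaned.strip()
```

For example, `strip_fillers("the program will, err, type them.")` gives back "the program will type them." A blunt tool like this is no substitute for proof reading, so it is worth reading the result before sending it anywhere.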

Once the dictation is finished the job is not done. Proof reading the written word to correct mistakes (whether your mistake or the program's mistake doesn't matter) is vital. If, like most people, you tend to read what you meant to say rather than what is actually written down when you proof read, try reading the written word aloud. That way it is easier to notice problems with the written word. Another trick is to read what is written to see if it conveys the right message - not to look for mistakes. [Ed: With OS X it is simple to have the computer read the text back to you, as this function is built into the operating system. You can simply set up a shortcut key to speak the selected text or, if you would like fancier functions such as word/sentence highlighting and the ability to pause speech, I recommend the GhostReader application.]

Dictation is an art but once learned it will save time and allow you to order your thoughts so as to create a coherent narrative requiring minimal editing. For most people, dictation results in better structured and more creative writing of letters, essays etc. than either handwriting or typing out your own thoughts. Somehow the mechanics of recording your thoughts onto paper or a screen gets in the way of interesting and creative writing for most people.

- Colin Oberin

Set A Default MacSpeech Dictate Profile

When MacSpeech Dictate is launched it will prompt you to choose which voice profile to use, even if only one profile exists. This can be annoying and is certainly unnecessary, but luckily there's a simple way around the problem:

First open MacSpeech Dictate, selecting the profile which you want to be the default. Select "Preferences" from the "Dictate" menu, and make sure the "General" icon (the left-most one) is selected. Your window should look like this:

MacSpeech Dictate Preferences Window, with Show Startup Window checkbox highlighted.

Then simply un-check the box beside "Show Startup Window", as I've highlighted above. This will stop the profile selection request at startup.

If you later want to switch to a different profile, or manage your profiles in some way, you can easily display the Startup Window manually by opening the Tools menu and selecting "Profiles...".

What other tricks have you learned to make it easier to use MacSpeech Dictate? Please share them in the comments!

- Ricky Buchanan
