Dispelling Myths about Dictation / Speech Recognition

When I first started writing this I had just finished listening to episode 156 of The Bestseller Experiment; as a patron supporter I get early access to episodes, as well as being a member of the wonderful BXP Team. The marvellous episode focused on interviewing the author Julian Barr about his new book The Way Home. Julian is also a long-time listener and member of the BXP Team. I highly recommend Julian’s book, a gripping, well-paced tale whose characters have clear connections and motivations. His book has also now earned an Amazon bestseller tag! I’m very much looking forward to the next book in the series.

Important paranoid associated thought: like many writers I feel like a fraud that just needs to write more, and thus I feel awkward about asking for advice; after all, I’ve already answered my own request for advice: “Write more!” Anyway, later in the episode the two Marks discussed writing using Speech Recognition (SR) and gave a call-to-action regarding listeners’ experiences with writing via dictation. I was surprised to find that I felt empowered and not a fraud, since this is a topic I know quite well.

As someone with long-term chronic Repetitive Strain Injury (RSI) in both of my wrists I have a lot of experience with speech recognition, going back nearly twenty years to the horrendous days of massively inaccurate software; the frustration and stress of trying to use the software often made me feel even worse! Fortunately the various programs have improved so much in the last ten years that I find dictating to be dramatically faster, easier and shockingly more efficient. The vast improvements have come about because of the following factors:

  1. Understanding of what is involved in analysing language (technical).
  2. Improved code efficiency (technical).
  3. Substantially increased computer processing power (brute force).

This also means that modern speech recognition is better at recognising accent and voice differences. With training, software should adapt to work near perfectly for most users; I appreciate that is quite a bold claim.

As someone that used to be able to maintain a decent enough typing speed of between 70 and 80 words per minute (WPM), having that ability taken away from me was devastating; I was unable to work or partake in most of my hobbies. Having struggled through the horrid early years of dictation I can appreciate why people are loath to give speech recognition a try; however, just about every problem has gone away these days.

In general many people are not up to date with the latest information when it comes to cutting-edge technology; after all there is so much to do/learn. This is in part because the various non-specialist media outlets are often years behind when reporting non-sensational things; there is so much to talk about, and typically they repeat the same core points. In this ever-accelerating technological era I suspect anyone that has not used modern speech recognition has heard opinions that are about software from 10+ years ago. My title was not an attempt at clickbait. When I discuss or read things about speech recognition there is an understandable fixation on accuracy, but with modern software claiming accuracy of 90%+ for most people with little to no training, and 95%+ with some training, I wonder why accuracy is still considered a barrier to entry. My own system seems to be about 99% accurate, but I appreciate it has been used a lot over many years. My point is that most people make errors when typing anyway; even with grammar and spell checkers, mistakes slip through. Even for those that manage a rare 100% accuracy the first time they type something, the result should still be double-checked. Mistakes are still made and accuracy is a concern whether typing or speaking, so why not do the vast majority of the work via speech?

When I was working in adult social services I had a severe RSI flare-up, in fact my worst ever, which caused a domino effect of problems. When I returned to work, for a while I was able to cope due to using speech recognition, despite being in a large busy office. I was surprised at how accurate it was even with all the background conversations. Additionally, instead of using a mouse to navigate the screen, I found using voice commands to finally be efficient. How things had changed!

During long bouts of sleep deprivation I can somewhat rest my eyes whilst dictating. Thankfully I rarely get headaches, but dictation has also proved helpful when I have; I find it’s better to do something than nothing, since I’ll be suffering either way.

I’d like to highlight that a hybrid approach can be used. If you can still type and you want to, then do so. A hybrid approach can be quite easy with today’s smartphones; maybe you can use speech recognition whilst away from your normal work area. For the following reasons I’d recommend at least experimenting.

Speech Recognition Pros & Cons

Pro 1: Health

When dictating we don’t need to be sat down or stood still; we are not tied to a keyboard. Since we can move about I often do so. Over the years I have done all manner of things whilst dictating: physiotherapy, light exercise/stretching, and things like cleaning or ironing. When I am having a particularly painful wrist episode my arms, shoulder and back all become problematic, resulting in difficulties sitting or standing for any length of time, so on particularly bad days I’ve even dictated whilst resting in bed.

Con 1: Training Time Investment

Like any new skill there can be a learning curve, which can vary dramatically from person to person. Although these days even without any training on a modern device and software, dictation can start out at 90%+ accuracy.

I appreciate that getting out of comfort zones and allocating time to learn something can be challenging. Saying embrace the challenge is all well and good, but people and their situations can vary wildly. It is sensible to decide during an epically busy time that doing something new is too much of a risk, but because life is strange maybe the change will quickly be beneficial, even with regard to time, which links to Pro 2 …

Pro 2: Speed

Personally, I think the health reason is reason enough, but just in case, here is another. Just because a person is good at typing does not mean they should stick with that method, since dictating can allow them to be faster. I often find it easy to dictate over 100 WPM, sometimes as high as 150 WPM; granted a few typists with specialist keyboards can beat that, but for the vast majority of people dictation is twice as fast as typing.

Following on from Con 1, it is worth learning the extra functions like how to navigate via dictation, as well as the various advanced commands. Going from quick dictation to struggling to carry out navigation commands can make you feel like a writing session was ruined; writers typically have enough reasons to procrastinate without imagining new ones 😉

Speed is a major factor for writing events like #NaNoWriMo, thus the speed advantage of dictation can really pay off.

Con 2: Initial Costs

Not everyone has a computer (desktop/laptop/tablet) or smartphone (I’m only differentiating because so many people typically do, as it is really just a computer with a phone function). Free speech recognition exists, but I find Dragon NaturallySpeaking to be better overall, although it isn’t cheap.

Then there is the topic of what microphone to use. Whilst you can use a laptop’s built-in microphone it is better to have a decent microphone, although I’ve found that a £25 microphone works just as well as my more expensive Yeti, so you don’t have to buy crazy equipment.

Other extras: I’ve also invested in a microphone stand, pop-filter, USB cable extension and a high quality wireless headset. The extension and wireless headset are the reason I can exercise or tidy my room whilst dictating.

One of the problems I found using my fantastic quality Yeti microphone was there were a few delays/problems with the software, but this was because I had leaned back in my chair and thus wasn’t close enough to the microphone. So before you rush off to buy an expensive microphone consider how your setup can be altered to get improvements.

Pro 3: Speaking is Natural + Rhythm of Speaking

Based off this subtitle you can see why Nuance called their software NaturallySpeaking 😉 Particularly when dictating dialogue I find I can write a better scene; I think this is down to being able to somewhat act the scene out, as I feel more in character when I switch back and forth between character perspectives. I’ve even experimented with literally acting a scene out, although that led to some comedy moments of frantically changing my position to be the correct character, like a stand-up performance.

Sometimes we can spend a lot of time thinking about a subject only to find that when we speak we change what we had intended to say. There is something about speaking out loud; maybe it is because we engage more of our body, thus more of our brain. I also think this is probably a knock-on effect of evolution with regard to us being such a social species: we need to be careful of what we say to others.

One of the best tips for writers is: “Read your writing out loud.” Dictating can be a big help here: you get used to speaking out loud, thus when it comes time to edit your work you are more likely to give it a try. This also links to one of the key tips from Bestseller Experiment, “Make a public declaration.”

There is another advantage to dictating. If you think of a sentence and then struggle to dictate it, then that is a sign there is a problem. Typically you’ll easily find a rhythm, indicating where commas and full stops best fit; granted you have to say “comma”, but I think that is no different to having to press the comma key. Maybe somebody who struggles with grammar could benefit from dictation?

Con 3: Editing

As I mentioned above I think this is a con that gets too much attention, since work should be double-checked anyway. Still it can be particularly irksome during the training period, when correcting (editing) as you go is highly recommended. I think a valid point about the accuracy aspect is that dictation errors are typically ones we are aware of, unlike when most people type and things slip through.

Crucially this is a problem that fades over time; I rarely need to correct things. Since I write fantasy fiction and role-playing games I also have lots of additions for my fantasy proper nouns, and my system mostly recognises these new words after the initial correction or two. Just like with typing it is more important to get something written first; then you have something to edit.

Pro 4: Flow

Due to the pain from my disability, I lost my ability to enter a flow state whilst writing/typing. It was 2009 when this feeling briefly appeared during dictation. My comfort level with dictating slowly grew over the years; by 2009 I found talking to my computer to be not only comfortable but also empowering.

Con 4: Habits

When first learning to use speech recognition a user can feel they are wasting their time. Why bother stressing yourself out, fighting your habits? I’ve separated this point from Con 1: Training, because I think habits/traditions are such a powerful part of our psychology.

Habits are typically difficult to break; various people can react differently to the same thing. Decades ago I had the regular association of being denied the use of my wrists to type a decent work session, the threat of pain from typing as well as sitting too long, plus stress and sleep deprivation. Since back then speech recognition was lacking, I quickly developed justifications about putting things off. In the light of pain-paranoia and frustration it became easy to justify thoughts like “I need to minimise computer usage even using dictation, so I need to work out as much as possible upfront.” Once I developed this habit I found it hard to break it, even as the ability of speech recognition improved.

Pro 5: Focus

I find I do not get distracted as much when I am dictating. Maybe because I am typically away from my desk, so I cannot easily check emails or browse. It can seem like our hands have a mind of their own when within a split second of thinking about a website we’ve switched to that. This is why so many writers use blocking software that restricts their access to the Internet. Following on from Pro 3, I find that if I do start giving my computer commands to browse non-important things I quickly stop myself.

Con 5: Stream of Consciousness

Dictating does not dictate quality. The fact we can dictate more WPM means we can also have more to edit. This is a minor Con, yes I’m being nit-picky, but over the years I have dictated a lot of garbage. I think I have solved this by writing more, showing others my work, learning more about writing; not just practice, but learning to carry out skilled practice. If you feel that when you start dictating you are writing garbage, don’t worry I think you’ll quickly adapt.

Bonus Pro: Moving is Thinking

Linking back to Pro 3: Speaking is Natural, there is something about moving and thinking; dictation means you don’t have to be sat still at a keyboard. When we move we are activating different brain regions, plus getting the blood flowing, etc. Physical intelligence is one of the many types of intelligence being researched, plus whilst kinaesthetic learners are typically separated from other learning types, the majority of people can learn in all manner of ways including kinaesthetic. Quick interesting point: animals, unlike plants, have brains, arguably because they need to navigate; the sea squirt is a fascinating creature that, once it finds a permanent spot for its next stage of life, eats its own brain. It is also worth looking into the tools of memory specialists and how they utilise virtual spaces to associate memories for better recall.

Some speech recognition software allows for the transcribing of previously recorded speech. You can even transcribe a recording of another person, although I’ve never done this and I am not sure of the efficiency of the process.

I’ll be making a video version of the blog in the New Year, but before I finish here are some extra points. Dictating role-playing mechanics is not a big deal; I even used speech recognition to dictate computer code years ago, and I am contemplating giving it another go with the vastly improved software and machine power of today.

Whether walking outside or in bed trying to sleep (chronic pain is hell), I’ve dictated notes via my smartphone’s built-in software. Granted it is not as powerful as Dragon, but it is easy to do and I don’t have to get out of bed. I’ve also made use of a Dictaphone with a headset whilst walking, and later dictated the recording at home; this counted as a first draft. Dragon Anywhere allows for dictating on the go, but I cannot afford it, I am rarely out, and I already have Dragon 15.

In conclusion, if you are still not sure if speech recognition is for you, I highly recommend giving it a go; at least go hybrid and mix things up. The future is already happening!

Links

I’ve written about The Bestseller Experiment before.

The Bestseller Experiment Podcast

Julian Barr

NaNoWriMo

Bestseller and GollanczFest

Last week, November 4th & 5th, was quite the experience as I went to the Gollancz Festival. The event was held at Foyles, a rather grand bookshop in London. I had wanted to go to the writer’s workshop, but that had sold out. Fortunately for me I won tickets to the main Gollancz Festival via the Bestseller Experiment podcast.

Since my friend Richie was going to the writer’s workshop the event also had an extra appeal; we already chat a lot about our writing, and it’s rare we meet up these days. Plus I had not visited his home yet, so after a brief discussion an extended visit was planned.

One thing about long train journeys is that at least there is plenty of chance for reading and writing. Even though I suffer from travel sickness, trains are generally tolerable for me, plus when I did feel a bit off I stopped writing and changed to listening to an audio book.

Joining me on my journey were Moo & Bat, my mini-fluffy-sidekicks. I planned on taking some silly pictures of them on the train and at the festival, in part because I’ve been thinking through some children’s story ideas. Plus the Adventures of Moo & Bat amuses my wife.

The Adventures of Moo & Bat

I’ll write about the Saturday morning Gollancz Festival panels next time. I’ll end this short post by highlighting that the Bestseller Experiment has a Patreon fund. Considering the value Mark Stay & Mark Desvaux have provided with this great podcast, it is something I am happy to support even though I currently have no income due to health problems. Just to clarify, I had backed them before I knew I had won Gollancz Festival tickets 😉 It would be quite sad if the podcast does not continue, and it’s worth considering what quality & quantity season 2 could provide, so please consider getting involved.

https://www.patreon.com/bestsellerexperiment

I quite enjoyed Mark Stay’s recent interview with Cover to Cover.

Part 2 of my GollanczFest visit.