Audiobooks – Flying Sound

Flying Sound offers services in recording, mixing and mastering audiobooks. Here is a checklist of sorts of what needs to happen to create an audiobook.

How to record an audiobook: What you’ll need

The first thing you will need is a room that is very quiet and not very reverberant. An acoustically treated room is best, but a closet with a lot of clothes can be great. Make sure to use your ears and not just look at the room: foam that is too thin, for example, can look effective while doing nearly nothing to deaden a room. The wrong type of foam can actually make a room sound worse.

Second, you’ll need a good microphone and an audio interface to plug it into. We often use a Neumann U87 for the mic, plugged into a Grace Model 101 preamp going into Pro Tools. The best mic for the job is the one that best suits the voice being recorded. The Grace Model 101 is a good choice because it is clean with a very fast transient response. Another great choice of preamp with fast transient response is the John Hardy Jensen Twin Servo 990 or any of his other preamps.

The computer interface is less important; however, quality adds up at every stage. If you are looking for a budget option with a preamp that is actually nice, the Audient iD4 is a good choice as it has a Class-A discrete preamp.

Recording Audiobooks
Before you start recording, get organized. Know exactly what you need for publishing to the platform you intend to use. Here are Amazon’s requirements for audiobooks.

Your first order of business is to get your gain staging perfect. Find the gain setting on the mic preamp at which the reader is recorded as loudly as possible without clipping or driving the circuitry. The Grace preamp sounds the same whether it is quiet or loud, but many preamps start to sound driven if they are turned up high, so you might want to back off a little. It is better to get a quieter but better-sounding recording and to turn up the volume digitally later if it is too quiet. Conversely, if you have a really clean, high-end preamp, you will get a better signal with it turned up higher because the noise floor will be lower.
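
One rough way to sanity-check gain staging after a test take is to measure how far the loudest sample sits below digital full scale. The sketch below is only an illustration and not part of any DAW: it assumes the samples have already been loaded as floats between -1.0 and 1.0, and the function names `peak_dbfs` and `headroom_db` are made up for this example.

```python
import math


def peak_dbfs(samples):
    """Return the peak level in dBFS for float samples in [-1.0, 1.0].

    Hypothetical helper: assumes the audio is already decoded to floats.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Pure digital silence has no finite level in dB.
        return float("-inf")
    return 20.0 * math.log10(peak)


def headroom_db(samples):
    """Return how many dB the loudest sample sits below full scale (0 dBFS)."""
    return -peak_dbfs(samples)
```

A peak very close to 0 dBFS means the take is at risk of clipping. A very low peak means there is room to turn the preamp up, as long as it still sounds clean.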

You’ll need to give the reader a primer on microphone technique. I like to angle the mic slightly from above to get more of the sound coming out of a person’s chest and to avoid plosives. The reader should not ‘reach’ up into the mic, and should be aware that turning their head as they turn pages changes the sound. When reading from paper, they should have two pages up at the ready so that they don’t have to stop mid-paragraph. I find readers do better with a large-screened iPad; however, tablets are more reflective of sound, so pay attention to the angle. Turning pages makes noise that will need to be edited out, whereas scrolling on a tablet can be essentially silent and save a lot of time.

Record each reader/author on their own track. Make a new playlist or track for each chapter. You’ll want to keep track of which day/chapter is which, so you can use a new track or a new playlist; my personal favorite is to change the track/playlist name to “Reader-Name Chapter X YYYY-MM-DD Original” before I hit ‘record’ on each chapter. This is so that you can easily see which day is which. You want to know that because people’s voices, the humidity and the alignment of the stars all change every day, and the sound changes with them.
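
If you would rather generate those names than type them, the naming pattern above can be sketched in a few lines. This is just an illustration of the pattern: the function name `track_name` and its `stage` parameter are made up here, and nothing in it talks to Pro Tools.

```python
from datetime import date


def track_name(reader, chapter, day, stage="Original"):
    """Build a name like 'Reader-Name Chapter X YYYY-MM-DD Original'.

    Hypothetical helper: `stage` is the last word of the name, so the same
    function can also produce names for later copies of the track.
    """
    return f"{reader} Chapter {chapter} {day.isoformat()} {stage}"


# Example with a made-up reader name and date.
print(track_name("Jane-Doe", 3, date(2019, 7, 8)))
```

Keeping the date in YYYY-MM-DD form means the names also sort in recording order.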

Start recording from the beginning of the book and leave your DAW set to ‘insertion follows playback’. This will keep everything chronological. If you are recording to a new track, keep the playhead at the time where the previous chapter ended.

Use markers as you’re recording the reader or author to mark any moments that will need to be fixed later. It is also a good idea to highlight those same marked moments in the manuscript as well. Monitor that your software is still recording at all times as it can be a huge time loss if you need to re-record, particularly if it’s another day.

Whenever something changes (the day, the settings), record a long (1 minute) room tone for the editing described below. Alternatively, you can record a bit of silence after each chapter.

If you have additional readers for the book try to record them approximately where they will be in the book. Leave extra space if necessary. E.g. if you have two authors and two halves of a book but the 2nd half author is recording first then start recording at something like 10 hours in so that you have plenty of space. If you have many essays make a marker for each start point and give them all space. The benefit here is that you will have everything organized in order and not have to guess what goes where or search through the manuscript. In many DAWs audio can be ‘spot’ted back to its original location in the timeline at the time of recording.

Once you are done tracking all of the audio, duplicate the playlists, changing the “Original” in the track names to “Processed”, and consolidate the audio files.

It’s much simpler to do the rendered processing (like de-noise) first and then tweak other processing as plugin inserts. This is because de-noise plugins are often extremely CPU-intensive, and one rarely needs to change that processing after the initial rendering, while compression and EQ may need more tweaking. However, I still want to have a way to re-do the rendered processing if I need to.

I prefer iZotope RX Spectral De-noise with the following settings:
* Quality: D* (Best) – This is why you need to use AudioSuite or the standalone application, as these settings aren’t available for realtime (plugin insert) processing
* Algorithm: Adv+Extr.
* Adaptive mode: off (and “learn” the right moment of audio. See below)

These settings take forever to process unless you have a really fast computer, but they are worth it for the overall clean audio, even if you have a great-sounding room. Mouth De-click is another nice one to add. The standalone iZotope RX application has a batch mode with multiple layers of plugins, so you can batch all the files with Spectral De-noise and Mouth De-click in one go, in the background while you edit.

Back to why we have that ‘rendered’ track: in Pro Tools there are source files, and then there are the realtime edits, which are a part of the session file. You can edit the underlying audio outside of Pro Tools in a program like iZotope RX (or with AudioSuite) and hack the result back in by renaming the file and then re-linking to the new file ID in the Pro Tools relinker. If the new audio file is the exact same length, then all of the edits you have made will be applied to the new audio. This would allow you to, say, apply a mouth de-click on the whole file once it’s been edited.
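
Because the relink trick depends on the processed file being exactly the same length as the original, it can be worth checking that before swapping files. Here is a minimal sketch using Python’s standard `wave` module. It assumes both files are plain integer PCM WAV files, which is all that module reads, and the function name `same_length` is made up for this example.

```python
import wave


def same_length(original_path, processed_path):
    """Return True if two WAV files match in frames, sample rate and channels.

    Hypothetical helper: only a pre-flight check before relinking, it does
    not compare the audio content itself.
    """
    with wave.open(original_path, "rb") as a, wave.open(processed_path, "rb") as b:
        return (
            a.getnframes() == b.getnframes()
            and a.getframerate() == b.getframerate()
            and a.getnchannels() == b.getnchannels()
        )
```

If this returns False, the processing changed the length of the file, and the existing edits would no longer line up with the new audio.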

The de-room-mode trick

If you find a moment where the room resonates with itself you can use that piece of audio in Spectral De-Noise to ‘learn’ the room modes and correct them. It is audible to the untrained ear but subtle. It is essentially ringing out the room — in this case Spectral De-Noise works more like a dynamic EQ.

Pulling Everything Together

Once you have your raw tracks de-noised and processed it’s time to start editing!

Make new playlists again. We want to leave the rendered files as ‘originals’, just like the pre-rendered files. I prefer to name these with just the reader’s name, as the tracks are easier to see on digital scribble strips.

Save your session and then close it. Copy the session file (not all the audio files and folder stuff, just the .ptx session file itself) and add the date, your initials and “Edited” to the file name, e.g. “2019-07-08 The Book JAC Edited.ptx”. As you work on the book for the next few days, keep making new .ptx files with new dates. (Later you can make “Mixed” and “Mastered” files too.)
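
The copy-and-rename step can also be scripted. The sketch below is one possible way to do it, not a Pro Tools feature: the function names `session_filename` and `copy_session` are made up, it only copies the single session file, and it should be run while the session is closed.

```python
import shutil
from datetime import date
from pathlib import Path


def session_filename(day, title, initials, stage="Edited"):
    """Build a name like '2019-07-08 The Book JAC Edited.ptx'."""
    return f"{day.isoformat()} {title} {initials} {stage}.ptx"


def copy_session(session_path, day, title, initials, stage="Edited"):
    """Copy one .ptx file next to the original under a dated name.

    Hypothetical helper: refuses to overwrite an existing copy.
    """
    source = Path(session_path)
    target = source.with_name(session_filename(day, title, initials, stage))
    if target.exists():
        raise FileExistsError(target)
    shutil.copy2(source, target)  # copies the file contents and timestamps
    return target
```

Calling `copy_session` again with `stage="Mixed"` or `stage="Mastered"` would give the later files the same naming pattern.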

The first stage of editing is going backwards from the end, correcting all the issues that you noted with markers and deleting the markers as you go. They will still exist in the old session in case you need to check.

Leave breaths in, as they sound natural, unless they are weird-sounding or excessive; in that case replace them either with room tone (that you gathered above) or with another breath. Don’t re-use a breath more than a couple of times because it will sound weird. You can remove a breath if the break between speaking is long enough. Again, use your ears. Remove stumbles and the extra-long pauses when the reader is thinking, unless they sound good. Your ears are the judge.

You might want to group tracks if you have more than one reader, so that they follow edits and the timing of the original read is preserved.

One handy trick is to use tab-to-transients for editing. Since you had the reader go back a sentence or two when they stumbled over a word, you can find where they start flowing again and tab to the transient at the start of a word, then find that same word before the stumble and tab to that. Then, in shuffle mode (option-1), delete the audio in between, and the re-read will align perfectly in time with the old one. Add a transparent-sounding fade and the edit is done. Since your tracks are grouped, the other readers’ bits will keep the correct time relation.

Replace pauses, page-turns, sips of water etc with room noise and use your ears. Often there is perfectly good room noise as a part of what you are removing but it’s handy to keep the room noise on the clipboard to paste whenever you need to make an edit with a gap. Feel the rhythm of the reader’s speech as you make the edit to sound as natural as possible.

If the rhythm is a little off, correct it, even where an edit isn’t otherwise necessary. If the reader makes a sound of being sick, like a sniffle, replace it with room noise or a different breath.

Keep using your ears to tweak the sound with the plugin inserts while you do edits. Use plugin automation or separate tracks for the same reader when one section doesn’t match the other. Make sure to only do your audio tweaks at the beginning of the edit session or after a break where you’ve left the space. Ear fatigue should be prevented whenever possible. (I never mix for more than 2 hours straight without a break.)

As you finish each chapter, section, essay, etc., clip-group them (all the tracks, not just that track) in Pro Tools so that you can space the timing between chapters and also visually see what’s done and what’s not. Name them appropriately. To leave a little space at the end of the chapter, just highlight the blank space. The best way to get a clean group edit is to highlight the previous chapter/clip, hit the down arrow to align the selection to exactly the start of the next chapter and then click on the first clip in the previous chapter. Then use command-option-g to group them. This becomes an easy way to test-and-save timing.

Once you’ve gone all the way backwards, congratulations — you have a draft copy of the audio book you can share with the author if you’d like. However, you still have a lot of work to do!

It’s time to go forward and mix while you fix the edits a second time. Remember to take a break after 2 hours to prevent ear fatigue. Refine automation.

You can repeat this step many times but I usually only do it once or maybe twice and then spot-fix. Really, you want the book to be basically done after the first cluster of edits, before the first pass.

Then, bounce to disk. You can also use a print track to listen along as you work and consolidate the files but I prefer bouncing for smaller disk space used.

From there it is time to master the track. In this sense, mastering means preparing for release! Get the image for the book and all other relevant files, and put them all together. Use XLD (free, open source) to convert to MP3 (LAME MP3, V0 is best).

You need to change the file type to “audiobook” in the ID3 tags for some devices to treat the files as a proper audiobook. Add the image to all of the chapters. A trick I use is to have iTunes on the production machine set to not automatically import/sort files. When I add the files, it will then change the ID3 tags without moving them. Select all and use Get Info to do the first round of edits and add the book’s image, then change the individual track names to reflect the chapter/part/section and not the filename.

File-name wise, sort the chapters with part dash chapter (or just chapter with a leading zero or two). I.e. if you have a book with 14 chapters, name the files 01_book_name_by_author_chapter_1 through 14_book_name_by_author_chapter_14. If you have 114 chapters, then chapter 1 would be 001_book_name_by_author_chapter_1, chapter 14 would be 014_book_name_by_author_chapter_14 and chapter 110 would be 110_book_name_by_author_chapter_110. If you have 3 parts to your book you could do 101_part_1_chapter_1, 201_part_2_chapter_1 and so on. If your book is really long, part 4 chapter 436 could become 4436_book_name_by_author_part_4_chapter_436. This organizes the files automagically in many audiobook apps.
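
The zero-padding rule is easy to get wrong by hand on a long book, so here is a small sketch of it. The function name `chapter_filename` and its parameters are made up for this example; it assumes the number is padded to the width of the highest chapter number (at least two digits) and that a part number, if any, goes in front as a single digit.

```python
def chapter_filename(book, author, chapter, total_chapters, part=None):
    """Build a sortable name such as '014_book_name_by_author_chapter_14'.

    Hypothetical helper: pads the chapter number to the width of
    `total_chapters`, with a minimum of two digits.
    """
    width = max(2, len(str(total_chapters)))
    number = f"{chapter:0{width}d}"
    if part is None:
        return f"{number}_{book}_by_{author}_chapter_{chapter}"
    # With parts, the part digit goes in front of the padded chapter number.
    return f"{part}{number}_{book}_by_{author}_part_{part}_chapter_{chapter}"


# Names for a 114-chapter book; plain alphabetical sorting keeps them in order.
names = [chapter_filename("book_name", "author", c, 114) for c in range(1, 115)]
print(names[0])
print(names[13])
```

Because every prefix has the same number of digits, a plain alphabetical sort is also a chapter-order sort, which is what most players rely on.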

Zip them up and you’re done!

Other things I like:
* Even though it is space-consuming, I like to record at 96 kHz to reduce aliasing artifacts
* Backing up at the start of every break taken. For this purpose dragging the folder to another hard drive is sufficient (then have your hard drives automatically backed up off-site overnight).
* I go for a more natural sound with breaths and even some lip sounds because if everything is too perfect the audiobook can be kinda boring. I personally prefer the reader to sound a little human rather than absolutely perfect.
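
To put a number on how space-consuming 96k is: the size of uncompressed audio is the sample rate times bytes per sample times channels times duration. The sketch below assumes 24-bit mono recording, which is an assumption for illustration and not something stated above, and the function name `pcm_bytes` is made up.

```python
def pcm_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size in bytes of uncompressed PCM audio, ignoring file headers."""
    return sample_rate_hz * (bit_depth // 8) * channels * seconds


one_hour = 60 * 60  # seconds

# Assumed format: 24-bit, one channel (one reader on one mic).
size_96k = pcm_bytes(96_000, 24, 1, one_hour)
size_48k = pcm_bytes(48_000, 24, 1, one_hour)

print(f"96 kHz: {size_96k / 1e9:.2f} GB per hour")
print(f"48 kHz: {size_48k / 1e9:.2f} GB per hour")
```

Under those assumptions an hour of raw recording comes to about 1.04 GB at 96 kHz versus about 0.52 GB at 48 kHz, before counting the “Original” and “Processed” copies.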