[24 July 2017]
We use reverb to get ambience into dry recorded tracks. Electronic reverb is a simulation of the transmission, absorption and reflection that happens between a sound source (instruments) and sound receivers (ears or mics). The geometry and attributes of surfaces in real rooms, as well as the positions of sources and receivers, greatly influence how this sounds. When we use electronic reverb, typically we pick from a set of canned presets and then fiddle with some parameters to get a pleasing sound quality.
With a convolutional reverb (e.g. Reverberate 2), we can go beyond canned presets and use our own WAV files containing the impulse response (IR) of real (or unreal) rooms, which are capable of very accurately simulating the sound of any source transmitted through that same path. We can buy IR's, scrounge the internet for them, or... make them ourselves. This tutorial is focused on how to make our own IR's by capturing the sound of a real room.
What is an impulse response (IR)? On one level, it's nothing more than a recording of the sound of an ideal impulse emitted at some location, and then received at another location. An ideal impulse is basically a loud click, surrounded by silence. More precisely, in the context of digital audio, the click is digital 0, then a single sample at full scale, then back to 0. When you generate a loud click someplace in a room, and then listen from somewhere else in the room, you don't just hear the click, you also hear reflections of the click, and reflections of the reflections, and so on. The combination of all that stuff is the IR, and it typically has a sound quality that is very different from the dry click that we started with.
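Here's what that ideal impulse looks like as samples. A minimal sketch (not part of the script described later; assumes numpy and scipy are installed, and the file name is just a placeholder):

import numpy as np
from scipy.io import wavfile

fs = 44100                                  # sample rate
click = np.zeros(fs, dtype=np.float32)      # one second of digital silence
click[fs // 2] = 1.0                        # one sample at full scale, then silence again
wavfile.write("ideal_click_44k.wav", fs, click)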
Once we have the IR of a room & source/receiver pair, we can simulate the sound of any source playing in that environment.
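That simulation is just convolution of the dry signal with the IR, which is essentially all a convolutional reverb does under the hood (plus wet/dry mixing and the like). A minimal sketch, assuming mono WAV files at the same sample rate and scipy available; the file names are placeholders:

import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

fs, dry = wavfile.read("dry_snare.wav")      # close-mic'ed, ambience-free source
fs_ir, ir = wavfile.read("room_ir.wav")      # captured impulse response (same sample rate)

wet = fftconvolve(dry.astype(np.float64), ir.astype(np.float64))
wet /= np.max(np.abs(wet))                   # normalize so the reverb tail doesn't clip
wavfile.write("snare_in_room.wav", fs, wet.astype(np.float32))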
For example, we might have a dry, close-mic'ed snare sound and we want some realistic room ambience to make it more lively. Typically when recording a drum kit in a studio, we'd also set up room mics, far away from the kit, for this purpose. But maybe we didn't do that, or maybe there's way too much hi hat in the room mic, or bleed from guitar amps, or it just doesn't sound great. A captured IR can help work around all those problems.
IR's are useful for more than just room reverb. For example people also talk about "Cabinet IR's" in the context of guitar amp simulation. The principle is the same -- you feed a signal through a speaker cabinet that has a certain tone to it, and measure the results so you can get the same tone later using simulation.
Signal theory tells us that an IR can faithfully represent any linear, time-invariant (LTI) filter. Read more theory at Stanford or Wikipedia.
"Room sound", i.e. the mixture of reflections and absorptions when a particular sound interacts with a room, qualifies as LTI -- to the extent that the air isn't moving too much, and the walls aren't moving, and there aren't snare drum rattles and other audio distortions.
It's important to understand that a particular room actually has an infinite number of IR's associated with it, because the position of the sound source and the sound receivers (mics or ears) greatly influence the response. (For typical mixing purposes I just want something that sounds good, but I'm curious about whether capturing multiple IR's with different source positions would make it possible to closely simulate the sound of (say) a band in a room, where each instrument is received slightly differently.)
Also worth noting, many common electronic audio effects are LTI, and many are not; for example:
EQ - yes
Delay, possibly with filter - yes
(static) Reverb - yes
(static) Phaser - yes
Non-LTI effects include:
Flange - no, like Phaser but time-varying
Chorus - no, like Delay but time-varying
Distortion - no, non-linear
Wah wah - no, it's a time-varying EQ
Compression/Limiting - no, both non-linear and time-varying
One obvious way to record an IR is to set up a monitor speaker and some mics, play a loud ideal click through the speaker, and record the sound from the mics. Now, if we take that recorded impulse response and load it into a convolutional reverb, and then process some audio through that reverb, the result is a simulation of what it would sound like if you played that audio through the monitor speaker, recorded by the mics, in the positions you put them in when you captured the IR.
I actually did this procedure in my apartment. I was looking for a pleasant but realistic reverb sound to put on a snare drum. (Actually the first thing I did was to re-mic the snare drum by playing the isolated snare track through monitors, and recording mics down the hallway. This was tedious though, and noisy, and I felt I needed to move my monitors to get a better sound, which led me down the path of capturing an IR, so I can get room sound whenever I want, without having to turn my apartment upside down.)
The first thing that became very obvious is that my apartment is surrounded by annoying noises -- sirens, air conditioners, people yelling, construction, car alarms, the elevator, people in the stairwell, etc. Those noises show up in the IR and degrade the results. Even with the speaker playing single-sample clicks turned up super loud, the reverb tail goes on for up to a second, and becomes very quiet and delicate as it trails off.
I reasoned that I could reduce the noise by recording many identical clicks, aligning them perfectly using software, and then averaging them together. (Averaging N takes with uncorrelated background noise improves signal/noise by about 10*log10(N) dB.) I gave this a try, averaging 32 (or was it 128?) two-second click recordings. This bought me ~15 dB (or 21 dB?) of noise reduction, but it was a pain in the neck -- I had to play and record clicks for several minutes, then manually reject takes with really objectionable background noises, and write software to align and average the results. The results are usable, though maybe still not pristine.
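The averaging itself is simple; the fiddly part is the alignment. Roughly, the align-and-average step looks like this (a sketch, assuming numpy/scipy and takes already trimmed to similar two-second windows; real takes also need the outlier rejection mentioned above):

import numpy as np
from scipy.signal import correlate

def align_and_average(takes):
    """Align each take to the first via cross-correlation, then average.
    Averaging N takes with uncorrelated background noise buys about
    10*log10(N) dB of signal/noise: ~15 dB for 32 takes, ~21 dB for 128."""
    ref = takes[0]
    aligned = []
    for take in takes:
        # lag at which this take best lines up with the reference
        lag = np.argmax(correlate(take, ref, mode="full")) - (len(ref) - 1)
        aligned.append(np.roll(take, -lag))  # np.roll wraps around; fine if the edges are silence
    return np.mean(aligned, axis=0)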
One fundamental obstacle to getting good signal/noise is that directly recording a click requires all the sonic input energy to occur over a single sample period (i.e. 1/sample_rate), and given real speakers, that's just not very much energy compared to the other stuff in the environment.
I did some research and uncovered a better approach, using a crafted known signal as the sound source, along with some post-processing to compute the IR. The math/theory here is simple:
recording = known_signal * room_IR
room_IR = recording * known_signal_inverse

where * is convolution, and known_signal_inverse is the "convolutional inverse" of known_signal -- the signal that, when convolved with known_signal, yields an ideal impulse.
Looking up convolutional inverse sent me through Deconvolution which is a huge rathole for general signals. So don't read that. Instead...
Fortunately, in 2000, Angelo Farina solved this problem for capturing room IR's, using an Exponential Sine Sweep (ESS) as the known_signal. Unlike a click, the ESS spreads its energy at a steady volume over several seconds, which makes it friendlier to speakers and much more resistant to background noise, and its convolutional inverse is simple: just the ESS reversed in time, with a volume ramp.
Aside from better signal/noise, ESS has some other nice qualities for capturing IR's; for example, it time-separates the harmonic distortion of the system from the linear response. For the purpose of capturing room IR's we just delete the non-linear stuff, but if you wanted to characterize it, you could do so with ESS. See the references for details.
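For concreteness, here is a minimal sketch of the standard Farina construction (just the math, not the expsine.py script described below; assumes numpy):

import numpy as np

def make_ess(f1, f2, duration, fs):
    """Exponential sine sweep from f1 Hz to f2 Hz over `duration` seconds."""
    t = np.arange(int(duration * fs)) / fs
    R = np.log(f2 / f1)
    return np.sin(2 * np.pi * f1 * duration / R * (np.exp(t * R / duration) - 1))

def make_inverse_ess(ess, f1, f2, duration, fs):
    """The convolutional inverse: the sweep reversed in time, with an exponential
    volume ramp that attenuates the low frequencies to compensate for the extra
    time the sweep spends down there."""
    t = np.arange(len(ess)) / fs
    R = np.log(f2 / f1)
    return ess[::-1] * np.exp(-t * R / duration)

# e.g. the 4-second, 40 Hz - 20 kHz sweep at 44100 samples/sec used later in this post:
sweep = make_ess(40, 20000, 4.0, 44100)
inv_sweep = make_inverse_ess(sweep, 40, 20000, 4.0, 44100)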
So the new procedure is: set up speaker & mics, play ESS through the speaker, record the results, and convolve the recording with the inverse ESS to get the IR.
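The post-processing step is then a single (long) convolution. A sketch, assuming numpy/scipy and mono signals -- the same idea as the compute_ir step in the script below, but simplified and in-memory:

import numpy as np
from scipy.signal import fftconvolve

def recording_to_ir(recording, inv_sweep):
    """Convolve the room recording of the sweep with the inverse sweep.
    The linear impulse response begins roughly where the sweep and its inverse
    fully overlap; material before that point is harmonic distortion (see above)."""
    full = fftconvolve(recording, inv_sweep)
    ir = full[len(inv_sweep) - 1:]           # keep the linear part onward
    return ir / np.max(np.abs(ir))           # normalize to full scale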
We can buy commercial software to do this, but I found it much more fun and rewarding to write my own script.
Use these sweeps if you don't want to generate new ones. I decided 4-second sweeps worked well for me, but longer sweeps would presumably have a lower noise floor; feel free to experiment.
1. Get an ESS .wav file. You can use a pregenerated one from above, or generate your own using my script (see section below).
2. Set up your speaker and mics. I used Presonus Eris 4.5 mini monitor speakers and a pair of Shure SM-81 small-diaphragm condenser mics. The more accurate your speaker and mics, the more accurate the results, since the results will include the impulse response of the speaker and mics in addition to the room itself. But please feel free to experiment! To find good locations, you might want to get some nice long cables and play a repeated click or snare drum or some other signal through the speaker while you move it around. Then walk around and listen for a pleasing reverb tail, and put the mics there.
3. Play the ESS through the speaker while recording the mics. The length of the recording should include the whole ESS, plus enough additional time for the full reverb tail of the room. In my example the ESS was four seconds long and the room reverb died out within two seconds, so I needed at least six seconds of recording. I did eight seconds to be safe.
4. Run the recorded .wav files through my script, to produce an IR .wav file (see script info below).
5. Clean up the IR using an audio editor. Delete everything before the onset of the click. Ideally that region would be pure silence; in practice, via the magic of the ESS math, it contains the nonlinearities of the electronics/speaker/room/mic chain (interesting, but not relevant for reverb) plus background noise, which we don't want. Optionally, add back a few milliseconds of silence before the onset to simulate the sound propagation delay between the speaker and mics: sound travels roughly one foot per millisecond, so if the mics are about 12 feet from the speaker, include about 12 milliseconds of silence before the first click audio appears. (This could be done more precisely by recording the ESS on another track in parallel with the room mics, etc, but I just guesstimate; a short sketch of the sample arithmetic appears after this list.)
6. Load the IR into your convolutional reverb and see how it sounds!
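If you'd rather script the pre-delay from step 5 than eyeball it in an editor, the arithmetic is simple. A sketch, assuming numpy/scipy and a mono IR; the 12-foot distance is just the example from step 5, and the file name is the one from the usage example below:

import numpy as np
from scipy.io import wavfile

fs, ir = wavfile.read("computed_room_ir.wav")  # IR already trimmed so the direct sound is at sample 0
feet = 12                                      # speaker-to-mic distance, measured or guesstimated
delay_samples = int(fs * feet / 1000)          # sound travels roughly one foot per millisecond
padded = np.concatenate([np.zeros(delay_samples, dtype=ir.dtype), ir])
wavfile.write("computed_room_ir_predelay.wav", fs, padded)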
I wrote some Python code to generate sweeps and convolve recorded sweeps into IR's. It should work with stock Python plus numpy. Download the script: expsine.py
Usage example:
# python -i expsine.py
>>> write_ess("ess_44k.wav", "inv_ess_44k.wav", 4.0, 40, 20000, 44100)
(generates and writes .wav files for an exponential sine sweep and its convolutional inverse, 4 seconds long, starting at 40 Hz and ending at 20 kHz, at a sample rate of 44100 samples/sec)
>>> compute_ir("recorded_sweep_in_room.wav", "inv_ess_44k.wav", "computed_room_ir.wav")
(convolves the recorded sweep with the convolutional inverse in inv_ess_44k.wav and writes the result to computed_room_ir.wav)
Original paper here:
A. Farina, "Simultaneous measurement of impulse response and distortion with a swept sine
technique," presented at the 108th AES Convention, Paris, France, February 2000.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.1614&rep=rep1&type=pdf
Good theory & advanced stuff here:
"Advancements in impulse response measurements by sine sweeps", Angelo Farina, AES 2007
http://pcfarina.eng.unipr.it/Public/Papers/226-AES122.pdf
Good practical overview here:
"SURROUND SOUND IMPULSE RESPONSE: Measurement with the Exponential Sine Sweep;
Application in Convolution Reverb", Madeline Carson, Hudson Giesbrecht, Tim Perry.
http://web.uvic.ca/~timperry/ELEC499SurroundSoundImpulseResponse/Elec499-SurroundSoundIR-PreREVA.pdf
More here:
"Swept Sine Chirps for Measuring Impulse Response", Ian H. Chan.
http://www.thinksrs.com/downloads/PDFs/ApplicationNotes/SR1_SweptSine.pdf