[24 July 2017]
We use reverb to get ambience into dry recorded tracks. Electronic reverb is a simulation of the transmission, absorption and reflection that happens between a sound source (instruments) and sound receivers (ears or mics). The geometry and attributes of surfaces in real rooms, as well as the positions of sources and receivers, greatly influence how this sounds. When we use electronic reverb, typically we pick from a set of canned presets and then fiddle with some parameters to get a pleasing sound quality.
With a convolutional reverb (e.g. Reverberate 2), we can go beyond canned presets and use our own WAV files containing the impulse response (IR) of real (or unreal) rooms, which are capable of very accurately simulating the sound of any source transmitted through that same path. We can buy IR's, scrounge the internet for them, or... make them ourselves. This tutorial is focused on how to make our own IR's by capturing the sound of a real room.
What is an impulse response (IR)? On one level, it's nothing more than a recording of the sound of an ideal impulse emitted at some location, and then received at another location. An ideal impulse is basically a loud click, surrounded by silence. More precisely, in the context of digital audio, the click is digital 0, then a single sample at full scale, then back to 0. When you generate a loud click someplace in a room, and then listen from somewhere else in the room, you don't just hear the click, you also hear reflections of the click, and reflections of the reflections, and so on. The combination of all that stuff is the IR, and it typically has a sound quality that is very different from the dry click that we started with.
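Here's what that ideal impulse looks like as samples. A minimal sketch (not part of the script described later; assumes numpy and scipy are installed, and the file name is just a placeholder):

import numpy as np
from scipy.io import wavfile

fs = 44100                                  # sample rate
click = np.zeros(fs, dtype=np.float32)      # one second of digital silence
click[fs // 2] = 1.0                        # one sample at full scale, then silence again
wavfile.write("ideal_click_44k.wav", fs, click)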
Once we have the IR of a room & source/receiver pair, we can simulate the sound of any source playing in that environment.
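That simulation is just convolution of the dry signal with the IR, which is essentially all a convolutional reverb does under the hood (plus wet/dry mixing and the like). A minimal sketch, assuming mono WAV files at the same sample rate and scipy available; the file names are placeholders:

import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

fs, dry = wavfile.read("dry_snare.wav")      # close-mic'ed, ambience-free source
fs_ir, ir = wavfile.read("room_ir.wav")      # captured impulse response (same sample rate)

wet = fftconvolve(dry.astype(np.float64), ir.astype(np.float64))
wet /= np.max(np.abs(wet))                   # normalize so the reverb tail doesn't clip
wavfile.write("snare_in_room.wav", fs, wet.astype(np.float32))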
For example, we might have a dry, close-mic'ed snare sound and we want some realistic room ambience to make it more lively. Typically when recording a drum kit in a studio, we'd also set up room mics, far away from the kit, for this purpose. But maybe we didn't do that, or maybe there's way too much hi hat in the room mic, or bleed from guitar amps, or it just doesn't sound great. A captured IR can help work around all those problems.
IR's are useful for more than just room reverb. For example people also talk about "Cabinet IR's" in the context of guitar amp simulation. The principle is the same -- you feed a signal through a speaker cabinet that has a certain tone to it, and measure the results so you can get the same tone later using simulation.
Signal theory tells us that an IR can faithfully represent any linear, time-invariant (LTI) filter. Read more theory at Stanford or Wikipedia.
"Room sound", i.e. the mixture of reflections and absorptions when a particular sound interacts with a room, qualifies as LTI -- to the extent that the air isn't moving too much, and the walls aren't moving, and there aren't snare drum rattles and other audio distortions.
It's important to understand that a particular room actually has an infinite number of IR's associated with it, because the position of the sound source and the sound receivers (mics or ears) greatly influence the response. (For typical mixing purposes I just want something that sounds good, but I'm curious about whether capturing multiple IR's with different source positions would make it possible to closely simulate the sound of (say) a band in a room, where each instrument is received slightly differently.)
Also worth noting, many common electronic audio effects are LTI, and many are not; for example:
EQ - yes
Delay, possibly with filter - yes
(static) Reverb - yes
(static) Phaser - yes
Non-LTI effects include:
Flange - no, like Phaser but time-varying
Chorus - no, like Delay but time-varying
Distortion - no, non-linear
Wah wah - no, it's a time-varying EQ
Compression/Limiting - no, both non-linear and time-varying
One obvious way to record an IR is to set up a monitor speaker and some mics, play a loud ideal click through the speaker, and record the sound from the mics. Now, if we take that recorded impulse response and load it into a convolutional reverb, and then process some audio through that reverb, the result is a simulation of what it would sound like if you played that audio through the monitor speaker, recorded by the mics, in the positions you put them in when you captured the IR.
I actually did this procedure in my apartment. I was looking for a pleasant but realistic reverb sound to put on a snare drum. (Actually the first thing I did was to re-mic the snare drum by playing the isolated snare track through monitors, and recording mics down the hallway. This was tedious though, and noisy, and I felt I needed to move my monitors to get a better sound, which led me down the path of capturing an IR, so I can get room sound whenever I want, without having to turn my apartment upside down.)
The first thing that became very obvious is that my apartment is surrounded by annoying noises -- sirens, air conditioners, people yelling, construction, car alarms, the elevator, people in the stairwell, etc. Those noises show up in the IR and degrade the results. Even with the speaker playing single-sample clicks turned up super loud, the reverb tail goes on for up to a second, and becomes very quiet and delicate as it trails off.
I reasoned that I could reduce the noise by recording many identical clicks, aligning them perfectly using software, and then averaging them together. (Averaging N takes with uncorrelated background noise improves signal/noise by about 10*log10(N) dB.) I gave this a try, averaging 32 (or was it 128?) two-second click recordings. This bought me ~15 dB (or 21 dB?) of noise reduction, but it was a pain in the neck -- I had to play and record clicks for several minutes, then manually reject takes with really objectionable background noises, and write software to align and average the results. The results are usable, though maybe still not pristine.
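The averaging itself is simple; the fiddly part is the alignment. Roughly, the align-and-average step looks like this (a sketch, assuming numpy/scipy and takes already trimmed to similar two-second windows; real takes also need the outlier rejection mentioned above):

import numpy as np
from scipy.signal import correlate

def align_and_average(takes):
    """Align each take to the first via cross-correlation, then average.
    Averaging N takes with uncorrelated background noise buys about
    10*log10(N) dB of signal/noise: ~15 dB for 32 takes, ~21 dB for 128."""
    ref = takes[0]
    aligned = []
    for take in takes:
        # lag at which this take best lines up with the reference
        lag = np.argmax(correlate(take, ref, mode="full")) - (len(ref) - 1)
        aligned.append(np.roll(take, -lag))  # np.roll wraps around; fine if the edges are silence
    return np.mean(aligned, axis=0)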
One fundamental obstacle to getting good signal/noise is that directly recording a click requires all the sonic input energy to occur over a single sample period (i.e. 1/sample_rate), and given real speakers, that's just not very much energy compared to the other stuff in the environment.
I did some research and uncovered a better approach, using a crafted known signal as the sound source, along with some post-processing to compute the IR. The math/theory here is simple:
recording = known_signal * room_IR
room_IR = recording * known_signal_inverse

where * is convolution, and known_signal_inverse is the "convolutional inverse" of known_signal -- the signal that, when convolved with known_signal, yields an ideal impulse.
Looking up convolutional inverse sent me through Deconvolution which is a huge rathole for general signals. So don't read that. Instead...
Fortunately, in 2000, Angelo Farina solved this problem for capturing room IR's, using an Exponential Sine Sweep (ESS) as the known_signal. Unlike a click, the ESS spreads its energy at a steady volume over several seconds, which makes it friendlier to speakers and much more resistant to background noise, and its convolutional inverse is simple: just the ESS reversed in time, with a volume ramp.
Aside from better signal/noise, ESS has some other nice qualities for capturing IR's; for example, it time-separates the harmonic distortion of the system from the linear response. For the purpose of capturing room IR's we just delete the non-linear stuff, but if you wanted to characterize it, you could do so with ESS. See the references for details.
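For concreteness, here is a minimal sketch of the standard Farina construction (just the math, not the expsine.py script described below; assumes numpy):

import numpy as np

def make_ess(f1, f2, duration, fs):
    """Exponential sine sweep from f1 Hz to f2 Hz over `duration` seconds."""
    t = np.arange(int(duration * fs)) / fs
    R = np.log(f2 / f1)
    return np.sin(2 * np.pi * f1 * duration / R * (np.exp(t * R / duration) - 1))

def make_inverse_ess(ess, f1, f2, duration, fs):
    """The convolutional inverse: the sweep reversed in time, with an exponential
    volume ramp that attenuates the low frequencies to compensate for the extra
    time the sweep spends down there."""
    t = np.arange(len(ess)) / fs
    R = np.log(f2 / f1)
    return ess[::-1] * np.exp(-t * R / duration)

# e.g. the 4-second, 40 Hz - 20 kHz sweep at 44100 samples/sec used later in this post:
sweep = make_ess(40, 20000, 4.0, 44100)
inv_sweep = make_inverse_ess(sweep, 40, 20000, 4.0, 44100)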
So the new procedure is: set up speaker & mics, play ESS through the speaker, record the results, and convolve the recording with the inverse ESS to get the IR.
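The post-processing step is then a single (long) convolution. A sketch, assuming numpy/scipy and mono signals -- the same idea as the compute_ir step in the script below, but simplified and in-memory:

import numpy as np
from scipy.signal import fftconvolve

def recording_to_ir(recording, inv_sweep):
    """Convolve the room recording of the sweep with the inverse sweep.
    The linear impulse response begins roughly where the sweep and its inverse
    fully overlap; material before that point is harmonic distortion (see above)."""
    full = fftconvolve(recording, inv_sweep)
    ir = full[len(inv_sweep) - 1:]           # keep the linear part onward
    return ir / np.max(np.abs(ir))           # normalize to full scale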
We can buy commercial software to do this, but I found it much more fun and rewarding to write my own script.
Use these sweeps if you don't want to generate new ones. I decided 4-second sweeps worked well for me, but longer sweeps would presumably have a lower noise floor; feel free to experiment.
1. Get an ESS .wav file. You can use a pregenerated one from above, or generate your own using my script (see section below).
2. Set up your speaker and mics. I used Presonus Eris 4.5 mini monitor speakers and a pair of Shure SM-81 small-diaphragm condenser mics. The more accurate your speaker and mics, the more accurate the results, since the results will include the impulse response of the speaker and mics in addition to the room itself. But please feel free to experiment! To find good locations, you might want to get some nice long cables and play a repeated click or snare drum or some other signal through the speaker while you move it around. Then walk around and listen for a pleasing reverb tail, and put the mics there.
3. Play the ESS through the speaker while recording the mics. The length of the recording should include the whole ESS, plus enough additional time for the full reverb tail of the room. In my example the ESS was four seconds long and the room reverb died out within two seconds, so I needed at least six seconds of recording. I did eight seconds to be safe.
4. Run the recorded .wav files through my script, to produce an IR .wav file (see script info below).
5. Clean up the IR using an audio editor. Delete everything before the onset of the click. Ideally that region would be pure silence; in practice, via the magic of the ESS math, it contains the nonlinearities of the electronics/speaker/room/mic chain (interesting, but not relevant for reverb) plus background noise, which we don't want. Optionally, add back a few milliseconds of silence before the onset to simulate the sound propagation delay between the speaker and mics: sound travels roughly one foot per millisecond, so if the mics are about 12 feet from the speaker, include about 12 milliseconds of silence before the first click audio appears. (This could be done more precisely by recording the ESS on another track in parallel with the room mics, etc, but I just guesstimate; a short sketch of the sample arithmetic appears after this list.)
6. Load the IR into your convolutional reverb and see how it sounds!
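If you'd rather script the pre-delay from step 5 than eyeball it in an editor, the arithmetic is simple. A sketch, assuming numpy/scipy and a mono IR; the 12-foot distance is just the example from step 5, and the file name is the one from the usage example below:

import numpy as np
from scipy.io import wavfile

fs, ir = wavfile.read("computed_room_ir.wav")  # IR already trimmed so the direct sound is at sample 0
feet = 12                                      # speaker-to-mic distance, measured or guesstimated
delay_samples = int(fs * feet / 1000)          # sound travels roughly one foot per millisecond
padded = np.concatenate([np.zeros(delay_samples, dtype=ir.dtype), ir])
wavfile.write("computed_room_ir_predelay.wav", fs, padded)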
I wrote some Python code to generate sweeps and convolve recorded sweeps into IR's. It should work with stock Python plus numpy. Download the script: expsine.py
Usage example:
# python -i expsine.py
>>> write_ess("ess_44k.wav", "inv_ess_44k.wav", 4.0, 40, 20000, 44100)
(generates and writes .wav files for an exponential sine sweep and its convolutional inverse, 4 seconds long, starting at 40 Hz and ending at 20 kHz, at a sample rate of 44100 samples/sec)
>>> compute_ir("recorded_sweep_in_room.wav", "inv_ess_44k.wav", "computed_room_ir.wav")
(convolves the recorded sweep with the convolutional inverse in inv_ess_44k.wav and writes the result to computed_room_ir.wav)
Original paper here:
A. Farina, "Simultaneous measurement of impulse response and distortion with a swept sine
technique," presented at the 108th AES Convention, Paris, France, February 2000.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.1614&rep=rep1&type=pdf
Good theory & advanced stuff here:
"Advancements in impulse response measurements by sine sweeps", Angelo Farina, AES 2007
http://pcfarina.eng.unipr.it/Public/Papers/226-AES122.pdf
Good practical overview here:
"SURROUND SOUND IMPULSE RESPONSE: Measurement with the Exponential Sine Sweep;
Application in Convolution Reverb", Madeline Carson, Hudson Giesbrecht, Tim Perry.
http://web.uvic.ca/~timperry/ELEC499SurroundSoundImpulseResponse/Elec499-SurroundSoundIR-PreREVA.pdf
More here:
"Swept Sine Chirps for Measuring Impulse Response", Ian H. Chan.
http://www.thinksrs.com/downloads/PDFs/ApplicationNotes/SR1_SweptSine.pdf