The Measurement Nobody Orders Until It's Too Late

There is a pattern that repeats in speech-critical audio installations with enough regularity that it has stopped being surprising. A client commissions a system, specifies coverage area and SPL targets, approves a frequency response trace that looks flat and even, and signs off on the design. The system is installed. Commissioning measurements confirm the speakers are covering the room. Then someone stands at the back of a house of worship, a conference hall, or a transport terminal and listens — and they cannot make out what is being said. The complaint arrives: the system is not working. At that point, the engineers are called back in. The electronics are blamed. New DSP is purchased. Equalisation curves are redrawn. Months pass. The problem does not resolve, because the problem was never in the electronics. It was in the acoustic design of the room and the propagation strategy of the loudspeaker system — decisions that were locked in before a single cable was pulled.

The metric that would have predicted this outcome before the first speaker was flown is Speech Transmission Index, almost always abbreviated to STI. It is not a widely specified metric in commercial installations. Most tender documents do not mention it. Most clients have not heard of it. The engineers who design systems around SPL uniformity alone are not being negligent — they are responding to what clients ask for, which is coverage. But coverage and intelligibility are not the same thing, and the difference between them is the subject of this article.

STI was developed in the 1970s by Houtgast and Steeneken at the Dutch TNO Institute. It does not measure amplitude. It does not measure frequency response. What it measures is the degree to which a transmission channel — the combination of the room acoustics and the audio system — preserves amplitude modulation in a speech signal. The underlying tool is the Modulation Transfer Function. Speech intelligibility depends fundamentally on the listener's ability to resolve the rapid amplitude variations in the speech envelope: the transitions between phonemes, the gaps between syllables, the stops and fricatives that carry consonant information. If the room or the system smears those modulation troughs — fills them in, reduces their depth relative to the peaks — intelligibility falls. STI quantifies exactly how much modulation is preserved across seven octave bands from 125 Hz to 8 kHz, weighted by their contribution to intelligibility, and collapses that information into a single number between 0 and 1. The IEC 60268-16 standard maps this number to five intelligibility grades: Bad (0 to 0.30), Poor (0.30 to 0.45), Fair (0.45 to 0.60), Good (0.60 to 0.75), and Excellent (0.75 to 1.0). For speech to be reliably understood by normal-hearing listeners in an undemanding listening task, most engineers target a minimum STI of 0.50 to 0.55 across the coverage area. For critical applications — courtrooms, broadcast commentary positions, emergency announcement systems — 0.60 or above is a more appropriate floor.
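To make the modulation idea concrete, here is a minimal Python sketch of how a modulation transfer value turns into an STI-like number. It is deliberately simplified and is not the full IEC 60268-16 procedure. It assumes a purely diffuse, exponentially decaying reverberant field with no direct sound and no background noise, uses one RT60 for every octave band, and weights all modulation frequencies equally, leaving out the standard's band weighting and masking corrections. The function names are illustrative.

```python
import math

# The 14 modulation frequencies of the full STI method (one-third-octave steps).
MOD_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def mtf_reverb(mod_freq_hz, rt60_s):
    """Modulation preserved (0..1) by an ideal exponential reverberant decay.

    ASSUMPTION: diffuse field only, no direct sound, no noise.
    """
    x = 2.0 * math.pi * mod_freq_hz * rt60_s / 13.8
    return 1.0 / math.sqrt(1.0 + x * x)


def transmission_index(m):
    """Convert a modulation value to a 0..1 transmission index."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)          # keep the logarithm finite
    snr_db = 10.0 * math.log10(m / (1.0 - m))  # apparent signal-to-noise ratio
    snr_db = max(-15.0, min(15.0, snr_db))     # clip to the +/-15 dB range
    return (snr_db + 15.0) / 30.0


def simplified_sti(rt60_s):
    """Unweighted mean of the transmission indices (simplification)."""
    values = [transmission_index(mtf_reverb(f, rt60_s)) for f in MOD_FREQS_HZ]
    return sum(values) / len(values)


def grade(sti):
    """Map an STI value to the five grades quoted in the text."""
    if sti < 0.30:
        return "Bad"
    if sti < 0.45:
        return "Poor"
    if sti < 0.60:
        return "Fair"
    if sti < 0.75:
        return "Good"
    return "Excellent"


if __name__ == "__main__":
    for rt60 in (0.5, 1.0, 2.0, 3.0):
        sti = simplified_sti(rt60)
        print(f"RT60 {rt60:.1f} s -> simplified STI {sti:.2f} ({grade(sti)})")
```

Under these assumptions a three-second RT60 lands in the Poor band. A real system does better wherever the direct sound dominates, which is exactly the lever the rest of this article is about.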

The primary acoustic mechanism that destroys STI in buildings is the reverberant field. When a loudspeaker radiates energy into a room, that energy does not disappear after it reaches a listener. It reflects off every hard surface (concrete floors, glass facades, plaster ceilings, tiled walls) and continues to propagate as late-arriving reflections that accumulate into a diffuse reverberant field. These late reflections reach the listener after the direct sound, while the energy of one speech sound is still decaying and the direct sound of the next is already arriving. They fill in the amplitude troughs between modulated speech peaks, reducing the modulation depth that the listener's auditory system depends on to resolve phoneme boundaries. The physics here is not subtle: reverberant energy and intelligibility are in direct opposition. The relationship is usually expressed as the Direct-to-Reverberant Ratio: the higher the direct sound energy relative to the reverberant field energy at the listener's position, the higher the preserved modulation depth, and therefore the higher the STI. Room reverberation time, RT60, is the most familiar proxy for how much reverberant energy will build up in a space. A room with an RT60 of three seconds at mid frequencies will devastate speech intelligibility at any listener position where the reverberant field is competitive with the direct sound. This is not a DSP problem. It is not a loudspeaker problem in isolation. It is a room physics problem.
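A rough feel for where the reverberant field becomes "competitive with the direct sound" comes from the classic critical-distance approximation of statistical acoustics. The sketch below assumes an ideal diffuse field, a single source, and metric units. The directivity factor Q, the room volume, and the listener distances are illustrative values, not measurements from any real project.

```python
import math


def critical_distance_m(q, volume_m3, rt60_s):
    """Distance at which direct and reverberant levels are equal.

    Classic approximation: Dc ~= 0.057 * sqrt(Q * V / RT60), metric units.
    ASSUMPTION: ideal diffuse reverberant field, single source.
    """
    return 0.057 * math.sqrt(q * volume_m3 / rt60_s)


def direct_to_reverberant_db(distance_m, q, volume_m3, rt60_s):
    """D/R ratio in dB: 0 dB at the critical distance, -6 dB per doubling."""
    dc = critical_distance_m(q, volume_m3, rt60_s)
    return 20.0 * math.log10(dc / distance_m)


if __name__ == "__main__":
    # Hypothetical hall: 5000 m^3, RT60 of 3 s, loudspeaker with Q = 10.
    q, volume, rt60 = 10.0, 5000.0, 3.0
    print(f"critical distance: {critical_distance_m(q, volume, rt60):.1f} m")
    for distance in (5.0, 10.0, 20.0, 30.0):
        ratio = direct_to_reverberant_db(distance, q, volume, rt60)
        print(f"{distance:5.1f} m -> D/R {ratio:+.1f} dB")
```

Listeners beyond the critical distance sit at a negative ratio. In this model only two things move that boundary outward: less reverberation in the room, or a more directional source.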

This brings us to the most dangerous shorthand in audio system design: the word coverage. When a designer says a system covers a room, they typically mean that the SPL across the listening plane is reasonably uniform, within a specified variance — often plus or minus three decibels. This is a meaningful engineering target, but it is not a sufficient one. It is entirely possible to achieve excellent SPL uniformity across a highly reverberant room while delivering STI values below 0.40 at every measurement point. Every listener in the room receives the same inadequate intelligibility with admirable consistency. Uniform bad is still bad. The root cause is that SPL measurement integrates energy over a time window that includes both direct and reverberant contributions. A room that is drenched in reflected energy can show a flat SPL map while the Direct-to-Reverberant Ratio at each measurement point is catastrophically low. SPL coverage proves the speakers are working. It does not prove the system is communicating.

Directivity is the primary engineering tool for managing the Direct-to-Reverberant Ratio, and it deserves more analytical attention than it typically receives in the procurement and specification phase. A loudspeaker with high directivity concentrates acoustic energy within a controlled angle, directing it toward the listening plane rather than scattering it toward reflective room boundaries. This improves the Direct-to-Reverberant Ratio at the listener position not by increasing the direct sound level alone, but by reducing the proportion of total radiated energy that enters the reverberant field in the first place. The directivity index of a transducer is frequency-dependent: any transducer behaves progressively more omnidirectionally as frequency decreases, because directivity control requires the physical aperture of the device to be a meaningful fraction of the wavelength being controlled. A line array achieves useful vertical directivity control at mid and upper-mid frequencies by exploiting the coherent summation of vertically spaced transducers. At low frequencies, where the wavelength exceeds the physical array length, it loses that vertical control and radiates energy into the ceiling and floor as freely as any point source would. This is not a failure mode of line arrays — it is a physics constraint that every engineer working with them must account for. A horizontal polar plot that appears controlled tells you nothing about vertical behaviour and nothing about low-frequency behaviour. Both matter enormously in real rooms.
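The aperture argument can be turned into a quick sanity check. The sketch below applies the rule of thumb stated above, that vertical control is lost once the wavelength exceeds the array length, to the seven STI octave bands. It assumes a speed of sound of 343 m/s and treats the transition as a hard threshold. Real arrays lose control gradually, so read the result as a first-order indication only.

```python
SPEED_OF_SOUND_M_S = 343.0  # ASSUMPTION: air at roughly 20 degrees C

STI_OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]


def wavelength_m(frequency_hz):
    """Wavelength in metres at the given frequency."""
    return SPEED_OF_SOUND_M_S / frequency_hz


def control_limit_hz(array_length_m):
    """Frequency at which one wavelength equals the array length."""
    return SPEED_OF_SOUND_M_S / array_length_m


def bands_without_vertical_control(array_length_m):
    """STI octave bands whose wavelength exceeds the array length."""
    return [f for f in STI_OCTAVE_BANDS_HZ
            if wavelength_m(f) > array_length_m]


if __name__ == "__main__":
    for length in (1.0, 2.0, 4.0):  # hypothetical array lengths in metres
        print(f"{length:.1f} m array: control limit near "
              f"{control_limit_hz(length):.0f} Hz, uncontrolled STI bands: "
              f"{bands_without_vertical_control(length)}")
```

A short array gives up the lowest one or two STI bands to the room, so the D/R improvement seen at 2 kHz should not be assumed at 125 Hz or 250 Hz.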

Delay fills introduce another failure mode specifically related to modulation and therefore to STI. In rooms where a main system cannot achieve adequate Direct-to-Reverberant Ratio at distant seating, delay fills positioned closer to those listeners appear to solve the problem. And they can, but only if the delay alignment is treated with precision. The psychoacoustic mechanism governing this is the Haas effect, or the precedence effect: the auditory system assigns source location and fuses the impression of a single sound source based on the first-arriving wavefront, provided the delayed signal arrives within approximately 35 milliseconds. Beyond that window, the delayed signal begins to be perceived as a distinct echo — itself a modulation-destroying event. If a delay fill speaker is set to deliver its signal such that it arrives within the Haas window but without precise alignment at the crossover boundary between main and fill zones, the two sources do not sum coherently. Instead of adding intelligibly, the fill speaker injects reverberant-like energy into the listening environment — energy that arrives after the direct wavefront but within a time window that prevents it from being heard as a separate event, which means it specifically fills in the modulation troughs that STI depends on. Correct practice is to align the delay fill to the direct sound arriving from the nearest main speaker at the crossover position, targeting an arrival difference of no more than approximately one millisecond. Getting that wrong by 10 milliseconds or more turns the delay fill into a smearing device, and every listener in the fill zone will experience reduced STI compared to what the main system alone would have provided — a result worse than the absence of the fill entirely.
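The alignment arithmetic itself is simple, which makes errors in it hard to excuse. The following sketch computes the delay a fill needs so that its sound arrives together with the main system at the crossover position, then classifies the residual offset against the two thresholds quoted above: about one millisecond for alignment and about 35 milliseconds for fusion. The distances are hypothetical, the speed of sound is assumed to be 343 m/s, and early and late offsets are treated alike, which is a simplification.

```python
SPEED_OF_SOUND_M_S = 343.0      # ASSUMPTION: air at roughly 20 degrees C
ALIGNMENT_TOLERANCE_MS = 1.0    # target quoted in the text
FUSION_WINDOW_MS = 35.0         # approximate precedence window from the text


def travel_time_ms(distance_m):
    """Acoustic travel time over a distance, in milliseconds."""
    return distance_m / SPEED_OF_SOUND_M_S * 1000.0


def required_fill_delay_ms(main_distance_m, fill_distance_m):
    """Electronic delay that makes the fill arrive with the main system."""
    return travel_time_ms(main_distance_m) - travel_time_ms(fill_distance_m)


def arrival_offset_ms(main_distance_m, fill_distance_m, applied_delay_ms):
    """Fill arrival minus main arrival; positive means the fill is late."""
    fill_arrival = travel_time_ms(fill_distance_m) + applied_delay_ms
    return fill_arrival - travel_time_ms(main_distance_m)


def classify_offset(offset_ms):
    """Label an offset using the article's thresholds (magnitude only)."""
    magnitude = abs(offset_ms)
    if magnitude <= ALIGNMENT_TOLERANCE_MS:
        return "aligned"
    if magnitude <= FUSION_WINDOW_MS:
        return "fused but smearing"
    return "audible echo"


if __name__ == "__main__":
    main_d, fill_d = 34.3, 3.43  # hypothetical distances to the listener, m
    ideal = required_fill_delay_ms(main_d, fill_d)
    print(f"required delay: {ideal:.1f} ms")
    for applied in (ideal, ideal + 10.0, ideal + 50.0):
        offset = arrival_offset_ms(main_d, fill_d, applied)
        print(f"applied {applied:6.1f} ms -> offset {offset:+.1f} ms, "
              f"{classify_offset(offset)}")
```

The calculation holds only at the position it was computed for. Moving a few metres through the fill zone changes both path lengths, which is why the alignment point has to be chosen deliberately and then verified by measurement.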

Room-acoustic simulation tools such as EASE and its equivalents are capable of predicting STI given a calibrated acoustic model of the room. This is a significant capability, and it should be used. But the accuracy of any STI prediction is bounded by the accuracy of the acoustic model, and the most common modelling error in practice is mishandling the difference in absorption between the occupied and unoccupied conditions. An empty room in concrete, glass, and hardwood has a dramatically higher RT60 than the same room filled with seated occupants, whose bodies and clothing introduce substantial absorption at mid and high frequencies. If the simulation model assumes full occupancy, the predicted STI may be significantly more optimistic than what is measured in an empty or sparsely attended room; if it reflects the empty room, it will understate what a full audience delivers, before audience noise is taken into account. A well-executed design states which condition the model represents, uses simulation to establish the directivity strategy, verify the propagation geometry, and produce a first-order STI estimate, then builds in provision for in-service measurement and verification with STIPA or a full STI measurement chain, under occupied or near-occupied conditions, before the installation is accepted as complete.
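The size of the occupied-versus-empty gap can be estimated with the Sabine equation before any detailed model exists. The sketch below is a first-order illustration only. The surface areas, absorption coefficients, and the per-person absorption of 0.45 square metres are assumed round numbers chosen for the example, and the Sabine equation itself becomes unreliable in rooms that are far from diffuse.

```python
def surface_absorption_m2(surfaces):
    """Total absorption from (area_m2, absorption_coefficient) pairs."""
    return sum(area * alpha for area, alpha in surfaces)


def sabine_rt60_s(volume_m3, absorption_m2):
    """Sabine reverberation time: RT60 = 0.161 * V / A (metric units)."""
    return 0.161 * volume_m3 / absorption_m2


def occupied_absorption_m2(empty_absorption_m2, occupants,
                           absorption_per_person_m2=0.45):
    """Add audience absorption to the empty-room figure.

    ASSUMPTION: 0.45 m^2 per seated person at mid frequencies is an
    illustrative value; the floor area hidden by the audience is ignored.
    """
    return empty_absorption_m2 + occupants * absorption_per_person_m2


if __name__ == "__main__":
    volume = 6000.0  # hypothetical hall volume, m^3
    hard_surfaces = [
        (600.0, 0.05),   # floor (illustrative coefficient)
        (600.0, 0.10),   # ceiling (illustrative coefficient)
        (1000.0, 0.06),  # walls and glazing (illustrative coefficient)
    ]
    empty = surface_absorption_m2(hard_surfaces)
    full = occupied_absorption_m2(empty, occupants=400)
    print(f"empty RT60:    {sabine_rt60_s(volume, empty):.1f} s")
    print(f"occupied RT60: {sabine_rt60_s(volume, full):.1f} s")
```

In a hard-surfaced room the audience can easily be the largest single absorber, so the occupancy assumed by the model deserves to be written into the specification alongside the STI target.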

S Sounds begins every speech-critical installation with an acoustic assessment and an STI target agreed in the specification. The number of speakers, the amplifier power, and the processing chain are consequences of that starting point, not the starting point itself. Specifying a 0.60 STI minimum across the seating plane forces every subsequent design decision to be evaluated against a physically meaningful criterion, rather than against a coverage map that proves only that sound is arriving. The acoustic assessment measures RT60 across octave bands, characterises the room boundaries, estimates the occupied absorption, and establishes the design constraints within which the loudspeaker system must operate. If those constraints make a particular STI target physically unachievable without acoustic treatment, that finding arrives at the design stage, when the solution is a budget conversation, rather than at the commissioning stage, when the solution is an expensive problem. S Sounds operates at the intersection of acoustic measurement, system design, and loudspeaker engineering. Our approach begins with the room, not the rack.
