Putting a test to rest
Signal strength is not an adequate performance metric for digital-radio systems.
July 1, 2009
Cell phones have incorporated digital-voice technology since at least 1992, but land mobile radio (LMR) systems have adopted digital voice only recently in the form of the APCO Project 25 (P25) standards. The advent of digital-voice radio raises questions about how network performance should be measured during the system-acceptance test (SAT). The purpose of this article is to address some of these questions and suggest an approach to SAT for digital-voice radios.
P25 vocoder review
The vocoder (voice encoder/decoder) adopted by the P25 committee is the Improved Multi-Band Excitation (IMBE) vocoder, developed by Digital Voice Systems. The IMBE vocoder operates at a basic rate of 4.4 kb/s, but an additional 2.8 kb/s is required for error-control coding, so the bit rate with error-control coding is 7.2 kb/s. Another 2.4 kb/s is used for channel signaling, making the gross bit rate 9.6 kb/s.
Rather than think in terms of bit rate, we can think in terms of bits per vocoder frame. Each frame is 20 milliseconds long (50 frames per second) and contains 88 vocoder bits, 56 error-control-coding bits, and 48 signaling bits, for a total of 192 bits.
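As a quick check of this arithmetic, the short Python sketch below (illustrative only) reproduces the bit-rate budget from the per-frame bit counts given above.

```python
# Illustrative arithmetic only: the P25 bit-rate budget from the frame counts above.
FRAMES_PER_SECOND = 50            # one vocoder frame every 20 ms

bits_per_frame = {
    "vocoder": 88,                # IMBE voice parameters
    "error control": 56,          # error-control-coding bits
    "signaling": 48,              # channel-signaling bits
}

for name, bits in bits_per_frame.items():
    print(f"{name:>13}: {bits * FRAMES_PER_SECOND:,} b/s")

total_bits = sum(bits_per_frame.values())        # 192 bits per frame
print(f"{'gross':>13}: {total_bits * FRAMES_PER_SECOND:,} b/s")   # 9,600 b/s
```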
A full explanation of the IMBE vocoder is beyond the scope of this article, but suffice it to say that unlike most cell-phone vocoders, IMBE was optimized for low-bit-rate applications (< 4.8 kb/s). Like most cell-phone vocoders, however, IMBE is model-based, meaning that rather than sending digital samples of voice, the vocoder sends the parameters to be used in a model of human speech. Fewer bits are needed to specify the model parameters because both the transmitter and the receiver carry a full description of the model when they leave the factory. The IMBE vocoder and its successor, Advanced Multi-Band Excitation (AMBE), also are used in many satellite voice networks, including Iridium and Inmarsat.
Which metric should be measured?
Traditionally, the LMR acceptance test has involved a drive-test survey in which some version of received signal strength (RSS) is measured, usually as a linear average over a distance of at least 40 wavelengths. If the contract requirement is 95% service-area reliability, the system is accepted if the ratio of measured samples above the service threshold (e.g., -106 dBm) to the total number of samples is greater than 0.95. When all land mobile radios were analog FM and all reputable manufacturers produced receivers with essentially the same delivered audio quality, or DAQ, for a given signal strength, it was not necessary to measure DAQ directly. With digital radio, the playing field is no longer even. Some manufacturers do a better job of digital receiver design than others, especially with respect to mitigation of multipath delay spread. The bottom line: a high signal strength does not guarantee a high DAQ.
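For readers who have not run such a survey, the accept/reject decision reduces to a simple proportion. The following Python sketch illustrates the calculation; the RSS values and threshold are made up for the example.

```python
# Sketch of the traditional RSS-based acceptance calculation.
# Each entry is the linear average of RSS (in dBm) over one grid tile
# (at least 40 wavelengths of samples); the values below are invented.
rss_samples_dbm = [-98.2, -104.7, -101.3, -108.9, -95.4, -103.0]

SERVICE_THRESHOLD_DBM = -106.0    # example contract threshold
REQUIRED_RELIABILITY = 0.95       # 95% service-area reliability

passing = sum(1 for s in rss_samples_dbm if s >= SERVICE_THRESHOLD_DBM)
measured_reliability = passing / len(rss_samples_dbm)

print(f"Measured reliability: {measured_reliability:.3f}")
print("ACCEPT" if measured_reliability >= REQUIRED_RELIABILITY else "REJECT")
```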
If we can’t trust signal strength, what should we measure? The P25 standards specify performance limits for bit error rate (BER), which is a step in the right direction, but BER has less meaning on a burst-error channel than on a static channel. Vocoder decisions are made on a frame-by-frame basis, and errors that are clustered together may affect just one frame, while the same number of errors spread over multiple frames may cause more harm. Or, depending on the error-correction capability of the receiver, isolated bit errors may be fully correctable while burst errors exceed the error-correcting capability of the code. Regardless, frame error rate (FER) is a metric that is closer to the user experience and is preferred by some over BER. However, vocoders employ frame repeating, muting, and adaptive smoothing, so not all error patterns with the same FER result in the same DAQ. The most reliable way to ensure acceptable DAQ is to measure it directly.
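A toy example illustrates the second effect. In the sketch below, the per-frame correction capability is an arbitrary stand-in for the real P25 codes, but it shows how ten bit errors concentrated in one frame can produce an uncorrectable frame while the same ten errors spread across ten frames are all corrected, even though the BER is identical in both cases.

```python
# Toy example: identical BER, very different FER, depending on error clustering.
FRAME_BITS = 192    # one P25 frame (vocoder + error-control + signaling bits)

def frame_error_rate(error_positions, total_bits, correctable_per_frame=3):
    # correctable_per_frame is an arbitrary stand-in for the code's
    # error-correcting power; the real P25 codes are more involved.
    frames = total_bits // FRAME_BITS
    errors_in_frame = [0] * frames
    for pos in error_positions:
        errors_in_frame[pos // FRAME_BITS] += 1
    bad_frames = sum(1 for e in errors_in_frame if e > correctable_per_frame)
    return bad_frames / frames

TOTAL_BITS = FRAME_BITS * 50                 # one second of speech
burst  = list(range(100, 110))               # 10 errors in a single frame
spread = list(range(0, TOTAL_BITS, 960))     # 10 errors, one every 5th frame

print(frame_error_rate(burst, TOTAL_BITS))   # 0.02 -> one uncorrectable frame
print(frame_error_rate(spread, TOTAL_BITS))  # 0.0  -> every error corrected
```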
Figure 1 shows graphically the progression of possible metrics for measuring digital-voice quality.
Direct measurements of voice quality
The drive-test survey should (if the budget allows) collect recordings of actual voice calls. These calls should exercise the full range of syllables used in typical calls and should include both male and female speakers. The key to making the acceptance test cost-effective is to automate as many elements of the test as possible. Audio recordings of appropriate phrases (e.g., Harvard sentences) are available in standard wave-file formats, and with modest effort, most systems can be configured to key the transmitter and automatically play the files over the air. Similarly, user radios usually can be configured to produce an audio output that can be stored on a laptop computer using an integrated sound card. Most recording software gives the user many choices of file format. It is important that the audio sample not be further compressed, so 64 kb/s pulse-code modulation, or PCM, is usually the best choice, provided the audio is bandlimited to 300 Hz to 3,400 Hz, which is the usual case.
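One element that is easy to automate is verifying that each captured file really is uncompressed PCM at the expected sample rate before any scoring begins. A minimal Python sketch follows; the file name and format targets are illustrative only.

```python
# Minimal sketch: confirm a captured file is uncompressed PCM at the expected
# rate before scoring. File name and format targets are placeholders.
import wave

def check_recording(path, expected_rate=8000, expected_channels=1):
    with wave.open(path, "rb") as wav:
        bit_rate = wav.getframerate() * wav.getsampwidth() * 8 * wav.getnchannels()
        ok = (wav.getframerate() == expected_rate
              and wav.getnchannels() == expected_channels
              and wav.getcomptype() == "NONE")       # no further compression
        print(f"{path}: {bit_rate / 1000:.0f} kb/s PCM,",
              "OK" if ok else "re-record (wrong format)")
        return ok

check_recording("harvard_sentence_0001.wav")   # hypothetical file name
```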
As with RSS testing, samples must be collected from a uniform grid to preclude spatial bias, and a statistically significant number of samples must be collected. For a confidence interval of ±2%, a confidence level of 95%, and an estimated service-area reliability of 95%, the minimum number of samples is 456 (see the Telecommunications Industry Association’s TSB-88-C for the equations).
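Although TSB-88-C should be consulted for the authoritative equations, the quoted figure follows from a standard sample-size estimate for a proportion, as the short sketch below shows.

```python
# Standard sample-size estimate for a proportion; TSB-88-C gives the
# authoritative equations, but this reproduces the figure quoted above.
Z = 1.96          # two-sided 95% confidence level
p = 0.95          # estimated service-area reliability
e = 0.02          # +/- 2% confidence interval

n = Z**2 * p * (1 - p) / e**2
print(round(n))   # ~456 samples
```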
The most labor-intensive part of the SAT is scoring the audio files, but even here, automation is available for a price. The first step in scoring is to define the scoring system. In the LMR industry, DAQ is the usual metric; TSB-88-C defines DAQ according to Table 1. The DAQ definitions have come under criticism because they are open to considerable interpretation. For example, a DAQ of 3.4 is often specified for public-safety networks, but it is not clear what is meant by “… repetition only rarely required.” Does “rarely” mean that 1% of calls must be repeated? 5%? 10%?
The cellular-phone industry prefers the Mean Opinion Score (MOS), which is defined by the right half of Table 1. MOS scores are compiled and averaged from a statistically significant number of trained listeners. There is considerable variance among listeners, and both rigorous training and a large number of listeners are required to get statistically meaningful results. MOS tests are not normally conducted for system-acceptance testing. Instead, MOS tests are used to compare competing vocoders. Properly conducted MOS scoring with human listeners is expensive: one easily can spend more than $100,000 per test.
For SAT with DAQ scoring, one sensible approach is to use a panel of trained listeners and record each audio sample as a pass or fail according to a majority rule. For example, if the service threshold is DAQ 3.4 and the five-person panel scores a particular sample 2, 3, 3.4, 3.4 and 4, the majority of the scores are 3.4 or better, so the sample passes.
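Expressed as code, the decision rule is trivial to automate. The sketch below applies it to the example scores from the preceding paragraph.

```python
# Majority-rule pass/fail decision for one audio sample.
def sample_passes(panel_scores, threshold=3.4):
    at_or_above = sum(1 for score in panel_scores if score >= threshold)
    return at_or_above > len(panel_scores) / 2

print(sample_passes([2, 3, 3.4, 3.4, 4]))   # True: 3 of 5 scores are >= 3.4
```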
Automated scoring of speech quality
Human listeners introduce complexity, uncertainty and cost, so they should be avoided if possible. Fortunately, objective speech-quality measurement has progressed to the point where machines can score audio files reliably. Perceptual evaluation of speech quality (PESQ) is a family of test methods for automated scoring of speech quality over telephone channels, standardized as ITU-T Recommendation P.862. PESQ is a worldwide standard for objective voice-quality testing, and it is used by phone manufacturers, network-equipment vendors and telephone operators. PESQ scores do not match DAQ precisely, so a mapping of PESQ to DAQ must be agreed upon at the time the system-acceptance plan is adopted.
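Once a mapping is agreed upon, applying it to the automated scores is straightforward. The sketch below shows the idea; the breakpoints are placeholders, not a standardized mapping, and the real table must be negotiated in the acceptance plan.

```python
# Sketch of applying an agreed PESQ-to-DAQ mapping to automated scores.
# The breakpoints below are placeholders, NOT a standardized mapping.
import bisect

PESQ_BREAKPOINTS = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]   # placeholder values
DAQ_VALUES       = [1, 2, 2.5, 3, 3.4, 4, 4.5]      # corresponding DAQ levels

def pesq_to_daq(pesq_score):
    return DAQ_VALUES[bisect.bisect_right(PESQ_BREAKPOINTS, pesq_score)]

pesq_scores = [3.2, 3.8, 2.1]               # example PESQ outputs
passes = [pesq_to_daq(s) >= 3.4 for s in pesq_scores]
print(f"{sum(passes)}/{len(passes)} samples meet DAQ 3.4")
```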
Background audio noise
Many firefighters complain of poor digital-voice quality on P25-compatible radios when the person speaking is in a noisy environment. One possible explanation is that the IMBE algorithm cannot reproduce human speech well in the presence of loud background noise, at least not as well as an analog FM radio can. Changing the vocoder algorithm to accommodate noisy environments is a daunting task and may not be feasible. Recent efforts to solve this problem have focused instead on directional microphones, noise-canceling headsets, and operator training.
Jay Jacobsmeyer is president of Pericle Communications Co., a consulting engineering firm located in Colorado Springs, Colo. He holds bachelor’s and master’s degrees in electrical engineering from Virginia Tech and Cornell University, respectively, and has more than 25 years’ experience as a radio-frequency engineer.