Failure-proofing NG-911 networks
Emergency-response operations deal with life-and-death situations on a regular basis. Callers must be able to reach 911 telecommunicators every hour of every day. Service cannot be interrupted—not even briefly—and communications must be clear. Anything less than 100% reliability is unacceptable.
Achieving this level of reliability as public-safety answering points (PSAPs) move to VoIP, add support for video, and integrate voice and data networks requires a comprehensive look at the emergency operators’ network architecture. This article discusses some of the issues raised by the transition to IP within the wide-area network (WAN) and proposes a network architecture that will provide the required reliability.
Operations are vulnerable to failed and low-quality links—The FCC’s Network Reliability and Interoperability Council (NRIC) requires that "the Emergency Services Network should be designed such that no single failure or interruptive incident (i.e., a cable cut) will create a system outage."
To accomplish this goal and ensure that a single network-link failure will not create a network outage, network architects traditionally have relied on redundant links and automatic failover. In this type of architecture, a failover mechanism typically senses the malfunction and activates a route change, so that traffic begins to use the redundant link. But that self-healing process often takes time; routing convergence can take between 6 seconds and a minute. During that period, existing calls will go silent, some call sessions may disconnect and new calls cannot be initiated.
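To put the convergence window in concrete terms, the rough arithmetic below counts how many voice packets one direction of one call loses while routing reconverges. It is a sketch under a stated assumption: one RTP packet every 20 ms, a common packetization interval that this article does not specify. The function name is hypothetical.

```python
# Assumption (not from the article): each call sends one RTP packet
# every 20 ms in each direction.
PACKET_INTERVAL_MS = 20


def packets_lost(outage_seconds: float, interval_ms: int = PACKET_INTERVAL_MS) -> int:
    """Voice packets that one direction of one call loses during an outage."""
    return int(outage_seconds * 1000 // interval_ms)


# Compare a sub-second failover with the 6-second and 1-minute convergence times.
for outage in (0.5, 6, 60):
    print(f"{outage} s outage: {packets_lost(outage)} packets lost per call direction")
```

Under that assumption, even the shortest convergence time quoted here costs each call hundreds of consecutive packets, which is why callers hear silence rather than a brief glitch.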
Even more concerning, the effect of a hard failure may not be limited to the calls themselves. In many 911 operations, the network also supports a control channel carrying CallManager-to-CallManager traffic. If that channel is interrupted, the best case is dead air. In the worst case, calls drop, the phones have to re-register and the CallManagers fall out of sync, a condition that could take about two minutes to recover from.
A less-visible, soft failure can be even more disruptive over a longer period of time. A link may have quality issues—jitter, delay or packet loss—that limit its ability to transport voice data reliably. But many failover solutions available today only recognize that the link is "up," not that it is degraded. Often, the only way a network administrator would become aware of a circuit brownout is through a network-monitoring tool. In those circumstances, manual intervention may be required to route voice traffic to a better-performing backup link. During that intervening time period, voice quality will be compromised, and manual intervention then would be required to return to the primary state when the brownout condition is resolved.
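The difference between "up" and "usable" can be sketched in a few lines. The example below classifies a link from measured delay, jitter and loss, and moves traffic back to the primary only after several consecutive healthy samples, so that neither the switch nor the return needs manual intervention. The thresholds, class and function names are illustrative assumptions, not values from this article or from any particular product.

```python
from dataclasses import dataclass

# Illustrative thresholds only (assumptions, not taken from the article).
MAX_ONE_WAY_DELAY_MS = 150.0
MAX_JITTER_MS = 30.0
MAX_LOSS_PCT = 1.0


@dataclass
class LinkSample:
    up: bool
    delay_ms: float
    jitter_ms: float
    loss_pct: float


def classify(sample: LinkSample) -> str:
    """Return 'down', 'brownout' or 'healthy' for one measurement."""
    if not sample.up:
        return "down"
    degraded = (
        sample.delay_ms > MAX_ONE_WAY_DELAY_MS
        or sample.jitter_ms > MAX_JITTER_MS
        or sample.loss_pct > MAX_LOSS_PCT
    )
    return "brownout" if degraded else "healthy"


def pick_link(primary_recent: list[LinkSample], backup: LinkSample,
              healthy_needed: int = 3) -> str:
    """Use the primary only if its last `healthy_needed` samples were all healthy."""
    recent = primary_recent[-healthy_needed:]
    primary_ok = len(recent) == healthy_needed and all(
        classify(s) == "healthy" for s in recent
    )
    if primary_ok:
        return "primary"
    # Otherwise prefer the backup unless it is completely down.
    return "backup" if classify(backup) != "down" else "primary"
```

Requiring a run of healthy samples before returning to the primary is one simple way to avoid flapping between links while a brownout clears; a real deployment would tune both the thresholds and the sample count.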
The data network is as important as VoIP—While call quality and high availability are necessary to ensure that the voice network is operating, the network connections to systems—for example, automatic location information and radio dispatch—are just as critical to PSAP operations. When an analog voice system was used, this data network was a distinctly different network. But as call centers move to VoIP technology, network architects have an option to combine both types of data on a single network or keep them distinct.
At first glance, separating the two networks would appear to prevent a single failure from affecting both voice and data. In reality, fully redundant distinct networks would require four different network access links: one primary and one backup for voice, and one primary and one backup for data. In many locations, there aren’t that many truly diverse service providers. Building distinct data and voice networks that depend on the same ISP for network access means that the two systems aren’t actually distinct; the infrastructure and network costs are doubled, but both networks still are vulnerable to a single network-link outage.
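The shared-provider problem is easy to check on paper. The sketch below takes a hypothetical link inventory (the provider names and the tuple layout are invented for illustration) and reports which networks lose every access link when one provider fails.

```python
from collections import defaultdict


def networks_lost_if_provider_fails(links, failed_provider):
    """Networks left with no working access link when one provider fails.

    `links` is a list of (network, role, provider) tuples.
    """
    surviving = defaultdict(int)
    for network, _role, provider in links:
        surviving[network] += int(provider != failed_provider)
    return sorted(n for n, count in surviving.items() if count == 0)


# Hypothetical inventory: four links, but all of them ride the same ISP.
same_isp = [
    ("voice", "primary", "Provider A"),
    ("voice", "backup", "Provider A"),
    ("data", "primary", "Provider A"),
    ("data", "backup", "Provider A"),
]

# Hypothetical inventory: each network has a backup on a second provider.
diverse = [
    ("voice", "primary", "Provider A"),
    ("voice", "backup", "Provider B"),
    ("data", "primary", "Provider A"),
    ("data", "backup", "Provider B"),
]

print(networks_lost_if_provider_fails(same_isp, "Provider A"))
print(networks_lost_if_provider_fails(diverse, "Provider A"))
```

In the first inventory, a single provider failure takes down both the voice and data networks despite the four links; in the second, both survive. A fuller analysis would also track shared conduits and central offices, which this sketch ignores.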
A better architecture is a combined data and voice network. This minimizes complexity, cost and management overhead, although it brings with it a unique set of challenges, primarily in the area of traffic prioritization and bandwidth reservation. In a shared network, it is critical to ensure that there is available bandwidth for each type of traffic and that different traffic types can be identified and given different levels of prioritization.
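Bandwidth reservation on a shared link comes down to simple arithmetic. The sketch below assumes G.711 voice at 20 ms packetization carried over RTP, UDP and IPv4, and it ignores layer-2 overhead; none of these choices comes from this article, and the function names and example figures are hypothetical. It covers only the reservation half of the problem; prioritization would be handled separately by queuing policy on the routers.

```python
# Assumptions (not from the article): G.711 codec, 20 ms packetization,
# RTP/UDP/IPv4 headers, no layer-2 overhead counted.
PAYLOAD_BYTES = 160             # 20 ms of audio at 64 kbit/s
HEADER_BYTES = 12 + 8 + 20      # RTP + UDP + IPv4
PACKETS_PER_SECOND = 50         # one packet every 20 ms


def voice_kbps(calls: int) -> float:
    """One-direction IP bandwidth needed for `calls` simultaneous calls."""
    per_call_bps = (PAYLOAD_BYTES + HEADER_BYTES) * 8 * PACKETS_PER_SECOND
    return calls * per_call_bps / 1000


def reservation_plan(link_kbps: float, calls: int, data_floor_kbps: float) -> dict:
    """Split a link into a voice reservation, a critical-data floor and the rest."""
    voice = voice_kbps(calls)
    leftover = link_kbps - voice - data_floor_kbps
    return {
        "voice_kbps": voice,
        "data_floor_kbps": data_floor_kbps,
        "best_effort_kbps": leftover,
        "fits": leftover >= 0,
    }


# Example: 30 simultaneous calls and a 2 Mbit/s floor for critical data
# on a 10 Mbit/s link.
print(reservation_plan(10_000, 30, 2_000))
```

Under these assumptions each call needs 80 kbit/s in each direction, so the plan makes it obvious when a surge in call volume would squeeze the data systems that share the link.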
Mitigating the risks associated with VoIP and converged networks—Fortunately, there are solutions available in the market today with features that resolve these issues, including the ability to mitigate hard and soft failures before they affect the service. Network architects should expect their networks to have the following capabilities:
Sub-second failover—The switch from failed link to operative backup should be performed so quickly that no calls are dropped and the failure is unnoticeable to callers.
Quality monitoring—Every link should be monitored continuously for quality, and traffic should be routed away from underperforming links before call quality is affected and communication becomes compromised.
Packet duplication—VoIP calls and other types of data can be selectively duplicated and sent over two diverse paths simultaneously, ensuring that no single link failure or poor circuit quality will impact a call.
Network diversity—At least two completely diverse links are required to have any real redundancy; ideally, both a wired and a wireless network should be supported.
Sufficient bandwidth—Available bandwidth must be adequate to handle widespread emergencies that can result in very large numbers of simultaneous calls.
Intra-session load balancing—A single session should be able to simultaneously use more than one link to speed up large data transfers, such as database backups, without impacting voice and critical system traffic.
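Of the capabilities listed above, packet duplication is the easiest to illustrate. The sketch below shows only the receiving side: every packet is assumed to carry a sequence number and to be sent over both paths, and the receiver delivers whichever copy arrives first. The sequence-number scheme, class name and window size are assumptions made for illustration; the article does not describe how any particular product implements this.

```python
from collections import deque


class Deduplicator:
    """Deliver each sequence number once, whichever path it arrives on first."""

    def __init__(self, window: int = 1024):
        self.window = window      # how many recent sequence numbers to remember
        self.seen = set()
        self.order = deque()

    def accept(self, seq: int) -> bool:
        """Return True if this is the first copy of `seq` within the window."""
        if seq in self.seen:
            return False
        self.seen.add(seq)
        self.order.append(seq)
        if len(self.order) > self.window:
            # Forget the oldest sequence number to bound memory use.
            self.seen.discard(self.order.popleft())
        return True


def receive(arrivals):
    """Reduce a stream of (path, seq) arrivals to the packets actually delivered."""
    dedup = Deduplicator()
    return [seq for _path, seq in arrivals if dedup.accept(seq)]


# Path A drops packets 2 and 3; path B drops packet 5.
arrivals = [
    ("A", 0), ("B", 0), ("A", 1), ("B", 1),
    ("B", 2), ("B", 3),
    ("A", 4), ("B", 4),
    ("A", 5),
]
print(receive(arrivals))
```

Although each path loses packets in this example, the merged stream is complete, which is the property that lets a call ride through a failure or brownout on either link.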
Northampton County, PA was just down recently due to a fiber cable cut that created a large outage.