FCC report outlines cause of AT&T’s February outage, impact on FirstNet
AT&T’s nationwide outage on Feb. 22 was caused by a misconfigured “network element” placed on the AT&T Mobility network without being properly reviewed, resulting in hours-long service disruption for both FirstNet users and AT&T commercial customers, according to an FCC report about the outage.
“When you sign up for wireless service, you expect it will be available when you need it—especially for emergencies,” FCC Chairwoman Jessica Rosenworcel said in a prepared statement. “This ‘sunny day’ outage prevented consumers across the country from communicating—including by blocking 911 calls—and stopped public safety personnel from using FirstNet. We take this incident seriously and are working to provide accountability for this lapse in service and prevent similar outages in the future.”
Released yesterday, the report presents findings from an investigation by the FCC’s Public Safety and Homeland Security Bureau (PSHSB) that began “as soon as the outage occurred,” according to an FCC press release.
AT&T is in the eighth year of a 25-year contract with the FirstNet Authority to build and maintain the nationwide public-safety broadband network (NPSBN) known as FirstNet. On Feb. 22, FirstNet service was disrupted for less than three hours—significantly less than the 12 hours it took for AT&T to restore service to all of its commercial customers, according to the FCC report.
“All 4G voice and 5G voice and data infrastructure were impacted between 2:45 AM and 5:00 AM, causing loss of service for FirstNet subscribers,” the FCC report states. “AT&T Mobility prioritized the restoration of FirstNet before other services. FirstNet device registrations approached normal shortly after the restoration of the FirstNet dedicated network elements that are connected to AT&T Mobility’s network.”
Overall, the outage halted all voice and 5G services for AT&T customers in all 50 states, as well as Washington, D.C., Puerto Rico and the U.S. Virgin Islands, the FCC press release states. The outage affected more than 125 million devices, blocked more than 92 million voice calls, and prevented more than 25,000 calls from reaching 911 centers, according to the FCC report.
While AT&T prioritized restoration of services to FirstNet users, the carrier did not notify FirstNet subscribers about the outage until almost one hour after their service was restored, according to the FCC.
This massive outage began at 2:45 a.m. Central time on Feb. 22, three minutes after a new network element was implemented in the AT&T Mobility system “during a routine night maintenance window,” the FCC report states.
“Once implemented, this configuration error caused the AT&T Mobility network to enter ‘protect mode’ to prevent impact to other services, disconnecting all devices from the network, and prompting a loss of voice and 5G data service for all wireless users,” according to the FCC report.
“It took close to two hours to roll back the network change. Full service restoration, however, took at least 12 hours because AT&T Mobility’s device registration systems were overwhelmed with the high volume of requests for re-registration onto the network.”
If AT&T had followed its own internal protocols, the network element at the root of the outage almost certainly would not have been implemented, according to the FCC report.
“The network element was misconfigured,” the FCC report states. “The configuration of the network element did not conform to AT&T’s established network element design and installment procedures, which require peer review. As a result, the misconfiguration of the network element was not detected before the network element was introduced into AT&T Mobility’s network.
“As a result of the error in configuration, downstream network elements propagated the error further into the network. This triggered an automated response that shut down all network connections to prevent the traffic from propagating further into the network.”
The fact that such a misconfigured network element could be implemented into the AT&T Mobility network without proper review was a point of concern for the FCC.
“This outage illustrates the need for mobile wireless carriers to adhere to best practices, implement adequate controls in their networks to mitigate risks, and be capable of responding quickly to restore service when an outage occurs,” the FCC report states in its concluding paragraph.
“Sound network-management practices of critical infrastructure and AT&T Mobility’s own processes demand that only approved network changes that are developed pursuant to internal procedures and industry best practices, should be loaded onto the production network. It should not be possible to load changes that fail to meet those criteria.”
In its report, the FCC noted that AT&T implemented “additional technical controls in its network” within 48 hours of the outage to reduce the chance that such an episode happens again.
“This included scanning the network for any network elements lacking the controls that would have prevented the outage, and promptly putting those controls in place,” the FCC report states. “AT&T has engaged in ongoing forensic work and implemented additional enhancements to promote network robustness and resilience.
“In addition, post-outage, AT&T has implemented additional steps for peer review and adopted procedures to ensure that maintenance work cannot take place without confirmation that required peer reviews have been completed.”
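The kind of control described above, in which a change cannot be loaded until peer review is confirmed, can be illustrated with a minimal sketch. This is not AT&T's implementation, which the FCC report does not detail; the ChangeRequest structure, the function names and the single-approval threshold are all hypothetical, chosen only to show the idea of a gate that runs before any change reaches production.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """A proposed network change (hypothetical structure, for illustration only)."""
    change_id: str
    author: str
    approvals: set = field(default_factory=set)  # names of people who signed off


class ReviewGateError(Exception):
    """Raised when a change is blocked before reaching production."""


def confirm_peer_review(change: ChangeRequest, required_approvals: int = 1) -> None:
    # An author's own sign-off does not count as a peer review.
    peer_approvals = change.approvals - {change.author}
    if len(peer_approvals) < required_approvals:
        raise ReviewGateError(
            f"{change.change_id}: {len(peer_approvals)} peer approval(s), "
            f"{required_approvals} required"
        )


def load_onto_production(change: ChangeRequest) -> str:
    # The gate runs first, so an unreviewed change never reaches the load step.
    confirm_peer_review(change)
    return f"{change.change_id} loaded"


if __name__ == "__main__":
    reviewed = ChangeRequest("CHG-1", author="alice", approvals={"bob"})
    print(load_onto_production(reviewed))

    unreviewed = ChangeRequest("CHG-2", author="alice", approvals={"alice"})
    try:
        load_onto_production(unreviewed)
    except ReviewGateError as err:
        print(f"blocked: {err}")
```

The design point is that the check sits inside the only path to production rather than in a separate checklist, which mirrors the FCC's statement that it “should not be possible” to load changes that fail to meet the criteria.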
Public-safety subscribers access FirstNet via AT&T’s radio access network, but one of FirstNet’s key features is that it operates on a physically separate network core that is dedicated to first-responder communications. As of the posting of this article, neither AT&T nor the FirstNet Authority had responded directly to inquiries from IWCE’s Urgent Communications about why a misconfigured network element placed on the AT&T Mobility network impacted FirstNet service on Feb. 22. If the inquiries are answered, this article will be updated.
However, AT&T indicated that FirstNet’s core helped the carrier restore its public-safety customers more quickly on Feb. 22.
“We have implemented changes to prevent what happened in February from occurring again,” according to the AT&T statement provided to IWCE’s Urgent Communications. “We fell short of the standards that we hold ourselves to, and we regret that we failed to meet the expectations of our customers and the public-safety community.
“FirstNet was restored in advance of the AT&T commercial network, due to the unique features of the dedicated FirstNet network core and our dedication to serving our nation’s first responders. We continue to work closely with the First Responder Network Authority to provide public safety with the dedicated connectivity they require to communicate with one another.”
The FirstNet Authority issued the following statement about the FCC report:
“The FirstNet Authority is reviewing the FCC’s report on the nationwide AT&T outage on February 22, 2024. We thank the FCC for their thorough review and are focused on continuing to improve the FirstNet network for public safety.”