FirstNet Authority CEO Wassel unveils task-force recommendations from AT&T outage in February
An AT&T outage on Feb. 22 resulting in FirstNet users being without service for more than two hours meant the network was not “up to public safety’s standards that day,” but implementing measures designed to prevent a repeat “will ultimately make us stronger,” according to FirstNet Authority Executive Director and CEO Joe Wassel.
Wassel made the statements last week in a blog post that unveiled the five key recommendations made by the FirstNet Authority’s After-Action Task Force that was established in the aftermath of the Feb. 22 outage.
“While this network outage has tested us, it will ultimately make us stronger,” Wassel’s blog states. “With the Task Force’s work complete, the FirstNet Authority has been actively working with [FirstNet Authority contractor] AT&T to execute improvements. In many cases, the FirstNet Authority and AT&T have already implemented measures to support these recommended outcomes.
“Network outages can happen. By collaborating with AT&T and acting on the Task Force’s findings, we will be able to know and understand the impact of network disruptions sooner. These efforts will help us better address network disruptions and surge to share information with public safety.”
In his blog, Wassel states that he directed the After-Action Task Force to review the Feb. 22 incident—when users were unable to connect to the public-safety network from 3:45 a.m. to 6:00 a.m. Eastern time—“from every angle, and they met the challenge.” The task force considered network policies and procedures, information sharing with public safety, and post-incident communications that should be followed to help prevent such an outage and improve the response when outages occur.
Wassel’s blog outlined the task force’s “recommended outcomes” in five areas:
- “Maintenance and Notification – This outcome focuses on coordination between the FirstNet Authority and AT&T on maintenance and network update protocols that impact, touch, or leverage systems that support the FirstNet network. This would include routine maintenance training and/or exercise opportunities, as well as improvements to network outage alerts and notifications to FirstNet subscribers.
- “Public-Private Partnership Communications – This outcome aims to foster more comprehensive information sharing, coordination, and communications between AT&T and the FirstNet Authority for network impacting events.
- “Operational Response Planning – This outcome looks to facilitate more complete All Hazards Emergency Operations planning between the FirstNet Authority and AT&T, so both entities can better prepare for, respond to, and communicate effectively during planned and no-notice network impacting events.
- “Stakeholder Communications – This outcome aims to share verified information on network impacting events as quickly as possible with FirstNet users and public safety stakeholders.
- “Continuity of Operations (COOP)/Continuity of Government Plan – This outcome ensures the FirstNet Authority’s continuity of operations plan, or COOP, is executed in coordination with AT&T to ensure the FirstNet Authority provides public safety stakeholders with accurate, timely situational information about FirstNet’s operating status.”
Wassel’s blog provided no new information from the After-Action Task Force regarding the cause of the Feb. 22 outage.
“The team reviewed AT&T’s after-action assessment of the root-cause analysis of the outage, as well as third-party crowdsourced data to further understand the impacts and extent of the outage,” the blog states.
“The team confirmed AT&T’s initial finding that the outage resulted from an incorrect process used during a network expansion activity. This process inadvertently triggered defense mechanisms, which ultimately prevented users from connecting to the FirstNet network between 3:45 a.m. EST and 6:00 a.m. EST, at which point public safety’s network was restored.”
In July, the FCC released its report about the AT&T outage on Feb. 22, which included outages for both FirstNet users and AT&T’s commercial customers. That report noted that AT&T commercial subscribers faced a much lengthier outage than FirstNet users, with some not getting their service restored until the middle of the day on Feb. 22.
The FCC report states that the Feb. 22 outages were caused by a misconfigured “network element” that was placed on the AT&T Mobility network without being properly reviewed. This misconfiguration caused the AT&T network to enter a “protect mode” that is designed to limit the impact on other services by disconnecting all devices from the network. Without a way to connect to the network, AT&T and FirstNet users were unable to access voice and 5G data services during the outage.
AT&T was able to “roll back” the network within a couple of hours, but every device connected to the AT&T wireless system—more than 100 million devices—needed to be registered on the network again before users could access services, according to the FCC report.
“Full service restoration … took at least 12 hours, because AT&T Mobility’s device-registration systems were overwhelmed with the high volume of requests for re-registration onto the network,” the FCC report states.
Restoring service to FirstNet users was prioritized, which has been cited as the reason that public-safety users on the system were able to access their normal services less than three hours after the Feb. 22 outage began.
In his blog, Wassel cited AT&T’s priority to restore FirstNet service on Feb. 22 but acknowledged that the carrier and the FirstNet Authority must work to ensure that the FirstNet nationwide public-safety broadband network (NPSBN) does not have this kind of outage again.
“AT&T took immediate action to prioritize the restoration of public safety’s communications on FirstNet,” Wassel’s blog states. “While the prioritization of FirstNet was important, the network did not perform up to public safety’s standards that day. Both the First Responder Network Authority (FirstNet Authority) and AT&T have taken action to help prevent FirstNet from experiencing an outage like this in the future.”