Automated, self-healing IT networks seem right around the corner
Yesterday I had a nice chat with Abbas Haider Ali, chief technical officer of xMatters. The 13-year-old company provides alerts to IT managers, wherever they are, whenever their networks go on the fritz. The system automatically determines who should receive an alert based on the type of anomaly, and what form the alert should take (e-mail, SMS text or voice message) based on the recipient’s preferred device, which can vary by time of day. It all happens in a matter of seconds, driven largely by predetermined settings.
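To make that routing idea concrete, here is a minimal Python sketch. It is not xMatters’ implementation, and every name in it (`Recipient`, `route_alert`, the anomaly labels and the time windows) is a hypothetical illustration. It assumes each anomaly type maps to one responsible person, and that each person keeps a schedule of time windows naming the channel they want during that window, with a fallback channel for any hour the schedule does not cover.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class Recipient:
    name: str
    # Hypothetical preference schedule: (start, end, channel) windows.
    # A window whose start is later than its end wraps past midnight.
    schedule: list[tuple[time, time, str]] = field(default_factory=list)
    default_channel: str = "email"

    def channel_at(self, now: time) -> str:
        """Return the channel this person prefers at the given time of day."""
        for start, end, channel in self.schedule:
            if start <= end:
                if start <= now < end:
                    return channel
            elif now >= start or now < end:  # overnight window
                return channel
        return self.default_channel


def route_alert(anomaly: str, now: time,
                routes: dict[str, Recipient],
                fallback: Recipient) -> tuple[str, str]:
    """Pick who gets the alert and how, from predetermined settings."""
    recipient = routes.get(anomaly, fallback)
    return recipient.name, recipient.channel_at(now)


# Example settings (hypothetical): e-mail during office hours,
# a voice call overnight, SMS for the hours in between.
alice = Recipient("alice",
                  [(time(8), time(18), "email"), (time(22), time(7), "voice")],
                  default_channel="sms")
bob = Recipient("bob")
routes = {"disk_full": alice}

print(route_alert("disk_full", time(23, 30), routes, bob))  # ('alice', 'voice')
print(route_alert("disk_full", time(12, 0), routes, bob))   # ('alice', 'email')
```

Keeping the preferences as data rather than code mirrors the article’s point that routing runs largely on predetermined settings: changing who is woken up, and how, means editing a table, not the logic.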
Use cases exist in the enterprise, government and public-safety sectors, according to Ali, and they’re not limited just to IT outages.
“[Customers] said, ‘This is great if I have to wake up an engineer in the middle of the night to fix something that’s broken in IT, but we also have [other] situations where … we want the same type of reliable communications going out to all of our employees,’” he said.
For example, after a major snowstorm, an organization could send an alert telling employees to stay home because the roads are impassable or commercial power is out; in the case of a public-safety answering point, personnel could be redeployed to a neighboring center to answer redirected calls.
There are many notification solutions on the market, but what intrigued me during my conversation with Ali is that when xMatters first started out on this path, its solution was deployed on the customer’s premises. Now it is largely provisioned as a hosted service.
“Today, the majority of our customers use our cloud service. We take the pain — and there is pain — out of dealing with the telco providers and SMS [aggregators],” Ali said. “We provide a system where they can plug in their other cloud services, or their data-center-deployed technologies, and integrate them into ours. We provide the intelligent routing and notification layer on top of their premises or cloud systems.”
The underlying reason for the shift to the hosted version of xMatters’ platform is that organizations have come a long way in a short time in trusting the cloud, Ali said. While the cost and operational advantages of cloud-based services are easily understood, such services have traditionally met with considerable distrust.
“If you went back as recently as two years ago, I would say that at least half of our customers had concerns about security, privacy and reliability — they had an overall lack of comfort regarding cloud services for any of their core operations, and that was something that held them back,” Ali said. “So, they decided to ‘wait and see,’ and started by acquiring our software and deploying it on their side.”
“But we’ve seen that start to shift, and accelerate, quarter by quarter,” he continued. “They eventually started to say, ‘We like the value that the system provides, but we’re not experts on running xMatters, you guys are, and we’re going to hold you to task on providing that.’ At this point, people are much more willing to trust that they can run their stuff in a cloud provider. We’ve gone through enough security audits, third-party penetration tests, and the usual round of multi-acronym compliance tests, that people are becoming much more comfortable with being able to do that.”
The ability to send notifications whenever and wherever is only the first stage, however. According to Ali, the end game is to use the xMatters platform to instruct networks to heal themselves when something goes awry.
“As far as the technology is concerned, it actually is possible to have remotely initiated, ‘fix-yourself’ commands take place, and … they would result in the behavior that you want,” Ali said.
However, he conceded that adoption of such technology would require customers to take another leap of faith.
“What we’re seeing, even with our most bleeding-edge clients, is that they still want someone who’s a subject-matter expert in any given area to say, ‘Go ahead and do that,’ essentially authorize the machine to fix itself,” Ali said.
“But as a class of problems happens often enough, you can say that all the engineer did was push a button that said, ‘Yeah, go ahead and do that,’” he continued. “So, you’ll put that into the fully automated category, and all of a sudden that person is out of the loop for that class of problems. But I think it’s probably two years away before a significant percentage of these activities can be automated.”
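The progression Ali describes, from a human approving each fix to a class of problems being handled automatically, can be sketched as a simple promotion policy. This Python example is an illustrative assumption, not anything xMatters has described: the `RemediationPolicy` class, the `ask_engineer` callback and the rule of “N consecutive approvals, reset on any rejection” are all hypothetical, and Ali gave no specific threshold.

```python
from collections import defaultdict
from typing import Callable


class RemediationPolicy:
    """Hypothetical sketch: a problem class becomes fully automated once an
    engineer has approved its proposed fix `threshold` times in a row."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.streak: defaultdict[str, int] = defaultdict(int)  # consecutive approvals
        self.automated: set[str] = set()

    def needs_human(self, problem_class: str) -> bool:
        return problem_class not in self.automated

    def record_decision(self, problem_class: str, approved: bool) -> None:
        if approved:
            self.streak[problem_class] += 1
            if self.streak[problem_class] >= self.threshold:
                # The engineer has only been pushing "go ahead"; take them out of the loop.
                self.automated.add(problem_class)
        else:
            # Any rejection resets the streak, so the class stays human-gated.
            self.streak[problem_class] = 0

    def handle(self, problem_class: str,
               ask_engineer: Callable[[str], bool]) -> bool:
        """Return True if the self-healing fix should run."""
        if not self.needs_human(problem_class):
            return True  # fully automated: no one is paged for approval
        approved = ask_engineer(problem_class)
        self.record_decision(problem_class, approved)
        return approved
```

Resetting the count on any rejection is one conservative design choice among several; a real system might instead weigh approval rates over a time window, or require a subject-matter expert to sign off on the promotion itself.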
While two years might seem like an eternity in the IT world, in the enterprise it’s just around the corner.