When things are cooking, Conde Nast relies on PagerDuty’s alerting platform
To an IT manager, the most urgent communications are those that alert him to a system outage. This is especially true when the outage results in lost opportunities that can’t be recovered.
Conde Nast is an enterprise that has no choice but to be concerned about such scenarios. The company publishes some of the world’s best-known consumer magazines, including Glamour, Epicurious, Golf Digest, Vogue, Vanity Fair, Bon Appétit, the New Yorker and Conde Nast Traveler.
At certain times of the year, such as Fashion Week—when the world’s leading fashion houses unveil their latest creations in runway shows—and Thanksgiving and Christmas—when millions of people are looking for recipes—a system outage can be disastrous, especially if it affects a publication’s website, according to Chris Handy, Conde Nast’s director of Web operations.
“At such times, being down even for a second can be everything to us,” Handy said.
Several years ago, Conde Nast decided to move its websites to a hosted environment as a cost-cutting measure. However, the company ultimately decided to keep the performance-management aspect in-house, and maintained a small team of IT professionals for that purpose. The company uses Pingdom to monitor its systems externally; the platform continually pings the company’s websites from locations scattered all over the world, raising a red flag if it doesn’t receive a response. The company uses Nagios to monitor its systems internally for anomalies, such as storage nearing capacity.
When one of these monitoring systems detects a problem, an IT alerting system dubbed PagerDuty—developed by the San Francisco-based company of the same name—goes to work. PagerDuty can be configured in myriad ways, so that it automatically sends the alert to the right person using the preferred medium, based on the type of problem and time of day.
“PagerDuty acts as a central hub for all of our monitoring solutions,” Handy said. “We have a small team dedicated to this, so PagerDuty also was good for organizing a way for us to be notified, day or night. It’s been very helpful in that perspective.”
PagerDuty was engineered to take a rifle-shot approach to IT alerting, rather than a shotgun approach, according to Alex Solomon, the company’s CEO.
“The system has an on-call scheduling component, which we believe establishes a concept of ownership,” Solomon said. “You don’t want to blast the alert to a whole bunch of people in the hope that one person gets it, because that scenario falls apart a lot of times. Why wake up five people when one person will do? Also, if you blast out the alert, people may ignore it, thinking that someone else will take it.”
Perhaps the most tangible benefit is that PagerDuty can reach out to IT staff via e-mail, text message or phone call, according to Handy.
“PagerDuty is smart enough to keep track of who should be notified at what time, and how they should be notified,” he said.
For instance, a technician might prefer an e-mail during the business day but a text message while on his subway ride home. In the overnight hours, he might prefer a phone call, to ensure that he wakes up to receive the alert. In fact, this is critically important in the middle of the night, according to Handy.
“If you get something from PagerDuty at 3 in the morning, you know that something is seriously wrong,” he said.
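The time-of-day preferences Handy describes can be pictured as a small lookup table. What follows is only an illustrative sketch, not PagerDuty's actual data model or API: the rule table, the window boundaries and the names `RULES`, `in_window` and `pick_channel` are all assumptions made for the example.

```python
from datetime import time

# Hypothetical per-technician rules: (window start, window end, channel).
# The hours are invented for illustration; the article gives no exact times.
RULES = [
    (time(9, 0), time(17, 0), "email"),   # business day
    (time(17, 0), time(23, 0), "sms"),    # commute and evening
    (time(23, 0), time(9, 0), "phone"),   # overnight, wraps past midnight
]


def in_window(now, start, end):
    """Return True if `now` falls in [start, end), allowing midnight wrap."""
    if start <= end:
        return start <= now < end
    # A window such as 23:00-09:00 covers late night and early morning.
    return now >= start or now < end


def pick_channel(now, rules=RULES, default="phone"):
    """Return the channel of the first rule whose window contains `now`."""
    for start, end, channel in rules:
        if in_window(now, start, end):
            return channel
    return default
```

With this table, `pick_channel(time(3, 0))` selects the phone call, matching the overnight case in the example above. The one subtle point is the window that wraps past midnight, which needs its own comparison.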
If a technician should sleep through an alert, PagerDuty has thought of that, too, according to Solomon.
“We’ve worked escalation into the platform,” he said. “If someone misses the alert, it gets passed on to the next person in the chain.”
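The escalation idea Solomon describes amounts to walking an ordered on-call list until someone responds. The sketch below is a simplified illustration under stated assumptions, not PagerDuty's implementation: the `escalate` function and its `notify` and `acknowledged` callbacks are hypothetical, and a real system would wait for a timeout before moving on to the next person.

```python
def escalate(chain, notify, acknowledged):
    """Notify each person in order until one acknowledges the alert.

    chain        -- ordered list of on-call people (hypothetical names)
    notify       -- callable that delivers the alert to one person
    acknowledged -- callable returning True if that person responded;
                    in practice this would block until a timeout expires
    Returns the person who took ownership, or None if nobody responded.
    """
    for person in chain:
        notify(person)
        if acknowledged(person):
            return person
    return None


# Example: the first technician sleeps through the alert.
notified = []
owner = escalate(
    ["alice", "bob", "carol"],
    notify=notified.append,
    acknowledged=lambda person: person != "alice",
)
```

Here only two people are contacted before the alert finds an owner, and the third is never woken. That is the difference from blasting the alert to everyone at once.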
Another benefit of the PagerDuty platform is that it is integrated with multiple system-monitoring solutions, including Pingdom, Nagios, Splunk, New Relic and, most recently, Zenoss, a collaboration that was announced in September.
“The flexibility that PagerDuty provides is huge for us. … You don’t have to worry about monitoring [numerous] systems, because they all come in through this central notification system, which is PagerDuty,” Handy said. “The alerts happen automatically—you don’t have to think about it—and that makes it much easier for us to respond.”
PagerDuty is a cloud-based hosted solutions provider. Solomon said that access and security concerns regarding such solutions are going away and that customers today don’t often mention them. In the case of PagerDuty, the company addresses such concerns by deploying its services out of three data centers and working with three telephone providers, nine SMS providers and three e-mail providers. A “very sophisticated” routing system will bypass any one of these providers if things get wonky, Solomon said.
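The article does not say how that routing system works. One simple way to picture bypassing a failed provider is ordered failover, sketched below. Everything here is an assumption for illustration: `ProviderError`, `send_with_failover` and the provider names are invented and do not describe PagerDuty's actual routing logic.

```python
class ProviderError(Exception):
    """Raised by a (hypothetical) provider that cannot deliver a message."""


def send_with_failover(message, providers):
    """Try each (name, send) pair in order; return the name that succeeded.

    Raises ProviderError if every provider fails.
    """
    errors = []
    for name, send in providers:
        try:
            send(message)
            return name
        except ProviderError as exc:
            # Record the failure and fall through to the next provider.
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


def broken_sms(message):
    """Stand-in for a provider that is currently down."""
    raise ProviderError("gateway timeout")


delivered = []
used = send_with_failover(
    "site down",
    [("sms-provider-a", broken_sms), ("sms-provider-b", delivered.append)],
)
```

In this run the first provider fails and the message goes out through the second. A production router would presumably also weigh provider health and latency rather than simply trying a fixed order.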
“We think that the only good way to do this is in the cloud,” Solomon said.