Jeff Dagle earns a living improving the reliability of the electrical transmission system in the United States. On August 14, 2003, the chief electrical engineer for Electricity Infrastructure Transmission System Resilience at Pacific Northwest National Laboratory was sitting in his office when the phone rang.

It was his mother, who lives in Washington state and likes to call him when she gets wind of a blackout anywhere in the U.S. “She starts naming cities: New York, Detroit, Cincinnati,” recalled Dagle, “and that’s when I knew it was big.”

A few minutes later, the U.S. Department of Energy called. The DOE wanted to know what Dagle knew about the blackout, which would eventually leave about 50 million people in the dark on a sweltering summer’s day. He didn’t have much to offer besides what his mother had just repeated from CNN.

“We always know these things can happen,” he said, “but the magnitude surprised even me.”

A decade after the largest blackout in U.S. history, there are hopeful signs of improvement as well as lingering issues of concern.

The Northeast Blackout Investigation Task Force, of which Dagle was a member, identified four root causes of the cascading outage. Greentech Media recently spoke with Dagle about those causes and about how, ten years later, a slightly smarter grid and more stringent oversight could help limit the extent of the next blackout.

 

Cause #1: Inadequate understanding of the system

Dagle: There had been a study by the Cleveland Electric Illuminating Company that concluded that, under certain conditions, if you loaded lines at certain levels, power would loop around Lake Erie and take out the Northeast. FirstEnergy had acquired the utility years before, and after the blackout the response was, “What report?” It’s up to companies to instill that culture. Other utilities were shocked at how lax things were.

What’s been improved?

Dagle: Now [with mandatory North American Electric Reliability Corp. reliability standards in place], you’re not relying on individual companies to do that diligence on their own. The focus on reliability isn’t optional.

What still needs to be done?

Dagle: I like to call it a loss of institutional memory. The utility had [learned] those lessons, but it had forgotten them. That loss of institutional memory can still happen.

 

Cause #2: Loss of situational awareness

Dagle: The operating philosophy is to have redundancy built in. The first key line tripped at 3:05 p.m., then a second, and 15 minutes later a third. But the operators didn’t know they needed to take action because the alarm process failed. It was a technological glitch.

What’s been improved?

Dagle: At that time, there had already been a lot of lessons from the blackout of 1996, and people were already working on synchrophasors [phasor measurement units that take real-time measurements off of transmission lines about 30 times a second and timestamp them using GPS]. If we had a blackout again, we would have much better, high-quality data to do analysis of it. Reliability is coming along. There are a lot of advanced applications that are using this data, but it’s still a work in progress. I think the industry is really at the tip of the iceberg in using this data.
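To give a rough sense of what that synchrophasor data looks like, here is a minimal Python sketch. The field layout, bus names and alarm threshold are all hypothetical, chosen for illustration; real PMUs stream data in the IEEE C37.118 format rather than through anything like this. The key idea it shows is the one Dagle describes: GPS-synchronized timestamps let measurements from distant substations be compared directly, and a widening phase-angle separation between two buses is one telltale sign of a heavily stressed path.

```python
from dataclasses import dataclass

@dataclass
class PhasorSample:
    """One synchrophasor measurement (simplified, hypothetical layout)."""
    gps_time: float        # seconds, from a GPS-disciplined clock
    bus_id: str            # which substation/bus the PMU monitors
    voltage_mag: float     # voltage magnitude, per-unit
    voltage_angle: float   # voltage phase angle, degrees
    frequency: float       # local frequency, Hz (nominal 60 Hz)

SAMPLES_PER_SECOND = 30    # the typical reporting rate Dagle mentions

def angle_separation(a: PhasorSample, b: PhasorSample) -> float:
    """Phase-angle difference between two buses at the same GPS instant.

    Because both samples carry GPS timestamps, they can be aligned even
    though they come from substations hundreds of miles apart.
    """
    assert abs(a.gps_time - b.gps_time) < 1.0 / SAMPLES_PER_SECOND
    # Wrap the difference into [-180, 180] degrees before comparing.
    diff = (a.voltage_angle - b.voltage_angle + 180.0) % 360.0 - 180.0
    return abs(diff)

# Illustrative use: two simultaneous samples from distant buses.
cleveland = PhasorSample(0.0, "cleveland", 1.00, -35.0, 59.98)
detroit   = PhasorSample(0.0, "detroit",   0.98, -72.0, 59.95)

# 37 degrees of separation; the 30-degree alarm level is invented here.
if angle_separation(cleveland, detroit) > 30.0:
    print("Alert: large phase-angle separation -- path heavily loaded")
```

The advanced applications Dagle refers to build on exactly this kind of comparison, run continuously across many buses rather than a single pair.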

What still needs to be done?

Dagle noted that the grid operators at FirstEnergy missed opportunities. A tree-trimming crew saw the Hanna-Juniper line go down and called the control center, but because of the computer glitch, the operators could not confirm that the line was down. The Midwest ISO was also having computer issues, which prevented it from seeing the outage earlier. By the time MISO saw the outage, it called FirstEnergy only to find that FirstEnergy still had no idea what was going on.

“Taking the mental leap to emergency load operations, to take command of the grid right now, that is a cultural thing,” said Dagle. “The operators, they need to feel empowered to take those decisions. I honestly don’t know if we’re much different than we were 10 years ago.”

 

Cause #3: Inadequate vegetation management

Dagle: The first three lines were all within their ratings and should have been able to carry that power.

What’s been improved?

Dagle: Ten years ago, it was up to the individual utility, so vegetation management was an easy target to defer if utilities were short on quarterly goals. In this case, the utility got a little sloppy. Now there are national standards.

What still needs to be done?

Dagle: Not much. The financial penalties for inadequate vegetation management are a sufficient deterrent.

 

Cause #4: Inadequate oversight of reliability coordinators

Dagle: This was one of the lessons after 1996: they needed the function of a reliability coordinator. The problem was that AEP Ohio had PJM as its reliability coordinator, but FirstEnergy had MISO. There was some confusion, and MISO had some computer issues.

What’s been improved?

Dagle: It does come down to information sharing, not only on a real-time basis, but also in planning. It’s slower than I would have thought.

What still needs to be done?

Dagle: At the fundamental level, it boils down to cost-benefit. With deregulation, transmission people were focusing on markets, not reliability. TLR [transmission loading relief] is basically a market mechanism to unload a link; that works fine, but in an emergency, you need to take swifter, preemptive action. We still need to make better use of IT to more effectively use what we've got. It’s not rebuilding; it’s better managing the infrastructure.

***

For all the advances in IT and data coming off the transmission grid, the San Diego blackout in 2011 shows how much still needs to be done. In that instance, there were multiple organizations running that section of the grid, something that was also a problem in 2003.

Unlike recent storms, which have caused extended outages by damaging distribution circuits, the cascading outages of 1996, 2003 and 2011 show that proper training in the control room is vital to containing an outage; indeed, it's as valuable as the best real-time data.

But data will help. The U.S. Department of Energy has a synchrophasor program to coordinate data across the Western Interconnection. Dagle noted that the time-stamped data coming off of phasor measurement units is essential for understanding what happened during large blackouts. The problem is that the data still isn’t being shared across organizations in a way that would allow them to prevent an outage from spreading, as was the case in San Diego.

Dagle also noted that increased distributed generation could become an asset for communities during outages, but because of IEEE 1547, which for safety reasons requires inverters to go offline when the grid goes down, that’s currently not possible. He said the next step is not only to pull together large-scale data from sources like synchrophasors, but also to find ways to make the grid more flexible during outages -- something that will likely take a lot more technological innovation and regulatory change.
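To make that constraint concrete, here is a minimal Python sketch of the disconnect behavior Dagle describes. The voltage and frequency bounds below are illustrative placeholders, not the actual IEEE 1547 tables, which specify precise trip ranges and clearing times; the point is simply that when the grid goes abnormal, the inverter must stop energizing rather than keep a neighborhood lit.

```python
# Sketch of 1547-style anti-islanding logic. Threshold values are
# hypothetical; the real standard defines exact ranges and timing.

NOMINAL_VOLTS = 240.0
NOMINAL_HZ = 60.0

def grid_is_healthy(volts: float, hz: float) -> bool:
    """Crude normal-operating window (illustrative bounds only)."""
    return (0.88 * NOMINAL_VOLTS <= volts <= 1.10 * NOMINAL_VOLTS
            and 59.3 <= hz <= 60.5)

def inverter_output(volts: float, hz: float, solar_watts: float) -> float:
    """Under anti-islanding rules, the inverter ceases to energize an
    abnormal grid -- so during a blackout, local solar is unavailable
    even if the panels are producing."""
    if not grid_is_healthy(volts, hz):
        return 0.0   # disconnect: no islanded operation allowed
    return solar_watts

print(inverter_output(240.0, 60.0, 5000.0))  # healthy grid -> 5000.0 W
print(inverter_output(0.0, 0.0, 5000.0))     # blackout     -> 0.0 W
```

The flexibility Dagle is pointing toward would mean relaxing exactly this all-or-nothing behavior, letting distributed generation safely carry local load when the bulk grid is down.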

It will likely take another major blackout followed by increased regulation, rather than voluntary adoption of real-time data analytics, for the grid to take another leap forward in reliability.

“The trend is in the right direction,” Dagle said, “but it’s surprising how long these things can take.”