Information Lives Here

Keeping up with the demand for data storage

By Andrea Poet

Rows of servers inside a data center

The amount of data stored online doubles roughly every 18 months—and users’ expectations for how quickly they can access it are growing about as fast. Online retailers depend on lightning-fast search results to deliver sales. Users expect instant, relevant returns on their Google and YouTube searches. New technologies such as self-driving cars have little tolerance for latency in sending and receiving information.

Addressing all these issues—and having to constantly keep up—are data centers.

Last year, companies and investors spent about $150 billion on data-center construction. Some centers are constructed where the cost of energy is low, while others, such as edge-computing centers, are being built where the demand is. These facilities fill different niches as companies move their software and information from their own hardware to the cloud or server farms.

Across the industry, from hyperscale data centers to colocation groups, secrecy about physical locations and operating conditions is the norm, largely due to the sheer cost of the equipment. According to Robert Sty, BS ’00, data centers cost $6 to $8 million per megawatt of power to build. New hyperscale centers can be as large as 300 megawatts, putting their price tag at more than $1 billion.

Sty, a registered professional mechanical engineer in Arizona, is the global director of technology at the architecture and design firm HDR. His work encompasses data center, aerospace, and advanced tech facilities. He has worked on data center design and construction for about 12 years.

In building a new data center, site selection is important, Sty said, citing key tenets of reliability, redundancy, and uptime. Sustainability also counts. A metric developed by an industry consortium called The Green Grid measures “power usage effectiveness”: a ratio that compares a data center’s total power use with the power used by the IT cabinets themselves. The closer to 1.0, the more efficient. Fifteen years ago, data centers rated in the 2.0 range; today, the industry standard is 1.2 or lower. Some data centers, aided by free environmental cooling from air or sea water, have come in just under 1.1. The temperature range in which a center can operate has also expanded, so hot regions with low electricity costs, such as Phoenix, have seen data-center growth.
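The PUE ratio is simple arithmetic: total facility power divided by the power drawn by the IT equipment alone. A minimal sketch (the kilowatt figures below are made up for illustration):

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power.

    Overhead such as cooling and power distribution drives total facility
    power above IT power, so PUE is always >= 1.0; closer to 1.0 is better.
    """
    return total_facility_kw / it_equipment_kw

# A facility drawing 1,200 kW overall to run 1,000 kW of IT cabinets
# hits today's industry standard of 1.2:
print(round(pue(1200, 1000), 2))  # 1.2
```

A 2.0-era center, by contrast, spent as much power on overhead as on computing itself.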

Volume and energy efficiency are only part of the data-center equation. Speed is another: it is what satisfies customers, both companies and their end users. Balajee Vamanan, PhD, an assistant professor in UIC’s computer science department, is trying to accelerate search results by resolving bottlenecks and prioritizing data flow.

Vamanan described his work: “I’ve been working on applications that search vast amounts of Internet data and provide results on demand to often-impatient users. There are things you can do in the network, in the operating system, to get the data back fast to the user.”

Vamanan explained that when you enter a search term in Google—say, “UIC engineering programs”—the query does not go to just one server. It goes to thousands of servers that work cooperatively and in parallel to return answers. One of Vamanan’s early projects was to ensure that search processing prioritized top results. In our example, webpages originating from UIC or from the Chicago area would take a front seat in processing compared with, say, a website in another country that hosts an article that happens to mention UIC.
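The fan-out Vamanan describes can be sketched in miniature. In the toy example below, a query is sent to several index “shards” in parallel and the partial results are merged by relevance score; the shard contents, scores, and function names are illustrative, not drawn from any real search system:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index shards; a real deployment spans thousands of servers.
SHARDS = [
    [("uic.edu/engineering", 0.95), ("chicago.gov/tech", 0.60)],
    [("example.org/article-mentioning-uic", 0.20)],
    [("uic.edu/admissions", 0.90)],
]

def search_shard(shard, query):
    # Each server scores only its own slice of the index.
    return [(url, score) for url, score in shard]

def fanout_search(query, top_k=3):
    # Query every shard in parallel rather than one server at a time.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda s: search_shard(s, query), SHARDS)
    # Merge the partial answers and keep the highest-scoring pages first,
    # so likely page-one hits are prioritized over marginal matches.
    merged = [hit for part in partials for hit in part]
    return sorted(merged, key=lambda h: h[1], reverse=True)[:top_k]

print(fanout_search("UIC engineering programs"))
```

In this sketch the UIC-hosted pages outrank the foreign article that merely mentions UIC, mirroring the prioritization Vamanan describes.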

Many of us may not equate our search results with energy consumption, but there is a connection. Data centers that run searches are among the largest because of the amount of data they must process, making them energy gluttons. If a particular server holds a result that will show up on page 10 of your search results, however, it doesn’t have to run as fast as the servers that hold your page-one results—which keeps it cooler and saves power.


Vamanan is also working with programmable networking devices, which allow network operators to insert code into a router. A router typically acts as a traffic cop between your computer and your network, or between networks, controlling the flow of information. Programmable devices can operate a bit more like a computer: a small inserted program can make queries go faster, identify how critical each query is, prioritize information intelligently, and signal other resources when traffic is slow so that other applications can use the network. These devices reduce the overall load on a data center.
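The prioritization idea can be illustrated with a toy queue. This is a simplified Python model, not an actual in-router program (those typically run in specialized dataplane languages on the device itself); the class and packet names are invented for the example:

```python
import heapq

class PriorityRouter:
    """Toy model of a programmable device that forwards critical traffic first."""

    def __init__(self):
        self._queue = []
        self._seq = 0  # tie-breaker: preserves arrival order within a priority

    def enqueue(self, packet, priority):
        # Lower number = more critical; heapq always pops the smallest tuple.
        heapq.heappush(self._queue, (priority, self._seq, packet))
        self._seq += 1

    def forward(self):
        # Forward the most critical packet waiting in the queue.
        return heapq.heappop(self._queue)[2]

router = PriorityRouter()
router.enqueue("background replication", priority=5)
router.enqueue("page-one search query", priority=0)
print(router.forward())  # prints "page-one search query" -- it jumps the queue
```

A fixed-function router would forward strictly in arrival order; the programmable version lets the operator decide what “critical” means.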

Like Sty, Vamanan says it’s hard to stay ahead of the data explosion and its associated needs for storage and quick retrieval.

“Say you want to search for your image in media,” Vamanan said by way of example. “Every time you search, thousands of machines have to process hundreds of thousands of videos, then send the frames over and rank them. This processing is computationally rich and puts a lot of stress on the infrastructure underneath.”

“You’re searching for the needle in the haystack. And the stack is growing exponentially.”