The Cost of Data with Vaidehi Joshi

Vaidehi Joshi - Jul 23 '20 - Dev Community

To find all of the resources listed in this talk, as well as the full transcript of this talk, check out costofdata.dev.

Vaidehi is a senior engineer at DEV, where she builds community and helps improve the software careers of millions. She enjoys building and breaking code, but loves creating empathetic engineering teams a whole lot more. She is the creator of basecs and baseds, two writing series exploring the fundamentals of computer science and distributed systems. She also co-hosts the Base.cs Podcast, and is a producer of the BaseCS and Byte Sized video series.

A super brief outline of this talk is as follows:

  1. [Introduction]: The what + why of data centers
  2. [Middle]: Investigating the environmental impact of cloud providers
  3. [Middle]: Exploring how the impact of running data centers will scale over time
  4. [Middle]: Highlighting some advancements made in this sector
  5. [Conclusion]: Providing actionable items for developers

Here is a much more detailed outline, which goes into greater depth:

[Introduction]: The what + why of data centers
Everything that we do on the web has to be stored somewhere. Many of us work with databases every day, and some of us even do impromptu ops work, focusing on making sure our servers stay alive in an effort to keep our apps up and running. But even though we all know this theoretically, most of us never have to think beyond our own databases or servers.

This talk takes a macroscopic view of what all of this data actually looks like in reality. All of our app's data lives in a data center, somewhere in the world. Whether we build our own, or use a cloud provider, we're relying on that infrastructure to maintain our app's uptime and store our users' content. These physical buildings are often out-of-sight—but that doesn't mean that they should be out of mind!

[Middle]: Investigating their environmental impact
Data centers require large footprints: they are physically huge buildings, but they also require a large amount of energy to provide constant, uninterrupted service. Current research estimates that data centers worldwide use ~200 terawatt hours (TWh) a year and demand somewhere between 1% and 3% of the world’s electricity (I’ve found conflicting reports on the exact number). In terms of energy consumption, this puts data centers in the same bucket as the aviation industry!
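To get a feel for why the reported shares conflict, here is a small back-of-the-envelope sketch. It uses only the two figures quoted above (roughly 200 TWh a year, and a 1-3% share) and works out what worldwide electricity total each share would imply. The function name is hypothetical, and the results are rough illustrations, not measurements.

```python
def implied_global_total_twh(data_center_twh: float, share: float) -> float:
    """Return the worldwide electricity total (in TWh) implied by a share.

    `share` is a fraction, e.g. 0.01 for 1%.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be a fraction between 0 and 1")
    return data_center_twh / share


# Figure quoted in the talk: about 200 TWh a year for data centers worldwide.
DATA_CENTER_TWH = 200.0

# The talk quotes a 1-3% range; each end implies a very different total.
for share in (0.01, 0.02, 0.03):
    total = implied_global_total_twh(DATA_CENTER_TWH, share)
    print(f"{share:.0%} share -> about {total:,.0f} TWh worldwide")
```

The spread between the two ends of the range is a factor of three, which suggests the sources are probably working from different totals or different definitions of what counts as a data center.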

Data centers require so much energy because servers create heat, so they need to be cooled down; unfortunately, many data centers are built in warm climates, which makes them pretty inefficient. Another harsh reality is where these data centers get their energy from. While some cloud providers have shifted to using 100% renewable energy, not all of them have yet. Amazon Web Services (AWS), for example, has committed to a long-term goal of using 100% renewable energy, but one of its most popular regions, US East, is still fueled by coal and natural gas. To make matters more complicated, cloud providers don’t exactly make it easy for consumers to know whether their data is being stored in a green facility, and many of them are simply not transparent and do not release information on their data centers.

[Middle]: Exploring how the impact of running data centers will scale over time
When we stop to think about how this will scale over time, this problem can seem overwhelming and daunting. Researchers estimate that 28 billion devices will be connected to the internet in 2020, and the amount of data we create is ever-growing.

As our climate changes, there are other threats to the infrastructure needed to power all of those internet-connected devices. Researchers at the University of Oregon have predicted that, by 2030, approximately 235 data centers will be affected by a 1-foot rise in sea levels.

The data center problem is multifaceted: data centers take energy to power and to cool, and their numbers are growing. They are impacting the physical environment, and they are themselves at high risk of being impacted by climate change.

[Middle]: Highlighting some advancements made in this sector
But, there’s hope yet. Data centers also happen to be the home of some of the most interesting technological advancements in our industry! More and more data centers are being moved to cooler climates, and scientists are coming up with better, more effective cooling solutions. In Stockholm, researchers have found a way to recover data center heat waste and are now reusing that same energy to heat homes in the city!

[Conclusion]: Providing actionable items for developers
So, knowing all of this, what can we do, as developers? Here are some actionable items that each of us can do today:
• Figure out where your data lives!
• Figure out if it’s green! (thegreenwebfoundation.org is a great resource for this.)
• If your data doesn’t live in a green zone, consider migrating your data to a different location or provider. (Admittedly, I know that this is not easy!)
• When provisioning a new server/database, don’t provision it in a zone that is not green.
• Build things that make this knowledge easily accessible! (I love the cloud sustainability console Chrome extension, built by Paul Johnston, which highlights AWS zones that are green in the console.) He also imagined a CLI tool that would show the energy usage, share of renewable energy, and carbon released for every new instance on a cloud vendor! This doesn’t exist yet, but tools like this would be great to build for the community.
• If you work at a small company, draw attention to this issue internally! At my previous company, I brought this up and we started the discussion of what it would take to migrate away from the AWS US East zone.
• If you are part of a large company, especially one that has a large enterprise account, pressure your cloud provider to be transparent about where their energy comes from for powering their data centers.
• If you work for a cloud provider, push them to use clean energy in their data centers (lots of employees have already done great work on this!)
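For the "figure out if it’s green" item above, here is a minimal sketch of what an automated check could look like. It assumes the Green Web Foundation exposes a public greencheck endpoint at the URL below, and that the JSON response carries `url`, `green`, and `hosted_by` fields; treat the endpoint path and field names as assumptions to verify against their current API documentation. The helper names (`hostname_of`, `summarize`, `check`) are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public greencheck endpoint of the Green Web Foundation.
# Verify the path against their API documentation before relying on it.
GREENCHECK_URL = "https://api.thegreenwebfoundation.org/api/v3/greencheck/"


def hostname_of(url_or_host: str) -> str:
    """Reduce 'https://dev.to/some/path' or 'dev.to' to a bare hostname."""
    candidate = url_or_host if "//" in url_or_host else "//" + url_or_host
    host = urllib.parse.urlparse(candidate).hostname
    if not host:
        raise ValueError(f"could not find a hostname in {url_or_host!r}")
    return host


def summarize(payload: dict) -> str:
    """Turn a greencheck response (assumed shape) into a one-line summary."""
    host = payload.get("url", "unknown host")
    if payload.get("green"):
        provider = payload.get("hosted_by") or "an unnamed provider"
        return f"{host}: green (hosted by {provider})"
    return f"{host}: no evidence of green hosting"


def check(url_or_host: str, timeout: float = 10.0) -> str:
    """Look up a site over the network and summarize the result."""
    host = hostname_of(url_or_host)
    request_url = GREENCHECK_URL + urllib.parse.quote(host)
    with urllib.request.urlopen(request_url, timeout=timeout) as response:
        return summarize(json.load(response))
```

Calling `check("costofdata.dev")` would perform a live lookup. Note that a negative result only means no green hosting evidence was found for that hostname; it does not tell you which energy sources a provider actually uses.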

Here is a download link to the talk slides (PDF)


This talk will be presented as part of CodeLand:Distributed on July 23. After the talk is streamed as part of the conference, it will be added to this post as a recorded video.
