ENERGY STAR for Commercial Buildings is Unreliable


Many people are familiar with the US Environmental Protection Agency (EPA)’s ENERGY STAR program as it pertains to consumer products. Its logo appears most often on home appliances, where the ENERGY STAR label signifies that the appliance is supposedly more efficient than non-certified models. There are questions about the reliability of the consumer ENERGY STAR program, but that topic will not be explored here.

In addition to energy efficient products, the ENERGY STAR program offers advice to homeowners for ways they can improve the energy efficiency of their homes, from wall and window sealing to insulating to lighting, and more.  Their Home Advisor lets you create a profile of your home and get tailored recommendations for prioritized improvements.

Less well known, but growing in importance, is the ENERGY STAR certified commercial buildings program. The program website describes it as follows:

ENERGY STAR certified buildings and plants meet strict energy performance standards set by EPA. They use less energy, are less expensive to operate, and cause fewer greenhouse gas emissions than their peers. Starting with the first ENERGY STAR certified building in 1999, tens of thousands of buildings and plants across America have already earned EPA’s ENERGY STAR for superior energy performance.

Currently, 21 types of facilities can earn the ENERGY STAR. Commercial buildings start by entering their utility bill data and building information into Portfolio Manager, EPA’s free online tool for measuring and tracking energy use, water use, and greenhouse gas emissions. Industrial plants start by entering key plant operating data into another set of free tools, called Energy Performance Indicators.

Specifically, to be eligible for ENERGY STAR certification, a building must earn an ENERGY STAR score of 75 or higher, indicating that it performs better than at least 75 percent of similar buildings nationwide.
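To make the 75-percent threshold concrete, here is a minimal Python sketch of the percentile idea. It is a simplification and not the EPA's actual scoring method: the function names, the made-up peer figures, and the choice of energy use per square foot as the yardstick are all assumptions for illustration.

```python
def percent_of_peers_outperformed(building_use, peer_uses):
    """Return the percentage of peers that use MORE energy per square foot.

    Lower energy use is treated as better. Both arguments are hypothetical
    annual energy-use figures per square foot, not real EPA data.
    """
    if not peer_uses:
        raise ValueError("need at least one peer building")
    worse_peers = sum(1 for use in peer_uses if use > building_use)
    return 100.0 * worse_peers / len(peer_uses)


def eligible_for_certification(score, threshold=75):
    """Apply the 'score of 75 or higher' rule quoted above."""
    return score >= threshold


# Invented peer group of eight similar buildings.
peers = [50, 60, 70, 80, 90, 100, 110, 120]
score = percent_of_peers_outperformed(55, peers)
print(score, eligible_for_certification(score))
```

The hard part is not this arithmetic but deciding which buildings count as peers in the first place.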

Needless to say, scoring whole buildings this way is not so simple. EPA scores different building uses separately (schools, malls, and hotels each get their own model, for example), but what do they do about similar buildings in different climate zones? How about retail stores open 24 hours as opposed to stores that close? There are many variables in buildings beyond use group that affect energy usage. Recent attempts to validate the EPA’s flat-out claim that ENERGY STAR buildings use less energy have shown that the picture is unclear at best and that the claim is very possibly simply wrong.


The EPA created the ENERGY STAR system for commercial buildings as a benchmarking tool to track and collect data on how buildings in the US were improving in efficiency. They provided the Portfolio Manager tool that building owners can use to input data on their buildings. The information collected includes location, use group, year constructed, technologies used, and how much actual energy is either purchased or generated on site. The EPA used this data to begin scoring buildings on its 1 to 100 scale in the late 1990s. Since then, many municipalities and green building certification organizations have begun mandating that buildings be scored using EPA’s Portfolio Manager, and ENERGY STAR certification is among the criteria for receiving green building labels. That makes the reliability of the EPA’s scoring system critically important.

Current Research

In his paper “ENERGY STAR Building Benchmarking Scores: Good Idea, Bad Science,” published in the 2014 Summer Proceedings of the American Council for an Energy-Efficient Economy, John Scofield, Ph.D., scrutinized the methodology used by the EPA to assign ENERGY STAR scores and found that it contains “serious flaws that lead to erroneous results.” The flaws in one model are so severe that, as Scofield demonstrated, random numbers produced a model just as convincing as the EPA’s model.

The main problem is that there are simply too few actual buildings in the comparison database for each building model (use group) and too many independent variables (location, occupancy, size, equipment, etc.). In some instances a few hundred actual buildings are being asked to represent hundreds of thousands. The database is not actually populated by information entered in the Portfolio Manager; instead, the data are collected in a periodic survey by the Energy Information Administration. When you look for buildings that closely resemble the target in terms of location and other variables, the number may be only in the teens. To combat this, the EPA developed statistical models that predict energy use. These models are meant to represent a larger cohort of buildings, and new projects are compared against the model rather than against actual buildings.

The statistical tools used by the EPA and Dr. Scofield are fairly advanced for a lay person. A detailed explanation of the EPA’s methodology and statistics is included in Scofield’s paper; a more accessible description can be found in this video presentation.

Statistical models are imperfect, and sometimes the independent variables add more noise than predictive power. To test how well a statistical model works, statisticians use the R² measure, also called ‘goodness of fit’, to determine how well the data fit the model. An R² of 1 indicates that the regression line perfectly fits the data, while an R² of 0 indicates that the line explains none of the variation in the data. A low value could result because you are trying to fit a line to a curve, or because the variables you are using do not truly predict the expected value (they are random or barely correlated with the dependent variable). So the lower the R², the higher the uncertainty of the predicted results. Another reason this number can be small is that you do not have enough observations to give your model credibility.
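For readers who prefer to see the arithmetic, here is a minimal Python sketch of how R-squared is computed for a straight-line fit with one independent variable. The function names and the sample numbers are invented for illustration; the EPA's models use several variables at once, so this shows only the idea, not their calculation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys):
    """Share of the variation in ys that the fitted line accounts for."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    # Variation left over after the line has done its best.
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation around the plain average.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_residual / ss_total


# Invented data: floor area (thousand sq ft) vs. annual energy use.
floor_area = [10, 20, 30, 40, 50]
energy_use = [120, 190, 330, 380, 520]
print(round(r_squared(floor_area, energy_use), 3))
```

A result near 1 means the line tracks the data closely; a result near 0 means you would do about as well by guessing the average every time.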

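Scofield’s paper finds R² to be as low as 0.33 for some of the models, meaning those models account for only about a third of the variation in energy use. The resulting ENERGY STAR scores are therefore extremely uncertain and offer minimal improvement over random guesses.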
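To get a feel for why a small sample with many variables is a problem, the following Python sketch fits a regression to pure noise and reports the average R-squared. This is not Scofield's analysis and does not use any EPA or survey data: the function names, the normally distributed random numbers, and the sample sizes are all my own assumptions, chosen only to show that meaningless predictors can still produce a respectable-looking fit when observations are scarce.

```python
import random


def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def r_squared_of_fit(rows, y):
    """Least-squares fit of y on the predictors in rows (plus intercept)."""
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    predicted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_residual / ss_total


def mean_r_squared_from_noise(n_buildings, n_predictors, trials, seed=0):
    """Average R-squared when predictors and outcome are unrelated noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        rows = [[rng.gauss(0, 1) for _ in range(n_predictors)]
                for _ in range(n_buildings)]
        y = [rng.gauss(0, 1) for _ in range(n_buildings)]
        total += r_squared_of_fit(rows, y)
    return total / trials


print(mean_r_squared_from_noise(15, 5, 2000))   # few buildings
print(mean_r_squared_from_noise(200, 5, 200))   # many buildings
```

Statistical theory puts the expected R-squared for k unrelated predictors and n observations at k/(n-1). For 15 pretend buildings and 5 meaningless variables that is 5/14, or roughly 0.36, even though the variables carry no information at all; with 200 buildings it falls to about 0.03.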


Making recommendations for improving the statistical tools used to score buildings is beyond my math knowledge.  I had to survey two friends (one a math teacher and the other an actuary) to get to the point where I barely understood the statistics tools used in the scoring method.  As for recommendations, Scofield says, “I would encourage people to continue scoring their buildings with the EPA’s Portfolio Manager simply as a useful way to track the variation of their own usage over time.  Usually these variations are tied to variations in energy use not variations in building operating parameters — and these are not subject to EPA errors.  As a tool for comparing your building to others in the stock it is less useful — and that is where it is subject to the errors and uncertainties in the EPA model predictions.  I am not optimistic that these will get better anytime soon.”

Until the models and tools are vastly improved, “ENERGY STAR Certified” should be considered a marketing gimmick and not a true measure of efficiency.