Our methodology

Our objective was to predict the competitiveness of Virginia’s state legislative races by district, ranking them by likely margin. We sought to build a regression model on previous electoral history, capturing the historical relationship between state legislative, statewide (e.g., gubernatorial), and national (e.g., Senate) elections in order to project likely 2017 margins. A few notes:

  • We’ve used regression-based modeling. In recent years, political analysts and data scientists have relied on polling for forecasting, particularly for executive and federal races. While polling is the best tool available for national elections, which are infrequent and cover larger districts, regression analyses are more appropriate for state legislative races, where polling data is sparse and inconsistent.

  • We have trained the model on historical data, testing its ability to predict previous elections. At each step, we check the historical predictiveness of our model, applying it to previous electoral years and adjusting as needed. As we gather more data, this will become a more automated process, resulting in fewer human assumptions applied to the model; for now, we rely mostly on electoral fundamentals commonly used by political scientists and analysts we follow. In upcoming versions of our model, we will quarantine a subset of historical elections, separating the districts we use to build the model from those we use to test it (a minimal sketch of this kind of holdout split follows this list).

  • This is our 1.0 version. See below for the next steps we plan to take to refine our model further.
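
To make the quarantine idea concrete, here is a minimal sketch in Python. The data, the 80/20 split, and the field names are all illustrative assumptions, not our production pipeline:

```python
import random

random.seed(42)

# Hypothetical (district, year) results; in practice these come from our
# compiled electoral dataset, and the margins here are randomly generated.
historical_races = [
    {"district": d, "year": y, "dem_margin": random.uniform(-30, 30)}
    for d in range(1, 101)
    for y in (2011, 2013, 2015)
]

# Quarantine 20% of districts: the model never sees them while being built,
# so they provide an honest out-of-sample test of its predictions.
districts = list(range(1, 101))
random.shuffle(districts)
cutoff = int(len(districts) * 0.8)
build_districts = set(districts[:cutoff])
test_districts = set(districts[cutoff:])

build_set = [r for r in historical_races if r["district"] in build_districts]
test_set = [r for r in historical_races if r["district"] in test_districts]
print(len(build_set), len(test_set))  # 240 build rows, 60 held-out test rows
```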

To build this model, we took the following steps:

Step 1: Understanding the landscape. We started by compiling electoral data by state legislative district. We collected 30 years (1987-2016) of state legislative and fundraising data, as well as six years (2011-16) of presidential, gubernatorial, and Senate races. The aim was to understand how a given district votes year to year, as well as how it votes up and down the ballot.
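
To give a sense of the structure, here is a minimal sketch (in Python with pandas) of the kind of district-by-race table this step produces. The schema, district number, and margins are hypothetical:

```python
import pandas as pd

# Hypothetical long-format schema for the compiled results; the real
# columns, sources, and margins differ. Positive margins are Democratic.
results = pd.DataFrame([
    {"district": 13, "year": 2016, "office": "president", "dem_margin": -9.5},
    {"district": 13, "year": 2014, "office": "senate", "dem_margin": -2.5},
    {"district": 13, "year": 2013, "office": "governor", "dem_margin": -4.0},
    {"district": 13, "year": 2015, "office": "house_of_delegates", "dem_margin": None},  # uncontested
])

# Pivot to one row per district and one column per (office, year), making it
# easy to compare a district's votes year to year and up and down the ballot.
wide = results.pivot_table(index="district", columns=["office", "year"],
                           values="dem_margin", dropna=False)
print(wide)
```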

One of the first challenges we encountered when analyzing the state legislative electoral data was the shocking number of elections that went uncontested—over 60% of races in Virginia’s most recent House of Delegates election. Our initial hypothesis depended on the notion that local elections offer a much richer data set: whereas presidential elections occur only once every four years, the Virginia House of Delegates holds 100 elections every two years, meaning we could eschew the typical forecasting reliance on polling and focus exclusively on empirical outcomes. The number of uncontested races, however, threw a curveball at our thesis. How do you evaluate a race that might be competitive on paper, if only a Democrat would challenge the entrenched Republican incumbent?

Step 2: Developing a proxy to predict often-uncontested races. We needed to project the odds a “generic” candidate would face in any given race, even one that went uncontested—in other words, a stand-in for any specific candidate. We developed this proxy using national and statewide electoral data from the same districts (i.e., how each state legislative district voted for president in 2016, for senator in 2014, for governor in 2013, and so on).
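
Here is a minimal sketch of the proxy idea: averaging a district’s recent top-of-the-ticket margins to stand in for a generic candidate. The margins are hypothetical, and the simple average ignores the recency weighting described in Step 3:

```python
# Hypothetical top-of-ticket Democratic margins for one House of Delegates
# district (percentage points; positive = Democratic).
top_of_ticket = {
    ("president", 2016): 4.2,
    ("senate", 2014): -1.8,
    ("governor", 2013): 0.6,
}

# A generic-candidate proxy: average the district's top-of-ticket margins.
# (Our actual model weights these by recency; see Step 3.)
proxy_margin = sum(top_of_ticket.values()) / len(top_of_ticket)
print(f"Generic Democratic margin: {proxy_margin:+.1f} points")
```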

We found extremely high correlations among district-level electoral margins for these top-of-the-ticket offices from year to year, indicating a high degree of partisanship and voting consistency. Particularly given the idiosyncrasies of the last few presidential candidates, we were surprised to see districts voting so consistently along party lines.
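
This is the kind of check we ran, sketched below with made-up district margins; in practice the correlation is computed across all 100 districts:

```python
import pandas as pd

# Hypothetical district-level Democratic margins for two top-of-ticket
# races; real values come from our compiled dataset.
df = pd.DataFrame({
    "pres_2016":     [12.1, -8.3, 3.5, 22.0, -15.2, 0.8],
    "governor_2013": [10.4, -9.1, 2.2, 19.7, -13.8, 1.5],
})

# Pearson correlation across districts; values near 1 indicate that
# districts vote very consistently along party lines from race to race.
print(df["pres_2016"].corr(df["governor_2013"]))
```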

Though national and statewide electoral margins were positively correlated with each other, the generic proxy method had a number of flaws in predicting state legislative outcomes. First, as mentioned earlier, many state races went uncontested, essentially handing the election to the single major-party candidate. Second, there was a built-in incumbency advantage. Third, a generic proxy does not measure the skill of a candidate or campaign; particularly skilled politicians can win over swing voters, and candidates mired in scandal may lose them. Finally, state legislative races tend to see lower turnout than statewide and national races, particularly in off-cycle years.

Step 3: Applying adjustments to address variations in voter turnout and to emphasize recency and momentum. The rest of our model is designed to address these flaws. First, we apply a significant adjustment to reflect the difference between forecasting generic candidates and forecasting actual ones: put simply, there is a noticeable drop in overall turnout, as well as a decline in Democratic support, in off-cycle and down-ballot elections. We ultimately expect some of this effect to be curtailed this year, with complete Republican control of government at the national level; nevertheless, we took a cautious and conservative approach and assumed this year would not diverge much from recent electoral outcomes.

Second, we implement a decay on older results, emphasizing more recent electoral outcomes. Specifically, we apply a four-year half-life to prior electoral results, meaning the 2012 presidential election is weighted 50% as much as the 2016 election, and the 2014 Senate race between Mark Warner and Ed Gillespie is weighted 75% as heavily. Third, we establish a metric for the built-in incumbency advantage. This is both the hardest metric to measure and arguably the most important at the state level. As it turns out, the effect ranges quite widely, but generally speaking, the longer a candidate has been in office, the more powerful the effect, albeit with increasingly diminishing returns. We calculate the incumbency effect from this curve and apply it as an addend to our decay-adjusted marginal prediction.
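
A minimal sketch of both adjustments follows. The decay weights are chosen to reproduce the examples above (50% at four years old, 75% at two), which works out to a linear ramp; the logarithmic incumbency curve and its 2-point base are illustrative assumptions rather than our fitted values:

```python
import math

def decay_weight(year, latest_year=2016, half_life=4.0):
    """Linear decay matching the examples in the text: a result four years
    older than the latest election gets 50% weight, two years older gets
    75%. (A strictly exponential half-life would give ~71% at two years.)"""
    age = latest_year - year
    return max(0.0, 1.0 - 0.5 * age / half_life)

def incumbency_bonus(terms_served, base=2.0):
    """Hypothetical diminishing-returns curve: the bonus grows with tenure,
    but each additional term adds less. The functional form and the 2-point
    base are illustrative assumptions, not our fitted values."""
    return base * math.log1p(terms_served)

# Hypothetical prior results: (year, Democratic margin in points).
priors = [(2016, 4.2), (2014, -1.8), (2013, 0.6), (2012, 3.0)]

weights = [decay_weight(y) for y, _ in priors]
decayed = sum(w * m for w, (_, m) in zip(weights, priors)) / sum(weights)

# Apply the incumbency addend (negative here: a Republican incumbent).
prediction = decayed - incumbency_bonus(terms_served=3)
print(f"Decay-adjusted margin with incumbency: {prediction:+.1f}")
```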

For the most part, we found that recent elections were more predictive than older ones, hence our decay assumptions. But some districts saw consistent longer-term trends (e.g., Fairfax County has been trending strongly blue over the past eight years). Here, electoral momentum overpowered the recency effect. We therefore implemented a momentum term, calculated as a simple velocity function (the average change in margin) over multiple electoral years. The result is a model that maintains a certain level of conservatism in forecasting, but weights momentum slightly more heavily in accelerating districts.
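
A minimal sketch of the momentum term: velocity here is just the average change in Democratic margin per cycle, and the 0.25 blend weight is an illustrative assumption:

```python
# Hypothetical Democratic margins for a district trending blue.
history = [(2009, -6.0), (2011, -3.5), (2013, -0.5), (2015, 2.0)]

# Momentum as a simple velocity: the mean change in margin per cycle.
deltas = [m2 - m1 for (_, m1), (_, m2) in zip(history, history[1:])]
momentum = sum(deltas) / len(deltas)

# Blend into the decayed prediction with a small weight, so momentum
# nudges accelerating districts without dominating the forecast.
decayed_prediction = 1.0  # hypothetical decay-adjusted margin
prediction = decayed_prediction + 0.25 * momentum
print(f"momentum {momentum:+.2f} pts/cycle -> prediction {prediction:+.2f}")
```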

Step 4: Determining district rankings. After calculating the raw scores for each race, we apply some adjustments to the rankings. First, we have decided to emphasize flipping vulnerable seats over defending potentially weak seats. After all, our name is Flippable, and central to our mission is the need to take some risk for the sake of overturning state legislatures around the country. The exact multiplier we have settled on is currently 1.5x, applied to the raw marginal score of any currently Democratic-held seat. Put simply, this means that for the purposes of our final prioritization of races, a Democratic seat with an expected marginal victory of 5% is treated the same as a Republican seat with an expected marginal victory of 7.5%. We may choose to adjust this multiplier as we grow.
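
In code, the prioritization rule is straightforward. The sketch below encodes the 1.5x multiplier and the worked example above; the district names and margins are hypothetical:

```python
def priority_score(expected_margin, dem_held, defend_multiplier=1.5):
    """Rank races by closeness; Democratic-held seats have their expected
    margin inflated 1.5x so that flip targets outrank comparable defends."""
    margin = abs(expected_margin)
    return margin * defend_multiplier if dem_held else margin

# The worked example from the text: a Democratic seat expected to be won
# by 5 points is treated the same as a Republican seat won by 7.5.
assert priority_score(5.0, dem_held=True) == priority_score(7.5, dem_held=False)

races = [
    {"district": "HD-13", "expected_margin": -2.0, "dem_held": False},
    {"district": "HD-21", "expected_margin": 5.0, "dem_held": True},
    {"district": "HD-32", "expected_margin": -7.5, "dem_held": False},
]

# Lower score = higher priority (a closer race, adjusted toward flips).
for race in sorted(races, key=lambda r: priority_score(r["expected_margin"], r["dem_held"])):
    print(race["district"], priority_score(race["expected_margin"], race["dem_held"]))
```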

Using historical precedent as our basis, we established an expected maximal bound on outcomes for our target list. In other words, it would take a truly unusual, outlying election for Democrats to pick up more than about six seats. With that in mind, we have decided to target five races this year as priority flip races. All of these races meet our definition of competitiveness: an expected margin within 5 points. At the same time, we wanted to highlight some races that we’ll be monitoring throughout the election year, which represent the set of races Democrats would need to win in order to flip the entire legislature. We call these potential flip (and defend) races, and the vast majority of them represent longer-term investments for Democrats—races we think would be flippable with fair redistricting, or races that may flip blue in 2019 or shortly thereafter. We’re keeping a keen eye on these 20 races and invite you to donate to or volunteer for their campaigns. As the election year progresses, we may choose to include some of these races among our priority flip targets, particularly as we gather more data on fundraising and internal polling.

Next steps

As mentioned previously, this is our first version of an electoral model. We recognize a number of limitations, including:

  • We’re currently working off electoral data only. At the local level, the best predictor of future election results is electoral history. In our next iteration, we will incorporate additional data that could powerfully predict race outcomes, from Census demographics and educational attainment to consumption data and social listening. This requires consistent, clean data and a meticulous approach to training our model. We also plan to take dynamic inputs into consideration as circumstances change in the lead-up to November’s elections.

  • Democratic candidates are not finalized in most of these districts. Our model shows us the flippability of a given district, but we will know more once the Democratic primaries are held on June 13. Please stay tuned for updates to our model. In the meantime, you can learn more about candidates by visiting their district pages.

  • The data can only get us so far—so we’ve worked with Virginia’s House Democratic Caucus to overlay political nuance. We are strong believers in the power of data—but we understand that it has limitations. A truly incredible candidate can defy even the greatest odds. We meet regularly with Virginia’s House Democratic Caucus to discuss our district picks, and we expect to incorporate even more information about candidate viability once primaries are held in June.

  • We have chosen a limited number of “flippable” districts to concentrate resources—but this could change. It will take a net gain of 17 seats to flip Virginia’s House of Delegates. This is not impossible, but it’s unlikely—and especially so if we spread our resources too thin. So we’re starting small, and we plan to increase our scope as we raise more money, recruit more volunteers, and refine our model.

Over the next few months, we plan to build upon our model to include several new sources of data and insight. First, we plan to integrate other data sources we have found to be particularly correlated with outcomes, such as Census demographics, educational attainment, and population density, which will help us refine key population clusters in the model. Next, we plan to gather deeper Virginia-specific data, using increasingly granular jurisdictional levels as the foundation for our analysis, as well as incorporating older data to better define important adjustments such as third-party-candidate and party-in-power effects.

Finally, we will replicate this process for all 50 states. This will have two important effects: first, it will greatly expand the size of our data set, tightening the confidence intervals in our regression analyses. Second, it will give us the opportunity to apply the same methodology to other states with upcoming elections—in both 2017 and 2018—whose outputs we will use as the basis for ranking all races across states through decision-tree prioritization.

We welcome input and ideas. If you would like to get in contact with us, please do so here.