Because of the COVID-19 pandemic, unprecedented efforts to promote voting by mail, instability at the USPS, and Donald Trump’s anti-democratic rhetoric against voting by mail, past patterns in voting by mail may not hold this year.

At the same time, the process of voting by mail has several checkpoints on the side of both voters and election administrators. Predicting what will happen at each of these checkpoints creates a significant opportunity for better targeting and messaging.

In particular, we’ve identified four opportunities for predictions that could help campaigns run better programs in 2020: 1) whether a voter will request a ballot, 2) whether they will return it, 3) whether the ballot will be at risk of being late, and 4) whether it will be accepted.

Thanks to some fancy footwork by our data and ML teams, we’ve built models for each of these questions that validate very well on data from recent elections. The result is a set of four scores for each voter that should make VBM sign-up, ballot chase, and ballot curing programs much more efficient and effective.

For example, a voter could have:

  • A 95% chance of requesting a ballot, meaning that 19 out of 20 people with their traits in a given election are likely to request a ballot.
  • An 80% chance of returning their ballot, meaning that 4 out of 5 of the people with their traits who request a ballot will go on to return it.
  • A 5% chance of returning their ballot dangerously late, meaning that 1 in 20 ballots cast by people with similar traits will be received later than 3 days before Election Day.
  • A 99.9% chance of having their ballot accepted, meaning that almost all ballots returned by people with their traits will be accepted and properly counted.
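
Chained together, scores like these give a rough end-to-end picture of a voter's path. The sketch below is purely illustrative: the variable names are ours, and it assumes each score is conditional on the prior step, as in the bullets above.

```python
# Illustrative only: assumes each probability is conditional on the
# prior step, so the four scores can be multiplied together.
p_request = 0.95   # chance of requesting a ballot
p_return  = 0.80   # chance of returning it, given a request
p_late    = 0.05   # chance the return is dangerously late, given a return
p_accept  = 0.999  # chance the ballot is accepted, given a return

# Chance this voter's ballot is requested, returned on time, and counted.
p_counted_on_time = p_request * p_return * (1 - p_late) * p_accept
print(round(p_counted_on_time, 3))  # → 0.721
```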

How the models are built

The first step in preparing these models is to assemble the training data. In this case, all four models share the same predictors: the traits of voters plus state election laws. However, the voters and elections selected for each model are different.

  • For ballot requests, we use requests (and non-requests) for elections since April 2020 — including requests already made for the November general election. More recent elections get a higher weight than older elections. As we collect more ballot requests for the 2020 general election in a given state, we will phase out requests from earlier elections in that state.
  • For ballot returns, we look exclusively at elections for which we have complete return data — including elections in 2016, 2018, and early 2020. The sample frame in this model is filtered to only voters who received a ballot.
  • For late ballot returns, we use the same elections as the ballot return sample frame, but further filter to only voters who actually returned their ballot.
  • For ballot acceptance, we use the same sample frame as late ballot returns, but further filter to elections that reported at least 0.5% of ballots as cancelled or otherwise not counted.
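
The nesting of the four sample frames can be sketched in a few lines of pandas. The table and column names below are hypothetical, not our actual schema; they just illustrate how each frame filters the one before it.

```python
import pandas as pd

# Hypothetical voter-election history; columns are illustrative.
history = pd.DataFrame({
    "voter_id": [1, 2, 3, 4],
    "election": ["2020-04", "2020-04", "2018-11", "2018-11"],
    "requested": [1, 1, 1, 0],
    "received":  [1, 1, 0, 0],
    "returned":  [1, 0, 0, 0],
    "rejected":  [0, 0, 0, 0],
})

request_frame = history                                      # all voters
return_frame  = history[history["received"] == 1]            # got a ballot
late_frame    = return_frame[return_frame["returned"] == 1]  # sent it back

# Acceptance model: keep only elections where at least 0.5% of
# returned ballots were rejected, so the model sees real variation.
rej_rate = late_frame.groupby("election")["rejected"].mean()
keep = rej_rate[rej_rate >= 0.005].index
accept_frame = late_frame[late_frame["election"].isin(keep)]
```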

We then use our training data to identify the features most likely to have high predictive power — either alone or in combination with others — and those most likely to confuse a model into overfitting or diminish the impact of other features.

At this stage, we 1) prune highly correlated features and features without meaningful variation, 2) use a technique called VSURF (Variable Selection Using Random Forests) to assess which features — alone or in combination — carry predictive power, and 3) impute missing data.

While VSURF is helpful in identifying variables and variable combinations with high predictive power, we select the final list of features by hand — ensuring that we’re feeding the models data that we believe tells a plausible causal story, rather than just hacking our way into a list of features that happen to correlate with our response variables.
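
The pruning step — dropping near-constant features, then one of each highly correlated pair — can be sketched as follows. The thresholds are illustrative, not our actual settings, and VSURF itself (an R package) is not shown here.

```python
import numpy as np
import pandas as pd

def prune_features(X: pd.DataFrame, corr_threshold=0.95, min_std=1e-6):
    """Drop features without meaningful variation, then drop one of
    each pair of highly correlated features. Thresholds are illustrative."""
    X = X.loc[:, X.std() > min_std]        # remove near-constant columns
    corr = X.corr().abs()
    # Look only at the upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    drop = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    return X.drop(columns=drop)
```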

Finally, we iteratively design a deep learning architecture to predict each outcome. In all four cases, we’ve built neural networks trained to minimize binary cross-entropy, with a sigmoid activation on the output layer so each model emits a probability.
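
We don’t spell out the full architectures here, but the output activation and objective can be written out in plain NumPy. This sketch shows only the math of the final layer and the loss — layer sizes and the training loop are omitted.

```python
import numpy as np

def sigmoid(z):
    """Squash a raw network output into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Average binary cross-entropy between labels and predicted probabilities."""
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
```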

Evaluating the models

To validate these models, we ran them on hold-out testing data from 2020.

When validated on over 93,000 testing samples, our ballot request model’s area under the ROC curve was 0.92, meaning that 92% of the time, the model ranked a randomly selected voter who requested a ballot above a randomly selected voter who did not.

When validated on over 39,000 testing samples, our ballot return model’s area under the ROC curve was 0.81, meaning that 81% of the time, the model ranked a randomly selected voter who returned their ballot above a randomly selected voter who did not.
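
That ranking interpretation of AUC is concrete enough to compute directly: it is the fraction of positive/negative pairs the model orders correctly. A small sketch on made-up scores:

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive example
    is scored above a randomly chosen negative one (ties count half)."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

For large samples a sorting-based implementation (like scikit-learn’s `roc_auc_score`) is far faster, but it computes the same quantity.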

When validated on over 59,000 testing samples, a lift chart organized by decile showed that voters with the top 10% of late ballot return scores were 2.27 times more likely than a random voter to return their ballot dangerously close to Election Day. Those with scores in the bottom 10% were about 60% as likely to return their ballot dangerously close to Election Day.

When validated on over 41,000 testing samples, a lift chart organized by decile showed that ballots cast by voters with the bottom 10% of ballot acceptance scores were 10 times more likely to be rejected than a random ballot. Ballots cast by voters with the top 10% of scores were less than 1% as likely to be rejected. When organized by percentile, the bottom 1% of scores show a lift of 6,288%, meaning those voters’ ballots are nearly 63 times more likely to be rejected than a randomly selected ballot.
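
The decile lifts above compare each decile’s outcome rate to the overall rate. A sketch of that calculation on synthetic data (the function name and data are ours):

```python
import numpy as np
import pandas as pd

def decile_lift(scores, outcomes):
    """Outcome rate within each score decile, divided by the overall rate."""
    df = pd.DataFrame({"score": scores, "y": outcomes})
    # rank(method="first") breaks ties so qcut yields ten equal-size bins
    df["decile"] = pd.qcut(df["score"].rank(method="first"), 10, labels=False)
    return df.groupby("decile")["y"].mean() / df["y"].mean()
```

A lift of 2.27 in the top decile, for example, means that decile’s rate of dangerously late returns is 2.27 times the overall rate.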

Next steps

We will continue to iterate on these models — adding new features and new training data every day — until the end of the 2020 general election. If you have any ideas or concerns about anything we missed, please let us know!