Running complex predictive models on huge datasets can be a pain. At Deck, we’re really feeling that pain.

Every day, our product runs several models on almost 13 billion rows of data — representing every campaign/voter relationship for the offices we support.

It’s been a struggle to do this in a way that balances accuracy, efficiency, and affordability. But we recently found an answer that’s been a game changer for our team: BigQuery’s new built-in TensorFlow support.

The problem

We train our models using an approach we call “contextual inference.” We take hard data on what’s happened in the past and organize lots of related data that might help to explain it. For example, to determine why a group of voters supported a particular candidate, our models look at things like how the candidate was covered in local media, the traits of people who made financial contributions to the candidate, what the candidate’s history in office might have been, and what traits those voters shared.

Until recently, the best models we had built to make sense of this data relied on approaches called random forest and gradient boosting. Both methods can incorporate complex chains of interactions and be iteratively tuned to weed out variables and relationships between variables that don’t increase your model’s accuracy.

However, these types of models are computationally intensive to train and to score with, especially on the ginormous datasets we’re working with.

Attempted solutions

When we launched Deck, we were just supporting a small handful of campaigns and we had an easy time making this work. But as we scaled up, it became unsustainable.

We first tried using distributed computing technology like Spark. But, while Spark is pretty awesome, it takes a long time to load data into the memory of a cluster and pull results out. It didn’t seem built for the kinds of jobs in which your scoring data would change regularly and need to be swapped out at a high frequency.

We then tried coding random forest “formulas” into SQL. However, our data warehouse has a 250k character limit for queries, and these were tens of millions of characters long.
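To give a feel for why those queries balloon, here is a minimal sketch of the general idea, not our production code. It renders a decision tree as a nested SQL CASE expression and averages several trees the way a random forest does. The tree layout, the column names (age, past_votes), and the helper names are all hypothetical.

```python
# Hypothetical toy tree. A real forest has many trees, each far deeper than this.
toy_tree = {
    "feature": "age", "threshold": 40,
    "left": {"value": 0.3},
    "right": {
        "feature": "past_votes", "threshold": 2,
        "left": {"value": 0.5},
        "right": {"value": 0.8},
    },
}

def tree_to_sql(node):
    """Recursively render one decision tree as a nested SQL CASE expression."""
    if "value" in node:  # leaf: emit the predicted probability
        return str(node["value"])
    left = tree_to_sql(node["left"])
    right = tree_to_sql(node["right"])
    return (f"CASE WHEN {node['feature']} <= {node['threshold']} "
            f"THEN {left} ELSE {right} END")

def forest_to_sql(trees):
    """Average the per-tree expressions, as a random forest averages its trees."""
    terms = " + ".join(f"({tree_to_sql(t)})" for t in trees)
    return f"({terms}) / {len(trees)}"

sql = forest_to_sql([toy_tree, toy_tree])
print(len(sql), "characters for two tiny trees")
```

Every split adds another CASE clause, so the text grows with the number of splits in each tree multiplied by the number of trees. That is how a forest of deep trees blows past a character limit so quickly.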

Finally, we settled on a compromise. We used gradient boosting or random forest to build models that generate very accurate district-level forecasts, and a less accurate logistic regression model to generate individual-level scores for voters. We then recalibrated the scores to match the forecasts.
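There are several ways to do a recalibration like this. The sketch below shows one common option, offered as an illustration rather than a description of our exact method: shift every voter's score by the same constant on the log-odds scale until the district's average score equals the forecast. The function names, sample scores, and forecast value are all made up for the example.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def recalibrate(scores, target_mean, tol=1e-9):
    """Shift all scores by one constant in log-odds space so that their
    mean matches target_mean. Assumes every score is strictly between 0 and 1."""
    logits = [logit(s) for s in scores]
    lo, hi = -20.0, 20.0
    # The mean score rises monotonically with the shift, so bisection works.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean = sum(sigmoid(x + mid) for x in logits) / len(logits)
        if mean < target_mean:
            lo = mid
        else:
            hi = mid
    shift = (lo + hi) / 2.0
    return [sigmoid(x + shift) for x in logits]

raw_scores = [0.35, 0.50, 0.62, 0.71]  # hypothetical logistic regression output
adjusted = recalibrate(raw_scores, target_mean=0.60)  # hypothetical district forecast
print([round(s, 3) for s in adjusted])
```

A constant shift keeps voters in the same rank order and only moves the overall level. It cannot fix a score model that ranks voters poorly in the first place.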

That worked pretty well for us, but it also left a lot to be desired.

First, it prioritized the accuracy of our forecasts over our scores. Our forecast models were built with more sophisticated methods and held to a high standard of accuracy. As a concession to our infrastructure limitations, our score models were built with less sophisticated methods and held to a slightly lower standard of accuracy. We then adjusted voters’ scores using the forecast models in an attempt to retrofit their accuracy, but that’s not the same as building an accurate model in the first place. When we tell you a person is 90% likely to support your campaign, it’s obviously important that you can trust that number.

Second, it left our individual-level scores less polarized than they ought to be. This approach has routinely generated individual-level probability scores (predicting the probability that a given voter will support a specific candidate, for example) ranging from 20% to 80%. But we’d rather see scores with bimodal distributions concentrated near 0% and 100% so our users could do more accurate and efficient targeting.

Enter: TensorFlow

At Deck, we use a tool that we developed called Lexicali to help us make sense of text data. Lexicali is powered by deep learning models engineered using the Keras API, which is an interface to TensorFlow.

Given this experience, we’ve talked about retooling our district-level forecasts around deep learning in the past, but it never felt worth it. Our forecast models are already working very well, and the relatively sparse training data we have for district-level outcomes (as opposed to person-level outcomes) might not be enough for most deep learning models to work with. And using deep learning models to predict person-level outcomes would lead to an even more extreme version of the same problems we encountered with random forest and gradient boosting.

But recently, we discovered that BigQuery ML had begun offering native support for TensorFlow.

This meant that we could train a TensorFlow model to generate individual-level scores on a local machine, upload it to BigQuery, then apply that model at scale to massive datasets. So… we got pretty excited and got busy!
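For the curious, here is a rough sketch of what the BigQuery side of that workflow can look like. It assumes the trained model has already been exported in TensorFlow's SavedModel format to a Cloud Storage path. The Python helpers only build the two SQL statements; the model, bucket, table, and column names are placeholders rather than our real ones.

```python
def import_model_sql(model_name, gcs_path):
    """Build the statement that registers a SavedModel as a BigQuery ML model."""
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS (MODEL_TYPE = 'TENSORFLOW', MODEL_PATH = '{gcs_path}')"
    )

def predict_sql(model_name, source_table, feature_columns):
    """Build the statement that scores every row of source_table with the model."""
    cols = ", ".join(feature_columns)
    return (
        "SELECT *\n"
        "FROM ML.PREDICT(\n"
        f"  MODEL `{model_name}`,\n"
        f"  (SELECT {cols} FROM `{source_table}`)\n"
        ")"
    )

# All names below are hypothetical placeholders.
import_stmt = import_model_sql(
    "my_dataset.support_model", "gs://my-bucket/support_model/*"
)
predict_stmt = predict_sql(
    "my_dataset.support_model",
    "my_dataset.voter_features",
    ["age", "past_votes"],
)
print(import_stmt)
print(predict_stmt)
```

The selected column names need to match the input names in the SavedModel's serving signature. The appeal of this setup is that training stays on a single machine while the scoring runs where the data already lives, so nothing has to be loaded into a cluster and pulled back out.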

Our lead ML Engineer, Cia Salinas, has been busy moving all of our individual-level models into a TensorFlow context. So far, we’ve done it for our individual-level support models and the results have been very exciting. Our predictions are now more accurate and more polarized, making our users’ outreach and advertising programs more effective and efficient.

Trade-offs and next steps

Deep learning has its trade-offs. Compared with approaches like regression and random forest, it’s more difficult to dig into why certain predictions are being generated by deep learning models.

Deep learning models are also difficult to tune, with dozens of decisions compounding on one another to generate wholly different outcomes. So if we were to start getting weird predictions, debugging could be a messy project.

That said, we’re mainly just stoked. Our candidate support models are validating better than ever before, and their increased polarity makes them easier to use. We’re nearly done with a similar overhaul of our turnout models, and will then move on to our contributor likelihood and partisanship elasticity models.

We’ll let you know what we find!