Thumbtack helps customers search for the right local professionals to get projects done. Our search product collects project details from customers and matches them against preferences from professionals. Afterwards, our ranking algorithm displays the professionals most likely to result in a job well done. We tackle the search ranking problem by scoring professionals that match the customer’s requirements and then sorting them by score. Earlier this year, we changed our search ranking algorithm from a heuristic scoring system to a machine learning (ML) based scoring system. This change was very challenging but impactful. In this blog post, we’ll discuss why we wanted to transition our search ranking algorithm to use machine learning,
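The ranking approach described above is to keep only the professionals who match the request, score each one, and sort by score. It can be sketched as follows. This is a minimal illustration under stated assumptions, not Thumbtack's implementation: the `Professional` and `Request` types, the `matches` rule (category plus zip code), and `heuristic_score` are all hypothetical. The scoring function is passed in as a parameter, so a hand-tuned heuristic and an ML model's prediction function can be swapped without touching the filter-and-sort logic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Professional:
    # Hypothetical fields, chosen only for illustration.
    name: str
    categories: set[str]
    zip_codes: set[str]
    rating: float


@dataclass
class Request:
    # Hypothetical project details collected from a customer.
    category: str
    zip_code: str


# A scorer maps (professional, request) to a number; higher is better.
Scorer = Callable[[Professional, Request], float]


def matches(pro: Professional, req: Request) -> bool:
    # Assumed matching rule: the pro's preferences must cover the
    # requested category and the customer's zip code.
    return req.category in pro.categories and req.zip_code in pro.zip_codes


def rank(pros: list[Professional], req: Request, score: Scorer) -> list[Professional]:
    # Step 1: keep only professionals matching the customer's requirements.
    candidates = [p for p in pros if matches(p, req)]
    # Step 2: sort the candidates by score, highest first.
    return sorted(candidates, key=lambda p: score(p, req), reverse=True)


def heuristic_score(pro: Professional, req: Request) -> float:
    # Hypothetical heuristic scorer: just the pro's average rating.
    # An ML-based system would replace this with a model's prediction.
    return pro.rating


if __name__ == "__main__":
    pros = [
        Professional("A", {"plumbing"}, {"94103"}, 4.2),
        Professional("B", {"plumbing"}, {"94103"}, 4.9),
        Professional("C", {"painting"}, {"94103"}, 5.0),
    ]
    req = Request("plumbing", "94103")
    print([p.name for p in rank(pros, req, heuristic_score)])
```

Here "C" is excluded because it does not match the requested category, and "B" ranks ahead of "A" on score. Keeping filtering and scoring separate means only `score` has to change when moving from heuristics to a learned model.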
Part 1: Organizing Chaos
Over the past year, we’ve built out Thumbtack’s data infrastructure from the ground up. In this two-part blog post, I want to share where we came from, some of the lessons we’ve learned, and the key decisions we’ve made along the way.
When we started this project in early 2015, Thumbtack didn’t have a standalone data infrastructure; all analytics and data-oriented tasks ran directly against production databases. People across engineering and non-engineering teams alike used the pgAdmin desktop tool to run queries. Those ad hoc queries, along with dashboard and analytics queries, all hit a production PostgreSQL replica directly.