One common problem teams face when deploying machine learning products is the cold start problem, where a shortage of quality data limits the performance and value an ML system can deliver. This is especially visible in recommendation systems: when there isn’t enough information about new users or new items, the model tends to underperform.
A team we worked closely with encountered this problem a few years ago, when launching an ML-powered product recommendation system for an online shopping platform. Before introducing machine learning, the platform simply recommended the most popular products. The new system was designed to provide personalized recommendations.
Without going into technical details, a recommendation engine essentially solves a ranking problem: given a long list of items (in this case, a product catalog), it must sort them by importance (in this case, relevance to a specific user).
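At its core, that framing reduces to "score each item for a given user, then sort by score." The sketch below illustrates the idea, with a hypothetical precomputed relevance score standing in for a real model:

```python
from typing import Callable, Iterable

def rank(catalog: Iterable[str], relevance: Callable[[str], float]) -> list[str]:
    """Generic ranking: score every item for a given user, then sort by score."""
    return sorted(catalog, key=relevance, reverse=True)

# Illustration only: relevance here is a hypothetical precomputed score per product.
scores = {"laptop": 0.9, "mouse": 0.4, "desk": 0.7}
print(rank(scores, scores.get))  # ['laptop', 'desk', 'mouse']
```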
Because the team had access to a large table of historical purchase data, they trained a simple model capable of ranking items based on how users had previously interacted with them or with similar products. After integrating the model into the platform and rolling it out, the team observed a meaningful increase in conversion rates.
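A minimal sketch of what such a model might look like, assuming a hypothetical purchase-history table and a simple co-purchase heuristic (the production system would presumably be more sophisticated):

```python
import pandas as pd
from collections import defaultdict

# Hypothetical purchase-history table: one row per (user_id, product_id) purchase.
purchases = pd.DataFrame({
    "user_id":    [1, 1, 2, 2, 3, 3, 3],
    "product_id": ["a", "b", "a", "c", "b", "c", "d"],
})

# Count how often two products are bought by the same user (co-purchase counts).
co_counts = defaultdict(int)
for _, items in purchases.groupby("user_id")["product_id"]:
    items = list(items)
    for i in items:
        for j in items:
            if i != j:
                co_counts[(i, j)] += 1

def recommend(user_history, catalog, top_n=5):
    """Rank catalog items by how often they co-occur with the user's past purchases."""
    scores = {
        item: sum(co_counts.get((past, item), 0) for past in user_history)
        for item in catalog if item not in user_history
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend({"a"}, ["a", "b", "c", "d"]))  # ['b', 'c', 'd']
```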
However, this approach only worked for users with an existing purchase history. For new users, the recommendations were irrelevant because there was no prior interaction data that the model could use to score products.
Since the platform was growing quickly, this cold start problem affected more than 60% of users most of the time.
To address it, the team developed a hybrid solution. They used a separate recommendation algorithm specifically designed for new users. This algorithm recommended the most popular products until enough behavioral data had been collected, after which each user was moved to the personalized recommender. This resulted in a significant boost in conversion rates among new users and improved overall system performance.
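A minimal sketch of that routing logic, assuming a simple interaction-count threshold; the function names, threshold value, and toy data below are all hypothetical:

```python
MIN_INTERACTIONS = 5  # assumed threshold before a user counts as "warm"

def popularity_recommendations(popularity, top_n=3):
    """Cold-start fallback: rank products by overall purchase counts."""
    return sorted(popularity, key=popularity.get, reverse=True)[:top_n]

def hybrid_recommend(user_id, interactions, popularity, personalized, top_n=3):
    """Route cold-start users to the popularity recommender, others to the personalized model."""
    history = interactions.get(user_id, [])
    if len(history) < MIN_INTERACTIONS:
        return popularity_recommendations(popularity, top_n)
    return personalized(history, top_n)

# Example usage with toy data; `personalized` stands in for the trained ranking model.
popularity = {"laptop": 120, "mouse": 300, "desk": 80}
interactions = {"new_user": ["mouse"], "loyal_user": ["a", "b", "c", "d", "e", "f"]}
personalized = lambda history, n: history[:n]
print(hybrid_recommend("new_user", interactions, popularity, personalized))    # popularity-based
print(hybrid_recommend("loyal_user", interactions, popularity, personalized))  # personalized
```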
Later, the team added a short onboarding survey to capture users’ initial preferences. This enabled the use of a content-based filtering algorithm, which matches users to products based on stated preferences and product attributes—without requiring historical behavior.
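A minimal sketch of content-based matching, assuming the survey yields a set of preferred categories and every product carries attribute tags (the catalog and attributes below are made up for illustration):

```python
# Hypothetical product catalog with attribute tags.
products = {
    "running_shoes": {"sports", "footwear"},
    "yoga_mat":      {"sports", "fitness"},
    "novel":         {"books", "fiction"},
}

def content_based_recommend(stated_preferences, top_n=2):
    """Score products by overlap between stated preferences and product attributes."""
    scores = {
        item: len(stated_preferences & attrs) / len(attrs)
        for item, attrs in products.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(content_based_recommend({"sports", "fitness"}))  # ['yoga_mat', 'running_shoes']
```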
They further enhanced the system by introducing collaborative filtering, recommending products based on similarities across users, and by modifying the product to collect additional behavioral signals—for example, tracking which items users added to their carts but did not purchase.
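A minimal sketch of user-based collaborative filtering on an implicit-feedback matrix (the interaction counts are invented; a production system would more likely use matrix factorization or a learned model):

```python
import numpy as np

# Rows are users, columns are products; values are hypothetical interaction counts.
interactions = np.array([
    [3, 0, 1, 0],   # user 0
    [2, 0, 2, 1],   # user 1 (behaves similarly to user 0)
    [0, 4, 0, 3],   # user 2
], dtype=float)

def recommend_collaborative(user_idx, top_n=2):
    """Recommend items favored by similar users that the target user hasn't interacted with."""
    norms = np.linalg.norm(interactions, axis=1, keepdims=True)
    normalized = interactions / np.where(norms == 0, 1, norms)
    similarity = normalized @ normalized[user_idx]   # cosine similarity to each user
    similarity[user_idx] = 0                         # exclude the user themselves
    scores = similarity @ interactions               # weight items by neighbor similarity
    scores[interactions[user_idx] > 0] = -np.inf     # mask already-seen items
    return np.argsort(scores)[::-1][:top_n]

print(recommend_collaborative(0))  # [3 1]: items the most similar user interacted with
```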
What began as an effort to solve a cold start problem ultimately led the team to improve the product in many other ways. The lessons from this project were later transferred to multiple other ML initiatives.
To enhance your skills in working on AI/ML products, you can benefit from: