Data Cherry-Picking: Impact and Implications

Data is an essential part of every product manager's work. It helps them form and validate hypotheses, gain insight into user behavior, make better decisions, and track product changes.

But the misuse of data can be harmful. One important example is selecting only the data that confirms a particular hypothesis while ignoring relevant contradictory evidence.

We interviewed experienced product managers and economists from different companies about the risks and remedies of data cherry-picking. They shared their thoughts on the following:

What is data cherry-picking?
Why do product teams cherry-pick data to support their point of view or hypotheses?
What is the impact of data cherry-picking on the business, users, and execution?
How can you bias-proof your research and hypothesis validation process?

We would like to thank all the experts who shared their experience with us and helped us answer these important questions.

Gleb Romanyuk (Economist at Amazon Web Services)
Casey Phillips (Senior Product Manager, AI/ML at Uber)
Eugene Ter-Avakyan (Senior Product Manager Travel Search at Huawei)
Cristian Garcia (Senior Product Manager at Publicis Sapient)

→ Test your product management and data skills with this free Growth Skills Assessment Test.

→ Learn data-driven product management in Simulator by GoPractice.

→ Learn growth and realize the maximum potential of your product in Product Growth Simulator.

→ Learn to apply generative AI to create products and automate processes in Generative AI for Product Managers – Mini Simulator.

→ Learn AI/ML through practice by completing four projects around the most common AI problems in AI/ML Simulator for Product Managers.

Q: What is data cherry-picking?

Data plays a crucial role in a PM’s work. But if it’s incomplete and doesn’t tell the whole truth, it might undermine the business. This happens when product managers cherry-pick data that supports their arguments and assumptions.

Casey Phillips (Senior Product Manager, AI/ML at Uber)

Data cherry-picking is the conscious or even subconscious act of seeing data in a manner that confirms a preconceived bias one has. This comes in the form of analyzing and interpreting data in a manner that supports what we want to believe versus what is actual reality. This can be intentional or sometimes even unintentional if the subconscious desire to believe something is true leads one to inherently have bias in the way they’re analyzing and interpreting data.

Gleb Romanyuk (Economist at Amazon Web Services)

Cherry-picking is the intentional selection of favorable facts, usually to support an outcome which the analyst prefers to see (either personally or because his boss wants to see a specific result). Presenting a story that is built on all the discovered data points is not cherry-picking; building a story by filtering facts to fit a specific narrative is.

Q: Why do product teams cherry-pick data to support their point of view or hypotheses?

Data cherry-picking is not tied to a specific role, such as junior specialists, PMs, engineers or analysts. It also can be intentional or unintentional. Here are the most common causes of data cherry-picking according to the experts we spoke to:

Cognitive biases
Not understanding the profit drivers of the business
Wanting to launch a product or a feature at any cost
Building the product for the founders instead of users

Let’s dig deeper into those issues.

Gleb Romanyuk (Economist at Amazon Web Services)

A product manager is rewarded for launching products and features. Therefore, the PM is often biased towards analysis outcomes that favor launches of products and features. The tradeoff is that this kind of analysis often ignores the impact of product launches on the business as a whole. This can be both intentional and unintentional. Amazon has a great corporate culture and I have never seen people intentionally falsifying data to make a point. But they sometimes select data points that are favorable to their hypotheses.

For example, a senior product manager has been working on a new product for one year and it’s time for the launch. Suddenly, the VP gets concerned that it might cannibalize the rest of the business and asks to do a further financial analysis. Now guess what happens. The PM will find customer anecdotes that indicate that cannibalization doesn’t happen (and will ignore those that indicate the opposite). Then, an economist on his team will do a quantitative analysis using various methods and reach a different conclusion. But the PM will claim that the most robust method was the one which gave the highest positive estimate of the financial impact of the new product.

Casey Phillips (Senior Product Manager, AI/ML at Uber)

I’ve seen individuals cherry-pick data out of ignorance, by not truly understanding the big picture of what the data represents and the analysis needed to paint an accurate picture. In most cases, though, I believe data cherry-picking happens when someone wants something to be true, believes it is true, and lets this internal bias distort the way they perceive the data they’re interpreting.

Eugene Ter-Avakyan (Senior Product Manager Travel Search at Huawei)

In many cases, cherry-picking is not intentional; instead, people are so invested in their projects that they fail to dig deep enough when they see what they want to see. The best example I can share sometimes happens during meetings with leadership, when teams are hard-pressed to present their progress positively. There is a considerable temptation to show only “good” numbers. Sometimes it’s just giving the relative growth values and omitting the absolute values: monthly growth of 200% sounds much better than growth from 10 to 30 sales.
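To make the arithmetic concrete, here is a minimal Python sketch (our illustration, not something the experts provided). The helper name `describe_growth` is hypothetical. It reports a change with both its absolute and relative size, so neither number can be shown without the other. Going from 10 to 30 sales is a tripling, which is a 200% increase.

```python
def describe_growth(previous: float, current: float) -> str:
    """Report a change with both its absolute and relative size."""
    if previous <= 0:
        # Relative growth is undefined without a positive baseline.
        raise ValueError("previous must be positive to compute relative growth")
    absolute = current - previous
    relative = absolute / previous * 100
    return f"{previous:g} -> {current:g} ({absolute:+g}, {relative:+.0f}%)"


print(describe_growth(10, 30))  # prints: 10 -> 30 (+20, +200%)
```

Showing the baseline next to the percentage lets the audience judge for themselves whether the growth is meaningful.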

Cristian Garcia (Senior Product Manager at Publicis Sapient)

I have seen data cherry-picked too many times to support metrics that are later shared with other teams, customers or leadership. The reasons behind this practice were diverse, from trying to justify a rushed product or feature release to building something for the founders and not the users. The cycle this creates is dangerous, as the product team could start manipulating the information shared until they get an ‘actual result’.

Q: What is the impact of data cherry-picking on the business, users, and execution?

In the long run, data cherry-picking can lead to negative effects on business. Cherry-picked data may be misleading and steer you away from the reality of your product. Ultimately, it can lead to decisions made based on false narratives and a complete compromise of data-driven culture.

Here are some examples of how it happens:

Eugene Ter-Avakyan (Senior Product Manager Travel Search at Huawei)

The worst case is that cherry-picking undermines the whole culture of data-driven decision making, especially when done by senior managers and leaders. On a more tactical level, it could be anything from focusing on the wrong region to launching a feature that negatively impacts lifetime customer value, which was deemed “too complex to calculate for this case”.

Cristian Garcia (Senior Product Manager at Publicis Sapient)

Cherry-picking can persist if no one questions the data results. If a mid-level team shares cherry-picked data with their managers, the managers may then present the same insights to the upper echelons, and the misleading information spreads without anyone knowing it was skewed on purpose. The same misleading information could later be used as marketing collateral, and customers will be misled when evaluating the product.

Casey Phillips (Senior Product Manager, AI/ML at Uber)

Data cherry-picking can be detrimental to true growth and progress in many professional fields. It can lead to decisions being made on a narrative that is false. In most cases, this will eventually manifest itself in some measurable form. But it is often hard to identify and attribute this manifestation to a particular instance of data cherry-picking. False narratives lead to poor decision making, which ultimately leads to poor aggregate results and performance.

Gleb Romanyuk (Economist at Amazon Web Services)

Consider a PM who owns the UX of the details page on amazon.com. One day he may come up with an idea that the “Buy” button should be in the bottom right corner because it is “more convenient.” It may or may not be convenient for most users of the website, but he believes that it is. He gathers evidence in support of the change by cherry-picking the feedback of the customers who agree with him and ignoring the large majority who do not. He launches the change and, here we go, now the UX is less convenient and more disruptive for the majority of the users.

Q: How can you bias-proof your research and hypothesis validation process?

When product managers work with data, they should constantly question their methods and results and make sure they are unbiased. Here are some recommendations from experts we spoke to:

Ask yourself questions: Where do these numbers come from? How were they calculated?
Imagine you are in court. If you are a lawyer presenting evidence that your hypothesis is correct, what would a prosecutor say? How would he argue with you?
Compare data results to user-based evaluation.
Try to create a transparent structure where everyone has access to the main KPIs and gets the benefit of the doubt.

Gleb Romanyuk (Economist at Amazon Web Services)

On a personal level, you might notice that it’s hard to be impartial when you are attached to a specific outcome and select a method or a research design that may favor your preferences. To prevent human biases from leading to cherry-picking, we can set up an environment where there is a third party who can repeat the analysis but who has a different or opposing goal. Like in court!

In scientific research, we have refereed journals. In the case of product teams, there can be a separate research team which is not attached to the outcome and therefore doesn’t have biased incentives.

In economics, there has been a movement towards setting up studies where you first fully outline the method of analysis down to the smallest detail, and only then apply it to the data. Whatever comes out the other end is what is sent for publication.
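As a rough illustration of this "plan first, data second" idea (our sketch, not something the experts provided), the Python snippet below fixes the success criteria in an immutable object before any data is touched. The `AnalysisPlan` class, the `evaluate` function, the metric name and all numbers are hypothetical, and a real plan would also specify a statistical test, which is omitted here for brevity.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)  # frozen: the plan cannot be edited after it is written
class AnalysisPlan:
    metric: str        # what is measured (hypothetical name)
    min_lift: float    # smallest relative lift that counts as support
    min_sample: int    # minimum number of observations per group


def evaluate(plan: AnalysisPlan, control: list[float], treatment: list[float]) -> str:
    """Apply the pre-written plan to the data and return its verdict as-is."""
    if min(len(control), len(treatment)) < plan.min_sample:
        return "inconclusive"
    baseline = mean(control)
    if baseline == 0:
        return "inconclusive"  # relative lift is undefined for a zero baseline
    lift = (mean(treatment) - baseline) / baseline
    return "supported" if lift >= plan.min_lift else "not supported"


# Step 1: write the plan down before looking at any data.
plan = AnalysisPlan(metric="orders_per_user", min_lift=0.05, min_sample=5)

# Step 2: only then apply it. These numbers are made up for illustration.
control = [1.0, 2.0, 1.0, 3.0, 2.0]
treatment = [1.0, 2.0, 2.0, 3.0, 2.0]
print(evaluate(plan, control, treatment))
```

Because the thresholds are set in advance, there is no room to adjust them after seeing which result would look better.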

Eugene Ter-Avakyan (Senior Product Manager Travel Search at Huawei)

My approach is to be critical of any number presented: for any critical data point, I must have a clear understanding of where it came from or how it was calculated.

It is essential to give a person the benefit of the doubt, because it is not always easy to tell right away whether something was an honest mistake or an intentional attempt to mislead. It often helps to decide what data is needed to make a particular decision before the analysis has started, the same way it works in hypothesis testing.

Cristian Garcia (Senior Product Manager at Publicis Sapient)

It is difficult to spot cherry-picked data when you don’t have access to the underlying data and are only seeing the results. But you can always spot inconsistencies or gaps by comparing what the users are achieving with the product against what the data presents. There’s always an ‘educated guess’ element to it. If the onboarding process converted 5% last month and now, two sprints later, the figure is 30%, it would be worth seeing all the data to make extra sure the information isn’t misleading.

If you do have access to the data, it is always worth asking why ‘unfavorable’ data has been left out and what the reasons behind it are.

Since our decisions are backed by data, biased data will affect future releases. A good way to stop this is to have a shared BI tool or platform where everyone has access to the main KPIs we track. The BI platform should be directly connected to all the data sources we use, so that everyone can filter the data themselves.

