Lessons: 2 Aug 2021
The importance of data integrity in startups
Problem
Today, I updated the month-end subscriber data for July to understand the variance from projections. I realised that while my data model was aligned with our subscriber KPIs dashboard, it did not match another dashboard that I (and others) regularly use and reference. The difference between the dashboards was ~17%, i.e. significant enough to merit further investigation.
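As a minimal sketch of the check described above, the gap between two dashboards can be expressed as a relative difference and compared against a tolerance. The counts, the 2% tolerance, the choice of the KPI dashboard as the reference, and the function names are all illustrative assumptions, not figures or code from our systems.

```python
def relative_difference(reference: float, other: float) -> float:
    """Absolute difference between two counts, as a fraction of the reference."""
    if reference == 0:
        raise ValueError("reference count must be non-zero")
    return abs(other - reference) / abs(reference)


def needs_investigation(reference: float, other: float, tolerance: float = 0.02) -> bool:
    """Flag a mismatch larger than the tolerance (2% is an assumed threshold)."""
    return relative_difference(reference, other) > tolerance


# Hypothetical month-end subscriber counts from the two dashboards.
kpi_dashboard_count = 10_000
other_dashboard_count = 8_300

gap = relative_difference(kpi_dashboard_count, other_dashboard_count)
print(f"{gap:.1%}")  # prints 17.0%
print(needs_investigation(kpi_dashboard_count, other_dashboard_count))  # prints True
```

Which dashboard serves as the reference changes the percentage, so the base should be stated whenever a figure like this is quoted.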
Initial Action
I traced the difference back to the queries referencing different schemas. However, due to lack of time and competing priorities, I was not able to investigate the source code fully. As the data scientist who created the subscriber KPI dashboard is on holiday, I will ask others who might be familiar with how the two definitions of subscribers differ, and then decide which numbers are correct to use.
Lesson
Inconsistent KPI definitions take up substantial time from team members. Having to reconcile different queries, and the data obtained from them, often leads to (i) easily avoidable mistakes and (ii) lost time which could have been used on higher-value tickets. Where possible, teams should agree on tight definitions of key metrics and have a single unit (individual or team) own the ground truth for each important metric. This will make tracking and evaluation more consistent, timely and efficient.
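One way to picture "a single unit owns the ground truth" is a small registry in which each metric name maps to exactly one definition and one owner, and a second, competing definition is rejected. This is only a sketch under assumptions: the class names, the owner, the description and the query text are hypothetical and do not describe an existing tool or our actual schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str         # canonical metric name used by every dashboard
    owner: str        # individual or team that owns the ground truth
    description: str  # plain-language definition of what is counted
    query: str        # the single agreed query (placeholder text here)


class MetricRegistry:
    """Holds at most one definition per metric name."""

    def __init__(self) -> None:
        self._metrics: dict[str, MetricDefinition] = {}

    def register(self, metric: MetricDefinition) -> None:
        existing = self._metrics.get(metric.name)
        if existing is not None:
            # A second definition of the same metric is exactly the problem
            # described above, so refuse it and point to the current owner.
            raise ValueError(
                f"'{metric.name}' is already defined and owned by {existing.owner}"
            )
        self._metrics[metric.name] = metric

    def get(self, name: str) -> MetricDefinition:
        return self._metrics[name]


# Hypothetical usage: every dashboard looks the definition up instead of
# writing its own query against whichever schema is convenient.
registry = MetricRegistry()
registry.register(
    MetricDefinition(
        name="month_end_subscribers",
        owner="data-science",  # assumed owner
        description="Subscribers counted at month end (definition to be agreed)",
        query="<agreed query goes here>",  # placeholder, not a real query
    )
)
print(registry.get("month_end_subscribers").owner)  # prints data-science
```

The point of the design is that disagreement surfaces when a definition is registered, rather than later when two dashboards are compared.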