I’ve been working on the central data platform team at ShareChat for almost 3 years now, and it never ceases to surprise me how many people don’t fully grasp what a data platform is, or what role a product person plays in its development. So let’s cut through the misconception that a data platform is merely a tech stack, and instead define and understand it as a product. Because at its core, the data platform is a product (or, as I like to put it, a portfolio of products) that delivers value to hundreds of users and powers the actual user-facing products.
As new products and technologies emerge and user bases expand, the volume, velocity and variety of data are growing at an unprecedented pace. Just as we now use apps for everything, from meal prepping to finding flatmates, using data to make decisions has become essential. We collect data from our users to serve them better and improve their experiences.
The data platform is how and where data is ingested, stored, transformed, accessed and served back, at scale. Some obvious questions arise here: “Who are the users of a data platform, and do we understand them?”, “What is it that we’re trying to serve?”, “What are the goals and north star metrics of a data platform team?” and “Is an investment in data this big really worth the time, money and energy?” Let’s tackle each of them.
Data Platform’s Users
Beyond just building technically sound products that align with company objectives, it’s important for any product team to prioritize their users’ needs. But wait, who are the users of the data platform?
Data analysts, engineers, data scientists, operations associates and leadership alike are our users. Simply put, everyone who needs and uses data is a user of the data platform. We can look at it as a horizontal platform (as opposed to a vertical) that provides the foundation for everyone in the organization to build upon. Whether they are writing queries, scheduling data pipelines, monitoring processes, or troubleshooting issues with pipelines or with the data itself, our users have core needs that revolve around data. It goes without saying that our users are tech-savvy and demanding in their requirements. Staying a step ahead of them means anticipating what data they might need, how we can help orchestrate it, which technologies would best suit their use cases, and so on.
As a product manager, it’s important to understand the user personas and how to enable and help them succeed with data. This means not only providing the right tools and technologies but also creating an intuitive, seamless experience that empowers users to leverage data effectively. Because ultimately, the success of the data platform is measured by the success of its organization & its users :)
Data Platform as a portfolio of products
Now that we know who our users are, what their intents are and what they need, it’s important to understand how they leverage data: through data products. A data platform, at the end of the day, is a portfolio of data products that work together to provide value.
A data platform may include the following (don’t forget that each of these can be built in-house or purchased, depending on the value generated versus the time the company has to spare):
- Data Storage - We need somewhere to house vast amounts of data: structured or unstructured, raw or aggregated, user-generated or modeled. This could be a data warehouse, a data lake or a hybrid of the two, a lakehouse. Plain data storage. Eg: BigQuery - we use BQ to store raw event data as well as aggregated data, which is used by analysts & engineers.
- Data Integration, Aggregation, Transformation - Data engineers use these tools to consolidate data from various sources into a unified system (eg: Apache for integration). Raw data then needs to be transformed into a usable shape and format. Eg: Apache Spark, Beam, Google Dataflow.
- Query Engine - Analysts use query engines to interact with data storage and run complex SQL queries. For example, they might use BigQuery to write & run complex queries on datasets (which are also stored inside BQ). The perk of serverless engines like BQ or Snowflake is that we don’t need to manage anything running on the backend; we can simply write & run a query in an intuitive UI. More on how to keep your warehouse efficient, here.
- BI Tools - Who doesn’t need fancy dashboards to make quick data-driven decisions? Business leaders, analysts and engineers alike use BI tools to visualize data and derive insights. Dashboards are built in a BI UI, and queries run against the warehouse / query engine to fetch the data that populates them. Colourful & fancy-looking dashboards powered by data. Eg: MS PowerBI, Tableau, Looker, Superset, Redash, Metabase.
- Data Quality & Observability - Data engineers need tools to ensure the reliability and accuracy of data pipelines. Teams can set up validation tests that automatically detect anomalies or discrepancies and alert users, ensuring that downstream data remains accurate and trustworthy. There are tons of tools that can help you do this, both open-source and managed: Great Expectations, Monte Carlo Data, Databand.io etc. More on data quality here.
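To make "validation tests that detect and alert" more concrete, here's a minimal, library-free Python sketch of two common checks. The first is a null-rate threshold on a column. The second is a row-count anomaly check that compares today's volume against recent history using a z-score. The function names, the thresholds and the `alert` callback are all hypothetical and only serve as an illustration. Tools like Great Expectations or Monte Carlo offer much richer versions of these checks through their own APIs, which this sketch does not imitate.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable, Iterable


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_null_rate(rows: list[dict], column: str, max_null_rate: float) -> CheckResult:
    """Fail if the share of rows where `column` is missing or None exceeds max_null_rate."""
    name = f"null_rate:{column}"
    if not rows:
        return CheckResult(name, False, "no rows to check")
    nulls = sum(1 for row in rows if row.get(column) is None)
    rate = nulls / len(rows)
    return CheckResult(name, rate <= max_null_rate,
                       f"null rate {rate:.2%} (max {max_null_rate:.2%})")


def check_row_count(today_count: int, history: list[int], max_z: float = 3.0) -> CheckResult:
    """Flag today's row count if it is more than max_z standard deviations
    away from the mean of recent daily counts (a simple z-score test)."""
    if len(history) < 2:
        return CheckResult("row_count", True, "not enough history; skipped")
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all counts as an anomaly.
        return CheckResult("row_count", today_count == mu,
                           f"expected exactly {mu}, got {today_count}")
    z = abs(today_count - mu) / sigma
    return CheckResult("row_count", z <= max_z, f"z-score {z:.2f} (max {max_z})")


def run_checks(results: Iterable[CheckResult], alert: Callable[[str], None]) -> list[CheckResult]:
    """Send an alert for every failed check and return the failures."""
    failures = [r for r in results if not r.passed]
    for r in failures:
        alert(f"[data quality] {r.name} failed: {r.detail}")
    return failures


if __name__ == "__main__":
    events = [
        {"user_id": 1, "event": "view"},
        {"user_id": 2, "event": "like"},
        {"user_id": None, "event": "share"},
        {"user_id": 4, "event": "view"},
    ]
    checks = [
        check_null_rate(events, "user_id", max_null_rate=0.10),
        check_row_count(today_count=4_000, history=[10_000, 10_200, 9_900, 10_100]),
    ]
    # Here alert just prints; in practice it might post to a chat or paging tool.
    run_checks(checks, alert=print)
```

In this example both checks fail: a quarter of the rows are missing `user_id`, and today's count is far below the recent average. As a result, two alerts are printed. A z-score is only one simple choice of anomaly test. Seasonality-aware baselines are common for event data that varies by weekday.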
By developing these components as individual products within the data platform, we can ensure the success of our users & org. While it’s possible to purchase these solutions as managed services, much of this infra can also be built in-house. However, there’s no need to build it all from scratch. This brings us to a common dilemma for PMs: the build vs. buy decision. Often, a hybrid approach is most effective, leveraging third-party tools for foundational components while developing custom solutions for specialized needs.
Data Platform’s Goals & North Star Metrics
By now, it isn’t difficult to guess that the success of a data platform comes from the success of the organization. That’s both good and bad news. The bad news is that it puts you, as a PM, in a tricky position: it becomes difficult to justify your impact / existence compared to the revenue-generating / user-facing divisions of your org.
The good news is that being on a horizontal team that manages data / tech for the whole organization gives you significant leverage and a unique advantage to impact cost (server / data cost). We’re in a world where anything can be bought, and this herd mindset has led us to buy & pay for literally everything, so there’s always scope for improvement: even a teeny-tiny optimization can lead to huge cost savings. I’ll come back with the many, many stories I have to tell just about cost savings! (coming soon :)).
Back to the goals & metrics that data platform teams work towards:
- Cost Savings
  - Goal - Manage performance vs cost optimization. Reduce server / data cost by x%.
  - Metrics - Establish cost across the various data products, eg: storage cost, compute cost, dashboarding cost etc. Trust me, we spend 90% of our time driving initiatives that lead to cost savings, and it’s 100% worth the effort.
- Performance
  - Goal - Enable & increase the productivity of engineers / analysts, and ensure scalability to handle growing data volumes and complex queries efficiently.
  - Metrics - Query runtime, compute cost optimization
- Data Quality
  - Goal - Maintain high data accuracy, consistency, and integrity across all datasets.
  - Metrics - Data quality score based on accuracy, completeness, and timeliness; reduction in data errors and inconsistencies
- Data Democracy, if you aren’t there yet. Read more here.
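The post doesn't prescribe a formula for the data quality score. One way it could be built is sketched below in minimal Python. The sketch assumes that each dimension (accuracy, completeness, timeliness) is normalised to a 0-1 ratio and that the three are combined with a weighted average. Everything specific here is a hypothetical illustration, not ShareChat's actual scoring: the field names, the "rows passing validation rules" proxy for accuracy, the linear lateness decay for timeliness and the default 0.4 / 0.3 / 0.3 weights.

```python
from dataclasses import dataclass


@dataclass
class DatasetStats:
    total_rows: int          # rows delivered for the period
    rows_passing_rules: int  # rows that passed validation rules (accuracy proxy)
    required_cells: int      # total_rows x number of required columns
    non_null_required: int   # required cells that are actually populated
    minutes_late: float      # lateness vs the dataset's delivery SLA (0 if on time)
    grace_minutes: float     # lateness at which timeliness falls to 0


def quality_score(s: DatasetStats,
                  weights: tuple[float, float, float] = (0.4, 0.3, 0.3)) -> dict[str, float]:
    """Combine accuracy, completeness and timeliness (each in [0, 1]) into a weighted score."""
    accuracy = s.rows_passing_rules / s.total_rows if s.total_rows else 0.0
    completeness = s.non_null_required / s.required_cells if s.required_cells else 0.0
    if s.grace_minutes > 0:
        # Linear decay: on time -> 1.0, grace_minutes late (or worse) -> 0.0
        timeliness = min(1.0, max(0.0, 1.0 - s.minutes_late / s.grace_minutes))
    else:
        timeliness = 1.0 if s.minutes_late <= 0 else 0.0
    w_acc, w_comp, w_time = weights
    score = (w_acc * accuracy + w_comp * completeness + w_time * timeliness) / (
        w_acc + w_comp + w_time)
    return {"accuracy": accuracy, "completeness": completeness,
            "timeliness": timeliness, "score": score}


if __name__ == "__main__":
    daily_events = DatasetStats(total_rows=1_000, rows_passing_rules=980,
                                required_cells=3_000, non_null_required=2_940,
                                minutes_late=30, grace_minutes=120)
    print(quality_score(daily_events))
```

With these inputs, accuracy and completeness both come out at 0.98 and timeliness at 0.75, which gives a score of about 0.91. Tracking the score per dataset over time provides a single trend line to report against. Keeping the components alongside it shows which dimension to dig into when the score drops.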
Let’s bring it all together!
Having been part of the data platform journey here at ShareChat, I’ve grown alongside our data platform. Its evolution here was much like navigating uncharted waters. I’ve seen the data platform through everything since its inception: wading through a mess, realizing we were spending too much, building & buying data products, achieving insane cost savings, fuelling user productivity, and finally positioning it not as just a tech stack but as a well-aligned, user-centric portfolio of products.
After all, the real magic happens when data becomes more than just tools, numbers and charts, and becomes the foundation upon which products are built!