Is data mesh dead, or just misunderstood?
In the very first episode of Data Pipeline, Ilya lights up his pipe and unpacks the hype, and the reality, around data mesh. Once a buzzword on every job ad, the concept promised to fix bottlenecks by decentralizing data teams. But what went wrong? Why did so many implementations fail? Ilya explores the four core principles of data mesh, shares real-world stories, and explains why the real challenge isn't technology, it's people. If you're thinking about data mesh, this episode might save you some trouble.
Transcript
Hello, my name is Ilya. I do data engineering and I like smoking pipes. So I thought, why don't I combine those two things and make a blog about data engineering where I can share my thoughts and smoke. And today, in my first ever episode, I would like to talk about data mesh, an idea which was quite popular for many years, I would say, but recently I somehow don't hear about data mesh as often as I used to. Let's say five years ago, every single job description in the data area mentioned data mesh, and every company seemed to be sure they needed it.
The idea totally makes sense, as centralized data teams became a bottleneck in many companies. It was somehow natural to think about decentralizing, and the idea, which was I think first raised in 2019, is based on four principles. So first, it's about domain-oriented ownership, where if you produce data, you own it. So the ownership of the data moves to domain teams, while in the old setup there was a big centralized data team which kind of took care of it. The second principle is "data as a product". So you start treating data as any other product. You assure its quality, its usability, its accessibility. It's not something on top. It's actually part of your product. The third basic principle is self-serve data infrastructure: in order to enable your teams to build their own data products, you need to give them a platform. And last but not least, federated computational governance. So you still need data governance, but you don't want your governance to become a bottleneck.
Sounds great. I remember I spoke to a company which had three people in their data team, an engineer and two data analysts, and they told me they were building data mesh. I was like, wow, interesting. How are you guys planning to decentralize three people? So I guess even though the idea is good, it's not a silver bullet, and the problem is, as always, implementation.
Somehow it's similar to moving from a monolithic backend to microservices: you decentralize, you distribute, but then you get quite a lot of complexity in connecting those services with each other. Another analogy which comes to my mind is that similar changes happened, I think, to quality assurance and DevOps teams a bit earlier. At some point centralized quality assurance teams were dissolved in most companies and quality engineers moved to domain teams. The same has happened to DevOps, so maybe it's just data people's turn.
It all sounds great, but in real life I saw attempts to implement data mesh fail in multiple companies, and I would say the main reason was that people focused too much on technology and left the social component behind.
In my opinion, data mesh is 70% people, communication, and the way people work, and just 30% tools and technologies. So if you want to move your data people and embed them into product teams, you need to change the way those teams work. It's not like implementing or rolling out another database or another ETL framework. It's so much more. What also happened, I believe, is a cargo cult: people thought, the big guys are doing data mesh, we will do the same, and without understanding the principles, without actually doing the hard parts of it, they just focused on the simplest part: let's move data analysts to product teams and call it data mesh. I'm afraid it doesn't work like this.
Summarizing, I would say the idea of decentralizing your data team is great. I really believe that putting your data people close to the domain and helping them understand the business problems is crucial, and probably the only way to avoid the data team becoming a bottleneck. But at the same time don't forget you are dealing with people, and don't forget it won't solve all your problems. So if you have communication issues, they won't magically disappear if you call your new setup data mesh.
Thanks a lot for joining me today. I'm very interested in your ideas and experiences around data mesh, so please reach out and share any feedback, and hopefully see you in the next episodes.