IOblend Data Mesh – power to the data people! Analyst engineering made simple
Hello folks, IOblend here. Hope you are all keeping well.
Welcome to the IOblend blog page on ChamberMK. We are the creators of the IOblend advanced DataOps platform.
Over the many (many!) years, we have gained experience and insight from the world of data, especially in the data engineering and data management areas. Data challenges are everywhere and happen daily, so our mission is to keep solving them with the latest tech and know-how so that you won’t have to. We want to democratise working with data, make it simple to engineer and govern, and just generally let you, data folks, get on with working on the data insights themselves rather than the inner workings of engineering it all together.
What we want to do in this section is to share some of the best practices, tips and tricks, or just cool ways of doing things with DataOps (and our platform, naturally). We want to show you simpler and better ways of doing the things you do every day. But we do not want to make the blog overly taxing to digest or deeply technical (that would defeat the whole purpose of what we are promoting!).
In our first blog post here, we will discuss the benefits of implementing a data mesh architecture in your organisation.
Companies are increasingly leaning towards self-service data authoring. Why, you ask? It is because the prevailing monolithic data architecture (no matter how advanced) does not offer an easy way to manage the growing data needs of your organisation. Centralised data processing and management make it difficult to meet your business demand – you need a sizable engineering team to handle requests and manage the entire ETL cycle. On top of that, the engineering team does not necessarily possess sufficient knowledge of the qualities of the data inputs/outputs beyond the prescribed SLAs. In other words, they do not “own” the data, but merely process it for the owner teams.
Today’s data lakes, lakehouses and data warehouses still represent a centralised architecture pattern. Although very powerful, they are predominantly deployed to support the “centralised” thinking – a complex ecosystem operated by a team of highly skilled engineers who strenuously try to meet the data demands of the business.
However, in a world of ever-growing data sources, data products and user needs, the monolithic architecture is not proving ideal. As the demand grows, new use cases require different types of transformations and associated management policies, putting an increasingly heavy load on the platform engineering resources. This eventually leads to delivery bottlenecks, with unserved, frustrated data consumers and overutilised, disillusioned data platform engineering teams.
Here is just one of many examples of the centralised architecture pattern we have encountered on our journey. This organisation was attempting to develop a complex data analytics product to help them improve efficiency and profitability. The project required a large number of distinct “hardened” data pipelines scripted by two dozen skilled engineers and was going to take six months to deliver. Unfortunately, as they costed up the project, it became painfully clear that it was prohibitively expensive to implement using the centralised approach. The costs outweighed the product benefits, so the management were not prepared to sign it off.
It is highly likely that your organisation is using some form of a centralised architecture. After all, it is the norm. You are working very hard at reducing the bottlenecks by growing/upskilling your engineering teams and buying various tools to help you in the effort. But what you are essentially doing is applying brute force to overcome the issues of an inherently inefficient architecture – it will never be ideal, no matter how much resource you throw at it. You can have the sleekest infrastructure on the planet, but the centralised approach to data will still be your biggest bottleneck. You can scale your infrastructure, but you will always struggle to scale your central pipeline.
This is where a federated, domain-driven data architecture pattern, known as Data Mesh, gives the business a better option: robust, quality data with business areas handling their own dataflows. The term was coined by Zhamak Dehghani, a ThoughtWorks Director of Emerging Technologies, and the data mesh architecture is now starting to attract a lot of attention. Data meshes address the shortcomings of the centralised architecture by shifting data ownership to the subject matter experts (aka data domains). Data meshing unlocks much greater data experimentation and innovation by the business while significantly lessening the technical burden on the central data engineering teams.
The benefit is clear: let the subject matter experts (SMEs) process and manage their own data (to an agreed set of enterprise standards) and supply it to the rest of the data consumers as a “product”. They know their data better than anyone else and are best placed to be the custodians of it. At the same time, they can quickly experiment, scale and augment their data as the business demand evolves. There is no longer a need to load up your central data team with a multitude of requests, and no more bottlenecks.
This means that the data domains themselves now do the data engineering, management and governance for their own data. The location where the data physically resides is a technical decision and can be central or federated.
The key outcome is that the data SMEs now “own” their data domain. They themselves will write their dataflows, apply standardised governance policies of the business, and manage data quality end to end. There is no longer a central backlog to add your new jobs to and no need to wait for someone else to execute your dataflows.
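To make the idea of domain ownership a little more concrete, here is a minimal Python sketch of a domain team publishing a data product that has to pass an enterprise-wide standard first. It is an illustration only, not how IOblend or any particular platform works: the names `DataProduct`, `missing_columns` and `publish`, and the single "required columns" rule, are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

Row = Dict[str, object]


@dataclass
class DataProduct:
    """A dataset owned and published by one domain (hypothetical structure)."""
    name: str
    owner_domain: str
    required_columns: Set[str]          # the agreed enterprise standard
    rows: List[Row] = field(default_factory=list)


def missing_columns(product: DataProduct) -> Set[str]:
    """Return every required column that is absent from at least one row."""
    missing: Set[str] = set()
    for row in product.rows:
        missing |= product.required_columns - set(row)
    return missing


def publish(product: DataProduct, registry: Dict[str, DataProduct]) -> None:
    """Publish the product for other domains, but only if it meets the standard."""
    problems = missing_columns(product)
    if problems:
        raise ValueError(
            f"{product.name} is missing required columns: {sorted(problems)}"
        )
    registry[product.name] = product


# Example: the (hypothetical) sales domain publishes its own product.
shared_registry: Dict[str, DataProduct] = {}
sales = DataProduct(
    name="sales_daily",
    owner_domain="sales",
    required_columns={"date", "amount"},
    rows=[{"date": "2024-01-01", "amount": 120.0}],
)
publish(sales, shared_registry)
```

The point of the sketch is the split of responsibilities: the rule is agreed once for the whole business, while the domain that knows the data decides what to publish and when.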
In a federated architecture, domains will interact with each other directly, easily sharing relevant data, insights and knowledge. This domain networking greatly increases organisational productivity and efficiency, and reduces cost.
Back to our earlier example. For the data project to succeed, the company had to think “outside the box”. The only way they could pull it off was by allowing the analysts to work with the data directly and create hardened data pipelines themselves. In a way, this was a form of data mesh. The analysts knew what data was needed, where and when, better than any data engineer ever could. All they needed was the capability to do the engineering themselves. Since at the time they still lacked advanced DataOps tools like IOblend, they had to use a few data engineers to help develop and productionise the pipelines with conventional apps. Even then, the benefits were impressive: a 50% reduction in resource demand, a 45% reduction in project timelines and cost, and a much more engaged and collaborative project team.
In the end, this project was made possible by applying a federated approach to solve a complex and expensive data engineering challenge efficiently.
One of the challenges with the data mesh pattern is that data scientists and analysts are not engineers. Developing production-grade dataflows is a very specialised skillset. It is not normally possible to train up your analysts to do full-on data engineering, which puts proper data meshes in the realm of aspiration at the moment. You can embed data engineers within the SME domains, but you will again have a single point of execution, just on a narrower scale. What you ideally need is the analysts fully creating and managing dataflows themselves, with no middle layers. This is true data democratisation.
That is where IOblend comes in. We have created a powerful DataOps platform that naturally facilitates the creation of a data mesh, truly supporting data democratisation. IOblend enables data analysts and scientists to create production-grade data pipelines, data assets and governance through automation, without having to build up data engineering skills or rely on data engineers.
IOblend also addresses another major data mesh complexity – the potential duplication of effort and skills needed to maintain dataflows and infrastructure in each SME domain. Federating data domains can lead to siloed teams if not carefully executed. To avoid this unfortunate outcome, we suggest utilising a central platform that handles the dataflow clusters, storage and streaming infrastructure, which makes unified oversight and maintenance easier. At the same time, dataflow assets should also reside in a central repository so that all domains have access to the prior work and shared knowledge.
An important tenet of a data mesh is collaboration amongst domains to ensure siloed development does not happen. IOblend can help here as well. Our platform automatically creates an easily searchable catalogue of all data assets created, whether that’s a data pipeline, a table, file etc. This encourages reuse and avoids duplication. We avoid silos by having the full observability out of the box.
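As a rough illustration of why a searchable catalogue encourages reuse, here is a small Python sketch of an in-memory catalogue that any domain can register assets in and search. This is not the IOblend catalogue or its API: the `Asset` and `Catalogue` classes, their fields and the simple keyword search are hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Asset:
    """One catalogued data asset (hypothetical fields)."""
    name: str
    kind: str          # e.g. "pipeline", "table" or "file"
    domain: str        # the domain that owns the asset
    description: str


class Catalogue:
    """A shared, searchable list of assets across all domains."""

    def __init__(self) -> None:
        self._assets: List[Asset] = []

    def register(self, asset: Asset) -> None:
        self._assets.append(asset)

    def search(self, term: str, kind: Optional[str] = None) -> List[Asset]:
        """Case-insensitive keyword match on name or description."""
        needle = term.lower()
        return [
            a for a in self._assets
            if (kind is None or a.kind == kind)
            and (needle in a.name.lower() or needle in a.description.lower())
        ]


# Example: two (made-up) domains register assets in the same catalogue.
catalogue = Catalogue()
catalogue.register(Asset("bookings_clean", "table", "sales", "Cleaned booking records"))
catalogue.register(Asset("bookings_ingest", "pipeline", "operations", "Loads raw bookings"))

# An analyst checks what already exists before building something new.
existing_tables = catalogue.search("booking", kind="table")
```

Before starting a new dataflow, an analyst in one domain can look up what other domains have already built, which is what keeps a federated setup from drifting into silos.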
IOblend gives your data teams a common, domain-agnostic and automated capability to achieve full data standardisation, data lineage, data quality, alerting and logging – all with one platform. These features, together with the simplicity of implementation and use, make it a “must try” product for any organisation considering going the data mesh route.