Before we can answer that question, we need to understand what data quality is. If you google “Data Quality”, you will probably come across the seven dimensions of data quality: uniqueness, completeness, consistency, relevance, precision, conformity and timeliness.
Data quality, in other words, is not a single metric but a combination of several dimensions. So why are we not using these dimensions in the BI world?
I believe the reason is that most BI landscapes have evolved over several years, perhaps even decades. Most of them probably started as departmental initiatives and have since grown into the core of the company’s decision-making process. Historically, data quality has been on the IT agenda for quite some time, but because modern BI tools evolved from the business side, it has not been a natural part of their development.
How can we change this going forward?
In my experience, the best way to move forward with data quality initiatives is to focus on the things that will make the most difference. Ask yourself “Do I have good governance of my data?” or “Do I have a proactive testing strategy for my data?”. Just because data governance (and, by extension, data quality) wasn’t a top priority when your organization first began building analytics applications and reports doesn’t mean it shouldn’t be one now.
I have met companies that decided not to implement a governance process because they were afraid of what they might discover. If it were my company, I would rather know what I am dealing with than sweep it under the rug. Once we know what’s hiding under the surface, we can slowly start working our way towards our target governance structure.
The same logic applies to testing your data. The right approach is not to test all your data containers and integrations at once, but to identify your weakest point and start there. I can promise you that this will enormously increase the trust end users have in their data.
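As a minimal sketch of what “start with your weakest point” might look like, assume the most critical table can be exported as a list of rows (dictionaries). The Python below tests just two of the dimensions listed earlier, uniqueness of a key column and completeness of required columns. The column names (`customer_id`, `email`), the helper names and the 100% threshold are hypothetical illustrations, not NodeGraph’s implementation.

```python
from typing import Any, Mapping

Row = Mapping[str, Any]


def completeness(rows: list[Row], column: str) -> float:
    """Share of rows where `column` is present and not None or empty."""
    if not rows:
        return 1.0
    filled = sum(1 for r in rows if r.get(column) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows: list[Row], column: str) -> float:
    """Share of non-missing values in `column` that are distinct."""
    values = [r.get(column) for r in rows if r.get(column) not in (None, "")]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


def run_checks(rows: list[Row], key: str, required: list[str],
               min_completeness: float = 1.0) -> list[str]:
    """Return human-readable failures; an empty list means all checks passed."""
    failures = []
    if uniqueness(rows, key) < 1.0:
        failures.append(f"{key}: duplicate values")
    for col in required:
        score = completeness(rows, col)
        if score < min_completeness:
            failures.append(f"{col}: {score:.0%} complete")
    return failures


if __name__ == "__main__":
    # Hypothetical extract of a critical customer table.
    rows = [
        {"customer_id": 1, "email": "a@example.com"},
        {"customer_id": 2, "email": ""},
        {"customer_id": 2, "email": "c@example.com"},
    ]
    print(run_checks(rows, "customer_id", ["customer_id", "email"]))
```

Run on the sample rows, this reports the duplicated `customer_id` and the missing email. Starting with one table and two checks keeps the effort small while still catching the errors end users are most likely to notice.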
“We don’t have time for yet another process!”
“We don’t have time…”. This is an argument I frequently hear when data quality comes up on the agenda. I understand: your team doesn’t have unlimited time, and the time you do have goes into developing new KPIs and supporting your BI users. That’s why it’s so important to find a process and a way of working that won’t slow you down but will instead give you more time.
Let me give you an example (true story 😊). I once worked with a company that had a developer who logged into the most critical BI application every Monday morning just to make sure that the data was up to date. Trust me, this is still a very common way of working with data quality in the BI space. We are reactive, not proactive! Imagine a reality in which we instead invest 10 hours today and save 2 hours every Monday; that investment pays for itself after five weeks. On top of that, we at NodeGraph have automated the testing process, so you don’t have to rely on manual work. Again, focus on where the pain points are today.
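The Monday-morning login is essentially a manual timeliness test. Here is a minimal sketch of the same check automated in Python, assuming you can read a last-load timestamp for each table (where that timestamp comes from, for example a load log, is an assumption and not something described here). The table names and the 24-hour threshold are hypothetical, and this is not how NodeGraph implements its testing.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(last_loaded: datetime, max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    """True if the data was loaded within `max_age` of `now`.

    Expects timezone-aware datetimes; mixing naive and aware values raises TypeError.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_loaded <= max_age


def stale_tables(last_loads: dict[str, datetime], max_age: timedelta,
                 now: Optional[datetime] = None) -> list[str]:
    """Sorted names of tables whose last load is older than `max_age`."""
    return sorted(name for name, ts in last_loads.items()
                  if not is_fresh(ts, max_age, now))


if __name__ == "__main__":
    # Hypothetical last-load timestamps, checked at 07:00 UTC on a Monday.
    now = datetime(2024, 1, 8, 7, 0, tzinfo=timezone.utc)
    last_loads = {
        "sales": datetime(2024, 1, 8, 2, 0, tzinfo=timezone.utc),
        "inventory": datetime(2024, 1, 5, 2, 0, tzinfo=timezone.utc),
    }
    print(stale_tables(last_loads, timedelta(hours=24), now))
```

Scheduled to run before business hours, a check like this flags the stale `inventory` table without anyone having to log in, which is the shift from reactive to proactive described above.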
My tips for achieving better data quality:
- Focus on governance
- Add data testing into your operations process
- Move from reactive to proactive
- Start small, get big results!
Data quality has not been high on the agenda in the BI world, most likely because the BI landscape evolved from the business side. If you would like to start working with data quality, focus on governance and data testing. And most importantly, start small and start now!