The “whats, whys, and hows” of data quality management
Let’s begin with a short disclaimer – reaching a perfect level of data quality management is nearly impossible. Yes, with a lot of time and money, a close second might be achievable – but chasing perfection isn’t ordinarily recommended, because it wouldn’t be an efficient use of your resources. So what should you do in order to reach an acceptable (i.e. good) level of data quality management? And how do you actually define quality?
How do you define data quality?
There are many different ways to define data quality, but six data quality dimensions are widely deemed the most essential. These are as follows:
- Completeness (how complete is your dataset?)
- Uniqueness (are there any duplicates within your data?)
- Timeliness (is your data available when you need it?)
- Validity (does the data fit the defined range and definition?)
- Accuracy (is the data an accurate representation of the element it describes?)
- Consistency (is your data synchronized across your organization?)
Tailoring your quality guideline to your organization
However, when it comes down to globally recognized metrics – there are none. Instead, we are yet again faced with the subjective nature of the term “quality”. So where do we go from here? Well, due to the aforementioned subjectivity, we recommend that each and every organization decide upon (and follow) a strict quality guideline that is suited to their specific circumstances and the data that they handle. This way, you are able to ensure that the data that is being used to derive insight and determine business decisions is, in fact, reliable.
For example, if your organization engages heavily in customer contact and outreach, you might choose to focus your energy on ensuring high uniqueness, timeliness, and validity. Uniqueness because you don’t want to fall into the trap of sending duplicate messages to your customers. Timeliness because you want to make sure that all of the contact information that you have is up-to-date. And validity because it is important that you indeed have a valid telephone number, email address, etc.
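To make the customer-contact example concrete, checks for these three dimensions can be sketched as simple scoring functions that each return a score between 0 and 1. This is a minimal illustration, not a description of any particular product’s functionality; the record fields, the basic email pattern, and the one-year freshness window are all assumptions you would replace with your own guideline.

```python
import re
from datetime import date

# Hypothetical contact records; the field names are illustrative assumptions.
contacts = [
    {"email": "ana@example.com", "updated": date(2024, 1, 10)},
    {"email": "ana@example.com", "updated": date(2024, 1, 10)},  # duplicate
    {"email": "not-an-email", "updated": date(2020, 3, 2)},      # invalid and stale
]

# Deliberately simple pattern - real validation rules belong in your guideline.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def uniqueness(records):
    """Share of records that are not duplicates on the email key."""
    seen, dupes = set(), 0
    for r in records:
        if r["email"] in seen:
            dupes += 1
        seen.add(r["email"])
    return 1 - dupes / len(records)

def validity(records):
    """Share of records whose email matches the defined pattern."""
    valid = sum(1 for r in records if EMAIL_RE.match(r["email"]))
    return valid / len(records)

def timeliness(records, max_age_days=365, today=date(2024, 6, 1)):
    """Share of records updated within the allowed freshness window."""
    fresh = sum(1 for r in records if (today - r["updated"]).days <= max_age_days)
    return fresh / len(records)
```

Scores like these can then be tracked over time, so you notice when a dimension you care about starts to slip rather than discovering it in a failed campaign.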
This ties in neatly with our self-constructed definition of data quality, highlighting the notion that data quality can look different depending on the intended use of the data in question:
“High data quality means having data that is up-to-date, complete, and a true reflection of the reality that is being analyzed. A high level of quality further indicates meticulous data governance – i.e. control over the data and its sources.”
Data quality management looks different for everyone – what’s important is getting started
We’ve said it before and we will say it again – the most important step in optimizing data quality management is getting started. The key is to ask yourself what data quality dimensions are most vital within your organization and start building a structure around that. Yes, it may seem overwhelming and no, there is no clear-cut way to do this – but the problems that come with low-quality data will continue to build up if you do not take action.
(What problems? Find out more in our guide “How much is poor data quality costing you?”.)
But before you decide to go it alone – we’d love to help. NodeGraph is a data quality platform for QlikView and Qlik Sense that utilizes features such as Automated Testing and Data Lineage Visualization to help you understand and trust your Qlik environment.