NodeGraph explains Data Quality [Data Quality Management, Concepts & Issues]
Welcome to the first episode of “NodeGraph explains”! In this series, we will tackle a range of data subjects presented through good old-fashioned whiteboard lessons. Our team here at NodeGraph will take turns presenting everything from data lineage to data scalability. Oh, and please let us know if there is something specific that you would like to see next – we are always open to suggestions!
Learn more about data quality in our first lesson
Our first episode focuses on data quality and will take you through prominent data quality concepts, issues, and benefits. Ellen will also help you on the way to developing a data quality management strategy that suits your organization. To find out more, click play above!
Looking for a transcript?
“Hi everyone and welcome to this episode of NodeGraph explains. Today we are going to look at the concept of data quality – a favorite of ours and hopefully a favorite of yours as well after this session. Let’s start with the definition: high data quality means having data that is a true reflection of reality.
And when we speak of data quality, we usually look at its six dimensions.
These are completeness, uniqueness, timeliness, validity, accuracy, and consistency. Completeness asks whether your dataset is complete: if you are looking at full names, you want both the first name and the last name to be present in the dataset. Uniqueness asks whether there are any duplicates in your dataset, or whether every record is unique. Timeliness is about the time frame of the data: if you are looking at a monthly report, you only want to see data from that period. Validity means the data conforms to the expected format: if you are looking at names, you only want alphabetic characters in that field, not numbers. Accuracy is basically whether your data is correct – a true reflection of reality. And consistency is especially important if you have a large organization displaying data across many departments: you need consistent data throughout the whole organization.
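The six dimensions can be made concrete as simple checks. Here is a minimal sketch in Python over a small, hypothetical customer dataset (the field names, records, and reporting window are illustrative, not from NodeGraph):

```python
# Illustrative checks for four of the six dimensions: completeness,
# uniqueness, timeliness, and validity. (Accuracy and consistency need
# a reference source to compare against, so they are omitted here.)
from datetime import date

records = [
    {"first_name": "Ada", "last_name": "Lovelace", "updated": date(2021, 3, 1)},
    {"first_name": "Ada", "last_name": "Lovelace", "updated": date(2021, 3, 1)},  # duplicate
    {"first_name": "Grace", "last_name": "", "updated": date(2019, 6, 15)},       # incomplete, stale
    {"first_name": "Al4n", "last_name": "Turing", "updated": date(2021, 2, 20)},  # invalid characters
]

# Completeness: both first name and last name must be present.
complete = [r for r in records if r["first_name"] and r["last_name"]]

# Uniqueness: fewer distinct name pairs than records means duplicates exist.
unique_keys = {(r["first_name"], r["last_name"]) for r in records}
has_duplicates = len(unique_keys) < len(records)

# Timeliness: keep only records updated within the reporting window.
window_start = date(2021, 1, 1)
timely = [r for r in records if r["updated"] >= window_start]

# Validity: names should contain alphabetic characters only.
valid = [r for r in records
         if r["first_name"].isalpha() and r["last_name"].isalpha()]
```

Each check is just a predicate over a record, which is what makes these dimensions testable in practice.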
So, why am I talking about data quality?
Because there is an urgency to reaching high data quality. We have numbers here showing that only 3% of companies’ data meets basic quality standards, which is reflected in the 84% of CEOs who worry about data quality – quite a few. And we have this pie chart about the use of data quality management strategies: more than half of companies have either an active or a reactive strategy, around 26% have a proactive strategy, and 18% have an optimized strategy. Here you might wonder – what is an optimized data quality strategy?
There are a few aspects to it, but one thing I really want to tell you is that there is no one-size-fits-all solution – you need to look at your own company and at these dimensions. If, for example, timeliness and accuracy are particularly important for you, then that is what the optimized solution looks like for you. We also need to say a bit about the human-driven versus the software-driven approach to data quality, and I will tell you more about that later.
First of all, I’d like to tell you about the benefits of reaching and maintaining high data quality…
…and there are a lot of them out there, but I’d like to highlight a few. One that is really important is that with high data quality you gain confidence, insight, and trust. Our friend at QlikView Cookbook put it like this: “confidence closes the gap between the CEOs and the ones who are producing the data”. The people producing the data know that it is correct, but when the whole organization is confident that it is correct, you can make decisions based on your data and gain insights in completely new ways. So that’s a really good benefit.
And then we have readiness. Readiness basically means that you are ready if and when something fails in your dataset – you can correct it easily because you know what high data quality means to you.
Let NodeGraph help
As I told you before, there is a software-driven approach to data quality, and it is particularly important if you want the benefit of scalability. Scalability allows you to scale up your datasets without necessarily needing more resources, because you configure the software to do the testing for you. Which leads me to what we at NodeGraph do: we are a data quality platform for both QlikView and Qlik Sense, and we offer a test-driven approach through automated testing, where you can test your data and make sure it always has high quality, is always up to date, and always works as you configured it to.
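To show what “configuring the software to do the testing for you” can mean in general, here is a generic sketch (not NodeGraph’s actual API – the rule names, record fields, and helper function are hypothetical) of a software-driven approach: quality rules are configured once and then run automatically against every new batch of data, so the checks scale with the data rather than with headcount.

```python
# Generic, hypothetical sketch of configurable automated data quality tests.
from typing import Callable, Dict, List

# Each rule is a named predicate over a single record; adding a rule
# is configuration, not new testing effort per batch.
rules: Dict[str, Callable[[dict], bool]] = {
    "has_full_name": lambda r: bool(r.get("first_name")) and bool(r.get("last_name")),
    "name_is_alphabetic": lambda r: str(r.get("last_name", "")).isalpha(),
}

def run_quality_tests(batch: List[dict]) -> Dict[str, int]:
    """Run every configured rule over a batch; return failing-record counts per rule."""
    return {
        name: sum(1 for record in batch if not check(record))
        for name, check in rules.items()
    }

# The same configured rules apply to any batch, of any size.
batch = [
    {"first_name": "Ada", "last_name": "Lovelace"},
    {"first_name": "Grace", "last_name": ""},
]
failures = run_quality_tests(batch)
```

The design point is that the human effort goes into writing the rules once; the software then applies them to every incoming batch automatically, which is what makes the approach scale.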
We also offer automated documentation, where you can see how you set up your Qlik solution – all the way down to the Qlik syntax and the script you used to create it – and you can add comments to it, so it becomes transparent throughout the whole organization. And we have data lineage visualization, so you can see, all the way from the data source to the end-user application or dashboard, how the data was created and what will be affected if something fails in your solution.
This and much more is what we at NodeGraph offer. I do want to tell you that data quality is not simple – it is quite complex – but you need to start asking these questions and start working with data quality. We at NodeGraph are more than happy to help, so please visit our webpage or get in contact with us! And until then, have a great day!”