Unissant

 

Data Quality - An Introduction

A PLAIN WHITE PAPER

What is Data Quality and Why is it Important to Control?

Data Quality is the degree to which a set of data values meets the usage requirements of its various users. In other words: is the data “fit” to be used for its intended purpose(s)?

Therefore, “data quality” is relative to each individual or group of individuals that uses the data. This relativity is what can make the implementation of a robust data quality function a challenging exercise. While one can create measures simply showing how often a data field is populated, its range of values, and how the values are distributed, it is not until business context is applied that these measures become relevant and effective controls can be established.
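
The context-free measures described above (how often a field is populated, its range of values, and how the values are distributed) can be sketched in a few lines of Python. This is a minimal illustration, not a substitute for a data quality tool: the `profile_field` function, the sample `loans` records, and the field names are all hypothetical and were invented for this example.

```python
from collections import Counter


def profile_field(records, field):
    """Return basic, context-free profiling measures for one field."""
    values = [r.get(field) for r in records]
    # Assumption: None and the empty string both count as "not populated".
    populated = [v for v in values if v not in (None, "")]
    fill_rate = len(populated) / len(values) if values else 0.0
    return {
        "fill_rate": fill_rate,
        "min": min(populated) if populated else None,
        "max": max(populated) if populated else None,
        "distribution": Counter(populated),
    }


# Hypothetical sample records, invented for illustration only.
loans = [
    {"loan_id": 1, "state": "VA", "rate": 4.25},
    {"loan_id": 2, "state": "VA", "rate": 3.90},
    {"loan_id": 3, "state": "", "rate": 0.0},
    {"loan_id": 4, "state": "MD", "rate": None},
]

state_profile = profile_field(loans, "state")
rate_profile = profile_field(loans, "rate")
```

Note that the profile only reports facts: the minimum `rate` is 0.0 and one `state` value is missing. Whether a zero rate or a blank state is actually a defect cannot be decided until a data user supplies the business context.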

So, if data quality is in the “eye of the beholder” (the data user), data quality controls are driven by those users’ priorities, processing needs, and rules. Those needs will shift over time and, therefore, so will the controls. An organization that is serious about managing data quality will establish an enduring data quality control function. Why is this important?

Data, the digital building blocks of information, is a valuable corporate asset that outlasts applications and processes. Data is used as input into tactical and strategic decisions. Analysis of data allows organizations to learn and grow. Data is received from, and transmitted to, customers and business partners. Data is reported to investors and regulators and is used by the media. Data flows through every process your organization runs and is used by every individual your organization employs. Data is as important to your organization as your central nervous system is to your body. Data signals when you should move, how fast, and in what direction. If those signals are faulty, poorly managed, or constantly in conflict, you may find yourself paralyzed into inaction or moving in the wrong direction at the wrong speed. So, what steps can you take to ensure your data’s quality is understood and sufficient for your organization?

What Components Comprise a Robust Data Quality Function?

“Quality Control”: an aggregate of activities (such as design analysis and inspection for defects) designed to ensure adequate quality of a target object or output.

A Data Quality function is nothing more than a quality control process applied to the data your organization relies upon. As the above definition implies, a quality control function is a set of activities that must be designed and executed effectively. The activity categories include:

  • Data Profiling – the process of understanding the current state of your data. This is a necessary initial analysis step to provide basic information about your data. This step identifies potential problem spots, spawns follow-up questions, and informs stakeholders on the true (versus perceived) state of the profiled data. Data Profiling provides facts, and it can be accomplished quickly by using a data quality or data management tool.
  • Data Quality Rules – the applicable constraints in relation to the data’s business context. This is where the data rubber starts to meet the business road. Some rules are generic and simply measure how well populated a field is, whether the data is valid for its basic data type, whether the data passes a broad reasonability test, and so on. Other rules are more specific to the industry, company, process, and subject matter context in which the data is used.
  • Data Quality Controls – the rules, processes, and systems used to identify and address the risk that the data does not meet its required level of quality. Controls can be automated, manual, or a mix of both. When implementing controls, consider the mix of control types that can be used:

 

  1. Detective – identify defects but do not stop them from initial processing.
  2. Corrective – fix or enrich data that is defective or deficient.
  3. Preventative – identify and stop defective data from being processed.
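
The difference between the three control types can be illustrated with a short Python sketch that applies one data quality rule in three ways. Everything here is hypothetical: the `is_valid_rate` rule, its 0 to 25 percent bounds, the `fix_scale` correction (which assumes an out-of-range rate was keyed in without its decimal point), and the sample `batch` are assumptions made for illustration, not rules taken from any particular organization.

```python
def is_valid_rate(record):
    """Hypothetical rule: rate must be present and between 0 and 25 percent."""
    rate = record.get("rate")
    return rate is not None and 0 < rate <= 25


def fix_scale(record):
    """Hypothetical fix: assume a rate above 25 is missing its decimal point."""
    rate = record.get("rate")
    if rate is not None and rate > 25:
        return {**record, "rate": rate / 100}
    return record  # No known fix; leave the record unchanged.


def detective(records, rule):
    """Identify defects, but let every record continue to processing."""
    defects = [r for r in records if not rule(r)]
    return records, defects


def corrective(records, rule, fix):
    """Apply a fix to each record that fails the rule."""
    return [r if rule(r) else fix(r) for r in records]


def preventative(records, rule):
    """Stop defective records; only passing records move on."""
    passed = [r for r in records if rule(r)]
    rejected = [r for r in records if not rule(r)]
    return passed, rejected


# Hypothetical batch, invented for illustration only.
batch = [
    {"id": 1, "rate": 4.25},
    {"id": 2, "rate": None},
    {"id": 3, "rate": 425.0},
]

processed, flagged = detective(batch, is_valid_rate)
corrected = corrective(batch, is_valid_rate, fix_scale)
passed, rejected = preventative(corrected, is_valid_rate)
```

In this sketch the detective control flags records 2 and 3 without holding anything back, the corrective control repairs record 3, and the preventative control then stops record 2, which could not be fixed automatically and would need manual follow-up.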

 

When designing controls, consider the following:

  • Where will the controls be placed (implemented) along your information supply chain?
  • In which cases will the control stop data from being processed, versus allowing processing with notification for follow-up?
  • Who will monitor for control exceptions (defects)?
  • If a risk, issue, or defect is identified, how will it be tracked, prioritized, and resolved? Who will do this?
  • How will controls for process-level risk differ from controls for systemic (entity-level) risk? Do you need/want both? Why, or why not?

 

Data Quality Reporting & Metrics

Data quality reports and metrics provide information about data quality levels across a given process, data set, or the enterprise. Reports and metrics are a type of detective control. They inform the parties responsible for ensuring data quality as to how well they are doing their job. They inform data users as to the overall quality of the data. Reports and metrics can be developed to cover a single point-in-time view and to show trends and variance over time. Metrics (facts) on the reports can be summarized, aggregated, and viewed from different perspectives and across various dimensions.
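
As a rough sketch of how one metric can be viewed across dimensions and over time, the Python below aggregates a defect rate first by period and then by data set, and computes the change between consecutive periods. The `results` rows, their layout, and the function names are hypothetical; a real report would draw on the output of your own data quality rules.

```python
from collections import defaultdict

# Hypothetical rule-check results, invented for illustration only:
# (period, data_set, records_checked, records_failed)
results = [
    ("2015-01", "customer", 1000, 50),
    ("2015-01", "product", 500, 5),
    ("2015-02", "customer", 1000, 30),
    ("2015-02", "product", 500, 10),
]


def defect_rate_by(rows, key_index):
    """Aggregate the defect rate along one dimension (0 = period, 1 = data set)."""
    checked = defaultdict(int)
    failed = defaultdict(int)
    for row in rows:
        key = row[key_index]
        checked[key] += row[2]
        failed[key] += row[3]
    # Skip keys with no checked records to avoid dividing by zero.
    return {k: failed[k] / checked[k] for k in checked if checked[k]}


def period_over_period(rates):
    """Change in defect rate from each period to the next, in sorted order."""
    periods = sorted(rates)
    return {cur: rates[cur] - rates[prev] for prev, cur in zip(periods, periods[1:])}


by_period = defect_rate_by(results, 0)    # single metric, time dimension
by_data_set = defect_rate_by(results, 1)  # same metric, data set dimension
trend = period_over_period(by_period)     # variance between periods
```

With these sample figures the overall defect rate falls between the two periods, even though the product data set gets worse. That is the kind of detail that only appears when the same metric is viewed from more than one perspective.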

 

When defining and developing data quality metrics and reports, consider the following:

  • Who will be able to view the reports and what metrics and metric views will they want to see?
  • How often will the reports be refreshed?
  • What is the right mix of pre-canned and ad-hoc reports?
  • Are there existing company quality control metrics that can be leveraged?
  • Can existing operations monitoring techniques used at your organization be leveraged, such as Six Sigma, Total Quality Management (TQM), and Statistical Process Control (SPC)?

What Data Should Have Quality Controls?

 

Ultimately, all organizational data should have some degree of quality control. However, it is not realistic to assume that all data will have the same level of rigor applied. A key initial step on the path to better data quality is to combine the dimensions of criticality, commonality, and quality, and apply them to your organization’s data assets to establish a prioritized list of data categories.

 

Some data is more critical than other data. Its degree of importance is driven by its usage, i.e., the criticality of the processes, reports, or decisions relying upon that data. Each organization will have a different view of its most critical processes and decisions and, therefore, of its most critical pieces of information. To make things more complex, this view will change over time.

 

Additionally, some data is more highly shared. Customer and product data may be used repeatedly across many of your organization’s processes, while other information may be limited to use by a single function. A good indicator of the importance of a data subject to your entire enterprise is its level of sharing or common use. Finally, some data is simply better than other data. Higher quality data typically needs less attention than data that is known to be defective.

 

Once intelligence is gathered, create a prioritized list of processes and/or data categories. Warning: the prioritization process has the potential to drag on if there is unresolved conflict or an overly broad span of coverage. Be sure that basic decision-making roles, processes, and timeframes are in place prior to embarking on this initial step. But most importantly, get started. With each successive data quality project, more data risks and defects will be addressed; you will become more adept at prioritization, control coverage will expand, and you will mature in the art of designing and implementing controls.
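
One simple way to turn the three dimensions into a prioritized list is to score each data category and sort by the result. The sketch below is only one possible approach: the 1 to 5 scales, the equal weighting, the `priority_score` formula, and the sample categories and scores are all assumptions made for illustration, and each organization would need to agree on its own.

```python
from dataclasses import dataclass


@dataclass
class DataCategory:
    name: str
    criticality: int  # 1 (low impact) to 5 (drives critical decisions)
    commonality: int  # 1 (single function) to 5 (shared enterprise-wide)
    quality: int      # 1 (known defective) to 5 (trusted)


def priority_score(category):
    """Hypothetical score: critical, widely shared, low-quality data ranks first."""
    # Quality is inverted so that poorer data earns a higher priority.
    return category.criticality + category.commonality + (6 - category.quality)


# Hypothetical categories and scores, invented for illustration only.
categories = [
    DataCategory("facilities", criticality=2, commonality=1, quality=3),
    DataCategory("customer", criticality=5, commonality=5, quality=2),
    DataCategory("product", criticality=4, commonality=4, quality=4),
]

ranked = sorted(categories, key=priority_score, reverse=True)
```

Here customer data ranks first because it is critical, widely shared, and of relatively poor quality. Because the view of what is critical changes over time, the scores should be revisited with each data quality project rather than treated as fixed.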

 

What are some Best Practices for Improving Data Quality?

Improving data quality is an iterative process by which both the data itself and the controls that manage data quality levels are improved and matured over successive cycles. Below are guiding principles to follow:

  • Understand current data quality levels and practices at a broad, high level; get started in a targeted area, then expand scope.
  • Data Quality is never “done.” It is a continuous improvement process baked into everyday business and IT functions.
  • Data content is owned by the business. Data systems are run by IT. Data Quality depends upon both constituent groups.
  • Don’t drive the effort exclusively from one side or the other.
  • Establish a core competency center in data quality.

 

When forming this center, create a team(s) possessing the following attributes:

  • Strong communication skills and ability to understand and work with others.
  • Substantial data analysis skills from both a business and technology perspective.
  • Deep expertise in statistics, quality control, and process engineering.
  • Demonstrated knowledge and experience using a data quality tool.
  • Demonstrated ability to get things done.

 

While no one person on the team may embody all of the above characteristics, the combined group should cover them all. The number of teams and their size will vary with the nature of your organization and the initial scope. The general guideline is to start small (one team with no more than three to five people). Additional data quality competency teams can be established later, if needed.

  • Controls must be both designed and executed effectively.
  • It does no good to architect an elegant suite of controlled processes if the operational function that will employ and react to deviations does not function well.
  • A diligent operational function may be hampered by poorly placed or highly manual controls that limit control coverage or reduce the efficiency with which defects are identified, prioritized, and fixed.
  • Focus on both operational controls at a local process level and enterprise control reports that gauge overall data quality risk for the enterprise.

Conclusion

 

In conclusion, data is an extremely important asset to organizations, and it must be continuously improved based upon data, consumer feedback, and priorities. With a well-planned Data Quality effort, the value your organization receives from its data, relative to the associated acquisition, management, and usage costs and risks, will increase significantly.

About the Author

Eric Hartung has extensive program management and operations expertise in the areas of Data Strategy & Governance, Data Architecture, Data Quality, Data Management, Data Warehousing, Business Intelligence, and System Integration. Mr. Hartung also possesses deep industry experience in the mortgage and financial services industry, having worked at large financial institutions in both business operations and information technology roles for over 16 years.

Contact Unissant

11800 Sunrise Valley Dr., Ste 1000
Reston VA 20191
Email: information (at) unissant.com
Phone: 703.889.8500
Fax: 703.889.8501
www.unissant.com

Unissant is an innovative software development and consulting company that manages complex initiatives, solves data challenges, and transforms business.  Unissant brings technical excellence and program/project execution best practices that exceed the expectations of our clients in the Banking and Finance, Health and Life Sciences, National Security, and Federal/Civilian sectors.

Copyright © 2006-2015 Unissant, Inc. All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of Unissant, Inc., except in the case of brief quotations embodied in critical reviews and certain other noncommercial uses permitted by copyright law.
