About Business Intelligence: The Quality of Data

For Serra (2002), an effective Data Management function relies on standards and policies governing data, their definition, and their usage. These standards and policies must be defined and adopted. They must be stringent and comprehensive, yet flexible enough to accommodate change, with the aims of reusability, stability, effective communication of the meaning of the data, and scalability. Tools such as data dictionaries and repositories should be used to manage data. Data must be well defined, sound, consistent, reliable, secure, and shared, so that each new system defines only the data within its own scope and shares the remaining data with the other systems in the organization.
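As a rough illustration of the data dictionary idea, the sketch below keeps one shared definition per data element and refuses a second definition of the same name, so that a new system reuses what already exists. The class names, fields, and the sample systems ("crm", "billing") are hypothetical and are not taken from Serra (2002).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    name: str          # shared, organization-wide name
    definition: str    # agreed business meaning
    owner_system: str  # system whose scope includes this element


class DataDictionary:
    """Hypothetical in-memory repository of shared data definitions."""

    def __init__(self):
        self._elements = {}

    def define(self, element):
        # A name may be defined only once; other systems must reuse it.
        if element.name in self._elements:
            existing = self._elements[element.name]
            raise ValueError(
                f"{element.name!r} is already defined by {existing.owner_system}"
            )
        self._elements[element.name] = element

    def lookup(self, name):
        # Raises KeyError when the element has never been defined.
        return self._elements[name]


dictionary = DataDictionary()
dictionary.define(
    DataElement("customer_id", "Unique customer identifier", "crm")
)

# A billing system reuses the shared definition instead of redefining it.
shared = dictionary.lookup("customer_id")
```

In a real organization this role would be played by a managed repository rather than an in-memory object; the point of the sketch is only the rule that each element has a single definition and a single defining system.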

For Kimball (1998), warehouse design often begins with a load of historical data that requires cleansing and quality control. In existing warehouses, clean data comes from two processes: inserting data that is already clean, and cleaning or fixing problems in data that has already been inserted. In addition, establishing accountability for data quality and integrity can be extremely difficult in a Data Warehousing environment. In most transactional systems, important operational data is captured well, but optional fields receive little attention, and system owners do not care whether they are accurate or complete as long as the required logic is satisfied. Thus, the business and information systems groups must identify or appoint an accountable person for each data source, whether internal or external, and treat the data from a business perspective. The quality of the data depends on a series of events, many of them beyond the control of the data warehousing team. One example is the data collection process, which must be well designed and must rely on a strong commitment to quality from the people who enter the data. Once the value of the data warehouse has been established, it becomes easier to bring about the changes to the source systems' data entry processes that are needed to obtain better data.
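The point about neglected optional fields and per-source accountability can be made concrete with a small profiling sketch. It measures how often each optional field is actually filled in and reports the poorly filled ones together with the person accountable for the source. The source name, owner, field names, sample rows, and the 0.5 threshold are all hypothetical illustrations, not values from Kimball (1998).

```python
def fill_rates(rows, fields):
    """Return the fraction of rows with a non-empty value for each field."""
    total = len(rows)
    rates = {}
    for field in fields:
        filled = sum(1 for row in rows if row.get(field) not in (None, ""))
        rates[field] = filled / total if total else 0.0
    return rates


# Hypothetical metadata: one accountable owner per data source.
SOURCES = {
    "crm": {
        "owner": "sales operations",
        "optional_fields": ["phone", "birth_date"],
    },
}


def poorly_filled(source_name, rows, threshold=0.5):
    """List (field, fill rate, owner) for fields filled below the threshold."""
    source = SOURCES[source_name]
    rates = fill_rates(rows, source["optional_fields"])
    return [
        (field, rate, source["owner"])
        for field, rate in rates.items()
        if rate < threshold
    ]


# Hypothetical rows extracted from the source system.
crm_rows = [
    {"customer_id": 1, "phone": "555-0100", "birth_date": None},
    {"customer_id": 2, "phone": "", "birth_date": None},
    {"customer_id": 3, "phone": "555-0102", "birth_date": "1980-04-12"},
    {"customer_id": 4, "phone": "555-0103", "birth_date": None},
]

findings = poorly_filled("crm", crm_rows)
```

With these sample rows, `phone` is filled in three of four records and `birth_date` in one of four, so only `birth_date` is reported, along with the owner who would be asked to improve the data entry process.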

Kimball (1998) further argues that it is unrealistic to expect any system to contain perfect data, so each implementation must define its own acceptance standards for data quality. These standards are based on the characteristics of quality data, which is:

  • Accurate: the warehouse data is consistent with the system of record, and where it is not, the reason can be explained.

  • Complete: it represents the entire relevant set of data, and users are notified of the scope.

  • Consistent: it has no contradictions.

  • Unique: data with the same meaning always has the same name.

  • Timely: it is updated on a schedule that is useful to business users, and that schedule is known and accepted.

In addition, quality data simply represents the truth of the facts.
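One way to picture implementation-specific acceptance standards is as a set of thresholds that measured quality must meet. The sketch below covers three of the characteristics (accurate, complete, timely) by comparing warehouse data with the system of record. The function names, thresholds, sample records, and timestamps are assumptions made for illustration; Kimball (1998) does not prescribe these formulas.

```python
from datetime import datetime, timedelta


def accuracy(warehouse, source):
    """Share of source records whose warehouse copy matches exactly."""
    if not source:
        return 1.0
    matches = sum(
        1 for key, value in source.items() if warehouse.get(key) == value
    )
    return matches / len(source)


def completeness(warehouse, source):
    """Share of source records that are present in the warehouse."""
    if not source:
        return 1.0
    return sum(1 for key in source if key in warehouse) / len(source)


def is_timely(last_load, now, max_age):
    """True when the last load is no older than the agreed schedule."""
    return now - last_load <= max_age


# Hypothetical acceptance thresholds for one implementation.
ACCEPTANCE = {"accuracy": 0.95, "completeness": 0.99}

# Hypothetical system of record and warehouse copy (one name differs).
source = {1: "Ana", 2: "Bruno", 3: "Carla", 4: "Davi"}
warehouse = {1: "Ana", 2: "Bruno", 3: "Karla", 4: "Davi"}

results = {
    "accurate": accuracy(warehouse, source) >= ACCEPTANCE["accuracy"],
    "complete": completeness(warehouse, source) >= ACCEPTANCE["completeness"],
    "timely": is_timely(
        last_load=datetime(2024, 1, 2, 6, 0),
        now=datetime(2024, 1, 2, 18, 0),
        max_age=timedelta(hours=24),
    ),
}
```

Here every source record is present and the load is twelve hours old, so the data passes the completeness and timeliness checks, but one mismatched name brings accuracy to 0.75, below the assumed 0.95 threshold. Consistency and uniqueness of naming depend on business rules and shared definitions, so they are left out of this sketch.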

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


These are the posts on the same “Enum and Quality in BI” monograph:

Our future posts that complete the current “About Business Intelligence” theme will be:

  • About Business Intelligence: Data Warehouse

Justification

This short text is simply a Portuguese-to-English translation of part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (free translation of the title), also referred to as “Enum and Quality in BI”. The translated part corresponds to a minor portion of the document's structure.


References:
