About Business Intelligence: The Quality of Information

11022926536_d6652fba3d_z

Any quality information depends directly on quality data. One problem is the fact that today’s software production is still being done in an artisanal way. Serra (2002) even classifies system development professionals as “intellectual artisans” given the lack of controls and well-defined processes for that activity. Despite this difficulty in the efforts to measure the quality of software development processes, at least concrete results have been obtained by applying the methods of Kimball (1998) to Data Warehousing, in such a way that by them we have defined processes for the measurement and processing of information quality.

Consistent information means high-quality information. This means that all  of the information is accounted for and is complete. (KIMBALL, 1998, p.10).

Data staging is a major process that includes, among others, the following sub-processes: extracting, transforming, loading and indexing, and quality assurance checking. (KIMBALL, 1998, p.23).

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


These are the posts on the same “Enum and Quality in BI” monograph:

Our future posts that complete the current “About Business Intelligence” theme will be:

  • About Business Intelligence: Data Warehouse

Justification

This short text is a mere Portuguese to English translation from part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (free translation of the title), also aliased as “Enum and Quality in BI”, which corresponds to a minor part of the document structure.


References:

Image credits:

 

About Business Intelligence: The Quality of Data

11208982434_a58edfc401_z

For Serra (2002), the effective Data Management function relies on standards and policies regarding data, their definition and usage. These standards and policies must be defined and adopted, being stringent, comprehensive, flexible to changes aiming reusability, stability, and the effective communication of the meaning of the data, as well as enabling their scalability. One should use tools such as data dictionary and repositories for data management. Data must be well defined, sound, consistent, reliable, safe and shared so that each new system defines only the data that is within its scope and shares the other data with other systems in the organization.

For Kimball (1998), warehouse design often begins with a load of historical data that requires cleansing and quality control. In existing ones, clean data comes from two processes: inserting clean data and cleaning/solving inserted data problems. In addition, establishing accountability for data quality and integrity can be extremely difficult in a Data Warehousing environment. In most transactional systems, important operational data is well captured, but optional fields do not receive attention and system owners do not care if they are accurate or complete if the required logic is being met. Thus, business and information systems groups must identify or establish an accountable person for each data source, whether internal or external, treating the data from a business perspective. The quality of the data depends on a series of events, many beyond the control of the data warehousing team, such as the data collection process that must be well designed and count on a great commitment of the people that perform the entry of those data with their respective quality. Once established the value of the data warehouse, it is easier to induce the necessary modifications to the data entry processes of the source systems aiming better data.

Kimball (1998) further argues that it is unrealistic to expect any system to contain perfect data, but each implementation must define its own standards of data quality acceptance. These standards are based on the characteristics of the quality data that are: accurate, complete, consistent, unique and timely – the warehouse data is consistent with the records system (accurate), and if not, reason can be explained; They represent the entire relevant set of data, and users are notified of the scope (complete); They have no contradictions (consistent); They always have the same name when they have the same meaning (unique); They are updated based on a useful agenda for business users, the schedule is known and people accept it that way (timely). In addition, quality data simply represent the truth of the facts.

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


These are the posts on the same “Enum and Quality in BI” monograph:

Our future posts that complete the current “About Business Intelligence” theme will be:

  • About Business Intelligence: Data Warehouse

Justification

This short text is a mere Portuguese to English translation from part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (free translation of the title), also aliased as “Enum and Quality in BI”, which corresponds to a minor part of the document structure.


References:

Image credits:

 

When to use Enums?

Let us consider this short real-life example:

One day, I was helping the guys at our local Housing Agency to build a new online form that was going to feed the database with candidates for the government’s low-cost housing benefit. Months later, after the solution was on-air and people had fed in their information, the same small Scrum team was responsible for building reports on that data.

solar-system-word-clipart-1
An enumeration example: The Solar System

One of our team members had left the project for another job and up to that moment we were alright implementing everything as demanded from above. But one thing happened to catch my attention: there were some data that, at first sight, did not have any correspondence among the report and the database.

We went up and down the documentation trying to figure out how to map all the report fields and the database we had (one born for the other for a very simple task). Despite the database was well documented, apparently, very little could be done to solve the problem and deliver the reports on time. Where the reports asked for marital status, income range, and information of the kind that usually go in combo boxes we only could find numbers (integers) in the database!

3-states-of-matter-clipart-1
3 States of Matter: real enums do not have states added during its existence

After some hours we were struggling with that, I decided to open the code, find and clamp what had the possibility to be the data structure I was looking for.

genders
Genders: a simple and genuine enumeration example

An enumeration is a complete, ordered listing of all the items in a collection. The term is commonly used in mathematics and theoretical computer science (as well as applied computer science) to refer to a listing of all of the elements of a set. – Enumeration – Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Enumeration

I believe in the benefits, agree and support the values of Scrum and really think of it as the closest approach to good effectiveness of a small team dealing with multiple projects, and I also would never consider that colleague that left the team a bad or average programmer (actually he is one of the best programmers I’ve met in my journey). The fact is that pressure and heat generated some bad gases in that event. The exhaust valve for the major part of OO programmers is called “enum”.

trafficlights
Traffic Lights: such a good enum example that has a classic usage for reporting purposes

That situation inspired me to observe more that “enum” thing and during some other experiences with that whatchamacallit datatype I was convinced I would be able to research a little and maybe bring something useful for the scientific and software development communities (for those that my words convince, of course).

For that reason, I’m going to start a series of posts that will depict in parts my monograph
“THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND THE QUALITY IN BUSINESS INTELLIGENCE”, freely translated from the original in Portuguese, and where I expect to introduce the view of some authors on Software Design Science, Business Intelligence, THE ENUM, and some other things, usually related in a BI environment and, above all, to decipher the enumerations and when and how it’s better to use them.

tencommandments
The Ten Commandments: no change forever and ever

As database people we sometimes feel uncomfortable when developers tend to use certain methods, so, my proposal to answer the question “When to use enums?” came up after some debate among our professional circles. Some colleagues support my point of view and some avoid or do not like it. All in all, there is still a gap between code and data. Let’s explore it?

rainbow
The 7 visible colors: there’s a treasure hidden for those who seek authentic enumerations

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


These are the posts on the same “Enum and Quality in BI” monograph:

 

Image credits:

The images used in this post, edited or not, are Creative Commons (CC) and the originals are credited to their creators and can be found at: