About Business Intelligence: The Quality of Information


Quality information depends directly on quality data. One problem is that today’s software production is still largely artisanal; Serra (2002) even classifies system development professionals as “intellectual artisans”, given the lack of controls and well-defined processes in that activity. Despite this difficulty in measuring the quality of software development processes, concrete results have at least been obtained by applying the methods of Kimball (1998) to Data Warehousing, which give us defined processes for measuring and processing information quality.

Consistent information means high-quality information. This means that all of the information is accounted for and is complete. (KIMBALL, 1998, p.10).

Data staging is a major process that includes, among others, the following sub-processes: extracting, transforming, loading and indexing, and quality assurance checking. (KIMBALL, 1998, p.23).
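
To make these sub-processes concrete, here is a minimal Python sketch of a staging pipeline in the spirit of the quote above. It is only an illustration: the function names, the record fields, and the in-memory “warehouse” are our own assumptions, not Kimball’s tooling.

# Illustrative staging pipeline: extract, transform,
# quality-assurance check, then load and index.
def extract(source_rows):
    # Pull raw records from a source system.
    return list(source_rows)

def transform(rows):
    # Normalize fields to warehouse standards.
    return [{"customer": r["customer"].strip().title(),
             "amount": float(r["amount"])} for r in rows]

def quality_check(rows):
    # Keep only rows that meet the acceptance rules.
    return [r for r in rows if r["customer"] and r["amount"] >= 0]

def load_and_index(rows, warehouse, index):
    # Store rows and maintain a simple index by customer.
    for r in rows:
        warehouse.append(r)
        index.setdefault(r["customer"], []).append(r)

warehouse, index = [], {}
source = [{"customer": " alice ", "amount": "10.5"},
          {"customer": "bob", "amount": "-1"}]   # fails the check
load_and_index(quality_check(transform(extract(source))), warehouse, index)
print(warehouse)   # only the valid, normalized row remains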

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


Our future posts that complete the current “About Business Intelligence” theme will be:

  • About Business Intelligence: Data Warehouse

Justification

This short text is a translation, from Portuguese to English, of part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (a free translation of the title), also known as “Enum and Quality in BI”; it corresponds to a minor part of that document’s structure.



About Business Intelligence: The Quality of Data


For Serra (2002), the effective Data Management function relies on standards and policies regarding data, their definition, and their usage. These standards and policies must be defined and adopted; they should be stringent, comprehensive, and flexible to change, aiming at reusability, stability, and the effective communication of the meaning of the data, as well as enabling scalability. Tools such as a data dictionary and data repositories should be used for data management. Data must be well defined, sound, consistent, reliable, secure, and shared, so that each new system defines only the data within its own scope and shares the rest with the organization’s other systems.
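
As an illustration of one such tool, the sketch below shows what a single data dictionary entry might look like in Python. The fields chosen (definition, type, owner, sharing) are our assumption of what such an entry typically records, not a prescription from Serra (2002).

# Hypothetical data dictionary entry for a shared corporate field.
data_dictionary = {
    "customer_id": {
        "definition": "Unique identifier for a customer",
        "type": "integer",
        "owner": "sales data steward",
        "shared_with": ["billing", "crm", "warehouse"],
    },
}

def describe(field):
    # Communicate the meaning of the data, as the policies require.
    e = data_dictionary[field]
    return f"{field}: {e['definition']} ({e['type']}), owner: {e['owner']}"

print(describe("customer_id"))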

For Kimball (1998), warehouse design often begins with a load of historical data that requires cleansing and quality control. In existing warehouses, clean data comes from two processes: loading data that is already clean and cleaning up problems in data already loaded. In addition, establishing accountability for data quality and integrity can be extremely difficult in a Data Warehousing environment. In most transactional systems, important operational data is captured well, but optional fields receive little attention, and system owners do not care whether they are accurate or complete as long as the required logic is met. Thus, the business and information systems groups must identify or appoint a person accountable for each data source, whether internal or external, treating the data from a business perspective. The quality of the data depends on a series of events, many beyond the control of the data warehousing team, such as the data collection process, which must be well designed and count on a strong commitment from the people who enter those data to do so with quality. Once the value of the data warehouse is established, it becomes easier to induce the necessary changes to the source systems’ data entry processes in pursuit of better data.

Kimball (1998) further argues that it is unrealistic to expect any system to contain perfect data, but each implementation must define its own standards of data quality acceptance. These standards are based on the characteristics of quality data, which are: accurate, complete, consistent, unique, and timely. Warehouse data is consistent with the system of record, and when it is not, the reason can be explained (accurate); it represents the entire relevant set of data, and users are notified of its scope (complete); it contains no contradictions (consistent); the same meaning always carries the same name (unique); and it is updated on a schedule useful to business users, a schedule that is known and accepted (timely). In addition, quality data simply represents the truth of the facts.
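
As a rough illustration of such acceptance standards, the sketch below checks four of the five characteristics on hypothetical records (accuracy is omitted because it requires comparison against the system of record). The field names and the one-day timeliness threshold are our own assumptions.

from datetime import date, timedelta

records = [
    {"id": 1, "region": "north", "sales": 100, "loaded_on": date.today()},
    {"id": 2, "region": "south", "sales": 250, "loaded_on": date.today()},
]

def is_complete(rows, required=("id", "region", "sales")):
    # Every required field is present in every row.
    return all(r.get(f) is not None for r in rows for f in required)

def is_unique(rows):
    # The same id never appears twice.
    ids = [r["id"] for r in rows]
    return len(ids) == len(set(ids))

def is_consistent(rows):
    # No contradictions, e.g. negative sales figures.
    return all(r["sales"] >= 0 for r in rows)

def is_timely(rows, max_age=timedelta(days=1)):
    # Data is refreshed on the agreed schedule.
    return all(date.today() - r["loaded_on"] <= max_age for r in rows)

print(is_complete(records), is_unique(records),
      is_consistent(records), is_timely(records))   # True True True True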

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


Our future posts that complete the current “About Business Intelligence” theme will be:

  • About Business Intelligence: Data Warehouse

Justification

This short text is a translation, from Portuguese to English, of part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (a free translation of the title), also known as “Enum and Quality in BI”; it corresponds to a minor part of that document’s structure.



About Business Intelligence: The Relationship Between Data, Information and Knowledge


It can be deduced from Serra (2002) that it is necessary to understand the relationship between data, information, and knowledge. It can be said that a datum is a record, information is a fact associated with a record, and knowledge is the identification of a piece of information according to a rule.
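
A toy Python sketch may help fix this interpretation; the thermometer example and its threshold are entirely our own, used only to illustrate the three levels.

datum = 38.5                                       # a bare record
information = {"patient": "Ana", "temp_c": datum}  # a fact tied to the record

def knowledge(fact):
    # Knowledge: identifying the information according to a rule.
    return "fever" if fact["temp_c"] >= 37.8 else "normal"

print(knowledge(information))   # -> fever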

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


Our future posts that complete the current “About Business Intelligence” theme will be:

  • About Business Intelligence: Data Warehouse

Justification

This short text is a translation, from Portuguese to English, of part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (a free translation of the title), also known as “Enum and Quality in BI”; it corresponds to a minor part of that document’s structure.



About Business Intelligence (BI)


The definition of Business Intelligence is born from knowing what Data Management is. According to Barbieri (1994, apud SERRA, 2002), data management can be defined as the function of the organization responsible for centrally developing and managing strategies, procedures, practices, and plans capable of providing the necessary corporate data, whenever needed, with integrity, privacy, documentation, and sharing.

Data Management has a very broad scope of operation: it participates in organizational strategic planning, detects future information needs, plans the databases that meet the organization’s business, and manages all of the organization’s data (even the non-computerized data). For Serra (2002), however, Data Management has merely changed its name and is now labeled “Business Intelligence”; the purpose and characteristics remain the same. The truth is that the term BI encompasses a variety of software and practices that facilitate decision-making (operational, tactical, or strategic) through the analysis of information of assured quality.

Business Intelligence represents great potential for organizational change and, for this reason, its implementation carries inherent challenges due to human resistance to change. Thus, cultural and technical challenges can pose both threats and opportunities. One of the key challenges for a BI project is data quality: the data must be clean and consistent to substantiate the information that will be generated from it. It is therefore necessary to be certain that the data feeding BI have these attributes (cleanliness and consistency) so that the information generated is reliable. The need for cleansing and consistency of the data used in BI is addressed in the data staging phase of the Data Warehouse, of which the BI tools are part.

We gather from Serra (2002) that there are some important factors to consider in Business Intelligence. Among them are the relationship between data, information, and knowledge; data quality; information quality; the relationship between operational and managerial information; and the adequacy of information to business needs.

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


Our future posts that complete the current “About Business Intelligence” theme will be:

  • About Business Intelligence: Data Warehouse

Justification

This short text is a translation, from Portuguese to English, of part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (a free translation of the title), also known as “Enum and Quality in BI”; it corresponds to a minor part of that document’s structure.



About The Software Design Science – Part 2 of 2


Throughout the book “Code Simplicity”, Kanat-Alexander (2012) discusses truths about software design across the entire software life cycle, calling into existence the missing science of software design, in which every programmer is a designer. Arguing for that science, he also justifies why it took so long to come to light. Didactically, in the appendices, he lists its laws, which we comment on here:

  • The purpose of software is to help people.
  • The Equation of Software Design is:
D = (Vn + Vf) / (Ei + Em)

where:

D represents the desirability of the change.

Vn represents the value now.

Vf represents the future value.

Ei represents the implementation effort.

Em represents the maintenance effort.

The Equation of Software Design is the primary law of software design. As time goes on, that equation will reduce to:

D = Vf / Em

That demonstrates that reducing the maintenance effort is more important than reducing the implementation effort; a small numeric sketch follows the list below.

  • The Law of Change: The longer your program exists, the more probable it is that any piece of it will have to change.
  • The Law of Defect Probability: The chance of introducing a defect into your program is proportional to the size of the changes you make to it.
  • The Law of Simplicity: The ease of maintenance of any piece of software is proportional to the simplicity of its individual pieces.
  • The Law of Testing: The degree to which you know how your software behaves is the degree to which you have accurately tested it.
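
Here is the small numeric sketch of the equation and its reduced form promised above; the scores are arbitrary illustrative values, since Kanat-Alexander (2012) defines no units for them.

def desirability(vn, vf, ei, em):
    # Full form: D = (Vn + Vf) / (Ei + Em)
    return (vn + vf) / (ei + em)

def desirability_long_term(vf, em):
    # Reduced form, as time goes on: D = Vf / Em
    return vf / em

# A change that is costly to implement but cheap to maintain...
print(desirability(vn=2, vf=10, ei=8, em=1))   # ~1.33
# ...wins in the long run over one cheap to build but costly to keep.
print(desirability_long_term(vf=10, em=1))     # 10.0
print(desirability_long_term(vf=10, em=8))     # 1.25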

Kanat-Alexander (2012), however, makes a very important comment at the end of the appendix, which summarizes the thinking about these laws:

Note that of all of these, the most important to bear in mind are the purpose of software, the reduced form of the Equation of Software Design, and the Law of Simplicity. (KANAT-ALEXANDER, 2012, p. 74).

Thus, we note the superior relevance of these three laws:

  • Law number 2: The Equation of Software Design is (in reduced form):
D = Vf / Em

where:

D represents the desirability of a change.

Vf represents the future value.

Em represents the maintenance effort.

  • Law number 1: The purpose of software is to help people.
  • Law number 5: The Law of Simplicity: The ease of maintenance of any piece of software is proportional to the simplicity of its individual pieces.

Still, Kanat-Alexander (2012) summarizes the important facts about software design in two simple sentences:

  • It is more important to reduce the effort of maintenance than it is to reduce the effort of implementation.
  • The effort of maintenance is proportional to the complexity of the system.

Unless the software in question is intended to be used only once or to have a very short life, which is unlikely, the importance of maintainability is very clear.

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


Justification

This short text is a translation, from Portuguese to English, of part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (a free translation of the title), also known as “Enum and Quality in BI”; it corresponds to a minor part of that document’s structure.



About The Software Design Science (Part 1 of 2)


As the first person to lay claim to the science of software design, Kanat-Alexander (2012), speaking of what he calls “the missing science”, explores the concept of software design. The entire foundation of the software design laws depends on that conceptualization. The missing science is the science of software design. The approach defined by Kanat-Alexander (2012), transcribed below, reflects both practice and fact. The science of software design acts from before the programming phase begins, remains throughout development, and continues after programming is finished and the program enters operation, for its maintenance.

Every programmer is a designer. (KANAT-ALEXANDER, 2012, p.6).

In this specific work by Kanat-Alexander, the original title, Code Simplicity, itself represents a fundamental truth to be followed by software developers: the simplicity of the code.

Software design, as it is practiced in the world today, is not a science. What is a science? The dictionary definition is a bit complex, but basically, in order for a subject to be a science it has to pass certain tests. (KANAT-ALEXANDER, 2012, p.7).

In this defense of a new science lie elements long perceived, but not yet organized, by the more experienced programmers. He then lists the tests that software design must pass in order to be considered a science:

  • A science must be composed of facts, not opinions, and these facts must have been gathered somewhere (such as in a book).
  • That knowledge must have some sort of organization: divided into categories, with its various parts properly linked to each other in terms of importance, and so on.
  • A science must contain general truths or basic laws.
  • A science should tell you how to do something in the physical universe and be somehow applicable at work or in life.
  • Typically, a science is discovered and proven by means of the scientific method, which involves observing the physical universe, piecing together a theory about how it works, performing experiments to verify that theory, and showing that the same experiment works everywhere, demonstrating that the theory is a general truth and not just a coincidence or something that worked for only one person.

The whole software community knows there is a great deal of knowledge recorded and collected in books, in a well-organized manner. Despite that, clearly stated laws are still missing: even if experienced software developers know what the right thing to do is, nobody knows for sure why it is right. Therefore, Kanat-Alexander (2012) lists definitions, facts, rules, and laws for this science.

The whole art of practical programming grew organically, more like college students teaching themselves to cook than like NASA engineers building the space shuttle…. After that came a flurry of software development methods: the Rational Unified Process, the Capability Maturity Model, Agile Software Development, and many others. None of these claimed to be a science—they were just ways of managing the complexity of software development. And that, basically, brings us up to where we are today: lots of methods, but no real science. (KANAT-ALEXANDER, 2012, p. 10).

Kanat-Alexander (2012) affirms that all the definitions below are applicable when we talk about software design:

  • When you “design software”, you plan it out: the structure of the code, which technologies to use, etc. There are many technical decisions to make. Often they are made only mentally; at other times, plans are also jotted down or sketched in a few diagrams;
  • Once that is done, there is a “software design” (the plan that was elaborated), whether it is a written document or just a set of decisions kept in mind;
  • Code that already exists also has “a design” (“design” as the plan that an existing creation follows), which is the structure it has or the plan it seems to follow. Between “no design” and “a complete design” there are many possibilities, such as “a partial design” or “several conflicting designs in a piece of code”. There are also effectively bad designs that are worse than having no design, as when one comes across code that is deliberately disorganized or complex: code with an effectively bad design.

The science presented here is not computer science. That’s a mathematical study. Instead, this book contains the beginnings of a science for the “working programmer” — a set of fundamental laws and rules to follow when writing a program in any language… The primary source of complexity in software is, in fact, the lack of this science. (KANAT-ALEXANDER, 2012, p. 11).

The science of software design is a science for developing plans and making decisions about software. It helps in deciding the ideal structure of a program’s code, in choosing between execution speed and ease of understanding, and in determining which programming language best fits the case. We thus note a new point of view on what we call software design, through the prism of the programmer: one that involves not only the activities that follow requirements analysis, but that also extends throughout the whole programming and product life cycle, including maintenance, because good maintenance requires a good design as a reference, one that takes its fundamental laws into account.

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


Justification

This short text is a translation, from Portuguese to English, of part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (a free translation of the title), also known as “Enum and Quality in BI”; it corresponds to a minor part of that document’s structure.



Hello Data!


We are all, in one form or another, aware that, besides being human beings, we also exist in the form of data, be it as a single telephone record in a friend’s smartphone or as a simple paper record in the books of an old church or of a civil registry office on a distant island somewhere in the world.

Beyond that, everything discovered so far has been registered and mapped in some form: from underground waters to every meter of dry land and the frontiers of outer space, from the subjective interpretations of the human sciences to the objective findings of engineering, physics, and mathematics. It is true that, in science, there is still much to discover, but every time something is discovered, new records are added, not to mention the new data created along the way. In fact, new data is created on the go: while sisters share their selfies on Facebook, while engineers and scientists manage projects with distant team members, and while I am here writing this post.

In any case, civilization has always been shaped by its tools, generation after generation, and now, in the information age, every person is affected by the way data is obtained, processed, and presented.

This website intends to talk about data and its related subjects from every perspective we can find: giving examples, sharing tutorials, commenting on experiences, and doing whatever is possible and necessary to contribute positively to the scientific, software development, and data professional communities, receiving feedback (intellectual contributions) from its readers and fostering discussion and reflection.

DataOnScale.com is an endeavor of Luis Cláudio and Pedro Jr., two Brazilian IT professionals and data science enthusiasts.