About BI and Data Modeling: Requirements and Data Modeling


The two most common approaches to data modeling are relational modeling, usually associated with OLTP systems, and dimensional modeling, a technique better suited to OLAP systems. Serra (2002) states that building quality data models begins before the first entity is designed, with an understanding of the corporate model being addressed.

Collecting the business and data requirements is the foundation of the entire data warehouse effort—or at least it should be. Collecting the requirements is an art form, and it is one of the least natural activities for an IS organization. We give you techniques to make this job easier and hope to impress upon you the necessity of spending quality time on this step. (KIMBALL, 1998, p.6).

Whether a project aims to build OLAP or OLTP systems, the role of requirements is the same: they are the foundation of everything to be built, so enough time must be invested to uncover everything that is relevant. The more time spent collecting and investigating requirements, the less time will later be spent on avoidable corrections to the data model and the systems involved.

Fig. 2 - Definition of Requirements and Dimensional Modeling in Kimball’s Business Dimensional Life-Cycle Diagram

The business dimensional life-cycle model, on which Kimball’s (1998) methodology is based, shows not only the importance of defining the business requirements but also how heavily dimensional data modeling depends on those requirements, how the other processes depend on these two, and how the project’s critical path tends to run through this whole core set of processes.

Kimball (1998) states that in dimensional modeling, the definition of business requirements determines the data needed to meet the analytical requirements of business users. In other words, for the analyses that business users intend to perform to be feasible, the required data must be identified when the business requirements are defined. Designing data models that support such analyses requires a different approach from the one used to design operational systems.
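As a minimal sketch of that dependency, assume a hypothetical business requirement such as “total sales by month and product category”. The requirement itself points to the measure (sales amount) and to the dimensions (date and product) of a small star schema. The table names, columns and sample rows below are illustrative assumptions, not part of Kimball’s text; the sketch uses Python’s built-in sqlite3 module.

```python
import sqlite3

# Hypothetical requirement: "total sales amount by month and product category".
# The requirement dictates the dimensions (date, product) and the measure (amount).
SCHEMA = """
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20190115
    year     INTEGER NOT NULL,
    month    INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date (date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product (product_key),
    amount      REAL NOT NULL      -- additive measure
);
"""


def build_sample_warehouse() -> sqlite3.Connection:
    """Create an in-memory star schema filled with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20190115, 2019, 1), (20190210, 2019, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Notebook", "Stationery"), (2, "Pen", "Stationery"),
                      (3, "Chair", "Furniture")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(20190115, 1, 10.0), (20190115, 2, 2.5),
                      (20190115, 3, 80.0), (20190210, 1, 12.0)])
    return conn


def sales_by_month_and_category(conn: sqlite3.Connection) -> list:
    """Answer the analytical requirement by joining the fact table to its dimensions."""
    return conn.execute("""
        SELECT d.year, d.month, p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date    AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY d.year, d.month, p.category
        ORDER BY d.year, d.month, p.category
    """).fetchall()


if __name__ == "__main__":
    for row in sales_by_month_and_category(build_sample_warehouse()):
        print(row)
```

Had the requirement also asked for sales by region, a region attribute would have to exist in the model before the question could be answered at all, which is why the requirements come first.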

Although relational and dimensional modeling techniques differ in applicability (among other things), the quality of the model will always depend on the quality of the requirements gathered.

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira



These will be the next posts on the same theme:

  • About BI and Data Modeling: Quality of Modeling, Data and Information
  • About BI and Data Modeling: Types of Data Modeling
    • About BI and Data Modeling: Relational Modeling
      • About BI and Data Modeling: Phases of Relational Data Modeling
      • About BI and Data Modeling: How to create an Entity-Relationship Diagram
    • About BI and Data Modeling: Dimensional Modeling
      • About BI and Data Modeling: Defining Granularity
      • About BI and Data Modeling: Detailing Dimensions
      • About BI and Data Modeling: Defining the Attributes of the Fact Table(s)
      • About BI and Data Modeling: Defining Aggregates

Justification

This short text is simply a Portuguese-to-English translation of part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (free translation of the title), also referred to as “Enum and Quality in BI”; it corresponds to a small part of the document.



About BI and Data Modeling


It is not possible to talk about BI without talking about data modeling. The data must be organized in a way that allows true and reliable information to be visualized. Although Business Intelligence involves other factors besides data, the data deserve close attention, because they are the main pillar of all the information to be consumed.

According to Serra (2002), it is necessary to examine the organization’s mission and future goals, identify the key data for its different functional areas, and list and analyze its products, services, markets, current systems and distribution channels. Examining the organization’s goals leads analysts to the key data needed for top-management decisions.

The data model should be built observing a meaningful representation of the real world, its degree of excellence and comprehensiveness, the appropriate use of language and syntax, and its adherence to the organization’s business. (SERRA, 2002, p.31).

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira



These will be the next posts on the same theme:

  • Requirements and Data Modeling
  • Quality of Modeling, Data and Information
  • Types of Data Modeling
    • Relational Modeling
      • Phases of Relational Data Modeling
      • How to create an Entity-Relationship Diagram
    • Dimensional Modeling
      • Defining Granularity
      • Detailing Dimensions
      • Defining the Attributes of the Fact Table(s)
      • Defining Aggregates




About Business Intelligence: The Relationship Between Operational Information and Managerial Information


Information can be classified according to its operational or managerial purpose. For Serra (2002), information is both the source and the outcome of executive action: complete and current facts are essential for appropriate decisions. Information is operational when it is generated to maintain the continuity of the organization’s operational cycle, and it usually comes directly from transactional systems. Information is managerial when it aims to support decision making. In addition, people at different management levels need managerial information at different levels.
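A hedged illustration of the last point, assuming hypothetical order records with region and branch fields: the operational records are the same for everyone, but each management level consumes them rolled up to a different level. The field names and the three levels are assumptions made only for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Operational record, as a transactional system might store it (hypothetical)."""
    order_id: int
    region: str
    branch: str
    amount: float


LEVELS = ("branch", "region", "total")


def summarize(orders, level):
    """Roll operational records up into managerial information.

    level: "branch" (e.g. a branch manager), "region" (a regional director)
    or "total" (top management).
    """
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    totals = defaultdict(float)
    for order in orders:
        key = "all" if level == "total" else getattr(order, level)
        totals[key] += order.amount
    return dict(totals)


orders = [
    Order(1001, "North", "N-01", 100.0),
    Order(1002, "North", "N-02", 50.0),
    Order(1003, "South", "S-01", 75.0),
]

if __name__ == "__main__":
    for level in LEVELS:
        print(level, summarize(orders, level))
```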

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira



Our future posts that complete the current “About Business Intelligence” theme will be:

  • About Business Intelligence: Data Warehouse




About Business Intelligence: The Relationship Between Data, Information and Knowledge


From Serra (2002), we infer that it is necessary to understand the relationship between data, information and knowledge. Put simply, data is a record, information is a fact associated with a record, and knowledge is the identification of a piece of information according to a rule.
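One possible way to picture these three notions in code, using a hypothetical inventory example (the SKU, the stock figure and the reorder point are assumptions made only for illustration, not Serra’s):

```python
from dataclasses import dataclass

# Data: a bare record; on its own, ("SKU-17", 4) says nothing.
raw_record = ("SKU-17", 4)


@dataclass(frozen=True)
class StockLevel:
    """Information: the record associated with the fact it describes."""
    sku: str
    units_in_stock: int


def to_information(record: tuple) -> StockLevel:
    """Attach meaning to the raw record."""
    sku, units = record
    return StockLevel(sku=sku, units_in_stock=units)


REORDER_POINT = 10  # hypothetical business rule parameter


def identify(info: StockLevel) -> str:
    """Knowledge: identify the information according to a rule."""
    return "reorder" if info.units_in_stock < REORDER_POINT else "ok"


if __name__ == "__main__":
    info = to_information(raw_record)
    print(info, "->", identify(info))
```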

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira



Our future posts that complete the current “About Business Intelligence” theme will be:

  • About Business Intelligence: Data Warehouse




About Business Intelligence (BI)


The definition of Business Intelligence stems from an understanding of Data Management. According to Barbieri (1994, apud SERRA, 2002), data management can be defined as the organizational function responsible for centrally developing and managing the strategies, procedures, practices and plans capable of providing the necessary corporate data, when needed, with integrity, privacy, documentation and sharing.

Data Management has a very broad scope: it takes part in organizational strategic planning, detects future information needs, plans the databases that serve the organization’s business and manages all of the organization’s data (even non-computerized data). For Serra (2002), Data Management has been renamed “Business Intelligence”, but its purpose and characteristics remain the same. In practice, the term BI encompasses a variety of software and practices that facilitate decision-making (operational, tactical or strategic) through the analysis of information of reliable quality.

Business Intelligence has great potential for organizational change and, for this reason, its implementation faces inherent challenges due to human resistance to change. Cultural and technical challenges can thus represent both threats and opportunities. One of the key challenges for a BI project is data quality: the data must be clean and consistent to substantiate the information that will be generated. It is therefore necessary to ensure that the data feeding BI have these attributes (cleanliness and consistency) so that the information generated is reliable. The cleansing and consistency of the data used in BI are addressed in the data staging phase of the Data Warehouse, of which the BI tools are part.
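A minimal sketch of what a cleansing step in the staging area might do, assuming hypothetical candidate rows with name, state and income fields. The rules shown (trimming, case normalization, rejecting incomplete or negative values, dropping duplicates) are examples only; real staging rules come from the business requirements.

```python
def clean_records(rows):
    """Split raw staging rows into clean rows and rejected rows.

    Hypothetical rules: names are trimmed and title-cased, state codes are
    trimmed and upper-cased, rows with an empty name or a missing/negative
    income are rejected, and duplicates (after normalization) are rejected too.
    """
    clean, rejected, seen = [], [], set()
    for row in rows:
        name = (row.get("name") or "").strip().title()
        state = (row.get("state") or "").strip().upper()
        income = row.get("income")
        if not name or income is None or income < 0:
            rejected.append(row)  # incomplete or inconsistent row
            continue
        key = (name, state)
        if key in seen:
            rejected.append(row)  # duplicate once normalized
            continue
        seen.add(key)
        clean.append({"name": name, "state": state, "income": float(income)})
    return clean, rejected


raw_rows = [
    {"name": "  maria silva ", "state": "df", "income": 1500},
    {"name": "Maria Silva", "state": "DF ", "income": 1500},  # duplicate once normalized
    {"name": "", "state": "GO", "income": 900},               # missing name
    {"name": "João Souza", "state": "go", "income": -10},      # inconsistent income
]

if __name__ == "__main__":
    clean, rejected = clean_records(raw_rows)
    print(clean)
    print(len(rejected), "rows rejected")
```

Keeping the rejected rows, instead of silently discarding them, lets someone trace why a record never reached the reports.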

We infer from Serra (2002) that several important factors must be considered in Business Intelligence, among them the relationship between data, information and knowledge; data quality; information quality; the relationship between operational and managerial information; and the adequacy of information to business needs.

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira



Our future posts that complete the current “About Business Intelligence” theme will be:

  • About Business Intelligence: Data Warehouse




About The Software Design Science (Part 1 of 2)

Image from page 422 of "Bell telephone magazine" (1922)

Kanat-Alexander (2012), the first person to claim a science of software design, explores the concept of software design when he discusses what he calls “the missing science”: the science of software design. All the foundations of the laws of software design depend on that concept. The approach defined by Kanat-Alexander (2012), transcribed below, reflects practice and the facts. The science of software design applies from before the programming phase begins, throughout development, after programming is finished and, once the program enters operation, throughout its maintenance.

Every programmer is a designer. (KANAT-ALEXANDER, 2012, p.6).

In the original version of this work by Kanat-Alexander, Code Simplicity, the title itself expresses a fundamental truth for software developers to follow: the simplicity of the code.

Software design, as it is practiced in the world today, is not a science. What is a science? The dictionary definition is a bit complex, but basically, in order for a subject to be a science it has to pass certain tests. (KANAT-ALEXANDER, 2012, p.7).

His defense of a new science gathers elements that more experienced programmers perceived long ago but had not yet organized. Accordingly, he lists the tests that software design must pass to be considered a science:

  • A science must be composed of facts, not opinions, and these facts should have been gathered somewhere (like in a book).
  • That knowledge must have some sort of organization: it must be divided into categories, and its various parts must be properly linked to each other in terms of importance, etc.
  • A science must contain general truths or basic laws.
  • A science should tell you how to do something in the physical universe and be somehow applicable at work or in life.
  • Typically, a science is discovered and proven by means of the scientific method, which involves observing the physical universe, piecing together a theory about how the universe works, performing experiments to verify the theory, and showing that the same experiment works everywhere, to demonstrate that the theory is a general truth and not just a coincidence or something that worked only for someone.

The whole software community knows that a great deal of knowledge has been recorded and collected in books in a well-organized manner. Despite that, we still lack clearly stated laws. Even if experienced software developers know what is right to do, nobody knows for sure why some decisions are the right ones. Therefore, Kanat-Alexander (2012) lists definitions, facts, rules and laws for this science.

The whole art of practical programming grew organically, more like college students teaching themselves to cook than like NASA engineers building the space shuttle…. After that came a flurry of software development methods: the Rational Unified Process, the Capability Maturity Model, Agile Software Development, and many others. None of these claimed to be a science—they were just ways of managing the complexity of software development. And that, basically, brings us up to where we are today: lots of methods, but no real science. (KANAT-ALEXANDER, 2012, p. 10).

Kanat-Alexander (2012) states that all of the definitions below apply when we talk about software design:

  • When you “design software”, you plan: the structure of the code, what technologies to use, etc. There are many technical decisions to be made. Often one decides just mentally; other times one also jots plans down or draws a few diagrams;
  • Once that is done, there is “a software design” (the plan that was worked out), whether it is a written document or only a set of decisions kept in mind;
  • Code that already exists also has “a design”, which is the structure it has or the plan it seems to follow. Between “no design” and “a design” there are also many possibilities, such as “a partial design”, “several conflicting designs in one piece of code”, etc. There are also effectively bad designs that are worse than having no design, such as coming across code that is intentionally disorganized or complex: code with an effectively bad design.

The science presented here is not computer science. That’s a mathematical study. Instead, this book contains the beginnings of a science for the “working programmer” — a set of fundamental laws and rules to follow when writing a program in any language… The primary source of complexity in software is, in fact, the lack of this science. (KANAT-ALEXANDER, 2012, p. 11).

The science of software design is a science for developing plans and making decisions about software: it helps decide the ideal structure of a program’s code, the trade-off between execution speed and ease of understanding, and which programming language is most appropriate for the case. We thus arrive at a new point of view on what we call software design, seen through the prism of the programmer: it involves not only the activities that follow requirements analysis but also extends throughout programming and the whole product life cycle, including maintenance, because good maintenance of any software requires a good design, built with its fundamental laws in mind, as a reference.

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira






When to use Enums?

Let us consider this short real-life example:

One day, I was helping the guys at our local Housing Agency build a new online form that would feed the database with candidates for the government’s low-cost housing benefit. Months later, after the solution had gone live and people had entered their information, the same small Scrum team was responsible for building reports on that data.

An enumeration example: The Solar System

One of our team members had left the project for another job, and up to that moment we had been doing fine implementing everything as demanded from above. But one thing caught my attention: some of the data, at first sight, had no correspondence between the report and the database.

We went up and down the documentation trying to figure out how to map all the report fields to the database we had (one had been built for the other, for a very simple task). Although the database was well documented, apparently very little could be done to solve the problem and deliver the reports on time. Where the reports asked for marital status, income range and the kind of information that usually goes in combo boxes, we could only find numbers (integers) in the database!

3 States of Matter: real enums do not have states added during their existence

After struggling with that for some hours, I decided to open the code and hunt down whatever might be the data structure I was looking for.

Genders: a simple and genuine enumeration example

An enumeration is a complete, ordered listing of all the items in a collection. The term is commonly used in mathematics and theoretical computer science (as well as applied computer science) to refer to a listing of all of the elements of a set. – Enumeration – Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Enumeration
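To picture the situation described above, here is a sketch in Python. The story does not say which language the application used or which codes it stored, so the `MaritalStatus` enum and its values are hypothetical, and the sketch makes no claim that marital status is a good candidate for an enum. The database only receives the integers; their meaning lives in the enum defined in the code, so anyone reading the tables without that code sees bare numbers.

```python
from enum import IntEnum


class MaritalStatus(IntEnum):
    """Hypothetical enum: the meaning of each code lives only in the application."""
    SINGLE = 1
    MARRIED = 2
    DIVORCED = 3
    WIDOWED = 4


# What the report team found in the database: bare integers.
stored_values = [2, 1, 1, 4]


def decode(values):
    """Translate persisted integer codes back into readable labels."""
    return [MaritalStatus(v).name.title() for v in values]


if __name__ == "__main__":
    print(decode(stored_values))
    print([member.name for member in MaritalStatus])  # the complete, ordered listing
```

Note that the enum behaves as a complete, ordered listing: iterating over `MaritalStatus` yields its members in definition order, and an unknown code such as 99 raises a `ValueError` instead of being silently accepted.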

I believe in the benefits of Scrum, agree with and support its values, and really think of it as the approach that comes closest to making a small team dealing with multiple projects effective. I would also never consider the colleague who left the team a bad or average programmer (actually, he is one of the best programmers I’ve met on my journey). The fact is that pressure and heat generated some bad gases in that situation, and the exhaust valve for most OO programmers is called “enum”.

Traffic Lights: a very good enum example, with a classic use for reporting purposes

That situation inspired me to look more closely at that “enum” thing, and after some other experiences with that whatchamacallit datatype, I was convinced I could research it a little and maybe bring something useful to the scientific and software development communities (to those my words convince, of course).

For that reason, I’m going to start a series of posts presenting, in parts, my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND THE QUALITY IN BUSINESS INTELLIGENCE” (freely translated from the Portuguese original), in which I expect to introduce the views of some authors on Software Design Science, Business Intelligence, THE ENUM and some other topics usually related in a BI environment and, above all, to decipher enumerations: when and how it’s best to use them.

The Ten Commandments: no change forever and ever

As database people, we sometimes feel uncomfortable with certain methods developers tend to use, so my proposal to answer the question “When to use enums?” came up after some debate in our professional circles. Some colleagues support my point of view; others avoid it or do not like it. All in all, there is still a gap between code and data. Shall we explore it?

The 7 visible colors: there’s a treasure hidden for those who seek authentic enumerations

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira



Image credits:

The images used in this post, edited or not, are Creative Commons (CC) and the originals are credited to their creators and can be found at: