About BI and Data Modeling: Requirements and Data Modeling


The two most common approaches to data modeling are relational modeling, usually associated with OLTP systems, and dimensional modeling, which is the more appropriate technique for OLAP systems. Serra (2002) states that the process of building quality data models begins before the design of the first entity, starting with an understanding of the corporate model being addressed.
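
As a rough illustration of the difference (not taken from the monograph), the same sales data could be modeled both ways in PostgreSQL; every table and column name below (customer, product, sales_order, dim_date, dim_product, fact_sales, and so on) is hypothetical:

    -- Relational (OLTP) style: normalized entities, suited to transactional work.
    CREATE TABLE customer (
        customer_id   serial PRIMARY KEY,
        customer_name text NOT NULL
    );

    CREATE TABLE product (
        product_id   serial PRIMARY KEY,
        product_name text NOT NULL
    );

    CREATE TABLE sales_order (
        order_id    serial  PRIMARY KEY,
        customer_id integer NOT NULL REFERENCES customer,
        order_date  date    NOT NULL
    );

    CREATE TABLE sales_order_item (
        order_id   integer NOT NULL REFERENCES sales_order,
        product_id integer NOT NULL REFERENCES product,
        quantity   numeric NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );

    -- Dimensional (OLAP) style: a star schema, with one fact table surrounded
    -- by denormalized dimensions, suited to analytical queries.
    CREATE TABLE dim_date (
        date_key   integer PRIMARY KEY,   -- e.g. 20020510
        full_date  date    NOT NULL,
        month_name text    NOT NULL,
        year       integer NOT NULL
    );

    CREATE TABLE dim_product (
        product_key  serial PRIMARY KEY,
        product_name text   NOT NULL,
        category     text   NOT NULL
    );

    CREATE TABLE fact_sales (
        date_key      integer NOT NULL REFERENCES dim_date,
        product_key   integer NOT NULL REFERENCES dim_product,
        quantity_sold numeric NOT NULL,
        revenue       numeric NOT NULL
    );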

Collecting the business and data requirements is the foundation of the entire data warehouse effort—or at least it should be. Collecting the requirements is an art form, and it is one of the least natural activities for an IS organization. We give you techniques to make this job easier and hope to impress upon you the necessity of spending quality time on this step. (KIMBALL, 1998, p.6).

For any project aiming to build OLAP or OLTP systems, the reality regarding requirements is the same: they are the foundation of the entire structure to be built, and therefore the necessary amount of time must be invested in uncovering everything that is relevant. The more time invested in collecting and investigating requirements, the less time will be spent later on unnecessary corrections to the data model and the systems involved.

Fig. 2 - Definition of Requirements and Dimensional Modeling in the Kimball Business Dimensional Life-Cycle Diagram

The business dimensional life-cycle model, on which Kimball's (1998) methodology is based, shows not only the importance of the business requirements definition process, but also how much the dimensional data modeling process depends on those requirements, how much the other processes depend on these two, and how strongly the project's critical path tends to be shaped by this whole core set of processes.

Kimball (1998) states that, in dimensional modeling, the definition of business requirements determines the data needed to meet the analytical needs of business users. In other words, for the analyses the business users intend to perform to be feasible, the required data must be identified from the definition of the business requirements. Designing data models that support such analyses calls for an approach different from the one used to design operational-level systems.
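
For instance, taking the hypothetical star schema sketched earlier in this post, an analytical requirement such as "total quantity sold by product and month" can only be answered because the requirement itself told us which dimensions (product and date) and which measure (quantity sold) the model has to carry; the query below is merely a sketch of that dependency:

    -- Hypothetical analytical query: feasible only because the business
    -- requirement drove the inclusion of dim_product, dim_date and quantity_sold.
    SELECT p.product_name,
           d.year,
           d.month_name,
           SUM(f.quantity_sold) AS total_quantity
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date    d ON d.date_key    = f.date_key
    GROUP BY p.product_name, d.year, d.month_name
    ORDER BY p.product_name, d.year;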

Even though there are differences in applicability (and others) between the data modeling techniques, relational or dimensional, the quality of the model will always depend on the quality of the requirements gathered.

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


These are the posts on the same “Enum and Quality in BI” monograph:

These will be the next posts on the same theme:

  • About BI and Data Modeling: Quality of Modeling, Data and Information
  • About BI and Data Modeling: Types of Data Modeling
    • About BI and Data Modeling: Relational Modeling
      • About BI and Data Modeling: Phases of Relational Data Modeling
      • About BI and Data Modeling: How to create an Entity-Relationship Diagram
    • About BI and Data Modeling: Dimensional Modeling
      • About BI and Data Modeling: Defining Granularity
      • About BI and Data Modeling: Detailing Dimensions
      • About BI and Data Modeling: Defining the Attributes of the Fact Table(s)
      • About BI and Data Modeling: Defining Aggregates

Justification

This short text is a mere Portuguese to English translation from part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (free translation of the title), also aliased as “Enum and Quality in BI”, which corresponds to a minor part of the document structure.


References:

Image credits:

 

About BI and Data Modeling


It is not possible to talk about BI without talking about data modeling. The data must be organized in a way that allows true and reliable information to be visualized. Although Business Intelligence is not only about data, but also about other factors, the data deserve close attention, since they are the main pillar of all the information to be consumed.

According to Serra (2002), it is necessary to examine the organization's mission and goals for the future, to identify the key data for the different functional areas, and to list and analyze products, services, markets, current systems and the organization's distribution channels. When examined, the organization's goals lead analysts to identify the key data needed for top-management decisions.

The data model should be built by observing the meaningful real-world representation, the degree of excellence and the comprehensiveness, the language use and the syntax in an appropriate manner, besides the adherence to the organization’s business. (SERRA, 2002, p.31).

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


These are the posts on the same “Enum and Quality in BI” monograph:

These will be the next posts on the same theme:

  • Requirements and Data Modeling
  • Quality of Modeling, Data and Information
  • Types of Data Modeling
    • Relational Modeling
      • Phases of Relational Data Modeling
      • How to create an Entity-Relationship Diagram
    • Dimensional Modeling
      • Defining Granularity
      • Detailing Dimensions
      • Defining the Attributes of the Fact Table(s)
      • Defining Aggregates

Justification

This short text is a mere Portuguese to English translation from part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (free translation of the title), also aliased as “Enum and Quality in BI”, which corresponds to a minor part of the document structure.


References:

Image credits:

 

 

About Business Intelligence: The Adequacy of the Information to the Business Needs


For Serra (2002), each information source has three attributes: form, age and frequency. Taking as an example a "Quantity Produced by Manufacturing Order" report, we can assume that it has the following characteristics (a query sketch of such a report follows the list):

  • As for the form: detailing of quantities produced by product;
  • As for age: to be received at 8:30 AM, with facts recorded up to midnight (00:00) of the previous day;
  • As for frequency: daily.
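
As a sketch of how such a report could be fed (the manufacturing_order and product tables and their columns are invented purely for illustration):

    -- Hypothetical daily report: quantity produced by product, covering the
    -- previous day up to midnight, to be delivered at 08:30.
    SELECT p.product_name,
           SUM(mo.quantity_produced) AS total_quantity
    FROM manufacturing_order mo
    JOIN product p ON p.product_id = mo.product_id
    WHERE mo.finished_at >= date_trunc('day', now()) - interval '1 day'
      AND mo.finished_at <  date_trunc('day', now())
    GROUP BY p.product_name
    ORDER BY p.product_name;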

Kimball (1998), addressing the processes involved in the Data Warehouse lifecycle, calls attention to the importance of balancing the reality of the business requirements against the availability of the data to meet them. Preparation and time are fundamental to a good project, which will involve considerable dialogue between the qualified personnel of the systems area and the information-consuming staff of the business area.

Before you can do a good job of defining your data marts, you need to do some homework. You must thoroughly canvass your organization’s business needs and thoroughly canvass the data resources. (KIMBALL, 1998, p. 268).

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


These are the posts on the same “Enum and Quality in BI” monograph:

Our future posts that complete the current “About Business Intelligence” theme will be:

  • About Business Intelligence: Data Warehouse

Justification

This short text is a mere Portuguese to English translation from part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (free translation of the title), also aliased as “Enum and Quality in BI”, which corresponds to a minor part of the document structure.


References:

Image credits:

 

 

 

About Business Intelligence: The Quality of Information


Any quality information depends directly on quality data. One problem is that software production today is still largely artisanal. Serra (2002) even classifies system development professionals as "intellectual artisans", given the lack of controls and well-defined processes for that activity. Despite this difficulty in measuring the quality of software development processes, concrete results have at least been obtained by applying Kimball's (1998) methods to Data Warehousing, which give us defined processes for measuring and handling information quality.

Consistent information means high-quality information. This means that all of the information is accounted for and is complete. (KIMBALL, 1998, p.10).

Data staging is a major process that includes, among others, the following sub-processes: extracting, transforming, loading and indexing, and quality assurance checking. (KIMBALL, 1998, p.23).
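
As a minimal, hypothetical sketch of that staging sequence in PostgreSQL terms (the stg_orders_raw and dw_orders tables and the quality rules are invented for illustration, not taken from Kimball):

    -- Extract: raw rows land in a staging table exactly as received.
    CREATE TABLE stg_orders_raw (
        order_id  text,
        quantity  text,   -- still untyped at this stage
        order_day text
    );

    -- Transform and load: cast the raw values and keep only rows that pass
    -- basic quality checks.
    CREATE TABLE dw_orders (
        order_id  integer PRIMARY KEY,
        quantity  numeric NOT NULL,
        order_day date    NOT NULL
    );

    INSERT INTO dw_orders (order_id, quantity, order_day)
    SELECT order_id::integer, quantity::numeric, order_day::date
    FROM stg_orders_raw
    WHERE order_id  ~ '^[0-9]+$'
      AND quantity  ~ '^[0-9]+(\.[0-9]+)?$'
      AND order_day ~ '^[0-9]{4}-[0-9]{2}-[0-9]{2}$';

    -- Index: support the most common analytical access path.
    CREATE INDEX ON dw_orders (order_day);

    -- Quality assurance check: how many raw rows were rejected?
    SELECT (SELECT count(*) FROM stg_orders_raw)
         - (SELECT count(*) FROM dw_orders) AS rejected_rows;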

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


These are the posts on the same “Enum and Quality in BI” monograph:

Our future posts that complete the current “About Business Intelligence” theme will be:

  • About Business Intelligence: Data Warehouse

Justification

This short text is a mere Portuguese to English translation from part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (free translation of the title), also aliased as “Enum and Quality in BI”, which corresponds to a minor part of the document structure.


References:

Image credits:

 

About Business Intelligence: The Quality of Data


For Serra (2002), an effective Data Management function relies on standards and policies regarding data, their definition and their usage. These standards and policies must be defined and adopted; they should be stringent, comprehensive and flexible to change, aiming at reusability, stability and the effective communication of the meaning of the data, as well as enabling scalability. Tools such as data dictionaries and repositories should be used for data management. Data must be well defined, sound, consistent, reliable, safe and shared, so that each new system defines only the data within its own scope and shares the remaining data with the other systems in the organization.
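
One minimal way to approximate such a data dictionary in PostgreSQL, assuming a hypothetical dw_orders table, is to record the agreed meaning of each shared column with COMMENT ON and read it back from the catalog; this is only an illustrative sketch, not what Serra (2002) prescribes:

    -- Record the agreed business meaning of a shared column
    -- (dw_orders is a hypothetical warehouse table).
    COMMENT ON COLUMN dw_orders.quantity IS
        'Quantity ordered, in sales units, as agreed in the corporate data standard';

    -- A rudimentary data dictionary query: list the documented columns.
    SELECT c.table_name,
           c.column_name,
           c.data_type,
           col_description(format('%I.%I', c.table_schema, c.table_name)::regclass,
                           c.ordinal_position) AS business_meaning
    FROM information_schema.columns c
    WHERE c.table_schema = 'public'
    ORDER BY c.table_name, c.ordinal_position;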

For Kimball (1998), warehouse design often begins with a load of historical data that requires cleansing and quality control. In existing warehouses, clean data comes from two processes: inserting clean data and cleaning/solving the problems of data already inserted. In addition, establishing accountability for data quality and integrity can be extremely difficult in a Data Warehousing environment. In most transactional systems, the important operational data is well captured, but optional fields receive little attention, and system owners do not care whether they are accurate or complete as long as the required logic is satisfied. Thus, the business and information systems groups must identify or establish an accountable person for each data source, whether internal or external, who treats the data from a business perspective.

The quality of the data depends on a series of events, many beyond the control of the data warehousing team, such as the data collection process, which must be well designed and must count on a strong commitment to quality from the people who perform the data entry. Once the value of the data warehouse is established, it becomes easier to induce the modifications to the data entry processes of the source systems needed to obtain better data.

Kimball (1998) further argues that it is unrealistic to expect any system to contain perfect data, but each implementation must define its own standards of data quality acceptance. These standards are based on the characteristics of quality data, which is accurate, complete, consistent, unique and timely:

  • Accurate: the warehouse data is consistent with the system of record and, when it is not, the reason can be explained;
  • Complete: the data represents the entire relevant set, and users are notified of its scope;
  • Consistent: the data contains no contradictions;
  • Unique: the same thing always has the same name when it has the same meaning;
  • Timely: the data is updated on a schedule useful to business users, the schedule is known, and people accept it.

In addition, quality data simply represents the truth of the facts.
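
Against a hypothetical warehouse table, acceptance checks for three of those characteristics (complete, unique, timely) could be sketched as simple queries; the table, columns and thresholds below are invented for illustration:

    -- Completeness: mandatory business attributes must not be missing.
    SELECT count(*) AS incomplete_rows
    FROM dw_orders
    WHERE quantity IS NULL OR order_day IS NULL;

    -- Uniqueness: the same business key must not appear more than once.
    SELECT order_id, count(*) AS occurrences
    FROM dw_orders
    GROUP BY order_id
    HAVING count(*) > 1;

    -- Timeliness: the load must cover data up to the agreed-upon day.
    SELECT max(order_day) >= current_date - 1 AS is_timely
    FROM dw_orders;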

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


These are the posts on the same “Enum and Quality in BI” monograph:

Our future posts that complete the current “About Business Intelligence” theme will be:

  • About Business Intelligence: Data Warehouse

Justification

This short text is a mere Portuguese to English translation from part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (free translation of the title), also aliased as “Enum and Quality in BI”, which corresponds to a minor part of the document structure.


References:

Image credits:

 

About Business Intelligence: The Relationship Between Data, Information and Knowledge


From Serra (2002) it can be deduced that it is necessary to understand the relationship between data, information and knowledge. It can be said that a datum is a record, information is a fact associated with a record, and knowledge is the identification of a piece of information according to a rule.
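
A small, hypothetical sketch may help to fix the distinction (the daily_sales table, its row and the 200-unit rule are invented for illustration):

    -- Data: a bare record in a hypothetical table.
    CREATE TABLE daily_sales (product_id integer, sale_day date, quantity integer);
    INSERT INTO daily_sales VALUES (42, DATE '2002-05-10', 150);

    -- Information: a fact associated with that record.
    SELECT format('Product %s sold %s units on %s', product_id, quantity, sale_day)
        AS information
    FROM daily_sales;

    -- Knowledge: the information identified according to a rule
    -- (hypothetical rule: fewer than 200 units in a day means underperformance).
    SELECT product_id,
           CASE WHEN quantity < 200 THEN 'underperforming' ELSE 'on target' END
               AS assessment
    FROM daily_sales;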

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


These are the posts on the same “Enum and Quality in BI” monograph:

Our future posts that complete the current “About Business Intelligence” theme will be:

  • About Business Intelligence: Data Warehouse

Justification

This short text is a mere Portuguese to English translation from part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (free translation of the title), also aliased as “Enum and Quality in BI”, which corresponds to a minor part of the document structure.


References:

Image credits: