About Business Intelligence: The Relationship Between Data, Information and Knowledge


From Serra (2002) we can infer that it is necessary to understand the relationship between data, information and knowledge. Data is a record, information is a fact associated with a record, and knowledge is the identification of a piece of information according to a rule.
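The three levels can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration (the `Record` fields, the income threshold, the labels); none of it comes from Serra (2002).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Data: a bare stored record, with no interpretation attached."""
    customer_id: int
    monthly_income: float


def to_information(record: Record) -> str:
    """Information: a fact associated with the record."""
    return f"Customer {record.customer_id} earns {record.monthly_income:.2f} per month"


def to_knowledge(record: Record, threshold: float = 1000.0) -> str:
    """Knowledge: the information identified according to a rule.

    The rule (a fixed income threshold) is an assumption made for this sketch.
    """
    return "low income" if record.monthly_income < threshold else "regular income"


record = Record(customer_id=42, monthly_income=850.0)
print(to_information(record))
print(to_knowledge(record))
```

The same record yields different knowledge if the rule changes, which is why the rule belongs to the knowledge level and not to the data.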

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


These are the posts on the same “Enum and Quality in BI” monograph:

Our future posts that complete the current “About Business Intelligence” theme will be:

  • About Business Intelligence: Data Warehouse

Justification

This short text is a mere Portuguese to English translation from part of my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND QUALITY IN BUSINESS INTELLIGENCE” (free translation of the title), also aliased as “Enum and Quality in BI”, which corresponds to a minor part of the document structure.


References:

Image credits:

 

About Business Intelligence (BI)


The definition of Business Intelligence starts from an understanding of what Data Management is. According to Barbieri (1994, apud SERRA, 2002), data management can be defined as the function of the organization responsible for centrally developing and managing the strategies, procedures, practices and plans capable of providing the necessary corporate data, when needed, with integrity, privacy, documentation and sharing ensured.

Data Management has a very broad scope: it participates in organizational strategic planning, detects future information needs, plans the databases to meet the organization’s business needs and manages all of the organization’s data (even non-computerized data). For Serra (2002), however, Data Management has changed its name and is now labeled “Business Intelligence”, while its purpose and characteristics remain the same. The truth is that the term BI encompasses a variety of software and practices that facilitate decision-making (operational, tactical or strategic) by analyzing information of reliable quality.

Business Intelligence represents great potential for organizational change and, for this reason, its implementation faces inherent challenges due to human resistance to change. Thus, cultural and technical challenges can pose both threats and opportunities. One of the key challenges for a BI project is data quality: the data must be clean and consistent to substantiate the information that will be generated. It is therefore necessary to make sure that the data feeding BI have these attributes (cleanness and consistency) so that the information generated is reliable. The need for cleansing and consistency of the data used in BI is addressed in the data staging phase of the Data Warehouse, of which the BI tools are part.

We infer from Serra (2002) that there are some important factors to be considered in Business Intelligence. Among them are the relationship between data, information and knowledge, data quality, information quality, the relationship between operational information and managerial information, and the adequacy of information to business needs.

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira



About The Software Design Science – Part 2 of 2


Throughout the book “Code Simplicity”, Kanat-Alexander (2012) discusses the truths about software design across the whole software life cycle, bringing into existence the missing science of software design, in which every programmer is a designer. He argues for this science and also explains why it took so long to come to light. In the appendices he lists, didactically, the laws that we comment on here:

  • The purpose of software is to help people.
  • The Equation of Software Design is:
D = (Vn + Vf) / (Ei + Em)

where:

D represents the desirability of the change.

Vn represents the value now.

Vf represents the future value.

Ei represents the implementation effort.

Em represents maintenance effort.

The Equation of Software Design is the primary law of software design. As time goes on, that equation will reduce to:

D = Vf / Em

That demonstrates that reducing the maintenance effort is more important than reducing the implementation effort.
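A small numeric sketch can make the reduction concrete. It assumes, purely for illustration, that future value and maintenance effort accumulate linearly over the years while the value now and the implementation effort are one-off quantities; the figures and function names are hypothetical and do not come from the book.

```python
def desirability(value_now, future_value, implementation_effort, maintenance_effort):
    """Full Equation of Software Design: D = (Vn + Vf) / (Ei + Em)."""
    return (value_now + future_value) / (implementation_effort + maintenance_effort)


def reduced_desirability(future_value, maintenance_effort):
    """Reduced form: D = Vf / Em."""
    return future_value / maintenance_effort


# Hypothetical one-off quantities.
VALUE_NOW = 10.0
IMPLEMENTATION_EFFORT = 5.0

# Hypothetical yearly accumulation (an assumption of this sketch).
VALUE_PER_YEAR = 20.0
MAINTENANCE_PER_YEAR = 4.0

for years in (1, 10, 100):
    vf = VALUE_PER_YEAR * years
    em = MAINTENANCE_PER_YEAR * years
    full = desirability(VALUE_NOW, vf, IMPLEMENTATION_EFFORT, em)
    reduced = reduced_desirability(vf, em)
    print(f"{years:>3} years: full D = {full:.3f}, reduced D = {reduced:.3f}")
```

Under these assumptions the one-off terms Vn and Ei weigh less and less as the years pass, so the full equation approaches the reduced one.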

  • The Law of Change: The longer your program exists, the more probable it is that any piece of it will have to change.
  • The Law of Defect Probability: The chance of introducing a defect into your program is proportional to the size of the changes you make to it.
  • The Law of Simplicity: The ease of maintenance of any piece of software is proportional to the simplicity of its individual pieces.
  • The Law of Testing: The degree to which you know how your software behaves is the degree to which you have accurately tested it.

Kanat-Alexander (2012), however, makes a very important comment at the end of the appendix, which summarizes the thinking about these laws:

Note that of all of these, the most important to bear in mind are the purpose of software, the reduced form of the Equation of Software Design, and the Law of Simplicity. (KANAT-ALEXANDER, 2012, p. 74).

Thus, we note the superior relevance of these three laws:

  • Law number 2: The Equation of Software Design is (in reduced form):
D = Vf / Em

where:

D represents the desirability of a change.

Vf represents the future value.

Em represents the maintenance effort.

  • Law number 1: The purpose of software is to help people.
  • Law number 5: The Law of Simplicity: The ease of maintenance of any piece of software is proportional to the simplicity of its individual pieces.

Kanat-Alexander (2012) further summarizes the important facts about software design in two simple sentences:

  • It is more important to reduce the effort of maintenance than it is to reduce the effort of implementation.
  • The effort of maintenance is proportional to the complexity of the system.

Unless the software in question is intended to be used only once or to have a very short life, which is unlikely, the importance of maintainability is very clear.

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira



What is the best big data processing framework?


When we look at the data processing frameworks, we see questions about whether Apache Flink is better than Apache Spark, or about when Spark is a better choice than Hadoop, and so on. I would say that the choice among them depends on one’s business requirements.

Spark and Flink offer better performance and huge gains over traditional Apache Hadoop MapReduce. Apache Flink, although it can be used in batch processing scenarios for ETL workloads, is more focused on providing stream processing solutions when we need sub-second latency. Flink is a great tool for IoT cases, where vital signals drawn from sensors need to be processed and event flow processing results need to be delivered in real time. On the other hand, Apache Spark is a good solution when a response time on the scale of seconds or minutes is acceptable. Using Spark, we can also store the RDDs generated during workload processing and perform further analysis over them. If response time is not so critical and batch processing jobs are executed outside peak time (overnight), Hadoop may be an alternative. Hadoop has matured over a decade and has gained a solid market share thanks to the large number of tools that make up its ecosystem, such as Pig (scripting), Hive (DW), HBase (column store), Mahout (machine learning), Giraph (graph), Oozie (workflow), etc. In addition, there are many Hadoop distributions available for production environments, such as Cloudera, Hortonworks, MapR and others.

With businesses looking to deliver ever faster results, we see a paradigm shift in the big data processing landscape. A few years ago, companies adopted Hadoop, but with the rise of Apache Storm and stream processing, there was a very strong appeal to get real-time responses as well. However, Storm’s stream processing could not guarantee the delivery of a consistent view of the data, although its results were very close to the actual ones. This led to what we call the Lambda architecture: Storm was included in the data flow, but its results could later be adjusted with the correct results obtained by batch processing on Hadoop.

With the arrival of stream processors such as Flink, Samza, Apex and Gearpump, the results of stream processing began to offer a consistent view of the data. As a way to simplify the complex workflow of the Lambda model, the market has been adopting a more concise and simpler approach for fast data, the Kappa architecture. Kappa can be adopted for both batch processing and stream processing purposes. Inside the Kappa architecture there is a distributed commit log component, which works as a layer of integration between data producers and data consumers, like a pub-sub system. For this integration layer, we can adopt Apache Kafka or a cloud solution like Amazon Kinesis. In this layer, all the input events generated by the data producers are kept in durable storage, and what determines the latency of data delivery is how long data consumers take to consume the data from these distributed queues. If you need to process data and deliver results in real time, you may choose Flink for stream processing. If the consumer is a batch mode ETL system, Spark may be a great solution too. The advantage of a Kappa architecture is that we have a reliable source of truth, and solutions like Apache Kafka become the central piece of the architecture.
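The idea of a shared commit log read by consumers at their own pace can be sketched in a few lines. This is only an in-memory illustration under stated assumptions: `CommitLog` and `Consumer` are hypothetical names, they are not the API of Kafka, Kinesis, Flink or Spark, and a real log would be distributed and durable.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class CommitLog:
    """Append-only, in-memory stand-in for a durable distributed commit log."""
    _events: List[Any] = field(default_factory=list)

    def append(self, event: Any) -> int:
        """Store an event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int) -> List[Any]:
        """Return every event stored at or after the given offset."""
        return self._events[offset:]


@dataclass
class Consumer:
    """A consumer that tracks its own offset and decides when to poll."""
    name: str
    handler: Callable[[List[Any]], Any]
    offset: int = 0

    def poll(self, log: CommitLog) -> Any:
        """Read all unread events, advance the offset, and process them."""
        batch = log.read_from(self.offset)
        self.offset += len(batch)
        return self.handler(batch)


log = CommitLog()
stream = Consumer("stream", handler=lambda batch: [reading * 2 for reading in batch])
batch_job = Consumer("batch", handler=sum)

stream_results = []
for reading in [3, 5, 8]:                     # hypothetical sensor readings
    log.append(reading)
    stream_results.extend(stream.poll(log))   # low latency: polls after every event

batch_total = batch_job.poll(log)             # high latency: one pass over the whole log
print(stream_results, batch_total)
```

Both consumers read the same log; only how often each one polls differs, which is the sense in which consumption time determines delivery latency.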

Text: Luis Cláudio R. da Silveira
Revision: Pedro Carneiro Jr.


Image credits:

  • “The Commons”, Flickr.com / Internet Archive Book Images – https://www.flickr.com/photos/internetarchivebookimages/20778886222/sizes/l/ (Image from page 293 of “Plant propagation : greenhouse and nursery practice ” (1916)Title: Plant propagation : greenhouse and nursery practice, Identifier: cu31924073971149, Year: 1916 (1910s), Authors: Kains, M. G. (Maurice Grenville), 1868-1946, Subjects: Plant propagation, Publisher: New York : Orange Judd Company, Contributing Library: Cornell University Library, Digitizing Sponsor: MSN)

 

 

What is Data Science?


I am a bit dissatisfied with the multiple definitions that data science has been receiving and with the lack of at least one clear and scientific approach to defining it, as exists for computer science, software development science, and a lot of other subjects. So I decided to write this post hoping to produce some findings and/or spark some discussion around it. Who knows, we may reach a more scientific definition in the future.

“The field of data science is emerging at the intersection of the fields of social science and statistics, information and computer science, and design. The UC Berkeley School of Information is ideally positioned to bring these disciplines together and to provide students with the research and professional skills to succeed in leading edge organizations.” – https://datascience.berkeley.edu/about/what-is-data-science/, accessed on January 13th, 2016.

Data Science Happens Not Only In California

Many people repeat the quip that a data scientist is “a data analyst who lives in San Francisco”. That alone might indicate the importance of data analysts and of all the data practitioners in California, but it also seems enough to suggest that what we know as data science has more of a practical or commercial appeal than a proper scientific definition of its own. Anyhow, we should not deny that this data science already has an identity: a fast-paced, rapidly evolving one, just like any other field directly involved with modern technologies. But the distinct personality of data science is still a bit confusing.

Is Statistics Data Science Itself?

Many argue that data science might be statistics itself, or whatever modern statistics does by computational means. That happens on a large scale even in the academic ecosystem, propelled by the popularity and use of big data, machine learning, et cetera. Does statistics make up the whole of data science? Does data science make up the whole of statistics? In other words, are statistics and data science different sets, different sciences? The known truth so far is that statistics makes use of data science.

Data Science, According To Wikipedia

Many professors would not accept a Wikipedia definition as the basis for a scientific argument. Anyhow, let us ease things a little bit by using it. In my opinion, Wikipedia reflects what a majority thinks, or at least tends to be an average of the general mindset.

Data Science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics, similar to Knowledge Discovery in Databases (KDD). – Wikipedia, https://en.wikipedia.org/wiki/Data_science, accessed on January 12th, 2016.

Wikipedia, at this moment at least, defines data science as an interdisciplinary field. That is true. Another point of view affirms that too and provides the famous Data Science Venn Diagram. My question is: must a field be a science? A field is a subset or part of a science but the reciprocal is not necessarily true. In the citation above, Wikipedia affirms that statistics is a field too and we are considering Statistics as a science.

Google uses Wikipedia’s definition of data science

 

A Data Science Visualization, according to Drew Conway

One of the opinions that comes closest to common ground for a definition of Data Science is Drew Conway’s. Although I have not yet seen any statement that it is a definition, his visualization, the famous Data Science Venn Diagram, presents data science as an intersection of hacking skills, Statistics, and the areas of application. It still seems to miss key areas such as databases, data governance, and so on, but I think that he has put all the Computer Science and database material into a set called “hacking skills”. That probably occurs because the world has many more programmers (people with hacking skills) than computer scientists, or because those results-oriented people with hacking skills are in greater demand than computer scientists. Who knows, perhaps computer science is so closed in on itself (difficult to enter or to communicate with), or becomes so boring at university, that there are more “people with hacking skills” from other areas behind the desks typing R command lines than good computer scientists doing the same.

“As I have said before, I think the term “data science” is a bit of a misnomer, but I was very hopeful after this discussion; mostly because of the utter lack of agreement on what a curriculum on this subject would look like. The difficulty in defining these skills is that the split between substance and methodology is ambiguous, and as such it is unclear how to distinguish among hackers, statisticians, subject matter experts, their overlaps and where data science fits. What is clear, however, is that one needs to learn a lot as they aspire to become a fully competent data scientist. Unfortunately, simply enumerating texts and tutorials does not untangle the knots. Therefore, in an effort to simplify the discussion, and add my own thoughts to what is already a crowded market of ideas, I present the Data Science Venn Diagram.” – Drew Conway, http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram, accessed on January 13th, 2016.

According to Drew Conway, author of the DS Venn Diagram, the recent “data science” term forged for the recent usage of data may be a bit of a misnomer and I agree with him.

Data Science Venn Diagram – Credits/source: Drew Conway (http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram)

Data Science vs Data Science

We should ask, then: which science does the so-called data science field belong to? Information Science, the “Data Science”, Statistics, Computer Science…? Wikipedia’s data science definition also says that DS is similar to KDD, but shouldn’t KDD be encompassed by DS simply because databases deal with data? Because of that, another question comes to mind: is the real Data Science “the science of data” or “the science that extracts knowledge or insights from data in various forms”?

Here we encounter two definitions and only one of them is the real Data Science.

“Data science is the study of where information comes from, what it represents and how it can be turned into a valuable resource in the creation of business and IT strategies… Mining large amounts of structured and unstructured data to identify patterns can help an organization rein in costs, increase efficiencies, recognize new market opportunities and increase the organization’s competitive advantage. Some companies are hiring data scientists to help them turn raw data into information. To be effective, such individuals must possess emotional intelligence in addition to education and experience in data analytics.” – http://searchcio.techtarget.com/definition/data-science, accessed on January 13th, 2016.

The Data Science Venn Diagram above helps a lot with that, but there is more to be discovered. In my opinion, the “data science” that Wikipedia, data analysts, statisticians, programmers, and businesspeople talk about is more about what these data practitioners have been doing with statistics, substantive expertise and hacking skills to turn raw data into information than about, for example, the science that studies data: a systematically organized body of knowledge on the particular subject of data. In other words, the science that studies data frames, data sets, databases, metadata, data flows, data cubes, data models, and the whole domain that the subject of data might encompass, along with its frontiers. That sends us in search of the definition of science.

“There is much debate among scholars and practitioners about what data science is, and what it isn’t. Does it deal only with big data? What constitutes big data? Is data science really that new? How is it different from statistics and analytics?… In virtually all areas of intellectual inquiry, data science offers a powerful new approach to making discoveries. By combining aspects of statistics, computer science, applied mathematics, and visualization, data science can turn the vast amounts of data the digital age generates into new insights and new knowledge.” – http://datascience.nyu.edu/what-is-data-science/, accessed on January 13th, 2016.

What Science Is

I went after a classic definition of science and the first thing that came to me was, again, a Wikipedia definition. Such are modern times, professors. Anyway, trying to be fair to the investigation, I looked for other online sources and found some other definitions, including one that gets closer to what is best to use when one wants to establish something as a science, and that may be helpful in our future reasoning.

Science, According to Wikipedia

Wikipedia defines science as “a systematic enterprise that creates, builds and organizes knowledge in the form of testable explanations and predictions about the universe”.

Science, According to Google’s Definition

According to Google, science is “the intellectual and practical activity encompassing the systematic study of the structure and behavior of the physical and natural world through observation and experiment. (‘the science of criminology’)”; “a particular area of this. (‘veterinary science’)”; “a systematically organized body of knowledge on a particular subject. (‘the science of criminology’)”; “synonyms: physics, chemistry, biology; physical sciences, life sciences (‘he teaches science at the high school’)”.

Google Dictionary’s definition of science

Science, According to Merriam-Webster

At Merriam-Webster we read that science is “knowledge about or study of the natural world based on facts learned through experiments and observation; a particular area of scientific study (such as biology, physics, or chemistry); a particular branch of science; a subject that is formally studied in a college, university, etc.”

Science, According to BusinessDictionary.com

The BusinessDictionary.com defines science as “Body of knowledge comprising of measurable or verifiable facts acquired through application of the scientific method, and generalized into scientific laws or principles. While all sciences are founded on valid reasoning and conform to the principles of logic, they are not concerned with the definitiveness of their assertions or findings”. And adds, “In the words of the US paleontologist Stephen Jay Gould (1941-), ‘Science is all those things which are confirmed to such a degree that it would be unreasonable to withhold one’s provisional consent.’”

This one seems to be the best definition of science we have found so far, as it mentions the scientific method as the way to measure and verify the facts and the laws or principles that compose a science.

A Raw First Definition of Data Science

This is raw, and maybe not sophisticated and prone to errors (we are not using the scientific method yet – let us keep that one for future posts), but let us imagine what a data science definition would be based on the definitions of science we listed above.

A Wikipedia-Would-Be Definition of Data Science

“A systematic enterprise that creates, builds and organizes knowledge in the form of testable explanations and predictions about data“.

Do we have a systematic enterprise that creates, builds and organizes knowledge in the form of testable explanations and predictions about data? Is what we have today about data, or about other things that use data as their main support?

A Google-Would-Be Definition of Data Science

From Google’s definition of Science, it looks like our data science definition should at least become something like:

1. “the intellectual and practical activity encompassing the systematic study of the structure and behavior of the physical and natural world through observation and experiment of data. (‘the science of data’)”;
2. or “the intellectual and practical activity encompassing the systematic study of the structure and behavior of data in the physical and natural world through observation and experiment. (‘the science of data’)”.

We have first and second definitions, based on Google’s definition of science.

From recent practice and readings, I would bet that our first-created Google-would-be definition (1) is what all people involved have in mind as for what they/we think data science is. I think that is why many people tend to confuse data science with statistics, simply because the definition number one expresses very well what statistics does. But, actually, is that the proper definition for data science?

Other Google definitions would be like “a particular area of this. (‘data science’)”; “a systematically organized body of knowledge on the data subject. (‘the science of data‘)”.

Do we have a systematically organized body of knowledge on the data subject? As far as I know we have systematically organized bodies of knowledge on many subjects and they use data as a foundation.

A Merriam-Webster-Would-Be Definition of Data Science

“Knowledge about or study of data based on facts learned through experiments and observation; a particular area of scientific study (such as “DATA-o-logy”, biology, physics, or chemistry); a particular branch of science (data science); a subject that is formally studied in a college, university, etc.”

A BusinessDictionary-Would-Be Definition of Data Science

“Body of knowledge comprising of measurable or verifiable facts about data acquired through application of the scientific method, and generalized into scientific laws or principles.”

We are not here to state a precise definition of data science yet, but to set the ball up for the kicker.

Nowadays (we are in January, 2016), it is possible to find many definitions of data science and many (or all) of them still lack precision or lead to a practice that may be a misnomer of something people do with data for scientific and commercial reasons. As a science, there are people studying it, defining it (what we are trying to do), and not only using it. As a practice, people do not mind if it is a science or not since the tool set works for them. As many are trying to define it, according to their observations and experiences, it looks like everybody, while succeeding in a good definition for specific purposes, fails to discover a common place for the definition. As far as all scientists know, the proper common place for the definition of any science is Science itself.

Should one say that data science is “the science of data”, that would be vague and imprecise, but that innocence would shed light on a different perspective. What is science and what is data? Asking that might help us reach better and more common-sense definitions for both the practice of extracting knowledge or insights from data and the science of data and, who knows, enable us to affirm that there is a lot of difference between the two things, or none at all.

Just an important note: I searched the websites http://www.sciencecouncil.org/ and http://www.businessdictionary.com/ and found no definition for data science in their websites.

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira


Other Readings (References):

Other sources to find popular or different perspectives about data science are:

Image credits:

 

About The Software Design Science (Part 1 of 2)


As the first person to claim a science of software design, Kanat-Alexander (2012) explores the concept of software design when he speaks about what he calls “the missing science”. The whole foundation of the software design laws depends on that conceptualization. The missing science is the science of software design. The approach defined by Kanat-Alexander (2012), transcribed below, reflects practice and facts. The science of software design applies from before the programming phase begins, throughout development, after programming is finished and once the program goes into operation, during its maintenance.

Every programmer is a designer. (KANAT-ALEXANDER, 2012, p.6).

The title of the original edition of this work by Kanat-Alexander, Code Simplicity, states a fundamental truth to be followed by software developers: the simplicity of code.

Software design, as it is practiced in the world today, is not a science. What is a science? The dictionary definition is a bit complex, but basically, in order for a subject to be a science it has to pass certain tests. (KANAT-ALEXANDER, 2012, p.7).

In his defense of a new science are the elements that more experienced programmers perceived long ago but had not yet organized. Accordingly, he lists the tests that software design must pass to be considered a science:

  • A science must be composed of facts, not opinions, and these facts should have been gathered somewhere (like in a book).
  • That knowledge must have some sort of organization, be divided into categories and the various parts must be properly linked to each other in terms of importance etc.
  • A science must contain general truths or basic laws.
  • A science should tell you how to do something in the physical universe and be somehow applicable at work or in life.
  • Typically, a science is discovered and proven by means of the scientific method, which involves observing the physical universe, piecing together a theory about how the universe works, performing experiments to verify that theory, and showing that the same experiment works everywhere, to demonstrate that the theory is a general truth and not just a coincidence or something that worked only for someone.

The whole software community knows that there is a lot of knowledge recorded and collected in books, in a well-organized manner. Despite that, we still lack clearly stated laws. Although experienced software developers know what the right thing to do is, nobody knows for sure why some decisions are the right ones. Therefore, Kanat-Alexander (2012) lists definitions, facts, rules and laws for this science.

The whole art of practical programming grew organically, more like college students teaching themselves to cook than like NASA engineers building the space shuttle…. After that came a flurry of software development methods: the Rational Unified Process, the Capability Maturity Model, Agile Software Development, and many others. None of these claimed to be a science—they were just ways of managing the complexity of software development. And that, basically, brings us up to where we are today: lots of methods, but no real science. (KANAT-ALEXANDER, 2012, p. 10).

Kanat-Alexander (2012) affirms that all the definitions below are applicable when we talk about software design:

  • When you “design software”, you plan it: the structure of the code, what technologies to use, etc. There are many technical decisions to be made. Often one decides only mentally; other times one also jots plans down or draws a few diagrams;
  • Once that is done, there is a “software design” (a plan that was drawn up), whether it is a written document or only a set of decisions taken and kept in mind;
  • Code that already exists also has “a design” (“design” as the plan that an existing creation follows), which is the structure that it has or the plan that it seems to follow. Between “no design” and “a design” there are also many possibilities, such as “a partial design”, “several conflicting designs in one piece of code”, etc. There are also effectively bad designs that are worse than having no design, as when one comes across code that is intentionally disorganized or complex: code with an effectively bad design.

The science presented here is not computer science. That’s a mathematical study. Instead, this book contains the beginnings of a science for the “working programmer” — a set of fundamental laws and rules to follow when writing a program in any language… The primary source of complexity in software is, in fact, the lack of this science. (KANAT-ALEXANDER, 2012, p. 11).

The science of software design is a science for developing plans and making decisions about software. It helps in deciding on the ideal structure of a program’s code, on the trade-off between execution speed and ease of understanding, and on which programming language is most appropriate to the case. We note, then, a new point of view on what we call software design, seen through the prism of the programmer: it involves not only the activities that follow requirements analysis but extends throughout the whole programming and product life cycle, including maintenance, because good maintenance of such software requires a good design as a reference, one that takes its fundamental laws into account.

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira






When to use Enums?

Let us consider this short real-life example:

One day, I was helping the guys at our local Housing Agency build a new online form that would feed the database with candidates for the government’s low-cost housing benefit. Months later, after the solution had gone live and people had entered their information, the same small Scrum team was responsible for building reports on that data.

An enumeration example: The Solar System

One of our team members had left the project for another job, and up to that moment we had been fine implementing everything as demanded from above. But one thing caught my attention: some of the data, at first sight, had no correspondence between the report and the database.

We went up and down the documentation trying to figure out how to map the report fields to the database we had (one born for the other, for a very simple task). Although the database was well documented, apparently very little could be done to solve the problem and deliver the reports on time. Where the reports asked for marital status, income range, and other information of the kind that usually goes in combo boxes, we could only find numbers (integers) in the database!

3 States of Matter: real enums do not have states added during their existence

After we had struggled with that for some hours, I decided to open the code and track down what could be the data structure I was looking for.
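
To make the problem concrete, here is a minimal Python sketch of what such a situation can look like. The `MaritalStatus` enum, its members, and the sample rows are hypothetical, invented only for illustration; the original application and its language are not shown in this post. The point is that the labels live only in the code, while the database keeps bare integers, so a report can only be built by someone who has the enum at hand.

```python
from enum import IntEnum


class MaritalStatus(IntEnum):
    """Hypothetical enum: the labels exist only here, in application code."""
    SINGLE = 0
    MARRIED = 1
    DIVORCED = 2
    WIDOWED = 3


# Hypothetical rows as they might come back from the database:
# (candidate_id, marital_status), with the status stored as a bare integer.
rows = [(101, 1), (102, 0), (103, 3)]


def decode_rows(rows):
    """Translate the stored integers back into readable labels for a report."""
    return [(candidate_id, MaritalStatus(code).name.title())
            for candidate_id, code in rows]


report = decode_rows(rows)
for candidate_id, label in report:
    print(candidate_id, label)
```

Without access to `MaritalStatus`, the value `3` in the table says nothing to whoever writes the report, which is roughly the position our team was in.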

Genders: a simple and genuine enumeration example

An enumeration is a complete, ordered listing of all the items in a collection. The term is commonly used in mathematics and theoretical computer science (as well as applied computer science) to refer to a listing of all of the elements of a set. – Enumeration – Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Enumeration
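
In programming terms, this definition maps naturally onto an enum type: a closed set of named values with a fixed order. A small Python sketch follows, using the traffic light as an assumed example (the `TrafficLight` name and its member values are chosen only for illustration):

```python
from enum import Enum


class TrafficLight(Enum):
    """A closed, ordered set: iteration follows definition order."""
    RED = 1
    YELLOW = 2
    GREEN = 3


# Listing every element of the set, in order.
names = [light.name for light in TrafficLight]
print(names)

# Membership is fixed: a value outside the set is rejected.
try:
    TrafficLight(4)
except ValueError:
    print("4 is not a traffic light")
```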

I believe in the benefits of Scrum, agree with and support its values, and really think of it as the closest approach to good effectiveness for a small team dealing with multiple projects. I also would never consider the colleague who left the team a bad or average programmer (actually, he is one of the best programmers I’ve met in my journey). The fact is that pressure and heat generated some bad gases in that event. The exhaust valve for most OO programmers is called “enum”.

Traffic Lights: a good enum example with a classic use for reporting purposes

That situation inspired me to look more closely at that “enum” thing, and after some other experiences with that whatchamacallit datatype I was convinced I could research it a little and maybe bring something useful to the scientific and software development communities (for those my words convince, of course).

For that reason, I’m going to start a series of posts that will present, in parts, my monograph “THE PERSISTENCE OF ENUMERATIONS IN POSTGRESQL DATABASES AND THE QUALITY IN BUSINESS INTELLIGENCE”, freely translated from the original in Portuguese. In it I expect to introduce the views of some authors on Software Design Science, Business Intelligence, THE ENUM, and some other things usually related in a BI environment and, above all, to decipher enumerations and when and how it is better to use them.

The Ten Commandments: no change forever and ever

As database people, we sometimes feel uncomfortable when developers tend to use certain methods, so my proposal to answer the question “When to use enums?” came up after some debate within our professional circles. Some colleagues support my point of view and some avoid or dislike it. All in all, there is still a gap between code and data. Shall we explore it?

The 7 visible colors: there’s a treasure hidden for those who seek authentic enumerations

Text: Pedro Carneiro Jr.
Revision: Luis Cláudio R. da Silveira



Image credits:

The images used in this post, edited or not, are Creative Commons (CC) and the originals are credited to their creators and can be found at: