Skip to main content
Engineering LibreTexts

9.3: Data Model

  • Page ID
    64435
  • The Relational Data Model

    The relational data model was introduced by C. F. Codd in 1970. Currently, it is the most widely used data model.

    The relational model has provided the basis for:

    • Research on the theory of data/relationship/constraint
    • Numerous database design methodologies
    • The standard database access language called structured query language (SQL)
    • Almost all modern commercial database management systems

    The relational data model describes the world as “a collection of inter-related relations (or tables).”

    Fundamental Concepts in the Relational Data Model

    Relation

    A relation, also known as a table or file, is a subset of the Cartesian product of a list of domains characterized by a name. And within a table, each row represents a group of related data values. A row, or record, is also known as a tuple. The columns in a table is a field and is also referred to as an attribute. You can also think of it this way: an attribute is used to define the record and a record contains a set of attributes.

    Table

    A database is composed of multiple tables and each table holds the data. Figure 7.1 shows a database that contains three tables.

    Blue cylinder with three white rectangles each with a list of words.

    Figure 7.1. Database with three tables.

    Column

    A database stores pieces of information or facts in an organized way. Understanding how to use and get the most out of databases requires us to understand that method of organization.

    The principal storage units are called columns or fields or attributes. These house the basic components of data into which your content can be broken down. When deciding which fields to create, you need to think generically about your information, for example, drawing out the common components of the information that you will store in the database and avoiding the specifics that distinguish one item from another.

    Look at the example of an ID card in Figure 7.2 to see the relationship between fields and their data.

    Record-300x177

    Figure 7.2. Example of an ID card by A. Watt.

    Domain

    A domain is the original sets of atomic values used to model data. By atomic value, we mean that each value in the domain is indivisible as far as the relational model is concerned. For example:

    • The domain of Marital Status has a set of possibilities: Married, Single, Divorced.
    • The domain of Shift has the set of all possible days: {Mon, Tue, Wed…}.
    • The domain of Salary is the set of all floating-point numbers greater than 0 and less than 200,000.
    • The domain of First Name is the set of character strings that represents names of people.

    In summary, a domain is a set of acceptable values that a column is allowed to contain. This is based on various properties and the data type for the column. We will discuss data types in another chapter.

    Records

    Just as the content of any one document or item needs to be broken down into its constituent bits of data for storage in the fields, the link between them also needs to be available so that they can be reconstituted into their whole form. Records allow us to do this. Records contain fields that are related, such as a customer or an employee. As noted earlier, a tuple is another term used for record.

    Records and fields form the basis of all databases. A simple table gives us the clearest picture of how records and fields work together in a database storage project.

    image

    Figure 7.3. Example of a simple table by A. Watt.

    The simple table example in Figure 7.3 shows us how fields can hold a range of different sorts of data. This one has:

    • A Record ID field: this is an ordinal number; its data type is an integer.
    • A PubDate field: this is displayed as day/month/year; its data type is date.
    • An Author field: this is displayed as Initial. Surname; its data type is text.
    • A Title field text: free text can be entered here.

    You can command the database to sift through its data and organize it in a particular way. For example, you can request that a selection of records be limited by date: 1. all before a given date, 2. all after a given date or 3. all between two given dates. Similarly, you can choose to have records sorted by date. Because the field, or record, containing the data is set up as a Date field, the database reads the information in the Date field not just as numbers separated by slashes, but rather, as dates that must be ordered according to a calendar system.

    Degree

    The degree is the number of attributes in a table. In our example in Figure 7.3, the degree is 4.

    Properties of a Table

    • A table has a name that is distinct from all other tables in the database.
    • There are no duplicate rows; each row is distinct.
    • Entries in columns are atomic. The table does not contain repeating groups or multivalued attributes.
    • Entries from columns are from the same domain based on their data type including:
      • number (numeric, integer, float, smallint,…)
      • character (string)
      • date
      • logical (true or false)
    • Operations combining different data types are disallowed.
    • Each attribute has a distinct name.
    • The sequence of columns is insignificant.
    • The sequence of rows is insignificant.

    Data Warehouse

    As organizations have begun to utilize databases as the centerpiece of their operations, the need to fully understand and leverage the data they are collecting has become more and more apparent. However, directly analyzing the data that is needed for day-to-day operations is not a good idea; we do not want to tax the operations of the company more than we need to. Further, organizations also want to analyze data in a historical sense: How does the data we have today compare with the same set of data this time last month, or last year? From these needs arose the concept of the data warehouse.

    The concept of the data warehouse is simple: extract data from one or more of the organization’s databases and load it into the data warehouse (which is itself another database) for storage and analysis. However, the execution of this concept is not that simple. A data warehouse should be designed so that it meets the following criteria:

    • It uses non-operational data. This means that the data warehouse is using a copy of data from the active databases that the company uses in its day-to-day operations, so the data warehouse must pull data from the existing databases on a regular, scheduled basis.
    • The data is time-variant. This means that whenever data is loaded into the data warehouse, it receives a time stamp, which allows for comparisons between different time periods.
    • The data is standardized. Because the data in a data warehouse usually comes from several different sources, it is possible that the data does not use the same definitions or units. For example, our Events table in our Student Clubs database lists the event dates using the mm/dd/yyyy format (e.g., 01/10/2013). A table in another database might use the format yy/mm/dd (e.g., 13/01/10) for dates. In order for the data warehouse to match up dates, a standard date format would have to be agreed upon and all data loaded into the data warehouse would have to be converted to use this standard format. This process is called extraction-transformation-load (ETL).

    There are two primary schools of thought when designing a data warehouse: bottom-up and top-down. The bottom-up approach starts by creating small data warehouses, called data marts, to solve specific business problems. As these data marts are created, they can be combined into a larger data warehouse. The top-down approach suggests that we should start by creating an enterprise-wide data warehouse and then, as specific business needs are identified, create smaller data marts from the data warehouse.

    Data warehouse process (top-down)

    Benefits of Data Warehouses

    Organizations find data warehouses quite beneficial for a number of reasons:

    • The process of developing a data warehouse forces an organization to better understand the data that it is currently collecting and, equally important, what data is not being collected.
    • A data warehouse provides a centralized view of all data being collected across the enterprise and provides a means for determining data that is inconsistent.
    • Once all data is identified as consistent, an organization can generate one version of the truth. This is important when the company wants to report consistent statistics about itself, such as revenue or number of employees.
    • By having a data warehouse, snapshots of data can be taken over time. This creates a historical record of data, which allows for an analysis of trends.
    • A data warehouse provides tools to combine data, which can provide new information and analysis.

    Data Mining and Stuff

    Data Mining

    Data mining is the process of analyzing data to find previously unknown trends, patterns, and associations in order to make decisions. Generally, data mining is accomplished through automated means against extremely large data sets, such as a data warehouse. Some examples of data mining include:

    • An analysis of sales from a large grocery chain might determine that milk is purchased more frequently the day after it rains in cities with a population of less than 50,000.
    • A bank may find that loan applicants whose bank accounts show particular deposit and withdrawal patterns are not good credit risks.
    • A baseball team may find that collegiate baseball players with specific statistics in hitting, pitching, and fielding make for more successful major league players.

    In some cases, a data-mining project is begun with a hypothetical result in mind. For example, a grocery chain may already have some idea that buying patterns change after it rains and want to get a deeper understanding of exactly what is happening. In other cases, there are no presuppositions and a data-mining program is run against large data sets in order to find patterns and associations.

    Privacy Concerns

    The increasing power of data mining has caused concerns for many, especially in the area of privacy. In today’s digital world, it is becoming easier than ever to take data from disparate sources and combine them to do new forms of analysis. In fact, a whole industry has sprung up around this technology: data brokers. These firms combine publicly accessible data with information obtained from the government and other sources to create vast warehouses of data about people and companies that they can then sell. This subject will be covered in much more detail in chapter 12 – the chapter on the ethical concerns of information systems.

    Business Intelligence and Business Analytics

    With tools such as data warehousing and data mining at their disposal, businesses are learning how to use information to their advantage. The term business intelligence is used to describe the process that organizations use to take data they are collecting and analyze it in the hopes of obtaining a competitive advantage. Besides using data from their internal databases, firms often purchase information from data brokers to get a big-picture understanding of their industries. Business analytics is the term used to describe the use of internal company data to improve business processes and practices.

    Knowledge Management

    All companies accumulate knowledge over the course of their existence. Some of this knowledge is written down or saved, but not in an organized fashion. Much of this knowledge is not written down; instead, it is stored inside the heads of its employees. Knowledge management is the process of formalizing the capture, indexing, and storing of the company’s knowledge in order to benefit from the experiences and insights that the company has captured during its existence.

    Key Terms

    atomic value: each value in the domain is indivisible as far as the relational model is concerned

    attribute: principle storage unit in a database

    column: see attribute

    degree: number of attributes in a table

    domain: the original sets of atomic values used to model data; a set of acceptable values that a column is allowed to contain

    field: see attribute

    file: see relation

    record: contains fields that are related; see tuple

    relation: a subset of the Cartesian product of a list of domains characterized by a name; the technical term for table or file

    row: see tuple

    structured query language (SQL): the standard database access language

    table: see relation

    tuple: a technical term for row or record

    Terminology Key

    Several of the terms used in this chapter are synonymous. In addition to the Key Terms above, please refer to Table 7.1 below. The terms in the Alternative 1 column are most commonly used.

    A database table with words.

    Table 7.1. Terms and their synonyms by A. Watt.

    • Was this article helpful?