How Databases Relate`

Bushwick NYC streetart by Phetus

A relational database differs from a bunch of text in a computer file insofar as the database identifies the different types of text and, ideally, how they relate to each other.

IBM computer scientist E. F. Codd presented this "relational model," also called the entity-relationship model, to the world in 1970, promising that the relational view "provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation and organization of data on the other." Also, it would provide a "sound basis" for treating derivability, redundancy, and consistency of relations.

That language turned out to be the Structured Query Language (SQL), upon which has been built generations of relational databases. Only in the past decade was this model seriously challenged by NoSQL databases, which lack this ability to relational-ize data, so to speak. That is left to the exercise of the developer.

In SQL, 'relation' is the database table, 'attribute' is a database column and 'tuple' the database row. An e-commerce shop keeps a database of its customers: 'Attributes' are “customer ID,” “last name,” “first name,” “zip code,” and so on. A 'tuple' is an individual entry describing one person (#42, “John” “Smith” “20707”). The 'relation' is the entire table.

Bushwick NYC streetart by Phetus

Thinking in terms of attributes, the set of all the implied possible values for an attribute is called a domain. For instance, the Domain for Social Security Numbers would be all the possible sets of numbers that fit in the NNN-NN-NNNN format (though an entry with a letter or letters inserted in place of a number would not be part of the domain).

A database is strict on how the data is entered -- Anything entered into this SSN field must follow this format. In addition to internal constraints, the database operator may apply some of their own rules. These rules can be enforced through the CHECK constraints in SQL.

Wrapping up, the entire set of relations is called the schema. The collected values at any one point of time is called its state.

"Any system that is advertised as, or claimed to be, a relational data base management system, that system must be able to manage data bases entirely through its relational capabilities," wrote Codd in his treatise of what defines a relational database, Codd's 12 Rules.

One Key per Tuple

Integrity constraints are the rules that ensure the relation keeps order of its data.

Each tuple must have some unique characteristic so it can be identified. It could be a 'superkey,' or 'composite primary key,' which are a combination of attributes that uniquely defines each tuple. Or, one attribute can be designated as a 'primary key.' The primary key must be completely different for each tuple in the table. If a new entry is added for each new date, then no two entries can ever fall on the same date, then the date could be the primary key. In a sales environment, the “order number” could serve as a primary key, for instance.

Alternately, the primary key could be as simple as an incremented number assigned to each new entry, by the database. This unique number is called a 'surrogate key.'

In addition to primary keys, a tuple may also have 'foreign keys, which can link to other tuples. To illustrate, think of a returning customer buying an item at an e-commerce store. In the store's customer table, the "customer ID" is the primary key, and in the transactions table, the "order number" is the primary key there. However, the transactions table includes the customer ID for each order. In that tuple, the "customer ID" is a foreign key.

It should be noted that these two relations must maintain what is called referential integrity, meaning that any "customer ID" in the transactions table must actually be in the customer table.

Basically, you can do four 'operations' with database data:

  1. Retrieveexisting data
  2. Insert new data
  3. Update existing data
  4. Delete existing data

The Glue: Functional Dependencies

Using the Enhanced Entity Relation Model, we can further specify different types of entities, into subclasses and superclasses. Subclasses are groupings within a set of entities that are clustered in different roles they play. If "employee" is an entity, then "technician" and "manager" could be two different subclasses. Inversely, "employee" is a superclass of "technician" and "manager."

Bushwick NYC streetart by Phetus

Some attributes, called specific attributes or local attributes, can only be applied to certain subclasses (a technician's specialty, for instance). Sometimes inclusion into a subclass can be determined by the existence of a certain attribute, called predicate-defined subclasses. Membership is defined by a particular value of an attribute, though subclasses can also be defined without any particular attribute--user-defined subclasses.

The actual relationships in the relational database are called functional dependencies. "The single most important concept in relational schema design theory is that of a functional dependency," write Ramez Elmasri and Shamkant Navathe in "Fundamentals of Database Systems." The relations are permanent and unalterable.

For instance, a table may have two attributes, or columns. Given that one is the primary key, we can always find the value of the second attribute by looking up the primary key. The primary key is the determinant and the other is the dependent, the parlance of database-speak. Keep in mind that this relationship does not work in reverse: You can not use a dependent value to definitively determine the primary key. You have no guarantee that the dependent value is unique in that table.

Beyond the simple connection drawn between the primary key and the dependents, a number of other inferences can be made as well, using Armb's inference rules, which cover the laws of logic such as transitivity and reflexivity.