Elements of Relational Database Design
Why Use a Relational Database?
RDBs are readily available. Thus, even though your application is object-oriented, you may choose, for convenience, to use an RDB to store objects. Since RDBs are well studied, if you use one, the following statements are reasonably accurate:
- Data entry, updates, and deletions are efficient.
- Data retrieval, summarizatio, and reporting are efficient.
- The database behaves predictably.
- The database is largely self-documenting.
- Changeing the database schema is easy.
Tables, Uniqueness, and Keys
- A table (aka relation) should store value for only 1 kind of thing (e.g., a table of human beings).
- A table is defined by a set of n columns (an n-ary relation), each of which is a data field, such as Last name (String)
- Each row is a unique n-tuple in the relation represented by the table: A unique instance of the things represented by the table (e.g., a particular human being).
- A key for a table is a column or a set of columns whose values uniquely identify a row (e.g., uniquely identify a particular human being).
- Each table has a primary key, which is a key that typically:
- Uses a minimum number of columns
- Has stable values (i.e., they change rarely)
Consider the following table.
Name some keys. Among the keys mentioned, which might be a good primary key?
In web applications, among others, it is common, even customary, to have an integer "id" field that is the primary key. Often, this is the value of a class attribute (i.e., a static int or Integer).
A foreign key is a column in 1 table that is the primary key of another table. For example, the personId may be a foreign key in a table of a race car table that has a driver column.
In the table below, CustomerId is a foreign key in tblOrder which can be used to reference a customer stored in the tblCustomer table.
Relationships
A foreign key in a table (relational model) typically models a "hasa" relation in an object model. For example, for each order (object) has a customer (object).
One-to-Many Relationships
Two tables are related in a 1-to-many (1—M) relationship if for every row in the 1st table, there can be 0, 1, or many rows in the 2nd table, but for every row in the 2nd table there is exactly 1 row in the 1st table. For example, a poker table may be associated with many players, while each player is associated with at most 1 table (at any point in time).
For example, there can be many items for each order, so tblOrder and tblOrderDetails have a one-to-many relationship.
Many-to-Many Relationships
Two tables are related in a many-to-many (M—M) relationship when, for every row in the 1st table, there can be many rows in the 2nd table, and, for every row in the 2nd table, there can be many rows in the first table. Many-to-many relationships are not directly modeled in relational databases. They are decomposed into multiple 1-to-many relationships.
A linking table, tblPtInsurancePgm, is used to model the many-to-many relationship between tblPatient and tblInsurer.
Normalization
Below, tables are presumed to represent relations:
- Each table describes 1 entity.
- Each table has no duplicate rows: It has a primary key.
- Each table's rows are unordered.
- Each table's columns are unordered.
1st Normal Form: all column values are atomic
1NF dictates requries, for every cell in the table, there is only 1 value, not an array or list of values.
The table below is not in 1NF.
The table below is in 1NF.
2nd Normal Form
A table is said to be in 2nd Normal Form (2NF), if it is in 1NF and every non-key column depends on the entire primary key.
Consider the following relation, tblOrder4. The primary key for this table is the "product" of OrderId and OrderItem#. Is this table in 2NF?
CustomerId and OrderDate do not depend on OrderItem#. What do we need to do?
Decompose the relation into 2, both of which are in 2NF. One relation groups those kinds of data that depend only on OrderId: CustomerId and OrderDate. The other relation groups those kinds of data that depend on the product of OrderId and OrderItem#.
3rd Normal Form
A table is said to be in 3rd Normal Form (3NF), if it is in 2NF and if all non-key columns are mutually independent.
Is tblOrder in 3NF? Is tblOrderItem in 3NF?
If not, what needs to be done?
General Integrity Rules
The relational model has 2 general integrity rules:
- entity integrity—primary keys cannot contain null (missing) data.
- referential integrity—the database must not contain any unmatched foreign key values:
A row may not be added to a table with a foreign key unless the referenced value exists in the referenced table.
If the value in a table that's referenced by a foreign key is changed (or the entire row is deleted),
the rows in the table with the foreign key must not be "orphaned."
There are 2 options allowed by all database management systems when a referenced primary key value changes or a row is deleted:
- Disallow. The change is completely disallowed.
- Cascade. For updates, the change is cascaded to all dependent tables. For deletions, the rows in all dependent tables are deleted.
A simply stated design procedure
- For every table, choose a primary key that is minimal and stable.
- Note foreign keys, adding them if necessary to related tables. Draw relationships between the tables, noting if they are one-to-one or one-to-many. If they are many-to-many, create linking tables.
- Decompose the relations, as needed, to put them in 1NF. Are all fields atomic? Are there any repeating groups?
- Decompose the relations, as needed, to put them in 2NF. Does each table describe a single entity? Are all non-key columns dependent on the full primary key? That is, does the primary key imply all of the other columns in each table? If the table has a compound primary key, then the decomposition should, in general, be guided by breaking the key apart and putting all columns pertaining to each component of the primary key in their own table.
- Decompose the relations, as needed, to put them in 3NF. Are there any computed columns? Are there any mutually dependent non-key columns? Omit computed columns. Eliminate mutually dependent columns by decomposing into maps.
Summary
- A database designed according to the relational model is efficient, predictable, well-performing, self-documenting, and easy to modify.
- Every table must have a primary key, which uniquely identifies rows in the table.
- Foreign keys are columns used to reference a primary key in another table.
- You can establish 3 kinds of relationships between tables in a relational database: 1-to-1, 1-to-many, or many-to-many. Many-to-many relationships require a linking table.
- A well-designed database follows the Normal Forms.
- Entity integrity forbids nulls in primary key columns.
- Referential integrity says that the database must not contain any unmatched foreign key values.
Database design is an important aspect of application design. A properly designed databases is a solid foundation for your application.