Good database design requires an understanding of relational database concepts. Relational database management systems require the proper organizing of data to minimize redundancy insure data integrity. Proper database design is required if you want the database to be easily expandable as needs grow. The process by which database design is improved is called normalization. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then reports and views of the data reflect the change everywhere because the data relationships were properly defined. Good design also insures that the database responds most efficiently.
Database design is the most critical, yet sometimes overlooked, part of the application development lifecycle. The process of creating a proper data model helps the project team uncover the right questions to ask of manager and end users. Effective database design gives application developers the ability to program systems that perform properly from the beginning. Proper database design reduces the overall time it takes to complete the project, which in turn reduces project development costs.
Normalization in database design has four key impacts:
- minimization of data redundancy,
- minimization of data restructuring,
- minimization of I/O by reduction of transaction sizes, and
- enforcement of referential integrity.
Good database design requires that you carefully consider and precisely answer the following types of questions with regard to your data (note that an answer of “rarely” is equivalent to an answer of “yes” not to an answer of “no”’):
1. What are the tables needed?
What are the groups of data that need to be tracked? For example, a group might be an “order”, or an “invoice”, or an “event” or a “person”.
2. What are the attributes, or columns, that we need to track?
For each “thing” that is to be tracked in the group can all of the attributes that we care about for that thing be stored in a single record? Or conversely, do we need to keep track of multiple occurrences of the attributes that describe the thing? Fox example, if we are talking about people, they each have only one height, one hair color, one SSN and one first and last name. But they may have multiple degrees or certifications. If you want to track attributes that have multiple occurrences you will need a “child” table that will be related to the primary table that stores the single instance attributes that describe the “thing”.
3. Do we need to track attributes over time?
The multiple occurrence issue (see question 2 above) must also be asked with respect to attributes that may change over time. With regard to each attribute do we need to track only the current information or how the information has changed over time? For example, each customer might have only one current billing address, but do we care about their previous billing addresses? If we care about the changes over time, or previous instances, then we need to keep that data in a “child” table. Note that attributes that we thought were single instance, such as the height of a person, become multiple instance if we are tracking them over time.
4. What are the appropriate primary keys?
We saw in questions 2 and 3 that there were times that we would need to create “child” tables. A child table needs to be related to its parent, and the parent to the child, so that we can retrieve corresponding data easily. To create these relationships we need a mechanism to uniquely define each record and to link the records from the parent and child tables so we can access them simultaneously. This is most critical element of database design involves the defining of keys. Each table should have a primary key that uniquely identifies each record in the table. It is critical that the primary key not change over time. Modern design and best practice indicates that a primary key should most often be non-intelligent; that is, the value should be assigned arbitrarily and should have no inherent meaning other than uniquely identifying the record.
5.What are the appropriate foreign keys?
A “foreign key” is used to link parent and child records. A foreign key requires that a primary key value must be present in the parent table before that value can be used in the child table. The concept of maintaining making sure that the keys are built, inserted and deleted in the proper order is known as “referential integrity”.
6. What are the relationships between the data groups?
Relationships are logical links. When we discuss relationships we are first concerned with identifying how we identify each of the records in the group and how do we link one group with another. For a practical example consider invoices. Invoices are usually identified by invoice number. Each invoice has a number and two different invoices never have the same number. Invoices are sent to customers and customers have a unique identifier such as customer number. Since each invoice is for exactly one customer if we store the customer number with the invoice number we can get to the customer information for that invoice. An invoice may have many detail lines with each line having a quantity, product ID and price. If we store the invoice number in each detail row we can easily get all the details for that invoice. Most relationships of this type are one-to-many relationships as multiple details belong to one parent item.
7. Are there instance relationships?
Instance relationships are more complex relationships which are used when we have data groups that have a many-to-many relationship. A practical example is high school students who take classes. Each class has many students in it. Each student takes many classes. To record that kind of data we add a record to a table that represents the occurrence or instance of each relation. So we add a record that says that Mary is taking Math 1 and another record that Mary is taking English 3. We add a record that Joe is also taking Math 1 and records for each additional class that Joe is taking. Notice that we wouldn’t keep any info about the students with those records as that info is, and should be, kept in the student record. If we tried to put student info in those instance records we would need to repeat the information multiple times as there are multiple records for each student. Creating duplicate data invariably leads to conflicts within the database.
Proper database design is a necessary requirement for the efficient coding of applications. It also offers the best likelihood that the system will be easily expandable and have the longest service life possible.
Feel free to contact us for a no-cost consultation to learn how we can help.
609-642-4666 Ext. 103