Database Normalization & Schema Design

Database normalization is a process used to organize a database into tables and columns. The main goal of normalization is to reduce data redundancy and improve data integrity. This process involves dividing large tables into smaller, related tables and defining relationships between them. Normalization is typically done in several steps, called normal forms, each with specific rules and requirements.

Normal Forms

1. First Normal Form (1NF)

A table is in the first normal form if:

  • All columns contain atomic (indivisible) values.
  • Each column contains values of a single type.
  • Each column has a unique name.
  • The order in which data is stored does not matter.

Consider a table storing student information and their enrolled courses:

StudentIDNameCourses
1AliceMath, Science
2BobHistory, English

To convert this table to 1NF, we need to ensure that each column contains atomic values:

StudentIDNameCourse
1AliceMath
1AliceScience
2BobHistory
2BobEnglish

2. Second Normal Form (2NF)

A table is in the second normal form if:

  • It is in 1NF.
  • All non-key attributes are fully functional dependent on the primary key.

Consider the 1NF table above. If we add a new column for the student's age, we need to ensure that age depends only on the student, not on the course:

StudentIDNameAgeCourse
1Alice20Math
1Alice20Science
2Bob22History
2Bob22English

To convert this table to 2NF, we separate the student information into a different table:

Students Table

StudentIDNameAge
1Alice20
2Bob22

Enrollments Table

StudentIDCourse
1Math
1Science
2History
2English

3. Third Normal Form (3NF)

A table is in the third normal form if:

  • It is in 2NF.
  • All the attributes are functionally dependent only on the primary key.

Consider the 2NF tables above. If we add a new column for the course instructor, we need to ensure that the instructor depends only on the course, not on the student:

Enrollments Table

StudentIDCourseInstructor
1MathDr. Smith
1ScienceDr. Jones
2HistoryDr. Brown
2EnglishDr. White

To convert this table to 3NF, we separate the course information into a different table:

Courses Table

CourseInstructor
MathDr. Smith
ScienceDr. Jones
HistoryDr. Brown
EnglishDr. White

Enrollments Table

StudentIDCourse
1Math
1Science
2History
2English

4. Higher Normal Forms

There are additional normal forms (4NF, 5NF, etc.) that further reduce redundancy and improve data integrity, but they are less commonly used in practice.

Schema Design

Schema design involves creating a blueprint for how data will be stored in a database. A well-designed schema ensures efficient data retrieval, storage, and maintenance. Here are some key principles of schema design:

  • Gather all the requirements and understand the data that needs to be stored, including entities, attributes, and relationships.

  • Identify the main entities (tables) and their relationships. Use ER diagrams to visualize the structure.

  • Apply normalization rules to reduce redundancy and improve data integrity.

  • Ensure each table has a primary key and define foreign keys to establish relationships between tables.

  • Consider indexing frequently queried columns and denormalizing certain parts of the schema if necessary for performance reasons.

  • Review the schema design with stakeholders and refine it based on feedback and testing.

  • Plan for scalability to handle future growth in data volume and user load, considering partitioning and sharding strategies if necessary.

  • Maintain detailed documentation of the schema design, including entity definitions, relationships, and constraints, to facilitate maintenance and future development.

  • Conduct thorough testing to ensure the schema supports all required queries and operations efficiently and accurately.

  • Continuously monitor database performance and make necessary optimizations to maintain efficiency and responsiveness.

Example: Schema Design

Let's consider an example to illustrate schema design based on the normalization principles discussed above.

Scenario

We need to design a database for a university to manage student enrollments, courses, and instructors. We'll apply the normalization principles to ensure the database is well-structured and efficient.

Step 1: Identify Entities and Relationships

First, we identify the main entities and their relationships:

  • Students: Represents the students enrolled in the university.
  • Courses: Represents the courses offered by the university.
  • Instructors: Represents the instructors teaching the courses.
  • Enrollments: Represents the relationship between students and courses.

Step 2: Apply Normalization Rules

First Normal Form (1NF)

Ensure that each table has atomic values and each column contains values of a single type.

Initial Students Table

StudentIDNameCourses
1AliceMath, Science
2BobHistory, English

Converted to 1NF

StudentIDNameCourse
1AliceMath
1AliceScience
2BobHistory
2BobEnglish

Second Normal Form (2NF)

Ensure that all non-key attributes are fully functional dependent on the primary key.

Students Table

StudentIDNameAge
1Alice20
2Bob22

Enrollments Table

StudentIDCourse
1Math
1Science
2History
2English

Third Normal Form (3NF)

Ensure that all attributes are functionally dependent only on the primary key.

Courses Table

CourseInstructor
MathDr. Smith
ScienceDr. Jones
HistoryDr. Brown
EnglishDr. White

Enrollments Table

StudentIDCourse
1Math
1Science
2History
2English

Final Schema Design

After applying the normalization rules, our final schema design includes the following tables:

Students Table

StudentIDNameAge
1Alice20
2Bob22

Courses Table

CourseInstructor
MathDr. Smith
ScienceDr. Jones
HistoryDr. Brown
EnglishDr. White

Enrollments Table

StudentIDCourse
1Math
1Science
2History
2English

SQL Queries to Create the Schema

CREATE TABLE Students (
    StudentID INT PRIMARY KEY,
    Name VARCHAR(100),
    Age INT
);
 
CREATE TABLE Courses (
    CourseID INT PRIMARY KEY,
    Title VARCHAR(100),
    Instructor VARCHAR(100)
);
 
CREATE TABLE Enrollments (
    StudentID INT,
    CourseID INT,
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
    FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);

Special thanks to Prince Kumar Prasad for contributing to this guide on Nevo Code.