- 1. Introduction to DBMS
- 2. Database Models & ER Diagrams
- 3. Relational Model & Keys in DBMS
- 4. Database Normalization & Schema Design
- 5. Indexing & Hashing
- 6. Transactions & Concurrency Control
- 7. Deadlocks & Recovery in DBMS
- 8. Storage Management & File Organization
- 9. NoSQL vs SQL Databases
- 10. DBMS Case Studies & Real-World Use Cases
- 11. Sharding & Replication Strategies
- 12. Big Data & Distributed Databases
- 13. Database Security & Encryption
- 14. Backup & Disaster Recovery
- 15. Cloud Databases & AWS RDS
Database Normalization & Schema Design
Database normalization is a process used to organize a database into tables and columns. The main goal of normalization is to reduce data redundancy and improve data integrity. This process involves dividing large tables into smaller, related tables and defining relationships between them. Normalization is typically done in several steps, called normal forms, each with specific rules and requirements.
Normal Forms
1. First Normal Form (1NF)
A table is in the first normal form if:
- All columns contain atomic (indivisible) values.
- Each column contains values of a single type.
- Each column has a unique name.
- The order in which data is stored does not matter.
Consider a table storing student information and their enrolled courses:
StudentID | Name | Courses |
---|---|---|
1 | Alice | Math, Science |
2 | Bob | History, English |
To convert this table to 1NF, we need to ensure that each column contains atomic values:
StudentID | Name | Course |
---|---|---|
1 | Alice | Math |
1 | Alice | Science |
2 | Bob | History |
2 | Bob | English |
2. Second Normal Form (2NF)
A table is in the second normal form if:
- It is in 1NF.
- All non-key attributes are fully functional dependent on the primary key.
Consider the 1NF table above. If we add a new column for the student's age, we need to ensure that age depends only on the student, not on the course:
StudentID | Name | Age | Course |
---|---|---|---|
1 | Alice | 20 | Math |
1 | Alice | 20 | Science |
2 | Bob | 22 | History |
2 | Bob | 22 | English |
To convert this table to 2NF, we separate the student information into a different table:
Students Table
StudentID | Name | Age |
---|---|---|
1 | Alice | 20 |
2 | Bob | 22 |
Enrollments Table
StudentID | Course |
---|---|
1 | Math |
1 | Science |
2 | History |
2 | English |
3. Third Normal Form (3NF)
A table is in the third normal form if:
- It is in 2NF.
- All the attributes are functionally dependent only on the primary key.
Consider the 2NF tables above. If we add a new column for the course instructor, we need to ensure that the instructor depends only on the course, not on the student:
Enrollments Table
StudentID | Course | Instructor |
---|---|---|
1 | Math | Dr. Smith |
1 | Science | Dr. Jones |
2 | History | Dr. Brown |
2 | English | Dr. White |
To convert this table to 3NF, we separate the course information into a different table:
Courses Table
Course | Instructor |
---|---|
Math | Dr. Smith |
Science | Dr. Jones |
History | Dr. Brown |
English | Dr. White |
Enrollments Table
StudentID | Course |
---|---|
1 | Math |
1 | Science |
2 | History |
2 | English |
4. Higher Normal Forms
There are additional normal forms (4NF, 5NF, etc.) that further reduce redundancy and improve data integrity, but they are less commonly used in practice.
Schema Design
Schema design involves creating a blueprint for how data will be stored in a database. A well-designed schema ensures efficient data retrieval, storage, and maintenance. Here are some key principles of schema design:
-
Gather all the requirements and understand the data that needs to be stored, including entities, attributes, and relationships.
-
Identify the main entities (tables) and their relationships. Use ER diagrams to visualize the structure.
-
Apply normalization rules to reduce redundancy and improve data integrity.
-
Ensure each table has a primary key and define foreign keys to establish relationships between tables.
-
Consider indexing frequently queried columns and denormalizing certain parts of the schema if necessary for performance reasons.
-
Review the schema design with stakeholders and refine it based on feedback and testing.
-
Plan for scalability to handle future growth in data volume and user load, considering partitioning and sharding strategies if necessary.
-
Maintain detailed documentation of the schema design, including entity definitions, relationships, and constraints, to facilitate maintenance and future development.
-
Conduct thorough testing to ensure the schema supports all required queries and operations efficiently and accurately.
-
Continuously monitor database performance and make necessary optimizations to maintain efficiency and responsiveness.
Example: Schema Design
Let's consider an example to illustrate schema design based on the normalization principles discussed above.
Scenario
We need to design a database for a university to manage student enrollments, courses, and instructors. We'll apply the normalization principles to ensure the database is well-structured and efficient.
Step 1: Identify Entities and Relationships
First, we identify the main entities and their relationships:
- Students: Represents the students enrolled in the university.
- Courses: Represents the courses offered by the university.
- Instructors: Represents the instructors teaching the courses.
- Enrollments: Represents the relationship between students and courses.
Step 2: Apply Normalization Rules
First Normal Form (1NF)
Ensure that each table has atomic values and each column contains values of a single type.
Initial Students Table
StudentID | Name | Courses |
---|---|---|
1 | Alice | Math, Science |
2 | Bob | History, English |
Converted to 1NF
StudentID | Name | Course |
---|---|---|
1 | Alice | Math |
1 | Alice | Science |
2 | Bob | History |
2 | Bob | English |
Second Normal Form (2NF)
Ensure that all non-key attributes are fully functional dependent on the primary key.
Students Table
StudentID | Name | Age |
---|---|---|
1 | Alice | 20 |
2 | Bob | 22 |
Enrollments Table
StudentID | Course |
---|---|
1 | Math |
1 | Science |
2 | History |
2 | English |
Third Normal Form (3NF)
Ensure that all attributes are functionally dependent only on the primary key.
Courses Table
Course | Instructor |
---|---|
Math | Dr. Smith |
Science | Dr. Jones |
History | Dr. Brown |
English | Dr. White |
Enrollments Table
StudentID | Course |
---|---|
1 | Math |
1 | Science |
2 | History |
2 | English |
Final Schema Design
After applying the normalization rules, our final schema design includes the following tables:
Students Table
StudentID | Name | Age |
---|---|---|
1 | Alice | 20 |
2 | Bob | 22 |
Courses Table
Course | Instructor |
---|---|
Math | Dr. Smith |
Science | Dr. Jones |
History | Dr. Brown |
English | Dr. White |
Enrollments Table
StudentID | Course |
---|---|
1 | Math |
1 | Science |
2 | History |
2 | English |
SQL Queries to Create the Schema
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
Name VARCHAR(100),
Age INT
);
CREATE TABLE Courses (
CourseID INT PRIMARY KEY,
Title VARCHAR(100),
Instructor VARCHAR(100)
);
CREATE TABLE Enrollments (
StudentID INT,
CourseID INT,
PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);
Special thanks to Prince Kumar Prasad for contributing to this guide on Nevo Code.