Database Design Best Practices: Lessons from Teaching 43 Students
As a Database Programming teaching assistant at FILKOM UB, I've seen common patterns in how students (and even experienced developers) approach database design. This article compiles the most important lessons I teach in my lab sessions.
The Foundation: Normalization
First Normal Form (1NF): Eliminate repeating groups
- Each column contains atomic values
- Each row is unique
Second Normal Form (2NF): Remove partial dependencies
- All non-key attributes depend on the entire primary key
Third Normal Form (3NF): Remove transitive dependencies
- Non-key attributes depend only on the primary key
Real Example
Bad design:
CREATE TABLE Student_Courses (
student_id INT,
student_name VARCHAR(100),
course_id INT,
course_name VARCHAR(100),
instructor_name VARCHAR(100),
instructor_email VARCHAR(100)
);
Good design:
CREATE TABLE Students (
student_id INT PRIMARY KEY,
student_name VARCHAR(100)
);
CREATE TABLE Courses (
course_id INT PRIMARY KEY,
course_name VARCHAR(100),
instructor_id INT
);
CREATE TABLE Instructors (
instructor_id INT PRIMARY KEY,
instructor_name VARCHAR(100),
instructor_email VARCHAR(100)
);
CREATE TABLE Enrollments (
enrollment_id INT PRIMARY KEY,
student_id INT,
course_id INT,
FOREIGN KEY (student_id) REFERENCES Students(student_id),
FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);
Indexing Strategy
Indexes speed up queries but slow down writes. Here's when to use them:
Always index:
- Primary keys (automatic in most databases)
- Foreign keys
- Columns used in WHERE clauses frequently
- Columns used in JOIN conditions
Example:
-- Index for faster lookups
CREATE INDEX idx_student_email ON Students(email);
-- Composite index for common query patterns
CREATE INDEX idx_enrollment_student_course
ON Enrollments(student_id, course_id);
Query Optimization
1. Use EXPLAIN ANALYZE
Always check your query execution plan:
EXPLAIN ANALYZE
SELECT s.student_name, c.course_name
FROM Students s
JOIN Enrollments e ON s.student_id = e.student_id
JOIN Courses c ON e.course_id = c.course_id
WHERE s.status = 'active';
2. Avoid SELECT *
Specify only needed columns:
-- Bad
SELECT * FROM Students WHERE student_id = 1;
-- Good
SELECT student_id, student_name, email
FROM Students
WHERE student_id = 1;
Common Mistakes I See
Mistake 1: No data types consideration
-- Bad: Storing dates as strings
CREATE TABLE Events (
event_date VARCHAR(50)
);
-- Good: Use proper date types
CREATE TABLE Events (
event_date DATE
);
Mistake 2: Missing constraints
-- Bad: No constraints
CREATE TABLE Users (
user_id INT,
email VARCHAR(100),
age INT
);
-- Good: Proper constraints
CREATE TABLE Users (
user_id INT PRIMARY KEY,
email VARCHAR(100) UNIQUE NOT NULL,
age INT CHECK (age >= 0 AND age <= 150),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Mistake 3: Ignoring transactions
-- Bad: No transaction handling
UPDATE Accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE Accounts SET balance = balance + 100 WHERE account_id = 2;
-- Good: Use transactions
BEGIN TRANSACTION;
UPDATE Accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE Accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
Advanced Techniques
Window Functions
Powerful for analytics:
SELECT
student_name,
grade,
AVG(grade) OVER (PARTITION BY course_id) as avg_course_grade,
RANK() OVER (PARTITION BY course_id ORDER BY grade DESC) as rank
FROM StudentGrades;
Common Table Expressions (CTEs)
Make complex queries readable:
WITH ActiveStudents AS (
SELECT student_id, student_name
FROM Students
WHERE status = 'active'
),
EnrolledCourses AS (
SELECT student_id, COUNT(*) as course_count
FROM Enrollments
GROUP BY student_id
)
SELECT
a.student_name,
COALESCE(e.course_count, 0) as total_courses
FROM ActiveStudents a
LEFT JOIN EnrolledCourses e ON a.student_id = e.student_id;
Performance Tips
- Connection pooling: Don't create new connections for every query
- Prepared statements: Prevent SQL injection and improve performance
- Batch operations: Group multiple inserts/updates
- Proper data types: Use INT not VARCHAR for numbers
- Regular maintenance: VACUUM in PostgreSQL, index rebuilds in MSSQL
Conclusion
Good database design is foundational to any application's success. These principles have served my students well in their practicum tests and real-world projects.
Remember: optimize for correctness first, then performance. A well-designed database is easier to optimize later than fixing a poorly designed one.
Recommended reading:
- "Database Design for Mere Mortals" by Michael J. Hernandez
- PostgreSQL Documentation
- Microsoft SQL Server Best Practices
Happy querying! 🗄️