Database Design Best Practices: Lessons from Teaching 43 Students

Best Practices
Database
PostgreSQL
SQL
9 November 2025

As a Database Programming teaching assistant at FILKOM UB, I've seen common patterns in how students (and even experienced developers) approach database design. This article compiles the most important lessons I teach in my lab sessions.

The Foundation: Normalization

First Normal Form (1NF): Eliminate repeating groups

  • Each column contains atomic values
  • Each row is unique

Second Normal Form (2NF): Remove partial dependencies

  • All non-key attributes depend on the entire primary key

Third Normal Form (3NF): Remove transitive dependencies

  • Non-key attributes depend only on the primary key

Real Example

Bad design:

CREATE TABLE Student_Courses (
    student_id INT,
    student_name VARCHAR(100),
    course_id INT,
    course_name VARCHAR(100),
    instructor_name VARCHAR(100),
    instructor_email VARCHAR(100)
);

Good design:

CREATE TABLE Students (
    student_id INT PRIMARY KEY,
    student_name VARCHAR(100)
);

CREATE TABLE Courses (
    course_id INT PRIMARY KEY,
    course_name VARCHAR(100),
    instructor_id INT
);

CREATE TABLE Instructors (
    instructor_id INT PRIMARY KEY,
    instructor_name VARCHAR(100),
    instructor_email VARCHAR(100)
);

CREATE TABLE Enrollments (
    enrollment_id INT PRIMARY KEY,
    student_id INT,
    course_id INT,
    FOREIGN KEY (student_id) REFERENCES Students(student_id),
    FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);

Indexing Strategy

Indexes speed up queries but slow down writes. Here's when to use them:

Always index:

  • Primary keys (automatic in most databases)
  • Foreign keys
  • Columns used in WHERE clauses frequently
  • Columns used in JOIN conditions

Example:

-- Index for faster lookups
CREATE INDEX idx_student_email ON Students(email);

-- Composite index for common query patterns
CREATE INDEX idx_enrollment_student_course 
ON Enrollments(student_id, course_id);

Query Optimization

1. Use EXPLAIN ANALYZE

Always check your query execution plan:

EXPLAIN ANALYZE
SELECT s.student_name, c.course_name
FROM Students s
JOIN Enrollments e ON s.student_id = e.student_id
JOIN Courses c ON e.course_id = c.course_id
WHERE s.status = 'active';

2. Avoid SELECT *

Specify only needed columns:

-- Bad
SELECT * FROM Students WHERE student_id = 1;

-- Good
SELECT student_id, student_name, email 
FROM Students 
WHERE student_id = 1;

Common Mistakes I See

Mistake 1: No data types consideration

-- Bad: Storing dates as strings
CREATE TABLE Events (
    event_date VARCHAR(50)
);

-- Good: Use proper date types
CREATE TABLE Events (
    event_date DATE
);

Mistake 2: Missing constraints

-- Bad: No constraints
CREATE TABLE Users (
    user_id INT,
    email VARCHAR(100),
    age INT
);

-- Good: Proper constraints
CREATE TABLE Users (
    user_id INT PRIMARY KEY,
    email VARCHAR(100) UNIQUE NOT NULL,
    age INT CHECK (age >= 0 AND age <= 150),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Mistake 3: Ignoring transactions

-- Bad: No transaction handling
UPDATE Accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE Accounts SET balance = balance + 100 WHERE account_id = 2;

-- Good: Use transactions
BEGIN TRANSACTION;
UPDATE Accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE Accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;

Advanced Techniques

Window Functions

Powerful for analytics:

SELECT 
    student_name,
    grade,
    AVG(grade) OVER (PARTITION BY course_id) as avg_course_grade,
    RANK() OVER (PARTITION BY course_id ORDER BY grade DESC) as rank
FROM StudentGrades;

Common Table Expressions (CTEs)

Make complex queries readable:

WITH ActiveStudents AS (
    SELECT student_id, student_name
    FROM Students
    WHERE status = 'active'
),
EnrolledCourses AS (
    SELECT student_id, COUNT(*) as course_count
    FROM Enrollments
    GROUP BY student_id
)
SELECT 
    a.student_name,
    COALESCE(e.course_count, 0) as total_courses
FROM ActiveStudents a
LEFT JOIN EnrolledCourses e ON a.student_id = e.student_id;

Performance Tips

  1. Connection pooling: Don't create new connections for every query
  2. Prepared statements: Prevent SQL injection and improve performance
  3. Batch operations: Group multiple inserts/updates
  4. Proper data types: Use INT not VARCHAR for numbers
  5. Regular maintenance: VACUUM in PostgreSQL, index rebuilds in MSSQL

Conclusion

Good database design is foundational to any application's success. These principles have served my students well in their practicum tests and real-world projects.

Remember: optimize for correctness first, then performance. A well-designed database is easier to optimize later than fixing a poorly designed one.

Recommended reading:

  • "Database Design for Mere Mortals" by Michael J. Hernandez
  • PostgreSQL Documentation
  • Microsoft SQL Server Best Practices

Happy querying! 🗄️