Database Management Systems

Top 20 DBMS Interview Questions

Comprehensive DBMS and SQL questions for technical and system design rounds. Covers 10–15% of typical interview weightage.

10–15% Interview Weightage

System Design Rounds

20 Questions

What is a Database Management System (DBMS) and why is it used?

A DBMS is software that manages databases — it provides an interface between users/applications and the physical data stored. It handles data storage, retrieval, and modification while ensuring data integrity, security, and consistency. Advantages over file systems: reduced data redundancy, concurrent access control, data independence, backup/recovery, enforced constraints. Examples: MySQL, PostgreSQL, Oracle, SQL Server, MongoDB.

What is the difference between SQL and NoSQL databases?

SQL (Relational) databases use structured tables with predefined schemas, support ACID transactions, and use SQL for queries. Best for structured data with complex relationships (e.g., MySQL, PostgreSQL). NoSQL databases are schema-flexible, designed for horizontal scaling, and come in types: Document (MongoDB), Key-Value (Redis), Column (Cassandra), Graph (Neo4j). Best for large-scale, unstructured/semi-structured data. Choose based on: consistency needs, scalability, and data structure.

What is Normalization? Explain 1NF, 2NF, 3NF, and BCNF.

Normalization organizes database tables to reduce redundancy and improve integrity. 1NF: Each column has atomic values; no repeating groups. 2NF: 1NF + No partial dependency (non-key attributes depend on entire primary key — applicable when composite key exists). 3NF: 2NF + No transitive dependency (non-key attributes depend only on primary key, not on other non-key attributes). BCNF (Boyce-Codd): Stricter 3NF — for every dependency X→Y, X must be a superkey.

What are ACID properties in database transactions?

ACID ensures reliable database transactions: Atomicity — transaction is all-or-nothing; if any part fails, the entire transaction rolls back. Consistency — database remains in a valid state before and after transaction. Isolation — concurrent transactions appear to execute serially; intermediate states are invisible to others. Durability — once committed, changes persist even after system failure (ensured via write-ahead logging). ACID is critical for financial systems, booking platforms, and any data-critical applications.

What is the difference between Primary Key, Foreign Key, and Unique Key?

Primary Key: uniquely identifies each row; cannot be NULL; only one per table; automatically creates a clustered index. Foreign Key: references the Primary Key of another table; enforces referential integrity; can have NULL values; ensures no orphan records. Unique Key: ensures all values in a column are distinct; can contain one NULL; a table can have multiple unique keys; used for alternate candidate keys like email or phone number.

What is an ER Diagram and what are its components?

An Entity-Relationship (ER) Diagram is a visual representation of the database structure. Components: Entity (rectangle) — a real-world object (e.g., Student, Live Class). Attribute (ellipse) — property of an entity (name, roll_no). Relationship (diamond) — association between entities. Cardinality: One-to-One, One-to-Many, Many-to-Many. Weak Entity (double rectangle) — cannot exist without a strong entity. Derived attribute (dashed ellipse) — computed from other attributes (age from date_of_birth).

Explain SQL Joins: INNER, LEFT, RIGHT, FULL OUTER, and CROSS JOIN.

INNER JOIN: returns rows matching in both tables. LEFT JOIN: all rows from left table + matching from right (NULLs where no match). RIGHT JOIN: all rows from right + matching from left. FULL OUTER JOIN: all rows from both tables (NULLs where no match on either side). CROSS JOIN: Cartesian product — every combination of rows (n×m rows). SELF JOIN: join a table with itself (for hierarchical data like employee-manager). LEFT ANTI JOIN: rows in left not in right (using `WHERE right.id IS NULL`).

What is Indexing in databases? Explain types of indexes.

Indexing creates a data structure that speeds up data retrieval at the cost of additional storage and slower writes. Without index: O(n) scan. With index: O(log n) search. Types: Primary Index — on primary key (auto-created). Secondary/Non-clustered Index — on non-key columns. Clustered Index — data rows stored in index order; one per table. Composite Index — on multiple columns. When to avoid indexing: small tables, frequently updated columns, columns with low selectivity (e.g., boolean). Use EXPLAIN to analyze query plans.

What is a Transaction? Explain Commit, Rollback, and Savepoint.

A Transaction is a unit of work that must be completed entirely or not at all. COMMIT: permanently saves all changes made in the transaction. ROLLBACK: undoes all changes since the last COMMIT or SAVEPOINT. SAVEPOINT: creates a point within a transaction to partially rollback to. Example: in a bank transfer, debit and credit must both succeed; if credit fails, ROLLBACK undoes the debit. Transaction states: Active → Partially Committed → Committed (success) or Failed → Aborted (failure).

What are the different types of SQL commands?

SQL commands are categorized as: DDL (Data Definition Language) — CREATE, ALTER, DROP, TRUNCATE (auto-commit). DML (Data Manipulation Language) — SELECT, INSERT, UPDATE, DELETE (can be rolled back). DCL (Data Control Language) — GRANT, REVOKE (access control). TCL (Transaction Control Language) — COMMIT, ROLLBACK, SAVEPOINT. DQL (Data Query Language) — SELECT. Key difference: TRUNCATE vs DELETE — TRUNCATE removes all rows faster, cannot be rolled back (DDL), and resets auto-increment; DELETE can be filtered with WHERE and rolled back.

What is a View in SQL and what are its advantages?

A View is a virtual table based on a SQL query — it doesn't store data physically but provides a saved query interface. Advantages: Simplifies complex queries, provides security by restricting column access, offers data abstraction (changes to underlying tables don't affect application code using views), and enables code reuse. Materialized View: stores query results physically and refreshes periodically — better performance but uses storage. Views can be updatable (simple views on single tables without aggregations).

What is a Stored Procedure and how does it differ from a Function?

Stored Procedure: precompiled SQL block stored in the database, executed with CALL/EXEC. Can: return multiple result sets, use DML statements, have IN/OUT parameters, and commit/rollback transactions. Function: returns a single value (scalar) or table (table-valued), must return a value, can be used in SELECT statements, and cannot commit/rollback. Key difference: procedures are for actions/side effects; functions are for computations. Both improve performance through precompilation and reduce network traffic.

Explain Concurrency Control and common problems in concurrent transactions.

Concurrency problems:

1) Dirty Read — reading uncommitted data from another transaction.

2) Non-Repeatable Read — same query returns different results because another transaction modified data.

3) Phantom Read — new rows appear in a re-executed query due to another transaction's inserts.

4) Lost Update — two transactions update the same data, one overwrites the other. SQL Isolation Levels (ascending strictness): Read Uncommitted → Read Committed → Repeatable Read → Serializable. Each level prevents specific anomalies at the cost of performance.

What is Denormalization and when should you use it?

Denormalization intentionally introduces redundancy into a normalized database to improve read performance. Achieved by merging tables, adding redundant columns, or pre-computing aggregates. When to use: read-heavy systems (data warehouses, analytics), when JOIN operations are too slow, reporting systems requiring frequent aggregations. Trade-offs: faster reads but slower writes, more storage, risk of data inconsistency. Example: storing order_total in Orders table instead of computing it from OrderItems every time.

What are Aggregate Functions in SQL? Give examples.

Aggregate Functions perform calculations on multiple rows: COUNT() — number of rows. SUM() — total of a column. AVG() — average value. MAX()/MIN() — highest/lowest value. GROUP BY — groups rows for aggregate calculations. HAVING — filters groups (like WHERE but for aggregates). Example: `SELECT department, COUNT(*), AVG(salary) FROM employees GROUP BY department HAVING AVG(salary) > 50000`. ORDER: WHERE → GROUP BY → HAVING → SELECT → ORDER BY.

What is the difference between DELETE, TRUNCATE, and DROP?

DELETE: DML command, removes rows one by one with optional WHERE clause, can be rolled back, triggers fire, slower, does not reset auto-increment. TRUNCATE: DDL command, removes all rows at once, typically cannot be rolled back (database-dependent), no triggers, resets auto-increment, much faster. DROP: DDL command, completely removes the table structure and all data permanently, cannot be rolled back, removes all associated indexes, constraints, and triggers. Use DELETE for selective removal, TRUNCATE to clear a table, DROP to eliminate the table entirely.

What is a Subquery and how does it differ from a JOIN?

A Subquery (nested query) is a query inside another query. Types: Scalar subquery (returns one value), Row subquery, Column subquery, Table subquery. Correlated subquery: references outer query, executed once per outer row — slower. Non-correlated: independent of outer query, executed once. JOINs vs Subqueries: JOINs are generally faster as the optimizer can choose execution strategy; subqueries are more readable for certain cases. Use EXISTS instead of IN for correlated subqueries with large datasets — EXISTS short-circuits on first match.

What is the CAP Theorem?

CAP Theorem (Brewer's Theorem) states that a distributed system can guarantee at most 2 of 3 properties: Consistency (C) — all nodes see the same data simultaneously. Availability (A) — every request receives a response (not necessarily the most recent). Partition Tolerance (P) — system continues operating despite network partitions. Since network partitions are inevitable in distributed systems, the practical choice is CP (consistent + partition tolerant, e.g., HBase, Zookeeper) or AP (available + partition tolerant, e.g., Cassandra, DynamoDB, CouchDB).

What is Database Sharding and Replication?

Sharding (Horizontal Partitioning): splits a large database into smaller, more manageable pieces called shards, each stored on a separate server. Each shard holds a subset of rows. Strategies: Range-based, Hash-based, Directory-based. Benefits: improved performance, scalability. Challenges: cross-shard queries, rebalancing. Replication: copies database to multiple servers. Master-Slave: writes go to master, reads distributed among slaves. Master-Master: all nodes accept writes. Benefits: high availability, fault tolerance, read scalability. Challenges: replication lag, conflict resolution.

Explain SQL Window Functions with examples.

Window Functions perform calculations across a set of rows related to the current row without collapsing results (unlike GROUP BY). Syntax: `function() OVER (PARTITION BY col ORDER BY col ROWS BETWEEN ...)`. Common functions: ROW_NUMBER() — unique sequential numbers. RANK() — same rank for ties, gaps after. DENSE_RANK() — same rank for ties, no gaps. LAG()/LEAD() — access previous/next row. SUM()/AVG() OVER — running totals/moving averages. Example: `SELECT name, salary, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) as rank FROM employees`.

Explore other Interview Questions:

DSA

OOPs

Networks