The Complete Guide to Databases: From Zero to Production
Everything you actually need to know explained the way a senior engineer would explain it to you over coffee, not in a textbook.
1. What is a database, really?
At its core, a database is a persistence system a structured place to create, read, update, and delete data that survives beyond a single running program.
When developers talk about databases in a backend context, they almost always mean a diskbased database. RAMbased (inmemory) databases like Redis and Memcached offer blazingfast speed but data is lost on restart and they're expensive. Diskbased databases like Postgres and MongoDB are slower in comparison but survive restarts, are cheap, and scalable.
The system that sits on top of your data and handles reading, writing, security, and concurrency is called a DBMS Database Management System. Postgres is a DBMS. The database is the data; the DBMS is the engine that manages it.
2. Why not just use a file?
Before databases, engineers stored data in plain text files. This worked fine until it didn't. Three problems killed it:
Parsing: Every application needed its own parsing logic. Change the format once and everything breaks.
Structure: No builtin schema. A missing comma could corrupt your entire dataset.
Concurrency: Two users updating the same file simultaneously? One of them loses their changes. Silently.
Concurrency is the core problem databases were built to solve. Everything else is a bonus.
3. Relational vs nonrelational
This is the question every backend developer gets asked in interviews and the honest answer is: it depends on your data shape and access patterns.
SQL (Relational) databases have a structure of tables, rows, and columns with a strict fixed schema. NoSQL databases use documents or keyvalue structures with flexible schemas. SQL scales vertically; NoSQL scales horizontally.
A practical decision framework:
Use SQL for structured data where integrity matters financial transactions, user accounts, orders, inventory.
Use NoSQL for flexible or rapidly evolving schemas reviews, recommendation feeds, usergenerated content.
Use SQL when you need complex joins, aggregations, or analytics.
Use NoSQL when you need horizontal sharding across servers social media posts, realtime location data.
4. Why Postgres wins (even against MongoDB)
Postgres is opensource, battletested, and fully SQLstandard. But its real killer feature is JSONB support.
With JSONB, you can store flexible, dynamic JSON inside a relational database indexed and queryable. You get the schema flexibility of MongoDB and the integrity of SQL, in one system. For most production backends, this makes Postgres the obvious default.
My stack recommendation: For hosted Postgres, Neon is excellent serverless, branches like Git, and has a generous free tier. Pair it with Prisma as your ORM for a modern TypeScriptfirst backend.
5. Data types that actually matter
Most tutorials gloss over types. Don't. Picking the wrong type causes subtle bugs that are painful to migrate away from later.
Numbers:
SERIAL / BIGSERIAL autoincrementing integers, great for simple IDs.
DECIMAL / NUMERIC exact precision. Always use for money. Never use FLOAT for currency floatingpoint math has rounding errors.
SMALLINT, INTEGER, BIGINT pick based on the range you need.
Strings:
TEXT variable length, no limit. In Postgres, prefer this over VARCHAR(n). Performance is identical.
VARCHAR(255) the infamous default. That 255 is a MySQL legacy habit. In Postgres, just use TEXT.
CHAR(n) fixed length, pads with spaces. Almost never the right choice.
Special types:
UUID better than integer IDs for distributed systems. Also prevents ID enumeration attacks.
JSONB binary JSON. Always prefer over plain JSON Postgres indexes and queries it faster.
ENUM restrict a column to a defined set of values like 'pending', 'paid', 'refunded'.
TIMESTAMPTZ timestamp with time zone. Always store everything in UTC.
6. Schema design: relationships & constraints
Naming convention: Use plural table names (users, projects) and snake_case for columns (full_name, created_at). Postgres is caseinsensitive CamelCase silently lowercases, causing confusing bugs.
The three relationship types:
Onetoone (User → Profile): Split into two tables to keep the users table lean.
Onetomany (Project → Tasks): The tasks table holds a project_id foreign key pointing back to the parent project.
Manytomany (Users ↔ Projects): Requires a linking table (e.g. project_members) with a composite primary key of (user_id, project_id).
Constraints:
Primary key: Implicitly unique and not null.
Foreign key: Ensures you can't reference a record that doesn't exist.
Check constraint: Encodes business logic at the DB level e.g. CHECK (priority BETWEEN 1 AND 5).
Unique constraint: Prevents duplicate values e.g. one email per user.
Referential integrity what happens on delete?
RESTRICT blocks the delete if child records exist.
CASCADE deletes all children automatically. Use carefully.
SET NULL sets the foreign key to null on the child.
7. Eight schema design rules for production
id String @id @default(uuid())
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt8. ACID the guarantees that keep your data sane
Imagine you're building an ecommerce checkout. When a customer places an order, your backend needs to check availability, deduct payment, reduce stock, and create the order record. What if step 2 succeeds but step 3 fails?
This is why databases have transactions a group of operations that either all succeed or all fail together.
Atomicity all or nothing. If any one statement fails, everything rolls back.
Consistency every transaction takes the database from one valid state to another.
Isolation concurrent transactions don't interfere with each other.
Durability once committed, data survives crashes and power outages.
BEGIN;
UPDATE accounts SET balance = balance 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
COMMIT;
If the second UPDATE fails, the first is rolled back automatically.Note: NoSQL databases often sacrifice some ACID guarantees for speed and scale. If your application depends on financial consistency, use a SQL database.
9. Indexes the single biggest performance lever you control
An index is a separate data structure that Postgres maintains alongside your table. It trades a small amount of write performance for dramatically faster reads.
Without an index, a query causes a sequential scan Postgres reads every single row until it finds a match. With 10 million rows, that query times out.
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_bookings_user_id ON bookings (user_id);Rule of thumb: Add indexes on any column you regularly filter, sort, or join on.
Composite indexes: If you frequently query by two columns together, a composite index on (user_id, status) is faster than two separate indexes. Column order matters.
Partial indexes: Index only a subset of rows:
CREATE INDEX idx_pending_orders ON orders (user_id) WHERE status = 'pending';Always EXPLAIN your slow queries:
EXPLAIN ANALYZE SELECT * FROM bookings WHERE user_id = 42;
"Seq Scan" on a large table → you need an index
"Index Scan" → the index is being used correctly10. The N+1 problem
One of the most common performance bugs in backend applications almost always introduced at the ORM layer without the developer noticing.
You want 10 projects with their tasks.
What your ORM is actually doing:
SELECT * FROM projects LIMIT 10; 1 query
SELECT * FROM tasks WHERE project_id = 1; then 10 more...
SELECT * FROM tasks WHERE project_id = 2;
Total: 11 queries. This is N+1.The fix is eager loading fetch all related data in a single JOIN:
const projects = await prisma.project.findMany({
include: { tasks: true }
// Generates a single JOIN query, not 11 separate selects.
});An endpoint that feels instant in development with 10 records can become catastrophically slow in production with 10,000.
11. ORMs & Prisma
An ORM (ObjectRelational Mapper) sits between your application code and the database. It lets you interact with your data using your programming language instead of raw SQL strings.
What Prisma gives you:
Type safety: Queries are fully typed. If you rename a column, TypeScript catches it at compile time.
Autogenerated client: Prisma reads your schema and generates a typed client.
Migration system: Tracks schema changes and generates SQL migration files automatically.
Readable syntax: Far easier to audit than raw SQL strings.
model User {
id String @id @default(uuid())
email String @unique
name String?
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
projects ProjectMember[]
}
model Project {
id String @id @default(uuid())
name String
description String?
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
tasks Task[]
members ProjectMember[]
}
model Task {
id String @id @default(uuid())
title String
priority Int @default(1)
completed Boolean @default(false)
projectId String
createdAt DateTime @default(now())
project Project @relation(fields: [projectId], references: [id], onDelete: Cascade)
@@index([projectId])
}Common Prisma query patterns:
// Find with filters, pagination, and sorting
const users = await prisma.user.findMany({
where: { active: true },
orderBy: { createdAt: 'desc' },
skip: 0,
take: 20,
});
// Find with related data (avoids N+1)
const project = await prisma.project.findUnique({
where: { id: projectId },
include: {
tasks: { where: { completed: false } },
members: { include: { user: true } }
}
});
// Create inside a transaction (atomic)
const result = await prisma.$transaction([
prisma.order.create({ data: orderData }),
prisma.inventory.update({
where: { productId },
data: { stock: { decrement: 1 } }
})
]);Always use prisma.$queryRaw with tagged template literals never string concatenation. The tagged template handles parameterization automatically, preventing SQL injection.
12. Migrations
Your schema will change. Migrations are how those changes travel safely from your laptop to production without manual SQL scripts.
A migration is a versioned, tracked SQL file that describes a schema change. Every developer on your team runs the same migrations, in the same order.
# Development (can reset and reseed if needed)
npx prisma migrate dev name add_phone_to_users
# Production (never resets only applies pending migrations)
npx prisma migrate deployCritical: Never run migrate dev in production. It can reset your database. Always use migrate deploy in CI/CD pipelines.
13. Connection pooling
This is the production gotcha that nobody warns you about in tutorials. Your app works perfectly with 10 users. At 100 concurrent users, your database starts refusing connections.
Every database connection consumes memory on the Postgres server. A standard Postgres instance handles around 100 connections maximum. The problem is especially severe with serverless functions each invocation wants its own DB connection.
The fix: a connection pooler sits between your app and Postgres. Your app connects to the pooler which handles thousands of incoming connections and the pooler maintains a small, fixed pool of real Postgres connections (e.g. 20).
Without pooling: App (1000 requests) → Postgres (1000 connections) → CRASH
With pooling: App (1000 requests) → PgBouncer → Postgres (20 connections) → FineOptions: Neon has builtin connection pooling. PgBouncer is the standard selfhosted solution. Prisma Accelerate is Prisma's hosted connection pooler with a global caching layer.
14. Optimistic vs pessimistic locking
Two users open the same document at the same time. Both make edits. One saves first. The other overwrites the first user's changes. This is a lost update.
Pessimistic locking lock it before you touch it:
BEGIN;
SELECT * FROM bookings WHERE id = 42 FOR UPDATE;
UPDATE bookings SET status = 'confirmed' WHERE id = 42;
COMMIT;
Other transactions trying to read this row will wait.Optimistic locking check before you save. Add a version column. When saving, check that the version matches what you originally read:
UPDATE documents
SET content = 'new content', version = version + 1
WHERE id = 42 AND version = 3;
0 rows affected → someone else updated first → show conflict to userUse pessimistic for short, highcontention operations (booking a seat). Use optimistic for longer operations with low collision probability (editing a document).
15. Soft deletes
A hard delete permanently removes the row. But in most production systems, you don't actually want that.
A user deletes their account their orders, reviews, and messages still reference their user ID. A hard delete either orphans all that data or cascades and destroys it. Regulations like GDPR also require you to prove what data existed and when.
Soft delete means adding a deleted_at timestamp column. Instead of deleting, you set it:
UPDATE users SET deleted_at = NOW() WHERE id = 42;
SELECT * FROM users WHERE deleted_at IS NULL;The tradeoff: WHERE deleted_at IS NULL must be in every query. Unique constraints also break (a deleted user with email X blocks a new signup). Use soft deletes for userfacing records (users, orders). Use hard deletes for operational data (sessions, logs).
Wrapping up
You now have a complete picture of databases from first principles to production concerns. The concepts that matter most: schema design with proper normalization, ACID transactions, indexes on the right columns, migrations as part of your deploy process, and connection pooling before you go live. Everything else builds on these foundations.