What is PostgreSQL?
PostgreSQL (often called "Postgres") is a powerful, open-source relational database management system (RDBMS) with over 35 years of active development. It's known for its reliability, feature robustness, and performance.
Think of PostgreSQL as a highly organized filing cabinet where data is stored in tables (like spreadsheets), and you can create complex relationships between different pieces of information. It ensures your data is safe, consistent, and quickly retrievable.
Why PostgreSQL?
PostgreSQL has become the preferred database for modern web applications:
- ACID Compliance: Guarantees data integrity with Atomicity, Consistency, Isolation, Durability
- Advanced Features: JSON support, full-text search, geospatial data (PostGIS)
- Open Source: Free, with a permissive license and active community
- Scalability: Handles small apps to enterprise systems with billions of rows
- Standards Compliant: Follows SQL standards closely
- Extensible: Custom functions, data types, and operators
- Python Integration: Excellent support via psycopg2 and Django/SQLAlchemy
- Reliability: Known for data integrity and crash recovery
When to Use PostgreSQL (Relational Databases)
Use PostgreSQL when you need:
- Structured Data: Data fits naturally into tables with defined relationships
- Data Integrity: Banking, e-commerce, or any app where accuracy is critical
- Complex Queries: Joins, aggregations, and complex filtering
- ACID Transactions: Need to ensure all-or-nothing operations (e.g., money transfers)
- Reporting: Analytics and business intelligence queries
- Multi-user Applications: Concurrent access with proper locking
When NOT to use relational databases:
- Unstructured Data: Use MongoDB or document stores for flexible schemas
- Simple Key-Value Storage: Redis is faster for caching
- Extreme Horizontal Scaling: Cassandra or DynamoDB for massive scale
- Graph Relationships: Neo4j for social networks or recommendation engines
Database Basics: Tables and Data Types
Tables are the foundation of relational databases. Each table has columns (fields) with specific data types.
-- Create a table
CREATE TABLE users (
id SERIAL PRIMARY KEY, -- Auto-incrementing integer
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
is_active BOOLEAN DEFAULT TRUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Common data types
-- Text types
VARCHAR(n) -- Variable-length string (max n characters)
TEXT -- Unlimited text
CHAR(n) -- Fixed-length string
-- Numeric types
INTEGER -- Whole numbers (-2B to 2B)
BIGINT -- Large whole numbers
DECIMAL(p,s) -- Exact decimals (e.g., money)
REAL -- Floating point (approximate)
SERIAL -- Auto-incrementing integer
-- Date/Time types
DATE -- Date only (YYYY-MM-DD)
TIME -- Time only (HH:MM:SS)
TIMESTAMP -- Date and time
INTERVAL -- Time interval (e.g., '2 days')
-- Other types
BOOLEAN -- TRUE/FALSE
JSON -- JSON data
JSONB -- Binary JSON (faster, recommended)
UUID -- Universally unique identifier
ARRAY -- Arrays of any type
CRUD Operations: Create, Read, Update, Delete
-- CREATE (INSERT)
INSERT INTO users (username, email, password_hash)
VALUES ('alice', 'alice@example.com', 'hashed_password_123');
-- Insert multiple rows
INSERT INTO users (username, email, password_hash)
VALUES
('bob', 'bob@example.com', 'hashed_pass'),
('charlie', 'charlie@example.com', 'hashed_pass');
-- READ (SELECT)
-- Get all users
SELECT * FROM users;
-- Get specific columns
SELECT username, email FROM users;
-- Filter with WHERE
SELECT * FROM users WHERE is_active = TRUE;
-- Multiple conditions
SELECT * FROM users
WHERE is_active = TRUE AND created_at > '2024-01-01';
-- Pattern matching
SELECT * FROM users WHERE email LIKE '%@gmail.com';
-- Sorting
SELECT * FROM users ORDER BY created_at DESC;
-- Limit results
SELECT * FROM users LIMIT 10 OFFSET 20; -- Pagination
-- UPDATE
UPDATE users
SET is_active = FALSE
WHERE username = 'alice';
-- Update multiple columns
UPDATE users
SET
email = 'newemail@example.com',
updated_at = CURRENT_TIMESTAMP
WHERE id = 1;
-- DELETE
DELETE FROM users WHERE id = 1;
-- Delete with condition
DELETE FROM users WHERE is_active = FALSE;
Relationships: Foreign Keys and Joins
Relationships connect tables together, allowing you to model real-world data structures.
-- Create related tables
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id) ON DELETE CASCADE,
title VARCHAR(200) NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
post_id INTEGER REFERENCES posts(id) ON DELETE CASCADE,
user_id INTEGER REFERENCES users(id) ON DELETE CASCADE,
content TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Relationship Types:
-- ON DELETE CASCADE → Delete related records
-- ON DELETE SET NULL → Set foreign key to NULL
-- ON DELETE RESTRICT → Prevent deletion if references exist
-- INNER JOIN (only matching records)
SELECT users.username, posts.title
FROM users
INNER JOIN posts ON users.id = posts.user_id;
-- LEFT JOIN (all from left table, matching from right)
SELECT users.username, COUNT(posts.id) as post_count
FROM users
LEFT JOIN posts ON users.id = posts.user_id
GROUP BY users.username;
-- Multiple joins
SELECT
users.username,
posts.title,
comments.content
FROM comments
INNER JOIN posts ON comments.post_id = posts.id
INNER JOIN users ON comments.user_id = users.id
WHERE posts.id = 1;
-- Self join (join table to itself)
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
manager_id INTEGER REFERENCES employees(id)
);
SELECT
e.name as employee,
m.name as manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
Aggregation and Grouping
-- Aggregate functions
SELECT COUNT(*) FROM users; -- Total users
SELECT COUNT(*) FROM users WHERE is_active; -- Active users
SELECT MAX(created_at) FROM posts; -- Most recent post
SELECT MIN(created_at) FROM posts; -- Oldest post
SELECT AVG(age) FROM users; -- Average age
SELECT SUM(price) FROM orders; -- Total sales
-- GROUP BY
SELECT user_id, COUNT(*) as post_count
FROM posts
GROUP BY user_id;
-- GROUP BY with JOIN
SELECT users.username, COUNT(posts.id) as total_posts
FROM users
LEFT JOIN posts ON users.id = posts.user_id
GROUP BY users.username
ORDER BY total_posts DESC;
-- HAVING (filter after grouping)
SELECT user_id, COUNT(*) as post_count
FROM posts
GROUP BY user_id
HAVING COUNT(*) > 5; -- Users with more than 5 posts
-- Complex aggregation
SELECT
DATE_TRUNC('month', created_at) as month,
COUNT(*) as posts_count,
COUNT(DISTINCT user_id) as unique_users
FROM posts
WHERE created_at >= '2024-01-01'
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;
Indexes: Making Queries Fast
Indexes are like book indexes - they help the database find data quickly without scanning every row.
-- Create an index
CREATE INDEX idx_users_email ON users(email);
-- Unique index (also enforces uniqueness)
CREATE UNIQUE INDEX idx_users_username ON users(username);
-- Composite index (multiple columns)
CREATE INDEX idx_posts_user_created ON posts(user_id, created_at DESC);
-- Partial index (only index some rows)
CREATE INDEX idx_active_users ON users(email) WHERE is_active = TRUE;
-- Full-text search index
CREATE INDEX idx_posts_content ON posts USING GIN(to_tsvector('english', content));
-- View indexes
\di -- In psql
-- Drop an index
DROP INDEX idx_users_email;
-- When to use indexes:
-- ✅ Columns used in WHERE clauses
-- ✅ Columns used in JOIN conditions
-- ✅ Columns used in ORDER BY
-- ✅ Foreign keys (usually)
-- ❌ Small tables (overhead not worth it)
-- ❌ Columns with low cardinality (e.g., boolean)
-- ❌ Tables with frequent INSERTs/UPDATEs (indexes slow these down)
Transactions: Ensuring Data Integrity
Transactions ensure that multiple operations either all succeed or all fail - no partial changes.
-- Example: Transfer money between accounts
BEGIN; -- Start transaction
UPDATE accounts SET balance = balance - 100 WHERE user_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE user_id = 2;
COMMIT; -- Save changes
-- If something goes wrong
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE user_id = 1;
-- Oops, error occurred!
ROLLBACK; -- Undo all changes in this transaction
-- Real-world example: E-commerce order
BEGIN;
-- Insert order
INSERT INTO orders (user_id, total) VALUES (1, 99.99) RETURNING id;
-- Insert order items
INSERT INTO order_items (order_id, product_id, quantity)
VALUES (1, 5, 2);
-- Update inventory
UPDATE products SET stock = stock - 2 WHERE id = 5;
-- If all succeed
COMMIT;
-- Transaction properties (ACID):
-- A = Atomicity → All or nothing
-- C = Consistency → Data remains valid
-- I = Isolation → Transactions don't interfere
-- D = Durability → Changes are permanent
Constraints: Enforcing Data Rules
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
price DECIMAL(10, 2) CHECK (price > 0), -- Must be positive
stock INTEGER DEFAULT 0 CHECK (stock >= 0), -- Can't be negative
category VARCHAR(50) NOT NULL,
sku VARCHAR(50) UNIQUE NOT NULL, -- Must be unique
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Primary key (unique identifier)
PRIMARY KEY -- Unique and not null
-- Foreign key (relationship)
FOREIGN KEY -- Links to another table
-- NOT NULL (required field)
NOT NULL -- Must have a value
-- UNIQUE (no duplicates)
UNIQUE -- All values must be different
-- CHECK (custom validation)
CHECK (price > 0) -- Custom condition
-- DEFAULT (default value)
DEFAULT 0 -- Value if none provided
-- Add constraint to existing table
ALTER TABLE users
ADD CONSTRAINT check_email_format
CHECK (email LIKE '%@%.%');
Working with JSON in PostgreSQL
PostgreSQL's JSON support lets you combine relational structure with flexible schema.
-- Create table with JSON
CREATE TABLE user_preferences (
user_id INTEGER PRIMARY KEY REFERENCES users(id),
settings JSONB -- Use JSONB (not JSON) for better performance
);
-- Insert JSON data
INSERT INTO user_preferences (user_id, settings)
VALUES (1, '{"theme": "dark", "notifications": true, "language": "en"}');
-- Query JSON data
SELECT settings->>'theme' as theme FROM user_preferences WHERE user_id = 1;
-- Filter by JSON field
SELECT * FROM user_preferences
WHERE settings->>'theme' = 'dark';
-- Update JSON field
UPDATE user_preferences
SET settings = jsonb_set(settings, '{theme}', '"light"')
WHERE user_id = 1;
-- Check if JSON key exists
SELECT * FROM user_preferences
WHERE settings ? 'notifications';
-- Array in JSON
INSERT INTO user_preferences (user_id, settings)
VALUES (2, '{"tags": ["developer", "python", "postgres"]}');
-- Query JSON array
SELECT * FROM user_preferences
WHERE settings->'tags' @> '["python"]';
Connecting PostgreSQL with Python
# Using psycopg2 (raw SQL)
import psycopg2
# Connect to database
conn = psycopg2.connect(
dbname="myapp",
user="postgres",
password="password",
host="localhost",
port="5432"
)
cur = conn.cursor()
# Execute query
cur.execute("SELECT * FROM users WHERE username = %s", ('alice',))
users = cur.fetchall()
# Insert data
cur.execute(
"INSERT INTO users (username, email) VALUES (%s, %s)",
('bob', 'bob@example.com')
)
conn.commit()
# Close connection
cur.close()
conn.close()
# Using with Django (see Django article)
# Django automatically handles connections via settings.py
# Using with SQLAlchemy (see SQLAlchemy article)
from sqlalchemy import create_engine
engine = create_engine('postgresql://user:pass@localhost/mydb')
# Then use with SQLAlchemy ORM or raw SQL
Common Query Patterns
-- Find duplicates
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
-- Top N per group
SELECT DISTINCT ON (category) category, name, price
FROM products
ORDER BY category, price DESC;
-- Running total
SELECT
created_at,
amount,
SUM(amount) OVER (ORDER BY created_at) as running_total
FROM orders;
-- Rank by score
SELECT
username,
score,
RANK() OVER (ORDER BY score DESC) as rank
FROM leaderboard;
-- Find records not in another table
SELECT * FROM users
WHERE id NOT IN (SELECT user_id FROM posts);
-- Or use LEFT JOIN
SELECT users.*
FROM users
LEFT JOIN posts ON users.id = posts.user_id
WHERE posts.id IS NULL;
-- Upsert (insert or update if exists)
INSERT INTO users (id, username, email)
VALUES (1, 'alice', 'alice@example.com')
ON CONFLICT (id)
DO UPDATE SET email = EXCLUDED.email;
Best Practices
- Use indexes wisely: Index columns used in WHERE, JOIN, ORDER BY
- Avoid SELECT *: Only select columns you need
- Use EXPLAIN: Analyze query performance with
EXPLAIN ANALYZE - Use transactions: For multi-step operations that must be atomic
- Parameterize queries: Prevent SQL injection, use placeholders
- Choose right data types: VARCHAR(50) not TEXT if length is limited
- Use foreign keys: Enforce referential integrity
- Regular backups: Use pg_dump for backups
- Connection pooling: Reuse database connections in production
- Normalize data: Avoid redundancy, but denormalize for performance when needed
Master PostgreSQL with Expert Mentorship
Our Full Stack Python program covers PostgreSQL from basics to advanced query optimization. Learn database design, indexing strategies, and integration with Django and SQLAlchemy.
Explore Full Stack Python Program