PostgreSQL Stopped Being 'Just SQL' a Long Time Ago

Mon, May 25, 2026 · 9 min read

Every few years someone publishes a blog post titled something like “PostgreSQL: The Everything Database” and the comments fill with people saying “well, obviously.” The thing is, it wasn’t obvious. In 2010, if you told a room full of engineers that the correct database for their document store, their geospatial queries, their full-text search, and their vector similarity lookups was the same 30-year-old relational database, they would have politely suggested you hadn’t used MongoDB yet.

Fifteen years later, that’s exactly what happened. PostgreSQL didn’t win by being the best at any single workload. It won by being good enough at all of them, inside one process, with one backup strategy, one replication topology, and one connection pool. This post traces how that happened, why the extension model is the key architectural advantage, and where the actual trade-offs still bite.

A timeline of becoming everything

PostgreSQL has been accumulating capabilities at a pace that’s easy to miss if you’re not tracking release notes:

1996 — PostgreSQL 6.0. The project renames from Postgres95. Already has user-defined types, functions, and operators — the seed of the extension model.
2001 — PostGIS 0.1. Refractions Research releases PostGIS, proving that you can bolt a full geospatial engine onto PostgreSQL’s type system. GIS teams start migrating off Oracle Spatial.
2003 — PostgreSQL 7.4. tsearch2 lands as a contrib module. Full-text search with ranking, stemming, and custom dictionaries inside the database. Not as polished as Solr, but good enough to eliminate a separate search service for many workloads.
2008 — PostgreSQL 8.3. Full-text search moves into core (tsvector, tsquery). GIN and GiST indexes make it fast.
2012 — PostgreSQL 9.2. JSON data type arrives. Crude by today’s standards — basically a validated text field — but it signals intent.
2014 — PostgreSQL 9.4. JSONB lands. This is the inflection point. Binary storage, GIN indexing, containment operators. Suddenly PostgreSQL can handle document-style workloads that people were shipping to MongoDB.
2016 — PostgreSQL 9.6. Parallel query execution. Analytics workloads on OLTP databases stop being completely unreasonable.
2017 — PostgreSQL 10. Logical replication, declarative partitioning. Table partitioning goes from “technically possible with inheritance hacks” to a first-class feature.
2020 — PostgreSQL 13. B-tree deduplication, incremental sorting. Steady performance work that compounds over releases.
2021 — pgvector 0.1.0. Andrew Kane releases pgvector. Vector similarity search inside PostgreSQL. The AI era gets a SQL interface.
2023 — PostgreSQL 16, pgvector 0.5.0. HNSW indexing arrives in pgvector. Performance goes from “prototype-grade” to “production-viable for most embedding workloads.”
2024 — PostgreSQL 17. Incremental backup, SQL/JSON standard functions, performance gains across the board. COPY gets pipeline mode.
2025 — PostgreSQL 18 (beta). Virtual generated columns, async I/O foundations (io_method), MAINTAIN privilege. The engine keeps getting faster without breaking the extension contract.

The pattern is consistent: the core team improves the engine and the index infrastructure, then extensions use those primitives to cover new workload types. Nobody had to fork PostgreSQL to add geospatial or vector search. They just wrote extensions.

JSONB and the end of the “SQL vs. NoSQL” argument

JSONB in PostgreSQL 9.4 (2014) didn’t just add a data type. It collapsed an entire architectural category.

Before JSONB, teams that needed flexible schemas had a real argument for MongoDB or CouchDB: relational databases forced you to define every column upfront, and schema migrations on large tables were painful. The document model genuinely solved a problem.

JSONB changed the calculus. You get a binary-stored, indexable JSON column that supports containment queries (@>), key-existence checks (?), and jsonpath expressions. You can index individual keys with GIN indexes, or use expression indexes on specific paths. A query like this just works:

SELECT * FROM events
WHERE payload @> '{"type": "purchase", "region": "eu-west-1"}'

And it’s fast — backed by a GIN index, not a sequential scan.

The practical effect: teams that previously ran PostgreSQL for their relational data and MongoDB for their event store or configuration documents started consolidating into PostgreSQL alone. One fewer database to back up, monitor, patch, and page about at 3 a.m.

This isn’t to say JSONB replaces a dedicated document database in every scenario. If your entire data model is documents with deep nesting and you never do joins, MongoDB’s query planner and sharding are purpose-built for that. But most teams don’t have that data model. Most teams have relational data with a few flexible-schema columns, and JSONB handles that perfectly.

pgvector and the AI workload that arrived on schedule

When the embedding wave hit in 2023, the default answer for “where do I store vectors?” was a purpose-built vector database — Pinecone, Weaviate, Qdrant, Milvus. Reasonable choices, but they meant adding another stateful service to the stack, with its own backup strategy, its own connection management, and its own learning curve.

pgvector offered a different proposition: store your embeddings in a column, index them with HNSW or IVFFlat, and query them with the same SELECT statements you already know. A basic RAG retrieval looks like:

SELECT content, embedding <=> $1 AS distance
FROM documents
ORDER BY embedding <=> $1
LIMIT 10

The <=> operator does cosine distance. The HNSW index makes it sublinear. The results come back through your existing connection pool, in the same transaction as your relational queries.

For the majority of RAG applications — where the corpus is under 10 million vectors and the recall requirements aren’t pushing the absolute frontier — pgvector is sufficient. You get vector search, metadata filtering (WHERE clauses on regular columns), and full ACID guarantees in one query. No separate ingestion pipeline, no eventual consistency between your relational data and your vector index.

The limitations are real: pgvector’s HNSW implementation doesn’t match the recall-at-scale of purpose-built engines doing quantization and distributed sharding across hundreds of millions of vectors. If you’re building a web-scale semantic search product, you probably need dedicated infrastructure. But if you’re building internal search, a support chatbot, or document retrieval for a product with a few million rows, pgvector avoids an entire category of operational complexity.

The extension model is the actual moat

The common thread through all of this — PostGIS, pgvector, full-text search, pg_stat_statements, TimescaleDB, Citus, pg_cron, pgaudit — is the PostgreSQL extension API. This is the architectural decision that separates PostgreSQL from every other open-source database.

Extensions can define new data types, operators, index access methods, and background workers. They run inside the database process with full access to the storage engine, the planner, and the executor. This isn’t a plugin system bolted on after the fact — it’s been part of the design since the Berkeley days.

The result is a composable platform. You can run PostGIS, pgvector, and pg_cron in the same database instance. They don’t conflict because they’re all using the same extension contract. And critically, they all benefit from the same core improvements: when PostgreSQL gets faster parallel scans, PostGIS queries get faster too. When WAL compression improves, pgvector indexes replicate more efficiently.

MySQL has a plugin system. MariaDB extended it. Neither comes close to the depth of the PostgreSQL extension API, which is why the MySQL ecosystem never developed equivalents to PostGIS or pgvector that run inside the engine with the same level of integration.

Operational maturity nobody talks about

PostgreSQL’s workload story gets the blog posts, but the operational story is what makes it viable for teams that have to keep things running:

Replication. Streaming replication (synchronous and asynchronous) has been stable for over a decade. Logical replication (PostgreSQL 10+) enables zero-downtime major version upgrades and selective table-level replication. Tools like pg_basebackup, Barman, and pgBackRest give you point-in-time recovery that actually works.

High availability. Patroni (from Zalando, production-tested at scale) handles automatic failover with etcd or Consul as the consensus store. It’s well-documented, well-maintained, and the standard answer for self-managed PostgreSQL HA.

Cloud managed services. AWS RDS and Aurora PostgreSQL, Google Cloud SQL and AlloyDB, Azure Database for PostgreSQL, Neon (serverless with branching), Supabase (PostgreSQL-as-a-platform), Tembo (extension-focused). The managed PostgreSQL market is deep and competitive, which benefits everyone.

Observability. pg_stat_statements gives you query-level performance data. auto_explain logs slow query plans. The pg_stat_* views cover everything from buffer cache hit ratios to replication lag. No proprietary tooling required.

Why MySQL lost the default-choice throne

This isn’t about MySQL being bad. MySQL 8.x and 9.x are competent databases. InnoDB is solid. The performance is good. Group Replication works. But several structural factors shifted mindshare:

The Oracle acquisition (2010). Sun’s acquisition by Oracle created uncertainty about MySQL’s future direction. The MariaDB fork fragmented the community and the ecosystem. Enterprise features started appearing in MySQL Enterprise Edition rather than the community release.

Feature gap accumulation. While PostgreSQL was shipping JSONB, CTEs (Common Table Expressions — which MySQL didn’t get until 8.0 in 2018), window functions, and a rich extension system, MySQL was catching up rather than leading. The gap wasn’t any single feature — it was the cumulative weight of PostgreSQL being first to ship capabilities that modern applications needed.

The extension asymmetry. MySQL’s plugin architecture couldn’t support the kind of deep integration that PostGIS and pgvector achieve in PostgreSQL. When new workloads emerged (geospatial, vector search), PostgreSQL had answers and MySQL didn’t — or had weaker ones.

Cloud provider investment. AWS, Google, and Microsoft all invested heavily in PostgreSQL-compatible managed services (Aurora PostgreSQL, AlloyDB, Azure Flexible Server). These products often got new features and optimization attention ahead of their MySQL counterparts.

The result isn’t that MySQL disappeared. It’s still everywhere — WordPress, legacy enterprise systems, plenty of SaaS products. But when a new team starts a new project in 2026 and asks “which database?”, the default answer has shifted to PostgreSQL in a way that would have been surprising fifteen years ago.

When to choose PostgreSQL (and when not to)

Choose PostgreSQL when:

You need relational data with some document-style flexibility (JSONB).
You want vector search without a separate database (pgvector).
You need geospatial queries (PostGIS is unmatched in open source).
You want one database to back up, monitor, and operate instead of three.
Your team already knows SQL and doesn’t want a new query language.

Think twice when:

You need sub-millisecond key-value lookups at millions of ops/sec. Redis or DynamoDB are better tools for that job.
Your vector corpus exceeds tens of millions of high-dimensional embeddings and you need frontier-grade recall. Dedicated vector databases have more room to optimize.
You need globally distributed, multi-region active-active writes. CockroachDB, Spanner, or YugabyteDB are designed for that; PostgreSQL isn’t, even with Citus.
Your workload is pure time-series at massive ingest rates. TimescaleDB helps, but ClickHouse or InfluxDB may be more appropriate at the extreme end.

The consolidation thesis

The trend line is clear: PostgreSQL absorbs workloads that used to require separate systems. Not because it’s the best at each one, but because “good enough at five things in one place” beats “best-in-class at five things in five places” for most teams, most of the time.

Every database you remove from the stack is a backup you don’t have to verify, a failover you don’t have to test, a connection pool you don’t have to tune, and a 3 a.m. page you don’t have to answer. PostgreSQL’s real value proposition in 2026 isn’t any single feature. It’s the operational cost it eliminates by being the one database that credibly handles relational, document, search, geospatial, and vector workloads under one roof.

That’s not “just SQL.” That’s a platform.