Rethinking data in the AI age | AnythingGraph Blog

For decades, “working with data” meant picking a storage engine and shaping your problem to fit it. Rows in Postgres. Documents in MongoDB. Edges in Neo4j. Parquet files in a lake. Each generation solved real problems — and each left a new silo on the org chart.

AI agents do not care about your storage taxonomy. They care about questions: Who owns this account? Can this employee see that payroll row? Which MongoDB customer maps to which CRM account? Those questions cut across systems, schemas, and naming conventions that were never designed to meet in one query.

The AI age did not make databases obsolete. It made “one more database” the wrong default answer.

1. The SQL era — tables, keys, and joins

Relational databases gave the enterprise a shared grammar: tables of rows, columns with types, primary keys that identify a row, and foreign keys that point at other rows.

Consider a minimal CRM in Postgres:

users          accounts
─────────      ─────────────────────────────
user_id (PK)   account_name
full_name      industry
               owner_user_id  →  users.user_id

To answer “which accounts does Alex own?”, you join the tables on the foreign key:

SELECT account_name, industry
FROM accounts
JOIN users ON accounts.owner_user_id = users.user_id
WHERE users.full_name = 'Alex Anderson';

SQL joins are elegant because they assume a single coherent schema inside one database. The optimizer knows the types. Transactions keep rows consistent. DBAs guard the boundary.
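
To try the join above without a Postgres server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. The table and column names come from the schema above; the sample rows (including the industry values and the second user) are made up for illustration.

```python
import sqlite3

# SQLite stands in for Postgres here so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id   INTEGER PRIMARY KEY,
        full_name TEXT NOT NULL
    );
    CREATE TABLE accounts (
        account_name  TEXT NOT NULL,
        industry      TEXT,
        owner_user_id INTEGER REFERENCES users(user_id)
    );
    -- Hypothetical sample rows, for illustration only.
    INSERT INTO users VALUES (1, 'Alex Anderson'), (2, 'Sam Rivera');
    INSERT INTO accounts VALUES
        ('Northwind Traders', 'Retail', 1),
        ('Contoso Ltd', 'Manufacturing', 2);
""")

# Same join as in the text, with the name passed as a bound parameter.
rows = conn.execute(
    """
    SELECT account_name, industry
    FROM accounts
    JOIN users ON accounts.owner_user_id = users.user_id
    WHERE users.full_name = ?
    """,
    ("Alex Anderson",),
).fetchall()

print(rows)  # [('Northwind Traders', 'Retail')]
```

The join works because both tables live behind one connection and one schema. That is exactly the assumption that stops holding once the data is spread across systems.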

What SQL is great at

Structured operational data with clear relationships inside one system
ACID guarantees for finance, inventory, and identity
Ad hoc analytics when an engineer writes the query

Where the model breaks down for AI

Cross-system joins do not exist out of the box. Your CRM is in Postgres; customer profiles are in MongoDB; payroll is a CSV export. A SQL JOIN stops at the connection string, so any link between those systems has to be built and maintained somewhere else.
Agents do not speak SQL. They speak intent — and need a governed vocabulary, not raw table names scattered in a prompt.
Business keys are not always keys. A MongoDB field company: "Northwind Traders" matches Postgres account_name by string equality, not by foreign key. That relationship is real — but invisible to the schema.
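
The business-key problem can be sketched in a few lines. In this hedged example, two in-memory lists stand in for a MongoDB collection and a Postgres table, and the join happens in application code. The documents, the helper names (business_key, join_on_company), and the whitespace and case normalization are all assumptions for illustration; the text only says the two fields match by string equality.

```python
# Stand-ins for two systems that share no foreign key (hypothetical data).
mongo_customers = [
    {"_id": "c1", "name": "John Smith", "company": "Northwind Traders"},
    {"_id": "c2", "name": "Ada Park", "company": "northwind traders "},
    {"_id": "c3", "name": "Lee Wong", "company": "Fabrikam"},
]
postgres_accounts = [
    {"account_name": "Northwind Traders", "industry": "Retail"},
]


def business_key(value: str) -> str:
    """Normalize a free-text name so trivially different spellings match."""
    return " ".join(value.split()).casefold()


def join_on_company(customers, accounts):
    """Hash join in application code: index one side, probe with the other."""
    by_name = {business_key(a["account_name"]): a for a in accounts}
    matched, unmatched = [], []
    for c in customers:
        account = by_name.get(business_key(c["company"]))
        if account is None:
            unmatched.append(c["_id"])
        else:
            matched.append((c["_id"], account["account_name"]))
    return matched, unmatched


matched, unmatched = join_on_company(mongo_customers, postgres_accounts)
print(matched)    # [('c1', 'Northwind Traders'), ('c2', 'Northwind Traders')]
print(unmatched)  # ['c3']
```

Nothing in either schema records that this join exists or how the strings are normalized. The rule lives only in whichever script happens to implement it.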

2. NoSQL, graph databases, and lakes — each solves a different problem

The last fifteen years added specialized stores. None of them removed silos; they optimized for different workloads.

NoSQL (e.g. MongoDB, DynamoDB)

Problem solved: flexible documents, horizontal scale, fast product iteration without rigid migrations.
Trade-off: relationships are application-level. You denormalize, embed, or join in code. Great for user profiles and event streams — not for federated governance across CRM + HR + finance.

Graph databases (e.g. Neo4j)

Problem solved: workloads where the graph is the product, such as fraud rings, knowledge graphs, and recommendation paths, all modeled natively as nodes and edges.
Trade-off: you still load data into the graph. Operational Postgres and Salesforce do not live there by default. Another copy, another sync job, another drift surface.
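
For a feel of the traversal work graph databases are built for, here is a minimal plain-Python sketch of a fraud-ring style query: find every account reachable from a starting account through shared-attribute links. The edge list and function names are hypothetical, and a real graph database would run this natively rather than in application code.

```python
from collections import deque

# Hypothetical edges: two accounts are linked when they share a device or card.
edges = [
    ("acct_a", "acct_b"),
    ("acct_b", "acct_c"),
    ("acct_d", "acct_e"),
]


def build_adjacency(pairs):
    """Turn an undirected edge list into a node -> neighbors mapping."""
    adjacency = {}
    for left, right in pairs:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


def connected_ring(adjacency, start):
    """Breadth-first search: every node reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


ring = connected_ring(build_adjacency(edges), "acct_a")
print(sorted(ring))  # ['acct_a', 'acct_b', 'acct_c']
```

Note that the edge list had to be loaded first. That loading step is the copy-and-sync cost described in the trade-off.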

Data lakes & warehouses (e.g. Snowflake, S3 + Spark)

Problem solved: batch analytics, historical trends, ML feature stores, petabyte scans.
Trade-off: built for scheduled pipelines, not sub-second agent Q&A on operational truth. Freshness is typically measured in hours or days. Compliance scope expands with every landing zone.

Store type	Optimized for	Typical AI gap
SQL (RDBMS)	Transactional rows, in-database joins	One system at a time; agents need cross-source joins
NoSQL	Flexible documents, scale-out writes	No shared schema with the rest of the stack
Graph DB	Native relationship traversal	Requires ingesting copies from source systems
Lake / warehouse	Analytics & history at scale	Stale for live agent questions; expensive to feed prompts

Common thread: every option above is still a place to put data. The AI age adds a workload none of them were built for — natural-language questions across live silos, with access rules and proof — asked thousands of times a day.

3. The AI age — a reasoning layer, not another database

When an agent asks “show me customer John’s account and who owns it,” the hard part is not storage. The rows already exist in MongoDB and Postgres. The hard parts are:

Meaning — map “customer”, “account”, and “owns” to the right tables and fields
Federation — execute steps against each source without copying data
Access — enforce who may see which rows, every time
Proof — show which records and paths produced the answer

That is not a fifth database. It is a reasoning layer — software that sits between agents and your existing adapters, compiles intent into read-only plans, and returns answers from live systems.

AnythingGraph / Anything CLI — stores no operational data

Anything CLI is deliberately not a database in the traditional sense. It persists:

Playbooks — business entities, relationships, and access rules (your ontology)
Bindings — how each entity maps to Postgres tables, MongoDB collections, CSV files, Salesforce objects
Proof envelopes — audit trails of which queries ran, not a cache of rows
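
To make those three artifacts concrete, here is a rough sketch of what they might look like as plain data structures. This is not AnythingGraph's actual playbook format: the class names, fields, and the "customers" collection name are assumptions for illustration. The entity and relationship names are borrowed from the CRM example later in this post.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Binding:
    """Where one business entity physically lives."""
    entity: str
    source: str    # adapter name, e.g. "postgres" or "mongodb"
    location: str  # table or collection name


@dataclass(frozen=True)
class Relationship:
    """A named link between two entities, matched on a pair of fields."""
    name: str
    from_entity: str
    to_entity: str
    from_field: str
    to_field: str


@dataclass
class ProofEnvelope:
    """Audit trail: records which queries ran, never the rows they returned."""
    question: str
    steps: list = field(default_factory=list)

    def record(self, source: str, query: str) -> None:
        self.steps.append({"source": source, "query": query})


bindings = {
    "customer": Binding("customer", "mongodb", "customers"),
    "crm_account": Binding("crm_account", "postgres", "accounts"),
    "crm_user": Binding("crm_user", "postgres", "users"),
}
relationships = [
    Relationship("belongs_to_account", "customer", "crm_account",
                 "company", "account_name"),
    Relationship("owns_account", "crm_user", "crm_account",
                 "user_id", "owner_user_id"),
]


def unbound_entities(relationships, bindings):
    """Return entities that a relationship mentions but no binding covers."""
    used = {r.from_entity for r in relationships} | {r.to_entity for r in relationships}
    return sorted(used - set(bindings))


def crosses_sources(rel, bindings):
    """True when a relationship must be resolved across two adapters."""
    return bindings[rel.from_entity].source != bindings[rel.to_entity].source


print(unbound_entities(relationships, bindings))    # []
print(crosses_sources(relationships[0], bindings))  # True
print(crosses_sources(relationships[1], bindings))  # False
```

Because everything here is declarative data, a check like unbound_entities can run in code review or CI before any query touches a source system.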

Customer records stay in MongoDB. Account rows stay in Postgres. The reasoning layer does not ETL them together; it queries in place at answer time, through MCP (Model Context Protocol) tools like introspect_source, sample_source, and query_graph.

[Figure: Anything CLI as a thin reasoning layer between AI agents and enterprise data sources]

A concrete cross-source example

In a real demo playbook (crm_customer_relationship), a MongoDB customer document has company: "Northwind Traders". Postgres has an account with the same account_name. There is no shared UUID — only a business string.

The playbook declares:

Entity customer (MongoDB) linked to entity crm_account (Postgres)
Relationship belongs_to_account on matching account name
Relationship owns_account from crm_user to crm_account inside Postgres

One agent question traverses both sources — Mongo resolve, Postgres join — with SQL and Mongo queries generated from bindings, not pasted into the prompt. See the cross-source CRM walkthrough for the full MCP session.
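
The two-step traversal (Mongo resolve, then Postgres join) can be sketched as follows. This is a hand-written illustration, not output from Anything CLI: a Python list stands in for the MongoDB collection, sqlite3 stands in for Postgres, and the sample data, the answer function, and the shape of the proof list are all assumptions.

```python
import sqlite3

# Stand-in for the MongoDB customers collection (hypothetical document).
customers = [{"name": "John", "company": "Northwind Traders"}]

# SQLite stands in for Postgres; tables follow the CRM schema from section 1.
pg = sqlite3.connect(":memory:")
pg.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE accounts (account_name TEXT, industry TEXT, owner_user_id INTEGER);
    INSERT INTO users VALUES (1, 'Alex Anderson');
    INSERT INTO accounts VALUES ('Northwind Traders', 'Retail', 1);
""")


def answer(customer_name):
    proof = []  # which queries ran, in order; no rows are stored here

    # Step 1 (Mongo resolve): find the customer document by name.
    mongo_filter = {"name": customer_name}
    proof.append(("mongodb", mongo_filter))
    doc = next((c for c in customers if c["name"] == customer_name), None)
    if doc is None:
        return None, proof

    # Step 2 (Postgres join): follow belongs_to_account by account name,
    # then owns_account by foreign key, in a single read-only query.
    sql = (
        "SELECT a.account_name, u.full_name "
        "FROM accounts a JOIN users u ON a.owner_user_id = u.user_id "
        "WHERE a.account_name = ?"
    )
    proof.append(("postgres", sql))
    row = pg.execute(sql, (doc["company"],)).fetchone()
    if row is None:
        return None, proof
    return {"customer": doc["name"], "account": row[0], "owner": row[1]}, proof


result, proof = answer("John")
print(result)
# {'customer': 'John', 'account': 'Northwind Traders', 'owner': 'Alex Anderson'}
print([source for source, _ in proof])  # ['mongodb', 'postgres']
```

The only value that crosses the source boundary is the company string from step 1, passed as a bound parameter into step 2. No rows are copied into a shared store.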

Think of it this way: SQL gave engineers a language for rows. Graph DBs gave graph specialists a language for edges. Playbooks give agents a language for your business — while data stays where it already lives.

Database vs. reasoning layer

	Traditional database / lake	AnythingGraph reasoning layer
Stores	Rows, documents, files	Playbooks, bindings, proof metadata
Join model	Within one engine (SQL JOIN, graph traverse)	Federated plans across adapters at query time
Freshness	Depends on ETL schedule or replication lag	Live reads from source systems
Consumer	Engineers writing queries	Agents (via MCP) asking in business terms
Governance	DB permissions per store	ReBAC (relationship-based access control) / playbook access on every graph query
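
The governance row is easier to picture with a small relationship-based check. In this minimal sketch, access is derived from relationship tuples rather than from per-table grants. The tuples, the "manages" relation, and the policy itself (owners and their direct managers may read an account) are assumptions for illustration, not AnythingGraph's actual access rules.

```python
# Relationship tuples: (subject, relation, object). Hypothetical data.
relations = {
    ("user:alex", "owns_account", "account:northwind"),
    ("user:sam", "owns_account", "account:contoso"),
    ("user:dana", "manages", "user:alex"),
}


def can_read(user, account):
    """Allow the owner, or anyone who directly manages the owner."""
    if (user, "owns_account", account) in relations:
        return True
    return any(
        (user, "manages", owner) in relations
        for owner, relation, obj in relations
        if relation == "owns_account" and obj == account
    )


def visible_accounts(user, accounts):
    """Filter per request, so the check runs on every query."""
    return [a for a in accounts if can_read(user, a)]


all_accounts = ["account:northwind", "account:contoso"]
print(visible_accounts("user:alex", all_accounts))  # ['account:northwind']
print(visible_accounts("user:dana", all_accounts))  # ['account:northwind']
print(visible_accounts("user:sam", all_accounts))   # ['account:contoso']
```

Dana never appears in an ownership tuple, yet she can read the Northwind account because of her relationship to Alex. That derived access is what separates this model from a static permission list.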

The bottom line

We still need Postgres for transactions. We still need MongoDB for flexible documents. We still need lakes for history. None of that goes away.

What changes in the AI age is the question layer. Agents do not want another copy of your CRM. They want a governed way to reason over the systems you already have, in real time, with evidence.

That is not a new kind of disk. It is a new kind of middleware: Reasoning as Code — playbooks and bindings you version, review, and ship like any other critical infrastructure.

Stop asking which database wins. Start asking what layer lets AI use all of them safely.
