Database Schema Design for LLM Applications

Most teams building LLM applications start with a flat table, a JSON blob, or — worse — no persistent storage at all. That works until the first time a user says "you told me something different yesterday" and your system has no record of the conversation. The moment you add retrieval-augmented generation, multi-turn chat, A/B-tested prompts, and human feedback collection to the same product, an ad-hoc schema becomes a maintenance crisis. Getting the relational foundation right before you scale saves weeks of painful migrations. This post walks through every table you need, explains the trade-offs, and shows production-ready DDL you can drop straight into PostgreSQL.

TL;DR

Model conversations and messages as separate entities with a role ENUM and per-message token counts.
Store embedding metadata in a dedicated table that records model name, vector dimensions, and chunk provenance — not just the vectors themselves.
Version every prompt template in a prompt_versions table so you can reproduce any response in your history.
Capture thumbs-up/down and free-text feedback in a structured feedback table to build a labelled RLHF dataset over time.
Use JSONB liberally for application-specific metadata so new fields never require a migration.

Core Entities in an LLM Application

Before writing a single CREATE TABLE, map the domain. A production LLM application typically has six distinct conceptual entities: users, conversations, messages, documents and their embeddings, prompt templates, and human feedback. Each one deserves its own table, not a column in a monolithic "ai_events" log.

The relationship graph is straightforward: one user has many conversations; one conversation has many messages; each message may reference one or more embeddings that were retrieved before the LLM call; each message was generated using a specific prompt version; and one or more feedback records can be attached to any message. Keeping these normalized means you can answer operational questions — "which prompt version drives the best thumbs-up rate?" — with a simple JOIN rather than a full-table JSON scan.

Users already exist in most applications, so the schema below assumes a users table with a UUID primary key is handled by your auth layer. Every table here uses UUID DEFAULT gen_random_uuid() as the primary key for safe distributed inserts and forwards-compatible replication.

Tip

Enable the pgcrypto or pg_uuidv7 extension in PostgreSQL so that UUID primary keys sort chronologically. Monotonic UUIDs eliminate random B-tree page splits on high-write tables like messages.

Conversation and Message Storage

Conversations are the top-level session container. A conversation belongs to a user, carries a title, and accumulates total token usage over its lifetime. Denormalizing total_tokens at the conversation level avoids a costly aggregate query every time you render a usage dashboard.

sql

CREATE TABLE conversations (
    id              UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         UUID        NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    title           TEXT,
    total_tokens    INTEGER     NOT NULL DEFAULT 0,
    metadata        JSONB       NOT NULL DEFAULT '{}',
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_conversations_user_id ON conversations(user_id);
CREATE INDEX idx_conversations_created_at ON conversations(created_at DESC);

Messages are the atomic unit of every interaction. The role column captures who produced the content: the human user, the assistant, or a system prompt injected by your application. Using a native ENUM rather than a free-text column enforces the OpenAI/Anthropic message format at the database layer, preventing rogue values that break SDK parsing.

sql

CREATE TYPE message_role AS ENUM ('system', 'user', 'assistant', 'tool');

CREATE TABLE messages (
    id                  UUID            PRIMARY KEY DEFAULT gen_random_uuid(),
    conversation_id     UUID            NOT NULL REFERENCES conversations(id) ON DELETE CASCADE,
    role                message_role    NOT NULL,
    content             TEXT            NOT NULL,
    token_count         INTEGER,
    prompt_version_id   UUID            REFERENCES prompt_versions(id),
    model_name          TEXT,
    finish_reason       TEXT,
    latency_ms          INTEGER,
    metadata            JSONB           NOT NULL DEFAULT '{}',
    created_at          TIMESTAMPTZ     NOT NULL DEFAULT now()
);

CREATE INDEX idx_messages_conversation_id ON messages(conversation_id, created_at);
CREATE INDEX idx_messages_prompt_version  ON messages(prompt_version_id) WHERE prompt_version_id IS NOT NULL;

Storing token_count per message rather than only at the conversation level lets you run per-turn cost analysis and detect anomalously long responses — a practical early-warning signal for prompt injection attempts. The finish_reason field mirrors the model API response (stop, length, content_filter) and is essential for diagnosing truncated answers in production. latency_ms rounds out the observability picture by recording end-to-end response time at the row level, so a slow query on messages filtered by model_name immediately surfaces which model is underperforming.

Warning

Never store raw API keys, bearer tokens, or PII inside the metadata JSONB column. The flexible schema is convenient, but these columns are frequently exported to data warehouses or read replicas that have weaker access controls than your primary OLTP database.

Embedding Metadata Table

Vector databases (pgvector, Pinecone, Weaviate) handle the actual similarity search, but the metadata that describes each embedding lives most naturally in PostgreSQL alongside the rest of your relational data. A dedicated embeddings table records where each vector came from, which model produced it, and the exact text chunk it represents.

sql

CREATE TABLE embeddings (
    id              UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    source_type     TEXT        NOT NULL,          -- 'document', 'message', 'faq_entry'
    source_id       UUID        NOT NULL,
    chunk_index     INTEGER     NOT NULL DEFAULT 0,
    chunk_text      TEXT        NOT NULL,
    model_name      TEXT        NOT NULL,          -- e.g. 'text-embedding-3-small'
    dimensions      INTEGER     NOT NULL,          -- e.g. 1536
    vector_store_id TEXT,                          -- external ID in pgvector / Pinecone
    token_count     INTEGER,
    metadata        JSONB       NOT NULL DEFAULT '{}',
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (source_type, source_id, chunk_index, model_name)
);

CREATE INDEX idx_embeddings_source ON embeddings(source_type, source_id);
CREATE INDEX idx_embeddings_model  ON embeddings(model_name);

The dimensions column is not redundant — different embedding models produce different vector sizes (1536 for text-embedding-3-small, 3072 for text-embedding-3-large), and mixing them in the same vector index produces silently wrong similarity scores. Recording dimensions here makes it trivial to write a nightly validation query that catches accidental model switches. The UNIQUE constraint on (source_type, source_id, chunk_index, model_name) guarantees idempotent re-embedding: running the ingestion pipeline twice will conflict-and-skip rather than duplicate rows.

Prompt Version Management

Prompt engineering is software development. Every change to a system prompt is a deployment, and deployments need version control. Storing prompt templates in code alone — a common shortcut — means you cannot join a historical message row back to the exact template that produced it. A prompt_versions table closes that gap.

sql

CREATE TABLE prompt_versions (
    id              UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    name            TEXT        NOT NULL,          -- human-readable name, e.g. 'support-agent-v2'
    version         INTEGER     NOT NULL,
    template        TEXT        NOT NULL,          -- the full prompt template with {{variable}} slots
    variables       JSONB       NOT NULL DEFAULT '[]', -- JSON array of expected variable names
    model_name      TEXT,                          -- model this template was tuned for
    is_active       BOOLEAN     NOT NULL DEFAULT FALSE,
    notes           TEXT,
    metadata        JSONB       NOT NULL DEFAULT '{}',
    created_by      UUID        REFERENCES users(id),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (name, version)
);

CREATE INDEX idx_prompt_versions_name_active ON prompt_versions(name, is_active) WHERE is_active = TRUE;

The partial index on (name, is_active) WHERE is_active = TRUE makes it extremely fast to fetch the current live template for a given prompt name — the hot path every time a user sends a message. The variables JSONB array documents which template slots exist, enabling a lightweight validation layer in your application before the LLM call. Set is_active = FALSE on old versions rather than deleting them; those rows are the audit trail that makes reproducibility possible months later.

Tip

Add a database trigger that enforces only one active version per name at a time. This prevents the race condition where two deployments briefly mark different versions as active simultaneously, causing non-deterministic prompt selection under load.

Feedback and Evaluation Storage

Human feedback is the raw material of RLHF (Reinforcement Learning from Human Feedback). Even if your team is not yet fine-tuning models, collecting structured feedback from day one means you have a labelled dataset ready the moment you need it. The schema below captures binary thumbs-up/down ratings, free-text explanations, and optional categorical tags — enough to support both automatic metrics dashboards and export to a fine-tuning pipeline.

sql

CREATE TYPE feedback_rating AS ENUM ('thumbs_up', 'thumbs_down', 'neutral');

CREATE TABLE feedback (
    id              UUID            PRIMARY KEY DEFAULT gen_random_uuid(),
    message_id      UUID            NOT NULL REFERENCES messages(id) ON DELETE CASCADE,
    user_id         UUID            REFERENCES users(id) ON DELETE SET NULL,
    rating          feedback_rating NOT NULL,
    comment         TEXT,
    categories      TEXT[]          DEFAULT '{}',  -- e.g. ARRAY['hallucination','off-topic']
    correction      TEXT,                          -- what the ideal response would have been
    metadata        JSONB           NOT NULL DEFAULT '{}',
    created_at      TIMESTAMPTZ     NOT NULL DEFAULT now()
);

CREATE INDEX idx_feedback_message_id ON feedback(message_id);
CREATE INDEX idx_feedback_rating     ON feedback(rating);
CREATE INDEX idx_feedback_created_at ON feedback(created_at DESC);

The correction column is the most valuable field for RLHF dataset construction. When a user rates a response as thumbs-down and provides their preferred answer, that (prompt, bad_response, good_response) triple is a ready-made training example. The categories text array enables structured tagging without requiring a join to a separate taxonomy table — appropriate here because the tag vocabulary tends to be small and changes infrequently. For larger deployments, promote categories to a proper many-to-many relationship with a feedback_categories reference table.

To export a labelled dataset for fine-tuning, the following query collects all thumbs-down messages alongside their corrections and the prompt version used:

sql

SELECT
    m.content                AS prompt_input,
    m.metadata->>'context'   AS retrieved_context,
    pv.template              AS system_prompt,
    m.content                AS bad_completion,
    f.correction             AS ideal_completion,
    f.categories,
    f.created_at
FROM feedback f
JOIN messages m      ON m.id = f.message_id
LEFT JOIN prompt_versions pv ON pv.id = m.prompt_version_id
WHERE f.rating = 'thumbs_down'
  AND f.correction IS NOT NULL
ORDER BY f.created_at DESC;

Warning

Before exporting feedback data to any external fine-tuning service, apply your data retention and privacy policy. A user's free-text correction in the correction column may contain personally identifiable information that must be redacted or pseudonymized under GDPR or CCPA before it leaves your PostgreSQL instance.

Key Takeaways

Normalize LLM application data into six tables — users, conversations, messages, embeddings, prompt_versions, and feedback — rather than logging everything into a single events table.
Use a message_role ENUM to enforce the system/user/assistant/tool taxonomy at the database layer; store token_count and latency_ms per message for cost and performance observability.
The embeddings table records model name and vector dimensions alongside chunk provenance, making it safe to run multiple embedding models in parallel without cross-contaminating vector indexes.
Treat prompt templates as versioned, immutable rows; use a partial index on the active version for fast hot-path lookups and a database trigger to prevent dual-active race conditions.
The feedback table's correction column produces (bad, good) training pairs automatically as users interact with the product — building your RLHF dataset at zero incremental engineering cost.
JSONB metadata columns on every table absorb application-specific fields gracefully, deferring schema migrations until a field is truly load-bearing and worth indexing.

Working with JusDB on AI Application Databases

Designing a schema that keeps pace with a fast-moving LLM product requires more than good SQL — it requires experienced judgment about when to normalize further, which JSONB fields to promote to columns, how to partition the messages table as it crosses hundreds of millions of rows, and how to tune autovacuum for the write-heavy feedback pipeline. JusDB specializes in exactly these decisions, working with engineering teams from initial schema design through production-scale optimization.

Explore JusDB PostgreSQL Consulting → | Talk to a DBA

Database Schema Design for LLM Applications

Core Entities in an LLM Application

Conversation and Message Storage

Embedding Metadata Table

Prompt Version Management

Feedback and Evaluation Storage

Working with JusDB on AI Application Databases

Share this article

Related Articles

Azure SQL Managed Instance vs SQL Server on Azure VM vs Azure SQL Database

Ola Hallengren's SQL Server Maintenance Solution: Production Setup Guide

SQL Server Wait Stats: A Diagnostic Playbook for Slow Queries