An AI agent's reply lands in the wrong place (a new conversation thread instead of the existing one) and the recipient never sees it. The recipient's filters look for the original thread, not a new one. The agent keeps sending. The conversation goes silent. The agent does not notice. This is the most common threading bug we see in production agent inboxes, and the usual cause is the same: a missing or malformed In-Reply-To header. Threading is a forty-five-year-old protocol with three headers' worth of state, and getting it right is the difference between an agent that holds a real conversation and an agent that talks past every reply.
This piece is the field guide. What the three threading headers actually do, how Gmail and Outlook use them to reconstruct conversations, the edge cases that break threading in practice, and the code an agent needs to send a correctly threaded reply. We'll cover where AgentMail and Dead Simple Email differ on threading defaults, and what to do when you need to thread across a forwarded message, a quoted reply, or a re-subject.
The Three Threading Headers
Email threading is defined in RFC 5322 Section 3.6.4. Three headers do the work.
Message-ID — the unique identifier
Every message has a globally unique Message-ID. The format is <local-part@domain>, where the local part is anything that does not collide with other messages and the domain is the sending domain. Mail clients generate this automatically; agents that build raw messages need to generate one themselves. A missing Message-ID is a common cause of "Gmail dropped my reply into the wrong place": without an ID, no other client can reference your message.
Message-ID: <1a2b3c4d5e6f7g8h@agents.yourdomain.com>
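For agents that assemble raw messages, Python's standard library can generate the ID. The sketch below is one minimal way to do it; the helper name new_message_id and the domain agents.yourdomain.com are illustrative assumptions, not part of any SDK.

```python
from email.utils import make_msgid


def new_message_id(sending_domain: str) -> str:
    """Return a fresh Message-ID, angle brackets included, scoped to sending_domain."""
    # make_msgid combines a timestamp, the process ID, and random bits,
    # so two calls should not collide in practice.
    return make_msgid(domain=sending_domain)


message_id = new_message_id("agents.yourdomain.com")
print(f"Message-ID: {message_id}")
```

Store the generated value alongside the sent message. Later replies reference it, and the agent needs it to recognize those replies.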
In-Reply-To — what this reply is replying to
In-Reply-To holds exactly one Message-ID — the message being directly replied to. Mail clients use it to decide which existing thread a new message belongs to. An agent generating a reply must copy the parent's Message-ID into In-Reply-To verbatim, including the angle brackets.
Message-ID: <9z8y7x6w5v4u@agents.yourdomain.com>
In-Reply-To: <1a2b3c4d5e6f7g8h@agents.yourdomain.com>
References — the full ancestor chain
References holds every Message-ID in the conversation, in order, space-separated. The first ID is the root of the thread; the last ID is the immediate parent. When an agent generates a reply, it copies the parent's References chain forward and appends the parent's Message-ID. References is what well-behaved clients prefer for thread reconstruction because it survives gaps — if a message in the middle of the chain is deleted, the thread still resolves.
References: <root@example.com> <reply1@example.com> <reply2@agents.yourdomain.com>
That is the entire threading specification. Three headers, one tree.
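The copy-forward rule for a reply can be sketched as a small pure function. The name build_reply_headers and its parameters are hypothetical; the fallback to the parent's In-Reply-To when it has no References follows the guidance in RFC 5322 Section 3.6.4.

```python
from typing import Optional


def build_reply_headers(
    parent_message_id: str,
    parent_references: Optional[str] = None,
    parent_in_reply_to: Optional[str] = None,
) -> dict:
    """Compute In-Reply-To and References for a reply to the given parent."""
    if parent_references:
        # Normal case: carry the parent's chain forward unchanged.
        chain = parent_references.split()
    elif parent_in_reply_to and len(parent_in_reply_to.split()) == 1:
        # Parent has no References: fall back to its single In-Reply-To ID.
        chain = parent_in_reply_to.split()
    else:
        chain = []
    # The immediate parent always goes last.
    if parent_message_id not in chain:
        chain.append(parent_message_id)
    return {
        "In-Reply-To": parent_message_id,
        "References": " ".join(chain),
    }


headers = build_reply_headers(
    parent_message_id="<reply2@agents.yourdomain.com>",
    parent_references="<root@example.com> <reply1@example.com>",
)
print(headers["References"])
```

Splitting on any whitespace also handles a References header that was folded across several lines in transit.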
How Gmail and Outlook Actually Reconstruct Threads
Different mail clients prefer different headers when reconstructing a thread. Knowing which client uses what helps you debug "why does this thread look broken in Gmail but fine in Outlook?"
| Client | Primary signal | Fallback | Subject heuristic |
|---|---|---|---|
| Gmail | References + In-Reply-To | Subject + participants | Yes (within 7 days) |
| Outlook / Microsoft 365 | Conversation-ID (proprietary) | References + Subject | Yes |
| Apple Mail | References | In-Reply-To | Rarely |
| Thunderbird | References (Jamie Zawinski's algorithm) | In-Reply-To | No |
| Dovecot IMAP | References (server-side THREAD) | In-Reply-To | Yes with THREAD=REFERENCES; no with THREAD=REFS |
Two practical lessons fall out of this table. First, References is the universal currency — every client uses it as either the primary or fallback signal. If your agent sets References correctly, threading works everywhere. Second, Gmail's subject-line fallback is a seven-day window. Replies that arrive more than a week after the previous message in the thread will not be auto-grouped on subject alone — they need the headers. For long-running agents that wait days or weeks between replies, headers are not optional.
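To see why a References chain keeps a thread together even when a message in the middle is missing, here is a deliberately simplified grouping routine. It is not Gmail's or Thunderbird's actual algorithm. The function group_threads and the dict shape ("id" plus a "references" list) are assumptions made for illustration.

```python
from collections import defaultdict


def group_threads(messages):
    """Group message IDs into threads by linking every ID a message references.

    messages: list of dicts with "id" (str) and "references" (list of str).
    Returns a list of threads, each a list of message IDs in input order.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for msg in messages:
        find(msg["id"])
        for ref in msg["references"]:
            # Referenced IDs count even if that message is not in the mailbox.
            union(ref, msg["id"])

    threads = defaultdict(list)
    for msg in messages:
        threads[find(msg["id"])].append(msg["id"])
    return list(threads.values())


mailbox = [
    {"id": "<a>", "references": []},
    # "<b>" was deleted, but "<c>" still names both ancestors.
    {"id": "<c>", "references": ["<a>", "<b>"]},
    {"id": "<d>", "references": []},
]
print(group_threads(mailbox))
```

Because the reply to the deleted message still lists the root, it lands in the same group as the root. With In-Reply-To alone it would have been orphaned.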
The Five Threading Bugs We See Most Often
These are the patterns that show up when an agent's reply lands as a new conversation. None are mysterious; all are easy to fix once you know what to look for.
Bug 1: Missing In-Reply-To on agent-generated replies
An agent builds an outbound message with to, subject, and body but does not copy the parent's Message-ID into In-Reply-To. The reply shows up as a new conversation in every client. The fix: always set In-Reply-To. The Dead Simple Email reply endpoint sets it automatically — see the code example below.
Bug 2: Missing References chain
The agent sets In-Reply-To but not References. The reply threads correctly for one hop but breaks if the recipient forwards to someone else who replies — Outlook and Apple Mail need the full chain. The fix: copy the parent's References header verbatim, then append the parent's Message-ID.
Bug 3: Rewritten subject lines
The agent changes "Re: Pricing question" to "Following up on pricing" and Gmail's subject heuristic does not match. If the threading headers are also missing or malformed, Gmail starts a new thread. The fix: either preserve the subject (prefix with "Re:" if it is not already prefixed) or set the headers correctly so the subject does not matter.
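The "prefix with Re: if it is not already prefixed" rule is small enough to sketch directly. The helper name reply_subject is hypothetical, and the pattern only recognizes the English "Re:" prefix; localized prefixes such as "AW:" or "SV:" would need extra handling.

```python
import re

# Matches a leading "Re:" in any letter case, with optional surrounding spaces.
_RE_PREFIX = re.compile(r"^\s*re\s*:", re.IGNORECASE)


def reply_subject(parent_subject: str) -> str:
    """Return the parent's subject with a single "Re: " prefix."""
    subject = parent_subject.strip()
    if _RE_PREFIX.match(subject):
        # Already a reply subject: keep it as-is rather than stacking prefixes.
        return subject
    return f"Re: {subject}"


print(reply_subject("Pricing question"))
print(reply_subject("Re: Pricing question"))
```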
Bug 4: Generated Message-ID domains that do not match the sender
Spam filters and some clients reject or downgrade messages whose Message-ID domain does not match the From domain. An agent that generates IDs like <abc@localhost> while sending as agent@yourdomain.com looks like a misconfigured bot. The fix: always use your sending domain (or a subdomain of it) in Message-ID.
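A pre-send check for this mismatch might look like the sketch below. The function name message_id_matches_sender is an assumption, and the rule it encodes (same domain or a subdomain of the From domain) mirrors the fix described above rather than any formal standard.

```python
from email.utils import parseaddr


def message_id_matches_sender(message_id: str, from_header: str) -> bool:
    """True if the Message-ID domain equals the From domain or is a subdomain of it."""
    id_domain = message_id.strip().strip("<>").rpartition("@")[2].lower()
    # parseaddr handles both "agent@x.com" and "Agent <agent@x.com>".
    from_domain = parseaddr(from_header)[1].rpartition("@")[2].lower()
    if not id_domain or not from_domain:
        return False
    return id_domain == from_domain or id_domain.endswith("." + from_domain)


print(message_id_matches_sender("<abc@localhost>", "agent@yourdomain.com"))
print(message_id_matches_sender("<abc@agents.yourdomain.com>", "Agent <agent@yourdomain.com>"))
```

Running a check like this in a unit test or just before handing the message to the MTA catches the localhost case before a spam filter does.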
Bug 5: The model composes the reply and the threading metadata is lost
Some agent prompts ask the model to "compose a reply" without explicitly preserving the parent message metadata. The model emits clean prose that an SDK then sends as a new message rather than a reply. The fix: keep the threading logic out of the model. The model decides what to say; the SDK decides how to send it. The Dead Simple Email reply() method takes a message_id and a body — the headers are not the model's concern.
What a Correctly Threaded Reply Looks Like (in Code)
Here is what we recommend agents do — same pattern across Python and Node, agent framework agnostic.
import os

from deadsimple import DeadSimple

dse = DeadSimple(api_key=os.getenv("DSE_API_KEY"))

# Get the incoming message, including its Message-ID and References chain.
# inbox_id and message_id are assumed to come from your inbound-message handler.
incoming = dse.messages.get(inbox_id, message_id)

# Reply. Threading headers are set automatically:
#   In-Reply-To: <incoming.message_id>
#   References: <incoming.references> <incoming.message_id>
#   Subject: Re: <incoming.subject> (if not already prefixed)
dse.messages.reply(
    inbox_id=incoming.inbox_id,
    message_id=incoming.id,
    body="Thanks for the question. Pricing details below...",
)
The same pattern in TypeScript:
import { DeadSimple } from "deadsimple-email";

const dse = new DeadSimple({ apiKey: process.env.DSE_API_KEY });

// inboxId and messageId are assumed to come from your inbound-message handler.
const incoming = await dse.messages.get({ inboxId, messageId });

await dse.messages.reply({
  inboxId: incoming.inboxId,
  messageId: incoming.id,
  body: "Thanks for the question. Pricing details below...",
});
The agent never touches a header. The SDK reads the incoming Message-ID and References, builds the reply with the correct chain, prefixes the subject with "Re:" if needed, and hands it to the MTA. Every reply your agent sends is threaded correctly by default.
If you want to inspect or override the threading metadata (for unit tests, audit logs, or unusual cases like cross-thread merges), every Dead Simple Email message exposes the raw messageId, inReplyTo, and references fields. AgentMail provides a similar reply endpoint that handles threading automatically, but does not expose the raw headers on the message object. Debugging cross-client thread display issues against an opaque Thread abstraction is harder than against the protocol itself.
Threading Across Forwards, Quoted Replies, and Re-Subjects
Three real-world edge cases come up in agent inboxes more often than the docs suggest. Here is how each behaves.
Forwarded messages
When someone forwards an agent's message to a third party who replies, the question is whether the agent's original thread should pick up that reply. Most clients (Outlook, Apple Mail, Thunderbird) preserve the original References chain on forwards. Gmail's web client strips References on user-initiated forwards but preserves it on filter-driven auto-forwards. The agent should always check References on incoming replies — if the chain includes a Message-ID the agent sent, treat the new message as part of the original thread regardless of what the recipient address looks like.
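The References check described above can be sketched as follows. The function latest_sent_ancestor and the sent_ids set (the Message-IDs the agent has recorded for its own outbound mail) are hypothetical names for illustration.

```python
from typing import Optional, Set


def latest_sent_ancestor(references_header: str, sent_ids: Set[str]) -> Optional[str]:
    """Return the most recent ID in References that the agent itself sent, if any."""
    # References is ordered root-first, so walk it backwards to find
    # the closest ancestor the agent knows about.
    for message_id in reversed(references_header.split()):
        if message_id in sent_ids:
            return message_id
    return None


sent_ids = {"<reply2@agents.yourdomain.com>"}
incoming_references = (
    "<root@example.com> <reply2@agents.yourdomain.com> <fwd1@thirdparty.example>"
)
print(latest_sent_ancestor(incoming_references, sent_ids))
```

A non-None result means the incoming message descends from something the agent sent, so it can be attached to that thread even when the sender address is unfamiliar.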
Quoted replies in body text
Most clients quote the parent message's body inline ("On Tuesday, you wrote..."). Agents that include the quoted body verbatim in their reply are fine — clients are accustomed to it and threading is unaffected. The only failure mode is when an agent quotes a different message than the one whose Message-ID it referenced in In-Reply-To. Visual quote and threading metadata should agree.
Subject re-writes
Agents often want to summarize a long thread by rewriting the subject ("Re: Re: Re: Pricing" becomes "Pricing decision needed"). Doing this is fine as long as the threading headers are correct — Gmail and Outlook will display the new subject but keep the message in the existing thread. Do not, however, rewrite the subject and drop the threading headers at the same time — that is bug 3 above, and it produces an orphaned new conversation.
Inbox-Level Thread APIs vs Raw Headers
Most agent-email vendors expose a higher-level Thread object that groups related messages so the agent can reason over a conversation without parsing headers itself. Dead Simple Email's thread API has the same shape — list_threads, read_thread, reply_to_thread — and resolves thread membership using the References chain plus subject heuristic, matching the algorithm Gmail uses for IMAP X-GM-THRID. AgentMail's approach is similar; both vendors hide most of the threading complexity behind the abstraction.
Where the abstractions differ is debuggability. When a thread breaks — when an agent's reply does not group with the parent — you need to be able to see the headers to fix it. Dead Simple Email exposes messageId, inReplyTo, and references as first-class fields on every message. AgentMail's Message object exposes thread membership but not the underlying headers; debugging a broken thread means reproducing the SMTP envelope by hand. For a deeper dive on how email-as-memory leverages this header transparency for long-running agents, see Email as Memory: Long-Running AI Agents in 2026.
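When only the raw message source is available, Python's standard email parser is enough to pull out the three headers for debugging. This is a minimal sketch: the function threading_headers and the "consistent" check (In-Reply-To should equal the last References entry) are assumptions for illustration, not part of either vendor's API.

```python
from email import message_from_string


def threading_headers(raw_message: str) -> dict:
    """Extract Message-ID, In-Reply-To, and References from raw RFC 5322 text."""
    msg = message_from_string(raw_message)
    in_reply_to = (msg["In-Reply-To"] or "").strip() or None
    # split() also copes with a References header folded across lines.
    references = (msg["References"] or "").split()
    return {
        "message_id": (msg["Message-ID"] or "").strip() or None,
        "in_reply_to": in_reply_to,
        "references": references,
        # For a well-formed reply, the direct parent is the last ancestor listed.
        "consistent": bool(references) and in_reply_to == references[-1],
    }


raw = (
    "From: Customer <customer@example.com>\n"
    "To: agent@yourdomain.com\n"
    "Subject: Re: Pricing question\n"
    "Message-ID: <reply3@example.com>\n"
    "In-Reply-To: <reply2@agents.yourdomain.com>\n"
    "References: <root@example.com> <reply1@example.com>\n"
    " <reply2@agents.yourdomain.com>\n"
    "\n"
    "Sounds good.\n"
)
print(threading_headers(raw))
```

A False value for "consistent" on a message the agent sent is a quick signal that the reply path dropped or mangled part of the chain.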
Frequently Asked Questions
What headers control email threading?
Three RFC 5322 headers: Message-ID (globally unique per message), In-Reply-To (parent's Message-ID), and References (full ancestor chain). Together they form a tree every modern mail client uses to group replies into conversations.
Why does my AI agent's reply show up as a new conversation in Gmail?
Most commonly a missing or malformed In-Reply-To header. The second-most-common cause is rewriting the subject without setting the threading headers: Gmail's subject-line fallback only works within 7 days, and only when the subject still matches. Dead Simple Email's reply endpoint sets In-Reply-To and References automatically.
What's the difference between In-Reply-To and References?
In-Reply-To holds exactly one Message-ID — the parent. References holds the full ancestor chain back to the root. Clients prefer References for thread reconstruction because it survives gaps.
Does AgentMail handle threading automatically?
Yes, AgentMail's reply API sets headers automatically and groups messages into Thread objects. Dead Simple Email does the same and additionally exposes the raw messageId, inReplyTo, and references on every message for debugging.
Can an agent reply to a forwarded message and stay in the original thread?
Yes, if the forwarding client preserved References (most do — Outlook, Apple Mail, Thunderbird). Gmail strips it on user-driven forwards but preserves it on auto-forwards. As long as one ancestor Message-ID is preserved, threading resolves.
The Bottom Line
Email threading is a forty-five-year-old protocol with three headers' worth of state. AI agents that get the headers right hold real conversations: the recipient sees a normal thread, replies are grouped, and the agent's memory of the conversation matches the recipient's view of it. Agents that get the headers wrong send replies into the void. Most agent-email SDKs (ours included) handle the threading automatically when you use the reply() method. The bugs we see in production come from code paths that bypass the SDK and assemble raw messages without copying the threading metadata forward.
The shorter version: do not let your agent build SMTP envelopes by hand. Use reply(), pass the parent's messageId, let the SDK set the headers. Start free on Dead Simple Email and the threading is one less thing your agent gets wrong in production.