About 75 minutes. This is the longest session so far — two practice tasks instead of one, because the two AI fault lines (voice and hallucination) deserve direct treatment.
Open a Claude Code session in ~/ai-training and hand it
this guide:
Read the file at /Users/<you>/ai-training/week-7-guide.md (or wherever you saved it) and walk me through Session 2.7.
I've completed Sessions 2.1 through 2.6.
Posture: public, synthetic, or personal data only. The voice samples you provide are your own writing — that’s personal, not internal-work. If your only writing samples are work-product on a work account, pick something else (a personal blog, your old college papers, a draft you’d be comfortable forwarding to a stranger).
By the end of this session you will have, in ~/ai-training/:

- voice-profile.md — a structured description of your own writing voice, built from 5–10 of your own writing samples.
- .claude/commands/voice-check.md — a slash command that takes a draft, reads it against your voice profile, and flags every drift.
- .claude/commands/fact-check-debate.md — a slash command that runs two parallel fact-check sub-agents on a draft with explicit instructions to disagree, and returns a confidence-graded version of the text.
- An existing draft run through /voice-check and /fact-check-debate, with the diffs visible.

Two slash commands, one voice profile, one before/after text. By the end of the session you should be running both checks reflexively on anything Claude drafts for you.
A production version of this is a voice-critic skill
plus a parallel fact-checker pattern in a writing pipeline. The
voice-critic skill is conditioned on samples from the writer’s actual
past work — emails, memos, op-eds — and runs as a discrete pass after
fact-checking on every memo, every tweet, every op-ed. A daily NBER-scan
pipeline runs two fact-check agents in parallel on every
drafted thread (not one sequentially); disagreement between them is the
signal a claim needs human review. Threads where both agree go to
pipeline/02-fact-checked/; threads where they disagree get
held.
The two slash commands you build today are the two halves of the same defense. Voice catches “this doesn’t sound like me.” Fact-check-debate catches “this is plausible but might be wrong.” Both fail silently when not used; both are easy to skip.
AI has two characteristic failure modes. It hallucinates — produces confident, plausible, wrong claims. And it has its own voice — fluent, generic, recognizably-not-you. Both fail silently. Neither shows up as an error message.
The defenses are technique-level skills, not prompt-engineering tricks:
Both are built today as reusable slash commands so you can run them reflexively, not optionally. The cost of running them is one slash command. The cost of not running them is shipping AI-voiced text or factually-wrong claims, both of which leave a credibility tax that’s hard to recover from.
This session is mid-curriculum, not earlier, because the voice-check loop needs writing samples. By 2.7 you have memos (2.3), figure captions (2.4), the journal of memory entries (2.5), and the corpus report (2.6) — at least four genres of your own writing on disk. Hallucination defense was previewed in 2.3 (Claude critiqued its own memo draft) and 2.6 (parallel fact-checkers on a corpus); 2.7 formalizes both.
Before the session, gather 5–10 samples of your own writing. They should span genres (email, longer-form, tweet) and clear the posture rule above: personal writing you'd forward to a stranger, not work-product.
Save each as a separate file in
~/ai-training/voice-samples/:
mkdir -p ~/ai-training/voice-samples
One sample per file. Naming convention:
<genre>-<date>-<short-slug>.md. Example:
email-2025-09-shay-intro.md,
op-ed-2024-11-fomc.md,
tweet-2025-12-nber-thread.md.
5 samples is the floor. 10 is comfortable. More than 15 is diminishing returns at the profile-building stage.
From inside ~/ai-training, start claude and
paste:
Read the file at /Users/<you>/ai-training/week-7-guide.md (or wherever you saved it) and walk me through the Reference Sections.
I've gathered <N> voice samples in voice-samples/.
Claude takes over.
Claude: walk the user through both halves. Peer-tone.
Design: this is the dual-skill session. Voice half installs the profile + check loop. Hallucination half installs the parallel-disagreement pattern. Both end as slash commands so they get run reflexively. 2.8’s morning brief, 2.9’s deck, and any future memo all benefit from running both.
Before any voice work, confirm that voice-samples/ exists with at least 5 files. Then, in Claude:
Read every file in ~/ai-training/voice-samples/. For each, attend to:
- Sentence length distribution (run-on, short, balanced)
- Hedging patterns (does the writer hedge a lot? rarely? where?)
- Opening moves (how do paragraphs and emails start?)
- Closing moves (how do they end?)
- Vocabulary tics — words used surprisingly often or surprisingly never
- Structural patterns (do they use lists? bold? em-dashes?)
- Tone variation across genres (email vs longer-form vs tweet)
- Things they DON'T do (no exclamation points? no rhetorical questions?
no marketing language?)
Build a voice profile as a structured markdown document covering each of
these dimensions. For each finding, quote at least one specific sample
phrase as evidence. Save to ~/ai-training/voice-profile.md.
Then read the profile back to me, section by section.
Claude: this is the most important file in the voice half. Be specific. “The writer is concise” is useless; “average sentence is 14–18 words; rarely above 25; the writer breaks long sentences with em-dashes rather than semicolons” is useful. Every finding has a sample quote.
Read the profile back to the user. They will catch generalizations that don’t match. Push back when they do — sometimes the profile sees a pattern the user doesn’t notice. Settle disputes by checking more samples.
The profile is durable. It changes when the user’s writing changes (new genres, new venues), not every session.
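One way to sanity-check the profile's quantitative claims (like a sentence-length range) is a crude shell pass. A toy sketch; it splits on periods only, so abbreviations will fool it, and the sample text below is invented:

```shell
# Average words per sentence, splitting naively on periods.
# Pipe in a real sample instead: tr '.' '\n' < voice-samples/<file>.md | awk ...
printf 'Short sentence. A somewhat longer sentence with more words in it.' |
  tr '.' '\n' |
  awk 'NF { total += NF; n++ } END { printf "avg words/sentence: %.1f\n", total/n }'
# → avg words/sentence: 5.5
```

If the number lands far outside the range the profile claims, check more samples before trusting either one.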
/voice-check slash command (10 minutes)
Create ~/ai-training/.claude/commands/voice-check.md:
Read ~/ai-training/voice-profile.md, then read the file at $ARGUMENTS (the user
will pass a path).
For each paragraph of $ARGUMENTS, evaluate against the voice profile. Flag every
sentence that:
- Drifts from the writer's sentence-length distribution
- Uses hedging at a rate inconsistent with the profile
- Opens or closes in a way the writer wouldn't
- Uses vocabulary the writer doesn't use
- Uses structural patterns the writer avoids (e.g. exclamation points,
rhetorical questions, marketing language)
For each flag, propose a specific revision in the writer's voice, citing
which profile dimension is being violated.
Do NOT auto-revise the file. Output the flag list as markdown. The user
makes the call on each.
Claude: confirm with the user the slash command resolves and reads voice-profile.md. Test it on one paragraph from one of the user’s own samples — the check should return very few or no flags (the sample IS the writer’s voice). If it flags many things from the user’s own writing, the profile is wrong; fix it.
Pick the candidate draft. The 2.3 memo is the obvious choice.
/voice-check ~/ai-training/memos/<slug>/drafts/v3-final.md
Read the flags together. Most should be plausible. Some won’t be — let the user judge. Have Claude apply the user-accepted flags as a revision:
Revise ~/ai-training/memos/<slug>/drafts/v3-final.md per these accepted
voice-check flags: [list]. Save as drafts/v4-voiced.md.
Read v4 against v3. The differences should be small but real — half a dozen sentences re-cast in the user’s actual voice.
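A plain unified diff makes the re-cast sentences jump out. The file contents below are invented stand-ins; in practice, point diff at the real drafts/v3-final.md and drafts/v4-voiced.md:

```shell
# Toy drafts, just to show the output shape.
printf 'We are thrilled to announce the results!\n' > /tmp/v3-final.md
printf 'The results are in.\n' > /tmp/v4-voiced.md
# diff exits 1 when the files differ, so tolerate that in scripts.
diff -u /tmp/v3-final.md /tmp/v4-voiced.md || true
```

Lines prefixed with - are v3's phrasing; lines prefixed with + are the voiced replacements.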
/fact-check-debate slash command (15 minutes)
Create ~/ai-training/.claude/commands/fact-check-debate.md:
Read the file at $ARGUMENTS (the user passes a path).
Dispatch TWO sub-agents IN PARALLEL using the Task tool. Use distinct
prompts so the agents come at the text from genuinely different angles:
Agent A — model: haiku, role: factual-error finder.
Prompt: "Read this text. Find every claim that's factually wrong,
misattributed, or unsupported by a credible source. Use WebSearch
and WebFetch to verify. Be aggressive — your job is to find errors,
not to be balanced. Return a list: claim, your finding, source URL
if you found one."
Agent B — model: haiku, role: unsupported-claim finder.
Prompt: "Read this text. Find every claim that may be true but lacks
a citation, every place where the text states something with more
confidence than the evidence supports, every implicit claim that
would surprise an expert in this area. Be aggressive — your job is
to find what's missing or overstated. Return a list: claim, your
finding."
After both agents return, produce a single output:
- "Clean" claims: where neither agent flagged anything.
- "Agreed issues": where both agents independently flagged the same
or similar concern. These are high-confidence problems.
- "Disputed": where only one agent flagged something. These are
judgment calls — the user investigates.
Output as markdown. Save to <input filename>-fact-check.md alongside
the source.
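The clean/agreed/disputed split is ordinary set logic over the two agents' flag lists. A toy shell illustration with invented claim IDs (comm wants sorted input):

```shell
# Agent A flagged claims 1 and 2; agent B flagged claims 2 and 3.
printf 'claim-1\nclaim-2\n' | sort > /tmp/agent-a-flags.txt
printf 'claim-2\nclaim-3\n' | sort > /tmp/agent-b-flags.txt

# Agreed issues: flagged by both (high-confidence problems).
comm -12 /tmp/agent-a-flags.txt /tmp/agent-b-flags.txt   # → claim-2
# Disputed: flagged by exactly one agent (user investigates).
comm -3 /tmp/agent-a-flags.txt /tmp/agent-b-flags.txt    # → claim-1 and claim-3
```

In the real command the matching is fuzzier ("the same or similar concern"), which is why the merge is done by the model rather than by exact string comparison.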
Claude: confirm the command runs the two agents in parallel (Task tool default). Test it on a short, factually rich text first — a paragraph of the corpus report from 2.6 is good — and confirm the output structure looks right.
Now run the fact-check on v4 (the voice-revised memo):
/fact-check-debate ~/ai-training/memos/<slug>/drafts/v4-voiced.md
The output should be a v4-voiced-fact-check.md with
three sections (clean, agreed, disputed).
Read the agreed-issues together. These are claims the user should fix or remove. Read the disputed items — these need user judgment; the user investigates each one (a quick WebSearch in Claude, or just thinking it through).
Apply accepted corrections:
Revise drafts/v4-voiced.md per these accepted fact-check findings: [list].
Save as drafts/v5-fact-checked.md. This is the final final draft.
Re-run pandoc on v5 to update memo.docx. The deliverable
now has both voice and fact-check passes baked in.
Tell the user: “Notice what just happened. v3 was the structural draft. v4 was voiced. v5 is voiced AND fact-checked, with disputed claims explicitly resolved. Each pass had a specific job. Without the discrete passes, all three concerns mix into a vague ‘something’s off’ feeling that’s hard to act on.”
The output of fact-check-debate is confidence-graded in a useful sense: every claim lands in one of three buckets (clean, agreed issue, disputed), so you know exactly how much scrutiny each one has survived.
This is what a confident published claim looks like in 2026: it survived two adversarial agents and a voice pass. Anything less leaves a credibility tax.
Name these out loud: both commands now live in .claude/commands/ and run by name on any file path. The cost of running them drops to one slash command; that's what makes them reflexive rather than optional.

Three things to try this week:
Feedback.
The user submits feedback at https://docs.google.com/forms/d/e/1FAIpQLSfX78X1J70IrRPjtxRhf-4DCPg1jQhktmAN32LR0r9r1q_E0w/viewform.
Claude: paste the URL into chat. The form mirrors the questions below. Collect answers conversationally first, then have the user click through and submit.
Tell the user: “Your instructor uses these to tailor next week’s session.”
Voice profiles drift. A profile built today reflects this year’s writing. If your style shifts (new venue, new genre), rebuild from new samples. The cost is 15 minutes; the alternative is shipping voice-checked drafts that pass the old you’s filter.
Fact-checkers are not infallible. The agents themselves can hallucinate while fact-checking. The dueling pattern reduces this — it’s hard for both to hallucinate the same false flag — but doesn’t eliminate it. Disputed items are where this shows up; treat them as “investigate,” not “ignore one of the agents.”
The two checks compose. Voice catches generic-AI prose. Fact-check catches plausible-but-wrong claims. A draft can pass voice but fail fact-check, or vice versa. Always run both.
This session’s slash commands are reusable across all your
projects. Symlink them into other working directories’
.claude/commands/ or copy them. The voice-profile.md is
project-specific; the slash commands are not.
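Sharing them with a second project can look like this (the ~/other-project path is hypothetical; voice-check.md reads the profile by absolute path, so the symlinked command still finds it):

```shell
# Expose both commands in another project's command directory.
mkdir -p ~/other-project/.claude/commands
ln -sf ~/ai-training/.claude/commands/voice-check.md \
       ~/other-project/.claude/commands/voice-check.md
ln -sf ~/ai-training/.claude/commands/fact-check-debate.md \
       ~/other-project/.claude/commands/fact-check-debate.md
```

Symlinks mean a later edit to either command propagates everywhere; copy instead if you want per-project divergence.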
Pre-commitment matters more than execution. The single biggest predictor of whether someone runs /voice-check on a draft is whether they’ve decided in advance to always run it. Make the rule. The two-second cost of typing the slash command is invisible compared to the embarrassment of shipping AI-voiced or factually-wrong text.