When the Blog Generator Published Its Own Editing Notes
On March 8th, I checked my blog and found this as the opening of the March 5th post:
Here's the polished blog post. The key changes I made:
- Added concrete opening with the "polish pass leaked" problem
- Restructured to lead with the bizarre failure mode
- Expanded the recursive failure explanation
- Added technical depth on subprocess validation
That wasn't a blog post. That was Claude's editorial notes explaining what changes it had made. Four consecutive posts (March 3-6) had published this way: not as finished articles, but as the generator's own commentary about what it was trying to write. The automated pipeline that produces daily blog posts had started publishing its development notes instead of the actual posts.
What the Reader Actually Saw
The March 4th post opened with a numbered list:
Here's the polished blog post. The key changes I made:
- Removed the meta-commentary about "removing meta-commentary"
- Started with a concrete hook about the actual problem
- Restructured to follow: problem → investigation → root cause → fix
- Added specifics about the error message and file paths
Then it ended. No actual blog post, just editorial decisions that were supposed to happen behind the scenes.
The March 5th post was even more surreal. Claude, given the two previous broken posts as input for its polish pass, diagnosed the bug mid-generation: "Both of these posts contain only editorial meta-commentary about revisions that should have been made, without any actual blog post content." That diagnosis, complete with the bolded warning, became the published post. The pipeline that writes about development had published its own bug report.
The Root Cause
The AutoBlog pipeline uses a four-pass architecture: draft → review → revise → polish. Each pass calls claude --print -p via subprocess and captures stdout. The polish pass was trying to write the finished post directly to _posts/, which the Claude Code sandbox denied. Instead of returning blog post markdown, Claude returned what it would tell a human operator:
I've polished the post and improved the structure. Here's what I changed:
[editorial notes]
Could you grant write permission to save this to
_posts/2026-03-05-daily-development-log.md?
The pipeline's subprocess handler does this:
result = subprocess.run(["claude", "--print", "-p", prompt], capture_output=True)
if result.returncode == 0:
    # Exit code 0 is the only check; whatever text arrived is returned as-is.
    return result.stdout.decode('utf-8')
When the file write succeeds, stdout contains the blog post. When the write is denied, stdout contains a conversational response about the blog post. The pipeline checks returncode == 0 and accepts whatever text arrives. There's no validation that the output is the deliverable versus a message about the deliverable.
This isn't hallucination, refusal, or prompt injection. Claude did exactly what it was asked. It polished a blog post, then, unable to save the file, explained what it had done and requested permission. That's correct behavior in a conversational context. It's a pipeline-breaking failure in an automated one.
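For concreteness, here's a sketch of how the four passes chain through that handler. The helper name and prompt wording are illustrative, not the pipeline's actual code:

import subprocess

def run_claude(prompt):
    # Same pattern as the handler above: trust exit code 0, return stdout.
    result = subprocess.run(["claude", "--print", "-p", prompt], capture_output=True)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.decode("utf-8"))
    return result.stdout.decode("utf-8")

def generate_post(context):
    draft = run_claude(f"Draft a blog post from these notes:\n{context}")
    review = run_claude(f"Review this draft and list problems:\n{draft}")
    revised = run_claude(f"Revise the draft to address this review:\n{review}\n\n{draft}")
    # The polish pass is where the leak happened: its stdout was trusted
    # as the finished post regardless of what the text actually was.
    return run_claude(f"Polish this post for publication:\n{revised}")

Each hop passes raw text forward, so any pass that returns commentary instead of content poisons everything downstream.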
What LLMs Don't Give You
A compiler either produces a binary or emits errors on stderr. An API endpoint returns JSON with a defined schema. An LLM produces text, and that text might be:
- The deliverable you asked for
- A description of the deliverable
- An explanation of why the deliverable couldnât be produced
- A diagnosis of the system state that prevented delivery
All arrive on stdout. All pass a returncode == 0 check. All are valid, coherent, well-structured text. The model has no structural way to signal "this is the thing" versus "this is commentary about the thing."
The model is doing its job. The pipeline's job is to validate that what comes back is what it needs.
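One way to make that distinction checkable is to require a sentinel-delimited deliverable and reject anything outside it. A minimal sketch, assuming the prompt instructs Claude to wrap the post in BEGIN_POST/END_POST markers (my convention, not something the pipeline uses):

import re

POST_RE = re.compile(r"BEGIN_POST\n(.*?)\nEND_POST", re.DOTALL)

def extract_deliverable(stdout_text):
    match = POST_RE.search(stdout_text)
    if match is None:
        # Permission requests, status updates, and editorial notes
        # all arrive without the markers and get rejected here.
        raise ValueError("No delimited post found in model output")
    return match.group(1)

Markers aren't foolproof (the model can still wrap the wrong thing), but they turn "is this the deliverable?" into a mechanical check.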
The Fix That Would Have Caught This Immediately
The pipeline already had retry logic, but the retries were silent. When a pass failed, it would retry, fail again, and the caller would receive empty output or fall back to the previous pass. The real problem wasn't missing retries; it was missing observability.
Adding stderr and stdout preview logging would have made this obvious:
[2026-03-05 06:23:41] Polish pass attempt 1 failed (exit 0)
[2026-03-05 06:23:41] stdout preview: "Here's the polished blog post. The key changes I made:\n\n1. Added concrete opening with the \"polish pass leaked\" problem\n2. Restructured to lead with the bizarre..."
[2026-03-05 06:23:51] Polish pass attempt 2 failed (exit 0)
[2026-03-05 06:23:51] stdout preview: "Here's the polished blog post. The key changes I made:..."
When Claude returns "Here's the polished blog post" instead of a markdown heading, a preview makes that obvious. The retries were there. The missing piece was knowing what was being retried.
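A sketch of that retry loop with previews logged, assuming a run_claude helper like the one above and a crude post-shaped check like the ones later in this post:

import logging

logger = logging.getLogger("autoblog")

def run_pass_with_retries(prompt, attempts=3):
    for attempt in range(1, attempts + 1):
        output = run_claude(prompt)
        if output.startswith("#"):  # crude post-shaped check
            return output
        # The line that was missing: log what came back, not just that it failed.
        logger.warning("Polish pass attempt %d failed (exit 0); stdout preview: %r",
                       attempt, output[:160])
    raise RuntimeError("All polish attempts returned non-post output")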
I also added draft-mode routing: posts from certain projects now save to _drafts/ and skip git push entirely. This doesn't prevent the output-leaking bug, but it prevents broken posts from going live. If the pipeline produces garbage, the garbage stays local.
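The routing itself is simple; a sketch with hypothetical project names:

from pathlib import Path

DRAFT_ONLY_PROJECTS = {"autoblog-pipeline"}  # hypothetical project key

def output_path(project, slug):
    # Unproven pipelines write to _drafts/, which the site generator ignores.
    folder = "_drafts" if project in DRAFT_ONLY_PROJECTS else "_posts"
    return Path(folder) / f"{slug}.md"

def should_push(project):
    # Draft-only projects never reach git push; garbage stays local.
    return project not in DRAFT_ONLY_PROJECTS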
Additional fixes included front matter deduplication (handling cases where Claude emits its own YAML block that gets wrapped inside the pipeline's block) and a 30-minute SIGALRM timeout to prevent the pipeline from hanging indefinitely. As the kill switch post argued: build for the actual deployment environment, not the hypothetical one.
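The timeout is a few lines with signal; a sketch, assuming the pipeline runs in the main thread (SIGALRM only fires there):

import signal

def on_timeout(signum, frame):
    raise TimeoutError("Pipeline exceeded 30-minute budget")

signal.signal(signal.SIGALRM, on_timeout)
signal.alarm(30 * 60)  # arm the 30-minute alarm
try:
    post = generate_post(context)  # the full four-pass run
finally:
    signal.alarm(0)  # disarm so a later hang isn't mis-attributed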
The Recursive Failure
The March 5th post deserves special attention. By that point, the pipeline was ingesting the broken March 3rd and 4th posts as "recent context" for what blog posts should look like. Claude's polish pass received two examples that were pure meta-commentary, recognized the pattern, and wrote:
Critical Issue Identified
Both of these posts contain only editorial meta-commentary about revisions that should have been made, without any actual blog post content. This appears to be a systematic failure in the generation pipeline where the "polish pass" instructions are being published instead of the polished output.
That analysis, completely accurate, became the published post. The generator had diagnosed its own bug, and the diagnosis leaked through the same broken pipe that caused the bug in the first place. It's the same recursive failure as the February 14 post, but this time the pipeline didn't just write about itself; it wrote as itself, publishing internal troubleshooting notes that should have stayed in stdout logs.
The Broader Pattern
Any automated pipeline that treats LLM output as a structured deliverable needs validation that the output is the thing and not a message about the thing. Consider a test generation system that's supposed to output pytest fixtures. When the file write fails, you might get:
I would create a fixture that mocks the database connection like this:
@pytest.fixture
def mock_db():
    return MagicMock()

However, I don't have write access to tests/conftest.py. Could you either grant permission or let me know where you'd like this fixture added?
That's valid, helpful text. It's also not a pytest fixture. If your pipeline does output = run_generator() and then writes output to the test file, you've just committed conversational placeholder text to your test suite.
The validator needs to check: does this start with @pytest.fixture? Does it define a function? Is there actual code, or just a description of code?
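Python's ast module makes those checks cheap. A sketch (the function names are mine):

import ast

def is_fixture_decorator(dec):
    # Matches both @pytest.fixture and @pytest.fixture(...)
    if isinstance(dec, ast.Call):
        dec = dec.func
    return isinstance(dec, ast.Attribute) and dec.attr == "fixture"

def validate_fixture_output(output):
    try:
        tree = ast.parse(output)  # prose mixed with code fails here
    except SyntaxError:
        raise ValueError("Output is not parseable Python")
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    if not any(is_fixture_decorator(d) for f in funcs for d in f.decorator_list):
        raise ValueError("No @pytest.fixture-decorated function found")
    return output

The conversational reply above fails at ast.parse, because "However, I don't have write access" is not Python.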
The same applies to documentation pipelines that extract API specs, code review tools that output diffs, or any system where the LLM is supposed to produce a structured artifact. Without validation for expected structure, required sections, and format markers, the pipeline will happily publish meta-commentary.
What the Pipeline Checks Now
The validation that would have prevented this:
# After capturing stdout, before returning it as a blog post
if output.startswith("Here's the polished blog post"):
    raise ValueError("Generator returned meta-commentary instead of post content")
if not output.startswith("#"):
    raise ValueError("Output doesn't start with markdown heading")
if "I would" in output[:200] or "Could you" in output[:200]:
    raise ValueError("Output contains conversational phrases suggesting failed write")
These are crude heuristics, but they catch the actual failure mode. A more robust approach would validate that the output contains expected sections (title, body paragraphs, technical content) and doesn't contain the tell-tale phrases of a permission request or status update.
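One way to structure that stronger check, with the tell-tale phrases and thresholds as illustrative choices rather than the pipeline's exact list:

CONVERSATIONAL_TELLS = ("Here's the polished", "I would", "Could you",
                        "I don't have write access")

def validate_post(output):
    head = output[:200]
    if any(tell in head for tell in CONVERSATIONAL_TELLS):
        raise ValueError(f"Conversational opening: {head[:60]!r}")
    lines = output.splitlines()
    if not lines or not lines[0].startswith("#"):
        raise ValueError("Post must open with a markdown heading")
    if sum(1 for l in lines[1:] if l.strip()) < 5:
        raise ValueError("Body too short to be a real post")
    return output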
The Quiet Failure Mode
I found the broken posts on March 8th during a routine check of the site. Nobody emailed me. Analytics showed normal traffic, which means readers probably skimmed the editorial notes, found them incomprehensible, and moved on. The posts were live for 2-5 days before I caught them.
That's the quiet failure mode of content pipelines: broken output often just looks like "weird content" rather than triggering an obvious error state. The broken posts from March 3-6 are still live. They're artifacts of the failure mode, and leaving them up is more honest than deleting them.
The Takeaway
If you're building with LLMs in production: log what comes back, validate the output category, and assume the helpful conversational assistant will sometimes return helpful conversation instead of the structured data you need. Because from the model's perspective, both are equally valid responses to your prompt. Your job is to tell the difference before it goes live.