Why our AI ghostwriter experiment was a waste of energy

Writing a book in 24 hours

Aug 07, 2025

Holidays are over, and we are preparing the Futurebraining Workbook, a field guide for building a human-centric system for working with AI.

The ambition is practical: simple frameworks, lived examples, and exercises that push you to apply the ideas that matter right away, not someday.

The three of us (Martin Jensen Methlie, Luciano Pollastri, and I) are not starting from scratch as founders; instead, this is more of a joint exercise in synthesis. The project combines our blog posts, interviews, conferences, webinars, and workshop experiences into a comprehensive resource that you can pick up and read on/offline. All in service of one thing: moving from simple automation to genuine AI co‑intelligence, what we call “ME^AI co-intelligence”.

The experiment

As we wrestled through the writing process, we decided to test a widely circulating claim: that AI can write a book in a few hours, maybe a day. We fed our raw content materials (including a proposed index and generous instructions) into several frontier models and pressed Enter. Then we stepped back, poured a cup of tea, and waited for a first editable draft to appear. Could we save weeks of effort and get the heavy lifting done in a single step?

What Came Back

To our disappointment or joy, depending on our AI mood of the day, the "manuscripts" read fluently, in a now-familiar, confident, and sloppily robotic style. It all went downhill from there: whole chapters drifted off-topic or disappeared, references evaporated, and carefully written articles were reduced to an endless shopping list of bullets.

As quickly as the AI had mixed our material into an undrinkable summer cocktail, we lost our trust in it as a serious ghostwriter. Reviewing a 30,000-word manuscript for structural coherence, factual integrity, and tone alignment is not just tedious; it’s unworkable. It would have taken longer to check and correct than to write a clean draft ourselves from the ground up.

What did not work

No Integration – Despite being given a clear goal and a proposed index, the tools failed to synthesize the many layers, topics, examples, and stories into a coherent whole. The output had the usual unrealistic confidence and admiration, but no cohesion: well-phrased fragments without integration.
Missing Backbone – Even with the suggested (cheat) index and sample flow, the output completely lacked a reliable internal architecture. Sections followed each other, but without logic. Building a hierarchy of ideas and a narrative arc is hard work for human intelligence, let alone for the synthetic kind.
Context Loss – Crucial decisions and references blurred or disappeared. Key through-lines broke down, undermining both logic and trust. While limited context windows are a known constraint of current models and not something to blame them for, this failure went beyond that. The models failed to prioritize, sustain arguments, or preserve what gave the text direction and weight.
Voice Drift – Despite the time we spent training the models on our style, far too much of our input text came back as motivational one-liners. Our tone, meant to be grounded and instructive, was replaced by AI language.
Version Gaps – Fixing one chapter left the earlier and later chapters untouched. There was no memory, no ripple effect, no cumulative intelligence.

Each of these issues multiplied under the weight of complexity. For simple topics, the output might have been OK. But once we layered in arguments, iterative structure, and the messiness of real collaborative writing, the cracks widened. And we hadn’t even started our complete review rounds yet. Imagine the sarcastic feedback still to come, the layered suggestions, and the intense back-and-forth between the three of us in three colors of comments.

Just one dry run, and the system was struggling to keep up, so we aborted the experiment and went back to the third review round of a book we were starting to dislike, with hate just around the corner.

Where AI works very well

Of course, we use AI in our writing process; it's part of the workflow. But with oversight dialed up to maximum, we use it deliberately and selectively, especially after the experiment. We always welcome the extra intelligence, but so far, the tools we tested aren’t ready for that level of integrated, collaborative work. What we have instead is a set of practices that work for us, including hyper oversight.

Stay Small, Stay Sharp

We found that AI works best when asked to operate at the sentence or paragraph level. At most, we trust it with a single chapter focused on one clear topic or concept with an obvious internal hierarchy. Even then, we stay alert. Anything broader, with layered argumentation or cross-referencing, and the cracks start to show. AI excels at highlighting redundancies, eliminating unnecessary filler, and suggesting more precise wording. AI can offer excellent metaphors or analogies that sharpen a point. Used this way, it becomes a reliable line editor or writing coach, sound, but not overpowering or creatively stunning.

Focused Research

AI is genuinely helpful in short, directed research bursts. When pointed at a few academic sources and asked for comparative insights, it can surface patterns, contradictions, and useful references almost instantly. Tools like ChatGPT and Gemini (Google LM) have been especially valuable for this kind of focused investigation. They help clarify what we’re talking about, challenge our blind spots, and surface assumptions we may not realize we're making. Even then, we never take results at face value, but instead use them as the basis for analysis.

Context Anchors

Because memory and context windows in most models can be fragile, we set boundaries. Exporting sections of ready chapters from Google Docs (or MS Word) into PDFs and then uploading them into AI when working on new chapters can help anchor the model and reduce hallucinations. It’s not foolproof, but it gives the model a tighter box to work inside.

Forget convenient automation

It's sooooo tempting to automate parts of the writing process. We found that creating GPTs or agents that check for style or voice can be effective, but with content this large and complex they pose the same danger as a regular chat. You might win back time, but that convenience comes at a cost. A slight inconsistency or a disappeared section is easy to spot in an article. In a 200-page draft, especially one you've already reviewed several times, it's like searching for a needle in a haystack.
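A plain, deterministic check will not judge style or logic, but it can at least flag a section that vanished between versions. The sketch below is only an illustration, not a tool described in this post: it assumes the draft is available as plain text with each heading on its own line, and the name `missing_sections` and the sample headings are hypothetical.

```python
def missing_sections(expected_headings, draft_text):
    """Return the expected headings that do not appear as a line in the draft."""
    # Normalize every draft line: strip surrounding whitespace and lowercase it.
    draft_lines = {line.strip().lower() for line in draft_text.splitlines()}
    # Keep the headings whose normalized form matches no line in the draft.
    return [h for h in expected_headings if h.strip().lower() not in draft_lines]


if __name__ == "__main__":
    # Hypothetical index and draft, for illustration only.
    index = ["The experiment", "What did not work", "The friction loop"]
    draft = "The experiment\nSome text.\nThe friction loop\nMore text.\n"
    print(missing_sections(index, draft))  # ['What did not work']
```

A check like this only confirms that a heading is still present. Whether the text under it still says what it should remains a job for a human reader.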

The friction loop

For smaller chapters or sections, we deliberately built in what we call a "friction loop", four steps that keep us on the AI ball:

Pause – Don’t accept any suggestion on autopilot.
Probe – Ask the model why it made that change or where the claim came from. Be specific in your prompt: identify exactly which sentence or paragraph you're referring to.
Fix – Clearly state what must remain unchanged, and what (if anything) is open to suggestion. You can ask for alternatives or critiques, but do not approve changes until you've reviewed and accepted them yourself.
Proof – ALWAYS cross-reference suggestions with sources or intent.

This friction is deliberate. The more we stay alert—verifying, questioning, adjusting—the more potent the final output becomes. This isn’t about slowing down for its own sake. It’s about staying in control of the signal.
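For readers who like a checklist, the loop can be sketched as a small data structure in which a suggestion only becomes acceptable once every step has been ticked off by a person. This is a minimal illustration under our own assumptions, not a tool from the workflow above: the class `Suggestion`, its methods, and the lowercase step names are hypothetical.

```python
from dataclasses import dataclass, field

# The friction-loop steps named in the list above (lowercased as identifiers).
STEPS = ("pause", "probe", "fix", "proof")


@dataclass
class Suggestion:
    """One AI suggestion plus the friction-loop steps a human has completed."""
    text: str
    done: set = field(default_factory=set)

    def complete(self, step):
        # Reject anything that is not one of the known steps.
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        self.done.add(step)

    def can_accept(self):
        # Acceptable only when every step has been completed.
        return all(step in self.done for step in STEPS)


if __name__ == "__main__":
    s = Suggestion("Tighten the opening sentence")
    s.complete("pause")
    s.complete("probe")
    print(s.can_accept())  # False
    s.complete("fix")
    s.complete("proof")
    print(s.can_accept())  # True
```

The sketch does not enforce the order of the steps; it only refuses acceptance while any of them is missing, which is the part of the loop that matters most.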

Conclusion

Off-the-shelf AI tools still struggle with complex, long-form co-authoring, at least in our hands.

Writing a book is not just filling pages. It’s about holding shape, returning to hard ideas, navigating contradictions, and building something coherent across time. That’s human work—collaborative, effortful, and unpredictable. Depending on your personality, it is an “embrace the suck” activity.

When we gave AI the lead, structure evaporated, voice drifted, and logic broke down. Unsurprisingly, the illusion of fluency was strong, but depth and discipline were missing. That may sound harsh, but it's not a complaint; it's a boundary we must learn to work within.

The human premium

AI can produce impressively coherent content, but as AI-generated content continues to flood media platforms, inboxes, and search results, that fluent average is quickly becoming the new baseline. Readers are getting better at spotting text that feels synthetic and soulless. In a way, we’re back at the beginning, where standing out once again requires intentionality and human presence.

That’s where the human premium lives: the felt sense that a real person is behind the work. We pay more for handmade items because we like the idea that human hands were part of the production process. We imagine the seasoned watchmaker, couturier, or writer in the flow, carefully crafting something with intention and attention. That emotional connection matters.

Could we have missed a tool, a more brilliant prompting technique, or a better way to load our context? Absolutely. If you know something we don’t, we’d love to hear it. But until then, we’ll keep the responsibility.
