Vibe Coding in 2025

As of April 2025, you can successfully write small applications without looking at your code using AI code editors like Cursor or Windsurf. This is “vibe coding.” However, both you and your AI will quickly get bogged down in a confusing, credit-burning hot mess unless you level up your workflow.

In this post, I’d like to share how I sustainably write 1000+ lines of tested code per day without consuming too many credits. I treat my AI copilot like a gifted senior engineer who doesn’t care about code quality — I don’t trust it, but I know how to maximise its capabilities. The secret is Test Driven Development (TDD).

I use Windsurf because, as of December 2024, I found it handles large codebases better than Cursor.

My Workflow

  1. Reset Context — Close tabs and start a new feature branch. This keeps the AI focused and makes it easier to review and reset when necessary.

  2. Describe the Feature — Explain the next feature in plain English. It’s okay to ramble — the AI excels at extracting clarity from semantic chaos. I use the dictation feature on my Mac but don’t hit enter yet.

  3. Plan Before Coding — Ask the AI to outline a step-by-step plan without writing any code. Have it analyse your codebase, create a test plan, and identify files to update. Converse until you have a detailed, correct plan. This is your first opportunity to enforce code standards. Save this plan separately, as you might need to reset context later.

  4. Lock in the Plan with Tests — Direct the AI to write tests without implementing any features. It doesn’t matter if the tests are initially rough — we want a comprehensive overview. This ensures thoroughness upfront and prevents shortcuts when fatigue sets in later.

  5. Work One Test at a Time — Keep tasks small. Prevent the AI from trying to solve everything at once — the more it attempts simultaneously, the harder it is to review. Less is definitely more.

  6. Review Everything Meticulously — Discipline is crucial here. Passing tests aren’t automatically “good.” Review every change and take notes before attempting corrections, then tackle each problem in priority order: for instance, “DRY that out” or “Test X doesn’t test anything.”

  7. Rewind Quickly — If the AI goes off track, it’s often faster to retry with clearer instructions than to let it attempt messy fixes. Use your IDE’s rollback feature or git reset. For minor issues, quickly fix them yourself.

  8. Limit AI Retries — Give the AI two or three attempts at most. Continually hitting retry burns credits, wastes time, and compromises the original solution as it tries random hacks. Roll up your sleeves and fix the small details yourself. This is especially necessary with UI-driven tools like Playwright, whose browser state the AI can’t visualise.

  9. Commit Regularly and Update Your Plan — Mark completed tasks as “DONE” in your saved plan from step 3. Your amended plan helps with step 10.

  10. Consider Resetting Context — Long AI chat sessions tend toward degraded performance as context piles up. Starting fresh with your updated plan from step 9 improves performance.

  11. Rinse and Repeat — Continue until the feature is complete, manually testing and adding more tests as needed.
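To make steps 4 and 5 concrete, here is a minimal sketch with pytest-style tests. The `slugify` feature and its behaviour are hypothetical stand-ins for whatever you’re building — the point is that the AI drafts tests like these up front, and you then implement just enough to pass one at a time:

```python
import re


# Step 5: implemented only after the tests below were written and reviewed,
# one test at a time, with a review and commit between each.
def slugify(text: str) -> str:
    """Turn arbitrary text into a lowercase, hyphen-separated slug."""
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())  # non-alphanumerics -> hyphen
    return text.strip("-")


# Step 4: these existed (and failed) before slugify() did.
def test_slugify_basic():
    assert slugify("Hello World") == "hello-world"


def test_slugify_strips_punctuation():
    assert slugify("Vibe Coding, in 2025!") == "vibe-coding-in-2025"


def test_slugify_collapses_separators():
    assert slugify("a  --  b") == "a-b"
```

Run with `pytest`: before the implementation exists, the suite fails wholesale, which is exactly the comprehensive overview step 4 asks for.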

Warnings

  • This approach requires strong personal discipline. It’s easy to commit functioning but poorly structured code. Even as I write this, I keep finding surprises in my code because at some point I was lazy.
  • I write primarily in Python, which LLMs handle well. Results may vary depending on your programming language or framework.
  • AI amplifies your starting point. Good code improves; bad code deteriorates quickly. LLMs function like sophisticated autocomplete engines. For instance, as a Rails developer working in Django, I’ve never learned conventional Django — my AI copilot follows my home-brew style without suggesting standard practices.
  • Peer-reviewing 1000–3000 lines of code daily per developer remains an unsolved challenge.
  • AI technologies and IDEs evolve rapidly, so adapt as necessary.

Conclusion

I see test-driven vibe coding as a significant level-up, allowing developers to operate at higher abstraction levels with enhanced thoroughness. To use a construction metaphor, developers spend more time as architects, structural engineers, and surveyors, while AI acts as the tradesperson executing the tasks.