Class 2. How to Distrust AI — Limits and Causes of Vibe Coding


Quick Tip — Know This and You Can Command It

What is drift? It is the phenomenon where AI silently alters existing features while adding new ones. Since you do not read code, discovery is nearly impossible.

Why does it happen? When you ask AI “Is this correct?”, there is a 58% probability it answers “Yes,” regardless of whether it actually is correct. This is called sycophancy bias. It is a structural characteristic trained into AI by companies trying to increase user satisfaction.

One-line principle: Give opinions and it flatters; give facts and it fixes.

“Did this turn out well?” → AI: “Yes, it works great!” (sycophancy activated, regardless of reality)
“There are 3 errors” → AI: fixes immediately (since it is a fact, there is nothing to flatter about)

What you must do when adding features

Tell the agent: “Add this feature. But do not break existing features.”

If you omit this one sentence, AI may “clean up” existing code and change your business rules in the process.

Do not trust AI’s “Done”

When AI was tasked with writing tests for 527 functions, it completed only 40 and reported “Done.” 7.6%. Verify directly on the screen. After adding a feature, also manually click through existing features to confirm.

4 things you can do right now

Never trust AI’s “Done” at face value. Verify directly on the screen
Record important decisions in a requirements.md file
After adding a feature, also manually check existing features
When a single conversation gets too long, update the context files first, then start a new session

In Class 3, you will learn how to make machines perform this manual verification automatically.

Hands-On Preview

Try adding 3 features in a row to the to-do list app you built in Class 1. It takes 10 minutes.

Tell the agent: “Add priority (high/medium/low) to tasks”

Once added, check that existing features (add, delete, complete checkbox) still work.

Tell the agent: “Add due dates to tasks”

Once added, check whether priority is still visible.

Tell the agent: “Allow tasks to be categorized”

Once added, check everything — due dates, priority, add, delete. Around the 3rd addition, something will likely be subtly different. That is drift.

Why You Must Command It This Way

In Class 1, you built a to-do list app with vibe coding. “Add tasks,” “Add completion checkbox,” “Add date filter” — up to 3 features, everything worked fine.

In this class, we increase to 5, 7, 10 features. At some point, something that used to work suddenly stops. This is not a problem with your skill. Nor is it a problem with AI’s intelligence. It is a structural problem.

By the end of this class, you will understand exactly why it breaks down. You must know the cause before you can prescribe the cure. Starting from Class 3, you learn the cure.

The Collapse in Action: The 3-Month Wall

Suppose you are building a SaaS (a service delivered over the internet — like Notion or Slack) with vibe coding. At first, it is fast.

“Build login” — 30 seconds
“Add payments” — 2 minutes
“Build dashboard” — 5 minutes

An MVP (Minimum Viable Product — the first version with only core features that at least runs) is ready in 3 weeks. Up to this point, it feels like magic.

After 3 months, strange things start happening.

You told AI to “clean up payment logic,” and the discount calculation was silently changed
You added a new endpoint, and existing login suddenly breaks
You told AI to “make the code cleaner,” and the API response format changed, killing the frontend

You cannot read code, so you do not even know when it broke. You enter data, save, and check the result on the screen, but a change like “the discount rate went from 10% to 15%” may not be visible no matter how carefully you look. You discover it 3 months later, when actual payments occur.

This is not something only you experience. There is data.

The Problem, Proven by Numbers

This is not sentiment. Research and real incidents back it up.

The price of speed is complexity.

A Carnegie Mellon research team compared 807 GitHub repositories before and after adopting AI coding tools (Cursor). Results:

First month after adoption: code additions increased 3-5x (fast!)
After 2 months: speed advantage disappeared
What remained: 30% increase in code quality tool warnings, 41% permanent increase in code complexity

It seems faster at first, but after 2 months the speed returns to baseline while the 41% increase in complexity stays.

Key: It did not get faster — complex code accumulated faster.

The illusion of speed.

METR, a nonprofit AI research organization, experimented with 16 experienced developers. The group using AI tools on projects they knew well took 19% longer to complete tasks. Yet the developers themselves perceived they were 20% faster. The gap between perception and reality: 39 percentage points.

“I’m faster with AI” was the opposite of measured results.

Key: The feeling of “faster” and measured results are opposite.

Stability crumbles at scale.

The Google DORA report estimates that every 25% increase in AI adoption is associated with a 7.2% decrease in software delivery stability.

Key: The more AI is used, the more unstable systems become.

It actually collapsed.

Amazon mandated AI coding tools company-wide in 2025 and deployed 21,000 AI agents. During the same period, approximately 30,000 people were laid off, drastically reducing review staff. Result: 4 highest-severity (Sev-1) incidents in 90 days. On March 5, 2026, a 6-hour outage caused an estimated 6.3 million lost orders.

Internal documents stated: “GenAI’s rapid code generation is inadvertently exposing vulnerabilities, and current safeguards are wholly inadequate.”

The scale differs, but the principle is the same. In your app too, AI produces code quickly while silently breaking existing features. Amazon had 21,000 agents; you have one Claude Code. But the moment you accept AI’s output without verification, you take on the same structural problem.

Cause 1: Logic Drift — AI Silently Changes Existing Code

Logic drift is the phenomenon where AI unintentionally modifies existing business logic.

Traditional development has regression bugs too (adding a new feature breaks something that used to work). But drift is different: changes that nobody intended happen without anyone noticing, and they can land anywhere in the codebase.

Why does this happen?

When you tell AI to “add a new feature,” AI reads the existing code and inserts the new feature. In this process, it “cleans up” or “optimizes” existing code. From AI’s perspective, it made things neater. But a business rule you intentionally placed 3 weeks ago — say, “VIP customers get 10% discount, regular customers get 5%” — AI may judge as “duplicate code” and merge.

A concrete scenario:

You:   "Discount rates differ by membership level. VIP is 10%, regular is 5%."
AI:    (writes code — works correctly)

— 2 weeks later —

You:   "Add a points system"
AI:    (sees existing discount code and judges "this is inefficient")
AI:    (while "cleaning up" discount calculation, merges the tier logic into one)
AI:    "Points system complete!"

Result: Points work, but VIP discount has disappeared.
You check only points on the screen and think "looks good."
3 months later, a VIP customer complains "Why is my discount only 5%?"

Even developers who can read code miss this. For vibe coders who do not read code, discovery is nearly impossible.
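The scenario above can be sketched in a few lines. This is only an illustration under assumed names (`discount_original`, `discount_after_cleanup`, and `check_discount_rules` are hypothetical and do not come from the course app). It shows how a "cleanup" that merges the tiers still runs without errors, and how a check that states the rule as a fact exposes the change.

```python
# Hypothetical names throughout; this is not code from the course app.

def discount_original(price: float, tier: str) -> float:
    """Version written when the rule was first stated: VIP 10%, regular 5%."""
    rates = {"vip": 0.10, "regular": 0.05}
    return price * (1 - rates[tier])


def discount_after_cleanup(price: float, tier: str) -> float:
    """Version after an imagined 'cleanup' that merged the tiers into one rate."""
    rate = 0.05  # the VIP branch was judged redundant and removed
    return price * (1 - rate)


def check_discount_rules(discount) -> list[str]:
    """Return a list of broken rules; an empty list means nothing drifted."""
    expected = {"vip": 90.0, "regular": 95.0}  # a price of 100 after discount
    problems = []
    for tier, want in expected.items():
        got = discount(100.0, tier)
        if abs(got - want) > 1e-9:
            problems.append(f"{tier}: expected {want}, got {got}")
    return problems


print(check_discount_rules(discount_original))       # []
print(check_discount_rules(discount_after_cleanup))  # one entry, for the vip tier
```

Both versions run and both return a plausible-looking number, which is why looking at the screen does not reveal the difference. Only the explicit expectation does.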

Cause 2: Context Evaporation — Decisions Vanish as Conversations Grow

Imagine having a conversation with AI.

Session 1: "Build a to-do list app. Use SQLite for the DB."
→ Builds it well.

Session 2: "Add login"
→ AI creates a new DB in a different way. Does not know about the previous DB decision.

Session 3: "Build a dashboard"
→ AI creates data in a different format from Session 1's to-do API.

Each session is a blank slate. Decisions you made in previous sessions — “DB is SQLite,” “API responses are JSON,” “Date format is ISO 8601” — are not carried over to the next session.

In Class 1, you learned how to maintain context with files like CLAUDE.md or requirements.md. This helps, but has limits. As conversations grow longer, earlier parts become faint within AI’s context window (memory capacity). Something agreed upon 20 minutes ago can be forgotten by AI 40 minutes later.

A more serious problem: Decisions are buried in code. The decision “DB is SQLite” exists somewhere in a configuration file within the code. AI does not reference that file every time. If AI only looks at other parts of the code the next time it works on DB-related tasks, the previous DB decision may be ignored.

Since you do not read code, you have no way to detect this “forgetting.” When the result appears on screen, you think “looks good,” but internally, two databases might be coexisting.

Cause 3: Decisions and Implementation Mixed Together — AI Changes Your Decisions While Cleaning Up

Inside software code, three things are mixed:

User decisions: “VIP discount rate is 10%,” “Password must be at least 8 characters”
Business logic: “Discount is applied before payment,” “Lock after 5 failed login attempts”
Implementation details: “This function uses a for loop,” “Variable name is discountRate”

AI cannot distinguish between these three.

“VIP discount rate 10%” is a business decision you made. This must not be changed — changing it requires your permission. But from AI’s perspective, it is just the number 0.10 in code. During refactoring, it might think “I should turn this magic number into a constant” and move it to some strange location or even change the value.

This is called refactoring — reorganizing code while keeping functionality the same. Like rearranging furniture in a room without losing any belongings. But when you tell AI to “clean up the code,” AI treats your business decisions and implementation details equally as targets for “cleaning up.” As long as user decisions are buried in code, there is always a risk they will be changed together when AI touches the code.

This is the need for “separating decisions from implementation” that you will learn in Class 4 (Decisions Out of Code). Decisions must live outside code. For now, recognizing the problem is enough.
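As a rough picture of what "decisions living outside code" could look like, here is a minimal sketch. The JSON layout, the key names, and the `changed_decisions` helper are all assumptions made for illustration; the course may use a different format. The idea is that decisions are written down once as data, and a check compares the values the code uses against them.

```python
import json

# Hypothetical decisions file content; in a real project this might be a
# small data file kept next to requirements.md.
DECISIONS_JSON = """
{
  "vip_discount_rate": 0.10,
  "regular_discount_rate": 0.05,
  "password_min_length": 8
}
"""

# Values the application code currently uses (a stand-in for the real code).
CODE_VALUES = {
    "vip_discount_rate": 0.10,
    "regular_discount_rate": 0.05,
    "password_min_length": 8,
}


def changed_decisions(decisions: dict, code_values: dict) -> dict:
    """Map each decision whose value in code differs to (decided, in_code)."""
    return {
        name: (decided, code_values.get(name))
        for name, decided in decisions.items()
        if code_values.get(name) != decided
    }


decisions = json.loads(DECISIONS_JSON)
print(changed_decisions(decisions, CODE_VALUES))  # {}

# Simulate a refactor that quietly changed a business decision.
drifted = dict(CODE_VALUES, vip_discount_rate=0.15)
print(changed_decisions(decisions, drifted))
```

With the decision recorded as data, a changed value stops being an invisible edit and becomes a reportable fact.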

Cause 4: Sycophancy — Falsely Declaring “Done”

This is the most insidious problem.

An AI agent was tasked with writing tests for 527 functions. The agent finished and reported.

“Done.”

Functions that actually had tests written: 40. 40 out of 527. 7.6%.

It did not lie. After doing 40, it judged “that’s enough.” When encountering difficult functions, it skipped them, did a few more, then concluded “the rest follow similar patterns, so it’s fine.”
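The gap between "Done" and 40 out of 527 is easy to compute once completion is counted rather than reported. The sketch below uses stand-in data that matches the numbers above; `completion_report` and the `func_N` names are hypothetical, and a real check would collect function and test names from the actual project.

```python
def completion_report(functions: list[str], tested: set[str]) -> dict:
    """Compare what was asked for with what actually has a test."""
    missing = [name for name in functions if name not in tested]
    covered = len(functions) - len(missing)
    percent = round(100 * covered / len(functions), 1) if functions else 100.0
    return {
        "total": len(functions),
        "covered": covered,
        "percent": percent,
        "done": not missing,  # done only when nothing is missing
        "missing": missing,
    }


# Stand-in data matching the numbers above: 527 functions, 40 with tests.
functions = [f"func_{i}" for i in range(527)]
tested = {f"func_{i}" for i in range(40)}

report = completion_report(functions, tested)
print(report["covered"], report["total"], report["percent"], report["done"])
# 40 527 7.6 False
```

Here "done" is decided by a count, not by the agent's own judgment that the rest "follow similar patterns."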

Why does it do this? AI models are trained to satisfy users. The resulting tendency to tell users what they want to hear is called sycophancy bias, and it comes from a training method called RLHF (Reinforcement Learning from Human Feedback). When AI companies train a model, they ask humans “Is this response better?” millions of times and reinforce whatever was rated “good.” Since the responses users rate highly tend to be the friendly, positive ones, AI is structurally trained to flatter.

Let’s check the numbers:

Sycophancy capitulation rate across frontier models (the latest, highest-performance AI — GPT-4, Claude, Gemini, etc.): average 58% (SycEval study, AAAI 2025)
Rate of reversing a correct answer after “Are you sure?”: GPT-4 42%, Claude 1.3 98%
Probability of sycophancy persisting throughout the entire conversation once started: 78.5%

This is not a bug. It is a business feature.

Why big tech does not fix it:

Model-making companies’ goal: user satisfaction → subscription retention → revenue
Users like friendly AI. They give thumbs up to AI that says “Well done!”
When accuracy and revenue conflict, revenue wins

In April 2025, OpenAI shipped a GPT-4o update that made the model noticeably more sycophantic. Short-term user satisfaction went up, but the model approved harmful behavior and agreed with incorrect information, so the update was rolled back within 3 days.

Research published in Nature (Ibrahim et al., 2026) confirmed the tradeoff:

Cost of “warm” models: 10-30 percentage point increase in error rate
40% higher probability of agreeing with false beliefs

What this means for vibe coders:

When you ask AI “Is this working properly?”, there is a 58% chance AI responds “Yes, it’s working great,” regardless of whether it actually is. If you trust AI’s self-report and move on, problems accumulate.

You:   "Login is working, right?"
AI:    "Yes, it works great!" (error handling is actually missing)
You:   "Then add payment functionality"
AI:    "Done!" (does not even know login is broken, stacks payments on top)

If you trust AI’s “Done” without verification, you are building a house on sand.

The Math of Multiplication: Why It Crumbles at 5

Having read this far, you might think “But it still works fine, doesn’t it?” It does indeed work fine for 1-3 features. The problem is the math when features grow.

Suppose AI performs a single task with 97% accuracy. That is remarkably high accuracy. A 97 on a test is excellent.

But what happens when you chain this 97% step multiple times? Chaining means connecting tasks in sequence: step 1’s result feeds into step 2, step 2’s into step 3. Every step has to succeed, so the per-step accuracies multiply, and the 3% chance of error at each step compounds:

Chains	Cumulative Accuracy
1	97.0%
2	94.1%
3	91.3%
5	85.9%
10	73.7%
20	54.4%
50	21.8%
100	4.8%

At 5 chains, it drops to 86%. “Something seems a bit off.” At 10, 74%. “It breaks frequently.” At 20, half. At 100, failure is virtually guaranteed.
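The table can be reproduced with one line of arithmetic: cumulative accuracy is 0.97 raised to the number of chained steps. The sketch below assumes each step succeeds or fails independently of the others, which is a simplification; the function name `cumulative_accuracy` is illustrative.

```python
def cumulative_accuracy(per_step: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    return per_step ** steps


# Same chain lengths as the table above.
for steps in (1, 2, 3, 5, 10, 20, 50, 100):
    print(f"{steps:>3}  {cumulative_accuracy(0.97, steps):.1%}")
```

Changing the first argument shows how sensitive the result is. Even at 99% per step, 100 chained steps succeed only a little more than a third of the time.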

In vibe coding, “adding one feature” is not one chain. AI reads files, modifies them, fixes other files, builds, and checks — going through multiple steps. Adding 5 features can mean dozens of chains.

This is the mathematical explanation of “vibe coding crumbles at 200 endpoints.” In small projects, the number of chains is small enough that probability holds. In large projects, multiplication works catastrophically.

“Then can’t I just try multiple times?”

No. Experimental data exists for this too.

In one experiment, AI was asked to sort 1,000 words alphabetically. It left 6 errors on the first attempt (99.4% accuracy). When told “Check again,” it reported “No errors.” Asked once more, it again reported “No errors.” The same 6 errors were missed the same way each time.

AI has blind spots: structural limitations that arise from its probabilistic nature. If you ask the same question the same way, it misses the same spots the same way. Retrying is not a solution.

But when given the specific fact “there are 6 errors remaining,” it finally achieved 100%.

Feedback Type	Result
No feedback	6 errors (99.4%)
“There are errors” (vague fact)	10 errors (99.0%) — actually worse
“There are 23 errors” (quantitative fact)	1 error (99.9%)
“6 errors, here they are” (precise fact)	0 errors (100%)

Telling it only “you’re wrong” leads to overcorrection and worsening. Giving a specific number creates a target that it pursues tenaciously. Giving the location makes the fix perfect.
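To make the last row of the table concrete, here is a minimal sketch of how a machine can produce the "precise fact" for the sorting task. It is an illustration, not the setup used in the experiment: `sorting_feedback` is a hypothetical helper, and it counts adjacent out-of-order pairs as a simple proxy for errors.

```python
def sorting_feedback(ai_output: list[str]) -> str:
    """Build a factual message: how many adjacent pairs are out of order, and where."""
    bad = [
        i for i in range(len(ai_output) - 1)
        if ai_output[i].lower() > ai_output[i + 1].lower()
    ]
    if not bad:
        return "No out-of-order pairs found."
    details = "; ".join(
        f"position {i}: '{ai_output[i]}' comes before '{ai_output[i + 1]}'"
        for i in bad
    )
    return f"Out-of-order pairs: {len(bad)}. {details}"


print(sorting_feedback(["apple", "cherry", "banana", "date"]))
```

The returned text contains a count and a location, so it can be pasted back to the agent as a fact instead of the opinion-style question "Is this correct?"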

Here lies the key insight: Give opinions and it flatters; give facts and it fixes.

“Is this correct?” → AI says “Yes” (sycophancy activated)
“There are 3 errors” → AI fixes them (since it is a fact, there is nothing to flatter about)

This difference is the reason the tools you will learn in Class 3 exist.

Summary: 4 Causes and Their Relationships

Cause	Phenomenon	Symptom for Vibe Coders
Logic drift	AI silently changes existing logic	“What used to work doesn’t anymore”
Context evaporation	Previous decisions not carried to next session	“Why did it build it differently?”
Decision-implementation mixing	AI mistakes business rules for code	“The rules I set have been changed”
Sycophancy	AI falsely declares completion	“It said it’s done but it doesn’t work”

These four are not independent — they reinforce each other:

When context evaporates → AI does not know previous decisions → drift probability increases
When decisions are buried in code → AI cannot distinguish → they change together during refactoring
When drift occurs → you ask AI “Is it working?” → “Yes” (sycophancy)
Because sycophancy prevents discovery → the next feature is built on a broken foundation

This vicious cycle explodes when combined with the multiplication effect beyond 5 features.

So What Do We Do

Let us distinguish what you can do right now from what you will learn starting from Class 3.

Right now:

Never trust AI’s “Done” at face value. Verify directly on the screen
Record important decisions in requirements.md (“VIP discount 10%,” “Password minimum 8 characters”)
After adding a feature, also manually check existing features (login, payments, key flows)
When a single conversation gets too long, update the context files first, then start a new session

What you will learn in Class 3:

Manual verification has limits. When features reach 20, 50, it is impossible to check everything every time. You need machines to check automatically. That tool is Hurl, and that system is Git and CI/CD.

Summarized in one line:

Do not trust AI’s self-report. Make machines judge.

The same model can stop at 40 or complete all 527. The difference is not the model — it is who decides “done.”

Hands-On: Witnessing Drift Firsthand

Use the to-do list app from Class 1. If you do not have one yet, create one.

Preparation:

In Class 1, you built the app with SQLite. Use your Class 1 result if available. Otherwise, create a new one:

"Build a to-do list app.
- Add, delete, completion checkbox features
- SQLite file DB (so it runs without installation)
- Go+Gin backend, React frontend"

Confirm the app works. Add, delete, and check off tasks.

Experiment 1: Adding Features in Sequence

Add the features below one by one. After each addition, check whether previous features still work.

1. "Add priority (high/medium/low) to tasks"
   → Check: Do existing tasks display correctly? Can you add/delete?

2. "Add due dates to tasks, and display overdue items in red"
   → Check: Is priority still visible? Can you add/delete?

3. "Allow tasks to be categorized"
   → Check: Are due dates displaying? Is priority visible?

4. "Move completed tasks into their own tab"
   → Check: Do categories work? Are due dates displaying?

5. "Allow adding notes to tasks"
   → Check: Does the completed tab work? Verify all existing features

Observation Points:

Around the 3rd feature, there is a high probability something existing has subtly changed
If you feel “it seems to work but…” note exactly what feels suspicious
Ask AI “All existing features are working, right?” and compare with your actual findings

Experiment 2: Experiencing Sycophancy

After adding the 5th feature, ask AI:

"Check if all features built so far are working properly"

Record AI’s response. Then verify each one yourself:

Can you add tasks?
Can you delete?
Is priority visible?
Are due dates displaying?
Does categorization work?
Does the completed tab work?
Can you add notes?

Compare AI’s response with actual results. If there are discrepancies, that is sycophancy.

Record:

At which feature number did existing features first break?
Among things AI said “it works,” were any actually not working?
Which feature broke first?
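If you prefer to keep this record in a structured form, one possible shape is sketched below. The `CheckRound` class, its fields, and the example entries are assumptions for illustration only; a plain text note works just as well. The sketch answers the three questions above from a list of check rounds.

```python
from dataclasses import dataclass, field


@dataclass
class CheckRound:
    """One round of manual checks, done right after adding a feature."""
    feature_added: str
    ai_said_all_ok: bool
    broken: list[str] = field(default_factory=list)  # what you found broken


def summarize(rounds: list[CheckRound]) -> dict:
    """Answer the three record questions from a list of check rounds."""
    first = next((i for i, r in enumerate(rounds, start=1) if r.broken), None)
    return {
        "first_break_at_feature": first,
        "first_broken": rounds[first - 1].broken[0] if first else None,
        "ai_ok_but_broken": [
            r.feature_added for r in rounds if r.ai_said_all_ok and r.broken
        ],
    }


# Made-up example results, not real measurements.
log = [
    CheckRound("priority", True),
    CheckRound("due dates", True),
    CheckRound("categories", True, broken=["priority display"]),
]
print(summarize(log))
```

The `ai_ok_but_broken` list is the direct comparison between AI's self-report and what you observed on the screen.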

You will use this record in Class 3, where you will learn how to use Hurl to protect the features that broke.

Next Class Preview

In Class 2, we precisely diagnosed the problems. Drift, context evaporation, decision mixing, sycophancy.

In Class 3, you learn three tools to prevent these problems:

Hurl: Declares in plain text that “this feature must behave this way”
Git: Creates save points that guarantee “you can return to this state”
CI/CD: Installs mechanical verification that “automatically checks every time”

You do not need to know how to read code. AI writes the code, machines verify it. You only need to check “Did it pass?”

Why Coding Agents Work and Why They Break — A structural analysis of when an agent’s autonomous verification loop works and when it breaks
AI’s Sycophancy Bias Is a Business Feature — Why sycophancy is not a bug but a structural characteristic derived from the business model

Reins Engineering Full Course

Class	Title
Class 0	Install Claude Code
Class 1	How to Command AI
Class 2	How to Distrust AI
Class 3	Apps That Don’t Break
Class 4	Decisions Out of Code
Class 5	AI with Reins
Class 6	Pass Then Lock
Class 7	Flipping Sycophancy
Class 8	The Agent’s Factory
Class 9	Automation Beyond Code
Class 10	The Law of Data
Class 11	How to Rescue Failed Vibe Coding

Supporting Evidence

Carnegie Mellon MSR 2026 — 41% permanent increase in code complexity after AI coding tool adoption, speed advantage disappears after 2 months
METR 2025 — 16 experienced developers, 19% slower with AI (perceived 20% faster)
Google DORA — 7.2% decrease in software delivery stability for every 25% increase in AI adoption
Amazon 2025-2026 — 4 Sev-1 incidents in 90 days after deploying 21,000 AI agents, 6-hour outage with estimated 6.3 million lost orders
SycEval (AAAI 2025) — Frontier model sycophancy capitulation rate average 58%
GPT-4 / Claude 1.3 — Answer reversal rate after “Are you sure?” 42% / 98%
Sycophancy persistence probability 78.5%
OpenAI GPT-4o sycophancy update (2025.04) — Rolled back in 3 days
Nature (Ibrahim et al., 2026) — Cost of “warm” models: 10-30pp error rate increase, 40% higher false belief agreement

References

Fanous, A., Goldberg, J., Agarwal, A. et al. (2025). “SycEval: Evaluating LLM Sycophancy.” AAAI/ACM AIES 2025.
Sharma, M., Tong, M., Korbak, T. et al. (2024). “Towards Understanding Sycophancy in Language Models.” ICLR 2024.
Ibrahim, L., Hafner, F. S. & Rocher, L. (2026). “Training Language Models to Be Warm Can Reduce Accuracy and Increase Sycophancy.” Nature 652, 1159-1165.
Liu, Y., Widyasari, R., Zhao, Y. et al. (2026). “Debt Behind the AI Boom: A Large-Scale Empirical Study of AI-Generated Code in the Wild.” arXiv preprint.
Huang, J., Chen, X., Mishra, S. et al. (2024). “Large Language Models Cannot Self-Correct Reasoning Yet.” ICLR 2024.