What I Found and Had to Fix After Joining a Vibecoded Project
In April 2026 I joined a platform built by someone with deep expertise in Design and zero formal background in software development, using Cursor as their primary tool. In roughly two months they shipped a full multi-tenant healthcare platform running in production with real users: four user roles, real-time chat, an installable PWA, Google OAuth, tokenized invite flows, OpenAI-powered report generation, and clinic management. Building that in two months is not trivial.
What I found when I read the code is what this post is about.
What was built before I arrived
The platform is built for pediatric therapy clinics. It connects parents, therapists, supervisors, and school staff around a child’s development: scheduling, clinical reports, progress tracking, multi-party chat, LGPD-sensitive data. A security failure here is a risk to children’s privacy and safety, not just a technical problem.
The commit history is honest about how it got there. Over two months, Cursor helped deliver route handlers, Supabase RLS policies, React components, authentication flows, invite emails, and database migrations. The velocity was real, the product worked, and users were relying on it.
When I joined on April 10, my job was not to confirm that it worked. It was to find out how.
The first hour without running anything
Before opening a terminal, before running the first npm install, I just read code.
That sounds almost old-fashioned now, but it is how we worked before coding agents existed: read the commit history, the database schema, the API routes, the RLS policies, look for patterns, assumptions, inconsistencies. The hype around “AI can do everything, you don’t need engineers anymore” is usually about the doing part: writing code, generating tests, fixing bugs. Maybe that part is on the way to being solved; I cannot predict the future. But the looking part is a different skill, and it is the one that is hardest to delegate. It comes from having seen enough code, enough systems, and enough failures to recognize patterns that are invisible to someone who has not.
The service role key was leaking in error messages
SUPABASE_SERVICE_ROLE_KEY is the equivalent of a database root password: it bypasses every Row Level Security policy you have, and whoever holds it can read, write, or delete any row in any table.
I found it being returned in HTTP response bodies (not in logs, in actual responses) in at least 14 API routes. The pattern was always the same: a configuration check at the start of a handler would find the key missing and return a 500 with the original error message attached, and the Supabase client helpfully included the key reference in that message.
This is not a mistake that surfaces in a code review. It surfaces in git log entries like “fix: improve invite email failure handling and diagnostics”. Those commits were doing the right thing (better error messages) without the experience to notice that the error message itself was the vulnerability.
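A minimal sketch of the separation that was missing: the detailed message goes to a server-side log (with secrets redacted), and the response body carries only a generic message and a correlation ID. The helper names (toPublicError, redactSecrets) and the JWT-shaped pattern are assumptions for illustration, not the project's code.

```typescript
// Hypothetical helper: names and shapes are illustrative, not from the project.
interface PublicError {
  status: number;
  body: { error: string; requestId: string };
}

interface ErrorLogEntry {
  requestId: string;
  detail: string;
}

// Redact known secret values, plus anything shaped like a JWT (an assumption
// about what a leaked key could look like), before the text is logged.
function redactSecrets(message: string, secrets: string[]): string {
  let redacted = message;
  for (const secret of secrets) {
    if (secret.length > 0) {
      redacted = redacted.split(secret).join("[REDACTED]");
    }
  }
  return redacted.replace(/eyJ[\w-]+\.[\w-]+\.[\w-]+/g, "[REDACTED_JWT]");
}

// The original error is logged server-side; the client only ever sees a
// generic message and an ID that support can use to find the log entry.
function toPublicError(
  err: unknown,
  requestId: string,
  secrets: string[],
  log: (entry: ErrorLogEntry) => void,
): PublicError {
  const raw = err instanceof Error ? err.message : String(err);
  log({ requestId, detail: redactSecrets(raw, secrets) });
  return { status: 500, body: { error: "Internal server error", requestId } };
}
```

The point of the shape is that a handler has no way to forward err.message to the client by accident: the only thing it can return is the generic body.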
RLS was recursively broken
The policies on the profiles table referenced the profiles table. A policy evaluates by querying the database. If that query triggers another policy evaluation on the same table, you get infinite recursion. The database kills the query with an error, and the user gets a 500.
The commit history has six entries matching fix(sql): recursion before I arrived. Each one was a patch on a symptom: add an exception here, reorder the policies there, wrap something in a function. None of them touched the structural problem, which was that the policies were designed around a circular reference that PostgreSQL simply cannot resolve.
I fixed it once by removing the circular dependency from the schema. It has not recurred.
This is not something you find by running the app. You find it by knowing how PostgreSQL evaluates RLS policies and recognizing that the pattern in front of you violates that evaluation model.
Rate limiting that did nothing
There was a rate limiter: a Map in memory tracking requests per IP.
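On Vercel serverless, function instances do not share memory. Every cold start begins with an empty map, concurrent requests can land on different instances, and instances are recycled without notice. A warm instance may see a few consecutive requests, but no instance ever sees the full traffic from an IP, so in practice the rate limiter accepted almost every request regardless of frequency.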
This is the kind of issue that is invisible in development (where you test with one browser, one IP, no load) and invisible in production monitoring (no errors, no alerts, everything looks fine). It is only visible to someone who has built rate limiting before and knows the constraints of the deployment environment.
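The failure is easy to reproduce in a few lines. The sketch below is a simulation, not the project's code: CounterStore is an assumed interface, MemoryStore stands in for both the per-instance Map and (when shared) an external store such as Redis, and the worst case is modeled as every request landing on a fresh instance. A real Redis client would be asynchronous; the sketch stays synchronous for brevity.

```typescript
// Minimal counter-store interface (an assumption for illustration). A Redis
// implementation would typically back this with INCR plus an expiry.
interface CounterStore {
  increment(key: string, windowMs: number, now: number): number;
}

class MemoryStore implements CounterStore {
  private hits = new Map<string, { count: number; resetAt: number }>();

  increment(key: string, windowMs: number, now: number): number {
    const entry = this.hits.get(key);
    if (!entry || now >= entry.resetAt) {
      this.hits.set(key, { count: 1, resetAt: now + windowMs });
      return 1;
    }
    entry.count += 1;
    return entry.count;
  }
}

// Fixed-window check: allow up to `limit` requests per window per IP.
function isAllowed(store: CounterStore, ip: string, limit: number, windowMs: number, now: number): boolean {
  return store.increment(ip, windowMs, now) <= limit;
}

// Simulates N requests from one IP; pickStore decides which instance's
// store each request sees.
function countAllowed(requests: number, pickStore: () => CounterStore): number {
  let allowed = 0;
  for (let i = 0; i < requests; i++) {
    if (isAllowed(pickStore(), "203.0.113.7", 3, 60000, 1000)) allowed++;
  }
  return allowed;
}

// Worst case on serverless: every request sees a brand-new, empty map.
const isolatedAllowed = countAllowed(10, () => new MemoryStore());

// Shared store: every request sees the same counters.
const sharedStore = new MemoryStore();
const sharedAllowed = countAllowed(10, () => sharedStore);
```

With a limit of 3, the isolated case lets all 10 requests through and the shared case lets 3 through. Nothing in the first case raises an error, which is why monitoring never noticed.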
Multi-tenant isolation with gaps
The code had a function called userCanEditChildAsStaff. The logic was correct. A supervisor should only be able to act on children who belong to their clinic.
But it was not called on every endpoint.
Two endpoints I found early: the onboarding merge flow (where a supervisor could merge a child record from any clinic) and the unlink endpoint (which accepted any child_id without checking the caller’s clinic). A supervisor from Clinic A could operate on data from Clinic B.
The function existed. The enforcement did not.
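One way to make the check hard to forget is to move it out of the handler body and into a wrapper, so a handler can only ever receive a child that already passed it. This is a sketch under assumptions: the real signature of userCanEditChildAsStaff is not shown here, so the version below, along with Caller, Child, and withChildAccess, is hypothetical.

```typescript
// Hypothetical shapes; the real function's signature may differ.
interface Caller {
  userId: string;
  clinicId: string;
  role: "SUPERVISOR" | "THERAPIST";
}

interface Child {
  id: string;
  clinicId: string;
}

class ForbiddenError extends Error {}

// Stand-in for the project's check: staff may only act within their clinic.
function userCanEditChildAsStaff(caller: Caller, child: Child): boolean {
  return caller.clinicId === child.clinicId;
}

// Wraps a handler so the clinic check always runs before the handler does.
function withChildAccess<T>(
  loadChild: (childId: string) => Child | undefined,
  handler: (caller: Caller, child: Child) => T,
): (caller: Caller, childId: string) => T {
  return (caller, childId) => {
    const child = loadChild(childId);
    // Same error for "not found" and "other clinic", so IDs cannot be probed.
    if (!child || !userCanEditChildAsStaff(caller, child)) {
      throw new ForbiddenError("Not allowed");
    }
    return handler(caller, child);
  };
}
```

An unlink endpoint built as withChildAccess(loadChild, unlinkHandler) cannot skip the check, because the unchecked child_id never reaches unlinkHandler.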
No input validation on POST and PATCH routes
Every route handler that accepted a request body destructured it directly and used the values as-is. No schema validation, no type narrowing, no sanitization. The assumption was that the client sent what the server expected.
// what I found in multiple handlers
const { childId, therapistEmail, role } = await req.json();
// childId, therapistEmail, role are used directly below
In a system where a therapist’s email or a child’s ID is passed to database operations, accepting whatever the client sends without validation is a direct path to injection and privilege escalation. Adding Zod at the boundary is a one-time cost per route; not doing it is a permanent liability.
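To show what validation at the boundary means in practice, here is a dependency-free sketch of the same idea; a Zod schema would express it in fewer lines. The field rules are assumptions (childId as a UUID, the four role names), and parseInviteBody is a hypothetical name.

```typescript
// Role names are assumed from the four user roles; the real enum may differ.
type Role = "PARENT" | "THERAPIST" | "SUPERVISOR" | "SCHOOL";

interface InviteBody {
  childId: string;
  therapistEmail: string;
  role: Role;
}

type ParseResult =
  | { ok: true; data: InviteBody }
  | { ok: false; errors: string[] };

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
const KNOWN_ROLES: readonly string[] = ["PARENT", "THERAPIST", "SUPERVISOR", "SCHOOL"];

// Accepts unknown input and returns either typed data or a list of errors.
// Nothing past this function should touch the raw request body.
function parseInviteBody(input: unknown): ParseResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const { childId, therapistEmail, role } = input as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof childId !== "string" || !UUID_PATTERN.test(childId)) {
    errors.push("childId must be a UUID");
  }
  if (typeof therapistEmail !== "string" || !EMAIL_PATTERN.test(therapistEmail)) {
    errors.push("therapistEmail must be an email address");
  }
  if (typeof role !== "string" || KNOWN_ROLES.indexOf(role) === -1) {
    errors.push("role is not recognized");
  }
  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    data: {
      childId: childId as string,
      therapistEmail: (therapistEmail as string).toLowerCase(),
      role: role as Role,
    },
  };
}
```

A handler would call parseInviteBody(await req.json()), return a 400 with the errors when ok is false, and use only data afterwards.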
Missing HTTP security headers
There were no Content-Security-Policy, Strict-Transport-Security, X-Frame-Options, or X-Content-Type-Options headers anywhere. For a healthcare platform, the kind of application that can be framed inside an iframe, that handles session cookies, that loads third-party scripts (Supabase realtime, Google OAuth), this is not optional hardening. It is a baseline.
These do not protect against every attack. But their absence means that entire classes of browser-based attacks (clickjacking, MIME sniffing, cross-site script injection) are not being mitigated by any layer of the stack.
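A baseline for those four headers can be sketched as a small function. The values are assumptions and a starting point only: the Content-Security-Policy in particular has to be tuned to the origins and inline scripts the app really uses, and securityHeaders is a hypothetical name.

```typescript
interface Header {
  key: string;
  value: string;
}

// connectSrc lists the extra origins the browser may call, for example the
// Supabase project URL (and its wss:// realtime endpoint).
function securityHeaders(connectSrc: string[]): Header[] {
  const csp = [
    "default-src 'self'",
    "frame-ancestors 'none'", // blocks framing (clickjacking) in modern browsers
    "object-src 'none'",
    "base-uri 'self'",
    `connect-src 'self' ${connectSrc.join(" ")}`.trim(),
  ].join("; ");

  return [
    { key: "Content-Security-Policy", value: csp },
    { key: "Strict-Transport-Security", value: "max-age=63072000; includeSubDomains" },
    { key: "X-Frame-Options", value: "DENY" }, // legacy fallback for frame-ancestors
    { key: "X-Content-Type-Options", value: "nosniff" }, // disables MIME sniffing
  ];
}
```

In Next.js, an array of this shape can be returned from the headers() function in next.config, inside an entry such as { source: "/(.*)", headers: securityHeaders([...]) }. Rolling out the CSP in report-only mode first is a common way to find what it would break.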
No audit trail for destructive operations
Operations like unlinking a child from a therapist, archiving a patient, or modifying a user’s role were executed without any record of who did what and when. No timestamp, no actor, nothing.
In a system with children’s health data under LGPD, traceability is not a nice-to-have. If a child’s record gets modified incorrectly, there needs to be a way to answer “who changed this, and when?” The absence of an audit trail also means there is no way to detect if someone is abusing elevated access, even if you later discover the access was there to abuse.
What the codebase looked like structurally
Beyond the individual vulnerabilities, there were systemic patterns that told a story about how the code had been generated.
The data layer lived in a single db.ts file of 2,200 lines, every domain mixed together. That is not a design choice, it is what happens when each feature adds a few functions and nobody steps back to ask where those functions belong. Three authentication patterns had ended up coexisting in the route handlers: some used requireAuth, some had a different inline pattern, some called Supabase directly. All of them worked more or less, none were interchangeable, and the inconsistency made auditing painful.
Next.js App Router was being used, but the project followed Pages Router patterns inside it: Client Components everywhere (including layouts with no interactivity), no route groups, no loading.tsx, no error.tsx. The framework’s capabilities were present but unused, which meant a larger bundle and no streaming behavior to speak of.
The supabase/ directory had grown to 69 SQL files with no distinction between schema definitions, development seeds, production seeds, diagnostic queries, point-in-time fixes, and obsolete files. Reconstructing the current schema from scratch meant running them all and hoping they were idempotent. There was no local development environment either: every schema change went directly to production, with no staging, no ordered migrations, and no supabase db reset to validate anything beforehand.
What I did in 28 days
Security: structure over patches
For the RLS recursion, I did not write another policy variant. I mapped the dependency graph of the policies, found the cycle, and removed it by restructuring the schema. GitHub Copilot helped me write the corrected SQL and the tests that validated the fix. The diagnosis was mine.
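Mapping the dependency graph can be done by hand, but the idea is simple enough to sketch. Each table with RLS is a node, and each table its policies query is an edge; a cycle (including a table whose policy queries itself) is a candidate for recursion. The graph below is a made-up example, not the real schema, and a static check like this only flags candidates: whether recursion actually occurs depends on which policies apply to a given query.

```typescript
// Key: a table with RLS enabled. Value: the tables its policies query.
type PolicyGraph = Record<string, string[]>;

// Depth-first search that returns the first cycle found as a path
// (for example ["profiles", "profiles"]), or null if there is none.
function findCycle(graph: PolicyGraph): string[] | null {
  const state: Record<string, "visiting" | "done"> = {};
  const path: string[] = [];

  function visit(table: string): string[] | null {
    if (state[table] === "done") return null;
    if (state[table] === "visiting") {
      // Reached a table that is still on the current path: that is the cycle.
      return path.slice(path.indexOf(table)).concat(table);
    }
    state[table] = "visiting";
    path.push(table);
    for (const dependency of graph[table] ?? []) {
      const cycle = visit(dependency);
      if (cycle) return cycle;
    }
    path.pop();
    state[table] = "done";
    return null;
  }

  for (const table of Object.keys(graph)) {
    const cycle = visit(table);
    if (cycle) return cycle;
  }
  return null;
}

// Hypothetical example: a profiles policy that queries profiles.
const exampleGraph: PolicyGraph = {
  children: ["profiles"],
  profiles: ["profiles"],
};
const exampleCycle = findCycle(exampleGraph);
```

For the example graph, exampleCycle is the path profiles, profiles: the self-reference. Once the policy no longer queries its own table, the same function returns null.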
For multi-tenant isolation, I audited every endpoint against userCanEditChildAsStaff, added the missing calls, and created a test suite (pnpm test:isolation) that runs against a local Supabase instance and validates that a user from one clinic cannot access data from another. If the isolation breaks, the tests fail. That is different from documenting that isolation should exist.
For the service role key, I audited all 14 routes and removed it from every response body. For rate limiting, I’m replacing the in-memory map with a Redis-backed implementation that actually persists across invocations.
I added an admin_audit_log table with immutable entries for destructive operations. In a system with children’s health data, traceability is not an optional feature, it is a requirement. I also added a dedicated ADMIN role, separate from SUPERVISOR, with its own middleware and access surface.
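On the application side, the audit log is most useful when a destructive operation cannot run without writing to it. A sketch of that shape, under assumptions: the entry fields, the action names, and the withAudit helper are hypothetical, and the immutability itself belongs in the database (no update or delete permission on the table), not in this code.

```typescript
// Entry shape is an assumption; only the table name comes from the project.
interface AuditEntry {
  actorId: string;
  action: string;
  targetId: string;
  at: string; // ISO 8601 timestamp
}

type AuditWriter = (entry: AuditEntry) => Promise<void>;

// Writes the audit entry first and only then runs the operation, so no code
// path changes data without leaving a record. If the write fails, the
// operation never runs.
async function withAudit<T>(
  write: AuditWriter,
  entry: Omit<AuditEntry, "at">,
  operation: () => Promise<T>,
  now: () => Date = () => new Date(),
): Promise<T> {
  await write({ ...entry, at: now().toISOString() });
  return operation();
}
```

Writing before acting means the log records attempts, including ones that later fail. Recording the outcome as a second entry is a reasonable extension if that distinction matters.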
Performance: the problems that only appear with real data
None of the performance issues were visible in development. They required knowing where to look.
The agenda loaded sessions without pagination. Supabase’s PostgREST API returns at most 1,000 rows per request by default and silently truncates anything beyond that. No error, no warning, just missing data. I implemented a function to fetch in pages until the full range was loaded.
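The paging loop is short. In this sketch, fetchPage stands in for a ranged query (with supabase-js that would be a query ending in .range(from, to), which is inclusive on both ends); fetchAllRows and FetchPage are hypothetical names, not the project's.

```typescript
// Returns the rows in the inclusive range [from, to].
type FetchPage<T> = (from: number, to: number) => Promise<T[]>;

// Requests consecutive ranges until a page comes back shorter than pageSize,
// which signals that the end of the result set was reached.
async function fetchAllRows<T>(fetchPage: FetchPage<T>, pageSize = 1000): Promise<T[]> {
  const rows: T[] = [];
  for (let from = 0; ; from += pageSize) {
    const page = await fetchPage(from, from + pageSize - 1);
    rows.push(...page);
    if (page.length < pageSize) return rows;
  }
}
```

The query needs a stable ORDER BY for this to be correct; without one, rows can shift between pages and be duplicated or skipped. When the total is an exact multiple of the page size, the loop makes one extra request that returns an empty page.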
Creating a year of recurring sessions (about 52 per child) in a single PostgREST call timed out in production. It worked in development with small seeds. I moved the insert into a batching function that writes chunks of 50 sequentially. The insight came from a postmortem: a user who could not save their agenda.
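A sketch of the batching, assuming insertBatch wraps a single insert call; chunk and insertInChunks are hypothetical names.

```typescript
// Splits an array into consecutive slices of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Inserts rows one batch at a time and returns how many rows were sent.
async function insertInChunks<T>(
  rows: T[],
  insertBatch: (batch: T[]) => Promise<void>,
  size = 50,
): Promise<number> {
  let inserted = 0;
  for (const batch of chunk(rows, size)) {
    await insertBatch(batch); // sequential on purpose: one short request at a time
    inserted += batch.length;
  }
  return inserted;
}
```

The trade-off is atomicity: if the third batch fails, the first two are already written. Whether that needs a cleanup step or a database-side function depends on how partial agendas should behave, which is a product decision rather than something this sketch settles.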
I added a partial unique index on (child_id, session_date, start_time) in the database. The conflict detection existed only in the application layer. Putting it in the database makes it O(log n) and impossible to bypass.
Architecture: following the framework
The App Router migration was systematic. I moved role layouts to Server Components, introduced (authenticated) and (public) route groups, added loading.tsx and error.tsx to every route with data fetching, and applied the colocation conventions (_components/, _lib/ per route). I wrote specs first, got them approved, then implemented.
The result is code that does what the framework’s documentation says it should do: correct SSR, smaller bundle, streaming where it matters, and caching that actually works in production.
Developer experience: building the floor
The most impactful work was the least visible from outside.
I audited all 69 SQL files, categorized each one, and documented the state in docs/proposals/supabase-audit/ before touching anything. Then I adopted the Supabase CLI: pnpm db:start for local containers, pnpm db:reset to reconstruct the schema from scratch, versioned migrations with mandatory headers (description, rollback, idempotency declaration).
I set up local development seeds with realistic synthetic data: therapists, children, sessions for the current week, school data. Inbucket as a local SMTP server so invite emails could be tested without sending anything. A deterministic OpenAI mock so report generation could be developed without a real API key or quota.
Migrating from npm to pnpm
The project used npm. I migrated to pnpm, and immediately the migration surfaced a phantom dependency: @internationalized/date was imported directly in three files but was only declared as a transitive dependency of react-aria-components. npm resolves phantom dependencies silently, it flattens node_modules and the import just works. pnpm does not. With pnpm’s strict isolation, the build failed.
This is the kind of invisible risk that accumulates in any project that has never run under strict dependency resolution. The code looks fine. The tests pass (if there are any). The production build works, today, on this exact lockfile. But the moment react-aria-components drops or moves that dependency in an update, or a different package manager lays out node_modules differently, it breaks in a way that takes time to diagnose.
The fix was simple once found: add @internationalized/date as an explicit dependency in package.json. The value of the migration was not pnpm itself, it was the strict module resolution that made the implicit assumption visible.
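The same check can be approximated without switching package managers: compare what the code imports against what package.json declares. The sketch below is deliberately naive and all names are hypothetical. It assumes the import specifiers have already been collected, treats "@/..." as a path alias (an assumed convention), and only recognizes Node built-ins written with the node: prefix.

```typescript
// Maps an import specifier to its npm package name, or null for relative
// paths, "node:"-prefixed built-ins, and "@/..." path aliases.
function packageNameOf(specifier: string): string | null {
  const first = specifier.charAt(0);
  if (first === "." || first === "/") return null;
  if (specifier.indexOf("node:") === 0 || specifier.indexOf("@/") === 0) return null;
  const parts = specifier.split("/");
  if (first === "@") {
    // Scoped package: the name is the first two segments.
    return parts.length >= 2 ? parts[0] + "/" + parts[1] : null;
  }
  return parts[0];
}

// Returns packages that are imported but not declared, without duplicates.
function findPhantomDependencies(importSpecifiers: string[], declared: string[]): string[] {
  const phantom: string[] = [];
  for (const specifier of importSpecifiers) {
    const name = packageNameOf(specifier);
    if (name !== null && declared.indexOf(name) === -1 && phantom.indexOf(name) === -1) {
      phantom.push(name);
    }
  }
  return phantom;
}
```

Here declared would be the combined keys of dependencies and devDependencies. Strict resolution in the package manager remains the more reliable guard, because it fails the build instead of relying on someone running a script.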
Biome as the linter and formatter
There was no linter. No formatter. Code style was whatever Cursor had generated, inconsistent across files. Magic numbers in CSS (text-[13px] instead of text-xs from the design system), inconsistent quote styles, unused imports left in place.
I introduced Biome as the single tool for both linting and formatting. One binary, fast, zero config friction. The important part was not choosing Biome over ESLint plus Prettier, but establishing the principle that the codebase has a consistent style that is enforced automatically, not by convention.
The first pnpm lint pass returned hundreds of findings. Most were auto-fixable. A handful were real issues: unreachable code, suspicious equality checks, explicit any types in places where the type was actually knowable. Going through those findings manually was its own audit.
I also extended the same discipline to the SQL layer: scripts/lint-migrations.mjs validates that every migration file has a mandatory header (description, rollback plan, idempotency note) before the commit. The same culture of automated checks that applies to TypeScript should apply to the database schema.
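The core of such a check fits in one function. This sketch assumes a header format of leading SQL comments like "-- description: ...", "-- rollback: ...", and "-- idempotent: ..."; the actual keys used by scripts/lint-migrations.mjs are not shown here, so treat the format and the function name as illustrative.

```typescript
// Assumed header keys; adjust to the project's real convention.
const REQUIRED_HEADER_KEYS = ["description", "rollback", "idempotent"] as const;

// Reads the leading comment block of a migration and returns the required
// keys that are missing or have an empty value.
function missingHeaderKeys(sql: string): string[] {
  const found: Record<string, boolean> = {};
  for (const line of sql.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    if (trimmed.indexOf("--") !== 0) break; // header ends at the first statement
    const match = /^--\s*([a-z]+)\s*:\s*\S/i.exec(trimmed);
    if (match) found[match[1].toLowerCase()] = true;
  }
  return REQUIRED_HEADER_KEYS.filter((key) => !found[key]);
}
```

A pre-commit script would run this over each staged migration file and exit non-zero if any file returns a non-empty list.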
CI/CD: making the implicit explicit
There was no CI/CD pipeline. Vercel handled the deploy, and that was all: nothing enforced that pnpm build passed before code reached production, and nothing enforced a clean lint run. The only gate was the developer’s own discipline.
I set up a GitHub Actions workflow with two jobs:
jobs:
  check:
    steps:
      - run: pnpm lint:ci
      - run: pnpm build
  test:
    steps:
      - run: pnpm test
      - run: pnpm test:coverage
This is not sophisticated CI/CD. It is the minimum viable gate: a PR cannot be merged if the build breaks or the linter reports errors. It sounds obvious, but without it every shortcut taken under pressure (skipping the lint run, not checking if the build still passes after a quick fix) becomes invisible until it reaches production.
The coverage threshold in vitest.config.ts is part of the same discipline. When coverage of src/lib/ exceeds the threshold, you raise the number to match, and it never goes down. That ratchet is what keeps coverage from becoming a metric that sounds good but means nothing.
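The ratchet rule itself is two lines of logic. A sketch, with ratchetCoverage as a hypothetical name and rounding down to whole percentage points as an assumed policy:

```typescript
interface RatchetResult {
  pass: boolean; // did this run meet the current threshold?
  nextThreshold: number; // value to commit back to the config
}

// The threshold only moves up: it follows measured coverage when coverage
// improves and stays where it is when coverage drops.
function ratchetCoverage(threshold: number, measured: number): RatchetResult {
  return {
    pass: measured >= threshold,
    nextThreshold: Math.max(threshold, Math.floor(measured)),
  };
}
```

Rounding down leaves a little slack, so a one-line change that shifts coverage by a fraction of a point does not fail the build.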
How I used AI as assistant, not protagonist
The split was always the same: I diagnosed, AI executed.
The RLS recursion was not something I asked AI to find. I found it by reading the schema. AI helped me write the corrected policies and generate the tests around them. I knew what was wrong and what the fix should look like; AI made producing the code faster.
Setting the rules before touching any code
One of the first things I did was write AGENTS.md: a file that lives in the repository root and defines the rules any AI agent must follow when working in this codebase. Not suggestions. Rules.
It covers how API routes must be structured, which authentication pattern is canonical, how to access the database (through db.ts, not by instantiating raw Supabase clients), which component prefix to use, how to handle multi-tenancy, when to write a spec before writing code. Every convention that matters is documented there, with examples.
Without it, every new AI session starts from the patterns already in the codebase. If those patterns include the bad ones (three auth patterns, raw client instantiation, no input validation), the AI replicates them confidently. It has no way to know those are the patterns you are trying to eliminate. It only knows what it sees.
With AGENTS.md, the agent has a contract. It can propose code that is consistent with where the project is going, not where it came from.
I also created skills, focused instruction files for specific workflows. One for writing unit tests with the correct mock setup. One for writing RLS integration tests against a local Supabase instance. One for the spec-driven workflow. One for architecture decisions. Each skill is a document the agent reads at the start of a relevant task, giving it domain context it would otherwise have to infer (or hallucinate).
The agent no longer starts from zero on each session. It starts from a known baseline and does not repeat the mistakes the rules explicitly forbid.
Spec-driven development: plan before execute
For any non-trivial change, the workflow is:
- Trigger the spec generator skill
- The agent produces a spec: what is changing, why, what the acceptance criteria are, what the rollback plan is
- I review and approve the spec
- Only then does the agent begin implementation, with the approved spec as the source of truth
The agent knows it cannot start writing code until the spec is approved, because that rule is in AGENTS.md. The spec is not a formality. It is a critical step that forces the agent to surface its assumptions and the ambiguity in the task before it starts writing code.
The difference is between asking someone to “fix the auth problem” and handing them a document that says: “the problem is X, the fix is Y, success means tests A, B, and C pass, and if it goes wrong we revert by doing Z.” The spec step is not process for its own sake.
Without the spec step, the agent fills in the blanks. Sometimes it fills them in correctly. Sometimes it fills them in with a workaround that looks like a fix but is not. The spec step forces the ambiguity to surface before the code is written, not after.
The docs/specs/ directory grew to over 100 specs during the 28 days, each with a lifecycle: Draft → Approved → In Progress → Implemented. If a spec is not approved, implementation does not start. That sequencing is what keeps the codebase from accumulating changes that nobody can fully explain.
Knowing when to revert
The commit history has four Revert entries before I joined. Each represents a moment when AI proposed a workaround (use service role in more places to avoid the broken RLS) when the correct answer was to fix the cause. Recognizing the difference between a patch and a fix is engineering judgment, not model output.
The skills and the spec workflow do not eliminate this problem entirely. But they reduce it significantly, because the agent is not guessing what you want, it is implementing a plan you already validated. When the implementation diverges from the plan, that is visible. When the plan itself is wrong, the review step is where you catch it, not after the code is merged.
What vibecoding actually enabled
The person who built the platform made decisions that made my work significantly easier, even when those decisions created problems I had to fix.
Cursor helped them choose Supabase, which gave me RLS as a foundation to fix instead of no access control at all. It helped them build real-time features, so the architecture already supported the user experience. And it helped them ship fast enough that there were real users on the product when I arrived, which meant the problems I found were real problems, not hypothetical ones.
The four Revert commits in the history are more interesting than they look. They are moments where the AI tried something, it did not work, the builder rolled it back, and tried a different direction. That is not a failure, it is exactly how iterative development works. The difference is that a senior engineer recognizes faster which approaches are patches and which ones are fixes, and skips the iteration cycle on the structural problems.
What vibecoding is not yet good at is the kind of knowledge that comes from having been burned: knowing that in-memory rate limiting fails on serverless, that RLS policies cannot self-reference without recursion, that PostgREST truncates results silently, or that the framework has a convention the generated code is quietly violating.
That knowledge is not in the training data for “write me a rate limiter.” It is in the experience of having debugged a system where the rate limiter did nothing, and having to explain to someone why their working code was not protecting anything.
So where does this leave us
What worked was a sequence: a non-developer using AI to build something real at impressive speed, then an experienced engineer using AI to fix, harden, and extend it. Neither approach alone would have produced the same result. Without the engineering work, the vibecoded version would have stayed in production with critical vulnerabilities and the kind of debt that compounds. Without the vibecoded foundation, the engineering work would have started from scratch and those users would have waited a lot longer.
So yes, vibecoding can work. But it needs a reviewer who has seen enough broken systems to read code before running it, recognize a structural problem from a symptom, and know when a workaround is making the actual fix harder. That reviewer does not have to be the person building. In this case it was not. But they have to exist, they have to have real access to the codebase, and they have to look before shipping to production with real users.
Without that, the speed is real and the vulnerabilities are real and there is nobody to catch the difference.
This article, images or code examples may have been refined, modified, reviewed, or initially created using Generative AI with the help of LM Studio, Ollama and local models.