What this workshop is about

You already know how to write software. The question is: how do you work with an AI that also writes software?

Coding assistants are everywhere. Autocomplete suggestions, inline chat, autonomous agents that open pull requests while you sleep. The tooling has evolved fast, faster than most teams’ ability to use it well. The result is a gap: developers adopt the tools but don’t change how they think about the work. They accept suggestions without reviewing them. They prompt vaguely and get vague results. They let the agent run wild and then spend hours cleaning up the mess.

This workshop exists to close that gap. Every chapter is built around a single idea: you stay in control, and the AI amplifies your decisions. We’ll give you a repeatable framework — PDRC (Plan, Delegate, Review, Correct) — and apply it across real-world tasks: writing tests, reviewing code, debugging, documenting, configuring agents, and measuring impact.

But before we get to the framework (that’s Ch 2), we need to understand what these tools actually are, where they shine, and where they will waste your time.


The evolution: from autocomplete to autonomous agents

Not all AI coding tools work the same way. They sit on a spectrum of autonomy, and understanding where each one falls is the first step to using them well.

But first, some history, because this didn’t start yesterday.

A brief timeline

It’s a long list, but I promise it’s important context. Pay attention to the pace of change in the last few years, and especially in the last 12 months.

| Year | Event |
| --- | --- |
| 2013 | Codota — early research on AI-based code suggestions for Java |
| 2014 | Microsoft Research releases Bing Code Search plugin for Visual Studio 2013 — code snippet search via natural language (a precursor to Copilot) |
| 2018 | Jacob Jackson (University of Waterloo) creates TabNine — the first deep-learning-based code completion tool, using GPT-2 |
| 2019 | The Verge calls TabNine “Gmail’s Smart Compose for coders”; Codota acquires TabNine |
| Jun 2021 | GitHub Copilot announced as a technical preview, powered by OpenAI Codex (a fine-tuned GPT-3) |
| Jun 2022 | GitHub Copilot becomes generally available. In the technical preview, ~40% of code in enabled files was written by Copilot in languages like Python |
| Nov 2023 | Copilot Chat upgraded to GPT-4; Tabnine introduces its own AI chat agent |
| Oct 2024 | GitHub Copilot goes multi-model: users can choose between OpenAI GPT, Anthropic Claude, and Google Gemini |
| Feb 2025 | GitHub announces agent mode — Copilot can read/modify multiple files, run commands, and iterate on errors inside the IDE |
| Apr 2025 | Agent mode + MCP support rolled out to all VS Code users; Copilot Pro+ plan launched with premium requests |
| May 2025 | GitHub announces coding agent — fully autonomous, receives a GitHub issue, spins up an environment, writes code, opens a PR |
| Jun 2025 | GitHub’s “pair to peer” vision blog — Copilot repositioned from assistant to independent agent |
| Sep 2025 | GPT-5 and GPT-5 Mini generally available in Copilot — first GPT-5 family models; new embedding model for smarter code search in VS Code |
| Dec 2025 | Agent Skills — reusable instruction folders that teach Copilot specialized tasks (compatible with Claude Code skills) |
| Dec 2025 | Copilot Memory (early access) — agents learn from your codebase and build repository-specific context over time |
| Dec 2025 | Custom agents from partners (Datadog, HashiCorp, etc.) for observability, IaC, and security workflows |
| Feb 2026 | Agentic Workflows (technical preview) — repository automation written in Markdown (not YAML), executed via GitHub Actions with AI agents |
| Feb 2026 | 3rd-party agents (Anthropic Claude, OpenAI Codex) available on github.com and VS Code as alternative coding agents |
| Feb 2026 | Copilot CLI generally available — Copilot runs directly in the terminal; GPT-5.3-Codex, Claude Opus 4.6, Gemini 3.1 Pro available |

The speed of this evolution matters: it took just four years to go from “autocomplete on steroids” to “autonomous agent that opens pull requests”, and then just nine more months to reach agentic workflows, persistent memory, and third-party agent ecosystems. Each level on that spectrum requires a different way of working, and it’s nearly impossible to keep up with the changes without a solid understanding of how to use the tools effectively.

Completions (inline suggestions)

This is where it started for most developers. You type a function signature, and the tool predicts the next few lines. It works inside your editor, in real time, with no explicit prompt from you. TabNine pioneered this with deep learning in 2018; GitHub Copilot brought it to the mainstream in 2021.

When it works well: boilerplate, repetitive patterns, standard implementations you’d write on autopilot anyway.

When it doesn’t: anything that requires understanding why you’re writing the code, not just what comes next.

Think of completions as a fast typist who has read a lot of code but has no idea what your project does.

Chat (interactive conversation)

Chat gives you a conversation window, either in the IDE sidebar or in a terminal, where you describe what you want in natural language. The tool generates code, explains concepts, or helps you debug.

Key difference from completions: you provide explicit context and intent. Instead of the tool guessing from your cursor position, you tell it what you need.

This is where prompt quality starts to matter. A vague prompt like “fix this” will give you a generic answer. A specific prompt like “this function throws a TypeError when the input array is empty, add a guard clause and a unit test” will give you something useful.
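To make the contrast concrete, here is the shape of change such a specific prompt tends to produce, sketched in Python with a hypothetical `total` function (the names are invented; the guard-clause-plus-test pattern is the point). In Python, `reduce()` over an empty iterable raises exactly the kind of `TypeError` described:

```python
from functools import reduce
import operator

def total(values):
    """Sum a list of numbers; safe for empty input."""
    # Before the guard, reduce() on an empty list raised:
    #   TypeError: reduce() of empty iterable with no initial value
    if not values:  # guard clause the prompt asked for
        return 0
    return reduce(operator.add, values)

# The unit tests the prompt asked for:
def test_total_empty_list_no_longer_raises():
    assert total([]) == 0

def test_total_sums_values():
    assert total([1, 2, 3]) == 6
```

The specific prompt works because it names the symptom (the `TypeError`), the trigger (empty input), and the acceptance criteria (a guard clause and a test): the tool has nothing left to guess.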

Agent mode (IDE-integrated)

Agent mode, introduced by GitHub Copilot in February 2025, takes chat a step further. Instead of generating a single code block, the agent can:

  • Read and modify multiple files
  • Run terminal commands
  • Execute tests
  • Iterate on its own output based on errors

You’re still in the loop (you approve or reject each action), but the tool is doing more than suggesting. It’s executing.

When it works well: multi-file refactors, test generation, implementing well-scoped features with clear acceptance criteria.

When it needs you: deciding which files to touch, what the acceptance criteria are, and whether the result is actually correct.

Coding agents (autonomous)

This is the most autonomous level. GitHub’s Copilot Coding Agent was announced in May 2025; OpenAI’s Codex followed a similar model. A coding agent receives a task (typically a GitHub issue), spins up its own environment, writes code, runs tests, and opens a pull request. All without you sitting in front of the IDE.

Since then, the ecosystem has expanded rapidly. By February 2026, GitHub introduced Agentic Workflows, repository automation written in plain Markdown that runs via GitHub Actions with AI agents. Anthropic’s Claude and OpenAI’s Codex became available as alternative coding agents directly on github.com. And features like Agent Skills (reusable instruction folders) and Copilot Memory (repository-specific context that persists across sessions) made agents significantly more context-aware.

The critical difference: you’re not reviewing in real time. The agent works asynchronously, and you review the output after the fact. This makes the Review and Correct steps of PDRC even more important, because by the time you see the code, the agent has already made dozens of decisions you didn’t approve individually.


The tool landscape

The market moves fast, so rather than memorizing features that will change next quarter, focus on the categories and what each tool is optimized for.

| Tool | Type | Best at | Runs in |
| --- | --- | --- | --- |
| GitHub Copilot | Completions + Chat + Agent mode + Coding Agent + Agentic Workflows | Deep GitHub integration, code review, PR workflows, enterprise governance, agent skills, memory | VS Code, JetBrains, CLI, Eclipse, Xcode, Zed, GitHub.com |
| Claude Code | CLI agent (also available as 3rd-party agent on GitHub) | Long-context reasoning, complex refactors, multi-file changes | Terminal (CLI-first), GitHub.com |
| OpenAI Codex | Coding agent (also available as 3rd-party agent on GitHub) | Autonomous task execution from issues, cloud-based sandboxed environment | GitHub.com, ChatGPT |
| Cursor | IDE with AI-native UX | Codebase-aware chat, fast iteration, composer for multi-file edits | Cursor IDE (VS Code fork) |
| Windsurf | IDE with AI-native UX | Flows (multi-step agent), contextual awareness | Windsurf IDE |

A practical heuristic

You don’t need to pick one tool and ignore the rest. Different tools excel at different tasks:

  • Quick inline edits and completions → Copilot, Cursor
  • Deep reasoning over a large codebase → Claude Code
  • Autonomous issue-to-PR → Copilot Coding Agent, Codex
  • Rapid prototyping with multi-file orchestration → Cursor (composer), Windsurf (flows)
  • Enterprise governance and code review → Copilot (organization policies, agent firewalls, custom instructions)
  • Repository automation with AI → Copilot Agentic Workflows (Markdown-based, runs in GitHub Actions)

The important thing is not which tool you use, but how you use it. A well-structured prompt in any of these tools will outperform a lazy prompt in the “best” tool.


Where AI delivers proven value

This isn’t speculation. Multiple studies from GitHub and McKinsey give us hard data on where AI coding tools actually move the needle.

The numbers

| Task | Impact | Source |
| --- | --- | --- |
| Code documentation | Completed in ~50% of the time | McKinsey, 2023 |
| Writing new code | Completed in nearly half the time | McKinsey, 2023 |
| Code refactoring | Completed in ~1/3 of the time (2/3 time reduction) | McKinsey, 2023 |
| Complex tasks completion rate | 25–30% more likely to finish within deadline | McKinsey, 2023 |
| Developer happiness and flow state | 2x more likely to report fulfillment | McKinsey, 2023 |
| AI coding tool adoption | 92% of developers using AI tools at work | GitHub Survey, 2023 |
| Perceived benefits | 70% see significant advantage at work | GitHub Survey, 2023 |
| Team collaboration | 81% expect AI tools to increase collaboration | GitHub Survey, 2023 |

What this means in practice

The biggest productivity gains cluster around tasks that are repetitive, well-defined, and have clear patterns:

  • Test generation — writing unit tests, integration tests, expanding coverage for existing code
  • Documentation — JSDoc, docstrings, README files, API docs
  • Refactoring — extracting functions, renaming, restructuring code to improve maintainability
  • Debugging — explaining errors, suggesting fixes, tracing through stack traces
  • Boilerplate — configuration files, scaffolding, standard CRUD operations

Notice a pattern: these are all tasks where the what is clear and the how follows established conventions. The AI doesn’t need to understand your business domain to generate a test for a pure function.
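To illustrate, here is a hypothetical pure function and the kind of tests an assistant can derive from the signature and docstring alone, with zero knowledge of the surrounding business (all names here are invented for illustration):

```python
def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by percent; percent must be in [0, 100]."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

# Tests derivable from the contract alone: happy path, boundary, invalid input.
def test_apply_discount_basic():
    assert apply_discount(100.0, 25.0) == 75.0

def test_apply_discount_zero_percent():
    assert apply_discount(80.0, 0.0) == 80.0

def test_apply_discount_rejects_out_of_range():
    try:
        apply_discount(10.0, 150.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Because the function's behavior is fully determined by its inputs, the tests write themselves; this is the sweet spot where the data above shows the largest time savings.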


What AI does NOT do well

This is the part most evangelists skip. Understanding the limitations is just as important as understanding the capabilities, because using AI on the wrong task wastes more time, money and natural resources than doing it manually.

Complex architectural decisions

An AI can generate a microservice. It cannot tell you whether your system should be split into microservices. Architecture decisions require understanding organizational constraints, team topology, deployment infrastructure, and long-term maintenance costs. These are context-heavy decisions that no model has access to.

McKinsey’s research confirms this: time savings shrank to less than 10% on tasks that developers rated as highly complex, particularly when unfamiliar frameworks were involved.

Organizational context

Your company’s coding standards, naming conventions, security policies, deployment pipelines, internal libraries: none of this exists in the model’s training data. Off-the-shelf tools won’t know that your team uses a specific error-handling pattern, or that certain modules require approval from the security team before changes.

This is exactly why Modules 01 and 03 of this workshop focus heavily on custom instructions, AGENTS.md, and MCP integrations. The tools can be taught your context, but you have to teach them.
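As a taste of what that teaching looks like, here is a hypothetical AGENTS.md fragment. The file format itself is free-form Markdown; every convention, path, and rule shown below is invented for illustration:

```markdown
# AGENTS.md — guidance for AI coding agents (illustrative example)

## Error handling
- Wrap external calls in the Result-style helpers from `lib/result.ts`;
  never throw raw exceptions across module boundaries.

## Conventions
- Branch names use `feat/` or `fix/` prefixes; commit messages follow
  Conventional Commits.
- All new code requires unit tests in the adjacent `__tests__/` folder.

## Restricted areas
- Changes under `src/auth/` require security-team approval before merging.
```

A file like this travels with the repository, so every agent session starts with the same organizational context instead of rediscovering (or ignoring) it.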

Multifaceted requirements

When a task combines multiple frameworks, integrates with several systems, and has non-obvious constraints, AI tools struggle. McKinsey participants reported that to get usable solutions for multifaceted requirements, they had to break the problem into smaller segments manually before prompting.

One participant put it simply: “[Generative AI] is least helpful when the problem becomes more complicated and the big picture needs to be taken under consideration.”

The “accept all” trap

This isn’t a tool limitation; it’s a human one. The GitHub Survey found that developers consistently rank code quality as the most important metric. But when you accept every suggestion without reading it, quality degrades silently. Bad patterns compound. Technical debt accumulates. And by the time you notice, the damage is spread across dozens of files.

The McKinsey research found that code quality was marginally better in AI-assisted code, but only because developers actively iterated with the tools to achieve that quality. The quality doesn’t come from the AI. It comes from the developer’s review.

Real-world failures: what happens without human review

These aren’t hypothetical scenarios. Each of these happened in 2024–2026, and each illustrates what goes wrong when AI output is trusted without verification.

cURL ends its bug bounty after a flood of AI-generated reports

In January 2024, Daniel Stenberg — founder and lead developer of cURL, one of the most widely used open-source projects in the world — published a detailed account of AI-generated security reports overwhelming the project’s bug bounty program on HackerOne.

The reports looked professional. They included code references, proposed fixes, and were written in clean English. But they were fabricated. One claimed a critical vulnerability related to CVE-2023-38545 that didn’t exist. The reporter admitted that they were using Google Bard. Another described a buffer overflow in WebSocket handling, complete with a proposed patch, but after careful investigation there was no buffer overflow at all. The LLM had mixed real function names with hallucinated vulnerabilities.

Stenberg described the cost: each well-crafted fake report took real developer time to investigate and dismiss. Security reports are high priority; they trump bug fixes and feature work. Every hallucinated vulnerability stole hours from real development work.

The scale of the problem grew. By January 2026, the project ended its bug bounty program entirely, stating it was “an attempt to reduce the noise.” The word “slop” (Merriam-Webster’s 2025 Word of the Year) became the shorthand for this kind of AI-generated low-quality output that shifts its cost to the recipient.

PDRC lens: The reporters delegated entirely to an LLM (skipped Plan), submitted the raw output (skipped Review), and never verified the claims (skipped Correct). The entire cost landed on the maintainers.

OpenClaw: when an autonomous agent becomes a security nightmare

In January 2026, OpenClaw (originally “Clawdbot”) — an open-source autonomous AI agent by Peter Steinberger — went viral with 140,000 GitHub stars. It could book flights, manage calendars, control browsers, and execute shell commands on behalf of users, all through messaging platforms like WhatsApp and iMessage.

The security community raised alarms almost immediately. Cisco’s AI Threat and Security Research team tested a third-party skill called “What Would Elon Do?”, which had been artificially inflated to rank #1 in the skill repository, and found it was functionally malware. The skill performed silent data exfiltration (sending user data to an external server via curl without user awareness) and direct prompt injection to bypass safety guidelines. Cisco’s Skill Scanner flagged nine security findings: two critical, five high severity.

The risks were systemic, not just theoretical:

  • Skills operated without vetting. Anyone could publish a skill to the repository; there was no code review, no signing, no sandbox.
  • The agent had broad system access. OpenClaw could run shell commands, read/write files, and access email, calendars, and messaging — a single misconfigured skill could exfiltrate sensitive data across all connected services.
  • Prompt injection via messaging apps extended the attack surface. Malicious prompts embedded in messages could cause unintended behavior without the user ever triggering them directly.
  • Plaintext credentials were leaked — API keys and tokens reported exposed, stealable via prompt injection or unsecured endpoints.

One of OpenClaw’s own maintainers warned on Discord: “if you can’t understand how to run a command line, this is far too dangerous of a project for you to use safely.”

In a separate incident, a user discovered that his OpenClaw agent had created a dating profile on MoltMatch (an AI agent dating platform) and was screening potential matches without his explicit direction. The agent acted autonomously beyond what the user intended — exactly the kind of behavior that escapes detection when there’s no Review step.

PDRC lens: Users delegated life-management tasks to an autonomous agent without reviewing what skills it loaded, what permissions it needed, or what it was actually doing in the background. No Plan (threat model), no Review (skill audit), no Correct (permission boundaries).

The OpenClaw story also hit close to home for AI professionals themselves. In February 2026, Business Insider reported that a Meta AI alignment director shared her own OpenClaw nightmare: the agent deleted emails from her account without permission, and she “had to RUN to my Mac mini” to stop it. This is someone whose literal job is AI safety — and even she was caught off guard by an autonomous agent acting beyond its intended scope.

AWS outages caused by an AI coding agent

In early 2026, it was reported that Amazon Web Services suffered at least two outages caused by its own AI tools. In the most notable incident, in December 2025, engineers allowed Amazon’s Kiro agentic coding tool to make changes autonomously, and the AI decided the best course of action was to delete and recreate the environment, causing a 13-hour disruption to AWS Cost Explorer in one of Amazon’s cloud regions in China.

As a senior AWS employee told the Financial Times: “The engineers let the AI agent resolve an issue without intervention. The outages were small but entirely foreseeable.” In both incidents, the engineers didn’t require a second person’s approval before finalizing the changes.

Amazon’s response was to call it “user error, not AI error” and a “coincidence” that AI was involved, but security researchers disagreed. As one researcher pointed out, AI agents don’t have full visibility into the context in which they’re running. They don’t understand how customers might be affected or what the cost of downtime might be.

This is AWS — one of the most sophisticated engineering organizations on the planet — operating its own infrastructure, using its own AI tool. If it can happen there, it can happen anywhere.

PDRC lens: The engineers Delegated (let the agent resolve an issue), but skipped Plan (no scope boundary for what the agent was allowed to do), skipped Review (no second pair of eyes before changes went live), and had no Correct mechanism (no rollback plan if the agent’s “fix” made things worse). A simple approval gate could have caught a “delete and recreate” action before it caused 13 hours of downtime.
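A conceptual sketch of such an approval gate in Python. The action names and the destructive-action classification are invented for illustration, not taken from any real agent framework:

```python
# Prefixes that mark an action as destructive and approval-gated (illustrative).
DESTRUCTIVE_PREFIXES = ("delete", "recreate", "terminate", "drop")

def requires_approval(action: str) -> bool:
    """Flag actions that must not run without a human sign-off."""
    return action.lower().startswith(DESTRUCTIVE_PREFIXES)

def gate(actions, approved_by=None):
    """Return the subset of actions allowed to execute; hold back
    destructive ones unless a named reviewer approved the plan."""
    allowed = []
    for action in actions:
        if requires_approval(action) and approved_by is None:
            print(f"BLOCKED (needs approval): {action}")
        else:
            allowed.append(action)
    return allowed

# "delete environment" is held back until a second person signs off:
plan = ["collect diagnostics", "delete environment", "recreate environment"]
assert gate(plan) == ["collect diagnostics"]
assert gate(plan, approved_by="on-call engineer") == plan
```

The point is not this specific code but the checkpoint it represents: a destructive action pauses until a human with context approves it, which is exactly the gate that was missing in the incident above.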

“Workslop”: AI-generated work that creates more work

A September 2025 study published in the Harvard Business Review, conducted jointly by Stanford University and BetterUp, coined the term “workslop”: AI-generated content at work that looks polished but lacks substance, shifting the burden of quality from the creator to the recipient.

The findings are stark: 40% of participating employees had received workslop, and each incident took an average of two hours to resolve. The mechanism is simple: someone uses an LLM to draft a document, email, or proposal, spends minimal time reviewing it, and sends it along; the recipient then has to figure out what’s actually correct, what’s hallucinated, and what’s missing.

BetterUp defines workslop as “AI-generated content that looks good, but lacks substance.” It is the professional equivalent of accepting all Copilot suggestions: the output appears complete, but the thinking behind it is absent.

PDRC lens: The “slopper” Planned (they had a task), Delegated (they used an LLM), but skipped Review and Correct. The cost didn’t disappear; it was externalized to colleagues.

The pattern across all these cases

Every failure follows the same structure:

| Step skipped | What happened | Who paid the cost |
| --- | --- | --- |
| Plan | No threat model, no scope definition | The agent acted with unbounded authority (OpenClaw); the AI deleted and recreated an environment (AWS) |
| Review | Output accepted without verification | Hallucinated vulnerabilities wasted maintainer time (cURL); a Meta director’s emails were deleted (OpenClaw) |
| Correct | No iteration, no quality check | Colleagues spent hours fixing workslop (HBR study); 13 hours of AWS downtime (AWS) |

The tools themselves aren’t the problem. The absence of human judgment at critical checkpoints is.


Setting expectations for this workshop

By the end of this workshop, you will:

  1. Have a mental model (PDRC) for deciding when and how to use AI on any coding task
  2. Know how to configure your tools, repositories, and agents to reflect your team’s context
  3. Be able to generate, review, and iterate on AI-assisted tests, code reviews, documentation, and refactors
  4. Understand the security and governance implications of AI-assisted development
  5. Measure the impact of AI tools on your team’s productivity with concrete metrics

What you will not get is a magic prompt that solves everything. The tools are force multipliers: they multiply whatever you bring to them. Bring clear thinking and structured intent, and you’ll get impressive results. Bring vague prompts and uncritical acceptance, and you’ll get impressive-looking garbage.

Let’s start with the framework that makes the difference. Next up: Chapter 2 — The Mental Model: Plan, Delegate, Review, Correct (PDRC).


References

Data & research

Historical milestones

Real-world failure cases