ADNT Experience Report: TDD/BDD on Raspberry Pi Pico with AI Assistance
By ADNT Sàrl & Florian Mahon (florian.mahon@adnt.io) (firmware engineering / embedded systems — RP2040)
Over the past few months, I've been working on a bootloader in Rust for the RP2040. But the real challenge wasn't the bootloader itself — it was figuring out how to use AI agents in my day-to-day embedded work without things going sideways. Because the traps are everywhere: the AI generates too much code, builds things you didn't ask for, and before you know it, you're stuck refactoring instead of shipping.
I tried the usual approaches first — detailed ticket workflows, exhaustive specs. Every time, same story: tickets pile up, codebase balloons, you lose sight of the goal… and you end up with technical debt that slows you down more than the AI speeds you up. I was spending my time babysitting the agent instead of delivering features.
So I changed my approach. I decided to work as if I were pair-programming with a colleague, following a simple agile discipline:
small steps, testable goals, tests first, short branches, review, integrate.
This article is about sharing what I learned. The project is open source, downloadable, and testable on a Raspberry Pi Pico — you can try it yourself.
To be clear: the point was never to "build a bootloader" for fun. The bootloader is a technical building block serving something bigger: a platform for TDD/BDD firmware on real hardware, with the same fluidity you get in regular software development.
The first building block is Crispy-bootloader: an A/B bootloader for RP2040 that lets you iterate on firmware like you iterate on software, with hardware as the ultimate judge. It opens the door to BDD scenarios on board, boot sequence non-regressions, integration tests on real behavior, and eventually, an automated test farm.
1. Minimal Roadmap: Three Requirements
Before writing a single line of code, I set three requirements:
- Dual A/B bank so we can roll back if something breaks.
- Firmware execution in RAM for low jitter and more deterministic behavior.
- Respect the RP2040 boot sequence (ROM → boot2 → application).
On the RP2040, a second-stage bootloader (boot2) runs before the application to initialize QSPI flash access (XIP) [4]. Crispy slots in right after: it picks a bank, copies the firmware to RAM, and hands over execution.
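To make the rollback requirement concrete, here is a hypothetical, host-runnable sketch of an A/B selection policy. The header fields and the rules are illustrative assumptions, not Crispy's actual on-flash format:

```rust
// Hypothetical A/B bank selection policy. Fields and rules are
// illustrative, not Crispy's actual data structures.

#[derive(Clone, Copy)]
struct BankHeader {
    valid: bool,       // image passed integrity check (e.g. CRC)
    version: u32,      // monotonically increasing update counter
    boot_failed: bool, // previous boot from this bank did not complete
}

#[derive(Debug, PartialEq)]
enum Bank {
    A,
    B,
}

/// Pick the bank to boot: prefer the newest valid image, and fall back
/// to the other bank if the last boot from the preferred one failed.
fn select_bank(a: BankHeader, b: BankHeader) -> Option<Bank> {
    let a_ok = a.valid && !a.boot_failed;
    let b_ok = b.valid && !b.boot_failed;
    match (a_ok, b_ok) {
        (true, true) => Some(if a.version >= b.version { Bank::A } else { Bank::B }),
        (true, false) => Some(Bank::A),
        (false, true) => Some(Bank::B),
        (false, false) => None, // nothing bootable: stay in the bootloader
    }
}

fn main() {
    let a = BankHeader { valid: true, version: 2, boot_failed: true };
    let b = BankHeader { valid: true, version: 1, boot_failed: false };
    // A is newer, but its last boot failed, so we roll back to B.
    println!("{:?}", select_bank(a, b)); // prints Some(B)
}
```

The point of keeping this a pure function is that the rollback policy itself becomes unit-testable on the host, before any hardware is involved.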
2. From LED Blink to Instrumentation
Like any self-respecting embedded project, I started by blinking an LED. Rust base, VSCode + Pico SDK tooling, compile, flash, blink. Great — the environment works. But it's not enough: that workflow needs me glued to the screen the whole time.
So I quickly moved to something more automatable:
- logs via defmt (and RTT),
- scripts to drive execution,
- and as soon as possible, integration tests.
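Once logs flow over RTT, validation can be scripted. As a hedged illustration (the marker strings and function are hypothetical, not the project's actual scripts), a host-side check can simply verify that the expected log markers appear in order:

```rust
// Hypothetical host-side check: scan captured RTT/defmt log output for
// the markers a test scenario expects, in order. Marker names are
// illustrative only.

fn logs_contain_in_order(log: &str, markers: &[&str]) -> bool {
    let mut rest = log;
    for m in markers {
        match rest.find(m) {
            Some(i) => rest = &rest[i + m.len()..],
            None => return false,
        }
    }
    true
}

fn main() {
    let captured = "\
boot: bank A selected
boot: copy to RAM done
app: main entered";
    assert!(logs_contain_in_order(
        captured,
        &["bank A selected", "copy to RAM done", "app: main entered"],
    ));
    assert!(!logs_contain_in_order(captured, &["bank B selected"]));
    println!("log checks passed");
}
```

This is the smallest useful step from "watching the screen" to an automatable pass/fail signal.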
That's when the AI started being genuinely useful. I wasn't throwing prompts into the void anymore — I was giving it real context, and most importantly, access to the debug chain (probe + probe-rs/GDB) so we could observe the actual system state together [3].
From there, progress happened in short iterations:
- Minimal boot + minimal firmware to validate that it starts.
- BDD tests to verify behavior.
- An upload tool and a communication protocol.
- Extend the bootloader — just enough to support those tests.
- Clean up, consolidate, and go again.
3. Minimal Chain: Bootloader + Firmware
My earlier attempts had taught me one thing: let the AI run wild and you'll end up with a huge, unmanageable codebase. So I deliberately broke things into small complete steps — each covering the necessary layers, but testable at every increment.
The first step was getting:
- a minimal bootloader,
- a minimal Rust firmware,
- observable feedback (defmt/RTT),
- and scripts to validate everything works.
This had two immediate effects:
- I quickly had a testable system from my workstation;
- the AI stopped being a "code generator" and became a real discussion partner, grounded by measurable results.
From that point on, no more three-paragraph prompts. The test loop spoke for itself, and the AI could suggest changes I'd validate on target right away.
4. Tooling and Consolidation
Once the minimal chain was solid, I built the tools to make the build → upload → boot → validation loop reproducible:
A CLI (crispy-upload) to:
- push binaries to the banks,
- configure the bootloader,
- and automate tests.
Protocol consolidation:
- added integrity checking (CRC) on firmware images,
- formalized the protocol,
- and built an equivalent Python library for certain scenarios.
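To illustrate the integrity check, here is a sketch using CRC-32 (IEEE), computed bitwise so it stays self-contained; the protocol's actual polynomial and framing may differ:

```rust
// Illustrative firmware integrity check using CRC-32/ISO-HDLC (the
// common "CRC-32"), bit-by-bit, no lookup table. The real protocol's
// exact CRC variant and framing may differ.

fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            if crc & 1 != 0 {
                crc = (crc >> 1) ^ 0xEDB8_8320; // reflected polynomial
            } else {
                crc >>= 1;
            }
        }
    }
    !crc
}

/// The uploader appends the CRC; the bootloader recomputes it and
/// compares before marking a bank valid.
fn verify_image(payload: &[u8], expected: u32) -> bool {
    crc32(payload) == expected
}

fn main() {
    let image = b"firmware-image";
    let crc = crc32(image);
    assert!(verify_image(image, crc));
    assert!(!verify_image(b"corrupted-image", crc));
    println!("crc32 = {crc:#010x}");
}
```

Having the same CRC on both sides of the protocol (Rust tool and Python library) is what turns "upload succeeded" into a checkable claim.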
At this point, we had a path that was truly minimal, stable, and testable: bootloader → copy to RAM → execute → observable behavior. The Rust firmware served as a reference: small, deterministic, perfect for stabilizing the chain without the complexity of a full SDK.
5. Refactoring and Debug with probe-rs
With tooling in place and solid test coverage, I could refactor with confidence:
- shared structures between firmware and bootloader,
- common libraries,
- general code cleanup.
In parallel, I set up a reliable debug environment in VSCode with probe-rs.
Unexpected bonus: contributing back to probe-rs
This phase threw me a curveball: the firmware worked fine (tests proved it), but debugging kept crashing. Frustrating — the target is healthy, but the tooling can't keep up.
First issue: by combining tests and instrumentation, we tracked down a bug in probe-rs itself. Working with the AI on a minimal repro, we ended up submitting an upstream fix (PR [6]). That stabilized the workflow.
Second issue: even after the fix, debugging firmware running from RAM was a pain. Hardware breakpoints wouldn't stick (a known Cortex-M0+ limitation: its breakpoint unit only matches addresses in the code region, not in RAM). With the AI's help, we implemented software breakpoints in a modified probe-rs [2]. It's a lightweight implementation — not submitted upstream — but it unblocked the workflow and seriously sped up iterations.
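The mechanism behind software breakpoints is simple to model: save the original 16-bit Thumb instruction at the target address and overwrite it with a BKPT opcode, restoring it on removal. The sketch below is a conceptual host-side model ("RAM" is just a byte buffer); the actual probe-rs modification works through the debug transport:

```rust
// Conceptual model of a software breakpoint on Cortex-M0+: save the
// original 16-bit Thumb instruction and overwrite it with BKPT #0.
// "RAM" here is a plain byte buffer standing in for target memory.

const BKPT_0: u16 = 0xBE00; // Thumb encoding of BKPT #0

struct SoftBreakpoint {
    addr: usize, // byte offset of the patched instruction
    saved: u16,  // original instruction, restored on removal
}

fn set_breakpoint(ram: &mut [u8], addr: usize) -> SoftBreakpoint {
    let saved = u16::from_le_bytes([ram[addr], ram[addr + 1]]);
    ram[addr..addr + 2].copy_from_slice(&BKPT_0.to_le_bytes());
    SoftBreakpoint { addr, saved }
}

fn clear_breakpoint(ram: &mut [u8], bp: &SoftBreakpoint) {
    ram[bp.addr..bp.addr + 2].copy_from_slice(&bp.saved.to_le_bytes());
}

fn main() {
    // Fake RAM holding two 16-bit instructions: BX LR; NOP.
    let mut ram = vec![0x70, 0x47, 0x00, 0xBF];
    let bp = set_breakpoint(&mut ram, 0);
    assert_eq!(&ram[0..2], &[0x00, 0xBE]); // BKPT #0, little-endian
    clear_breakpoint(&mut ram, &bp);
    assert_eq!(&ram[0..2], &[0x70, 0x47]); // original restored
    println!("breakpoint set/clear round-trip ok");
}
```

The trade-off versus hardware breakpoints: the debugger must faithfully restore the patched instruction (including when stepping over it), which is exactly why this needs solid tests of its own.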
Takeaway: even dev tools deserve proper investigation. The same rigorous approach (tests + repro + instrumentation) works just as well when the problem is in your tooling.
6. From Rust to C++: Cross-Validation
With a stable, debuggable base in hand, the next step was verifying compatibility with the Pico SDK C++ ecosystem. The rule was simple:
the SDK stays standard — no patches. We're not maintaining an SDK fork just for our bootloader.
The existing tests proved their worth immediately: they revealed a hardware initialization issue (PLL/timers/clocks) where the bootloader was leaving hardware in a state the Pico SDK didn't expect.
Same method as always:
- tests → repro,
- instrumentation → observation,
- AI → targeted hypotheses and experiments,
- fix → non-regression.
In less than a day, we had a minimal C++ firmware running on the standard SDK.
7. Service Architecture
Then came a major refactoring: structuring the bootloader into "services". This wasn't about architecture for architecture's sake — it was a practical need.
- Determinism: make the main loop explicit and predictable.
- Readability: each responsibility in its own module.
```rust
let services = [
    ServiceType::UsbTransport(UsbTransportService::new()),
    ServiceType::Trigger(TriggerCheckService::new()),
    ServiceType::Update(UpdateService::new()),
    ServiceType::Led(LedBlinkService::new()),
];
```
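A minimal, host-runnable sketch of the idea behind this array: each service exposes a non-blocking `poll()`, driven round-robin by an explicit main loop, which is what keeps scheduling deterministic. The names mirror the snippet above, but the methods and fields are illustrative, not Crispy's exact API:

```rust
// Illustrative service-loop sketch: enum dispatch (no trait objects,
// no allocation) with a non-blocking poll() per service, as you would
// write it in a no_std bootloader. Fields and behavior are placeholders.

struct LedBlinkService {
    ticks: u32,
}
impl LedBlinkService {
    fn new() -> Self {
        Self { ticks: 0 }
    }
    fn poll(&mut self) {
        // On hardware: toggle the LED every N ticks.
        self.ticks += 1;
    }
}

struct TriggerCheckService;
impl TriggerCheckService {
    fn new() -> Self {
        Self
    }
    fn poll(&mut self) {
        // On hardware: check for an update request (pin, magic value, ...).
    }
}

enum ServiceType {
    Led(LedBlinkService),
    Trigger(TriggerCheckService),
}

impl ServiceType {
    fn poll(&mut self) {
        match self {
            ServiceType::Led(s) => s.poll(),
            ServiceType::Trigger(s) => s.poll(),
        }
    }
}

fn main() {
    let mut services = [
        ServiceType::Led(LedBlinkService::new()),
        ServiceType::Trigger(TriggerCheckService::new()),
    ];
    // Explicit, predictable main loop: every service is polled each pass.
    for _ in 0..100 {
        for s in services.iter_mut() {
            s.poll();
        }
    }
}
```

Enum dispatch over a fixed array (rather than boxed trait objects) fits a bootloader: no heap, a closed set of services, and a loop whose worst-case behavior you can reason about.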
This restructuring also surfaced a need we hadn't anticipated: a RAM staging area during upload, before writing to flash. Makes sense when you think about it — flash constraints (XIP, interrupts, multicore) documented in the Pico SDK [4] pretty much demand it.
8. Making AI Useful Without Losing Control
If I had to boil this project down to one lesson, it'd be this:
AI becomes useful when it's constrained by tests and fed with real data.
Short prompts, real data
Over the course of the project, I landed on a prompt format that works every time: measurable facts instead of long explanations.
Instead of writing three paragraphs of context, I give:
- Goal: what I want to achieve (one sentence)
- Repro: 4 to 6 steps to reproduce the problem on target
- Observed: what happens vs. what should happen, with logs
- Instrumentation: probe-rs / GDB data (PC, SP, registers, memory dump)
- Ask: 3 ranked hypotheses, each with a minimal test
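Concretely, a prompt in this format looks something like the skeleton below. Every field is a placeholder, not a real session:

```text
Goal: <one sentence: what should work after this change>

Repro:
1. <build/flash command>
2. <action on target>
3. <observation step>
4. <where behavior diverges>

Observed: <what happens> / Expected: <what should happen>
Logs: <relevant defmt/RTT excerpt>

Instrumentation: PC=<...> SP=<...> <registers / memory dump excerpt>

Ask: propose 3 ranked hypotheses, each with a minimal test to
confirm or eliminate it on target.
```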
This format didn't appear overnight — it emerged naturally. But once it clicked, collaboration became truly methodical: hypotheses → experiments → validation on target → non-regression.
Conclusion
Crispy isn't just "an A/B bootloader". It's the foundation for TDD/BDD firmware on real hardware, with a fast and reliable iteration loop — like regular software development, but with hardware truth.
What this project really taught me isn't a technical trick. It's a method that keeps you effective — even when you're working with an AI agent.
Five tips for using AI in embedded development
- Build an end-to-end testable chain from day one. Before expanding scope, establish the full flow: build → upload → boot → validate on target. This backbone makes every iteration measurable and keeps drift in check.
- Give the AI something to test, not just something to generate. AI is most useful when it can work from facts: test outputs, logs, measurements, observable states. The more concrete the context, the better its suggestions.
- Feed your prompts with real data. No need for walls of text. A failing scenario, repro steps, logs, instrumentation results — that's enough. And it converges much faster.
- Keep a structured development flow. Short branches, systematic reviews, frequent integration. AI speeds up production, but without discipline, speed turns into technical debt.
- Treat your tests as a first-class deliverable. Tests are the source of truth. If they're flaky or incomplete, everything else crumbles — including the AI's contributions. Invest in reliable, reproducible, behavior-driven tests on target.
What's next?
This project let me validate an agile approach tailored to embedded, and integrate AI into a workflow that actually delivers. The chain is in place: bootloader, firmware, tools, tests, debug.
With these foundations, we can take on more complexity — new firmwares, new building blocks, continuous integration on hardware — while keeping the AI as an accelerator, without losing control.
Next up: hardening and industrializing all of this. Runners, broader integration tests, and extending to fast communication buses.
References
1. Crispy repo (README, structure, Quick Start, tests, AI philosophy): https://github.com/ADNTIO/crispy-bootloader-rp2040-rs
2. crispy-upload documentation (note on modified probe-rs + software breakpoints, install via `make install-probe-rs`): https://docs.rs/crate/crispy-upload/0.2.0
3. probe-rs (debug toolchain: CLI/DAP/GDB server, etc.): https://github.com/probe-rs/probe-rs and https://probe.rs/docs/
4. Pico SDK docs, hardware_flash (XIP/IRQ/cores constraints, PICO_NO_FLASH mention): https://www.raspberrypi.com/documentation/pico-sdk/hardware.html
5. rp2040-boot2 (second-stage bootloader: boot2 and image placement): https://github.com/rp-rs/rp2040-boot2
6. probe-rs PR (fix related to PicoProbe / debug workflow): https://github.com/probe-rs/probe-rs/pull/3810
7. probe-rs fork (modified version with software breakpoints for RAM firmware debug): https://github.com/fmahon/probe-rs