
ADNT Experience Report: TDD/BDD on Raspberry Pi Pico with AI Assistance

By ADNT Sàrl & Florian Mahon (florian.mahon@adnt.io) (firmware engineering / embedded systems — RP2040)


Over the past several months, I've been developing a bootloader in Rust for the RP2040 platform and, more importantly, seeking to integrate AI agents into my development workflow without falling into the classic pitfalls: verbosity, code over-production, "off-topic" implementations, and perpetual refactoring.

I experimented with several approaches (including highly detailed "tickets/stories" workflows and more "exhaustive specification" methods). Each time, I observed the same drift: ticket accumulation, rapid code growth, difficulty maintaining a clear direction... then technical debt that was hard to resolve. In the end, I spent more time "piloting the agent" and searching for areas to refactor than delivering useful features.

For this project, I deliberately changed my approach: I proceeded as if I were developing with a small team (two people), applying a simple, disciplined Agile method:

small steps, testable objectives, tests first, short branches, review, integration.

The goal of this article is to share this approach, its results, and practical advice for integrating AI into embedded development without losing rigor. The project is public, downloadable, and testable on a Raspberry Pi Pico.

The goal wasn't to "build a bootloader" for fun. The bootloader is a technological building block serving a larger objective: building a platform enabling TDD/BDD firmware on real hardware, with the same agility as classic software development but validated on hardware.

The first building block is Crispy-bootloader, an A/B bootloader for RP2040 that enables firmware iteration like software iteration, with hardware truth. It paves the way for BDD scenarios on board, non-regressions on boot sequences, integration tests validating real behaviors, and eventually, an automated test farm.


1. Minimal Roadmap: Three Requirements

Before even coding, I drafted a minimal roadmap, guided by three requirements:

  1. Dual A/B bank to enable rollback.
  2. Firmware execution in RAM to maintain low jitter and more deterministic behavior.
  3. Respect for the boot sequence imposed by the RP2040 (ROM → boot2 → application).

On the RP2040, a second stage bootloader (boot2) is executed before the application to initialize QSPI flash access (XIP) [4][5]. Crispy inserts itself there, chooses a bank, copies the firmware to RAM, and transfers execution.

flowchart TD
    A[RESET] --> B[Boot ROM]
    B --> C[boot2]
    C --> D[Crispy Bootloader]
    D --> E[Firmware in RAM]
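To make the A/B requirement concrete, here is a minimal host-side sketch of the kind of bank-selection logic such a bootloader needs. The `BankMeta` struct, its field names, and the rollback policy are illustrative assumptions, not Crispy's actual data layout:

```rust
/// Illustrative per-bank metadata (names and policy are assumptions,
/// not Crispy's real layout).
#[derive(Clone, Copy, Debug)]
struct BankMeta {
    valid: bool,        // image passed its integrity check (e.g. CRC)
    version: u32,       // monotonically increasing firmware version
    boot_attempts: u8,  // failed boot attempts since the last success
}

const MAX_ATTEMPTS: u8 = 3;

/// Choose which bank to boot: prefer the newest valid image,
/// but roll back to the other bank after too many failed boots.
fn choose_bank(a: BankMeta, b: BankMeta) -> Option<usize> {
    let bootable = |m: &BankMeta| m.valid && m.boot_attempts < MAX_ATTEMPTS;
    match (bootable(&a), bootable(&b)) {
        (true, true) => Some(if a.version >= b.version { 0 } else { 1 }),
        (true, false) => Some(0),
        (false, true) => Some(1),
        (false, false) => None, // no bootable image: stay in the bootloader
    }
}

fn main() {
    let a = BankMeta { valid: true, version: 7, boot_attempts: 3 }; // exhausted
    let b = BankMeta { valid: true, version: 6, boot_attempts: 0 };
    // The newer bank A has exhausted its attempts, so we roll back to B.
    assert_eq!(choose_bank(a, b), Some(1));
    println!("booting bank {:?}", choose_bank(a, b));
}
```

The point of the sketch is that rollback is a pure decision over a few bytes of metadata, which makes it easy to unit-test off-target before ever touching the hardware.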

I started with a Rust base from an initial project (generated by the VS Code Pico SDK tooling), compiled a first version, and validated a simple behavior (LED blink). This was useful for validating the environment, but insufficient: this workflow required my constant presence in the loop.

I therefore quickly switched to a more automatable loop:

  • logs via defmt (and RTT),
  • scripts to drive execution,
  • and, as soon as possible, integration tests.

From this stage, when I solicited the AI, I provided it with exploitable context and especially access to a debug chain (probe + probe-rs/GDB) to observe the real system state and confirm/refute hypotheses [3].

2. Short Iterations

Progress was made in short iterations:

  1. Implement minimal boot and minimal firmware to validate the boot.
  2. Set up BDD tests to validate behavior.
  3. Implement an upload tool and minimal communication protocol.
  4. Extend the bootloader to the strict minimum needed to support these tests.
  5. Consolidate / clean / refactor, then iterate.

3. Minimal Chain: Bootloader + Firmware

My previous attempts had taught me one thing: letting the AI "go too far" quickly leads to a large, difficult-to-maintain codebase. I therefore deliberately broke things down into small complete steps, covering the necessary layers, but testable at each increment (unit + integration/BDD).

The first step consisted of obtaining:

  • a minimal bootloader,
  • a minimal Rust firmware,
  • observable feedback (defmt/RTT),
  • and scripts to validate behavior.

This choice had two immediate effects:

  1. I quickly had a testable system from a dev workstation;
  2. the AI ceased being a "code generator" and became an architecture discussion partner, constrained by measurable feedback.

From there, prompts no longer needed to be long: the test loop provided the information, and the AI could propose modifications that I validated immediately on target.


4. Tooling and Consolidation

Once the minimal chain was stable, I developed the necessary tooling to make the build → upload → boot → validation loop reproducible:

CLI Tool (crispy-upload) for:

  • uploading binaries to banks,
  • configuring the bootloader,
  • and automating the test flow.

Protocol Consolidation:

  • implementation of a verification mechanism (CRC) to validate firmware integrity,
  • formalization of the minimal protocol,
  • creation of an equivalent Python library to facilitate certain scenarios/tooling.
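As a sketch of what such an integrity check involves, here is a self-contained CRC-32 (IEEE 802.3, the common reflected polynomial) computed over a firmware image; whether Crispy uses this exact CRC variant is an assumption on my part:

```rust
/// Bitwise CRC-32 (IEEE 802.3, reflected, init/xorout 0xFFFF_FFFF).
/// Slow but dependency-free; table-driven versions are used in practice.
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xEDB8_8320 } else { crc >> 1 };
        }
    }
    !crc
}

fn main() {
    // Standard check value for this CRC variant.
    assert_eq!(crc32(b"123456789"), 0xCBF4_3926);

    // The upload tool sends the CRC alongside the image; the bootloader
    // recomputes it before marking the bank's firmware as valid.
    let firmware_image = [0xDE, 0xAD, 0xBE, 0xEF];
    println!("crc32 = {:#010x}", crc32(&firmware_image));
}
```

Because both the Rust tooling and the Python library have to agree on this value byte for byte, a known-answer test like the `"123456789"` check value above is the cheapest way to keep the two implementations honest.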

At this stage, the path was truly minimal, stable and testable: bootloader → copy to RAM → execution in RAM → observable behavior. The Rust firmware played the role of reference firmware: small, deterministic, and ideal for stabilizing the chain without adding the assumptions of a complete SDK.


5. Refactoring and Debug with probe-rs

Once the tooling (Rust + Python) and test base were in place, I refactored:

  • sharing information and structures,
  • creating common libraries between firmware and bootloader,
  • code cleanup to improve readability.

In parallel, I structured the VSCode development environment to have a reliable debug loop with probe-rs.

Unexpected Side Effect: Contributing to probe-rs Thanks to Good Instrumentation

This phase confronted me with an unexpected challenge: the firmware worked (tests proved it), but debugging remained unstable. A frustrating situation where the target is healthy, but the tooling doesn't keep up.

First obstacle: By combining tests and instrumentation, we isolated a problem on the probe-rs side itself. Working with the AI on a minimal repro, the investigation led to an upstream fix (probe-rs PR [6]). This contribution stabilized the workflow.

Second obstacle: Even after this fix, the firmware executed in RAM remained difficult to debug. Hardware breakpoints didn't apply correctly (known limitations of Cortex-M0+/FPB with relocated code). With the AI's help, we implemented software breakpoints in a modified version of probe-rs [2][7]. This modification, although not proposed upstream (deliberately lightweight and efficiency-oriented implementation), unblocked the workflow and greatly accelerated iterations.

Learning: even development tools can require investigation. The rigorous methodology (tests + repro + instrumentation) also applies to identifying problems outside the project itself.


6. From Rust to C++: Cross-Validation

With this stable and debuggable base, the next step was to validate compatibility with the Pico SDK C++ ecosystem by adding a minimal firmware based on the Pico SDK. The constraint was non-negotiable:

the SDK must remain standard, without patches, to maintain a maintainable project (and compatible with "no-flash"/RAM-preloaded execution).

Reusing existing tests was decisive: it quickly revealed a hardware initialization problem (PLL/timers/clocks) related to the overlap between the state left by the bootloader and the Pico SDK firmware initialization assumptions.

Here again, the method was the same:

  • tests → repro,
  • instrumentation → observation,
  • AI → minimal hypotheses and experiments,
  • fix → non-regression.

In less than a day, this phase yielded a stable minimal C++ firmware working with the standard SDK.


7. Service Architecture

A major bootloader refactoring then consisted of structuring the code into "services". The main motivation wasn't aesthetic: it was functional.

  • Determinism: making the main loop explicit, predictable and stable (controlled action order).
  • Readability: isolating responsibilities in dedicated modules/services.
    let services = [
        ServiceType::UsbTransport(UsbTransportService::new()),
        ServiceType::Trigger(TriggerCheckService::new()),
        ServiceType::Update(UpdateService::new()),
        ServiceType::Led(LedBlinkService::new()),
    ];
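The determinism argument can be made concrete with a small host-side sketch of such a polling loop. The real bootloader stores services in an enum array as shown above; the `Service` trait, the `poll` method, and the trace labels here are illustrative assumptions:

```rust
/// Illustrative sketch of a deterministic service loop; the trait and
/// method names are assumptions, not Crispy's actual API.
trait Service {
    fn poll(&mut self) -> &'static str; // returns a label for tracing
}

struct TriggerCheck;
impl Service for TriggerCheck {
    fn poll(&mut self) -> &'static str { "trigger" }
}

struct LedBlink { ticks: u32 }
impl Service for LedBlink {
    fn poll(&mut self) -> &'static str {
        self.ticks += 1; // would toggle a GPIO on real hardware
        "led"
    }
}

fn main() {
    // A fixed service order gives a controlled, predictable action sequence.
    let mut services: Vec<Box<dyn Service>> =
        vec![Box::new(TriggerCheck), Box::new(LedBlink { ticks: 0 })];

    let mut trace = Vec::new();
    for _tick in 0..3 {
        for s in services.iter_mut() {
            trace.push(s.poll());
        }
    }
    // Same order every iteration: determinism by construction.
    assert_eq!(trace, ["trigger", "led", "trigger", "led", "trigger", "led"]);
    println!("{:?}", trace);
}
```

The design choice this illustrates: since the loop, not the services, owns the scheduling, a BDD test can assert on the action order without mocking any hardware.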

This organization also highlighted the need to go through a RAM staging area during upload before flash persistence — which aligns with flash constraints (XIP, interrupts, multicore) documented in the Pico SDK [4].


8. Making AI Useful Without Losing Control

The most important lesson from this project isn't "AI writes code". It's:

AI becomes useful when constrained by tests and fed with real observables.

Structuring Prompts with Real Observables

Throughout the project, I converged toward an approach that proved consistently effective: feeding the AI with measurable facts rather than lengthy descriptions.

Concretely, instead of verbose prompts, I provide:

  • Objective: expected behavior (short, precise)
  • Repro: 4 to 6 steps maximum, reproducible on target
  • Observed: symptoms + logs + what's missing or differs
  • Instrumentation: probe-rs / GDB data (PC, SP, key registers, memory dump)
  • Request: 3 ordered hypotheses, with for each a minimal test + expected observation
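To show how little free-form text this structure needs, here is a hypothetical helper that renders it; no such tool exists in the project, and the example values (register states, log lines) are invented for illustration:

```rust
use std::fmt;

/// Hypothetical helper rendering the prompt structure described above;
/// the project uses no such tool, this only illustrates the shape.
struct PromptContext<'a> {
    objective: &'a str,        // expected behavior (short, precise)
    repro: &'a [&'a str],      // 4 to 6 reproducible steps
    observed: &'a str,         // symptoms + logs
    instrumentation: &'a str,  // probe-rs / GDB data
    request: &'a str,          // ordered hypotheses + minimal tests
}

impl fmt::Display for PromptContext<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "Objective: {}", self.objective)?;
        writeln!(f, "Repro:")?;
        for (i, step) in self.repro.iter().enumerate() {
            writeln!(f, "  {}. {}", i + 1, step)?;
        }
        writeln!(f, "Observed: {}", self.observed)?;
        writeln!(f, "Instrumentation: {}", self.instrumentation)?;
        write!(f, "Request: {}", self.request)
    }
}

fn main() {
    // All field values below are invented examples, not real project data.
    let ctx = PromptContext {
        objective: "firmware in bank B must blink after upload",
        repro: &["build firmware", "upload to bank B", "reset", "watch RTT log"],
        observed: "LED stays off; no post-jump defmt line",
        instrumentation: "PC and SP captured via probe-rs at the failure point",
        request: "3 ordered hypotheses, each with a minimal test + expected observation",
    };
    let rendered = ctx.to_string();
    assert!(rendered.starts_with("Objective:"));
    println!("{rendered}");
}
```

Typing the observables into named fields is itself a useful discipline: any field you cannot fill is a measurement you still owe yourself before asking the AI anything.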

This structure wasn't formalized from the start, but it emerged naturally. With this type of context, collaboration becomes methodical: hypotheses → experiments → validation on target → non-regression.


Conclusion

Crispy isn't "an A/B bootloader". It's a structural building block to enable TDD/BDD firmware on real hardware, with a fast, instrumented and reliable iteration loop — comparable to what you get in classic software development, but validated on hardware.

The main lesson from this project isn't tied to a particular implementation, but to the method that enables staying effective — including with an AI agent.

Five Recommendations for Integrating AI in Embedded Without Losing Control

  1. Build an end-to-end testable layer from the start. Before expanding the functional scope, establish a minimal chain covering the complete flow: build → upload → boot → validation on target. This "backbone" makes each iteration measurable and drastically reduces the risk of drift.

  2. Give AI the means to test and experiment, not just generate code. AI becomes truly useful when it can work from facts: test outputs, logs, measurements, observable states (debug). The more empirical the context, the more relevant the proposed hypotheses.

  3. Enrich prompts with real data from the target. Rather than writing long prompts, feed the AI with concrete observables: failing scenario, reproduction steps, logs, and instrumentation results. This reduces verbosity, avoids speculative reasoning, and accelerates convergence.

  4. Maintain a structured development flow (Gitflow-type) to secure reintegrations. Short branches, systematic reviews, frequent integration: AI accelerates production, but reintegration discipline prevents speed from turning into technical debt. The flow must remain a safeguard.

  5. Treat test quality as a deliverable in its own right. Tests are the "source of truth": if they're unstable, incomplete or non-deterministic, everything else becomes fragile (including AI contribution). Invest in reliable, reproducible, behavior-oriented tests on target.

Summary and Perspectives

This project enabled me to test an agile approach adapted to embedded systems, integrate AI into a truly useful (rather than verbose) workflow, and consolidate a complete chain: bootloader, firmware, tools, tests, debug.

By applying these principles, you get a healthy base: a serene development loop, capable of absorbing complexity (new firmwares, new building blocks, continuous integration on hardware), while benefiting from AI as an accelerator — without losing system control.

The logical next step is now to harden and industrialize this approach (runners, expanded integration tests, and extension to fast communication/bus building blocks).


References

  1. Crispy Repo (README, structure, Quick Start, tests, AI philosophy) https://github.com/ADNTIO/crispy-bootloader-rp2040-rs

  2. crispy-upload Documentation (note on modified probe-rs + software breakpoints, install via make install-probe-rs) https://docs.rs/crate/crispy-upload/0.2.0

  3. probe-rs (debug toolchain — CLI/DAP/GDB server, etc.) https://github.com/probe-rs/probe-rs https://probe.rs/docs/

  4. Pico SDK docs — hardware_flash (XIP/IRQ/cores constraints) + PICO_NO_FLASH mention https://www.raspberrypi.com/documentation/pico-sdk/hardware.html

  5. rp2040-boot2 (second stage bootloader — boot2 and image placement) https://github.com/rp-rs/rp2040-boot2

  6. probe-rs PR (fix related to PicoProbe / debug workflow) https://github.com/probe-rs/probe-rs/pull/3810

  7. probe-rs Fork (modified version with software breakpoints for RAM firmware debug) https://github.com/fmahon/probe-rs