General · 6 min read

Adding Test Coverage to 20-Year-Old Koha Code — With a Little Help from AI

If you’ve worked on a large open-source project, you’ve seen the pattern: critical modules that have run in production for years with minimal test coverage. Not because nobody cares, but because writing tests for complex rendering code is tedious, time-consuming work that rarely makes it to the top of anyone’s priority list.

At Veritas Supera, we maintain the label and patron card printing modules in Koha — the world’s most widely used open-source library system. We also maintain PDF::Reuse, the CPAN module these Koha components depend on. When we prepared to release a major update to PDF::Reuse, we needed to be certain the new version wouldn’t break Koha’s label and card generation.

The problem: almost no rendering tests existed.


Starting From Near Zero

C4::Labels::Label handles spine labels and barcode labels — text layout, barcode generation (CODE39, EAN13, various 2-of-5 formats), and PDF stream composition. It had a test file with 6 basic assertions. None of them exercised the actual drawing methods.

C4::Patroncards::Patroncard renders patron ID cards with text fields, barcodes, images, and layout guide overlays. It had no test file at all.

These modules have been running in libraries worldwide for nearly two decades. But “it works in production” isn’t the same as “we’ll know if the next update breaks it.”


How We Used Claude to Build the Test Suite

We brought Claude into the workflow by giving it access to the full source of both modules, the existing test file, and Koha’s testing infrastructure (Test::More, t::lib::TestBuilder, transaction-wrapped database fixtures).
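For readers unfamiliar with Koha's testing conventions, the scaffold every test file follows looks roughly like this — a sketch, not a verbatim excerpt from the suite (the `Branch` fixture is illustrative; the module and method names are the real Koha testing infrastructure):

```perl
use Modern::Perl;
use Test::More;
use t::lib::TestBuilder;
use Koha::Database;

my $schema = Koha::Database->new->schema;
$schema->storage->txn_begin;    # wrap all fixtures in one transaction

my $builder = t::lib::TestBuilder->new;
my $branch  = $builder->build( { source => 'Branch' } );    # illustrative fixture

# ... subtests exercising the rendering methods go here ...

$schema->storage->txn_rollback;    # leave the database exactly as we found it
done_testing();
```

Because every fixture lives inside the transaction, the tests can run against a real database without ever polluting it.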

Here’s the thing about AI-assisted test writing that most people don’t realize: the first draft was wrong.

Claude’s initial implementation simply exercised PDF::Reuse functions directly — testing the library, not Koha’s code. We recognized this immediately and redirected: the tests needed to exercise Koha’s own rendering methods, namely draw_label_text(), draw_guide_box(), create_label(), and their Patroncard equivalents.

After that course correction, Claude produced thorough, convention-compliant test code. Here’s a straightforward example that validates PDF stream output:

my $label_with_box = C4::Labels::Label->new(
    %$label_info, guidebox => 1,
    llx => 10, lly => 20, width => 100, height => 50,
);
my $box_stream = $label_with_box->draw_guide_box();
ok( $box_stream, 'draw_guide_box returns truthy when guidebox enabled' );
like( $box_stream, qr/10 20 100 50 re/,
    'PDF stream contains expected rectangle coordinates' );

And a more nuanced case that required understanding Koha’s printing type system — where different types return different data structures:

# BAR printing type renders barcode only, returns undef (no text)
my $label_bar = C4::Labels::Label->new(
    %$label_info, printing_type => 'BAR',
    barcode_type => 'CODE39', barcode => 'ORCHTEST1',
);
my $bar_result = $label_bar->create_label();
ok( !defined $bar_result,
    'BAR printing type returns undef (barcode only)' );

One thing Claude handled well was PDF session management. Each subtest that calls rendering methods needs its own isolated PDF::Reuse session with proper cleanup — temporary files, output stream selection, and session teardown. Claude learned this pattern after encountering the first session collision and applied it consistently across all 19 test groups.
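The pattern is essentially a session guard: set up a fresh session, run the subtest body, and guarantee teardown even if the body dies. Here is a minimal sketch of that shape in plain Perl — the helper name and callback wiring are illustrative, not code from the Koha suite; in the actual tests the setup and teardown callbacks would be PDF::Reuse's prFile() pointed at a File::Temp path and prEnd():

```perl
use Modern::Perl;

# Illustrative helper: run a subtest body inside its own session,
# guaranteeing teardown runs whether the body succeeds or dies.
sub with_session {
    my ( $setup, $teardown, $body ) = @_;
    $setup->();
    my $ok  = eval { $body->(); 1 };
    my $err = $@;
    $teardown->();    # always restore a clean slate for the next group
    die $err unless $ok;
    return 1;
}
```

Wrapping each of the 19 groups this way means a die() inside one group's rendering calls can never leak an open PDF stream into the next group.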


The Tests Found Real Bugs

This is the part that pays for itself.

While building the Patroncard test suite, we discovered Bug 41718 — a defect that had been hiding in draw_guide_grid() for nearly 20 years. The method uses a hardcoded 'Courier' font name instead of a Koha font code, and calls StrWidth() without the required font and size arguments, producing warnings from deep in the PDF layer.

Nobody had noticed because the guide grid is a layout debugging aid that most users never enable in production. But there it was — waiting to confuse anyone who turned it on.

# Koha Bug 41718: draw_guide_grid() has latent font handling bugs
my $card_point = build_patroncard();
my $result;
{
    local $SIG{__WARN__} = sub { };    # suppress the PDF-layer warnings
    $result = eval { $card_point->draw_guide_grid($pdf); 1; };
}
ok( $result, 'draw_guide_grid() with POINT units succeeds' );

We also discovered the PDF::Reuse TTFont crash during early test development — an object lifecycle bug that we fixed upstream and released as part of the same effort.


The Numbers

Metric                       Before                After
Label test assertions        6 (no rendering)      ~30 across 11 test groups
Patroncard test assertions   0 (no test file)      ~20 across 8 test groups
Rendering methods tested     0                     All public drawing methods
Bugs discovered              n/a                   2 (1 in Koha, 1 upstream)
Lines of test code added     n/a                   558

The entire test suite was submitted to the Koha project as Bug 41719, with full AI-assisted development disclosure per the community’s coding guidelines for AI contributions.


The Human Element

Every piece of this work required experienced judgment calls that shaped the outcome:

  • Rejecting the first draft that tested the wrong layer of abstraction
  • Knowing Koha’s conventions: TestBuilder fixtures, transaction wrapping, Modern::Perl, the right variable patterns
  • Recognizing Bug 41718 as a real defect rather than a test setup problem
  • Tracing the PDF::Reuse crash to its root cause in object lifecycle management (a story for another post)
  • Evaluating convention compliance — catching patterns like unnecessary intermediate variables that didn’t match the prevailing style across 200+ test files

Claude generated thorough test code efficiently. But the decisions about what to test, how to structure it, and whether the results were actually correct — those required someone who wrote these modules and understands the ecosystem they run in.


What This Means for Your Codebase

Most organizations have code like this: critical modules that work, have worked for years, and have little or no test coverage. Writing tests for legacy rendering code, batch processing systems, or complex internal tools is exactly the kind of work that gets perpetually deferred.

AI-accelerated test development changes the economics. What used to be weeks of tedious manual test writing becomes a focused engagement measured in days. But the quality of the result depends entirely on who’s directing the process and evaluating the output.

At Veritas Supera, we combine AI-powered development with deep expertise in open-source systems, Perl, and legacy codebases to deliver results like these — real test coverage that catches real bugs, built to your project’s conventions and standards.

Need test coverage for legacy code? Let’s talk — we’ll help you find the bugs before your users do.