Build-Time Validation

Summary

Make the build the place where content is checked: run automated validation on every proposed change so that anything which fails is stopped before it reaches a user rather than after. This pattern is for teams maintaining a static research site who want quality enforced by routine rather than by vigilance, including teams whose contributors are not all software engineers.

Recommendation	Why?
Run validation automatically in the build, on every pull request and again before each deployment, and block on failure.	Quality stops depending on whether a busy person remembered to look, problems surface at the cheapest moment attached to the change that caused them, and the published site stays trustworthy without anyone policing it.
Decide deliberately what blocks and what merely warns, and keep the checks fast.	A gate that blocks on everything, or runs slowly, trains people to resent or route around it, while a gate that blocks on nothing protects nothing. Calibrating which failures are fatal is the judgement that keeps the gate both trusted and bearable to work with.

Context

The build step this needs is what the statically generated site pattern provides, and the reviewable channel for changes is what the git-native continuous deployment pattern wires up. The pattern’s value rises sharply once more than one person contributes, and again once contributors include researchers who do not code and cannot be expected to catch every fault by eye.

The conditions for this pattern are mostly about people and project phase rather than tools, and they carry real weight. Agreeing what “good enough to publish” means matters because the checks encode exactly that agreement; owning the checks means keeping them current as the project moves; and contributors have to accept that a red build blocks them, which is a social contract as much as a technical setting. Where standards are genuinely unsettled, settle enough of them to write a check before relying on one.

The pattern does not fit a single author publishing by hand with no build and no review, where the ceremony costs more than it returns. It is also weakened if the team treats a failing check as advice rather than a stop, since a gate that everyone learns to merge past is worse than having no gate and being honest about it. The schema that part of this gate enforces is designed in the sibling pattern on schema-validated structured content; this pattern is about the working practice of enforcement and the other checks that ride alongside it.

Usage

Run the checks where the change is proposed. Trigger validation on every pull request so a contributor sees results against their own change before it merges, and run the same checks again before deployment as a backstop.

Make a failing check stop the change. The gate only works if red means stop, so protect the default branch (as set out in the continuous deployment pattern) such that an unreviewed or failing change cannot publish. A check that does not block is a report, not a gate.

Validate the content against its schema, and check what else breaks silently. Beyond schema conformance, check for broken internal and external links, missing or misnamed assets, and accessibility regressions against a recognised standard such as the Web Content Accessibility Guidelines (WCAG). These are the faults that a human reviewer reading for sense will routinely miss.
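As a minimal sketch of what schema conformance means in practice, the following checks one content item against a list of required fields and types. The schema, the field names, and the `validate_item` helper are hypothetical and chosen only for illustration; a real project would more likely use a schema language and validator library of its choice.

```python
from typing import Any

# Hypothetical schema: field name -> expected Python type.
# This stands in for whatever schema the project actually defines.
SCHEMA: dict[str, type] = {"title": str, "date": str, "tags": list}


def validate_item(item: dict[str, Any], schema: dict[str, type] = SCHEMA) -> list[str]:
    """Return a list of readable problems; an empty list means the item conforms."""
    problems = []
    for field, expected in schema.items():
        if field not in item:
            problems.append(f"missing required field '{field}'")
        elif not isinstance(item[field], expected):
            problems.append(
                f"field '{field}' should be {expected.__name__}, "
                f"got {type(item[field]).__name__}"
            )
    return problems
```

Returning a list of problems rather than raising on the first one lets the build report everything wrong with an item in a single pass.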

Separate what blocks from what warns, and write the policy down. Treat genuine errors, e.g. a malformed item or a dead internal link, as blocking, and treat advisory findings as warnings that inform without halting, so that the distinction is consistent across contributors rather than decided case by case.
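One way to write the policy down is as data that the build reads, so the blocking decision is the same for every contributor. The sketch below assumes hypothetical check names and treats external link failures as warnings; that particular split is an illustrative assumption, not a recommendation from this pattern.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    ERROR = "error"      # blocks the change
    WARNING = "warning"  # reported, but does not block


# Hypothetical written-down policy: check name -> severity.
POLICY = {
    "schema": Severity.ERROR,
    "internal-link": Severity.ERROR,
    "external-link": Severity.WARNING,  # assumption for illustration only
}


@dataclass
class Finding:
    check: str
    message: str


def exit_code(findings: list[Finding], policy: dict[str, Severity] = POLICY) -> int:
    """Return 1 if any finding is blocking under the policy, else 0."""
    # Checks missing from the policy default to blocking, so a newly added
    # check cannot pass silently just because nobody classified it.
    blocking = [
        f for f in findings
        if policy.get(f.check, Severity.ERROR) is Severity.ERROR
    ]
    return 1 if blocking else 0
```

Keeping the policy in one reviewed place means a change from warning to error is itself a visible, discussable change.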

Keep the gate fast and its failures legible. A slow check gets skipped, and a failure a non-coding author cannot interpret gets escalated to whoever can, which defeats the purpose. Report which item, which field or link, and what to do about it.
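A legible failure can be as simple as a message with a fixed shape. The helper below is a hypothetical sketch of that shape (which item, where in it, what is wrong, what to do); the function name, file path, and wording are assumptions for illustration.

```python
def format_failure(item: str, location: str, problem: str, fix: str) -> str:
    """Build a failure message a non-coding author can act on directly."""
    return f"{item}: {location}\n  Problem: {problem}\n  To fix: {fix}"


if __name__ == "__main__":
    print(
        format_failure(
            item="content/items/oral-history.yaml",  # hypothetical path
            location="field 'date'",
            problem="value '12 May' is not an ISO date",
            fix="write it as YYYY-MM-DD",
        )
    )
```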

Give contributors the same checks locally. Let people run the validation on their own machine, e.g. as a single script or a pre-commit step, so the build is not the first time they discover something is wrong. This lowers the barrier for contributors who are not software engineers, a recurring concern in the sector.
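The "single script" can be a small runner that the pipeline and contributors both call, so the two cannot drift apart. This is a sketch under assumptions: `run_checks` and the check names are hypothetical, and each check is assumed to be a function that returns a list of problem descriptions.

```python
from typing import Callable

# A check takes no arguments and returns a list of problem descriptions.
Check = Callable[[], list[str]]


def run_checks(checks: dict[str, Check]) -> int:
    """Run every check, print a summary, and return a process exit code."""
    failed = 0
    for name, check in checks.items():
        problems = check()
        if problems:
            failed += 1
            print(f"FAIL {name}")
            for problem in problems:
                print(f"  - {problem}")
        else:
            print(f"ok   {name}")
    # All checks run even after a failure, so one pass shows everything.
    return 1 if failed else 0
```

A pre-commit step or a pipeline job could then pass the returned value to `sys.exit`, making the local run and the build fail under the same conditions.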

Grow the suite deliberately and review it. Add checks as new classes of mistake appear, retire ones that no longer earn their keep, and treat the check suite as something the team maintains rather than inherits. This is sustainability practice, not overhead.

Figure: The build as a quality gate. Only changes that pass the checks cross into the published site users see, while failures loop back to the contributor.

Implementations

The RSE-CEP project runs this pattern as its normal way of working. Opening a pull request triggers continuous integration to validate the content against its schema and build a trial copy of the site, and the change cannot reach the default branch until those checks pass. Merging then builds and publishes. The quality gate and the deployment are the same pipeline, which is exactly the practice described here.

Accessibility checking is a common companion check. Tools such as pa11y and axe-core run against the built pages in the pipeline and fail the build on violations at a chosen WCAG level, which turns accessibility from a pre-launch scramble into a routine, continuous gate. Link checking plays the same role for dead internal and external links, run over the built output before any user meets them.
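To illustrate link checking over the built output, here is a minimal standard-library sketch that walks the generated HTML and reports internal links whose target file does not exist. It is not how any particular link checker works; the function name is hypothetical, and it deliberately ignores external links, percent-encoded paths, and fragment targets.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlsplit


class _HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def broken_internal_links(site_dir: Path) -> list[tuple[str, str]]:
    """Return (page, href) pairs for internal links with no target file."""
    broken = []
    for page in sorted(site_dir.rglob("*.html")):
        collector = _HrefCollector()
        collector.feed(page.read_text(encoding="utf-8"))
        for href in collector.hrefs:
            parts = urlsplit(href)
            # Skip external URLs, mailto: links, and same-page fragments.
            if parts.scheme or parts.netloc or not parts.path:
                continue
            if parts.path.startswith("/"):
                target = site_dir / parts.path.lstrip("/")
            else:
                target = page.parent / parts.path
            # A link to a directory is assumed to resolve to its index page.
            if target.is_dir():
                target = target / "index.html"
            if not target.exists():
                broken.append((page.relative_to(site_dir).as_posix(), href))
    return broken
```

Running this against the built site rather than the source files checks what users will actually receive, including links the generator itself produced.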

Local parity is the other half of the practice in mature setups: a pre-commit hook or a shared validation script lets contributors run the same checks before they push, so failures are caught at the desk rather than in the pipeline. The mechanics of wiring these checks into the same workflow that deploys are covered in the GitHub Pages custom workflow documentation, and the same shape applies on other CI platforms.

References

Models

The underlying practice is continuous integration with a fail-fast, shift-left posture: check early, check on every change, and stop the change before it reaches the user. The mechanism for running such checks in the same pipeline that publishes is documented in Using custom workflows with GitHub Pages.

Other resources

For accessibility checks in the pipeline, see pa11y and axe-core, both of which test against WCAG levels and can fail a build on violations.

Acknowledgments

This pattern responds to two concerns raised by participants in the HASS and Indigenous RDC Community Data Lab co-design workshop: that research software too often becomes unsustainable, and that infrastructure upkeep is a barrier for contributors who are not software engineers. Automating checks so that quality does not rest on specialist vigilance speaks to both. The pattern points to the open-source accessibility tooling maintained by the pa11y and Deque communities rather than reinventing it.
