Reproducibility Guidelines

From HPCwiki
Revision as of 14:22, 18 June 2026 by Haars0011 (talk | contribs) (IA migration §8: new Reproducibility Guidelines (versions/containers/Git/WUR GitLab/workflow engines; absorbs Manual GitLab pointer) (via create-page on MediaWiki MCP Server))

Reproducible work means that you — or someone else — can run your analysis again later and get the same result. On a shared, evolving cluster this takes a little discipline, but the building blocks are all available on Anunna.

Pin your software versions

  • Load specific module versions rather than defaults, and record the bucket and module versions you used. A job that loads Python/3.11.3 from the 2024 bucket should behave the same next year; one that loads whatever is current at the time may not.
  • For your own environments, pin versions too — record them in a requirements.txt, environment.yml, or equivalent. See Installing Personal Software and Python.
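One way to make the recording step routine is to have each job write a small snapshot of its own environment. The sketch below is illustrative rather than an Anunna-provided tool: the function names and the output path are hypothetical. It assumes the module system exports the loaded modules in the LOADEDMODULES environment variable, as Environment Modules and Lmod do. It does not capture the bucket, so record that separately.

```python
import json
import os
import sys
from importlib import metadata


def snapshot_environment(environ=os.environ):
    """Collect the software versions that shaped this run."""
    # Assumption: the module system exports loaded modules as a
    # colon-separated list in LOADEDMODULES (empty or unset if none).
    loaded = [m for m in environ.get("LOADEDMODULES", "").split(":") if m]

    # Installed Python distributions in the active environment: name -> version.
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            packages[name] = dist.version

    return {
        "python": sys.version.split()[0],
        "modules": loaded,
        "packages": dict(sorted(packages.items())),
    }


def write_snapshot(path, environ=os.environ):
    """Write the snapshot as JSON; `path` is a hypothetical output location."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(snapshot_environment(environ), handle, indent=2)
```

Calling write_snapshot("environment-snapshot.json") at the start of a job leaves a record next to the results. It complements a pinned requirements.txt or environment.yml and does not replace one.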

Use containers for full reproducibility

A container captures an entire software stack (operating-system libraries, tools, and dependencies) in a single image that runs the same on any host with a compatible container runtime. On Anunna, use Apptainer (formerly Singularity). Of the options on this page, a container gives the strongest guarantee that your environment will not drift over time.
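A container only helps reproducibility if you know exactly which image a result came from. As a minimal sketch, the helpers below compute a checksum of an image file for your records and build an `apptainer exec` command line. The function names, image name, and script name are hypothetical, and nothing here is specific to how Apptainer is set up on Anunna.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large image files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def apptainer_exec_command(image, command):
    """Build the argument list that runs `command` inside `image`."""
    # `apptainer exec <image> <command...>` runs a command in the container.
    return ["apptainer", "exec", str(image), *command]
```

For example, apptainer_exec_command("analysis.sif", ["python", "run.py"]) yields a list that can be passed to subprocess.run, and storing sha256_of_file("analysis.sif") with the results identifies the image even if the file is later renamed or rebuilt.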

Keep your code in version control

Track your scripts and pipelines with Git so you have a history of what changed and can return to any version.

WUR runs a GitLab instance at git.wur.nl where you can host your repositories. Set up SSH keys for it the same way as for any SSH service — see SSH Access for generating a key pair — and add the public key to your GitLab account.
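Version control pays off most when each result can be traced back to the commit that produced it. The sketch below shows one possible way to do that, assuming the job runs from inside a Git working copy. The function names and the format of the provenance line are hypothetical choices, not a WUR convention.

```python
import subprocess


def current_commit(repo_dir="."):
    """Return the commit hash checked out in repo_dir, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, or repo_dir is not inside a repository.
        return None
    return result.stdout.strip()


def has_uncommitted_changes(repo_dir="."):
    """Return True if `git status --porcelain` reports any changes."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return bool(result.stdout.strip())


def provenance_line(commit, dirty):
    """Format a one-line note to store alongside the results."""
    if commit is None:
        return "code version: unknown"
    suffix = " (with uncommitted changes)" if dirty else ""
    return f"code version: {commit}{suffix}"
```

Writing this line into the job log or the output directory makes it clear when a result came from code that was never committed, which is the case that cannot be reproduced later.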

Automate the steps

A workflow engine records the exact steps, their order, and their dependencies, so the whole pipeline can be rerun from scratch. This is far more reproducible than running commands by hand, because nothing depends on remembering what was typed or in which order.
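To illustrate the core idea, the sketch below declares steps with their dependencies and derives the execution order from that declaration using Python's standard graphlib module (Python 3.9 or later). It is a toy model, not a substitute for a workflow engine: the pipeline and step names are hypothetical, and real engines such as Snakemake or Nextflow add file tracking, job submission, and resuming on top of this.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "download": set(),
    "clean": {"download"},
    "analyse": {"clean"},
    "plot": {"analyse"},
    "report": {"analyse", "plot"},
}


def execution_order(pipeline):
    """Return the steps in an order that respects every dependency."""
    # static_order() raises graphlib.CycleError if the dependencies loop.
    return list(TopologicalSorter(pipeline).static_order())


def run_pipeline(pipeline, actions):
    """Run each step's action once, after all of its dependencies."""
    completed = []
    for step in execution_order(pipeline):
        actions[step]()  # actions maps step name -> zero-argument callable
        completed.append(step)
    return completed
```

Because the order is computed from the declared dependencies rather than remembered, rerunning the pipeline from scratch always executes the same steps in a valid sequence.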

See also