Reproducibility Guidelines
Reproducible work means that you — or someone else — can run your analysis again later and get the same result. On a shared, evolving cluster this takes a little discipline, but the building blocks are all available on Anunna.
Pin your software versions
- Load specific module versions rather than defaults, and record the bucket and module versions you used. A job that loads Python/3.11.3 from the 2024 bucket will behave the same next year; one that loads whatever happens to be current may not.
- For your own environments, pin versions too: record them in a requirements.txt, environment.yml, or equivalent. See Installing Personal Software and Python.
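As a rough illustration of pinning your own environment, the sketch below lists every package installed in the current Python environment as a name==version line, the format a requirements.txt uses. It relies only on the standard library; the function name pinned_requirements is made up for this example, and the sketch is one possible approach rather than an Anunna-specific tool.

```python
from importlib.metadata import distributions


def pinned_requirements(dists=None):
    """Return sorted 'name==version' lines for the given distributions.

    With no argument, inspects the Python environment that runs this script.
    """
    if dists is None:
        dists = distributions()
    pins = set()
    for dist in dists:
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that carry no package name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print the pins; redirect the output to a file to keep it with your code.
    print("\n".join(pinned_requirements()))
```

Saving this output alongside your analysis scripts gives you a record of the exact versions that produced a result.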
Use containers for full reproducibility
A container captures an entire software stack — operating-system libraries, tools, and dependencies — in a single image that runs the same anywhere. On Anunna, use Apptainer (formerly Singularity). A container is the strongest guarantee that your environment will not drift over time.
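A container image only guarantees reproducibility if you know which image you ran. One simple way to check this, sketched below under the assumption that your image is a single file on disk, is to record its SHA-256 checksum next to your results. The function name image_sha256 is hypothetical, and the script takes the image path from the command line.

```python
import hashlib
import sys


def image_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks.

    Reading in chunks keeps memory use low for multi-gigabyte images.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print one "checksum  path" line per image file given as an argument.
    for image_path in sys.argv[1:]:
        print(f"{image_sha256(image_path)}  {image_path}")
```

If the checksum recorded with an old result matches the image you have today, you are running the same software stack.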
Keep your code in version control
Track your scripts and pipelines with Git so you have a history of what changed and can return to any version.
WUR runs a GitLab instance at git.wur.nl where you can host your repositories. Set up SSH keys for it the same way as for any SSH service — see SSH Access for generating a key pair — and add the public key to your GitLab account.
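Version control helps most when each result can be traced back to the commit that produced it. The following is a minimal sketch, assuming the script is run from inside a Git working copy with git on the PATH; the helper names git_output and provenance_record are invented for this example.

```python
import json
import subprocess
from datetime import datetime, timezone


def git_output(*args):
    """Run a git command in the current directory; return stdout or None."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git is missing, or this is not a repository
    return result.stdout.strip()


def provenance_record(commit, status):
    """Describe which code version produced a result.

    `status` is the output of `git status --porcelain`; any non-empty
    output means there were uncommitted changes. None means unknown.
    """
    return {
        "commit": commit,
        "uncommitted_changes": None if status is None else bool(status),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    record = provenance_record(
        git_output("rev-parse", "HEAD"),
        git_output("status", "--porcelain"),
    )
    print(json.dumps(record, indent=2))
```

Writing this record into your output directory at the start of a job lets you check out the exact commit later. A true value for uncommitted_changes is a warning that the commit alone does not describe the code that ran.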
Automate the steps
A workflow engine records the exact steps, their order, and their dependencies, so the whole pipeline can be rerun from scratch. This is far more reproducible than running commands by hand.
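To make the idea concrete, the toy sketch below shows the core of what such an engine keeps track of: a set of named steps, what each one depends on, and an execution order derived from those dependencies. It uses Python's standard graphlib module (Python 3.9 or later). The step names are hypothetical, and a real workflow engine does far more, such as tracking files and skipping steps that are already up to date.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return step names in an order that satisfies every dependency.

    `dependencies` maps each step name to the set of steps it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())


def run_pipeline(steps, dependencies):
    """Run every step once, dependencies first; return the order used."""
    order = execution_order(dependencies)
    for name in order:
        steps[name]()
    return order


if __name__ == "__main__":
    # Hypothetical three-step pipeline; each step just logs its own name.
    log = []
    steps = {
        "download": lambda: log.append("download"),
        "clean": lambda: log.append("clean"),
        "plot": lambda: log.append("plot"),
    }
    dependencies = {
        "download": set(),
        "clean": {"download"},
        "plot": {"clean"},
    }
    print(run_pipeline(steps, dependencies))  # ['download', 'clean', 'plot']
```

Because the order is computed from the declared dependencies rather than remembered by the person running the commands, rerunning the pipeline from scratch always executes the same steps in a valid order.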