Tapeworm

Tapeworm: Automated tape-archival of old datasets

Warning:

  • This documentation page is under construction and may contain errors.
  • The Tapeworm application is in beta and may contain errors.

https://tapeworm.anunna.wur.nl/

Tapeworm helps you manage data on /archive by identifying datasets that are no longer actively used and preparing them for tape archival. The goal is simple: keep our warm storage available for active work, while safely preserving older data on tape.

With Tapeworm, you can:

  • See which of your datasets are being considered for tape archival.
  • Review planned moves before they happen.
  • Approve, snooze, or block moves when needed.
  • Add metadata to help describe archived datasets. The metadata is stored on tape with the data and can be used to view or retrieve datasets from tape, should you need to do so in the future.

If you do nothing, Tapeworm will continue with the planned move after the review period. That is why we recommend checking your pending actions regularly. You will also receive notification emails about pending actions.

How Tapeworm works

  1. Tapeworm scans /archive and builds an index of datasets, recording their size, owner, and last active use.
  2. A policy engine checks which datasets look stale (for example: 30+ days old and larger than 1 GB).
  3. Matching datasets are marked as planned and shown in your overview.
  4. You are notified by email that Tapeworm plans to move data you own.
  5. You can review and change what should happen, or block the move(s) entirely.
  6. If no action is taken during the 4-week wait period, planned moves can become scheduled and then executed.
  7. Data is moved to tape and removed from /archive.
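
The staleness check in step 2 and the wait period in step 6 can be sketched as follows. This is a minimal illustration, not Tapeworm's actual implementation: the Dataset class, the function names, and the thresholds are assumptions taken from the example values above (30 days, 1 GB, 4 weeks), and the real policy may use different limits or additional criteria.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed thresholds, copied from the example in step 2 and the wait period in step 6.
MIN_AGE = timedelta(days=30)
MIN_SIZE_BYTES = 1000**3          # 1 GB, assuming decimal gigabytes
REVIEW_PERIOD = timedelta(weeks=4)


@dataclass
class Dataset:
    """Hypothetical index entry: one row of what the scan records."""
    path: str
    size_bytes: int
    last_activity: datetime


def is_stale(dataset: Dataset, now: datetime) -> bool:
    """Return True when the dataset is both old enough and large enough."""
    old_enough = (now - dataset.last_activity) >= MIN_AGE
    large_enough = dataset.size_bytes > MIN_SIZE_BYTES
    return old_enough and large_enough


def earliest_schedule_date(planned_on: datetime) -> datetime:
    """Return the first moment a planned move may become scheduled without user action."""
    return planned_on + REVIEW_PERIOD
```

Under these assumptions, a 5 GB dataset that was last used several months ago would be marked as planned, while a recently used or very small dataset would be left alone.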

Who sees what?

  • Regular users see only their own datasets and actions.
  • Group admins/contacts see data for their configured group(s), in addition to their own data.

User pages

1) Overview

This is your action page. It shows items that currently need your decision.

For each candidate, you can:

  • Approve: proceed with the tape move. The move is scheduled for the next day.
  • Deny: stop this move and configure an override for this path. Tapeworm will not try to move this dataset/path again until you remove the override.
  • Snooze: postpone the decision to a future date.
  • Edit metadata: add key/value notes for archived data. These values are included on tape and can be used to view/retrieve datasets on tape.

You can also select multiple rows and apply actions in bulk.

2) Datasets

This page shows your discovered datasets, their sizes, and their last activity times. Tapeworm has no built-in notion of which data belongs together as a 'dataset', so the groupings shown here may not match how you organize your data. If the selections on this page are wrong, you can change how Tapeworm should handle these datasets.

Important:

  • If a dataset already has an active move candidate, scheduling controls are disabled.
  • The dataset list is informational; move decisions are handled through the Schedule page.

3) Schedule

This page shows move candidates and their status over time.

Common statuses:

  • Planned (or planned + notified): under review.
  • Scheduled: move is planned for a specific date.
  • Executing / Tape staged / On tape: move is in progress or completed.
  • Error: the move needs admin attention. We may resolve it ourselves, or we may contact you.

Once a move is already executing or completed, schedule-changing actions are locked.
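
The locking rule can be pictured as a simple status check. The sketch below is illustrative only: the Status enum and the can_change_schedule function are hypothetical names, and the set of locked statuses is taken from the status list above and from the FAQ entry on disabled buttons, not from Tapeworm's source code.

```python
from enum import Enum


class Status(Enum):
    """Statuses named on this page; the identifiers themselves are assumptions."""
    PLANNED = "planned"
    PLANNED_NOTIFIED = "planned + notified"
    SCHEDULED = "scheduled"
    EXECUTING = "executing"
    TAPE_STAGED = "tape staged"
    ON_TAPE = "on tape"
    ERROR = "error"


# Statuses in which schedule-changing actions are no longer offered.
LOCKED_STATUSES = {
    Status.EXECUTING,
    Status.TAPE_STAGED,
    Status.ON_TAPE,
    Status.ERROR,
}


def can_change_schedule(status: Status) -> bool:
    """Return True while approve, deny, and snooze are still meaningful."""
    return status not in LOCKED_STATUSES
```

In this model, a move that is planned or scheduled can still be changed, while one that is executing, staged, on tape, or in error cannot.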

4) Overrides

Overrides tell Tapeworm to ignore specific paths in future planning.

Use overrides when:

  • a project is still active and needs to remain on /archive
  • policy suggestions are not appropriate for that location

If you agree that the dataset can in principle be moved to tape, but you don't (yet) know when, you can choose to postpone/snooze the archival instead of overriding it.

Overrides apply to the selected path and everything below it.
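
To illustrate what "the selected path and everything below it" means, here is a minimal sketch of path-based matching. The function name is_overridden and the example paths are hypothetical; Tapeworm's own matching logic may differ in detail.

```python
from pathlib import PurePosixPath


def is_overridden(path, overrides):
    """Return True if path equals an override path or lies anywhere beneath one."""
    candidate = PurePosixPath(path)
    for override in overrides:
        root = PurePosixPath(override)
        # Compare whole path components, so /archive/proj does not match /archive/project2.
        if candidate == root or root in candidate.parents:
            return True
    return False
```

With an override on the hypothetical path /archive/proj, a dataset at /archive/proj/run1/raw would be skipped in future planning, whereas /archive/project2 would not, because matching is done per directory level and not on the raw string prefix.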

Notifications (email)

Tapeworm sends email updates when actions are pending or dates are approaching.

Emails typically include:

  • dataset path,
  • size and last activity,
  • current status,
  • review/scheduled date.

Please read these emails carefully: they are your chance to adjust decisions before execution.

Best practices for users

  • Check your Overview page regularly.
  • Use Snooze if you need time to validate impact.
  • Add metadata when approving important datasets.
  • Use Overrides for known exceptions.
  • If unsure, contact HPC support before a scheduled move date.

FAQ

What happens if I do nothing?

Planned items can move forward automatically once the 4-week review period has passed: they become scheduled and are then executed.

Can I undo after tape staging?

Not directly in Tapeworm. Retrieval is done via the tape/iRODS workflow. See: https://irods.wur.nl/userguide/tape_retrieval/

Why is an action button disabled?

Usually because the move has already progressed (executing/staged/on tape/error), so schedule edits are no longer valid.

Why do I see “planned + notified”?

That means the dataset move is planned and a notification has already been sent.

Need help?

If anything is unclear, or you think a move is incorrect but you cannot alter it in the provided GUI, please open an HPC support ticket.