
Configuring S3 for Archivematica's AIP Processing and Storage

Published 2024/02/11

Overview

This is a memo on how to use files and folders on Amazon S3 as processing targets in Archivematica and save the resulting AIPs back to S3.

Using S3 as storage facilitates integration with other systems and provides more options for long-term preservation of AIPs.

The following article from the Wellcome Collection was helpful:

https://docs.wellcomecollection.org/archivematica/administering-archivematica/bootstrapping

Amazon S3 Configuration

Create a bucket. In this example, a bucket named archivematica.aws.ldas.jp was created in the us-east-1 region.

Then, create two folders: transfer_source, which holds the files to be processed, and aip_storage, which receives the resulting AIPs. The folder names and hierarchy are arbitrary. You select them when creating the locations in the Storage Service steps below.
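S3 has no real directories. A "folder" created in the console is a zero-byte object whose key ends in a slash. The following is a minimal boto3 sketch of the same setup. It assumes boto3 is installed and AWS credentials are configured, and it reuses this example's bucket, region, and folder names. The helper names are hypothetical.

```python
# Sketch: create the bucket and the two "folders" used in this memo.
# Assumes boto3 is installed and AWS credentials are configured
# (environment variables or ~/.aws). Helper names are hypothetical.

BUCKET = "archivematica.aws.ldas.jp"
REGION = "us-east-1"
FOLDERS = ["transfer_source", "aip_storage"]


def folder_keys(folders):
    """A console "folder" is just a zero-byte object whose key ends with '/'."""
    return [name.strip("/") + "/" for name in folders]


def create_bucket_kwargs(bucket, region):
    """Build create_bucket() arguments.

    us-east-1 must NOT be passed as a LocationConstraint;
    every other region must be.
    """
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket_and_folders(bucket=BUCKET, region=REGION, folders=FOLDERS):
    import boto3  # imported lazily so the pure helpers above work without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket, region))
    for key in folder_keys(folders):
        s3.put_object(Bucket=bucket, Key=key, Body=b"")
```

Calling `create_bucket_and_folders()` once produces the same layout as the console steps. Creating the placeholder objects is optional, because Archivematica writes under these prefixes either way. However, the "Browse" button in the Storage Service has nothing to show until a prefix exists.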

Archivematica Storage Service Configuration

If Archivematica is installed using Docker, you can access the Archivematica Storage Service at the following URL:

http://127.0.0.1:62081/

After logging in, go to the following page and click the "Create new space" link.

/spaces/

In the "Create Space" screen, fill out the form. Select S3 as the "Access protocol" and enter the Access Key and the other S3 connection details.

The appropriate Staging path was unclear, so the value from Step 7 of the following article was used:

https://docs.wellcomecollection.org/archivematica/administering-archivematica/bootstrapping#step_7
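The Access Key entered in the space form has to be allowed to list the bucket and to read, write, and delete objects in it. The sketch below builds an IAM policy document scoped to this one bucket. The exact set of actions the Storage Service needs is an assumption here. Widen or narrow it after testing against your version.

```python
import json


def bucket_policy(bucket):
    """IAM policy assumed to be sufficient for one Archivematica S3 space."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object reads/writes/deletes target keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Print the JSON to paste into the IAM console for the key's user.
print(json.dumps(bucket_policy("archivematica.aws.ldas.jp"), indent=2))
```

Keeping the bucket-level and object-level statements separate matters. `s3:ListBucket` only matches the bucket ARN, and the object actions only match `bucket/*`.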

After creating the space, click "Create Location here" to create a location. The page shows two such links, but both lead to the same form.

Create two locations here. Set the Purpose of the first one to "Transfer Source".

For the Relative Path, select the previously created folder using the "Browse" button.

Note that if you have created multiple Pipelines, you would select the associated one here.

Set the Purpose of the second location to "AIP Storage", and select the aip_storage folder as its Relative Path.

If you check "Set as global default location for its purpose:" on each of these screens, the "Setting AIP Storage Destination" step in the Dashboard section becomes unnecessary.

Verification

After these settings, /spaces/ shows a new space whose Access Protocol is "S3", in addition to the default space whose Access Protocol is "Local Filesystem".

Likewise, /locations/ lists the two new locations.
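The same check can be scripted against the Storage Service REST API. The following is a minimal sketch using only the standard library. It assumes the `/api/v2/space/` and `/api/v2/location/` list endpoints, `ApiKey user:key` authentication, and that each location's `space` field holds the same URI as the space's `resource_uri`. The username and API key values are placeholders.

```python
import json
import urllib.request

SS_URL = "http://127.0.0.1:62081"
SS_USER = "your-username"    # placeholder: a Storage Service user
SS_API_KEY = "your-api-key"  # placeholder: that user's API key

# Two-letter purpose codes used by the Storage Service API.
PURPOSES = {"TS": "Transfer Source", "AS": "AIP Storage"}


def auth_header(user, api_key):
    return {"Authorization": f"ApiKey {user}:{api_key}"}


def get_objects(path, user=SS_USER, api_key=SS_API_KEY, base=SS_URL):
    """GET a list endpoint and return its 'objects' array (first page only)."""
    req = urllib.request.Request(base + path, headers=auth_header(user, api_key))
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["objects"]


def s3_spaces(spaces):
    return [s for s in spaces if s.get("access_protocol") == "S3"]


def locations_in_spaces(locations, spaces):
    """Keep locations whose 'space' URI points at one of the given spaces."""
    uris = {s.get("resource_uri") for s in spaces}
    return [loc for loc in locations if loc.get("space") in uris]


def locations_by_purpose(locations, purpose):
    return [loc for loc in locations if loc.get("purpose") == purpose]


def check_setup():
    spaces = s3_spaces(get_objects("/api/v2/space/"))
    locations = locations_in_spaces(get_objects("/api/v2/location/"), spaces)
    print("S3 spaces:", len(spaces))
    for code, label in PURPOSES.items():
        for loc in locations_by_purpose(locations, code):
            print(label, loc.get("uuid"), loc.get("relative_path"))
```

Running `check_setup()` should print one S3 space plus one Transfer Source and one AIP Storage location. The location UUIDs it prints are also what API-driven transfers refer to.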

Archivematica Dashboard Configuration

If Archivematica is installed using Docker, you can access the Archivematica Dashboard at the following URL:

http://127.0.0.1:62080/

Setting AIP Storage Destination

Next, open the following URL and edit a processing configuration, for example "automated".

/administration/processing/

For the "Store AIP" item, select the location created earlier (in this case, "s3 api_storage").

This ensures that AIPs are stored in the S3 location created earlier. If "Set as global default location for its purpose:" was checked earlier, this step is unnecessary.

Starting a Transfer

Go to /transfer/. Clicking the "Browse" button shows the "Default transfer source" by default.

This is a select box. Clicking it lists the available transfer sources, so choose the S3 location created earlier.

This lets you use files and folders on S3 as processing targets.
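Transfers from the S3 source can also be started without the browser. The following is a minimal sketch, assuming the Dashboard's `/api/transfer/start_transfer/` endpoint. That endpoint takes form fields `name`, `type`, `paths[]`, and `row_ids[]`, where each path is base64 of `<location UUID>:<path relative to the location>`. Verify the endpoint and fields against your Archivematica version. The function names are hypothetical.

```python
import base64
import urllib.parse
import urllib.request

DASHBOARD_URL = "http://127.0.0.1:62080"


def encode_transfer_path(location_uuid, relative_path):
    """Source path as base64("<location UUID>:<path inside that location>")."""
    raw = f"{location_uuid}:{relative_path}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def start_transfer_body(name, location_uuid, relative_path, transfer_type="standard"):
    """Form-encoded body for a single-source transfer."""
    fields = {
        "name": name,
        "type": transfer_type,
        "paths[]": [encode_transfer_path(location_uuid, relative_path)],
        "row_ids[]": [""],
    }
    return urllib.parse.urlencode(fields, doseq=True).encode("ascii")


def start_transfer(user, api_key, name, location_uuid, relative_path):
    req = urllib.request.Request(
        DASHBOARD_URL + "/api/transfer/start_transfer/",
        data=start_transfer_body(name, location_uuid, relative_path),
        headers={"Authorization": f"ApiKey {user}:{api_key}"},
        method="POST",  # urllib sends data as application/x-www-form-urlencoded
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Depending on the processing configuration, the transfer may still wait for approval in the Dashboard after this call.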

Summary

Specifying S3's cold storage (Amazon S3 Glacier) as the AIP storage destination can provide more options for long-term preservation of AIPs. Additionally, using S3 facilitates the use of APIs and integration with other systems.
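One way to move AIPs to Glacier is an S3 lifecycle rule on the AIP prefix, so that Archivematica keeps writing to standard S3. The following is a minimal boto3 sketch of that approach, assuming the aip_storage/ prefix from this example. The 30-day delay and the rule ID are arbitrary illustration values.

```python
AIP_PREFIX = "aip_storage/"


def glacier_lifecycle(prefix=AIP_PREFIX, days=30, storage_class="GLACIER"):
    """Lifecycle rule moving objects under `prefix` to a Glacier class after `days`."""
    return {
        "Rules": [
            {
                "ID": "aip-to-glacier",  # arbitrary rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": storage_class}],
            }
        ]
    }


def apply_lifecycle(bucket, config):
    import boto3  # imported lazily; assumes boto3 and AWS credentials are set up

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

Objects in the GLACIER and DEEP_ARCHIVE classes must be restored before they can be read again. As a result, downloading or reingesting an AIP through the Storage Service would fail until a restore completes. GLACIER_IR avoids that at a higher storage cost.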

I hope this guide is helpful in using Archivematica.
