Translated by AI
Implementing Cross-Region Mutual Exclusion with Amazon DynamoDB Lock Client
Amazon DynamoDB Lock Client: Do You Know It?
This is it.
It is a general-purpose distributed locking library that uses Amazon DynamoDB as its backend. Because any string can serve as a lock key, the lock granularity (the scope each lock covers) can be designed very flexibly. That freedom has a cost: without careful design, locks can be released unexpectedly or deadlocks can occur. The locking logic should therefore be centralized in one place so that it stays under control.
Background of Testing Cross-Region Exclusive Control
First, DynamoDB global tables recently gained support for strong consistency. Since the latest data can now be read from any region, I wondered whether lock information could be shared across regions as well.
Second, Aurora DSQL has no exclusive control (mutual exclusion) mechanism. Aurora DSQL is a powerful relational database that offers 99.999% availability in multi-region configurations and virtually unlimited scalability, but it uses optimistic concurrency control, which detects conflicts at transaction commit time, and so provides no way to take an exclusive lock. So I thought: why not build an exclusive control mechanism outside Aurora DSQL for now?
Execution Architecture
The AWS configuration for this trial implementation is as follows.

Both Aurora DSQL multi-region clusters and DynamoDB global tables offer 99.999% availability. Both also provide consistency strong enough for active-active configurations. In other words, this is a highly reliable configuration in which clients always see the latest data regardless of which region they access.
Note that in reality, both Aurora DSQL and DynamoDB global tables also exist in the us-west-2 region as a third data replication destination. However, it is omitted here because the application does not perform reads or writes to it.
How Was It After Trying It Out?
So, was cross-region exclusive control achievable in the end? In short: yes. However, several issues need to be addressed.
Handling ReplicatedWriteConflictException
When strong consistency is enabled on DynamoDB global tables, requests can start to fail with ReplicatedWriteConflictException, which is raised when an item in the request is being modified by a request in another region. Because Amazon DynamoDB Lock Client does not handle this exception, the application must handle it appropriately, for example by retrying.
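One way to handle the exception on the application side is to wrap each lock operation in a small retry helper. The following is a minimal sketch under stated assumptions, not the library's own mechanism: ConflictRetry and ReplicatedWriteConflict are hypothetical names, and ReplicatedWriteConflict is only a stand-in so the example compiles without the AWS SDK. In a real application the predicate would test for the SDK's ReplicatedWriteConflictException instead (possibly by inspecting the cause chain, depending on how the exception reaches the caller).

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical stand-in for the SDK exception, so the sketch has no AWS dependency. */
class ReplicatedWriteConflict extends RuntimeException {
    ReplicatedWriteConflict(String message) { super(message); }
}

/** Retries an operation while a caller-supplied predicate marks the failure as retryable. */
final class ConflictRetry {
    private final int maxAttempts;
    private final long baseDelayMillis;
    private final Predicate<RuntimeException> retryable;

    ConflictRetry(int maxAttempts, long baseDelayMillis, Predicate<RuntimeException> retryable) {
        if (maxAttempts < 1 || baseDelayMillis < 0) {
            throw new IllegalArgumentException("maxAttempts must be >= 1 and baseDelayMillis >= 0");
        }
        this.maxAttempts = maxAttempts;
        this.baseDelayMillis = baseDelayMillis;
        this.retryable = retryable;
    }

    /** Runs the operation, retrying retryable failures up to maxAttempts attempts in total. */
    <T> T call(Supplier<T> operation) {
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                // Give up on the last attempt or on any failure the predicate rejects.
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
                sleepWithJitter(attempt);
            }
        }
    }

    /** Exponential backoff with full jitter: sleeps a random time in [0, base * 2^(attempt-1)]. */
    private void sleepWithJitter(int attempt) {
        long ceiling = baseDelayMillis << Math.min(attempt - 1, 10);
        long delay = ThreadLocalRandom.current().nextLong(ceiling + 1);
        try {
            Thread.sleep(delay);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", ie);
        }
    }
}
```

The helper accepts only unchecked exceptions, so a lock client call that declares InterruptedException would need a thin wrapper. The attempt limit and base delay are values to tune, not recommendations.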
Implementing Reentrant Locks
Amazon DynamoDB Lock Client has a Reentrant option. When this option is set to true, lock acquisition is skipped if the current owner already holds the target lock. However, unlike java.util.concurrent.locks.ReentrantLock, this option does not track a hold count: even if the same lock is acquired multiple times, a single release operation releases it. If you want the lock to be released only after as many release operations as acquisitions, as with ReentrantLock, the application has to implement that itself.
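The hold-count behavior can be layered on top of the lock client. The following is a minimal sketch, assuming ownership is per thread: LockBackend and CountingReentrantLock are hypothetical names, and LockBackend stands in for whatever code actually calls the lock client's acquire and release operations.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical abstraction over the real lock client; not part of the library. */
interface LockBackend {
    void acquire(String key);
    void release(String key);
}

/**
 * Counts acquisitions per thread and key, and releases the underlying lock
 * only when the count for that key returns to zero.
 */
final class CountingReentrantLock {
    private final LockBackend backend;
    // Each thread tracks its own hold counts, so no synchronization is needed here.
    private final ThreadLocal<Map<String, Integer>> holdCounts =
            ThreadLocal.withInitial(HashMap::new);

    CountingReentrantLock(LockBackend backend) {
        this.backend = backend;
    }

    void lock(String key) {
        Map<String, Integer> counts = holdCounts.get();
        Integer current = counts.get(key);
        if (current == null) {
            // First acquisition by this thread: take the real lock, then start counting.
            backend.acquire(key);
            counts.put(key, 1);
        } else {
            // Already held by this thread: only bump the counter.
            counts.put(key, current + 1);
        }
    }

    void unlock(String key) {
        Map<String, Integer> counts = holdCounts.get();
        Integer current = counts.get(key);
        if (current == null) {
            throw new IllegalMonitorStateException("lock not held by this thread: " + key);
        }
        if (current == 1) {
            // Last matching release: forget the count and release the real lock.
            counts.remove(key);
            backend.release(key);
        } else {
            counts.put(key, current - 1);
        }
    }

    /** Number of times the current thread holds the given key. */
    int holdCount(String key) {
        return holdCounts.get().getOrDefault(key, 0);
    }
}
```

Because the count lives in a ThreadLocal, every unlock must happen on the thread that called lock, typically in a finally block.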
Implementing Lock Ownership by Thread
Because some of its processing depends on instance state of AmazonDynamoDBLockClient, it is advisable to keep one AmazonDynamoDBLockClient instance per thread (thread-local) when implementing lock ownership on a per-thread basis. Also, because the owner name used to identify the lock owner defaults to the hostname, it should be overridden with a name that is globally unique per thread, such as "hostname + thread name."
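A per-thread client with a per-thread owner name might be organized as follows. This is a sketch under assumptions: PerThreadLockClient is a hypothetical helper, the client type is left generic so the example has no AWS dependency, and the process ID is added to the "hostname + thread name" scheme on the assumption that several JVMs may run on one host. Thread names still have to be unique within a JVM for the owner name to be unique.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.Function;

/**
 * Holds one client instance per thread. The factory receives an owner name
 * built for the calling thread and returns the client to use on that thread.
 */
final class PerThreadLockClient<C> {
    private final ThreadLocal<C> clients;

    PerThreadLockClient(Function<String, C> clientFactory) {
        // The factory runs lazily, once per thread, on that thread's first get().
        this.clients = ThreadLocal.withInitial(
                () -> clientFactory.apply(ownerNameForCurrentThread()));
    }

    /** Returns the calling thread's client, creating it on first use. */
    C get() {
        return clients.get();
    }

    /** Builds "hostname/pid/threadName" for the calling thread. */
    static String ownerNameForCurrentThread() {
        return hostName() + "/" + ProcessHandle.current().pid()
                + "/" + Thread.currentThread().getName();
    }

    private static String hostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            // Fall back to a placeholder rather than failing client creation.
            return "unknown-host";
        }
    }
}
```

In real use the factory would build an AmazonDynamoDBLockClient with the given owner name. Clients created this way live as long as their threads, so this arrangement suits a fixed-size thread pool better than short-lived threads.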
Settings Optimized for Numerous Short-Lived Locks
The lock expiration mechanism of the Amazon DynamoDB Lock Client assumes that locks are held for a relatively long time. Specifically, the lock owner starts a background thread that sends heartbeats, and the lock's expiration is extended for as long as heartbeats keep arriving. If heartbeats stop for a set period (configured via LeaseDuration), the lock is automatically treated as expired, so locks held by an owner that stopped responding without releasing them are eventually freed.
However, for short-lived locks in online processing that completes within seconds, starting background threads and sending heartbeats puts excessive load on the system. It is therefore better to set LeaseDuration to the maximum time a lock may be held and to disable the background thread and heartbeat transmission.
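A configuration along these lines might look like the sketch below. It uses the options builder from the library's README and needs the lock client and AWS SDK for Java 2.x on the classpath. ShortLivedLockClientFactory is a hypothetical name, and the 10-second lease is an assumed value standing in for "the longest time any lock may be held", not a recommendation.

```java
import java.util.concurrent.TimeUnit;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDBLockClient;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBLockClientOptions;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;

/** Hypothetical factory for a lock client tuned for many short-lived locks. */
final class ShortLivedLockClientFactory {

    static AmazonDynamoDBLockClient create(DynamoDbClient dynamoDb,
                                           String tableName,
                                           String ownerName) {
        return new AmazonDynamoDBLockClient(
                AmazonDynamoDBLockClientOptions.builder(dynamoDb, tableName)
                        // Owner name unique per thread, supplied by the caller.
                        .withOwnerName(ownerName)
                        .withTimeUnit(TimeUnit.SECONDS)
                        // Assumed upper bound on how long a lock may be held.
                        .withLeaseDuration(10L)
                        // No background thread, so no heartbeats are sent.
                        .withCreateHeartbeatBackgroundThread(false)
                        .build());
    }
}
```

With heartbeats disabled, nothing extends the lease, so any work that runs longer than LeaseDuration can lose its lock while still in progress.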
Is Exclusive Control Really Necessary After All?
The motivation for this experiment was "Aurora DSQL lacks an exclusive control mechanism, so why not try using an existing one?" However, it is worth stepping back to ask whether such exclusive control is needed at all. In workloads with a very high volume of concurrent requests, exclusive control forces the requests that contend for the same lock to queue up and run one at a time. Processing time then grows rapidly, and eventually most requests may start timing out. In other words, if you need high concurrency, exclusive control is an architecture to avoid.
Conversely, if traffic is low enough that the bottleneck from exclusive control is acceptable, actual collisions are probably rare. In that case, an architecture that skips exclusive control and simply retries when a collision is detected should be sufficient.
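The retry-on-collision approach can be illustrated with a small in-memory model. This is only a sketch of the idea: VersionedStore and OptimisticUpdater are hypothetical names, and the version check in commit stands in for the conflict detection a database performs at commit time. Against Aurora DSQL, the conflict would surface as an error at commit and the whole transaction would be retried.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongUnaryOperator;

/** In-memory stand-in for a store that detects conflicting writes at commit time. */
final class VersionedStore {
    /** Immutable snapshot of a row: its version and its value. */
    static final class Row {
        final long version;
        final long value;
        Row(long version, long value) {
            this.version = version;
            this.value = value;
        }
    }

    private final Map<String, Row> rows = new HashMap<>();

    /** Returns the current row, or version 0 / value 0 if the key is absent. */
    synchronized Row read(String key) {
        Row row = rows.get(key);
        return row == null ? new Row(0, 0) : row;
    }

    /** Writes newValue only if nobody committed since expectedVersion was read. */
    synchronized boolean commit(String key, long expectedVersion, long newValue) {
        if (read(key).version != expectedVersion) {
            return false; // collision detected: another writer got there first
        }
        rows.put(key, new Row(expectedVersion + 1, newValue));
        return true;
    }
}

/** Read-modify-write without holding a lock: retry whenever the commit collides. */
final class OptimisticUpdater {
    static long update(VersionedStore store, String key,
                       LongUnaryOperator change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            VersionedStore.Row snapshot = store.read(key);
            long newValue = change.applyAsLong(snapshot.value);
            if (store.commit(key, snapshot.version, newValue)) {
                return newValue;
            }
            // Collision: loop to re-read the latest row and recompute.
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " conflicting attempts");
    }
}
```

No caller ever waits for another to finish. A collision costs one extra read and recomputation, which is cheap as long as collisions are rare.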
While there are likely other factors to consider and a conclusion cannot be reached immediately, if you want to build a scalable system, it seems better to think about "how to manage without exclusive control" rather than "how to somehow implement exclusive control."