Contemporary solutions
A challenge in persistent databases is to provide access to the data to many concurrent users while keeping that data consistent and provide a means to recover from application errors or system failures. An important abstraction used is the notion of a transaction which groups simple operations together so this group can succeed or fail as a whole. All changes to the database should be made as a series of transactions. This means that concurrent transactions should somehow be serialized. Serialization in contemporary database systems is achieved through various techniques. Some techniques rely on locking protocols, others on ordering transactions based on their associated timestamps.
Traditionally, locking protocols where commonplace. Transactions would lock the resources they needed to read or update so other transactions would have to wait for those resources to be available again. In case a deadlock occurs, one of the transactions would be aborted and rolled back. The obvious disadvantage of these techniques is that concurrent transactions are often waiting for each other. Especially with long transactions this could be a really big problem.
Part of the problems with the locking protocols is solved in a technique that is widely used now. The technique called Multiversion concurrency control does not lock all data, instead, whenever a reader needs access to data that is currently being updated by a writer the database returns the data as it was when the reader transaction started. So, updating data results in creating a copy of the original data and performing the update on the copy. This technique has the advantage that readers will never block writers and reversely, writers will never block readers. However, when multiple writers need to update the same data, still some coordination needs to be performed. This can be achieved by traditional locking or by a method based on timestamp ordering. With timestamp ordering every record of data is being associated with the timestamp of the transaction that last read or wrote the record. By inspecting these timestamps the database can determine which transactions conflict and abort the appropriate transaction.
Considerations for Charta
Before trying to choose an appropriate method for Charta, it is important to first determine what would be important or desirable for the final system. In the following some considerations are laid out:
High concurrencyThe possibility of high concurrency would be very important. Most applications do a lot of reading while one or more threads perform writing. It is important that the readers and writers do not get blocked by each other more than necessary.Long and interactive transaction supportA common rule of thumb in todays databases is to keep transactions short lived in order to allow other transactions access to the same resources. However, in real life transactions can easily grow large. For instance, an import script which bulk imports data into the database can take very long and ideally all its changes are committed as whole. Chunking such a transaction into little pieces would be unnatural and undesirable. Besides in import scripts the same situation can easily arise when a user starts a database transaction through a user interface. When the user wants to group his actions into a transaction, this transaction can easily take very long, or may never end. Here it would also be unnatural and undesirable to disallow the user to group his actions into a transaction.Temporal databaseContemporary database systems often lack the possibility to provide a version history of its records. If possible, it would be very desirable to be able to provide this feature. As current systems already use Multiversion concurrency control it seems like a small step to be able to keep historic data as long one desires instead of just allowing readers to not be blocked by writers.
Proposal for Charta
Given the considerations it seems like the obvious approach would be to provide a true temporal database solution. This approach would allow a version history, high performing concurrent access to actual data and the possibility to roll back transactions and recover from failures. It might even be possible to selectively roll back (parts of) transactions, even if other unrelated or non conflicting transactions have been executed afterwards. This implementation would in fact resemble the functionality of a version control system as they are used to maintain the repository of a code base.
Looking at contemporary version control systems in more detail one could be inspired to provide a database solution for the long or interactive transaction problem. In fact, every user that works with the version control system obtains his own working copy and starts working on that. When the user has finished he commits his work to the repository as an atomic operation or transaction. These transactions typically take a long time to complete, especially when compared to current database transactions. As these transactions take a long time a version control system needs to provide a system to provide concurrent modifications to the repository. Traditionally these systems provided a locking protocol so users would have to lock every file before they could start editing on it. However, this method has the obvious problems associated with locking. The current solution is to allow every user to modify whatever they want and at commit time it is checked whether these changes can be applied to the repository without a conflict. If another user has committed changes in the meanwhile the user would first have to update his working copy with those intermediate changes and possibly merge those changes with his own to resolve conflicts.
The above update-merge-commit method seems like a great way to serialize database transactions if the update-merge actions are feasible in a database context. What makes it great is that short transactions automatically take precedence over longer transactions and the possible conflicts that arise are the responsibility of the longer transaction to solve. This seems only fair. Please note that this way of transaction ordering is different from traditional systems in which often the transaction that first started takes precedence (in a locking scheme) or the transaction that first read or writes a record takes precedence (in a timestamp ordering scheme). Both traditional transaction ordering methods allow a long transaction to disrupt the flow of the more important and short lived transactions.