The key feature of relational databases is their ability to ensure data consistency, that is, data correctness.
关系数据库的关键特征是能够保证 数据一致性,即数据正确性。
It is a known fact that at the database level it is possible to create integrity con- straints, such as NOT NULL or UNIQUE. The database system ensures that these con- straints are never broken, so data integrity is never compromised.
众所周知,在数据库级别可以创建完整性约束,例如 NOT NULL 或 UNIQUE。数据库系统确保这些约束永远不会被破坏,因此数据完整性永远不会受到损害。
If all the required constraints could be formulated at the database level, consis- tency would be guaranteed. But some conditions are too complex for that, for example, they touch upon several tables at once. And even if a constraint can be defined in the database, but for some reason it is not, it does not mean thatthis constraint may be violated.
如果可以在数据库级别制定所有必需的约束,则可以保证一致性。但有些条件太复杂,例如,它们同时涉及多个表。即使数据库中可以定义约束,但由于某种原因没有定义,也不意味着可以违反该约束。
Thus, data consistency is stricter than integrity, but the database system has no idea what “consistency” actually means. If an application breaks it without break- ing the integrity, there is no way for the database system to find out. Consequently, it is the application that must lay down the criteria for data consistency, and we have to believe that it is written correctly and will never have any errors.
因此,数据一致性比完整性更严格,但数据库系统并不知道“一致性”到底意味着什么。如果应用程序在不破坏完整性的情况下破坏了它,数据库系统就无法发现。因此,必须由应用程序来制定 数据一致性 的标准,并且我们必须相信它被正确编写并且永远不会出现任何错误。
But if the application always executes only correct sequences of operators, where does the database system come into play?
但是,如果应用程序始终只执行正确的运算 序列,那么数据库系统将在哪里发挥作用?
First of all, a correct sequence of operators can temporarily break data consistency, and—strange as it may seem—it is perfectly normal.
首先,正确的运算 序列可能会暂时破坏数据一致性,尽管看起来很奇怪,但这是完全正常的。
A hackneyed but clear example is a transfer of funds from one account to another. A consistency rule may sound as follows: a money transfer must never change the total balance of the affected accounts. It is quite difficult (although possible) to formulate this rule as an integrity constraint in SQL, so let’s assume that it is defined at the application level and remains opaque to the database system. A transfer consists of two operations: the first one draws some money from one of the accounts, whereas the second one adds this sum to another account. The first operation breaks data consistency, whereas the second one restores it.
一个老生常谈但很明显的例子是将资金从一个账户转移到另一个账户。一致性规则听起来可能如下:转账绝不能改变受影响账户的总余额。将此规则制定为 SQL 中的完整性约束非常困难(尽管可能),因此我们假设它是在应用程序级别定义的并且对数据库系统保持不透明。转账由两个操作组成:第一个操作从一个账户中提取一些钱,而第二个操作将这笔钱添加到另一个账户中。第一个操作破坏了数据一致性,而第二个操作则恢复了数据一致性。
If the first operation succeeds, but the second one does not (because of some fail- ure), data consistency will be broken. Such situations are unacceptable, but it takes a great deal of effort to detect and address them at the application level. Luckily it is not required—the problem can be completely solved by the database system itself if it knows that these two operations constitute an indivisible whole, that is, a transaction.
如果第一个操作成功,但第二个操作没有成功(由于某些失败),则数据一致性将被破坏。这种情况是不可接受的,但需要付出很大的努力才能在应用程序级别检测和解决它们。幸运的是,这不是必需的——如果数据库系统知道这两个操作构成一个不可分割的整体,即事务,那么问题就可以完全由数据库系统本身解决。
But there is also a more subtle aspect here. Being absolutely correct on their own, transactions can start operating incorrectly when run in parallel. That’s because operations belonging to different transactions often get intermixed. There would be no such issues if the database system first completed all operations of one trans- action and then moved on to the next one, but performance of sequential execution would be implausibly low.
但这里还有一个更微妙的方面。事务本身绝对正确,但并行运行时候 可能会导致错误操作。这是因为属于不同事务的操作经常混合在一起。如果数据库系统首先完成一个事务的所有操作,然后继续执行下一个事务,则不会出现此类问题,但顺序执行的性能会低得令人难以置信。
A truly simultaneous execution of transactions can only be achieved on systems with suit- able hardware: a multi-core processor, a disk array, and so on. But the same reasoning is also true for a server that executes commands sequentially in the time-sharing mode. For generalization purposes, both these situations are sometimes referred to as concurrent execution.
真正的事务同步执行只能在具有合适硬件的系统上实现:多核处理器、磁盘阵列等。但对于分时模式下顺序执行命令的服务器来说也是同样的道理。出于一般化目的,这两种情况有时都称为并发执行。
Correct transactions that behave incorrectly when run together result in concur- rency anomalies, or phenomena.
正确的事务一起运行时 行为不正确会导致并发异常或现象。
对比reids 事务
单个 Redis 命令的执行是原子性的,但 Redis 没有在事务上增加任何维持原子性的机制,所以 Redis 事务的执行并不是原子性的。
事务可以理解为一个打包的批量执行脚本,但批量指令并 非原子化 的操作,中间某条指令的失败不会导致前面已做指令的回滚,也不会造成后续的指令不做。
Here is a simple example. To get consistent data from the database, the applica- tion must not see any changes made by other uncommitted transactions, at the very minimum. Otherwise (if some transactions are rolled back), it would see the database state that has never existed. Such an anomaly is called a dirty read. There are also many other anomalies, which are more complex.
这是一个简单的例子。为了从数据库中获得一致的数据,应用程序至少不能看到 其他未提交事务所做的任何更改。否则(如果某些事务被回滚),它将看到从未存在过的数据库状态。这种异常称为脏读。还有许多其他异常现象,而且更为复杂。
When running transactions concurrently, the database must guarantee that the result of such execution will be the same as the outcome of one of the possible sequential executions. In other words, it must isolate transactions from one another, thus taking care of any possible anomalies.
当并发运行事务时,数据库必须保证此类并发执行的结果将与 顺序执行之一的结果相同。换句话说,它必须将事务彼此隔离,从而处理任何可能的异常情况。
To sum it up, a transaction is a set of operations that takes the database from one correct state to another correct state (consistency), provided that it is executed in full (atomicity) and without being affected by other transactions (isolation). This definition combines the requirements implied by the first three letters of the ACID acronym. They are so intertwined that it makes sense to discuss them together. In fact, the durability requirement is hardly possible to split off either: after a crash, the system may still contain some changes made by uncommitted transactions, and you have to do something about it to restore data consistency. 综上所述,事务是一组将数据库从一个正确状态转移到另一个正确状态(一致性)的操作,前提是它是完整执行的(原子性)并且不受其他事务影响(隔离性)。该定义结合了 ACID 首字母缩略词的前三个字母所隐含的要求。它们是如此交织在一起,因此将它们放在一起讨论是有意义的。事实上,持久性也必须要求:在发生崩溃后,系统可能仍然包含一些未提交事务所做的更改,您必须采取一些措施来恢复数据一致性。
Thus, the database system helps the application maintain data consistency by taking transaction boundaries into account, even though it has no idea about the im- plied consistency rules.
因此,数据库系统通过考虑事务边界来帮助应用程序维护数据一致性,即使它不知道隐含的一致性规则。
Unfortunately, full isolation is hard to implement and can negatively affect per- formance. Most real-life systems use weaker isolation levels, which prevent some anomalies, but not all of them. It means that the job of maintaining data consis- tency partially falls on the application. And that’s exactly why it is very important to understand which isolation level is used in the system, what is guaranteed at this level and what is not, and how to ensure that your code will be correct in such conditions.
不幸的是,完全隔离很难实现,并且会对性能产生负面影响。大多数现实生活中的系统使用较弱的隔离级别,这可以防止某些异常情况,但不能防止所有异常情况。这意味着维护数据一致性的工作部分落在应用程序身上。这就是为什么了解系统中使用哪种隔离级别、该级别保证什么、不保证什么以及如何确保代码在这种情况下正确是非常重要的。
2.2 SQL 标准中的隔离级别和异常
The SQL standard specifies four isolation levels. These levels are defined by the list of anomalies that may or may not occur during concurrent transaction execution. So when talking about isolation levels, we have to start with anomalies.
SQL 标准指定了四种隔离级别。 这些级别由并发事务执行期间可能发生或不发生的异常列表定义。因此,在谈论隔离级别时,我们必须从异常开始。
We should bear in mind that the standard is a theoretical construct: it affects the practice, but the practice still diverges from it in lots of ways. That’s why all examples here are rather hypothetical. Dealing with transactions on bank accounts, these examples are quite self-explanatory, but I have to admit that they have noth- ing to do with real banking operations.
我们应该记住,该标准只是理论:它影响实践,但实践在很多方面仍然与其存在分歧。这就是为什么这里的所有例子都相当假设。处理银行账户上的事务,这些例子是不言自明的,但我不得不承认它们与真正的银行业务无关。
It is interesting that the actual database theory also diverges from the standard: it was developed after the standard had been adopted, and the practice was already well ahead.
有趣的是,实际的数据库理论也与标准有所不同:它是在标准采用之后开发的,并且实践已经遥遥领先。
The lost update anomaly occurs when two transactions read one and the same table row, then one of the transactions updates this row, and finally the other transaction updates the same row without taking into account any changes made by the first transaction.
当两个事务读取同一表行,然后其中一个事务更新该行,最后 另一个事务更新同一行而不考虑第一个事务所做的任何更改时(即直接覆盖),就会发生丢失更新异常。
Suppose that two transactions are going to increase the balance of one and the same account by $100
. The first transaction reads the current value ($1,000
), then the second transaction reads the same value. The first transaction increases the balance (making it $1,100
) and writes the new value into the database. The second transaction does the same: it gets $1,100
after increasing the balance and writes this value. As a result, the customer loses $100
.
假设两笔事务将使同一个账户的余额增加 $100
。第一个事务读取当前值 ( $1,000
),然后第二个事务读取相同的值。第一个事务增加余额(使其成为 $1,100
)并将新值写入数据库。第二笔事务执行相同的操作:增加余额后获得 1,100
美元并写入该值。结果,客户丢失了 $100
。
Lost updates are forbidden by the standard at all isolation levels.
隔离级别标准 在 所有隔离级别都 禁止 丢失更新。
脏读和读未提交
The dirty read anomaly occurs when a transaction reads uncommitted changes made by another transaction.
当一个事务读取另一个事务所做的 未提交的更改时,就会发生脏读异常。
For example, the first transaction transfers $100
to an empty account but does not commit this change. Another transaction reads the account state (which has been updated but not committed) and allows the customer to withdraw the money— even though the first transaction gets interrupted and its changes are rolled back, so the account is empty.
例如,第一笔事务将 $100
转移到空帐户,但不提交此更改。另一笔事务读取账户状态(已更新但未提交)并允许客户提取资金——即使第一笔事务被中断并且其更改被回滚,因此账户为空。
The standard allows dirty reads at the Read Uncommitted level.
该标准允许读未提交级别的脏读。
不可重复读和读已提交
The non-repeatable read anomaly occurs when a transaction reads one and the same row twice, whereas another transaction updates (or deletes) this row between these reads and commits the change. As a result, the first transaction gets different results.
当一个事务读取 同一行 两次,而另一个事务在这些读取之间更新(或删除)该行并提交更改时,就会发生不可重复读取异常。结果,第一笔事务得到了不同的结果。
For example, suppose there is a consistency rule that forbids having a negative bal- ance in bank accounts. The first transaction is going to reduce the account balance by $100
.It checksthecurrentvalue,gets $1,000
, and decides that this operation is possible. At the same time, another transaction withdraws all the money from this account and commits the changes. If the first transaction checked the balance again at this point, it would get $0
(but the decision to withdraw the money is already taken, and this operation causes an overdraft).
例如,假设有一条一致性规则,禁止银行账户出现负余额。第一笔事务将使账户余额减少 $100
。它检查当前值,获取 $1,000
,并决定此操作是可能的。与此同时,另一笔事务从该账户中 提取所有 资金并提交更改。如果此时第一笔事务再次检查余额,则会得到 $0
(但提款的决定已经做出,此操作会导致透支)。
The standard allows non-repeatable reads at the Read Uncommitted and Read Committed levels.
该标准允许在“读未提交”和“读已提交”级别进行不可重复读取。
幻读和可重复读
The phantom read anomaly occurs when one and the same transaction executes two identical queries returning a set of rows that satisfy a particular condition, while another transaction adds some other rows satisfying this condition and commits the changes in the time interval between these queries. As a result, the first trans- action gets two different sets of rows.
当一个事务执行两个相同的查询返回 一组满足特定条件的行,而另一个事务添加一些满足该条件的其他行并在这些查询之间的时间间隔内提交更改时,就会发生幻读异常。结果,第一个事务获得了两组不同的行。
For example, suppose there is a consistency rule that forbids a customer to have more than three accounts. The first transaction is going to open a new account, so it checks how many accounts are currently available (let’s say there are two of them) and decides that this operation is possible. At this very moment, the second transaction also opens a new account for this client and commits the changes. If the first transaction double-checked the number of open accounts, it would get three (but it is already opening another account, and the client ends up having four of them).
例如,假设有一条一致性规则,禁止客户拥有超过三个帐户。第一个事务将开设一个新帐户,因此它检查当前有多少个可用帐户(假设有两个)并决定此操作是可能的。此时,第二笔事务也为该客户开设了一个新帐户并提交了更改。如果第一笔事务仔细检查了开立账户的数量,它将得到三个(但它已经开立了另一个账户,并且客户最终拥有其中四个)。
The standard allows phantom reads at the Read Uncommitted, Read Committed, and Repeatable Read isolation levels.
该标准允许在读未提交、读已提交和可重复读隔离级别进行幻读。
无异常且串行化
The standard also defines the Serializable level, which does not allow any anomalies. It is not the same as the ban on lost updates and dirty, non-repeatable, and phantom reads. In fact, there is a much higher number of known anomalies than the standard specifies, and an unknown number of still unknown ones.
该标准还定义了 串行化级别,不允许任何异常。它与禁止丢失更新以及脏读、不可重复读和幻读不同。事实上,已知异常的数量比标准规定的要多得多,并且未知数量仍然未知。
The Serializable level must prevent any anomalies. It means that the application developer does not have to take isolation into account. If transactions execute correct operator sequences when run on their own, concurrent execution cannot break data consistency either.
可串行化级别必须防止任何异常情况。这意味着应用程序开发人员不必考虑隔离。如果事务在单独运行时执行正确的运算符序列,则并发执行也不会破坏数据一致性。
To illustrate this idea, I will use a well-known table provided in the standard; the last column is added here for clarity:
为了说明这个想法,我将使用标准中提供的一个众所周知的表格;为了清楚起见,在此处添加了最后一列:
lost dirty non-repeatable phantom other
update read read read anomalies
Read Uncommitted — yes yes yes yes
Read Committed — — yes yes yes
Repeatable Read — — — yes yes
Serializable — — — — —
Of all the possible anomalies, why does the standard mentions only some, and why exactly these ones ?
在所有可能的异常情况中,为什么标准只提到了 一些,以及为什么具体是这些?
No one seems to know it for sure. But it is not unlikely that other anomalies were simply not considered when the first versions of the standard were adopted, as theory was far behind practice at that time.
似乎没有人确切知道这一点。但在采用该标准的第一个版本时,也有可能根本没有考虑其他异常情况,因为当时的理论远远落后于实践。
Besides, it was assumed that isolation had to be based on locks. The widely used two-phase locking protocol (2PL) requires transactions to lock the affected rows dur- ing execution and release the locks upon completion. In simplistic terms, the more locks a transaction acquires, the better it is isolated from other transactions. And consequently, the worse is the system performance, as transactions start queuing to get access to the same rows instead of running concurrently.
此外,还假设 隔离必须基于锁。广泛使用的两阶段锁定协议(2PL)要求事务在执行期间锁定受影响的行,并在完成时释放锁定。简而言之,事务获取的锁越多,它与其他事务的隔离性就越好。因此,系统性能变得更差,因为事务开始排队串行访问相同的行,而不是同时访问行。
I believe that to a great extent the difference between the standard isolation levels is defined by the number of locks required for their implementation.
我相信,在很大程度上,标准隔离级别之间的差异是由其实现所需的锁数量定义的。
If the rows to be updated are locked for writes but not for reads, we get the Read Uncommitted isolation level, which allows reading data before it is committed.
如果要 更新的行被锁定 以进行写入而不是读取,我们将获得读取未提交隔离级别,该级别允许在提交数据之前读取数据。
If the rows to be updated are locked for both reads and writes, we get the Read Committed level: it is forbidden to read uncommitted data, but a query can return different values if it is run more than once (non-repeatable reads).
如果要更新的行对于读取和写入都被锁定,我们将获得读已提交级别:禁止读取未提交的数据,但如果查询运行多次(不可重复读取),则可以返回不同的值。
Locking the rows to be read and to be updated for all operations gives us the Re- peatable Read level: a repeated query will return the same result.
锁定所有操作要读取和更新的行为我们提供了可重复读取级别:重复查询将返回相同的结果。
However, the Serializable level poses a problem: it is impossible to lock a row that does not exist yet. It leaves an opportunity for phantom reads to occur: a transac- tion can add a row that satisfies the condition of the previous query, and thisrow will appear in the next query result.
然而,Serialized 级别带来了一个问题:不可能锁定尚不存在的行。这就给 幻读的发生留下了机会:一个事务可以 添加一行满足上一个查询条件的行,而该行将出现在下一个查询结果中。
Thus, regular locks cannot provide full isolation: to achieve it, we have to lock con- ditions (predicates) rather than rows. Such predicate locks were introduced as early as 1976 when System R was being developed; however, their practical applicability is limited to simple conditions for which it is clear whether two different predicates may conflict. As far as I know, predicate locks in their intended form have never been implemented in any system. 因此,常规锁 无法提供完全隔离:为了实现它,我们必须 锁定条件(谓词)而不是行。这种谓词锁早在 1976 年开发 System R 时就被引入了;然而,它们的实际适用性仅限于简单的条件,在这些条件下,两个不同的谓词是否可能发生冲突是显而易见的。据我所知,谓词锁的理想形式从未在任何系统中实现过。
Over time, lock-based protocols for transaction management got replaced with the Snapshot Isolation (SI) protocol. The idea behind this approach is that each trans- action accesses a consistent snapshot of data as it appeared at a particular point in time. The snapshot includes all the current changes committed before the snap- shot was taken.
随着时间的推移,基于锁的事务管理协议 被 快照隔离 (SI) 协议 所取代。这种方法背后的想法是,每个事务都访问特定时间点出现的 一致数据快照。快照包括 拍摄快照之前提交的所有当前更改。
Snapshot isolation minimizes the number of required locks. In fact, a row will be locked only by concurrent update attempts. In all other cases, operations can be executed concurrently: writes never lock reads, and reads never lock anything. 快照隔离最大限度地减少了所需锁的数量。事实上,数据行 将是 仅由 并发更新 尝试锁定。在所有其他情况下,操作可以并发执行:写入永远不会锁定读取,读取永远不会锁定任何内容。
PostgreSQL uses a multiversion flavor of the SI protocol. Multiversion concurrency control implies that at any moment the database system can contain several ver- sions of one and the same row, so PostgreSQL can include an appropriate version into the snapshot rather than abort transactions that attempt to read stale data.
PostgreSQL 使用 SI 协议的多版本风格 。多版本并发控制意味着数据库系统在任何时候都可以包含同一行的多个版本,因此 PostgreSQL 可以将适当的版本包含到快照中,而不是中止尝试读取过时数据的事务。
Based on snapshots, PostgreSQL isolation differs from the requirements specified in the standard—in fact, it is even stricter. Dirty reads are forbidden by design. Technically, you can specify the Read Uncommitted level, but its behavior will be the same as that of Read Committed, so I am not going to mention this level anymore.
PostgreSQL基于快照的隔离与标准中规定的要求不同,实际上甚至更严格。设计上禁止脏读。从技术上讲,您可以指定“读未提交”级别,但其行为与“读已提交”的行为相同,因此我不会再提及此级别。
Repeatable Read allows neither non-repeatable nor phantom reads (even though it does not guarantee full isolation). But in some cases, there is a risk of losing changes at the Read Committed level.
可重复读既不允许不可重复读也不允许幻读(即使它不能保证完全隔离)。但在某些情况下,存在丢失读已提交级别的更改的风险。
lost dirty non-repeatable phantom other
updates reads reads reads anomalies
Read Committed yes — yes yes yes
Repeatable Read — — — — yes
Serializable — — — — —
Before exploring the internal mechanisms of isolation, let’s discuss each of the three isolation levels from the user’s perspective.
在探讨隔离的内部机制之前,我们先从用户的角度分别讨论一下这三个隔离级别。
For this purpose, we are going to create the accounts table; Alice and Bob will have $1,000 each, but Bob will have two accounts:
为此,我们将创建账户表; Alice 和 Bob 各有 1,000 美元,但 Bob 有两个帐户:
=> CREATE TABLE accounts(
id integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
client text,
amount numeric
);
=> INSERT INTO accounts VALUES
(1, 'alice', 1000.00), (2, 'bob', 100.00), (3, 'bob', 900.00);
No dirty reads. It is easy to check that reading dirty data is not allowed. Let’s start a transaction. By default, it uses the Read Committed1 isolation level:
没有脏读。很容易检查不允许读取脏数据。让我们开始一个事务。默认情况下,它使用读已提交 隔离级别:
=> BEGIN;
=> SHOW transaction_isolation;
transaction_isolation
-----------------------------
read committed
(1 row)
To be more exact, the default level is set by the following parameter, which can be changed as required:
更准确地说,默认级别由以下参数设置,可以根据需要更改:
=> SHOW default_transaction_isolation;
default_transaction_isolation
----------------------------------------
read committed
(1 row)
The opened transaction withdraws some funds from the customer account but does not commit these changes yet. It will see its own changes though, as it is always allowed:
已打开的事务从客户帐户中提取一些资金,但尚未提交这些更改。但它会看到自己的变化,因为它总是被允许的:
=> UPDATE accounts SET amount = amount - 200 WHERE id = 1;
=> SELECT * FROM accounts WHERE client = 'alice';
id | client | amount
--------+--------------------+--------------------
1 | alice | 800.00
(1 row)
In the second session, we start another transaction that will also run at the Read Committed level:
在第二个会话中,我们启动另一个事务,该事务也将在读已提交级别运行:
| => BEGIN;
| => SELECT * FROM accounts WHERE client = 'alice';
| id | client | amount
| ----+--------+--------------------
| 1 | alice | 1000.00
| (1 row)
Predictably, the second transaction does not see any uncommitted changes—dirty reads are forbidden.
可以预见的是,第二个事务不会看到任何未提交的更改——脏读是被禁止的。
Non-repeatable reads. Now let the first transaction commit the changes. Then the second transaction will repeat the same query:
不可重复读取。现在让第一个事务提交更改。然后第二笔事务将重复相同的查询:
=> COMMIT;
| => SELECT * FROM accounts WHERE client = 'alice';
| id | client | amount
| ----+--------+--------------------
| 1 | alice | 800.00
| (1 row)
| => COMMIT;
The query receives an updated version of the data—and it is exactly what is under- stood by the non-repeatable read anomaly, which is allowed at the Read Committed level.
查询接收数据的更新版本,这正是不可重复读取异常所理解的,这是在读已提交级别允许的。
A practical insight: in a transaction, you must not take any decisions based on the data read by the previous operator, as everything can change in between. Here is an example whose variations appear in the application code so often that it can be considered a classic anti-pattern:
一个实用的见解:在事务中,您 不能根据前一个操作 读取的数据做出任何决定,因为在此期间一切都可能发生变化。下面是一个示例,其变体经常出现在应用程序代码中,因此可以将其视为经典的反模式:
IF (SELECT amount FROM accounts WHERE id = 1) >= 1000 THEN
UPDATE accounts SET amount = amount - 1000 WHERE id = 1;
END IF;
During the time that passes between the check and the update, other transactions can freely change the state of the account, so such a “check” is absolutely useless. For better understanding, you can imagine that random operators of other transac- tions are “wedged” between the operators of the current transaction. For example, like this:
在检查和更新之间的时间里,其他事务可以自由地改变账户的状态,所以这样的“检查”是绝对没有用的。为了更好地理解,您可以想象其他事务的随机运算符“楔入”当前事务的运算符之间。例如,像这样:
IF (SELECT amount FROM accounts WHERE id = 1) >= 1000 THEN
UPDATE accounts SET amount = amount - 200 WHERE id = 1;
COMMIT;
UPDATE accounts SET amount = amount - 1000 WHERE id = 1;
END IF;
If everything goes wrong as soon as the operators are rearranged, then the code is incorrect. Do not delude yourself that you will never get into this trouble: any- thing that can go wrong will go wrong. Such errors are very hard to reproduce, and consequently, fixing them is a real challenge.
如果重新排列运算符后一切都出错了,那么代码就是错误的。不要自欺欺人地认为自己永远不会陷入这个麻烦:任何可能出错的事情都会出错。此类错误很难重现,因此修复它们是一个真正的挑战。
How can you correct this code? There are several options:
您如何更正此代码?有几种选择:
Replace procedural code with declarative one.
将过程性代码替换为声明性代码。
For example, in this particular case it is easy to turn an IF statement into a CHECK constraint:
例如,在这种特殊情况下,很容易将 IF 语句转换为 CHECK 约束:
ALTER TABLE accounts
ADD CHECK amount >= 0;
Now you do not need any checks in the code: it is enough to simply run the command and handle the exception that will be raised if an integrity con- straint violation is attempted.
现在您不需要在代码中进行任何检查:只需运行命令并处理尝试违反完整性约束时将引发的异常就足够了。
Use a single SQL operator.
Data consistency can be compromised if a transaction gets committed within the time gap between operators of another transaction, thus changing data visibility. If there is only one operator, there are no such gaps.
如果一个事务在另一个事务的操作者之间的时间间隔内提交,则数据一致性可能会受到影响,从而改变数据可见性。如果只有一名操作员,则不存在这样的间隙。
PostgreSQL has enough capabilities to solve complex tasks with a single SQL statement. In particular, it offers common table expressions (CTE) that can contain operators like INSERT, UPDATE, DELETE, as well as the ON CONFLICT operator that implements the following logic: insert the row if it does not exist, otherwise perform an update.
PostgreSQL 有足够的能力使用单个 SQL 语句解决复杂的任务。特别是,它提供了公共表表达式 (CTE),其中可以包含 INSERT、UPDATE、DELETE 等运算符,以及实现以下逻辑的 ON CONFLICT 运算符:如果行不存在则插入该行,否则执行更新。
Apply explicit locks.
The last resort is to manually set an exclusive lock on all the required rows (SELECT FOR UPDATE) or even on the whole table (LOCK TABLE). This approach always works, but it nullifies all the advantages of MVCC: some operations that could be executed concurrently will run sequentially.
最后的手段是手动在所有必需的行上设置排他锁 (SELECT FOR UPDATE) 甚至整个表 (LOCK TABLE)。这种方法 总是有效,但它抵消了 MVCC 的所有优点:一些可以并发执行的操作将按顺序运行。
Read skew. However, it is not all that simple. The PostgreSQL implementation allows other, less known anomalies, which are not regulated by the standard.
读取倾斜。然而,事情并非那么简单。 PostgreSQL 实现允许其他鲜为人知的异常,这些异常不受标准监管。
Suppose the first transaction has started a money transfer between Bob’s accounts:
假设第一笔事务已开始 Bob 账户之间的转账:
=> BEGIN;
=> UPDATE accounts SET amount = amount - 100 WHERE id = 2;
Meanwhile, the other transaction starts looping through all Bob’s accounts to cal- culate their total balance. It begins with the first account (seeing its previous state, of course):
与此同时,另一笔事务开始循环访问鲍勃的所有账户以计算其总余额。它从第一个帐户开始(当然,查看其之前的状态):
| => BEGIN;
| => SELECT amount FROM accounts WHERE id = 2;
| amount
| --------------------
| 100.00
| (1 row)
At this moment, the first transaction completes successfully:
此时,第一笔事务成功完成:
=> UPDATE accounts SET amount = amount + 100 WHERE id = 3;
=> COMMIT;
The second transaction reads the state of the second account (and sees the already updated value):
第二笔事务读取第二个账户的状态(并看到已经更新的值):
| => SELECT amount FROM accounts WHERE id = 3;
| amount
| --------------------
| 1000.00
| (1 row)
| => COMMIT;
As a result, the second transaction gets $1,000 because it has read incorrect data. Such an anomaly is called read skew.
结果,第二笔事务因为读取了错误的数据而获得了 1,000 美元。这种异常称为读倾斜。
How can you avoid this anomaly at the Read Committed level? The answer is obvi- ous: use a single operator. For example, like this:
如何在读已提交级别避免这种异常情况?答案很明显:使用单个操作。例如,像这样:
SELECT sum(amount) FROM accounts WHERE client = 'bob';
I have been stating so far that data visibility can change only between operators, but is it really so? What if the query is running for a long time? Can it see different parts of data in different states in this case?
到目前为止,我一直在说数据可见性只能在操作 之间改变,但事实真的如此吗?如果查询运行很长时间怎么办?在这种情况下它可以看到不同状态下的数据的不同部分吗?
Let’s check it out. A convenient way to do it is to add a delay to an operator by calling the pg_sleep function. Then the first row will be read at once, but the second row will have to wait for two seconds:
让我们来看看。一种方便的方法是通过调用 pg_sleep 函数向运算符添加延迟。那么第一行将立即被读取,但第二行将不得不等待两秒钟:
=> SELECT amount, pg_sleep(2) -- two seconds
FROM accounts WHERE client = 'bob';
While this statement is being executed, let’s start another transaction to transfer the money back:
当这条语句正在执行时,让我们开始另一笔事务来将钱转回来:
| => BEGIN;
| => UPDATE accounts SET amount = amount + 100 WHERE id = 2;
| => UPDATE accounts SET amount = amount - 100 WHERE id = 3;
| => COMMIT;
The result shows that the operator has seen all the data in the state that corre- sponds to the beginning of its execution, which is certainly correct:
结果表明,操作员 已经看到了与其执行开始相对应的状态的所有数据,这当然是正确的:
amount | pg_sleep
---------+--------------------
0.00 |
1000.00 |
(2 rows)
But it is not all that simple either. If the query contains a function that is de- clared VOLATILE, and this function executes another query, then the data seen by this nested query will not be consistent with the result of the main query.
但事情也并非那么简单。如果查询包含声明为 VOLATILE 的函数,并且该函数执行另一个查询,则该嵌套查询看到的数据将与主查询的结果不一致。
Let’s check the balance in Bob’s accounts using the following function:
让我们使用以下函数检查 Bob 帐户中的余额:
=> CREATE FUNCTION get_amount(id integer) RETURNS numeric
AS $$
SELECT amount FROM accounts a WHERE a.id = get_amount.id;
$$ VOLATILE LANGUAGE sql;
=> SELECT get_amount(id), pg_sleep(2)
FROM accounts WHERE client = 'bob';
We will transfer the money between the accounts once again while our delayed query is being executed:
在执行延迟查询时,我们将再次在帐户之间转移资金:
| => BEGIN;
| => UPDATE accounts SET amount = amount + 100 WHERE id = 2;
| => UPDATE accounts SET amount = amount - 100 WHERE id = 3;
| => COMMIT;
In this case, we are going to get inconsistent data —$100 has been lost:
在这种情况下,我们将得到不一致的数据——损失了 100 美元:
get_amount | pg_sleep
-------------+--------------------
100.00 |
800.00 |
(2 rows)
I would like to emphasize that this effect is possible only at the Read Committed isolation level, and only if the function is VOLATILE. The trouble is that PostgreSQL uses exactly this isolation level and this volatility category by default. So we have to admit that the trap is set in a very cunning way.
我想强调的是,这种效果仅在 读已提交隔离级别下才有可能,并且仅当函数为 VOLATILE 时才可能。问题是 PostgreSQL 默认使用这个隔离级别和这个VOLATILE类别。所以我们不得不承认,这个陷阱设置得非常狡猾。
Read skew instead of lost updates. The read skew anomaly can also occur within a single operator during an update—even though in a somewhat unexpected way.
读倾斜而不是丢失更新。更新期间,读倾斜异常也可能发生在单个运算符内,即使是以某种意想不到的方式发生。
Let’s see what happens if two transactions try to modify one and the same row. Bob currently has a total of $1,000 in two accounts:
让我们看看如果两个事务尝试修改同一行会发生什么。 Bob 目前在两个账户中总共有 1,000 美元:
=> SELECT * FROM accounts WHERE client = 'bob';
id | client | amount
-----+--------+--------------------
2 | bob | 200.00
3 | bob | 800.00
(2 rows)
Start a transaction that will reduce Bob’s balance:
启动一个会减少 Bob 余额的事务:
=> BEGIN;
=> UPDATE accounts SET amount = amount - 100 WHERE id = 3;
At the same time, the other transaction will be calculating the interest for all cus- tomer accounts with the total balance of $1,000 or more:
同时,另一笔事务将计算所有总余额为 1,000 美元或以上的客户账户的利息:
| => UPDATE accounts SET amount = amount * 1.01
| WHERE client IN (
| SELECT client
| FROM accounts
| GROUP BY client
| HAVING sum(amount) >= 1000
| );
The UPDATE operator execution virtually consists of two stages. First, the rows to be updated are selected based on the provided condition. Since the first transaction is not committed yet, the second transaction cannot see its result, so the selection of rows picked for interest accrual is not affected. Thus, Bob’s accounts satisfy the condition, and his balance must be increased by $10 once the UPDATE operation completes.
UPDATE 运算符的执行实际上由两个阶段组成。首先,根据提供的条件选择要更新的行。由于第一个事务尚未提交,第二个事务无法看到其结果,因此为应计利息选择的行的选择不受影响。因此,Bob 的账户满足条件,一旦 UPDATE 操作完成,他的余额必须增加 10 美元。
At the second stage, the selected rows are updated one by one. The second trans- action has to wait because the row with id = 3 is locked: it is being updated by the first transaction.
在第二阶段,所选行被逐一更新。第二个事务必须等待,因为 id = 3 的行被锁定:它正在被第一个事务更新。
Meanwhile, the first transaction commits its changes:
同时,第一个事务提交其更改:
=> COMMIT;
=> SELECT * FROM accounts WHERE client = 'bob';
id | client | amount
----+--------+--------------------
2 | bob | 202.0000
3 | bob | 707.0000
(2 rows)
On the one hand, the UPDATE command must not see any changes made by the first transaction. But on the other hand, it must not lose any committed changes.
一方面,UPDATE 命令不能看到第一个事务所做的任何更改。但另一方面,它不能丢失任何已提交的更改。
Once the lock is released, the UPDATE operator re-readsthe row to be updated (but
一旦锁被释放,UPDATE操作符就会重新读取要更新的行(但是
only this row!). As a result, Bob gets 9ofinterest,basedonthetotalof900. But if he had $900, his accounts should not have been included into the query results in the first place.
只有这一行!)。结果,Bob 得到 9ofinterest,basedonthetotalof 900。但是,如果他有 900 美元,那么他的帐户一开始就不应该包含在查询结果中。
Thus, our transaction has returned incorrect data: different rows have been read from different snapshots. Instead of a lost update, we observe the read skew anomaly again.
因此,我们的事务返回了不正确的数据:从不同的快照读取了不同的行。我们再次观察到读取倾斜异常,而不是丢失更新。
Lost updates. However, the trick of re-reading the locked row will not help against lost updates if the data is modified by different SQL operators.
丢失更新。然而,如果数据被不同的 SQL 操作符修改,重新读取锁定行的技巧将无助于防止丢失更新。
Here is an example that we have already seen. The application reads and registers (outside of the database) the current balance of Alice’s account:
这是我们已经见过的一个例子。应用程序读取并注册(在数据库外部)Alice 帐户的当前余额:
=> BEGIN;
=> SELECT amount FROM accounts WHERE id = 1;
amount
--------------------
800.00
(1 row)
Meanwhile, the other transaction does the same:
与此同时,另一笔事务也做了同样的事情:
| => BEGIN;
| => SELECT amount FROM accounts WHERE id = 1;
| amount
| --------------------
| 800.00
| (1 row)
The first transaction increases the previously registered value by $100 and commits this change:
第一笔事务将之前注册的价值增加了 100 美元并提交了此更改:
=> UPDATE accounts SET amount = 800.00 + 100 WHERE id = 1
RETURNING amount;
amount
--------------------
900.00
(1 row)
UPDATE 1
=> COMMIT;
The second transaction does the same:
第二笔事务做同样的事情:
| => UPDATE accounts SET amount = 800.00 + 100 WHERE id = 1
| RETURNING amount;
| amount
| --------------------
| 900.00
| (1 row)
| UPDATE 1
| => COMMIT;
Unfortunately, Alice has lost $100
.The database system does not know that the registered value of $800
is somehow related to accounts.amount, so it cannot pre- vent the lost update anomaly. At the Read Committed isolation level, this code is incorrect.
不幸的是,Alice丢失了 $100
。数据库系统不知道 $800
的注册值与accounts.amount有某种关系,因此无法防止丢失更新异常。在读已提交隔离级别,此代码不正确。
No non-repeatable and phantom reads. As its name suggests, the Repeatable Read1isolation level must guarantee repeatable reading. Let’s check it and make sure that phantom reads cannot occur either. For this purpose, we are going to start a transaction that will revert Bob’s accounts to their previous state and create a new account for Charlie:
没有 不可重复读和幻读。顾名思义,Repeatable Read 隔离级别必须保证可重复读取。让我们检查一下并确保幻读也不会发生。为此,我们将启动一项事务,将 Bob 的帐户恢复到之前的状态,并为 Charlie 创建一个新帐户:
=> BEGIN;
=> UPDATE accounts SET amount = 200.00 WHERE id = 2;
=> UPDATE accounts SET amount = 800.00 WHERE id = 3;
=> INSERT INTO accounts VALUES
(4, 'charlie', 100.00);
=> SELECT * FROM accounts ORDER BY id;
id | client | amount
----+----------+--------------------
1 |alice | 900.00
2 |bob | 200.00
3 |bob | 800.00
4 |charlie | 100.00
(4 rows)
In the second session, let’s start another transaction, with the Repeatable Read level explicitly specified in the BEGIN command (the level of the first transaction is not important):
在第二个会话中,让我们开始另一个事务,并在 BEGIN 命令中显式指定可重复读取级别(第一个事务的级别并不重要):
| => BEGIN ISOLATION LEVEL REPEATABLE READ;
| => SELECT * FROM accounts ORDER BY id;
| id | client | amount
| ----+--------+--------------------
| 1 | alice | 900.00
| 2 | bob | 202.0000
| 3 | bob | 707.0000
| (3 rows)
Now the first transaction commits its changes, and the second transaction repeats the same query:
现在第一个事务提交其更改,第二个事务重复相同的查询:
=> COMMIT;
| => SELECT * FROM accounts ORDER BY id;
| id | client | amount
| ----+--------+--------------------
| 1 | alice | 900.00
| 2 | bob | 202.0000
| 3 | bob | 707.0000
| (3 rows)
| => COMMIT;
The second transaction still sees the same data as before: neither new rows nor row updates are visible. At this isolation level, you do not have to worry that something will change between operators.
第二个事务仍然看到与以前相同的数据:新行和行更新都不可见。在此隔离级别,您不必担心操作员之间会发生某些变化。
Serialization failures instead of lost updates. As we have already seen, if two trans- actions update one and the same row at the Read Committed level, it can cause the read skew anomaly: the waiting transaction has to re-read the locked row, so it sees the state of this row at a different point in time as compared to other rows.
序列化失败而不是丢失更新。正如我们已经看到的,如果两个事务在读已提交级别更新同一行,则可能会导致读倾斜异常:等待事务必须重新读取锁定的行,因此它会看到该行的状态与其他行相比,在不同时间点的行。
Such an anomaly is not allowed at the Repeatable Read isolation level, and if it does happen, the transaction can only be aborted with a serialization failure. Let’s check it out by repeating the scenario with interest accrual:
可重复读隔离级别不允许出现此类异常,如果确实发生,事务只能因序列化失败而中止。让我们通过重复应计利息的场景来检查一下:
=> SELECT * FROM accounts WHERE client = 'bob';
id | client | amount
-----+--------+--------------------
2 | bob | 200.00
3 | bob | 800.00
(2 rows)
=> BEGIN;
=> UPDATE accounts SET amount = amount - 100.00 WHERE id = 3;
| => BEGIN ISOLATION LEVEL REPEATABLE READ;
| => UPDATE accounts SET amount = amount * 1.01
| WHERE client IN (
| SELECT client
| FROM accounts
| GROUP BY client
| HAVING sum(amount) >= 1000
| );
=> COMMIT;
| ERROR: could not serialize access due to concurrent update
| => ROLLBACK;
The data remains consistent:
数据保持一致:
=> SELECT * FROM accounts WHERE client = 'bob';
id | client | amount
-----+--------+--------------------
2 | bob | 200.00
3 | bob | 700.00
(2 rows)
The same error will be raised by any concurrent row updates, even if they affect different columns.
任何并发的行更新都会引发相同的错误,即使它们影响不同的列。
We will also get this error if we try to update the balance based on the previously stored value:
如果我们尝试根据之前存储的值更新余额,我们也会收到此错误:
=> BEGIN ISOLATION LEVEL REPEATABLE READ;
=> SELECT amount FROM accounts WHERE id = 1;
amount
--------------------
900.00
(1 row)
| => BEGIN ISOLATION LEVEL REPEATABLE READ;
| => SELECT amount FROM accounts WHERE id = 1;
| amount
| --------------------
| 900.00
| (1 row)
=> UPDATE accounts SET amount = 900.00 + 100.00 WHERE id = 1
RETURNING amount;
amount
--------------------
1000.00
(1 row)
UPDATE 1
=> COMMIT;
| => UPDATE accounts SET amount = 900.00 + 100.00 WHERE id = 1
| RETURNING amount;
| ERROR: could not serialize access due to concurrent update
| => ROLLBACK;
A practical insight: if your application is using the Repeatable Read isolation level for write transactions, it must be ready to retry transactions that have been com- pleted with a serialization failure. For read-only transactions, such an outcome is impossible.
一个实用的见解:如果您的应用程序对写入事务使用可重复读取隔离级别,则它必须准备好重试已完成但出现序列化失败的事务。对于只读事务,这样的结果是不可能的。
Write skew. As we have seen, the PostgreSQL implementation of the Repeatable Read isolation level prevents all the anomalies described in the standard. But not all possible ones: no one knows how many of them exist. However, one important fact is proved for sure: snapshot isolation does not prevent only two anomalies, no matter how many other anomalies are out there.
写倾斜。正如我们所看到的,可重复读隔离级别的 PostgreSQL 实现可以防止标准中描述的所有异常情况。但并不是所有可能的情况:没有人知道其中有多少。然而,一个重要的事实已得到证实:快照隔离并不仅仅可以防止两种异常,无论存在多少其他异常。
The first one is write skew.
第一个是写倾斜。
Let’s define the following consistency rule: it is allowed to have a negative balance in some of the customer’s accounts as long as the total balance is non-negative.
我们定义如下一致性规则:只要总余额非负,就允许客户的部分账户出现负余额。
The first transaction gets the total balance of Bob’s accounts:
第一笔事务获取 Bob 账户的总余额:
=> BEGIN ISOLATION LEVEL REPEATABLE READ;
=> SELECT sum(amount) FROM accounts WHERE client = 'bob';
sum
--------------------
900.00
(1 row)
The second transaction gets the same sum:
第二笔事务获得相同的金额:
| => BEGIN ISOLATION LEVEL REPEATABLE READ;
| => SELECT sum(amount) FROM accounts WHERE client = 'bob';
| sum
| --------------------
| 900.00
| (1 row)
The first transaction fairly assumes that it can debit one of the accounts by $600:
第一笔事务公平地假设它可以借记其中一个账户 600 美元:
=> UPDATE accounts SET amount = amount - 600.00 WHERE id = 2;
The second transaction comes to the same conclusion, but debits the other ac- count:
第二笔事务得出相同的结论,但借记另一个账户:
| => UPDATE accounts SET amount = amount - 600.00 WHERE id = 3;
| => COMMIT;
=> COMMIT;
=> SELECT * FROM accounts WHERE client = 'bob';
id | client | amount
----+--------+--------------------
2 | bob | -400.00
3|bob | 100.00
(2 rows)
Bob’s total balance is now negative, although both transactions would have been correct if run separately.
鲍勃的总余额现在为负数,尽管如果单独运行的话,两笔事务都是正确的。
Read-only transaction anomaly. The read-only transaction anomaly is the second and the last one allowed at the Repeatable Read isolation level. To observe this anomaly, we have to run three transactions: two of them are going to update the data, while the third one will be read-only.
只读事务异常。只读事务异常是可重复读隔离级别允许的第二个也是最后一个异常。为了观察这种异常,我们必须运行三个事务:其中两个将更新数据,而第三个将是只读的。
But first let’s restore Bob’s balance:
但首先让我们恢复鲍勃的平衡:
=> UPDATE accounts SET amount = 900.00 WHERE id = 2;
=> SELECT * FROM accounts WHERE client = 'bob';
id | client | amount
-----+--------+--------------------
3 | bob | 100.00
2 | bob | 900.00
(2 rows)
The first transaction calculates the interest to be accrued on Bob’s total balance and adds this sum to one of his accounts:
第一笔事务计算 Bob 总余额应计的利息,并将这笔金额添加到他的一个账户中:
=> BEGIN ISOLATION LEVEL REPEATABLE READ; -- 1
=> UPDATE accounts SET amount = amount + (
SELECT sum(amount) FROM accounts WHERE client = 'bob'
) * 0.01
WHERE id = 2;
Then the second transaction withdraws some money from Bob’s other account and commits this change:
然后第二笔事务从 Bob 的另一个账户中提取一些钱并提交此更改:
| => BEGIN ISOLATION LEVEL REPEATABLE READ; -- 2
| => UPDATE accounts SET amount = amount - 100.00 WHERE id = 3;
| => COMMIT;
If the first transaction gets committed at this point, there will be no anomalies: we could assume that the first transaction is committed before the second one (but not vice versa—the first transaction had seen the state of account with id = 3 before any updates were made by the second transaction).
如果此时提交第一个事务,则不会出现异常:我们可以假设第一个事务在第二个事务之前提交(但反之亦然——第一个事务在任何事务之前都已看到 id = 3 的帐户状态)更新是由第二笔事务进行的)。
But let’s imagine that at this very moment we start a ready-only transaction to query an account that is not affected by the first two transactions:
但让我们想象一下,此时我们启动了一个只读事务来查询一个不受前两个事务影响的账户:
| => BEGIN ISOLATION LEVEL REPEATABLE READ; -- 3
| => SELECT * FROM accounts WHERE client = 'alice';
| id | client | amount
| ----+--------+--------------------
| 1 | alice | 1000.00
| (1 row)
And only now will the first transaction get committed:
只有现在第一个事务才会被提交:
=> COMMIT;
Which state should the third transaction see at this point? Having started, it could see the changes made by the second transaction (which had already been commit- ted), but not by the first one (which had not been committed yet). But as wehave already established, the second transaction should be treated as if it were started after the first one. Any state seen by the third transaction will be inconsistent—this is exactly what is meant by the read-only transaction anomaly:
此时第三笔事务应该看到哪种状态?启动后,它可以看到第二个事务(已提交)所做的更改,但看不到第一个事务(尚未提交)所做的更改。但正如我们已经确定的那样,第二个事务应该被视为是在第一个事务之后开始的。第三个事务看到的任何状态都将是不一致的——这正是只读事务异常的含义:
| => SELECT * FROM accounts WHERE client = 'bob';
| id | client | amount
| ----+---------+--------------------
| 2 | bob | 900.00
| 3 | bob | 0.00
| (2 rows)
| => COMMIT;
The Serializable isolation level prevents all possible anomalies. This level is vir- tually built on top of snapshot isolation. Those anomalies that do not occur at the Repeatable Read isolation level (such as dirty, non-repeatable, or phantom reads) cannot occur at the Serializable level either. And those two anomalies that do occur (write skew and read-only transaction anomalies) get detected in a special way to abort the transaction, causing an already familiar serialization failure.
可串行化隔离级别可防止所有可能的异常情况。该级别实际上是 建立 在快照隔离之上的。那些在可重复读隔离级别上不会发生的异常(例如脏读、不可重复读或幻读)也不会在可串行级别上发生。确实发生的这两种异常(写倾斜 和只读事务异常)会以特殊方式检测到,以中止事务,从而导致已经熟悉的序列化失败。
No anomalies. Let’s make sure that our write skew scenario will eventually end
无异常情况。让我们确保我们的写倾斜场景最终会结束
with a serialization failure:
序列化失败:
=> BEGIN ISOLATION LEVEL SERIALIZABLE;
=> SELECT sum(amount) FROM accounts WHERE client = 'bob';
sum
--------------------
910.0000
(1 row)
| => BEGIN ISOLATION LEVEL SERIALIZABLE;
| => SELECT sum(amount) FROM accounts WHERE client = 'bob';
| sum
| --------------------
| 910.0000
| (1 row)
=> UPDATE accounts SET amount = amount - 600.00 WHERE id = 2;
| => UPDATE accounts SET amount = amount - 600.00 WHERE id = 3;
| => COMMIT;
| COMMIT
=> COMMIT;
ERROR: could not serialize access due to read/write dependencies
among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during
commit attempt.
HINT: The transaction might succeed if retried.
The scenario with the read-only transaction anomaly will lead to the same error.
只读事务异常的场景也会导致同样的错误。
Deferring a read-only transaction. To avoid situations when a read-only transac- tion can cause an anomaly that compromises data consistency, PostgreSQL offers an interesting solution: this transaction can be deferred until its execution be- comes safe. It is the only case when a SELECT statement can be blocked by row updates.
推迟只读事务。为了避免只读事务可能导致损害数据一致性的异常情况,PostgreSQL 提供了一个有趣的解决方案:可以推迟该事务,直到其执行变得安全为止。这是 SELECT 语句可能被行更新阻塞的唯一情况。
We are going to check it out by repeating the scenario that demonstrated the read- only transaction anomaly:
我们将通过重复演示只读事务异常的场景来检查它:
=> UPDATE accounts SET amount = 900.00 WHERE id = 2;
=> UPDATE accounts SET amount = 100.00 WHERE id = 3;
=> SELECT * FROM accounts WHERE client = 'bob' ORDER BY id;
id | client | amount
----+--------+--------------------
2 | bob | 900.00
3 | bob | 100.00
(2 rows)
=> BEGIN ISOLATION LEVEL SERIALIZABLE; -- 1
=> UPDATE accounts SET amount = amount + (
SELECT sum(amount) FROM accounts WHERE client = 'bob'
) * 0.01
WHERE id = 2;
| => BEGIN ISOLATION LEVEL SERIALIZABLE; -- 2
| => UPDATE accounts SET amount = amount - 100.00 WHERE id = 3;
| => COMMIT;
Let’s explicitly declare the third transaction as READ ONLY and DEFERRABLE:
让我们显式地将第三个事务声明为 READ ONLY 和 DEFERRABLE:
| => BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE; -- 3
| => SELECT * FROM accounts WHERE client = 'alice';
An attempt to run the query blocks the transaction—otherwise, it would have caused an anomaly.
尝试运行查询会阻止事务,否则会导致异常。
And only when the first transaction is committed, the third one can continue its execution:
并且只有当第一个事务提交后,第三个事务才能继续执行:
=> COMMIT;
| id | client | amount
| ----+--------+--------------------
| 1 | alice | 1000.00
| (1 row)
| => SELECT * FROM accounts WHERE client = 'bob';
| id | client | amount
| ----+--------+--------------------
| 2 | bob | 910.0000
| 3 | bob | 0.00
| (2 rows)
| => COMMIT;
Thus, if an application uses the Serializable isolation level, it must be ready to retry transactions that have ended with a serialization failure. (The Repeatable Read level requires the same approach unless the application is limited to read-only transactions.)
因此,如果应用程序使用可串行化隔离级别,则它必须准备好重试因序列化失败而结束的事务。 (可重复读取级别需要相同的方法,除非应用程序仅限于只读事务。)
The Serializable isolation level brings ease of programming, but the price you pay is the overhead incurred by anomaly detection and forced termination of a certain fraction of transactions. You can lower this impact by explicitly using the READ ONLY clause when declaring read-only transactions. But the main questions is, of course, how big the fraction of aborted transactions is—since these transactions will have to be retried. It would have been not so bad if PostgreSQL aborted only those transactions that result in data conflicts and are really incompatible. But such an approach would inevitably be too resource-intensive, as it would involve tracking operations on each row.
Serialized 隔离级别带来了编程的便利性,但您付出的代价是异常检测和强制终止某部分事务所产生的开销。您可以在声明只读事务时显式使用 READ ONLY 子句来降低这种影响。但主要问题当然是中止事务的比例有多大——因为这些事务必须重试。如果 PostgreSQL 只中止那些导致数据冲突并且确实不兼容的事务,情况还不错。但这种方法不可避免地会过于占用资源,因为它会涉及对每一行的跟踪操作。
The current implementation allows false positives: PostgreSQL can abort some ab- solutely safe transactions that are simply out of luck. Their “luck” depends on many factors, such as the presence of appropriate indexes or the amount of RAM available, so the actual behavior is hard to predict in advance.
当前的实现允许误报:PostgreSQL 可以中止一些绝对安全的事务,但这些事务只是运气不好。他们的“运气”取决于许多因素,例如是否存在适当的索引或可用的 RAM 量,因此很难提前预测实际行为。
If you use the Serializable level, it must be observed by all transactions of the ap- plication. When combined with other levels, Serializable behaves as Repeatable Read without any notice. So if you decide to use the Serializable level, it makes sense to modify the default_transaction_isolation parameter value accordingly— even though someone can still overwrite it by explicitly setting a different level. There are also other restrictions; for example, queries run at the Serializable level cannot be executed on replicas. And although the functionality of this level is constantly being improved, the current limitations and overhead make it less at- tractive.
如果您使用可序列化级别,则应用程序的所有事务都必须遵守该级别。当与其他级别结合使用时,Serialized 表现为可重复读取,无需任何通知。因此,如果您决定使用 Serialized 级别,则相应地修改 default_transaction_isolation 参数值是有意义的 - 即使有人仍然可以通过显式设置不同的级别来覆盖它。还有其他限制;例如,在可序列化级别运行的查询无法在副本上执行。尽管该级别的功能不断得到改进,但当前的限制和开销使其吸引力下降。
2.4 使用哪种隔离级别?
Read Committed is the default isolation level in PostgreSQL, and apparently it is this level that is used in the vast majority of applications. This level can be convenient because it allows aborting transactions only in case of a failure; it does not abort any transactions to preserve data consistency. In other words, serialization failures cannot occur, so you do not have to take care of transaction retries. The downside of this level is a large number of possible anomalies, which have been discussed in detail above. A developer has to keep them in mind all the time and write the code in a way that prevents their occurrence. If it is impossible to define all the needed actions in a single SQL statement, then you have to resort to explicit locking. The toughest part is that the code is hard to test for errors related to data inconsistency; such errors can appear in unpredictable and barely reproducible ways, so they are very hard to fix too.
读已提交是 PostgreSQL 中的默认隔离级别,显然绝大多数应用程序都使用此级别。这个级别很方便,因为它只允许在发生故障时中止事务;它不会中止任何事务以保持数据一致性。换句话说,不会发生序列化失败,因此您不必处理事务重试。该级别的缺点是存在大量可能的异常,这已在上面详细讨论过。开发人员必须始终牢记它们,并以防止它们发生的方式编写代码。如果 不可能在单个 SQL 语句中定义所有所需的操作,那么您必须诉诸 显式锁定。最困难的部分是代码很难测试与数据不一致相关的错误;此类错误可能以不可预测且几乎不可重现的方式出现,因此也很难修复。
The Repeatable Read isolation level eliminates some of the inconsistency prob- lems, but alas, not all of them. Therefore, you must not only remember about the remaining anomalies, but also modify the application to correctly handle serializa- tion failures, which is certainly inconvenient. However, for read-only transactions this level is a perfect complement to the Read Committed level; it can be very useful for cases like building reports that involve multiple SQL queries.
可重复读隔离级别消除了一些不一致问题,但遗憾的是,并不能消除全部问题。因此,您不仅必须记住剩余的异常,而且还必须修改应用程序以正确处理序列化失败,这当然很不方便。然而,对于只读事务,该级别是对读已提交级别的完美补充;它对于构建涉及多个 SQL 查询的报告等情况非常有用。
And finally, the Serializable isolation level allows you not to worry about data con- sistency at all, which simplifies writing the code to a great extent. The only thing required from the application is the ability to retry any transaction that is aborted with a serialization failure. However, the number of aborted transactions and as- sociated overhead can significantly reduce system throughput. You should also keep in mind that the Serializable level is not supported on replicas and cannot be combined with other isolation levels.
最后,Serializable隔离级别使您完全不用担心数据一致性,这在很大程度上简化了代码的编写。应用程序唯一需要的就是能够重试因Serializable失败而中止的任何事务。然而,中止事务的数量和相关的开销会显着降低系统吞吐量。您还应该记住,副本不支持Serializable化级别,并且不能与其他隔离级别结合使用。
更多阅读:
数据库内核杂谈(十):事务、隔离、并发(1)_数据库_顾仲贤_InfoQ精选文章
此处可能存在不合适展示的内容,页面不予展示。您可通过相关编辑功能自查并修改。
如您确认内容无涉及 不当用语 / 纯广告导流 / 暴力 / 低俗色情 / 侵权 / 盗版 / 虚假 / 无价值内容或违法国家有关法律法规的内容,可点击提交进行申诉,我们将尽快为您处理。