mysql binlog 文件由 event 组成,event 有不同的类型,而xid_event 表示一个事务的提交操作。
执行一条插入语句
insert into t1 values(16);
show binlog events in 'mysql-bin.000003';
log_name | pos | event_type | server_id | end_log_pos | info
| mysql-bin.000003 | 2005 | gtid | 1013307 | 2070 | set @@session.gtid_next= 'f971d5f1-d450-11ec-9e7b-5254000a56df:11' |
| mysql-bin.000003 | 2070 | query | 1013307 | 2142 | begin |
| mysql-bin.000003 | 2142 | table_map | 1013307 | 2187 | table_id: 109 (test.t1) |
| mysql-bin.000003 | 2187 | write_rows | 1013307 | 2227 | table_id: 109 flags: stmt_end_f |
| mysql-bin.000003 | 2227 | xid | 1013307 | 2258 | commit /* xid=121 */
其中xid =121,表示这个事务提交操作。
当事务提交时,在 binlog 依赖的内部 xa 中,额外添加了 xid 结构,binlog 有多种数据类型:
- statement 格式,记录为基本语句,包含 commit
- row 格式,记录为基于行
- mixed 格式,日志记录使用混合格式
不论是 statement 还是 row 格式,binlog 都会添加一个 xid_event 作为事务的结束,该事件记录了事务的 id 也就是 xid,在 mysql 进行崩溃恢复时根据 binlog 中提交的情况来决定如何恢复。
mysql 内部维护了一个全局变量 global_query_id,每次执行语句的时候将它赋值给 query_id,然后给这个变量加 1。如果当前语句是这个事务执行的第一条语句,那么 mysql 还会同时把 query_id 赋值给这个事务的 xid。
而 global_query_id 是一个纯内存变量,重启之后就清零了。所以你就知道了,在同一个数据库实例中,不同事务的 xid 也是有可能相同的。
但是 mysql 重启之后会重新生成新的 binlog 文件,这就保证了,同一个 binlog 文件里,xid 一定是惟一的。
虽然 mysql 重启不会导致同一个 binlog 里面出现两个相同的 xid,但是如果 global_query_id 达到上限后,就会继续从 0 开始计数。从理论上讲,还是就会出现同一个 binlog 里面出现相同 xid 的场景。
因为 global_query_id 定义的长度是 8 个字节,这个自增值的上限是 264-1。要出现这种情况,必须是下面这样的过程:
执行一个事务,假设 xid 是 a;
接下来执行 264次查询语句,让 global_query_id 回到 a;
再启动一个事务,这个事务的 xid 也是 a。
不过,264这个值太大了,大到你可以认为这个可能性只会存在于理论上。
当有事务提交时
二阶段提交
步骤如下
- innodb 进入 prepare 阶段,并且 write/sync redo log,写 redo log,将事务的 xid 写入到 redo 日志中,binlog 不作任何操作
- 进行 write/sync binlog,写 binlog 日志,也会把 xid 写入到 binlog
- 调用 innodb 引擎的 commit 完成事务的提交,将 commit 信息写入到 redo 日志中
mysql崩溃恢复会有什么操作
如果 redo log 里面的事务是完整的,也就是已经有了 commit 标识,则直接提交;
如果 redo log 里面的事务只有完整的 prepare,则判断对应的事务 binlog 是否存在并完整:
a. 如果是,则提交事务;
b. 否则,回滚事务。
mysql 怎么知道 binlog 是完整的
一个事务的 binlog 是有完整格式的:statement 格式的 binlog,最后会有 commit;
row 格式的 binlog,最后会有一个 xid event。
崩溃恢复的时候,会按顺序扫描 redo log:
如果碰到既有 prepare、又有 commit 的 redo log,就直接提交;
如果碰到只有 parepare、而没有 commit 的 redo log,就拿着 xid 去 binlog 找对应的事务。
mysql解释如下
扫描最后一个binlog获取xid生成一个hash table,再扫描redo将checkpoint之后所有xid,如果xid在hash table里面则提交,不在则回滚。
/**
mysqld server recovers from last crashed binlog.
@param[in] binlog_file_reader binlog_file_reader of the crashed binlog.
@param[out] valid_pos the position of the last valid transaction or
event(non-transaction) of the crashed binlog.
valid_pos must be non-null.
after a crash, storage engines may contain transactions that are
prepared but not committed (in theory any engine, in practice
innodb). this function uses the binary log as the source of truth
to determine which of these transactions should be committed and
which should be rolled back.
the function collects the xids of all transactions that are
completely written to the binary log into a hash, and passes this
hash to the storage engines through the ha_recover function in the
handler interface. this tells the storage engines to commit all
prepared transactions that are in the set, and to roll back all
prepared transactions that are not in the set.
to compute the hash, this function iterates over the last binary log
only (i.e. it assumes that 'log' is the last binary log). it
instantiates each event. for xid-events (i.e. commit to innodb), it
extracts the xid from the event and stores it in the hash.
it is enough to iterate over only the last binary log because when
the binary log is rotated we force engines to commit (and we fsync
the old binary log).
@retval false success
@retval true out of memory, or storage engine returns error.
*/
static bool binlog_recover(binlog_file_reader *binlog_file_reader,
my_off_t *valid_pos) {
bool res = false;
binlog::tools::iterator it(binlog_file_reader);
it.set_copy_event_buffer();
/*
the flag is used for handling the case that a transaction
is partially written to the binlog.
*/
bool in_transaction = false;
int memory_page_size = my_getpagesize();
{
mem_root mem_root(key_memory_binlog_recover_exec, memory_page_size);
mem_root_unordered_set xids(&mem_root) ;
/*
now process events in the queue. queue is dynamically changed
everytime we process an event. this may be a bit suboptimal
since it adds an indirection, but it helps to generalize the
usage of the transaction payload event (which unfolds into
several events into the queue when it is processed).
*/
for (log_event *ev = it.begin(); !res && (ev != it.end()); ev = it.next()) {
switch (ev->get_type_code()) {
// may be begin, middle or end of a transaction
case binary_log::query_event: {
// starts a transaction
if (!strcmp(((query_log_event *)ev)->query, "begin"))
in_transaction = true;
// ends a transaction
if (!strcmp(((query_log_event *)ev)->query, "commit")) {
assert(in_transaction == true);
in_transaction = false;
}
// starts and ends a transaction
if (is_atomic_ddl_event(ev)) {
assert(in_transaction == false);
auto qev = dynamic_cast(ev);
assert(qev != nullptr);
res = (qev == nullptr || !xids.insert(qev->ddl_xid).second);
}
break;
}
// ends a transaction
case binary_log::xid_event: {
assert(in_transaction == true);
in_transaction = false;
xid_log_event *xev = dynamic_cast(ev);
assert(xev != nullptr);
res = (xev == nullptr || !xids.insert(xev->xid).second);
break;
}
default: {
break;
}
}
/*
recorded valid position for the crashed binlog file
which did not contain incorrect events. the following
positions increase the variable valid_pos:
1 -
...
<---> here is valid <--->
gtid
begin
...
commit
...
2 -
...
<---> here is valid <--->
gtid
ddl/utility
...
in other words, the following positions do not increase
the variable valid_pos:
1 -
gtid
<---> here is valid <--->
...
2 -
gtid
begin
<---> here is valid <--->
...
*/
if (!in_transaction && !is_gtid_event(ev))
*valid_pos = binlog_file_reader->position();
delete ev;
ev = nullptr;
res = it.has_error();
}
/*
call ha_recover if and only if there is a registered engine that
does 2pc, otherwise in dbug builds calling ha_recover directly
will result in an assert. (production builds would be safe since
ha_recover returns right away if total_ha_2pc <= opt_log_bin.)
*/
res = res || (total_ha_2pc > 1 && ha_recover(&xids));
}
if (res) logerr(error_level, er_binlog_crash_recovery_failed);
return res;
}