Notes: Is PostgreSQL's clog a log or data, and does it have to obey write-WAL-before-data?

Summary

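In principle, MVCC needs to be able to look up a transaction's status given its transaction ID.

In PG, a transaction's status can be obtained through several paths:

  1. Look it up in the snapshot (active transactions)
  2. Look it up in the status bits of the tuple header (inactive transactions)
  3. Look it up in the CLOG (inactive transactions)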
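
The three paths above can be sketched as a single lookup function. This is a toy model, not PostgreSQL's visibility code: every type and function name here (ToyTupleHeader, ToySnapshot, ToyClog, toy_xmin_status) is hypothetical, the order in which the paths are tried is an assumption made for illustration, and it answers only "what is the status of this XID", not "is this tuple visible to my snapshot".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;

typedef enum { XACT_IN_PROGRESS, XACT_COMMITTED, XACT_ABORTED } XactStatus;

/* Hypothetical, simplified tuple header: only the inserting XID and two status flags. */
typedef struct
{
    Xid  xmin;
    bool hint_committed;
    bool hint_aborted;
} ToyTupleHeader;

/* Hypothetical snapshot: a plain list of XIDs that were running when it was taken. */
typedef struct
{
    const Xid *running;
    size_t     nrunning;
} ToySnapshot;

/* Hypothetical stand-in for the CLOG: one status per XID, indexed directly. */
typedef struct
{
    const XactStatus *status;
    size_t            nxacts;
} ToyClog;

static bool
toy_xid_in_snapshot(const ToySnapshot *snap, Xid xid)
{
    for (size_t i = 0; i < snap->nrunning; i++)
        if (snap->running[i] == xid)
            return true;
    return false;
}

XactStatus
toy_xmin_status(ToyTupleHeader *tup, const ToySnapshot *snap, const ToyClog *clog)
{
    assert(tup != NULL && snap != NULL && clog != NULL);

    /* Path 2: status bits already cached in the tuple header. */
    if (tup->hint_committed)
        return XACT_COMMITTED;
    if (tup->hint_aborted)
        return XACT_ABORTED;

    /* Path 1: the snapshot says the transaction is still active. */
    if (toy_xid_in_snapshot(snap, tup->xmin))
        return XACT_IN_PROGRESS;

    /* Path 3: fall back to the CLOG, then cache the answer in the header. */
    XactStatus st = tup->xmin < clog->nxacts ? clog->status[tup->xmin]
                                             : XACT_IN_PROGRESS;
    if (st == XACT_COMMITTED)
        tup->hint_committed = true;
    else if (st == XACT_ABORTED)
        tup->hint_aborted = true;
    return st;
}
```

The point of the sketch is only that the tuple header acts as a per-tuple cache in front of the CLOG; the real server applies further conditions before setting those bits.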

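Setting the implementation aside and looking only at the concept, the commit status of an inactive transaction could also be looked up in the XLOG. The CLOG can be viewed as a cache or mapping of the XLOG's commit/rollback records: a fast way to query transaction commit status.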
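
To make the "fast lookup" idea concrete, here is a minimal sketch of how a status can be packed at two bits per transaction, so that a given XID maps directly to a page, a byte, and a bit offset. The arithmetic follows the layout used by clog.c with the default 8 kB block size, but the TOY_ and toy_ names are hypothetical and the code operates on a plain in-memory array rather than on SLRU buffers.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BLCKSZ          8192    /* assumes the default page size */
#define TOY_BITS_PER_XACT   2
#define TOY_XACTS_PER_BYTE  4
#define TOY_XACTS_PER_PAGE  (TOY_BLCKSZ * TOY_XACTS_PER_BYTE)   /* 32768 */
#define TOY_XACT_BITMASK    0x03

/* Status codes, two bits each. */
#define TOY_STATUS_IN_PROGRESS   0x00
#define TOY_STATUS_COMMITTED     0x01
#define TOY_STATUS_ABORTED       0x02
#define TOY_STATUS_SUB_COMMITTED 0x03

/* Which page holds this XID's status. */
uint32_t
toy_xid_to_page(uint32_t xid)
{
    return xid / TOY_XACTS_PER_PAGE;
}

/* Which byte within that page. */
uint32_t
toy_xid_to_byte(uint32_t xid)
{
    return (xid % TOY_XACTS_PER_PAGE) / TOY_XACTS_PER_BYTE;
}

/* How far to shift within that byte. */
uint32_t
toy_xid_to_bshift(uint32_t xid)
{
    return (xid % TOY_XACTS_PER_BYTE) * TOY_BITS_PER_XACT;
}

/* Store a two-bit status for xid in the given page image. */
void
toy_clog_set(uint8_t *page, uint32_t xid, uint8_t status)
{
    assert(status <= TOY_XACT_BITMASK);

    uint8_t *byteptr = &page[toy_xid_to_byte(xid)];
    uint32_t shift = toy_xid_to_bshift(xid);

    *byteptr = (uint8_t) ((*byteptr & ~(TOY_XACT_BITMASK << shift)) |
                          (status << shift));
}

/* Read the two-bit status for xid from the given page image. */
uint8_t
toy_clog_get(const uint8_t *page, uint32_t xid)
{
    return (page[toy_xid_to_byte(xid)] >> toy_xid_to_bshift(xid)) &
           TOY_XACT_BITMASK;
}
```

A lookup is a division, a modulo, and a shift, which is why answering "did XID n commit?" from the CLOG is far cheaper than scanning XLOG records for it.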

So under write-WAL-before-data, the CLOG is also treated as data; only the XLOG counts as WAL.

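How PostgreSQL writes clog to disk: SlruPhysicalWritePage

PostgreSQL reads and writes the clog through the SLRU mechanism. Before an SLRU page is written to disk, there is a mechanism that guarantees the xlog is written first:

  • group_lsn holds, for each group of 32 transactions, the largest log sequence number (LSN) in that group.
  • group_lsn is mainly used when transaction commits are not flushed to disk synchronously (asynchronous commit).
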
static bool
SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruWriteAll fdata)
{
	...
	if (shared->group_lsn != NULL)
	{
		/*
		 * We must determine the largest async-commit LSN for the page. This
		 * is a bit tedious, but since this entire function is a slow path
		 * anyway, it seems better to do this here than to maintain a per-page
		 * LSN variable (which'd need an extra comparison in the
		 * transaction-commit path).
		 */
		XLogRecPtr	max_lsn;
		int			lsnindex,
					lsnoff;

		lsnindex = slotno * shared->lsn_groups_per_page;
		max_lsn = shared->group_lsn[lsnindex++];
		for (lsnoff = 1; lsnoff < shared->lsn_groups_per_page; lsnoff++)
		{
			XLogRecPtr	this_lsn = shared->group_lsn[lsnindex++];

			if (max_lsn < this_lsn)
				max_lsn = this_lsn;    /* <<< keep the largest LSN seen so far */
		}

		if (!XLogRecPtrIsInvalid(max_lsn))
		{
			/*
			 * As noted above, elog(ERROR) is not acceptable here, so if
			 * XLogFlush were to fail, we must PANIC.  This isn't much of a
			 * restriction because XLogFlush is just about all critical
			 * section anyway, but let's make sure.
			 */
			START_CRIT_SECTION();
			XLogFlush(max_lsn);      /* <<< first make sure XLOG is flushed up to this point! */
			END_CRIT_SECTION();
		}
	}
	...
	if (pg_pwrite(fd, shared->page_buffer[slotno], BLCKSZ, offset) != BLCKSZ)
	{
		...
	}
}
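
The ordering enforced by this function can be reproduced in a small self-contained model. The sketch below is an illustration under stated assumptions, not PostgreSQL code: all Sketch and sketch_ names are hypothetical, the WAL is reduced to a single "flushed up to" counter, and the page write is reduced to a counter increment. It keeps only two ideas from the excerpt: one LSN slot per group of 32 transactions, and "flush WAL to the page's largest group LSN before writing the page".

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t SketchLsn;

#define SKETCH_INVALID_LSN      ((SketchLsn) 0)
#define SKETCH_XACTS_PER_PAGE   32768
#define SKETCH_XACTS_PER_GROUP  32
#define SKETCH_GROUPS_PER_PAGE  (SKETCH_XACTS_PER_PAGE / SKETCH_XACTS_PER_GROUP)

/* Hypothetical model of one clog page plus the WAL flush position. */
typedef struct
{
    SketchLsn wal_flushed_upto;                  /* how far WAL is durable */
    SketchLsn group_lsn[SKETCH_GROUPS_PER_PAGE]; /* one slot per 32 XIDs */
    int       page_writes;                       /* number of page writes done */
    SketchLsn wal_flushed_at_last_write;         /* WAL position at last write */
} SketchClogPage;

/* An async commit remembers its commit-record LSN in its group's slot. */
void
sketch_record_async_commit(SketchClogPage *page, uint32_t xid, SketchLsn commit_lsn)
{
    uint32_t group = (xid % SKETCH_XACTS_PER_PAGE) / SKETCH_XACTS_PER_GROUP;

    if (page->group_lsn[group] < commit_lsn)
        page->group_lsn[group] = commit_lsn;
}

/* Scan all groups of the page for the largest recorded LSN. */
SketchLsn
sketch_page_max_lsn(const SketchClogPage *page)
{
    SketchLsn max_lsn = page->group_lsn[0];

    for (int i = 1; i < SKETCH_GROUPS_PER_PAGE; i++)
        if (max_lsn < page->group_lsn[i])
            max_lsn = page->group_lsn[i];
    return max_lsn;
}

/* Stand-in for a WAL flush: durability only ever moves forward. */
void
sketch_wal_flush(SketchClogPage *page, SketchLsn upto)
{
    if (page->wal_flushed_upto < upto)
        page->wal_flushed_upto = upto;
}

/* Write the page, but only after WAL covers every commit it records. */
void
sketch_write_page(SketchClogPage *page)
{
    SketchLsn max_lsn = sketch_page_max_lsn(page);

    if (max_lsn != SKETCH_INVALID_LSN)
        sketch_wal_flush(page, max_lsn);

    /* The WAL-before-data invariant at the moment of the write. */
    assert(page->wal_flushed_upto >= max_lsn);

    page->wal_flushed_at_last_write = page->wal_flushed_upto;
    page->page_writes++;
}
```

Tracking one LSN per group rather than per transaction trades a little precision (the flush may go slightly further than strictly needed) for a much smaller array.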

How PostgreSQL writes user data to disk: FlushBuffer

Data pages work the same way: first find the page's LSN, flush the xlog, and then write the data.

static void
FlushBuffer(BufferDesc *buf, SMgrRelation reln)
{
	...
	buf_state = LockBufHdr(buf);

	/*
	 * Run PageGetLSN while holding header lock, since we don't have the
	 * buffer locked exclusively in all cases.
	 */
	recptr = BufferGetLSN(buf);   /* <<< find the page's LSN */

	/* To check if block content changes while flushing. - vadim 01/17/97 */
	buf_state &= ~BM_JUST_DIRTIED;
	UnlockBufHdr(buf, buf_state);

	/*
	 * Force XLOG flush up to buffer's LSN.  This implements the basic WAL
	 * rule that log updates must hit disk before any of the data-file changes
	 * they describe do.
	 *
	 * However, this rule does not apply to unlogged relations, which will be
	 * lost after a crash anyway.  Most unlogged relation pages do not bear
	 * LSNs since we never emit WAL records for them, and therefore flushing
	 * up through the buffer LSN would be useless, but harmless.  However,
	 * GiST indexes use LSNs internally to track page-splits, and therefore
	 * unlogged GiST pages bear "fake" LSNs generated by
	 * GetFakeLSNForUnloggedRel.  It is unlikely but possible that the fake
	 * LSN counter could advance past the WAL insertion point; and if it did
	 * happen, attempting to flush WAL through that location would fail, with
	 * disastrous system-wide consequences.  To make sure that can't happen,
	 * skip the flush if the buffer isn't permanent.
	 */
	if (buf_state & BM_PERMANENT)
		XLogFlush(recptr);         /* <<< first make sure XLOG is flushed up to this point! */

	...
	smgrwrite(reln,
			  BufTagGetForkNum(&buf->tag),
			  buf->tag.blockNum,
			  bufToWrite,
			  false);
	...
}
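
The BM_PERMANENT check described in the long comment above can also be modelled in a few lines. This is a simplified sketch with hypothetical names (DemoWal, DemoBuffer, demo_wal_flush, demo_flush_buffer); in particular, returning false when a flush request lies beyond the WAL insertion point is an assumption standing in for the failure the comment describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical WAL state: how far records exist, and how far they are durable. */
typedef struct
{
    uint64_t insert_pos;
    uint64_t flushed_upto;
} DemoWal;

/* Hypothetical buffer: its page LSN and whether it belongs to a logged relation. */
typedef struct
{
    uint64_t lsn;
    bool     permanent;
} DemoBuffer;

/* Flush WAL up to 'upto'; fails if that position has not been written yet. */
bool
demo_wal_flush(DemoWal *wal, uint64_t upto)
{
    assert(wal->flushed_upto <= wal->insert_pos);

    if (upto > wal->insert_pos)
        return false;           /* simplified stand-in for the real failure */
    if (upto > wal->flushed_upto)
        wal->flushed_upto = upto;
    return true;
}

/* Flush WAL first for permanent buffers only, then (notionally) write the page. */
bool
demo_flush_buffer(DemoWal *wal, const DemoBuffer *buf)
{
    if (buf->permanent && !demo_wal_flush(wal, buf->lsn))
        return false;

    /* The data page write would happen here. */
    return true;
}
```

With this model, an unlogged buffer carrying a "fake" LSN beyond the insertion point is written without touching the WAL, while the same LSN on a permanent buffer would make the flush request fail.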
