PostgreSQL Source Code (92): An In-Depth Analysis of HOT Updates

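0 Overview and summary

  • There are already several posts on HOT updates; this is the last one and serves as the summary (the earlier ones can be skipped).
  • Earlier readings of the update code concentrated on heap_update and did not cover how the HOT chain is located. This post focuses on how the HOT chain is used.

(Quick-reference summary)

(lp = line pointer: a fixed-width array in the page; each entry points into the data area at the bottom of the page)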
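
To make the line pointer less abstract, here is a small sketch that splits a 32-bit ItemIdData word into lp_off, lp_flags and lp_len. It assumes the bit-field layout typically produced on little-endian machines (lp_off in the low 15 bits, then 2 flag bits, then 15 length bits); pack_item_id and unpack_item_id are hypothetical helper names, not PostgreSQL functions.

```python
# Sketch: split a 32-bit line pointer (ItemIdData) into its three fields.
# Assumption: little-endian bit-field layout, lp_off in the low 15 bits.
LP_FLAG_NAMES = {0: "LP_UNUSED", 1: "LP_NORMAL", 2: "LP_REDIRECT", 3: "LP_DEAD"}


def pack_item_id(lp_off, lp_flags, lp_len):
    """Build the 32-bit word from the three fields (hypothetical helper)."""
    return (lp_off & 0x7FFF) | ((lp_flags & 0x3) << 15) | ((lp_len & 0x7FFF) << 17)


def unpack_item_id(word):
    """Return (lp_off, lp_flags, lp_len) for a 32-bit line pointer word."""
    lp_off = word & 0x7FFF
    lp_flags = (word >> 15) & 0x3
    lp_len = (word >> 17) & 0x7FFF
    return lp_off, lp_flags, lp_len


# A normal pointer to a 53-byte tuple at offset 912, and a redirect to lp 132.
normal = pack_item_id(912, 1, 53)
redirect = pack_item_id(132, 2, 0)
for word in (normal, redirect):
    off, flags, length = unpack_item_id(word)
    print(off, LP_FLAG_NAMES[flags], length)
```

For a redirect, lp_off does not hold a byte offset at all: it holds the offset number of the line pointer being redirected to.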

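Key steps (no-vacuum scenario):

  • The lp at the head of the HOT chain is always kept, and the index always points at it (even after the row it points to has been updated, only the tuple data is removed; the lp itself stays).
  • Intermediate members of the HOT chain carry HEAP_HOT_UPDATED; the last member of the chain carries only HEAP_ONLY_TUPLE.
  • A HOT update involves four key steps:
    • 1 Use the index to find the chain-head lp: table_index_fetch_tuple (3.1 below)
    • 2 Walk the HOT chain to find the lp that is needed: heap_hot_search_buffer (3.2 below)
    • 3 Defragment the page so the data area is more compact, which changes where the lps point: compactify_tuples (3.3 below)
    • 4 Use the lp that was found to locate the position in the page and memcpy the data there to complete the update: heap_update (3.4 below)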

Call stack:

ExecModifyTable
  ExecProcNode        // ExecProcNode = 0x783005 <ExecIndexScan>
  | ExecIndexScan
  |   ExecScan
  |     ExecScanFetch
  |       IndexNext
  |         index_getnext_slot
  |           tid = index_getnext_tid                       // 3.1 always returns ip_posid = 130
  |           index_fetch_heap
  |             table_index_fetch_tuple(ItemPointer tid)    // {ip_posid = 130}
  |               heapam_index_fetch_tuple
  |                 heap_hot_search_buffer                  // 3.2 walk the HOT chain to find the old tuple
  |                 heap_page_prune_opt                     // 3.3 defragmentation
  |                   heap_page_prune
  |                     heap_page_prune_execute
  |                       PageRepairFragmentation
  |                         compactify_tuples
  ExecUpdate
    ExecUpdateAct
      table_tuple_update
        heapam_tuple_update
          heap_update                                       // 3.4 update

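1 Test case

As usual, the test case comes first.

-- Test table; a single page can hold 136 rows
-- heap_page_items and get_raw_page (used below) come from the pageinspect extension
create extension if not exists pageinspect;
drop table if exists testbl;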
create table testbl(i int primary key not null, id int not null, info varchar(200) not null);
alter table testbl set (autovacuum_enabled = false);
insert into testbl select generate_series(1,130), (random()*100)::integer, repeat('DUfw',(5)::integer);


select * from testbl limit 10;
 i  | id |         info         
----+----+----------------------
  1 | 57 | DUfwDUfwDUfwDUfwDUfw
  2 |  2 | DUfwDUfwDUfwDUfwDUfw
  3 | 29 | DUfwDUfwDUfwDUfwDUfw
  4 | 37 | DUfwDUfwDUfwDUfwDUfw
  5 |  2 | DUfwDUfwDUfwDUfwDUfw
  6 | 44 | DUfwDUfwDUfwDUfwDUfw
  7 | 53 | DUfwDUfwDUfwDUfwDUfw
  8 | 24 | DUfwDUfwDUfwDUfwDUfw
  9 | 49 | DUfwDUfwDUfwDUfwDUfw
 10 | 17 | DUfwDUfwDUfwDUfwDUfw

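2 HOT update experiments

2.1 Summary: equal-width updates

  • After several HOT updates, the HOT chain turns out to reuse tuple slots rather than growing indefinitely.
  • The HOT root (lp=130 in this example) does not change; the slots after it are reused.
  • A reused slot whose tuple has itself been updated carries HEAP_ONLY_TUPLE | HEAP_HOT_UPDATED.

 UPDATE | lp=130                      | lp=131                                       | lp=132
--------+-----------------------------+----------------------------------------------+----------------------------------------------
 1st    | old tuple: HEAP_HOT_UPDATED | new tuple: HEAP_ONLY_TUPLE                   |
 2nd    |                             | old tuple: HEAP_ONLY_TUPLE, HEAP_HOT_UPDATED | new tuple: HEAP_ONLY_TUPLE
 3rd    |                             | new tuple: HEAP_ONLY_TUPLE                   | old tuple: HEAP_ONLY_TUPLE, HEAP_HOT_UPDATED
 4th    |                             | old tuple: HEAP_ONLY_TUPLE, HEAP_HOT_UPDATED | new tuple: HEAP_ONLY_TUPLE
 5th    |                             | new tuple: HEAP_ONLY_TUPLE                   | old tuple: HEAP_ONLY_TUPLE, HEAP_HOT_UPDATED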

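Line pointer array and data area state (equal-width updates, matching the table above)

Note that every update works on a position whose lp is valid. Before the update, the page is defragmented: valid tuples are moved down to fill the space left by deleted ones. Only then is the new tuple inserted.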
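
The alternation between lp=131 and lp=132 can be reproduced with a toy model. This is only a sketch under strong assumptions: every earlier version is treated as dead as soon as it has been updated (all transactions committed, no concurrent snapshots), pruning runs before every update, and the dictionaries and function names are hypothetical stand-ins for the real page structures. The "next" field plays the role of t_ctid.

```python
# Toy model of a heap page: offset number -> line pointer state.
def chain_start(page, root):
    """First tuple of the chain: the root itself, or the redirect target."""
    lp = page[root]
    return lp["target"] if lp["kind"] == "redirect" else root


def live_member(page, root):
    """Follow the chain and return the offset of the newest (live) tuple."""
    off = chain_start(page, root)
    while page[off]["next"] is not None:
        off = page[off]["next"]
    return off


def prune(page, root):
    """Free dead heap-only members and turn a dead root into a redirect."""
    live = live_member(page, root)
    off = chain_start(page, root)
    while off != live:
        nxt = page[off]["next"]
        if off != root:
            page[off] = {"kind": "unused"}   # slot can be reused
        off = nxt
    if live != root:
        page[root] = {"kind": "redirect", "target": live}


def hot_update(page, root):
    """Apply one HOT update; return (old offset, new offset)."""
    prune(page, root)
    old = live_member(page, root)
    free = [o for o in sorted(page) if page[o]["kind"] == "unused"]
    new = free[0] if free else max(page) + 1  # reuse a slot or append one
    page[new] = {"kind": "normal", "next": None, "flags": {"HEAP_ONLY_TUPLE"}}
    page[old]["next"] = new
    page[old]["flags"] = page[old]["flags"] | {"HEAP_HOT_UPDATED"}
    return old, new


page = {130: {"kind": "normal", "next": None, "flags": set()}}
for n in range(1, 6):
    old, new = hot_update(page, 130)
    print(f"update {n}: old tuple at lp={old}, new tuple at lp={new}")
```

In this model the root lp=130 is never freed; it only changes from a normal pointer into a redirect, which is why the index entry never needs to change.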

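2.2 Summary: variable-width updates

Line pointer array and data area state (variable-width updates)

Look at the fourth and fifth updates: the new data is wider, so the defragmentation step is clearly visible:

  • On the fifth update, the data of 132 is first moved down to 888-967; the update is then applied to the tuple at 132; after the update, 132 is deleted; 131 is reused, and its data is placed directly below the page's upper pointer (pd_upper minus the aligned tuple size).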

Experiment: before any update, xid=8169

SELECT lp, lp_off, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask FROM heap_page_items(get_raw_page('testbl', 0));

 lp  | lp_off | t_xmin | t_xmax | t_field3 | t_ctid  | t_infomask2 | t_infomask 
-----+--------+--------+--------+----------+---------+-------------+------------
 130 |    912 |   8169 |      0 |        0 | (0,130) |           3 |       2050

============
t_infomask
============
2050(0x802) = HEAP_XMAX_INVALID | HEAP_HASVARWIDTH

============
t_infomask2(11 bits for number of attributes)
============
number of attributes = 3

Experiment: first update, xid=8170

update testbl set info = 'DDDDDDDDDDDDDDDDDDD1' where i = 130;

SELECT lp, lp_off, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask FROM heap_page_items(get_raw_page('testbl', 0));
 lp  | lp_off | t_xmin | t_xmax | t_field3 | t_ctid  | t_infomask2 | t_infomask 
-----+--------+--------+--------+----------+---------+-------------+------------
 130 |    912 |   8169 |   8170 |        0 | (0,131) |       16387 |        258
 131 |    856 |   8170 |      0 |        0 | (0,131) |       32771 |      10242
 
============
t_infomask
============
258(0x102) = HEAP_XMIN_COMMITTED | HEAP_HASVARWIDTH
10242(0x2802) = HEAP_UPDATED | HEAP_XMAX_INVALID | HEAP_HASVARWIDTH

============
t_infomask2
============
16387(0x4003) = HEAP_HOT_UPDATED | 3 attributes
32771(0x8003) = HEAP_ONLY_TUPLE | 3 attributes
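
The decoding above can be automated. The sketch below covers only the flag bits that appear in this experiment, with values taken from htup_details.h; decode_infomask and decode_infomask2 are hypothetical helper names, and any other bits set in the input are simply ignored.

```python
# Flag values as defined in PostgreSQL's htup_details.h (subset used here).
INFOMASK_FLAGS = {
    0x0002: "HEAP_HASVARWIDTH",
    0x0100: "HEAP_XMIN_COMMITTED",
    0x0800: "HEAP_XMAX_INVALID",
    0x2000: "HEAP_UPDATED",
}
INFOMASK2_FLAGS = {
    0x4000: "HEAP_HOT_UPDATED",
    0x8000: "HEAP_ONLY_TUPLE",
}
HEAP_NATTS_MASK = 0x07FF  # low 11 bits of t_infomask2: number of attributes


def decode_infomask(value):
    """Return the names of the known t_infomask bits set in value."""
    return [name for bit, name in sorted(INFOMASK_FLAGS.items()) if value & bit]


def decode_infomask2(value):
    """Return (number of attributes, names of the known t_infomask2 bits)."""
    flags = [name for bit, name in sorted(INFOMASK2_FLAGS.items()) if value & bit]
    return value & HEAP_NATTS_MASK, flags


for mask in (2050, 258, 10242, 8450):
    print(mask, hex(mask), decode_infomask(mask))
for mask2 in (3, 16387, 32771, 49155):
    print(mask2, hex(mask2), decode_infomask2(mask2))
```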

Experiment: second update, xid=8171

update testbl set info = 'DDDDDDDDDDDDDDDDDDD2' where i = 130;

SELECT lp, lp_off, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask FROM heap_page_items(get_raw_page('testbl', 0));
 lp  | lp_off | t_xmin | t_xmax | t_field3 | t_ctid  | t_infomask2 | t_infomask 
-----+--------+--------+--------+----------+---------+-------------+------------
 130 |    131 |        |        |          |         |             |           
 131 |    912 |   8170 |   8171 |        0 | (0,132) |       49155 |       8450
 132 |    856 |   8171 |      0 |        0 | (0,132) |       32771 |      10242

============
t_infomask
============
8450(0x2102) = HEAP_UPDATED | HEAP_XMIN_COMMITTED | HEAP_HASVARWIDTH
10242(0x2802) = HEAP_UPDATED | HEAP_XMAX_INVALID | HEAP_HASVARWIDTH

============
t_infomask2
============
49155(0xC003) = HEAP_ONLY_TUPLE | HEAP_HOT_UPDATED | 3 attributes
32771(0x8003) = HEAP_ONLY_TUPLE | 3 attributes

Experiment: third update, xid=8172

update testbl set info = 'DDDDDDDDDDDDDDDDDDD3' where i = 130;

SELECT lp, lp_off, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask FROM heap_page_items(get_raw_page('testbl', 0));
 lp  | lp_off | t_xmin | t_xmax | t_field3 | t_ctid  | t_infomask2 | t_infomask 
-----+--------+--------+--------+----------+---------+-------------+------------
 130 |    132 |        |        |          |         |             |           
 131 |    856 |   8172 |      0 |        0 | (0,131) |       32771 |      10242
 132 |    912 |   8171 |   8172 |        0 | (0,131) |       49155 |       8450
 
============
t_infomask
============
10242(0x2802) = HEAP_UPDATED | HEAP_XMAX_INVALID | HEAP_HASVARWIDTH
8450(0x2102) = HEAP_UPDATED | HEAP_XMIN_COMMITTED | HEAP_HASVARWIDTH

============
t_infomask2
============
49155(0xC003) = HEAP_ONLY_TUPLE | HEAP_HOT_UPDATED | 3 attributes
32771(0x8003) = HEAP_ONLY_TUPLE | 3 attributes

Experiment: fourth update, xid=8173

update testbl set info = 'DDDDDDDDDDDDDDDDDDD4' where i = 130;

SELECT lp, lp_off, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask FROM heap_page_items(get_raw_page('testbl', 0));
 lp  | lp_off | t_xmin | t_xmax | t_field3 | t_ctid  | t_infomask2 | t_infomask 
-----+--------+--------+--------+----------+---------+-------------+------------
 130 |    131 |        |        |          |         |             |           
 131 |    912 |   8172 |   8173 |        0 | (0,132) |       49155 |       8450
 132 |    856 |   8173 |      0 |        0 | (0,132) |       32771 |      10242

============
t_infomask
============
8450(0x2102) = HEAP_UPDATED | HEAP_XMIN_COMMITTED | HEAP_HASVARWIDTH
10242(0x2802) = HEAP_UPDATED | HEAP_XMAX_INVALID | HEAP_HASVARWIDTH

============
t_infomask2
============
49155(0xC003) = HEAP_ONLY_TUPLE | HEAP_HOT_UPDATED | 3 attributes
32771(0x8003) = HEAP_ONLY_TUPLE | 3 attributes

Experiment: fifth update, xid=8174

update testbl set info = 'DDDDDDDDDDDDDDDDDDD5' where i = 130;

SELECT lp, lp_off, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask FROM heap_page_items(get_raw_page('testbl', 0));
 lp  | lp_off | t_xmin | t_xmax | t_field3 | t_ctid  | t_infomask2 | t_infomask 
-----+--------+--------+--------+----------+---------+-------------+------------
 130 |    132 |        |        |          |         |             |           
 131 |    856 |   8174 |      0 |        0 | (0,131) |       32771 |      10242
 132 |    912 |   8173 |   8174 |        0 | (0,131) |       49155 |       8450

============
t_infomask
============
10242(0x2802) = HEAP_UPDATED | HEAP_XMAX_INVALID | HEAP_HASVARWIDTH
8450(0x2102) = HEAP_UPDATED | HEAP_XMIN_COMMITTED | HEAP_HASVARWIDTH

============
t_infomask2
============
49155(0xC003) = HEAP_ONLY_TUPLE | HEAP_HOT_UPDATED | 3 attributes
32771(0x8003) = HEAP_ONLY_TUPLE | 3 attributes

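3 Scenario analysis: lp=130 empty (redirect), lp=131 deleted, lp=132 valid

  • Current HOT chain: 130 (redirect) -> 132 (valid)
  • Expected behaviour: defragmentation moves 132's data into the space that 131 used to occupy; the new tuple is written where 132's data used to be, and lp=131 points at the new tuple.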

3.1 Index scan

Execution enters the index scan from the top-level ExecModifyTable. Because VACUUM is turned off, the index always returns 130: index_getnext_tid

ExecModifyTable
  ExecProcNode        // ExecProcNode = 0x783005 <ExecIndexScan>
    ExecIndexScan
      ExecScan
        ExecScanFetch
          IndexNext
            index_getnext_slot
              tid = index_getnext_tid                       // 3.1 always returns ip_posid = 130  <<<<
              index_fetch_heap
                table_index_fetch_tuple(ItemPointer tid)    // {ip_posid = 130}
                  heapam_index_fetch_tuple
                    heap_hot_search_buffer                  // 3.2 walk the HOT chain to find the old tuple
                    heap_page_prune_opt                     // 3.3 defragmentation

3.2 Walking the HOT chain to find the old tuple

Next, 130 is used to look up the tuple: heap_hot_search_buffer

bool
heap_hot_search_buffer(
			ItemPointer tid,  // {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 130}
			Relation relation, Buffer buffer,
			Snapshot snapshot, HeapTuple heapTuple,
			bool *all_dead, bool first_call)
{
	...
	...	
	/* Scan through possible multiple members of HOT-chain */
	for (;;)
	{
		ItemId		lp;

		/* check for bogus TID */
		if (offnum < FirstOffsetNumber || offnum > PageGetMaxOffsetNumber(dp))
			break;
		

[First iteration: read HOT chain entry 130, find 132]

This part matters: offnum is still 130, but the lp fetched for it already points at 132:
ItemIdData = {lp_off = 132, lp_flags = 2, lp_len = 0}
Here lp_flags=2 means LP_REDIRECT, a redirect to 132.
#define LP_UNUSED 0 /* unused (should always have lp_len=0) */
#define LP_NORMAL 1 /* used (should always have lp_len>0) */
#define LP_REDIRECT 2 /* HOT redirect (should have lp_len=0) */
#define LP_DEAD 3 /* dead, may or may not have storage */


[Second iteration: read HOT chain entry 132, find the data location]

On the second pass through the loop, offnum=132 yields this lp:
ItemIdData = {lp_off = 912, lp_flags = 1, lp_len = 53} (LP_NORMAL)

This time 912 is an offset into the data area.

		lp = PageGetItemId(dp, offnum);

		/* check for unused, dead, or redirected items */
		if (!ItemIdIsNormal(lp))
		{
			/* We should only see a redirect at start of chain */
			if (ItemIdIsRedirected(lp) && at_chain_start)
			{
				/* Follow the redirect */
				offnum = ItemIdGetRedirect(lp);

ItemIdGetRedirect(lp) returns 132 (the redirect's lp_off), so offnum becomes 132 and the next iteration continues walking the HOT chain.

				at_chain_start = false;
				continue;
			}
			/* else must be end of chain */
			break;
		}

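Once a normal lp is reached (132, on the second iteration), the redirect branch is skipped; still inside the loop, the code reads the data of 132 and assembles the tuple.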

		heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
		heapTuple->t_len = ItemIdGetLength(lp);
		heapTuple->t_tableOid = RelationGetRelid(relation);
		ItemPointerSet(&heapTuple->t_self, blkno, offnum);

		if (!skip)
		{
			/* If it's visible per the snapshot, we must return it */
			valid = HeapTupleSatisfiesVisibility(heapTuple, snapshot, buffer);
			HeapCheckForSerializableConflictOut(valid, relation, heapTuple,
												buffer, snapshot);

			if (valid)
			{
				ItemPointerSetOffsetNumber(tid, offnum);
				PredicateLockTID(relation, &heapTuple->t_self, snapshot,
								 HeapTupleHeaderGetXmin(heapTuple->t_data));
				if (all_dead)
					*all_dead = false;

132 is found; return.

				return true;
			}
		}
		skip = false;
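
The loop above can be condensed into a short model. This is a simplified sketch, not the real function: visibility is reduced to a caller-supplied predicate, the xmin/xmax cross-check between chain members is left out, and the page is a plain dictionary. hot_search, live and both sample pages are hypothetical.

```python
def hot_search(page, offnum, is_visible):
    """Return the offset of the first visible tuple in the HOT chain, or None."""
    at_chain_start = True
    max_off = max(page)
    while 1 <= offnum <= max_off:            # reject bogus offsets
        lp = page.get(offnum, {"kind": "unused"})
        if lp["kind"] != "normal":
            # A redirect is only followed at the start of the chain.
            if lp["kind"] == "redirect" and at_chain_start:
                offnum = lp["target"]
                at_chain_start = False
                continue
            break                             # unused or dead: end of chain
        if is_visible(lp):
            return offnum
        if not lp["hot_updated"]:
            break                             # no newer version on this page
        offnum = lp["ctid"]                   # follow t_ctid to the next member
        at_chain_start = False
    return None


def live(lp):
    """Simplified visibility: a tuple is visible if nobody has updated it."""
    return lp["xmax"] == 0


# State from this section: 130 redirects to 132, 131 has been freed.
pruned = {
    130: {"kind": "redirect", "target": 132},
    131: {"kind": "unused"},
    132: {"kind": "normal", "xmax": 0, "hot_updated": False, "ctid": 132},
}
# State right after the first update, before any pruning.
unpruned = {
    130: {"kind": "normal", "xmax": 8170, "hot_updated": True, "ctid": 131},
    131: {"kind": "normal", "xmax": 0, "hot_updated": False, "ctid": 131},
}
print(hot_search(pruned, 130, live))
print(hot_search(unpruned, 130, live))
```

Both lookups start from offset 130, the only TID the index knows about; the first resolves through the redirect, the second through t_ctid.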

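3.3 Defragmentation

132's data is moved into the space that 131 used to occupy, because 131 has been deleted and left a hole.

Note: in heapam_index_fetch_tuple, heap_page_prune_opt is called when the scan switches to the page, ahead of heap_hot_search_buffer; this is consistent with lp=132 already showing lp_off = 912 in 3.2.

The details are not analysed here; only the call stack is recorded.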

ExecModifyTable
  ExecProcNode        // ExecProcNode = 0x783005 <ExecIndexScan>
    ExecIndexScan
      ExecScan
        ExecScanFetch
          IndexNext
            index_getnext_slot
              tid = index_getnext_tid                       // always returns ip_posid = 130  <<<<
              index_fetch_heap
                table_index_fetch_tuple(ItemPointer tid)    // {ip_posid = 130}
                  heapam_index_fetch_tuple
                    heap_hot_search_buffer                  // 3.2 walk the HOT chain to find the old tuple
                    heap_page_prune_opt                     // 3.3 defragmentation
                      heap_page_prune
                        heap_page_prune_execute
                          PageRepairFragmentation
                            compactify_tuples
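
Although compactify_tuples is not walked through here, its effect is easy to sketch: the tuples that still have storage are packed against the end of the page and each lp_off is rewritten. The model below assumes 8-byte MAXALIGN and uses a toy page whose data area ends at offset 1024 (a hypothetical value chosen to keep the numbers small); compactify and maxalign are hypothetical names.

```python
def maxalign(length, alignment=8):
    """Round a tuple length up to the alignment boundary (assumed 8 bytes)."""
    return (length + alignment - 1) // alignment * alignment


def compactify(items, pd_special):
    """Pack tuples against the end of the page.

    items: {offnum: (lp_off, lp_len)} for line pointers that still have storage.
    Returns ({offnum: new_lp_off}, new_pd_upper).
    """
    upper = pd_special
    new_offsets = {}
    # Walk from the highest old offset down so the relative order is preserved.
    by_offset = sorted(items.items(), key=lambda kv: kv[1][0], reverse=True)
    for offnum, (_old_off, lp_len) in by_offset:
        upper -= maxalign(lp_len)
        new_offsets[offnum] = upper
    return new_offsets, upper


# lp=131 (at 912) was pruned away, so only 129 and 132 still have storage.
survivors = {129: (968, 53), 132: (856, 53)}
print(compactify(survivors, pd_special=1024))
```

With the hole at 912 gone, lp=132 moves from 856 to 912 and the free space below it becomes contiguous again.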

3.4 Performing the update: heap_update

ExecModifyTable
  ExecProcNode        // ExecProcNode = 0x783005 <ExecIndexScan>
  | ExecIndexScan
  |   ExecScan
  |     ExecScanFetch
  |       IndexNext
  |         index_getnext_slot
  |           tid = index_getnext_tid                       // always returns ip_posid = 130  <<<<
  |           index_fetch_heap
  |             table_index_fetch_tuple(ItemPointer tid)    // {ip_posid = 130}
  |               heapam_index_fetch_tuple
  |                 heap_hot_search_buffer                  // 3.2 walk the HOT chain to find the old tuple
  |                 heap_page_prune_opt                     // 3.3 defragmentation
  |                   heap_page_prune
  |                     heap_page_prune_execute
  |                       PageRepairFragmentation
  |                         compactify_tuples
  ExecUpdate
    ExecUpdateAct
      table_tuple_update
        heapam_tuple_update
          heap_update

1 Get the data address

ItemId = {lp_off = 912, lp_flags = 1, lp_len = 53}

lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));

2 Assemble the old tuple:

HeapTupleData = {t_len = 53, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 132}, t_tableOid = 32946, t_data = 0x2aaaab4ae610}

3 The new tuple's position is not yet determined:

HeapTupleData = {t_len = 53, t_self = {ip_blkid = {bi_hi = 65535, bi_lo = 65535}, ip_posid = 0}, t_tableOid = 32946, t_data = 0x26fe440}

	oldtup.t_tableOid = RelationGetRelid(relation);
	oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
	oldtup.t_len = ItemIdGetLength(lp);
	oldtup.t_self = *otid;
	
	newtup->t_tableOid = RelationGetRelid(relation);

4 Check the visibility of the old tuple

result = TM_Ok

result = HeapTupleSatisfiesUpdate(&oldtup, cid, buffer);

5 Same-page update: use_hot_update is set

The old tuple gets HEAP_HOT_UPDATED
The new tuple gets HEAP_ONLY_TUPLE

1: *newtup = {t_len = 53, t_self = {ip_blkid = {bi_hi = 65535, bi_lo = 65535}, ip_posid = 0}, t_tableOid = 32946, t_data = 0x26fe440}
3: oldtup = {t_len = 53, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 132}, t_tableOid = 32946, t_data = 0x2aaaab4ae610}
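
The effect of this step on t_infomask2 can be checked against the experiment output with a few lines. The flag values are those from htup_details.h; mark_hot_update is a hypothetical helper that only mimics the two bit-sets and nothing else that heap_update does.

```python
HEAP_HOT_UPDATED = 0x4000   # set on the old tuple
HEAP_ONLY_TUPLE = 0x8000    # set on the new tuple


def mark_hot_update(old_infomask2, new_infomask2):
    """Return the (old, new) t_infomask2 values after a HOT update."""
    return old_infomask2 | HEAP_HOT_UPDATED, new_infomask2 | HEAP_ONLY_TUPLE


# First update: the old tuple is the original row with 3 attributes.
print(mark_hot_update(3, 3))
# Later updates: the old tuple is already a heap-only tuple.
print(mark_hot_update(32771, 3))
```

The two results correspond to the t_infomask2 pairs seen after the first update and after every later update in section 2.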

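6 Insert the new tuple

RelationPutHeapTuple performs the insert; it first has to find the insert position.

RelationPutHeapTuple
	PageAddItem

Core logic (important)

  • The item ids are scanned in ascending order, which means the data area is effectively walked in reverse.
  • The free lp 131 is found. Defragmentation has already moved 132's data down into the space 131 used to occupy, so the space 132 used to occupy at the top of the data area is now free and is claimed for the new tuple.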
			for (offsetNumber = FirstOffsetNumber;
				 offsetNumber < limit;	/* limit is maxoff+1 */
				 offsetNumber++)
			{
				itemId = PageGetItemId(phdr, offsetNumber);

				/*
				 * We check for no storage as well, just to be paranoid;
				 * unused items should never have storage.  Assert() that the
				 * invariant is respected too.
				 */
				Assert(ItemIdIsUsed(itemId) || !ItemIdHasStorage(itemId));

				if (!ItemIdIsUsed(itemId) && !ItemIdHasStorage(itemId))
					break;
			}
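
The slot search in the loop above boils down to "take the first unused line pointer, otherwise append one". A minimal model, assuming the line pointer array is given as a list of lp_flags values for offsets 1..maxoff (find_free_offset is a hypothetical name):

```python
LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD = 0, 1, 2, 3


def find_free_offset(lp_flags):
    """Return the offset number a new item would get.

    lp_flags: list of lp_flags values for offsets 1..maxoff.
    The first unused slot is recycled; if there is none, the result is maxoff + 1.
    """
    for offnum, flags in enumerate(lp_flags, start=1):
        if flags == LP_UNUSED:
            return offnum
    return len(lp_flags) + 1


# Offsets 1..129 are normal, 130 is a redirect, 131 was freed, 132 is normal.
page_flags = [LP_NORMAL] * 129 + [LP_REDIRECT, LP_UNUSED, LP_NORMAL]
print(find_free_offset(page_flags))
```

This is why the new tuple in the scenario lands on lp=131 instead of extending the array to lp=133.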
