
What? A single PostgreSQL table shouldn't exceed 890 million rows?

digoal PostgreSQL码农集散地 2024-07-08



Why is a single PostgreSQL table not recommended to exceed 890 million rows?

《PostgreSQL 垃圾回收参数优化之 - maintenance_work_mem , autovacuum_work_mem》 (an earlier post on tuning these vacuum memory parameters)

What are the maintenance_work_mem and autovacuum_work_mem parameters used for?

This memory is used to record the tuple IDs (TIDs) of dead tuples. While the vacuum process scans the heap, once the collected dead tuple IDs fill that memory (autovacuum_work_mem or maintenance_work_mem), it pauses vacuuming the heap, records how far the heap scan has gotten, and starts vacuuming the table's indexes.

When scanning an index, the index tuples to be cleaned up are identified by matching them against the TIDs recorded in that memory.

Once every index has been scanned and cleaned, vacuum resumes the heap scan from the recorded position.
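
To make that round-trip concrete, here is a minimal, self-contained simulation of the loop. It is only a sketch: the table size, dead-tuple density, and memory setting are made-up numbers, and this is not PostgreSQL's actual vacuumlazy.c code.

/*
 * Toy simulation of the pre-PostgreSQL-17 lazy-vacuum loop: the heap scan
 * accumulates 6-byte dead TIDs until the memory budget is full, then every
 * index must be vacuumed before the heap scan can resume.
 * All numbers below are illustrative assumptions, not real measurements.
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const uint64_t table_blocks   = 1000000;             /* hypothetical heap: 1M pages (~8GB)  */
    const uint64_t dead_per_block = 40;                  /* hypothetical dead tuples per page   */
    const uint64_t tid_bytes      = 6;                   /* ItemPointerData is 6 bytes          */
    const uint64_t work_mem       = 64ULL * 1024 * 1024; /* e.g. autovacuum_work_mem = '64MB'   */
    const uint64_t max_tids       = work_mem / tid_bytes;

    uint64_t ndead = 0;
    uint64_t index_passes = 0;

    for (uint64_t blk = 0; blk < table_blocks; blk++)
    {
        ndead += dead_per_block;    /* heap scan collects this page's dead TIDs           */
        if (ndead >= max_tids)      /* memory full: pause heap scan, vacuum ALL indexes,  */
        {                           /* then resume the heap scan from this position       */
            index_passes++;
            ndead = 0;
        }
    }
    if (ndead > 0)
        index_passes++;             /* final index pass for the remaining dead TIDs       */

    printf("index vacuum passes needed: %llu\n", (unsigned long long) index_passes);
    return 0;
}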

Clearly, if autovacuum_work_mem is too small, every index gets scanned multiple times during a single vacuum of the table, wasting resources and time.

So a larger autovacuum_work_mem is generally better, but setting it above 1GiB is pointless, because the dead tuple ID array used a single allocation and so was limited to 1GB.

Putting the above together: why is a single PG table not recommended to exceed 890 million rows?

From the header files, a tuple ID (ItemPointerData) is 6 bytes long, so 1GiB holds roughly 179 million dead tuple IDs. By default, garbage collection is triggered when dead tuples reach about 20% of the table; for a table of 890 million rows, 20% is roughly 179 million dead tuples. Beyond 890 million rows, vacuuming that table therefore starts requiring multiple index scan passes.
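
For reference, the arithmetic behind the 890 million figure, written out as a tiny standalone program. Only the 6-byte TID size and the 1GiB cap come from PostgreSQL itself; the 20% is the default autovacuum_vacuum_scale_factor.

#include <stdio.h>

int main(void)
{
    const double tid_bytes    = 6.0;                  /* ItemPointerData: 4-byte block + 2-byte offset */
    const double limit_bytes  = 1024.0 * 1024 * 1024; /* hard 1GiB cap on the dead-TID array (pre-17)  */
    const double scale_factor = 0.20;                 /* default autovacuum_vacuum_scale_factor        */

    const double max_dead_tids = limit_bytes / tid_bytes;      /* ~178,956,970 TIDs fit in 1GiB         */
    const double max_rows = max_dead_tids / scale_factor;      /* ~894,784,853 rows, i.e. ~890 million  */

    printf("dead TIDs that fit in 1GiB: %.0f\n", max_dead_tids);
    printf("rows at which 20%% dead tuples overflow 1GiB: %.0f\n", max_rows);
    return 0;
}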

PostgreSQL 17 introduces a new data structure, TidStore. It is no longer an array of (blockid, itemid) pairs; instead, entries are keyed by blockid with a bitmap of offsets as the value, which means all dead tuples on a page can be stored in a single entry.
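
A toy illustration of that layout change. This is not PostgreSQL's real BlocktableEntry or radix-tree code, just the idea that one heap page's dead offsets collapse into a single key plus a bitmap:

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* Old layout: one 6-byte entry per dead tuple (mirrors ItemPointerData: three uint16 fields). */
typedef struct { uint16_t bi_hi; uint16_t bi_lo; uint16_t offset; } TidArrayEntry;

/* New idea: one entry per heap page; each set bit marks a dead offset.
 * A made-up stand-in for TidStore's per-block value, not the real BlocktableEntry. */
typedef struct
{
    uint32_t block;
    uint64_t offset_bitmap[5];  /* 320 bits, enough for the max heap tuples on an 8KB page */
} BlockBitmapEntry;

int main(void)
{
    const size_t dead_on_this_page = 100;

    printf("old array bytes for 100 dead tuples on one page: %zu\n",
           dead_on_this_page * sizeof(TidArrayEntry));   /* 100 * 6 = 600 bytes             */
    printf("bitmap entry bytes for the same page:           %zu\n",
           sizeof(BlockBitmapEntry));                     /* 48 bytes, including padding     */
    return 0;
}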

  • No more 1GiB cap tied to autovacuum_work_mem

  • Dead tuple storage takes much less space

  • Looking a TID up in the TidStore is faster than searching the dead tuple array.

  • Overall performance improves by several times, even tens of times (when index order is correlated with heap tuple order, i.e. the ordering of index entries tracks the ordering of their heap tuples, so the dead tuples on one heap page are more likely to map to index entries on a single index page).

PostgreSQL 17 fixes the following problems:

  • The array used a single allocation and so was limited to 1GB.

  • The allocation was pessimistically sized according to table size.

  • Lookup with binary search was slow because of poor CPU cache and branch prediction behavior.
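
To see why the last point matters, here is a rough sketch of what the old lookup amounted to: during index vacuum, every index tuple's heap TID was probed with a binary search over the big sorted array. This is a simplified stand-in, not the actual pre-17 vacuumlazy.c code.

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for ItemPointerData. */
typedef struct { uint32_t block; uint16_t offset; } Tid;

static int tid_cmp(const void *a, const void *b)
{
    const Tid *x = (const Tid *) a;
    const Tid *y = (const Tid *) b;
    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    if (x->offset != y->offset)
        return (x->offset < y->offset) ? -1 : 1;
    return 0;
}

/* Called once per index tuple: O(log n) random probes into an array that can be
 * hundreds of MB, so most probes are cache misses with hard-to-predict branches. */
static bool tid_is_dead(const Tid *dead, size_t ndead, Tid probe)
{
    return bsearch(&probe, dead, ndead, sizeof(Tid), tid_cmp) != NULL;
}

int main(void)
{
    Tid dead[] = { {1, 2}, {1, 7}, {3, 5} };    /* the array is kept sorted */
    Tid probe = { 1, 7 };

    printf("index tuple points to a dead heap tuple? %s\n",
           tid_is_dead(dead, sizeof(dead) / sizeof(dead[0]), probe) ? "yes" : "no");
    return 0;
}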

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=667e65aac354975c6f8090c6146fceb8d7b762d6

Use TidStore for dead tuple TIDs storage during lazy vacuum.
author    Masahiko Sawada <msawada@postgresql.org>  Tue, 2 Apr 2024 01:15:37 +0000 (10:15 +0900)
committer Masahiko Sawada <msawada@postgresql.org>  Tue, 2 Apr 2024 01:15:37 +0000 (10:15 +0900)
commit    667e65aac354975c6f8090c6146fceb8d7b762d6
tree      ecefa5c922788f32aa643e4639561d7ba9ac1801
parent    d5d2205c8ddc6670fa87474e172fdfab162b7a73
Use TidStore for dead tuple TIDs storage during lazy vacuum.

Previously, we used a simple array for storing dead tuple IDs during
lazy vacuum, which had a number of problems:

* The array used a single allocation and so was limited to 1GB.
* The allocation was pessimistically sized according to table size.
* Lookup with binary search was slow because of poor CPU cache and
branch prediction behavior.

This commit replaces that array with the TID store from commit
30e144287a.

Since the backing radix tree makes small allocations as needed, the
1GB limit is now gone. Further, the total memory used is now often
smaller by an order of magnitude or more, depending on the
distribution of blocks and offsets. These two features should make
multiple rounds of heap scanning and index cleanup an extremely rare
event. TID lookup during index cleanup is also several times faster,
even more so when index order is correlated with heap tuple order.

Since there is no longer a predictable relationship between the number
of dead tuples vacuumed and the space taken up by their TIDs, the
number of tuples no longer provides any meaningful insights for users,
nor is the maximum number predictable. For that reason this commit
also changes to byte-based progress reporting, with the relevant
columns of pg_stat_progress_vacuum renamed accordingly to
max_dead_tuple_bytes and dead_tuple_bytes.

For parallel vacuum, both the TID store and supplemental information
specific to vacuum are shared among the parallel vacuum workers. As
with the previous array, we don't take any locks on TidStore during
parallel vacuum since writes are still only done by the leader
process.

Bump catalog version.

Reviewed-by: John Naylor, (in an earlier version) Dilip Kumar
Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com

The documentation diff in this commit removes the old 1GB-limit notes from maintenance_work_mem and autovacuum_work_mem:

- <para>
- Note that for the collection of dead tuple identifiers,
- <command>VACUUM</command> is only able to utilize up to a maximum of
- <literal>1GB</literal> of memory.
- </para>
</listitem>
</varlistentry>

@@ -1946,13 +1941,6 @@ include_dir 'conf.d'
<filename>postgresql.conf</filename> file or on the server command
line.
</para>
- <para>
- For the collection of dead tuple identifiers, autovacuum is only able
- to utilize up to a maximum of <literal>1GB</literal> of memory, so
- setting <varname>autovacuum_work_mem</varname> to a value higher than
- that has no effect on the number of dead tuples that autovacuum can
- collect while scanning a table.
- </para>

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=30e144287a

Add TIDStore, to store sets of TIDs (ItemPointerData) efficiently.
author    Masahiko Sawada <msawada@postgresql.org>  Thu, 21 Mar 2024 01:08:42 +0000 (10:08 +0900)
committer Masahiko Sawada <msawada@postgresql.org>  Thu, 21 Mar 2024 01:08:42 +0000 (10:08 +0900)
commit    30e144287a72529c9cd9fd6b07fe96eb8a1e270e
tree      2ced2409e5f62c9f198486b6be3cd74cec18682e
parent    995e0fbc1c57c9b705c57de456d25c6e448bc5dd
Add TIDStore, to store sets of TIDs (ItemPointerData) efficiently.

TIDStore is a data structure designed to efficiently store large sets
of TIDs. For TID storage, it employs a radix tree, where the key is
a block number, and the value is a bitmap representing offset
numbers. The TIDStore can be created on a DSA area and used by
multiple backend processes simultaneously.

There are potential future users such as tidbitmap.c, though it's very
likely the interface will need to evolve as we come to understand the
needs of different kinds of users. For example, we can support
updating the offset bitmap of existing values.

Currently, the TIDStore is not used for anything yet, aside from the
test code. But an upcoming patch will use it.

This includes a unit test module, in src/test/modules/test_tidstore.

Co-authored-by: John Naylor
Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com

/*-------------------------------------------------------------------------
 *
 * tidstore.c
 *    TID (ItemPointerData) storage implementation.
 *
 * TidStore is a in-memory data structure to store TIDs (ItemPointerData).
 * Internally it uses a radix tree as the storage for TIDs. The key is the
 * BlockNumber and the value is a bitmap of offsets, BlocktableEntry.
 *
 * TidStore can be shared among parallel worker processes by passing DSA area
 * to TidStoreCreate(). Other backends can attach to the shared TidStore by
 * TidStoreAttach().
 *
 * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * IDENTIFICATION
 *    src/backend/access/common/tidstore.c
 *
 *-------------------------------------------------------------------------
 */

Data structure comparison: the old VacDeadItems array (PostgreSQL 16 and earlier) vs. the new VacDeadItemsInfo used with TidStore (PostgreSQL 17):

/*
 * VacDeadItems stores TIDs whose index tuples are deleted by index vacuuming.
 */
typedef struct VacDeadItems
{
    int             max_items;      /* # slots allocated in array */
    int             num_items;      /* current # of entries */

    /* Sorted array of TIDs to delete from indexes */
    ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
} VacDeadItems;

/*
 * VacDeadItemsInfo stores supplemental information for dead tuple TID
 * storage (i.e. TidStore).
 */
typedef struct VacDeadItemsInfo
{
    size_t          max_bytes;      /* the maximum bytes TidStore can use */
    int64           num_items;      /* current # of entries */
} VacDeadItemsInfo;


#ifndef TIDSTORE_H
#define TIDSTORE_H

#include "storage/itemptr.h"
#include "utils/dsa.h"

typedef struct TidStore TidStore;
typedef struct TidStoreIter TidStoreIter;

/* Result struct for TidStoreIterateNext */
typedef struct TidStoreIterResult
{
    BlockNumber     blkno;
    int             max_offset;
    int             num_offsets;
    OffsetNumber   *offsets;
} TidStoreIterResult;

extern TidStore *TidStoreCreate(size_t max_bytes, dsa_area *dsa, int tranche_id);
extern TidStore *TidStoreAttach(dsa_area *dsa, dsa_pointer rt_dp);
extern void TidStoreDetach(TidStore *ts);
extern void TidStoreLockExclusive(TidStore *ts);
extern void TidStoreLockShare(TidStore *ts);
extern void TidStoreUnlock(TidStore *ts);
extern void TidStoreDestroy(TidStore *ts);
extern void TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber *offsets,
                                    int num_offsets);
extern bool TidStoreIsMember(TidStore *ts, ItemPointer tid);
extern TidStoreIter *TidStoreBeginIterate(TidStore *ts);
extern TidStoreIterResult *TidStoreIterateNext(TidStoreIter *iter);
extern void TidStoreEndIterate(TidStoreIter *iter);
extern size_t TidStoreMemoryUsage(TidStore *ts);
extern dsa_pointer TidStoreGetHandle(TidStore *ts);

#endif                          /* TIDSTORE_H */
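
Going only by the signatures above, a rough sketch of how a vacuum-like caller might use this API. This is not actual PostgreSQL code; in particular, passing dsa = NULL for a backend-local store and tranche_id = 0 are assumptions about this early version of the interface.

/* Hedged sketch built only from the declarations above; not real backend code. */
#include "postgres.h"
#include "access/tidstore.h"

static void
tidstore_usage_sketch(size_t max_bytes)
{
    /* Backend-local store: dsa = NULL, tranche_id unused (assumption). */
    TidStore   *ts = TidStoreCreate(max_bytes, NULL, 0);

    /* Heap scan phase: all dead offsets of one heap page go in with a single call. */
    OffsetNumber dead_offsets[] = {2, 5, 9};
    TidStoreSetBlockOffsets(ts, (BlockNumber) 123, dead_offsets, 3);

    /* Index vacuum phase: probe each index tuple's heap TID for membership. */
    ItemPointerData tid;
    ItemPointerSet(&tid, (BlockNumber) 123, (OffsetNumber) 5);
    if (TidStoreIsMember(ts, &tid))
    {
        /* the index tuple points to a dead heap tuple -> delete it from the index */
    }

    TidStoreDestroy(ts);
}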


