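Introduction
This post covers how Oracle Database 12c Release 2 improves incremental statistics for partitioned tables. If you are not yet familiar with synopses in the context of incremental statistics, take a look at Part 1 and Part 2 of this series before reading on.
Beginning with Oracle Database 12c Release 2, there is a new synopsis format that is significantly more compact than the format used in earlier releases. For ease of description, I will call the Oracle Database 12c Release 2 format the new format and the previous one the old format.
This enhancement is particularly relevant if you looked at incremental statistics maintenance in the past but decided against it because of the extra space it used in SYSAUX. Oracle Database 12c Release 2 resolves this issue and, in many scenarios, reduces the system resources needed to manage synopses.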
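A new type of synopsis
A synopsis is metadata stored in a pair of data dictionary tables (SYS.WRI$_OPTSTAT_SYNOPSIS$ and SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$). Oracle manages this metadata automatically, so there is usually no need to know the underlying implementation. Even so, I will cover some of the details, because they help explain why the change is so significant.
Synopses in Oracle Database 12c Release 2 are now created (by default) using an algorithm called HyperLogLog (HLL). It is a state-of-the-art algorithm for calculating the approximate number of distinct values in a table column. Although the result is an approximation, it is very accurate, with a typical error rate of less than 2%. Prior to Oracle Database 12c Release 2, synopses consisted of rows stored in the WRI$_OPTSTAT_SYNOPSIS$ table. The number of rows in that table can be very large if there are many partitions and table columns and the columns contain many distinct values. New-format synopses do not store rows in that table. Instead, some additional hash data is stored in the WRI$_OPTSTAT_SYNOPSIS_HEAD$ table (in the SPARE2 column).
How much smaller are the new synopses? As you have probably guessed, it is the consultants' answer: "it depends". As noted above, the space used by synopses is a function of the number of partitions, the number of columns and the number of distinct values in those columns. I can give you an example: one of our test systems has an 8TB test table with 84 partitions. The old-format synopses were around 160MB in size, while the new-format synopses totaled only 6MB. I created an example in GitHub (see below) that generates a large amount of synopsis data. In that example, new-format synopses take up almost no space, while the old-format synopses take up about 160MB (in SYSAUX).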
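How to use them
What do you need to do to use new-format synopses? Nothing! They are used by default if you choose to use incremental statistics, and there is nothing you need to do differently in Oracle Database 12c Release 2 compared with earlier releases.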
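Controlling synopses
You can control the type of synopsis that is created using a DBMS_STATS preference called APPROXIMATE_NDV_ALGORITHM.
The default is REPEAT OR HYPERLOGLOG: if a table is using old-format synopses it will continue to do so, and a table using new-format synopses will continue to use the new format.
There is no reason to use anything other than the default unless you are upgrading a database to Oracle Database 12c Release 2. If that is the case, you may want to consider the options, which are covered next.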
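Upgrading
If you are upgrading a database that uses incremental statistics, you will want to migrate to the new-format synopses. How do you go about it? It is worth noting from the outset that a single table can have some partitions with old-format synopses and others with new-format synopses. The good news is that you can control when and how to transition from one type of synopsis to the other.
There is a DBMS_STATS preference called INCREMENTAL_STALENESS. It controls whether partitions within a table are allowed to have different types of synopses during the transition from the old format to the new one. Let's look at how the different scenarios are handled after you have upgraded to Oracle Database 12c Release 2. There is a range of choices from "very conservative" (keeping the old behavior) to "aggressive" (taking advantage of the new feature immediately). The table below describes the scenarios from the most conservative to the most aggressive.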
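Use case | Action |
---|---|
Initially, you want to continue using old-format synopses for all tables. We recommend the new format, but you can switch to it later if you prefer. The algorithm used prior to HLL is called adaptive sampling. | `exec dbms_stats.set_table_prefs('table_owner', 'table_name', 'APPROXIMATE_NDV_ALGORITHM', 'ADAPTIVE SAMPLING')` |
You want tables using the old format to continue using it, and newly created incrementally managed tables to use new-format synopses. Incrementally managed tables without synopses will use the new format when statistics are gathered, while incrementally managed tables with old-format synopses will continue to use them. | No action. This is the default behavior. The default value of APPROXIMATE_NDV_ALGORITHM is REPEAT OR HYPERLOGLOG. |
You have some very large partitioned tables. They are using old-format synopses and you want to replace the old with the new gradually. | Old-format synopses are not replaced immediately, and new partitions will use the new format. Mixed formats yield less accurate statistics, but the advantage is that there is no need to re-gather statistics for all tables in the foreground. The automatic statistics gathering job will gradually re-gather statistics on partitions with old-format synopses and generate new-format synopses. Eventually, all partitions will use the new format and statistics will be more accurate. `exec dbms_stats.set_table_prefs(table_owner, table_name, 'APPROXIMATE_NDV_ALGORITHM', 'HYPERLOGLOG')` Note: the INCREMENTAL_STALENESS preference must be ALLOW_MIXED_FORMAT, but it does not need to be set explicitly because that is the default (unless you have changed it). |
You have time to re-gather all statistics. Incrementally managed tables are using the old format and you want to replace it with the new format immediately. | If you have a window of time to completely re-gather statistics for partitioned tables, this is the recommended approach. `exec dbms_stats.set_table_prefs(table_owner, table_name, 'APPROXIMATE_NDV_ALGORITHM', 'HYPERLOGLOG')` You also need to specify that you do not want a mix of old and new formats in the same table: `exec dbms_stats.set_table_prefs(table_owner, table_name, 'INCREMENTAL_STALENESS', 'NULL')` Take care here: the preference value must be set to 'NULL' (in quotes) and not NULL (without quotes). NULL (without quotes) sets the preference to its default value, which in this case is ALLOW_MIXED_FORMAT. Once these preferences are set, you will need to re-gather the table's statistics. |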
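Remember that you can also set DBMS_STATS preferences (such as APPROXIMATE_NDV_ALGORITHM) at the database, global and schema level, as well as at the table level (as in the examples above).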
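Summary
The synopsis format in Oracle Database 12c Release 2 is much more compact than the previous format. If your database is very large, expect to save a lot of space in SYSAUX while maintaining very good accuracy for your statistics. You can also expect the system overhead of maintaining synopses to drop (for example, when exchanging partitions). For more on this and some example scripts, take a look at GitHub.
If you have comments on this post or the scripts in GitHub, please leave them below.
Original article: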
Efficient Statistics Maintenance for Partitioned Tables Using Incremental Statistics – Part 3
March 23, 2017 | 6 minute read
Nigel Bayliss
Product Manager
Introduction
This post covers how Oracle has improved incremental statistics for partitioned tables in Oracle Database 12c Release 2. If you're not already familiar with synopses in the context of incremental statistics, then take a look at Part 1 and Part 2 before you read on.
Beginning with Oracle Database 12c Release 2, there's a new synopsis format that's significantly more compact than the format used in earlier releases. For brevity, I'll refer to the Oracle Database 12c Release 2 format as new and the previous format as old.
This enhancement is particularly relevant if you looked at incremental statistics maintenance in the past but decided not to use it because of the additional space usage in SYSAUX. Oracle Database 12c Release 2 resolves this issue and, in many cases, reduces the amount of system resource required to manage synopses.
A new type of synopsis
A synopsis is metadata stored in a couple of tables in the data dictionary (SYS.WRI$_OPTSTAT_SYNOPSIS$ and SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$). The metadata is managed automatically by the Oracle Database, so there's generally no reason to be aware of the underlying implementation. Nevertheless, I'll cover some of the details here because it will help you to see why the change is so significant.
Synopses in Oracle Database 12c Release 2 are now created (by default) using an algorithm called HyperLogLog (HLL). This is a state-of-the-art algorithm that calculates the approximate number of distinct values in a table column. Even though it is an approximation, it is nevertheless very accurate, with a typical error rate of less than 2%. Prior to Oracle Database 12c Release 2, synopses consisted of rows stored in the WRI$_OPTSTAT_SYNOPSIS$ table. The number of rows in this table can be very large if there are a large number of partitions and table columns, and if the columns contain a large number of distinct values. New-style synopses do not store rows in this table. Instead, some additional (and compact) hash data is stored in the WRI$_OPTSTAT_SYNOPSIS_HEAD$ table (in the SPARE2 column).
How much smaller are the new synopses? Well, as you've probably guessed, it's the consultants' answer: "it depends". As outlined above, the space used by synopses is a function of the number of partitions, columns and distinct values in columns. I can give you an example from one of our test systems containing an 8TB test table with 84 partitions. The total size of the old-style synopses was around 160MB and the new-style synopses totaled only 6MB. The example I created in GitHub (see below) was contrived to generate a particularly large amount of synopsis data. In that example, new-style synopses take up virtually no space at all and the old-style synopses take up about 160MB (in SYSAUX).
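To see how much space synopses take in your own database, one option is to look at the segments behind the two dictionary tables. The query below is only a sketch: it assumes you have SELECT access to DBA_SEGMENTS, it reports allocated space rather than exact used bytes, and it does not include the indexes on these tables.

```sql
-- Approximate space allocated to the two synopsis tables.
-- Assumption: the session can query DBA_SEGMENTS.
-- SUM() is used because a table may be stored as several
-- segments (for example, one per partition).
SELECT segment_name,
       ROUND(SUM(bytes) / 1024 / 1024, 1) AS size_mb
FROM   dba_segments
WHERE  owner = 'SYS'
AND    segment_name IN ('WRI$_OPTSTAT_SYNOPSIS$',
                        'WRI$_OPTSTAT_SYNOPSIS_HEAD$')
GROUP  BY segment_name
ORDER  BY segment_name;
```

Running it before and after a move to new-style synopses should give a rough idea of the saving, although space freed inside a segment is not necessarily returned to the tablespace straight away.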
How to use them
What do you need to do to use new-style synopses? Nothing! They are used by default if you choose to use incremental statistics, and you don't need to do anything different in Oracle Database 12c Release 2 compared to earlier releases.
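For completeness, this is roughly what "choosing to use incremental statistics" looks like for a single table. It is a minimal sketch: SALES_OWNER and SALES are placeholder names that do not come from this post, and it assumes the other statistics preferences for the table are still at their defaults.

```sql
-- Enable incremental statistics for one partitioned table, then gather.
-- SALES_OWNER and SALES are hypothetical names used for illustration.
BEGIN
  dbms_stats.set_table_prefs('SALES_OWNER', 'SALES', 'INCREMENTAL', 'TRUE');
  dbms_stats.gather_table_stats(ownname => 'SALES_OWNER',
                                tabname => 'SALES');
END;
/
```

Nothing in this block mentions the synopsis format; with the defaults described in the next section, the database picks it for you.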
Controlling synopses
You can control the type of synopsis that will be created using a DBMS_STATS preference called APPROXIMATE_NDV_ALGORITHM.
The default is REPEAT OR HYPERLOGLOG: if a table is using old-style synopses then it will continue to do so, and tables using new-style synopses will continue to use those!
There's no reason to use anything other than the default unless you are upgrading a database to Oracle Database 12c Release 2. If this is the case, then you might want to consider the options. That's covered next.
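Before changing anything, it can help to check what a table is currently set to. The query below is a small sketch using DBMS_STATS.GET_PREFS; SALES_OWNER and SALES are placeholder names, not objects from this post.

```sql
-- Show the effective preference values for one table.
-- SALES_OWNER and SALES are hypothetical names used for illustration.
SELECT dbms_stats.get_prefs('APPROXIMATE_NDV_ALGORITHM',
                            'SALES_OWNER', 'SALES') AS ndv_algorithm,
       dbms_stats.get_prefs('INCREMENTAL_STALENESS',
                            'SALES_OWNER', 'SALES') AS incremental_staleness
FROM   dual;
```

If no table-level value has been set, GET_PREFS falls back to the global setting, so the result reflects what the next statistics gathering run will actually use.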
Upgrading
If you are upgrading a database that's using incremental statistics, then you will want to migrate to using the new-style synopses. How do you go about doing that? It's worth noting from the outset that it's possible to have partitions with old-style and new-style synopses in the same table. Also, the good news is that you can control when and how to transition from one type of synopsis to the other.
There is a DBMS_STATS preference called INCREMENTAL_STALENESS. It controls whether or not you want to allow partitions within an individual table to have different types of synopses during the transition period from old-style to new-style. Let's look at the different scenarios and how to proceed after you have upgraded to Oracle Database 12c Release 2. There is a spectrum of choice from "very conservative" (i.e., maintaining old behaviors) to "aggressive" (i.e., taking advantage of new features immediately). The table below describes the different scenarios from the most conservative cases to the most aggressive cases.
Use case | Action |
---|---|
Initially, you want to continue to use old-format synopses for all tables. We recommend that you use the new-style synopses, but you can choose to use them later on if you prefer. The algorithm used prior to HLL is called adaptive sampling. | `exec dbms_stats.set_table_prefs('table_owner', 'table_name', 'APPROXIMATE_NDV_ALGORITHM', 'ADAPTIVE SAMPLING')` |
You want tables using old-style synopses to continue to use them. Newly created incrementally managed tables will use new-style synopses. Incrementally managed tables without synopses will use new-style synopses when statistics are gathered. Incrementally managed tables with old-style synopses will continue to use them. | No action. This is the default behavior. APPROXIMATE_NDV_ALGORITHM is, by default, REPEAT OR HYPERLOGLOG. |
You have some very large partitioned tables. They are using old-style synopses and you want to gradually replace the old with the new. | Old-format synopses are not immediately replaced, and new partitions will have synopses in the new format. Mixed formats will yield less accurate statistics, but the advantage is that there is no need to re-gather all table statistics in the foreground. The automatic statistics gathering job will gradually re-gather statistics on partitions with old-format synopses and generate new-format synopses. Eventually, new-format synopses will be used for all partitions and statistics will be accurate. `exec dbms_stats.set_table_prefs(table_owner, table_name, 'APPROXIMATE_NDV_ALGORITHM', 'HYPERLOGLOG')` Note that the INCREMENTAL_STALENESS preference must have the value ALLOW_MIXED_FORMAT, but it does not need to be set explicitly (unless you've changed it) because it is the default setting. |
You have time to re-gather all statistics. Incrementally managed tables are using old-style synopses and you want to replace the old-style with the new immediately. | If you have a window of time to completely re-gather statistics for partitioned tables, then this is the recommended approach. `exec dbms_stats.set_table_prefs(table_owner, table_name, 'APPROXIMATE_NDV_ALGORITHM', 'HYPERLOGLOG')` You also need to specify that you don't want a mix of old synopses and new synopses in the same table: `exec dbms_stats.set_table_prefs(table_owner, table_name, 'INCREMENTAL_STALENESS', 'NULL')` You need to take some care here. The preference value should be set to 'NULL' (in quotes) and not NULL (without quotes). NULL (without quotes) sets a preference to its default value, which in this case is ALLOW_MIXED_FORMAT. Once these preferences are set, you will need to re-gather the table's statistics. |
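Pulling the most aggressive scenario together, the steps in the last row of the table could be run as one block. This is a sketch under assumptions: SALES_OWNER and SALES are placeholder names, and it assumes a plain GATHER_TABLE_STATS call is how you re-gather in your environment. Expect the gather to be expensive on a large table, since every partition's synopsis has to be rebuilt.

```sql
-- Switch one table to new-style synopses and disallow mixed formats.
-- SALES_OWNER and SALES are hypothetical names used for illustration.
BEGIN
  dbms_stats.set_table_prefs('SALES_OWNER', 'SALES',
                             'APPROXIMATE_NDV_ALGORITHM', 'HYPERLOGLOG');

  -- The string 'NULL' (in quotes) is deliberate: an unquoted NULL would
  -- reset the preference to its default, ALLOW_MIXED_FORMAT.
  dbms_stats.set_table_prefs('SALES_OWNER', 'SALES',
                             'INCREMENTAL_STALENESS', 'NULL');

  -- Re-gather so the old-style synopses are replaced.
  dbms_stats.gather_table_stats(ownname => 'SALES_OWNER',
                                tabname => 'SALES');
END;
/
```

For the gradual scenario, the same block without the INCREMENTAL_STALENESS call and without the explicit gather would leave the work to the automatic statistics gathering job.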
Remember that you can also set DBMS_STATS preferences (such as APPROXIMATE_NDV_ALGORITHM) at the database, global and schema level as well as at the table level (as per the examples above).
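The calls for the other levels follow the same pattern as SET_TABLE_PREFS. The block below is a sketch that shows all three side by side for comparison; in practice you would normally pick just one. SALES_OWNER is a placeholder schema name, and HYPERLOGLOG is used only as an example value.

```sql
-- Setting the same preference at broader scopes.
-- SALES_OWNER is a hypothetical schema name used for illustration.
BEGIN
  -- Global: the default for any table without its own table-level value.
  dbms_stats.set_global_prefs('APPROXIMATE_NDV_ALGORITHM', 'HYPERLOGLOG');

  -- Schema: applied to the tables that currently exist in one schema.
  dbms_stats.set_schema_prefs('SALES_OWNER',
                              'APPROXIMATE_NDV_ALGORITHM', 'HYPERLOGLOG');

  -- Database: applied to the tables in all user-defined schemas.
  dbms_stats.set_database_prefs('APPROXIMATE_NDV_ALGORITHM', 'HYPERLOGLOG');
END;
/
```

Keep in mind that a table-level setting takes precedence over the global one, so a table that was explicitly set to ADAPTIVE SAMPLING stays that way until its own preference is changed.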
Summary
The synopsis format in Oracle Database 12c Release 2 is much more compact than the previous format. If your database is very large, expect to save a lot of space in SYSAUX while maintaining very good accuracy for your statistics. You can expect the system overhead required to manage synopses to drop too (for example, when you exchange partitions). For more on this and some example scripts, take a look at GitHub.
If you have comments on this post or the scripts in GitHub, please leave them below.