nigel bayliss | november 8, 2011
产品经理
原文链接:
理解和管理优化器统计信息是优化sql执行的关键。知道何时和如何及时收集统计信息对于保持可接受的性能是至关重要的。为了澄清与统计信息相关的所有信息,我们将两篇有关优化器统计信息的白皮书一并提供出来,他们是:
介绍
当oracle数据库首次引入如何执行sql语句的决策时,是由基于规则的优化器(rbo)决定的。顾名思义,基于规则的优化器遵循一组规则来确定sql语句的执行计划。这些规则是经过排序的,因此,如果有两个可能的规则可以应用于sql语句,则将使用排名最低的规则。
在oracle database 7中,为了处理当时添加到数据库中的增强功能,包括并行执行和分区,以及处理实际的数据内容和分布,基于成本的优化器(cbo)被引入了。基于成本的优化器会检查一条sql可能的所有执行计划,并选取成本最低的一个。这里的成本表示对于给定的执行计划,资源使用量的评估。成本越低,执行计划的效率越高。为了使基于成本的优化器可以精确地确定给定执行计划的成本,就必须拥有sql中访问的所有对象(表和索引)的信息,以及运行该sql的系统的信息。
这些必须的信息通常统称为优化器统计信息。理解和管理优化器统计信息是优化sql执行的关键。知道何时和如何及时收集统计信息对保持可接受的性能是至关重要的。本白皮书是关于优化器统计信息的两部分系列文章中的第一篇,通过实例,详细描述了优化器统计信息的不同概念,包括:
- 统计信息是什么
- 统计信息收集
- 统计信息管理
- 统计信息的附加类型
统计信息是什么?
统计信息是描述数据库及数据库中对象的数据的集合。统计信息供优化器为每条sql选择最优的执行计划,它存储在数据字典中,并可通过诸如user_tab_statistics这类数据库字典视图来访问。
图1:统计信息存储在数据字典,供优化器用于确定执行计划
表和列统计信息
表统计信息包括表中的行数,表使用的数据块数,以及表中的平均行长等。优化器使用这些信息,并结合其它统计信息,计算执行计划中各种操作的成本,并评估相关操作所产生的行数。例如,表的访问成本是用数据块数,结合参数db_file_multiblock_read_count的值来计算的。可以在数据字典视图user_tab_statistics中查看表的统计信息。
列统计信息包括一个列上的唯一值数据(ndv),以及在该列上找到的最小和最大值。可以在数据字典视图user_tab_col_statistics中查看列的统计信息。优化器使用列统计信息,并结合表统计信息(行数)来评估sql操作将会返回的行数。例如,如果一个表有100行记录,并且对一个具有10个不同值的列上的相等谓词来访问表,那么优化器在假设数据分布一致的情况下,将基数估计为表中的行数除以该列不同值的数量,即,100/10=10。
原文链接:
原文内容:
nigel bayliss | november 8, 2011
product manager
understanding and managing optimizer statistics is key to optimal sql execution. knowing when and how to gather statistics in a timely manner is critical to maintaining acceptable performance. in order to clarify all of the information surrounding statistics we have put together two whitepapers on optimizer statistics. they are:
best practices for gathering optimizer statistics with oracle database 12c release 2
understanding optimizer statistics with oracle database 12c release 2
introduction
when the oracle database was first introduced the decision of how to execute a sql statement was determined by a rule based optimizer (rbo). the rule based optimizer, as the name implies, followed a set of rules to determine the execution plan for a sql statement. the rules were ranked so if there were two possible rules that could be applied to a sql statement the rule with the lowest rank would be used.
in oracle database 7, the cost based optimizer (cbo) was introduced to deal with the enhanced functionality being added to the oracle database at this time, including parallel execution and partitioning, and to take the actual data content and distribution into account. the cost based optimizer examines all of the possible plans for a sql statement and picks the one with the lowest cost, where cost represents the estimated resource usage for a given plan. the lower the cost the more efficient an execution plan is expected to be. in order for the cost based optimizer to accurately determine the cost for an execution plan it must have information about all of the objects (tables and indexes) accessed in the sql statement, and information about the system on which the sql statement will be run.
this necessary information is commonly referred to as optimizer statistics. understanding and managing optimizer statistics is key to optimal sql execution. knowing when and how to gather statistics in a timely manner is critical to maintaining acceptable performance. this whitepaper is the first in a two part series on optimizer statistics, and describes in detail, with worked examples, the different concepts of optimizer statistics including;
what are optimizer statistics
gathering statistics
managing statistics
additional types of statistics
what are optimizer statistics?
optimizer statistics are a collection of data that describe the database, and the objects in the database. these statistics are used by
the optimizer to choose the best execution plan for each sql statement. statistics are stored in the data dictionary, and can be accessed using data dictionary views such as user_tab_statistics.
figure 1:optimizer statistics stored in the data dictionary used by the oracle optimizer to determine execution plans
table and column statistics
table statistics include information on the number of rows in the table, the number of data blocks used for the table, as well as the average row length in the table. the optimizer uses this information, in conjunction with other statistics, to compute the cost of various operations in an execution plan, and to estimate the number of rows the operation will produce. for example, the cost of a table access is calculated using the number of data blocks combined with the value of the parameter db_file_multiblock_read_count. you can view table
statistics in the dictionary view user_tab_statistics.
column statistics include information on the number of distinct values in a column (ndv) as well as the minimum and maximum value found in the column. you can view column statistics in the dictionary view user_tab_col_statistics. the optimizer uses the column statistics information in conjunction with the table statistics (number of rows) to estimate the number of rows that will be returned by a sql operation. for example, if a table has 100 records, and the table access evaluates an equality predicate on a column that has 10 distinct values, then the optimizer, assuming uniform data distribution, estimates the cardinality to be the number of rows in the table divided by the number of distinct values for the column or 100/10 = 10.