我们收到了大量想更多了解oracle的优化器转换的请求,所以,我们想我们应该组织一个系列博客文章,来描述最常用的转换。在接下来的一个季度里,你应该会每月看到两篇关于这方面的文章。
开始前,我们需要解释一下转换的含义是什么?当一条sql语句被解析,优化器会尝试“转换”或重写该sql为一个语义上等价的,处理起来更有效率的sql。我们首次提及转换的概念是在2008年6月,一篇名为《为什么我查询中的一些表从执行计划中消失了?》的博客文章中。这篇最初的文章中,只讨论了oracle数据库11g中引入的一种新转换,称为表消除。新系列文章中的第一部分,我们将讨论子查询展开。我必须要感谢rafi ---- 优化器的高级开发人员之一,他为本主题提供了内容。
子查询展开
子查询展开是一种将子查询转换为外连接的优化方法,并允许优化器在选择访问路径,连接方法和连接次序期间,考虑子查询中的表。展开要么是将子查询是合并到外层查询中,要么转换为一个内联视图(译者注:出现在from子句中的子查询)。
没有子查询展开,子查询将为外层查询中的每一行进行多次评估。因此很多高效的访问方法和连接方法不会被考虑。
这里,我们将讨论出现在where子句中的any和exists子查询上的展开.
术语
任何出现在查询语句中的子查询块,都可以被称之为子查询。但是,我们将出现在where、select和having子句中的子查询块,使用术语子查询来称之。部分oracle官方文档使用术语“嵌套子查询”来称呼他们。出现在from子句中的子查询块被称为视图或派生表。
分类子查询的方法有很多种,主要的分类方法是基于子查询在sql语句中的使用方式。一个where子句中的子查询属于以下类型之一:单行,exists,not exists, any 或者all。单行子查询最多只能返回1行,而其它类型的子查询可以返回0行或多行。
any和all子查询会和关系比较操作符(=, >,>=, <, <=, 和 <>.)一起使用。sql中,集合操作符 in 用 =any 缩写,not in 用 <>all 缩写。
查询a展示了一个exists的关联子查询的示例
a.
select c.cust_last_name, c.country_id
from customers c
where exists (select 1
from sales s
where s.quantity_sold > 1000 and
s.cust_id = c.cust_id);
出现在子查询中的 ,且不属于子查询中定义的表中的列,称之为关联列。a例中的子查询就是关联的,它引用了一个关联列 c.cust_id,他来自于表customers,而该表并不是由子查询定义的(即表没有出现在子查询中)。谓词s.cust_id = c.cust_id被称为关联条件或关联谓词。
再来看查询b,它包含一个非关联的any子查询。请注意查询b和查询a在语义上是等价的。
b.
select c.cust_last_name, c.country_id
from customers c
where c.cust_id =any (select s.cust_id
from sales s
where s.quantity_sold > 1000);
示例b中的子查询是非关联的,即,他没有引用关联列。 "c.cust_id = any s.cust_id"被称为连接条件。
子查询评估
如果一个not exists子查询返回0行,则被评估为true。
any/all子查询会返回值的集合,如果谓词中包含any/all的子查询满足该条件,则其被评估为true.例如上例,至少要有一个s.cust_id要可匹配上查询b中的any子查询的连接条件中的c.cust_id。
注:oracle会将一个不能展开的any或all子查询,转换为相应的exists和not exists子查询。
当一个关联子查询不能展开时,该子查询会被评估多次,为外层表的每一行,替换关联列的值(例如示例a中customers.cust_id)。因此,子查询中表的访问和连接在每一次调用时被重复执行,并且,涉及子查询中的表和外层查询中的表的连接次序也是不能改变的。这类评估还会抑制并行化。
xa展示了查询a的执行计划。这里的子查询展开被关闭了。观察执行计划底部的谓词信息,不能展开的子查询文本被显示为filter。如执行计划所示,子查询会被评估多次(例如50k次,外层customers表中的每一行一次)。
xa.
exists和any子查询
半连接通常用来展开exists和any子查询。但是,在不涉及重复行的情况下,内连接也是可以用来展开exists和any子查询的。
这里,我们使用如下的非标准语法,来表示半连接:
t1.x s= t2.y
其中t1是半连接中的左表,t2是半连接中的右表。半连接的语义如下:
只要t1.x可以在t2.y的值中找到匹配,就返回t1表的该行,而无需进一步查找更多的匹配。
考虑一下前面的查询a,子查询展开产生了查询c,这里子查询被合并到了外层查询。关联条件被转换为连接谓词;customers表和sales表成为了半连接中相应的左表和右表。
c.
select c.cust_last_name, c.country_id
from customers c, sales s
where s.quantity_sold > 1000 and
c.cust_id s= s.cust_id;
查询c的执行计划如下面的xc所示。注意计划xc和xa的成本的不同。xa是关闭展开时产生的,显然,展开时的计划(xc)是更优的,其成本已从5006k下降到2300.(查询b也会产生和xc相同的计划)
xc.
现在,再看一下查询d。它包括了两个表的非相关any子查询。
d.
select c.cust_last_name, c.country_id
from customers c
where c.cust_id =any (select s.cust_id
from sales s, products p
where p.prod_id = s.prod_id and p.prod_list_price > 105);
查询d中的子查询可以通过半连接被展开。但是,子查询中的表是内连接,sales和products表应该在半连接前被执行。因此,需要生成内联视图,以确保连接次序。查询e展示了对查询d的展开转换。这里的子查询是不相关的,并且转换为内联视图。成为半连接中的右表。而关联谓词则被转换为连接谓词。
e.
select c.cust_last_name, c.country_id
from customers c,
(select s.cust_id as s_cust_id
from sales s, products p
where p.prod_id = s.prod_id and
p.prod_list_price > 105) vw
where c.cust_id s= vw.s_cust_id;
xe展示了查询e的执行计划。它有一个优化器生成的,名为vw_nsq_1,在三种可用的连接方法(嵌套循环,hash和排序合并)中,hash方法被优化器选中做为半连接的连接方法。
xe
子查询展开博客的第二部分,会讨论not exists子查询,单行聚合子查询,只要它们被展开前的有效性验证所允许。
原文链接:
原文内容:
optimizer transformations: subquery unnesting part 1
january 1, 2020 | 5 minute read
maria colgan
distinguished product manager
we have received a ton of requests for more information on oracle optimizer transformations so we thought we would put together a series of blog posts describing the most commonly used transformations. you should expect to see two blog posts a month on this over the next quarter.
before we begin we should explain what we mean by transformation? when a sql statement is parsed, the optimizer will try to “transform” or rewrite the sql statement into a semantically equivalent sql statement that can be processed more efficiently. we first discussed the concepts of transformations in a blog post in june 2008 called why are some of tables in my query missing in my plan. this original post dealt with just one of the new transformations introduced in oracle database 11g called table elimination. part one of our new series will deal with subquery unnesting. i must give credit to rafi, one of the senior optimizer developers, who provided the content for this topic.
subquery unnesting
subquery unnesting is an optimization that converts a subquery into a join in the outer query and allows the optimizer to consider subquery table(s) during access path, join method, and join order selection. unnesting either merges the subquery into the body of the outer query or turns it into an inline view.
without unnesting, the subquery is evaluated multiple times, for each row of the outer table, and thus many efficient access paths and join methods cannot be considered.
here we will discuss the unnesting of any and exists subqueries, which appear in the where clause.
terminology
any sub-query block in a query statement may be called a subquery; however, we use the term subquery for a sub-query block that appears in the where, select and having clauses. some oracle documentation uses the term “nested subquery” for what we refer to as a subquery. a sub-query block that appears in the from clause is called a view or derived table.
there are many ways to classify a subquery. the main categorization comes from the way a subquery is used in sql statements. a where clause subquery belongs to one of the following types: single-row, exists, not exists, any, or all. a single-row subquery must return at most one row, whereas the other types of subquery can return zero or more rows.
any and all subqueries are used with relational comparison operators: =, >,>=, <, <=, and <>. in sql, the set operator in is used as a shorthand for =any and the set operator not in is used as a shorthand for <>all.
query a shows an example of a correlated exists subquery.
a.
select c.cust_last_name, c.country_id
from customers c
where exists (select 1
from sales s
where s.quantity_sold > 1000 and
s.cust_id = c.cust_id);
a column that appears in a subquery is called a correlated column, if it comes from a table not defined by the subquery. the subquery in a is correlated, as it refers to a correlated column, c.cust_id, which comes from, customers, a table not defined by the subquery. the predicate, s.cust_id = c.cust_id, is called a correlating condition or a correlated predicate.
consider query b, which contains an uncorrelated any subquery. note that queries b and a are semantically equivalent.
b.
select c.cust_last_name, c.country_id
from customers c
where c.cust_id =any (select s.cust_id
from sales s
where s.quantity_sold > 1000);
the subquery in b is uncorrelated, as it does not refer to a correlated column. “c.cust_id = any s.cust_id” in b is called a connecting condition.
subquery evaluation
a not exists subquery evaluates to true, if it returns no rows.
the any/all subquery returns a set of values, and the predicate containing the any/all subquery will evaluate to true, if it is satisfied. for example, at least one s.cust_id values must match c.cust_id in the connecting condition of the any subquery of query b.
note that in oracle, a non-unnested any and all subquery is converted into a correlated exists and not exists subquery respectively.
when a correlated subquery is not unnested, the subquery is evaluated multiple times, for each row of the outer tables, substituting the values of correlated columns (e.g., customer.cust_id in a). thus, table accesses and joins inside the subquery are repeatedly performed with each invocation and join orders involving subquery tables and outer query tables cannot be explored. this type of evaluation also inhibits parallelization.
xa shows the execution plan for query a. here subquery unnesting has been disabled. observe that the text of the non-unnested subquery filter is displayed in the predicate dump at the bottom of the plan. as the execution plan shows, the subquery will be evaluated multiple (i.e., 50k) times (once per each outer row of the customers table).
xa.
explain_plan_for_a.png
exists and any subqueries
semi-join is generally used for unnesting exists and any subqueries. however, in some cases where duplicate rows are not relevant, inner join can also be used to unnest exists and any subqueries.
here we represent semi-join by the following non-standard syntax: t1.x s= t2.y, where t1 is the left table and t2 is the right table of the semi-join. the semantics of semi-join is the following: a row of t1 is returned as soon as t1.x finds a match with any value of t2.y without searching for further matches.
consider the previously shown query a. unnesting of the subquery in a produces query c, where the body of the subquery has been merged into the outer query. here the correlating condition has been turned into a join predicate; customers and sales become the left and right tables respectively in the semi-join.
c.
select c.cust_last_name, c.country_id
from customers c, sales s
where s.quantity_sold > 1000 and
c.cust_id s= s.cust_id;
the execution plan of c is shown below as xc. note the difference between the costs of the plan xc and the plan xa; recall that xa was generated by disabling unnesting. clearly, the plan with unnesting (xc) is much more optimal; the cost has come down from 5006k to 2300. (the query b also produces the same plan as xc.)
xc.
explain_plan_for_c.png
now consider query d, which contains an uncorrelated any subquery that has two tables.
d.
select c.cust_last_name, c.country_id
from customers c
where c.cust_id =any (select s.cust_id
from sales s, products p
where p.prod_id = s.prod_id and p.prod_list_price > 105);
the subquery in d can be unnested by using a semi-join; however, the inner join of the tables in the subquery, sales and products must take place before the semi-join is performed. therefore, an inline view needs to be generated in order to enforce the join order. the query e shows the unnesting transformation of d. here the subquery is decorrelated and converted into an inline view, which becomes the right table in the semi-join; and the correlated predicate is turned into a join predicate.
e.
select c.cust_last_name, c.country_id
from customers c,
(select s.cust_id as s_cust_id
from sales s, products p
where p.prod_id = s.prod_id and
p.prod_list_price > 105) vw
where c.cust_id s= vw.s_cust_id;
xe shows the execution plan of e. it has an optimizer-generated inline view named vw_sq_1. of the three available join methods (i.e., nested-loop, hash, and sort-merge), the hash method was selected by the optimizer to do the semi-join.
xe
explain_plan_for_e.png
of our blog on subquery unnesting discusses not exists subqueries, single-row aggregated subqueries, as well as the validity checks performed before unnesting is allowed.