Sunday, August 24, 2008

Oracle migration: DBMS_STATS

When migrating from (say) Oracle 9i to 10g or a higher version, you need to consider the effect of changes to dbms_stats that can alter the performance of your applications. The basic issue is that the default value of the parameter optimizer_dynamic_sampling is 2 in 10g, which means dynamic sampling is used for any table without statistics. The setting was 1 in 9i, so your performance can change. The problem is not so much that the default behavior has changed as that a script run when you set a database up creates a job which runs each day and uses the default settings to create statistics for any tables with missing or stale statistics. We dodge this by setting the number of histogram buckets to 1 when we run dbms_stats; Oracle is smart enough to treat a single bucket as "no histogram". See below. There is another parameter, optimizer_goal, which Lewis says appears to be no different from optimizer_dynamic_sampling, so I am not sure which of the two the method_opt setting (below) interacts with, but optimizer_goal seems likely.
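A quick way to check what your own database is actually using (this assumes you can query v$parameter; SHOW PARAMETER in SQL*Plus works too):

SQL> select name, value
2 from v$parameter
3 where name = 'optimizer_dynamic_sampling';

On an out-of-the-box 10g database you should see 2; on 9i, 1.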

Richard Foote, a well-known expert on tuning Oracle, has suggested that this may be the largest problem in migrating from version 9 to 10 and beyond ( http://richardfoote.wordpress.com/2008/01/04/dbms_stats-method_opt-default-behaviour-changed-in-10g-be-careful/#comment-1429 ), and Jonathan Lewis ( http://apress.com/book/view/9781590596364 ) also recognizes that the change in the default behavior may not be helpful. (If Foote is correct, it is certainly not the only problem.)

Lewis mentions the conditions under which the job that runs dbms_stats is set up: if you use the Database Configuration Assistant, or if you run catproc.sql during a manual creation. That may give you some hope that the job was not set up (you did not run catproc.sql), but it is a false hope. Because of the dependencies of scripts on other scripts, it is unlikely that you will find a way not to run catproc.sql; it is a major script that calls other scripts, which in turn call their own lists of scripts, and the script that sets up the job is catmwin.sql, called by catpcnfg.sql (called by catproc.sql).

You can be fairly confident that the job is being run every day if you have a working database, but you can check to make sure, like this:

SQL> select job_name from dba_scheduler_jobs;

JOB_NAME
------------------------------
FGR$AUTOPURGE_JOB
BSLN_MAINTAIN_STATS_JOB
DRA_REEVALUATE_OPEN_FAILURES
HM_CREATE_OFFLINE_DICTIONARY
ORA$AUTOTASK_CLEAN
PURGE_LOG
MGMT_STATS_CONFIG_JOB
MGMT_CONFIG_JOB

8 rows selected.

This shows that the jobs relating to collecting statistics are queued up.
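If you want to dig a little deeper and see when the statistics jobs last ran and how long they took, a query like this against the scheduler run history should do it (dba_scheduler_job_run_details is the standard history view; adjust the filter to taste):

SQL> select job_name, status, actual_start_date, run_duration
2 from dba_scheduler_job_run_details
3 where job_name like '%STATS%'
4 order by actual_start_date;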

What's Next?

Read up on dbms_stats and decide whether you want to learn more; the links above are good places to start. Remember that the changes were made because, in the opinion of Oracle's developers, they will help. If you think that may not be true, try running your own invocation of dbms_stats and see if that makes things better. Here are some things to look out for:

The default setting for method_opt in 9i is "for all columns size 1" (which means don't create histograms), and the default setting for method_opt in 10g is "for all columns size auto", meaning Oracle decides which columns need histograms, which will be wrong in some cases.
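In 10g you can confirm the default your own database will use with dbms_stats.get_param (later versions replace it with get_prefs); for example:

SQL> select dbms_stats.get_param('METHOD_OPT') from dual;

On an out-of-the-box 10g database this returns FOR ALL COLUMNS SIZE AUTO.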

Taking longer to create stats: the now-deprecated analyze command often ran in parallel; dbms_stats runs PL/SQL code and cannot do that. If gathering is taking longer and is also creating histograms you feel you don't need, you need to consider fixing it as it relates to your application (profiles are one way), but do not think about disabling it; the stats are also gathered on the data dictionary, and may do it some good.
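One way to fix it globally, rather than query by query, is to change the default that the nightly job picks up. In 10g the call is dbms_stats.set_param (11g replaces it with set_global_prefs); this is only a sketch, and it changes behavior for every table in the database, so test it before relying on it:

begin
  -- make the automatic job behave like 9i: one bucket means no histograms
  dbms_stats.set_param('METHOD_OPT', 'FOR ALL COLUMNS SIZE 1');
end;
/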

The standard 9i way of running dbms_stats:

begin
  dbms_stats.gather_table_stats(
    ownname          => user,
    tabname          => 'MY_TABLE',
    cascade          => true,
    estimate_percent => null
  );
end;
/

To get the same results in 10g, you need to stop the creation of histograms:

begin
  dbms_stats.gather_table_stats(
    ownname          => user,
    tabname          => 'MY_TABLE',
    cascade          => true,
    estimate_percent => null,
    method_opt       => 'for all columns size 1' -- the default setting in 9i
  );
end;
/

The added line cancels the creation of histograms by telling Oracle to use only one bucket per column. Clearly there will be times when you will want histograms, and in those cases you use a larger number of buckets rather than 1, as in the sketch below.
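For example, if you know one particular column is skewed and genuinely benefits from a histogram, the mixed form of method_opt lets you keep size 1 everywhere else; the column name STATUS here is just a placeholder, so check the syntax against your version's documentation:

begin
  dbms_stats.gather_table_stats(
    ownname          => user,
    tabname          => 'MY_TABLE',
    cascade          => true,
    estimate_percent => null,
    -- size 1 (no histograms) everywhere, except a 254-bucket histogram on STATUS
    method_opt       => 'for all columns size 1 for columns status size 254'
  );
end;
/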

See Oracle doc b28419.pdf for more information on dbms_stats, or review the documentation provided with the code (if any) by using the following method for any package. Don't use PACKAGE BODY for the type; the bodies are compiled and wrapped to "protect them from prying eyes." I am not sure who they have in mind for that.

SQL> select text
2 from all_source
3 where name = 'DBMS_OUTPUT'
4 and type = 'PACKAGE'
5 order by line
/

TEXT
----------------------------------------------------------------------------------------------------
package dbms_output as

-- DE-HEAD <- tell SED where to cut when generating fixed package ------------
-- OVERVIEW
--
-- These procedures accumulate information in a buffer (via "put" and
-- "put_line") so that it can be retrieved out later (via "get_line" or
(much more)

NEXT: I will cover setting stats in a future post, but for now, here is one of the concepts: if you have a huge table but are sure that only a few rows in it will be used by a query you want to run, setting the number of rows to a value roughly equal to the number of rows that will actually be selected can make this table an attractive table to drive a hash join (it will fit in memory), and the larger table can probe it without being entirely in memory, just by doing a single full scan. For an efficient hash join it is critical that one table fit entirely into memory, BUT not the whole table, just the rows meeting the select criteria; hence inserting realistic stats.
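As a preview, the call that lets you do that is dbms_stats.set_table_stats; a minimal sketch, with a made-up table name and row count:

begin
  -- pretend the big table contributes only ~500 rows to the join,
  -- so the optimizer is happy to use it as the in-memory build side
  dbms_stats.set_table_stats(
    ownname => user,
    tabname => 'MY_BIG_TABLE',
    numrows => 500
  );
end;
/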
