1. What is Histogram
2. Purpose of Histograms
3. Type of Histogram
4. How Oracle Database Chooses the Histogram Type
5. When Oracle Database Creates Histograms
6. How to Use Histogram
7. When to Use/Not Use Histograms
8. Verify existence Histograms
9. Delete Histogram
10. Create Histogram
11. Gather STATS with REPEAT Option
12. How to change default METHOD_OPT value from AUTO to REPEAT at Database Level
13. Extended Statistics
Gather stats for column level is called histogram.
A histogram is a special type of column statistic that provides more detailed information about the data distribution in a table column. A histogram sorts values into "buckets," as you might sort coins into buckets.
The database has to scan the table and read the values in the column to build the histogram.
The real purpose of adding a histogram to table is, to give the more information to optimizer to come up with a better plan.
Histograms may help the Oracle optimizer in deciding whether to use an index vs. a full-table scan (where index values are skewed) or help the optimizer determine the fastest table join order.
Histograms are useful when a column has skewed data. The CBO (Cost Based Optimizer) may use that information when trying to find the best plan to retrieve the requested data.
If no histograms are present, the CBO assumes an even distribution of data. Histograms tell the CBO when data is not evenly distributed, and can help the CBO to estimate the number of rows returned from a table join (called "cardinality"). Having histograms on skewed column may help the optimizer in making a proper decision.
Data skew: Skew is defined like uneven data distribution in column. Large variations in the number of duplicate values in a column.
By default the optimizer assumes a uniform distribution of rows across the distinct values in a column. For columns that contain data skew (a nonuniform distribution of data within the column), a histogram enables the optimizer to generate accurate cardinality estimates for filter and join predicates that involve these columns. For example, a California-based book store ships 95% of the books to California, 4% to Oregon, and 1% to Nevada. The book orders table has 300,000 rows. A table column stores the state to which orders are shipped. A user queries the number of books shipped to Oregon. Without a histogram, the optimizer assumes an even distribution of 300000/3 (the NDV is 3), estimating cardinality at 100,000 rows. With this estimate, the optimizer chooses a full table scan. With a histogram, the optimizer calculates that 4% of the books are shipped to Oregon, and chooses an index scan.
Based on the NDV and the distribution of the data, the database chooses the type of histogram to create. (In some cases, when creating a histogram, the database samples an internally predetermined number of rows.) The types of histograms are as follows: The database view DBA_TAB_COL_STATISTICS or USER_TAB_COL_STATISTICS has a column called HISTOGRAM. - HEIGHT BALANCED - Height Based : Each bucket has same number of values,skewed values occupy more buckets - FREQUENCY - Value Based: Each key has its own - TOP FREQUENCY HISTOGRAMS - HYBRID HISTOGRAMS - NONE - No histogram present for the column Note: Frequency histograms are more precise than Height Balanced histograms.
4. How Oracle Database Chooses the Histogram Type
Oracle Database uses several criteria to determine which histogram to create: frequency, top frequency, height-balanced, or hybrid. The histogram formula uses the following variables: NDV: This represents the number of distinct values in a column. For example, if a column only contains the values 100, 200, and 300, then the NDV for this column is 3. n: This variable represents the number of histogram buckets. The default is 254. p: This variable represents an internal percentage threshold that is equal to (1–(1/n)) * 100. For example, if n = 254, then p is 99.6. An additional criterion is whether the estimate_percent parameter in the DBMS_STATS statistics gathering procedure is set to AUTO_SAMPLE_SIZE (default). The following diagram shows the decision tree for histogram creation.
5. When Oracle Database Creates Histograms
If DBMS_STATS gathers statistics for a table, and if queries have referenced the columns in this table, then Oracle Database creates histograms automatically as needed according to the previous query workload. The basic process is as follows: You run DBMS_STATS for a table with the METHOD_OPT parameter set to the default SIZE AUTO. A user queries the table. The database notes the predicates in the preceding query and updates the data dictionary table SYS.COL_USAGE$. You run DBMS_STATS again, causing DBMS_STATS to query SYS.COL_USAGE$ to determine which columns require histograms based on the previous query workload. SQL> CREATE TABLE SCOTT.RAJ AS SELECT * FROM DBA_OBJECTS; Table created. SQL> exec dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'RAJ',estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE,cascade=>TRUE,degree =>4); PL/SQL procedure successfully completed. SQL> SQL> col table_name for a10 col column_name for a30 select TABLE_NAME,COLUMN_NAME,HISTOGRAM from dba_tab_columns where table_name = 'RAJ' AND OWNER='SCOTT';SQL> SQL> TABLE_NAME COLUMN_NAME HISTOGRAM ---------- ------------------------------ --------------- RAJ OWNER NONE RAJ OBJECT_NAME NONE RAJ SUBOBJECT_NAME NONE RAJ OBJECT_ID NONE RAJ DATA_OBJECT_ID NONE RAJ OBJECT_TYPE NONE RAJ CREATED NONE RAJ LAST_DDL_TIME NONE RAJ TIMESTAMP NONE RAJ STATUS NONE RAJ TEMPORARY NONE RAJ GENERATED NONE RAJ SECONDARY NONE RAJ NAMESPACE NONE RAJ EDITION_NAME NONE 15 rows selected. SQL> SQL> SELECT COUNT(*) FROM SCOTT.RAJ WHERE TEMPORARY='Y'; COUNT(*) ---------- 222 SQL> select TABLE_NAME,COLUMN_NAME,HISTOGRAM from dba_tab_columns where table_name = 'RAJ' AND OWNER='SCOTT'; TABLE_NAME COLUMN_NAME HISTOGRAM ---------- ------------------------------ --------------- RAJ OWNER NONE RAJ OBJECT_NAME NONE RAJ SUBOBJECT_NAME NONE RAJ OBJECT_ID NONE RAJ DATA_OBJECT_ID NONE RAJ OBJECT_TYPE NONE RAJ CREATED NONE RAJ LAST_DDL_TIME NONE RAJ TIMESTAMP NONE RAJ STATUS NONE RAJ TEMPORARY NONE RAJ GENERATED NONE RAJ SECONDARY NONE RAJ NAMESPACE NONE RAJ EDITION_NAME NONE 15 rows selected. SQL> exec dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'RAJ',estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE,cascade=>TRUE,degree =>4); PL/SQL procedure successfully completed. SQL> select TABLE_NAME,COLUMN_NAME,HISTOGRAM from dba_tab_columns where table_name = 'RAJ' AND OWNER='SCOTT'; TABLE_NAME COLUMN_NAME HISTOGRAM ---------- ------------------------------ --------------- RAJ OWNER NONE RAJ OBJECT_NAME NONE RAJ SUBOBJECT_NAME NONE RAJ OBJECT_ID NONE RAJ DATA_OBJECT_ID NONE RAJ OBJECT_TYPE NONE RAJ CREATED NONE RAJ LAST_DDL_TIME NONE RAJ TIMESTAMP NONE RAJ STATUS NONE RAJ TEMPORARY FREQUENCY <----
The default value of METHOD_OPT from 10g onwards is ‘FOR ALL COLUMNS SIZE AUTO’. size_clause can defined as := SIZE {integer REPEAT AUTO SKEWONLY} METHOD_OPT=>'FOR ALL COLUMNS SIZE AUTO' METHOD_OPT=>'FOR ALL COLUMNS SIZE REPEAT' METHOD_OPT=>'FOR ALL COLUMNS SIZE SKEWONLY' -- Recommend syntax from 10g onwards execute dbms_stats.gather_table_stats(ownname => USERNAME, tabname => 'TABLE_NAME', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, method_opt => 'FOR ALL COLUMNS SIZE AUTO', cascade => TRUE); Integer Number of histogram buckets. Must be in the range [1,254] The maximum number of buckets in a histogram in 254 in 11g and below and 2048 in 12c. The default number of buckets when using SIZE AUTO or SIZE SKEWONLY is 254, which provides an appropriate level of detail for most data distributions. To force histogram creation it is recommend that the number of buckets be left at 254. Note SIZE 1 means no histogram will be created. If the number of frequently occurring distinct values in a column is relatively small, then it is useful to set the number of buckets to be greater than the number of frequently occurring distinct values. SELECT column_name, num_distinct, histogram FROM dba_tab_col_statistics WHERE table_name= '&TABLE_NAME' and owner='&OWNER'; AUTO: Let Oracle determines the which columns to collect histograms based on data distributiion and the workload of the columns. exec dbms_stats.gather_table_stats('HR','EMPLOYEES',method_opt=> 'for all columns size AUTO); REPEAT: This will re-create existing histograms without creating new histograms. exec dbms_stats.gather_table_stats('HR','EMPLOYEES',METHOD_OPT=> 'FOR ALL COLUMNS SIZE REPEAT'); SKEWONLY: When you specify SKEWONLY, the database will look at the data distribution for each column to determine if the data is skewed enough to warrant the creation of a histogram. We can do this by gathering histograms on the skewed columns. exec dbms_stats.gather_table_stats('HR','EMPLOYEES',method_opt=> 'for all columns size SKEWONLY');
7. When to Use/Not Use Histograms
When to Use Histograms Create histograms on columns that are frequently used in WHERE clauses of queries and have a highly-skewed data distribution. It is recommended to let Oracle determine this automatically using the METHOD_OPT AUTO setting so that Oracle can determine the columns to collect histograms upon based on data distribution and the workload of the columns. When to Not Use Histograms Also, be aware that histograms, as well as all optimizer statistics, are static. If the data distribution of a column changes frequently, it is necessary to recompute the histogram for a given column. Histograms are not useful for columns with the following characteristics: - The column data is uniformly distributed - The column is not used in WHERE clauses of queries - The column is unique and is used only with equality predicates - Do not create them on every column of every table. This requires more time when the table is analyzed. Increases parse time. And can result in poor plans being generated by the optimizer.
8. Verify existence Histograms
select dbms_stats.get_prefs ('METHOD_OPT') from dual; -- At Database level. SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER'; SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COL_STATISTICS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER'; SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM,NOTES FROM USER_TAB_COL_STATISTICS WHERE TABLE_NAME='&TABLE_NAME'; SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM,NOTES FROM USER_TAB_COL_STATISTICSWHERE TABLE_NAME='&TABLE_NAME'; USER_HISTOGRAMS USER_PART_HISTOGRAMS USER_SUBPART_HISTOGRAMS USER_TAB_HISTOGRAMS
- on single Column TEMPORARY SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER'; exec dbms_stats.gather_table_stats('HR','EMPLOYEES', METHOD_OPT => 'FOR COLUMNS TEMPORARY SIZE 1'); SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER'; SQL> select TABLE_NAME,COLUMN_NAME,HISTOGRAM from dba_tab_columns where table_name = 'RAJ' AND OWNER='SCOTT'; TABLE_NAME COLUMN_NAME HISTOGRAM ---------- ------------------------------ --------------- RAJ OWNER NONE RAJ OBJECT_NAME NONE RAJ SUBOBJECT_NAME NONE RAJ OBJECT_ID NONE RAJ DATA_OBJECT_ID NONE RAJ OBJECT_TYPE NONE RAJ CREATED NONE RAJ LAST_DDL_TIME NONE RAJ TIMESTAMP NONE RAJ STATUS NONE RAJ TEMPORARY FREQUENCY <--- RAJ GENERATED NONE RAJ SECONDARY NONE RAJ NAMESPACE NONE RAJ EDITION_NAME NONE 15 rows selected. SQL> SQL> exec dbms_stats.gather_table_stats('SCOTT','RAJ', METHOD_OPT => 'FOR COLUMNS TEMPORARY SIZE 1'); PL/SQL procedure successfully completed. SQL> select TABLE_NAME,COLUMN_NAME,HISTOGRAM from dba_tab_columns where table_name = 'RAJ' AND OWNER='SCOTT'; TABLE_NAME COLUMN_NAME HISTOGRAM ---------- ------------------------------ --------------- RAJ OWNER NONE RAJ OBJECT_NAME NONE RAJ SUBOBJECT_NAME NONE RAJ OBJECT_ID NONE RAJ DATA_OBJECT_ID NONE RAJ OBJECT_TYPE NONE RAJ CREATED NONE RAJ LAST_DDL_TIME NONE RAJ TIMESTAMP NONE RAJ STATUS NONE RAJ TEMPORARY NONE <-- deleted RAJ GENERATED NONE RAJ SECONDARY NONE RAJ NAMESPACE NONE RAJ EDITION_NAME NONE 15 rows selected. SQL> Note: If you regather table stats again then histogram will re-create automatically
SQL> exec dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'RAJ',estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE,cascade=>TRUE,degree =>4); PL/SQL procedure successfully completed. SQL> SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='RAJ' AND OWNER='SCOTT'; TABLE_NAME COLUMN_NAME HISTOGRAM ---------- ------------------------------ --------------- RAJ OWNER NONE RAJ OBJECT_NAME NONE RAJ SUBOBJECT_NAME NONE RAJ OBJECT_ID NONE RAJ DATA_OBJECT_ID NONE RAJ OBJECT_TYPE NONE RAJ CREATED NONE RAJ LAST_DDL_TIME NONE RAJ TIMESTAMP NONE RAJ STATUS NONE RAJ TEMPORARY FREQUENCY <--- RAJ GENERATED NONE RAJ SECONDARY NONE RAJ NAMESPACE NONE RAJ EDITION_NAME NONE 15 rows selected. SQL>
- Delete histograms on all columns SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER'; exec dbms_stats.gather_table_stats('HR','EMPLOYEES', METHOD_OPT => 'FOR ALL COLUMNS SIZE 1'); SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';
10. Create Histogram on Column USERNAME
SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';
exec dbms_stats.gather_table_stats('SCOTT','TEST', METHOD_OPT => 'FOR COLUMNS USERNAME SIZE AUTO');
SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';
11. Gather STATS with REPEAT Option
Please note REPEAT option will re-create existing histograms without creating new histograms. - For TEMPORARY Column SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER'; exec dbms_stats.gather_table_stats('HR','EMPLOYEES', METHOD_OPT => 'FOR COLUMNS TEMPORARY SIZE REPEAT'); SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER'; - For All columns SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER'; exec dbms_stats.gather_table_stats('HR','EMPLOYEES',METHOD_OPT=> 'FOR ALL COLUMNS SIZE REPEAT'); SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COLUMNS WHERE TABLE_NAME='&TABLE_NAME' AND OWNER='&OWNER';
12. How to change default METHOD_OPT value from AUTO to REPEAT at Database Level
-- Verify select dbms_stats.get_prefs ('METHOD_OPT') from dual; -- At Database level. -- set to REPEAT This procedure is used to set the global statistics preferences. exec dbms_stats.set_global_prefs ('METHOD_OPT','FOR ALL COLUMNS SIZE REPEAT'); -- Verify select dbms_stats.get_prefs ('METHOD_OPT') from dual; -- At Database level. Since we set to REPEAT at Database level, hence going forwared gathering of statistics, manually or automatically will not create any new histograms. Collect histograms only on the columns that already have histograms. SQL> select dbms_stats.get_prefs ('METHOD_OPT') from dual; DBMS_STATS.GET_PREFS('METHOD_OPT') -------------------------------------------------------------------------------- FOR ALL COLUMNS SIZE AUTO <---- SQL> exec dbms_stats.set_global_prefs ('METHOD_OPT','FOR ALL COLUMNS SIZE REPEAT'); PL/SQL procedure successfully completed. SQL> select dbms_stats.get_prefs ('METHOD_OPT') from dual; DBMS_STATS.GET_PREFS('METHOD_OPT') -------------------------------------------------------------------------------- FOR ALL COLUMNS SIZE REPEAT <--- SQL> --- Revert back to AUTO SQL> exec dbms_stats.set_global_prefs ('METHOD_OPT','FOR ALL COLUMNS SIZE AUTO'); PL/SQL procedure successfully completed. SQL> select dbms_stats.get_prefs ('METHOD_OPT') from dual; DBMS_STATS.GET_PREFS('METHOD_OPT') -------------------------------------------------------------------------------- FOR ALL COLUMNS SIZE AUTO <--- SQL>
In real-world data, there is often a relationship or correlation between the data stored in different columns of the same table. Creating extended statistics on a group of columns, the Optimizer can determine a more accurate cardinality estimate when the columns are used together in a where clause of a SQL statement. We can use DBMS_STATS.CREATE_EXTENDED_STATS to define the column group you want to have statistics gathered on as a whole. Once the group has been established Oracle will automatically maintain the statistics on that column group when statistics are gathered on the table. Create Extended Statistics SELECT DBMS_STATS.CREATE_EXTENDED_STATS('SCOTT', 'SUGI', '(OBJECT_TYPE,TEMPORARY)') FROM DUAL; exec dbms_stats.gather_table_stats('SCOTT','SUGI'); -- Display extended statistics extensions (Will show the relationship and extension name) column extension format a30 select extension_name, extension from dba_stat_extensions where table_name='SUGI'; -- Display extended statistics distinct values and histograms column col_group format a30 select e.extension col_group, t.num_distinct, t.histogram from dba_stat_extensions e join dba_tab_col_statistics t on e.extension_name=t.column_name and t.table_name = 'SUGI'; Dropping Extended Statistics exec dbms_stats.drop_extended_stats(ownname => 'SCOTT',tabname => 'SUGI',extension => '(OBJECT_TYPE,TEMPORARY)'); Eg: SQL> CREATE TABLE SCOTT.SUGI AS SELECT * FROM DBA_OBJECTS; Table created. SQL> SELECT COUNT(*) FROM SCOTT.SUGI WHERE TEMPORARY='Y' and OBJECT_TYPE='TABLE'; COUNT(*) ---------- 115 SQL> SQL> column extension format a30 select extension_name, extension from dba_stat_extensions where table_name='SUGI';SQL> EXTENSION_NAME EXTENSION ------------------------------ ------------------------------ SYS_STUYM6W8ZX7#X$MP7$EJ_PVU9O ("OBJECT_TYPE","TEMPORARY") SQL> SQL> column col_group format a30 select e.extension col_group, t.num_distinct, t.histogram from dba_stat_extensions e join dba_tab_col_statistics t on e.extension_name=t.column_name and t.table_name = 'SUGI';SQL> 2 3 4 5 6 COL_GROUP NUM_DISTINCT HISTOGRAM ------------------------------ ------------ --------------- ("OBJECT_TYPE","TEMPORARY") 47 FREQUENCY SQL> -- Dropping Extended Statistics SQL> exec dbms_stats.drop_extended_stats(ownname => 'SCOTT',tabname => 'SUGI',extension => '(OBJECT_TYPE,TEMPORARY)'); PL/SQL procedure successfully completed. SQL> -- Verify SQL> column extension format a30 select extension_name, extension from dba_stat_extensions where table_name='SUGI';SQL> no rows selected <---- SQL> Limitations "Extended Statistics" can't be created on Objects owned by SYS Cluster Tables, IOT's,Temporary Tables, external Tables Extended Statistics can be created for columns in a group between 2 and 32 Other ....
Caution: Your use of any information or materials on this website is entirely at your own risk. It is provided for educational purposes only. It has been tested internally, however, we do not guarantee that it will work for you. Ensure that you run it in your test environment before using.
For more information:
Histograms: An Overview (10g and Above) (Doc ID 1445372.1)
How To Find Candidate Columns For Creating Extended Statistics Using “DBMS_STATS.SEED_COL_USAGE” (Doc ID 1430317.1)
Why DBMS_STATS with METHOD_OPT=>’FOR ALL COLUMNS SIZE SKEWONLY’ Gathers Histograms on Unique Key Columns (Doc ID 1473023.1)
How To Quickly Add/Remove Column Statistics (Histograms) For A Column (Doc ID 390249.1)
SIZE Clause in METHOD_OPT Parameter of DBMS_STATS Package (Doc ID 338926.1)
How to: Manage CBO Statistics During an Upgrade from 10g or 9i into 11g/12c (Doc ID 465787.1)
Incorrect Column Statistics Causing Queries To Take Bad Plans (Doc ID 552363.1)
How to Change Default Parameters for Gathering Statistics in 10g (Doc ID 725845.1)
How to Change Default Parameters for Gathering Statistics in Oracle 11g (Doc ID 1493227.1)
Why is a Particular Query Slower on One Machine (or Instance) than Another? (Doc ID 604256.1)
Statistics Gathering: Frequency and Strategy Guidelines (Doc ID 44961.1)
Best Practices for Automatic Statistics Collection (Doc ID 377152.1)
Query Running Slow with Extended Statistics (Doc ID 1130574.1)
How To Find Candidate Columns For Creating Extended Statistics Using “DBMS_STATS.SEED_COL_USAGE” (Doc ID 1430317.1)
https://blogs.oracle.com/optimizer/extended-statistics
https://blogs.oracle.com/optimizer/how-do-i-know-what-extended-statistics-are-needed-for-a-given-workload
https://docs.oracle.com/cd/E18283_01/appdev.112/e16760/d_stats.htm
https://neerajbhatia.files.wordpress.com/2010/11/everything-you-want-to-know-about-oracle-histograms-part-1.pdf
https://oracle-base.com/articles/11g/extended-statistics-enhancements-11gr2
http://www.dba-oracle.com/t_dbms_stats_create_extended_stats.htm