Oracle’s 32 Character Limit in Generating Histograms
Upon Oracle’s release of 11g and noticeably with 11g Release 2, Oracle’s functionality is pretty rock solid. What I considered the most obvious weaknesses of the Oracle database engine are now resolved. Namely, bind peeking is resolved with Adaptive Cursor Sharing, and plan stability issues are resolved with SQL Plan Management.
One weakness still remaining with 11g Release 2(220.127.116.11.0) is the 32 character limit in generating histograms for wide character columns. In a nutshell, if you have a column that is wider than 32 characters AND its values are longer than 32 characters, Oracle will only consider the first 32 characters in the column in generating histograms. In other words, if the first 32 characters don’t vary much in all the rows or it’s identical in all rows, then the histograms generated or the absence of them can be way off. As a consequence, the optimizer will make wrong decisions, doing range scans when it shouldn’t, thinking that more rows will match the criteria than there should be.
But this really should be considered a limitation, not a bug. It is a limitation purposely put in place even at 11g Release 2. And it is a reasonable limitation because it is virtually impossible for Oracle to maintain histograms for columns that can be infinitely wide(up to 4000 characters for varchar2 columns). It’ll have to know how many distinct values there are and how many rows there are for each distinct value. So inevitably, Oracle has to draw a line somewhere.
Here’s a more detailed post from Hemant:
It’s a great read. At the time of this writing, the blog tested the limitation as of 11g Release 1. But I’ve tested the limitation with 11g Release 2, and it’s still present.
One workaround that worked for me is to set the column statistics of such a column manually, so that the number of distinct values and density based on the total number of rows in the table. So:
num_distinct = number of rows
density = 1 / number of rows
The following can be run manually if you wanna do it just once.
The automatic statistic collection may overwrite these columns stats. So to protect against that, on 10g’s Enterprise Manager, a “chain” can be created to contain the statistics collection procedure and the following procedure call immediately following. This will ensure that every time statistics is collected, the columns stats are set. On 11g, however, the automatic statistics collection is more integrated and hidden from us, so I would created so it’s better to schedule a repeating job to set them every n minutes. The following procedure can be reused for other tables when you supply different values for the parameters.
create or replace procedure set_column_stats_for_wide_cols (piOwner in varchar2, piTableName in varchar2, piColumnName in varchar2) as
vNumDistinct number := 0;
vDensity number := 0;
vNumRows number := 0;
select num_distinct, density
into vNumDistinct, vDensity
where owner = piOwner
and table_name = piTableName
and column_name = piColumnName;
where owner = piOwner
and table_name = piTableName;
if vNumDistinct != vNumRows OR vDensity != 1/vNumRows then
That’ll indicate to Oracle that each value in the column is unique. And it’ll know not to range scan the column.
Don’t expect this limitation to go away for future versions of Oracle. This is particularly true if the histograms are generated after the data is populated. And even if Oracle tries to maintain such stats instantly as data comes in, this will dramatically upset concurrency and scalability. One wilder idea may be to export the column out to an open source relational database or noSQL database, and use methods such as Map/Reduce to determine the number of distinct values. And then set the column stats in Oracle based on it. Not that Oracle cannot do Map/Reduce, doing it in Oracle may compromise other mission critical operations.