How Many Rows Will Be Returned?
A Quick Peek Into the Future
by David Kogosov
HERE'S a common problem in executing
complicated database queries: How can you find
out how many rows the result set of a query will contain before
the query is executed? If the result set is too large, your application
should warn the user to narrow the search instead of just blundering
forward. Here are the traditional solutions to this challenge:
- Use a Select Count(*)... with the same Where clause as
the actual query. This will tell you exactly how many rows will
be returned by a query. Unfortunately, this is an unacceptably
slow solution because it forces you to execute every query twice
on the server.
- Write a script in the RetrieveRow event to count the number
of rows as they come in. This slows the query down significantly
by making the code in the RetrieveRow event a bottleneck.
The approach that I propose is radically different from these
two options and is very effective. Sybase SQL Server query optimizer
lets a user obtain a detailed query execution plan. By parsing
this output, you can get a pretty precise estimate of the size
of the result.
Inspiration and investigation
At the April 1994 Performance and Tuning Conference, Peter Thawley,
a senior technical specialist for Sybase's Corporate Technical
Services Group, gave the following response when asked how to
use database statistics to avoid selecting count(*) before executing
the query:
". . . currently it wouldn't be easy. Using the select count(*)
method, while accurate, causes the server a lot of extra work, not
to mention the creation of a work table. If one was adventurous,
one could parse the output of dbcc traceon (302,310)
but that wouldn't be a lot of fun."
I felt adventurous enough to run dbcc traceon(302,310) before
the showplan and was fascinated by the abundance of information
I received. I used the following query:
select name from sysobjects a. sysindexes b where a.id=b.id and
a.name like "sys%"
Sure enough there was the estimate of the result set cardinality
(see Listing 1).
Parsing this output didn't seem too tough:
- Look for the string "FINAL PLAN".
- Look for the string "rows=" and get a value after
it.
- Keep doing step 2 until a string "STEP" is found,
indicating that the detailed execution report is complete.
Implementation
The next logical step was to get access to the detailed plan from
my PowerBuilder application. In my PowerBuilder script I did an
EXECUTE IMMEDIATE with "dbcc traceon(302,310)", "set
showplan on" and "set noexec on" to get a plan.
After rummaging through various data elements of SQLCA in the
debugger, I realized that SQL Server is sending the detailed execution
plan to the database client as a suite of messages. Unfortunately,
PowerBuilder doesn't pass database messages on to the client,
so I had to do it myself.
In order for me to be able to process Sybase messages, I had
to pass to DBLIB the pointer to my message handler. It implied
that the message handler couldn't reside in my PowerBuilder application,
it had to be implemented as a DLL function.
The evaluation process from within the PowerBuilder
application would include several steps:
- Initializing the new message handler.
- Sending instructions the SQL Server to generate the detailed
plan.
- Getting results back.
- Restoring the old message handler.
To hide the complexity of the procedure from a user, I decided
to encapsulate the functionality into a custom class with one
public function: count(). This class, cc_count, also uses internally
two private functions:
- execsql()--executes SQL Server commands making sure they
aren't issued inside a transaction.
- read()-reads last SQL Server message, used mostly for
debugging.
Here are external functions encapsulated in the class cc_count:
- MsgInit(uint hwnd)
- MsgDeInit()
- MsgRead(ref string mes)
Listing 2
presents an example of PowerScript code used in the
sample application.
To use the method described here you have to include MSG.DLL
from the Companion Disk in your path or place it in the application
directory, and cc_count has to be copied to a .PBL file of the
application. You can also modify the source code of the DLL (MSG.C,
MSG.DEF) to fit your application's or company's specific need.
To make your estimates as precise as possible keep statistics
updated on the tables involved in the query.
There are two limitations on this method.
- You must have the SA privileges to be able to run dbcc.
- The method doesn't work if a query contains a GROUP BY
clause. If it does, the estimate goes completely off the mark.
Conclusion
Developers face this problem all the time. But even a seemingly
outlandish idea for fixing the problem can become a workable solution.
Contents Copyright © 1996, Pinnacle Publishing Inc. All Rights Reserved.