How Many Rows Will Be Returned?
A Quick Peek Into the Future

by David Kogosov




HERE'S a common problem in executing complicated database queries: How can you find out how many rows the result set of a query will contain before the query is executed? If the result set is too large, your application should warn the user to narrow the search instead of just blundering forward. Here are the traditional solutions to this challenge:

  • Use a Select Count(*)... with the same Where clause as the actual query. This will tell you exactly how many rows will be returned by a query. Unfortunately, this is an unacceptably slow solution because it forces you to execute every query twice on the server.

  • Write a script in the RetrieveRow event to count the number of rows as they come in. This slows the query down significantly by making the code in the RetrieveRow event a bottleneck.

The approach that I propose is radically different from these two options and is very effective. Sybase SQL Server query optimizer lets a user obtain a detailed query execution plan. By parsing this output, you can get a pretty precise estimate of the size of the result.

Inspiration and investigation

At the April 1994 Performance and Tuning Conference, Peter Thawley, a senior technical specialist for Sybase's Corporate Technical Services Group, gave the following response when asked how to use database statistics to avoid selecting count(*) before executing the query:

". . . currently it wouldn't be easy. Using the select count(*) method, while accurate, causes the server a lot of extra work, not to mention the creation of a work table. If one was adventurous, one could parse the output of dbcc traceon (302,310) but that wouldn't be a lot of fun."

I felt adventurous enough to run dbcc traceon(302,310) before the showplan and was fascinated by the abundance of information I received. I used the following query:

select name from sysobjects a. sysindexes b where a.id=b.id and a.name like "sys%"

Sure enough there was the estimate of the result set cardinality (see Listing 1).

    Parsing this output didn't seem too tough:

  1. Look for the string "FINAL PLAN".

  2. Look for the string "rows=" and get a value after it.

  3. Keep doing step 2 until a string "STEP" is found, indicating that the detailed execution report is complete.

Implementation

The next logical step was to get access to the detailed plan from my PowerBuilder application. In my PowerBuilder script I did an EXECUTE IMMEDIATE with "dbcc traceon(302,310)", "set showplan on" and "set noexec on" to get a plan. After rummaging through various data elements of SQLCA in the debugger, I realized that SQL Server is sending the detailed execution plan to the database client as a suite of messages. Unfortunately, PowerBuilder doesn't pass database messages on to the client, so I had to do it myself.

In order for me to be able to process Sybase messages, I had to pass to DBLIB the pointer to my message handler. It implied that the message handler couldn't reside in my PowerBuilder application, it had to be implemented as a DLL function.

The evaluation process from within the PowerBuilder application would include several steps:

  1. Initializing the new message handler.

  2. Sending instructions the SQL Server to generate the detailed plan.

  3. Getting results back.

  4. Restoring the old message handler.

To hide the complexity of the procedure from a user, I decided to encapsulate the functionality into a custom class with one public function: count(). This class, cc_count, also uses internally two private functions:

  • execsql()--executes SQL Server commands making sure they aren't issued inside a transaction.

  • read()-reads last SQL Server message, used mostly for debugging.

Here are external functions encapsulated in the class cc_count:

  • MsgInit(uint hwnd)
  • MsgDeInit()
  • MsgRead(ref string mes)

Listing 2 presents an example of PowerScript code used in the sample application.

To use the method described here you have to include MSG.DLL from the Companion Disk in your path or place it in the application directory, and cc_count has to be copied to a .PBL file of the application. You can also modify the source code of the DLL (MSG.C, MSG.DEF) to fit your application's or company's specific need.

To make your estimates as precise as possible keep statistics updated on the tables involved in the query.

There are two limitations on this method.

  • You must have the SA privileges to be able to run dbcc.

  • The method doesn't work if a query contains a GROUP BY clause. If it does, the estimate goes completely off the mark.

Conclusion

Developers face this problem all the time. But even a seemingly outlandish idea for fixing the problem can become a workable solution.



Contents Copyright © 1996, Pinnacle Publishing Inc. All Rights Reserved.

| Home Page | Products | Consulting | Developer Workshop | Contact Us |