Terminus

Friday, July 15, 2011

Locate an Object across all databases on server

How many times have you wanted to quickly find an object on a server?

You don't know which database it's located in and there are 30+ databases on the server.

The following is a quick snippet that does this task. It's not very sophisticated but gets the job done.

Dave

--
-- Look for the object ColumnButtons across the server
-- Dave Winters 7/2011
use master
go
exec sp_msforeachdb @command1 = 'use [?];
if object_id(''ColumnButtons'') > 0 print ''Found in: ?''; '
--

Monday, May 23, 2011

Sorting NULL columns in SQL Server.

Recently I was working with a report developer who was struggling with NULL date-time values in their report output. The problem was this BI programmer was used to working with another RDBMS that had control settings specifying the sort order for NULLs. By SQL convention NULL are ordered first.

Unfortunately the case in the data had “end dates” for when policies come up for renewal. The desired output was to have policies without valid end dates(NULL) to be last in the output. For this article I’ll use Microsoft’s sample database AdventWorks. The table we’ll use for our report is ProductCostHistory. The EndDate column has NULLs that we want to be listed last in our report.

If we simply query the data with an ORDER-BY we’ll get the following result:
SELECT *
FROM [AdventureWorks].[Production].[ProductCostHistory]
ORDER BY [EndDate]

ProductID StartDate EndDate StandardCost ModifiedDate

----------- ----------------------- ----------------------- ------------------ -----------------------
707         2003-07-01 00:00:00.000 NULL                    13.0863            2003-06-17 00:00:00.000
708         2003-07-01 00:00:00.000 NULL                    13.0863            2003-06-17 00:00:00.000
              --Data edited out for clarity---
860         2002-07-01 00:00:00.000 2003-06-30 00:00:00.000 9.7136             2003-06-30 00:00:00.000
860         2003-07-01 00:00:00.000 NULL                    9.1593             2003-06-17 00:00:00.000

As expected the NULL EndDate values come first in the list.

My report writer buddy came up some pretty clever work around involving temp tables and self-joins. Not very clear and poorly performing to say the least. So I showed him the following as the first replacement solution to placing NULLs at the end.

SELECT *

FROM [AdventureWorks].[Production].[ProductCostHistory]
ORDER BY isnull([EndDate],'12/31/9999')
ProductID   StartDate               EndDate                 StandardCost       ModifiedDate
----------- ----------------------- ----------------------- ------------------ -----------------------
707         2001-07-01 00:00:00.000 2002-06-30 00:00:00.000 12.0278            2002-06-30 00:00:00.000
708         2001-07-01 00:00:00.000 2002-06-30 00:00:00.000 12.0278            2002-06-30 00:00:00.000
              --Data edited out for clarity---
860         2002-07-01 00:00:00.000 2003-06-30 00:00:00.000 9.7136             2003-06-30 00:00:00.000
860         2003-07-01 00:00:00.000 NULL                    9.1593             2003-06-17 00:00:00.000

Exactly what we wanted! BUT, this assumes that the date value ‘12/31/9999’ is beyond any other date in the table. Many times this solution is a safe assumption. This may NOT always be the case. I’ve seen databases where high values are sometimes used to indicate “out of range” or “present but invalid” etc.

The next solution addresses this condition by finding the MAX value in the column then adding an appropriate delta value to place the column after this maximum value. Here is the solution for our product cost history example:

DECLARE @maxEndDate DATETIME

SELECT @maxEndDate = MAX(EndDate)+1
     FROM [AdventureWorks].[Production].[ProductCostHistory]
SELECT *
FROM [AdventureWorks].[Production].[ProductCostHistory]
ORDER BY isnull([EndDate],@maxEndDate)
ProductID   StartDate               EndDate                 StandardCost       ModifiedDate
----------- ----------------------- ----------------------- ------------------ -----------------------
707         2001-07-01 00:00:00.000 2002-06-30 00:00:00.000 12.0278            2002-06-30 00:00:00.000
708         2001-07-01 00:00:00.000 2002-06-30 00:00:00.000 12.0278            2002-06-30 00:00:00.000
              --Data edited out for clarity---
860         2002-07-01 00:00:00.000 2003-06-30 00:00:00.000 9.7136             2003-06-30 00:00:00.000
860         2003-07-01 00:00:00.000 NULL                    9.1593             2003-06-17 00:00:00.000

As expected the data is sorted in the correct order with the NULL values following the valid dates.

Now for what I think is a better solution and does not require hard coding a MAX(or Min) value or scanning through the data finding the last value. Here is my preferred solution to our date problem:
SELECT *
FROM [AdventureWorks].[Production].[ProductCostHistory]
  ORDER BY (case when EndDate IS NULL then 1 else 0 end),EndDate
ProductID   StartDate               EndDate                 StandardCost       ModifiedDate
----------- ----------------------- ----------------------- ------------------ -----------------------
707         2001-07-01 00:00:00.000 2002-06-30 00:00:00.000 12.0278            2002-06-30 00:00:00.000
708         2001-07-01 00:00:00.000 2002-06-30 00:00:00.000 12.0278            2002-06-30 00:00:00.000
              --Data edited out for clarity---
860         2002-07-01 00:00:00.000 2003-06-30 00:00:00.000 9.7136             2003-06-30 00:00:00.000
860         2003-07-01 00:00:00.000 NULL                    9.1593             2003-06-17 00:00:00.000

Look at this for a moment. It is obvious but has a subtle requirement; the CASE statement must precede the column that contains NULL values. The benefit is that this will always work and requires no additional consideration for the range or contents of the column.

Conclusion: While this may be a simple little trick you’d be surprised how many report writers and data consumers struggle with exactly this problem. The CASE solution can also be expanded to provide table sort orders based on column ranges instead of simply NULL versus non-NULL.
-Dave Winters

Friday, June 4, 2010

VoltDB, the newest database idea?

Why VoltDB? The new database without a problem to solve….

I have been tracking the development and launch of the VoltDB effort for the past year and listened very closely to Michael Stonebreaker’s presentation on “Urban SQL Myth” on June 3rd. This event was scheduled to be a presentation to debunk some of the ideas that the growing NoSQL community is touting as to why NOT to use SQL DBMS platforms. Here are my take-aways on VoltDB and Stonebreaker’s latest DB effort.

On the whole- the presentation was very effective in putting a lot of the FUD from NoSQL pundits to rest. Yes, if your application is purely a K-V(key value) type of data application, then NoSQL products like Cassandra may be exactly what you need. But how many applications are purely and narrowly confined to K-V data models? There are folks that have passionate arguments that RDBMS platforms are at the core K-V oriented … but that is a mute point if the solutions require home-brew wrappers in the application or data tier of the application to support a K-V only engine. A point that Stonebreaker correctly pointed out.

As the presentation unfolded it became very clear that the VoltDB solution and counter to NoSQL is that it can perform Insert/Update/Delete operations at speeds that approach Warp 9! Michael gave a presentation that dissected the overhead and load that consume time in traditional RDBMS offerings. His statement, and I tend to agree, is that only 12% of the work done in systems like Oracle, MS SQL, or MySQL are really “useful” work! The rest is dedicated to log processing for recovery, lock management, buffer latching, and thread or process marshalling. He described these as the “four deadly horsemen” of performance. In his opinion, the only way to get a truly high-speed SQL database is to eliminate ALL of these from the database.

How does VoltDB do this? What is the secret sauce? Well very simply … it is a single threaded engine that processes SQL in a time stamped sequential order. OK given that condition eliminating all the spin locks, buffer management, and housekeeping overhead makes perfect sense. Of course it will process SQL inserts, updates, and deletes at superfast speed!

So what????

Really how many applications are narrowly focused on this problem? Not many! If you look at past RDBMS platforms that are in-memory solutions you’ll find very few that are still independent products used by wide audience of developers and applications. Remember TimesTen or Solid? The first was purchased by Oracle and the latter by IBM. Both have been recast in the roles of auxiliary cache like appliances to help offload the main database. The only remaining independent memory optimized platform is AntsDB/GeneroDB from 4Js.com. Sybase licensed its non-locking technology for use in its platform offerings.

What important point did Dr. Stonebreaker miss?

Query load!! If you analyze typical web or human facing applications they tend to be query (Select) intensive with a small percentage of updates to the database. It is the “window shopping” or browse versus buy scenario. How many queries are required to support a shopper trying to book an airline ticket, hotel room, or rental car? Literally the ratio of selects to updates approaches 100-to-1. So just to be safe let’s say the ratio is 10-to-1 for query versus updates. Well if you accept this premise, and most web app developers will, you’ll see the real problem in trying to speed up and scale up is handling time consuming Select loads. VoltDB makes no claims for performance advantage in this area.

Couple the browse-buy load with the fact that a high percentage of this load is repetitive over mostly static data sets … you quickly understand why many large sites implement query caching architectures based on custom application and data layer software. Look at Facebook, Twitter, Ebay, Google, and Sabre(Travelocity).

So I conclude that VoltDB is targeted a very very narrow segment of the market. A market that already has entries that went down the “memory” database path with mixed success. TimesTen, Solid, and AntsDB to name a few.

Stonebreaker kept saying that legacy RDBMS platforms should be sent to “the land of tired old databases” and VoltDB is something new. Well, I suggest that we ought to send single threaded memory databases to “the land of tired and already tried ideas.”

Dave Winters

Thursday, April 8, 2010

TRUE and FALSE values in SQL Server, BIT datatypes

I had an interesting discussion with my office mate today about return values and the what is TRUE and FALSE. He was wondering what value to put as 'success' on a RETURN statement inside of a stored procedure.

My recommendation to him was to follow the @@ERROR convention of 0 (zero) being the value for success. He rightly asked if 0 did also mean 'false' by convention. That got me into a discussion of the BIT data type in SQL Server and the use of 0 or 1 to indicate true or false.

Well it turns out that SQL Server recoginizes two strings; 'TRUE' and 'FALSE' as valid values that can be assigned and used for comparison of BIT variables. Any other strings will cause a conversion error.

To illustrate the point see the code at the bottom of this entry. Why would you ever use the strings 'TRUE' and 'FALSE' versus 1 or 0?

The answer CLARITY!

Regards, Dave

--
--
declare @theBit bit
set @theBit = 'TRUE'
if @theBit = 'TRUE' print 'It is true'
if @theBit = 'FALSE' print 'It is false'
--
print '--Next case--'
set @theBit = 'FALSE'
if @theBit = 'TRUE' print 'It is true'
if @theBit = 'FALSE' print 'It is false'
--
--This will cause a conversion error
--
set @theBit = 'BLAH'
--