Friday, June 4, 2010

VoltDB, the newest database idea?

Why VoltDB? The new database without a problem to solve….

I have been tracking the development and launch of the VoltDB effort for the past year and listened very closely to Michael Stonebreaker’s presentation on “Urban SQL Myth” on June 3rd. This event was scheduled to be a presentation to debunk some of the ideas that the growing NoSQL community is touting as to why NOT to use SQL DBMS platforms. Here are my take-aways on VoltDB and Stonebreaker’s latest DB effort.

On the whole- the presentation was very effective in putting a lot of the FUD from NoSQL pundits to rest. Yes, if your application is purely a K-V(key value) type of data application, then NoSQL products like Cassandra may be exactly what you need. But how many applications are purely and narrowly confined to K-V data models? There are folks that have passionate arguments that RDBMS platforms are at the core K-V oriented … but that is a mute point if the solutions require home-brew wrappers in the application or data tier of the application to support a K-V only engine. A point that Stonebreaker correctly pointed out.

As the presentation unfolded it became very clear that the VoltDB solution and counter to NoSQL is that it can perform Insert/Update/Delete operations at speeds that approach Warp 9! Michael gave a presentation that dissected the overhead and load that consume time in traditional RDBMS offerings. His statement, and I tend to agree, is that only 12% of the work done in systems like Oracle, MS SQL, or MySQL are really “useful” work! The rest is dedicated to log processing for recovery, lock management, buffer latching, and thread or process marshalling. He described these as the “four deadly horsemen” of performance. In his opinion, the only way to get a truly high-speed SQL database is to eliminate ALL of these from the database.

How does VoltDB do this? What is the secret sauce? Well very simply … it is a single threaded engine that processes SQL in a time stamped sequential order. OK given that condition eliminating all the spin locks, buffer management, and housekeeping overhead makes perfect sense. Of course it will process SQL inserts, updates, and deletes at superfast speed!

So what????

Really how many applications are narrowly focused on this problem? Not many! If you look at past RDBMS platforms that are in-memory solutions you’ll find very few that are still independent products used by wide audience of developers and applications. Remember TimesTen or Solid? The first was purchased by Oracle and the latter by IBM. Both have been recast in the roles of auxiliary cache like appliances to help offload the main database. The only remaining independent memory optimized platform is AntsDB/GeneroDB from 4Js.com. Sybase licensed its non-locking technology for use in its platform offerings.

What important point did Dr. Stonebreaker miss?

Query load!! If you analyze typical web or human facing applications they tend to be query (Select) intensive with a small percentage of updates to the database. It is the “window shopping” or browse versus buy scenario. How many queries are required to support a shopper trying to book an airline ticket, hotel room, or rental car? Literally the ratio of selects to updates approaches 100-to-1. So just to be safe let’s say the ratio is 10-to-1 for query versus updates. Well if you accept this premise, and most web app developers will, you’ll see the real problem in trying to speed up and scale up is handling time consuming Select loads. VoltDB makes no claims for performance advantage in this area.

Couple the browse-buy load with the fact that a high percentage of this load is repetitive over mostly static data sets … you quickly understand why many large sites implement query caching architectures based on custom application and data layer software. Look at Facebook, Twitter, Ebay, Google, and Sabre(Travelocity).

So I conclude that VoltDB is targeted a very very narrow segment of the market. A market that already has entries that went down the “memory” database path with mixed success. TimesTen, Solid, and AntsDB to name a few.

Stonebreaker kept saying that legacy RDBMS platforms should be sent to “the land of tired old databases” and VoltDB is something new. Well, I suggest that we ought to send single threaded memory databases to “the land of tired and already tried ideas.”

Dave Winters

Thursday, April 8, 2010

TRUE and FALSE values in SQL Server, BIT datatypes

I had an interesting discussion with my office mate today about return values and the what is TRUE and FALSE. He was wondering what value to put as 'success' on a RETURN statement inside of a stored procedure.

My recommendation to him was to follow the @@ERROR convention of 0 (zero) being the value for success. He rightly asked if 0 did also mean 'false' by convention. That got me into a discussion of the BIT data type in SQL Server and the use of 0 or 1 to indicate true or false.

Well it turns out that SQL Server recoginizes two strings; 'TRUE' and 'FALSE' as valid values that can be assigned and used for comparison of BIT variables. Any other strings will cause a conversion error.

To illustrate the point see the code at the bottom of this entry. Why would you ever use the strings 'TRUE' and 'FALSE' versus 1 or 0?

The answer CLARITY!

Regards, Dave

--
--
declare @theBit bit
set @theBit = 'TRUE'
if @theBit = 'TRUE' print 'It is true'
if @theBit = 'FALSE' print 'It is false'
--
print '--Next case--'
set @theBit = 'FALSE'
if @theBit = 'TRUE' print 'It is true'
if @theBit = 'FALSE' print 'It is false'
--
--This will cause a conversion error
--
set @theBit = 'BLAH'
--

Wednesday, February 10, 2010

Sign a SQL Stored Procedure to track changes.

Several times I've run into situations where I need to deploy a SQL stored procedure to clients that maybe complex and critical to correct functioning of the database.

Things go along fine and then I get a call/Email after about 6 months with a bug or performance issue. After a little detective work I think I detect that a critical procedure as been changed either by the client or consultant working on their site. Verifying that the stored procedure is the orginal unaltered version I placed on the server can lead to some long conversations.

So I came up with a solution that allows me to "sign" or finger print any SP after I've installed it. The technique uses SQL 2005 and above ability to hash data with the HashBytes function. The hash contains date information, object definition, object IDs, and a "secret" word.

The secret word allows you to publish the signing procedures without fear that the end-user will fake a signature. At least as long as you don't tell them the secret word.

The resulting binary hash signature is stored in the extended properties of the procedure and cannot be modified using SSMS.

I've found this very helpful with some of my clients that love to tinker with critical procs.

Dave Winters
Terminus
--------------------------------------------------------------------------
--
--
if object_id('USP_Sign_Object') is not NULL
drop procedure USP_Sign_Object
GO
--------------------------------------------------------------
create procedure USP_Sign_Object
@objectName sysname
,@secretWord nvarchar(20)='My Software'
as
--
-- This procedure builds a hash of the contents of a stored procedure for "finger printing"
-- and detecting changes. The hash/signature is stored in the extended properties of the object.
-- Here are some implementatio details:
-- A "secret" word should be supplied to prevent spoofing.
-- The has includes the object ID, create date, modify date, and first 4000 bytes of the DDL
-- To resign the procedure, the "SIGNED" property must be dropped via SSMS
--
-- Dave Winters, January 2010
-- TheSeldonVault.blogspot.com
--
SET NOCOUNT ON
Begin --Top

declare @objid nvarchar(20),
@crdate nvarchar(20),
@modate nvarchar(20),
@signHash sql_variant

select @objid=cast(object_id as nvarchar(20))
,@crdate=cast(create_date as nvarchar(20))
,@modate=cast(modify_date as nvarchar(20)) from sys.objects where [name]=@objectName

if @@rowcount = 0 return -1

SELECT @signHash = cast( HashBytes(
'SHA1',isnull(substring(
@secretWord+@objid+@crdate+@modate
+OBJECT_DEFINITION(OBJECT_ID (@objectName))
,1,4000 ),'EnCrYpTeD' )
) as sql_variant)

EXEC sys.sp_addextendedproperty @name=N'SIGNED'
,@value=@signHash
,@level0type=N'SCHEMA'
,@level0name=N'dbo'
,@level1type=N'PROCEDURE'
,@level1name=@objectName

End --Bottom
GO
--
--exec USP_Sign_Object 'asp_benefit_warnings','MySeCrEtE'
--
------------------------------------------------------------------------
if object_id('USP_Validate_Object') is not NULL
drop procedure USP_Validate_Object
GO
--
create procedure USP_Validate_Object
@objectName sysname
,@secretWord nvarchar(20)='My Software'
as
SET NOCOUNT ON
Begin --Top

declare @objid nvarchar(20),
@crdate nvarchar(20),
@modate nvarchar(20),
@signHash sql_variant,
@currHash sql_variant

select @objid=cast(object_id as nvarchar(20))
,@crdate=cast(create_date as nvarchar(20))
,@modate=cast(modify_date as nvarchar(20)) from sys.objects where [name]=@objectName

if @@rowcount = 0 return -1

SELECT @signHash = cast( HashBytes('SHA1',isnull(substring(
@secretWord+@objid+@crdate+@modate
+OBJECT_DEFINITION(OBJECT_ID (@objectName))
,1,4000 ),'EnCrYpTeD' )
)
as sql_variant)
--
SELECT @currHash = p.value
FROM sys.all_objects AS sp
INNER JOIN sys.extended_properties AS p ON p.major_id=sp.object_id AND p.minor_id=0 AND p.class=1
where sp.name = @objectName--N'asp_benefit_warnings'
and p.name = 'SIGNED'

--select @currHash as CurrHash, @signHash as CompHash

if @currHash <> @signHash
begin
print '!!!Signature Invalid!!!'
return -1
end

Print '---Signature Valid---'
return 0

End --Bottom

--
--
--exec USP_Validate_Object 'asp_benefit_warnings','MySeCrEtE'
--
--

Monday, January 11, 2010

California accepts Open Source

A policy shift occurred January 7th with the California IT czar.

The CIO has issued a new policy directive that may make it easier for open source platform to compete for the scarce contract dollars available for IT projects.

Oracle and Microsoft dominating the California platform purchase pipeline may still be the norm. My hope is that with this new policy there is some opening of opportunity for good open source offerings and consulting.

Here is the link to the official PDF policy letter:
http://bit.ly/6XTifr

Wednesday, January 6, 2010

Simple XML query and looping

Working with a client's SQL 2005 stored procedure this week I came across a very complicated series of T-SQL. The jest of the procedure is to loop through a series of rows inside of an XML variable passed into the SP. The implmentation was built using OPENXML and Select statements with patterns. I spoke with the maintainers of the SP and it became clear that the functionality was overly complex and that using XML XQuery and XPath was a new skill set.

So I devised this simple XML sniplet that will run and demonstrates some of the basics available when dealing with XML data. This should click with most first time XML database folks accustomed to working with T-SQL. They can then explore the rich feature set available.

Dave W
--
-- Simple example of using XML query.
-- Build to shold some basics of "looping" through a set of XML "rows"
-- Dave Winters 6 Jan 2010
--
declare @x xml
set @x='
<grills franchise="1" regionname="West">
<location locationid="Sacramento">
<listing>MenuItem Pizza </listing>
<listing>MenuItem Salad </listing>
<listing>MenuItem Sushi </listing>
</location>
<location locationid="Auburn">
<listing>MenuItem french fries </listing>
<listing>MenuItem hamburgres </listing>
<listing>MenuItem coke </listing>
</location>
</grills>'
--
-- Get a specific location menu
--
SELECT @x.query('
for $listItem in /grills/location
where $listItem/@locationid="Auburn"
return $listItem
') as Result
--
-- XML fragments for the location ID attribute, not proper form.
-- To make it regular you'll need to put inside one root element
--
SELECT @x.query('
for $listItem in /grills/location
return <t>{$listItem/@locationid}</t>
')
--
-- Chunking some locations blobs
--
SELECT T.c.query('(.)[1]') as LocXML
FROM @x.nodes('/grills/location') T(c)
--
-- Build the master menu list
--
SELECT T.c.value('(./text())[1]', 'varchar(256)') as Master_Menu_List
FROM @x.nodes('/grills/location/listing') T(c)
--
-- XML is much more prowerful than this simple example.
-- This was to show how you can go through an XML object in a fashion similar to
-- processing records in SQL tables.
--