Friday, June 4, 2010

VoltDB, the newest database idea?

Why VoltDB? The new database without a problem to solve….

I have been tracking the development and launch of the VoltDB effort for the past year and listened very closely to Michael Stonebreaker’s presentation on “Urban SQL Myth” on June 3rd. This event was scheduled to be a presentation to debunk some of the ideas that the growing NoSQL community is touting as to why NOT to use SQL DBMS platforms. Here are my take-aways on VoltDB and Stonebreaker’s latest DB effort.

On the whole- the presentation was very effective in putting a lot of the FUD from NoSQL pundits to rest. Yes, if your application is purely a K-V(key value) type of data application, then NoSQL products like Cassandra may be exactly what you need. But how many applications are purely and narrowly confined to K-V data models? There are folks that have passionate arguments that RDBMS platforms are at the core K-V oriented … but that is a mute point if the solutions require home-brew wrappers in the application or data tier of the application to support a K-V only engine. A point that Stonebreaker correctly pointed out.

As the presentation unfolded it became very clear that the VoltDB solution and counter to NoSQL is that it can perform Insert/Update/Delete operations at speeds that approach Warp 9! Michael gave a presentation that dissected the overhead and load that consume time in traditional RDBMS offerings. His statement, and I tend to agree, is that only 12% of the work done in systems like Oracle, MS SQL, or MySQL are really “useful” work! The rest is dedicated to log processing for recovery, lock management, buffer latching, and thread or process marshalling. He described these as the “four deadly horsemen” of performance. In his opinion, the only way to get a truly high-speed SQL database is to eliminate ALL of these from the database.

How does VoltDB do this? What is the secret sauce? Well very simply … it is a single threaded engine that processes SQL in a time stamped sequential order. OK given that condition eliminating all the spin locks, buffer management, and housekeeping overhead makes perfect sense. Of course it will process SQL inserts, updates, and deletes at superfast speed!

So what????

Really how many applications are narrowly focused on this problem? Not many! If you look at past RDBMS platforms that are in-memory solutions you’ll find very few that are still independent products used by wide audience of developers and applications. Remember TimesTen or Solid? The first was purchased by Oracle and the latter by IBM. Both have been recast in the roles of auxiliary cache like appliances to help offload the main database. The only remaining independent memory optimized platform is AntsDB/GeneroDB from Sybase licensed its non-locking technology for use in its platform offerings.

What important point did Dr. Stonebreaker miss?

Query load!! If you analyze typical web or human facing applications they tend to be query (Select) intensive with a small percentage of updates to the database. It is the “window shopping” or browse versus buy scenario. How many queries are required to support a shopper trying to book an airline ticket, hotel room, or rental car? Literally the ratio of selects to updates approaches 100-to-1. So just to be safe let’s say the ratio is 10-to-1 for query versus updates. Well if you accept this premise, and most web app developers will, you’ll see the real problem in trying to speed up and scale up is handling time consuming Select loads. VoltDB makes no claims for performance advantage in this area.

Couple the browse-buy load with the fact that a high percentage of this load is repetitive over mostly static data sets … you quickly understand why many large sites implement query caching architectures based on custom application and data layer software. Look at Facebook, Twitter, Ebay, Google, and Sabre(Travelocity).

So I conclude that VoltDB is targeted a very very narrow segment of the market. A market that already has entries that went down the “memory” database path with mixed success. TimesTen, Solid, and AntsDB to name a few.

Stonebreaker kept saying that legacy RDBMS platforms should be sent to “the land of tired old databases” and VoltDB is something new. Well, I suggest that we ought to send single threaded memory databases to “the land of tired and already tried ideas.”

Dave Winters


Steven Messino said...

Any comments on their implementation on stored procedures before Dynamic SQL?

JohnsTestBlog said...

Hi Dave. Thanks for the post.

You're right that the sites you mentioned do use extensive caching, and many have high read to write ratios. As you say, it's much easier to scale reads than writes. But even if you're only writing 10% of the time, that means a perfect cache can only offer a 10X speedup over whatever system is persisting your writes. If that system is a legacy RDBMS, you'll quickly find the writes become your bottleneck. In fact, most or all of the companies you've specifically mentioned do have serious pain scaling writes. I know because I've talked to many of them about it.

With respect to pure read performance, VoltDB might not be as fast as a 100% hit-rate caching in-memory system, but VoltDB commonly saturates gigabit ethernet (per node) in read-heavy workloads. There's also a huge network advantage to being able to mix reads and writes in a single stored procedure. Many network round trips to a caching layer can often be eliminated this way.

One of our beta users has written a blog post about some of the architectural advantages of building your application with VoltDB:

Finally, a lot of people also lump us together with in-memory legacy databases like TimesTen and Solid. Perhaps in-memory is what jumps out at people first, but we put a lot more emphasis on horizontal scale-out and fault-tolerance as the key characteristics of VoltDB that separate us from legacy RDBMSs. As Mike pointed out in his webinar, we haven't simply changed the storage layer.

-John Hugg
VoltDB Engineering

Post a Comment