Tuesday, December 22, 2009

NULL in SQL databases!!!!!!! What a pain!

I am very passionate about the whole issue surrounding NULL in databases. It is a simple concept that is nonetheless exceedingly difficult to handle well.

Let me pass along some ideas and one scenario that I use as guides and that have helped keep me out of trouble.

1) NULL is NEVER a value! It is the absence of a value! Or, as E. F. Codd originally stated: “missing information and inapplicable information”
See: http://en.wikipedia.org/wiki/Null_(SQL)
2) NULL should NEVER EVER be used as a flag in decision logic. Testing for NULL is fine as long as the test is narrowly about whether the field is in the NULL state, and nothing more is read into the answer.
3) Setting anything to NULL is not the same as “deleting information.”
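
A minimal sketch of idea 1, assuming Python's built-in sqlite3 module as a stand-in for whatever SQL engine you actually use (other engines may display the result differently). Comparing NULL with anything, even another NULL, gives back NULL (unknown) rather than true or false, which is why only IS NULL works as a test of NULL state:

```python
import sqlite3

# An in-memory SQLite database, used here only as a convenient SQL engine.
conn = sqlite3.connect(":memory:")

# Under three-valued logic, NULL = NULL is unknown, not true.
# SQLite hands unknown back as NULL, which Python shows as None.
eq_result = conn.execute("SELECT NULL = NULL").fetchone()[0]
ne_result = conn.execute("SELECT NULL <> NULL").fetchone()[0]

# IS NULL is the narrow state test; SQLite returns 1 for true.
is_null_result = conn.execute("SELECT NULL IS NULL").fetchone()[0]

print(eq_result, ne_result, is_null_result)  # None None 1
```

Since NULL does not even equal itself, it cannot behave like an ordinary value in a comparison.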

Here is a scenario I use to illustrate a simple practical application of these ideas. Assume we have a simple order entry SQL application that has a shipping address form coupled to an address table. Something like:
CREATE TABLE ShipAddr (
    CustName varchar(40) NULL,
    Street1  varchar(30) NULL,
    City     varchar(30) NULL,
    State    varchar(20) NULL,
    ZipCode  varchar(9)  NULL
);

Here is the scenario:

1- Customer orders a product and customer service calls up the shipping address form to fill out the information.
2- The customer supplies all of the information but doesn’t have their Zip code. This field is not updated and the form is saved. The ZipCode field has the original “state” NULL.
3- The next day the customer calls back and supplies the value “35123” as their Zip code. The form is updated and saved. The ZipCode field now has the value “35123”
4- Order fulfillment department runs a check and determines that “35123” is an invalid Zip. The ship address form is updated by deleting all of the values in this field. The form is saved.
IMPORTANT NOTE: Now the ZipCode field has the value “”, or in other words a zero-length string!!! Not NULL!
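
The four steps above can be sketched as follows, again assuming Python's sqlite3 module as the SQL engine. The customer name, street, city, and state are made-up placeholders, and zip_state is a hypothetical helper written just for this illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ShipAddr ("
    " CustName varchar(40) NULL, Street1 varchar(30) NULL, City varchar(30) NULL,"
    " State varchar(20) NULL, ZipCode varchar(9) NULL)"
)

def zip_state(conn):
    """Hypothetical helper: return ZipCode of the single example row."""
    return conn.execute("SELECT ZipCode FROM ShipAddr").fetchone()[0]

# Step 2: the form is saved without a Zip code (all values are placeholders).
conn.execute(
    "INSERT INTO ShipAddr (CustName, Street1, City, State) VALUES (?, ?, ?, ?)",
    ("Pat Example", "1 Example St", "Exampleville", "AL"),
)
after_step2 = zip_state(conn)  # NULL state: the field was never touched

# Step 3: the customer calls back with a Zip code.
conn.execute("UPDATE ShipAddr SET ZipCode = ?", ("35123",))
after_step3 = zip_state(conn)

# Step 4: fulfillment clears the invalid Zip, leaving a zero-length string.
conn.execute("UPDATE ShipAddr SET ZipCode = ''")
after_step4 = zip_state(conn)

print(repr(after_step2), repr(after_step3), repr(after_step4))  # None '35123' ''
```

The field passes through three distinguishable states: NULL (never supplied), a value, and a zero-length string (supplied, then cleared).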

Why not just set it back to NULL? The most important reason is that doing so completely loses the “transitional state” of the field in this record! In terms of classic 3VL and tuple calculus set operations, it removes the ability to segregate subsets of the data based on whether the value in the field has ever been “touched.” If the field is set back to NULL as a “value,” the records that have never had an address value can no longer be told apart from the ones that had one and lost it. It also leaves a broader question open: how do we determine whether a field, in the general sense, may actually have “” as a correct value?
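
Here is a rough sketch of that loss of segregation, using Python's sqlite3 module and a trimmed-down two-column version of ShipAddr. The three rows and the count helper are hypothetical, invented only to illustrate the point:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down table: only the columns this illustration needs.
conn.execute("CREATE TABLE ShipAddr (CustName varchar(40) NULL, ZipCode varchar(9) NULL)")
conn.executemany(
    "INSERT INTO ShipAddr (CustName, ZipCode) VALUES (?, ?)",
    [
        ("never_supplied", None),       # Zip never touched
        ("cleared_after_check", ""),    # Zip entered, then cleared
        ("has_zip", "12345"),           # placeholder Zip
    ],
)

def count(where):
    """Hypothetical helper: count rows matching a fixed WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM ShipAddr WHERE {where}").fetchone()[0]

# With "" kept distinct from NULL, the two subsets can be told apart.
never_touched_before = count("ZipCode IS NULL")
cleared_before = count("ZipCode = ''")

# Now do what the post warns against: reset cleared fields to NULL.
conn.execute("UPDATE ShipAddr SET ZipCode = NULL WHERE ZipCode = ''")
never_touched_after = count("ZipCode IS NULL")
cleared_after = count("ZipCode = ''")

print(never_touched_before, cleared_before)  # 1 1
print(never_touched_after, cleared_after)    # 2 0
```

After the reset, the "never touched" and "cleared" rows collapse into one indistinguishable subset. Note that the distinction also depends on the engine: Oracle, for example, treats a zero-length varchar as NULL, so the two states cannot be kept apart there in the first place.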

OK, at this point you are probably asking yourself … who cares?
Having seen this problem repeated over and over in many SQL implementations, I would suggest that this issue is important for several reasons:
1) Maintainability: confusion around NULLs is costly.
2) Accuracy: loss of state, or of the ability to validate, because of NULL “values” can be expensive.
3) Interoperability: NULLs used as values in one table or application may not have the same meaning in another application.

Regards,

Dave W
