Tuesday, August 27, 2013

Simple way to get 30% better HDFS read performance


I was working with a high-speed cluster this week, running some benchmarks.

I had assumed that the vendor engineering team was sharp enough to have set up the SLES 11 system with best practices.

Ran my first tests and noticed that the individual datanodes were performing some write activity during a purely read-only operation. This was puzzling since I was trying to get maximum read numbers. I double-checked my tests to verify I wasn't doing some unintentional write op.

After spending about 30 minutes looking it over it dawned on me to check the file system.  Issued a 'mount' command and didn't see the 'noatime' flag set.
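
For anyone who wants to script that check across many datanodes, here is a minimal Python sketch. It assumes the text it parses is in the /proc/mounts layout (device, mount point, filesystem type, options). The function name, the default list of filesystem types, and the sample mount points are made up for illustration and are not from the cluster I was testing.

```python
def mounts_missing_noatime(mounts_text, fs_types=("ext3", "ext4", "xfs")):
    """Return mount points of the given filesystem types that lack 'noatime'.

    mounts_text is expected in /proc/mounts layout:
    device  mount_point  fs_type  options  dump  pass
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        options = fields[3].split(",")
        if fs_type in fs_types and "noatime" not in options:
            missing.append(mount_point)
    return missing


# Hypothetical sample input; on a real node you would read /proc/mounts instead.
sample = (
    "/dev/sdb1 /data/1 ext4 rw,relatime 0 0\n"
    "/dev/sdc1 /data/2 ext4 rw,noatime 0 0\n"
    "proc /proc proc rw 0 0\n"
)
print(mounts_missing_noatime(sample))
```

On a live system, pass in the contents of /proc/mounts and compare the result against the directories your datanodes use for block storage.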

Looked at /etc/fstab and sure enough the flag was missing there too.
So I added the noatime flag to each data mount and issued a 'mount -o remount' for each one.
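
If there are a lot of disks per node, the fstab edit can be scripted. The sketch below is one way to do it in Python, not what I actually ran: it takes a single fstab line and appends noatime to the options field if it is not already there. The function name and the sample line are hypothetical, and the sketch rejoins fields with single spaces, so it does not preserve the original column alignment. Only apply it to the lines for your HDFS data disks, and keep a backup of /etc/fstab.

```python
def add_noatime(fstab_line):
    """Return fstab_line with 'noatime' appended to its mount options.

    Comments, blank lines, and lines with fewer than four fields are
    returned unchanged.
    """
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return fstab_line
    fields = stripped.split()
    if len(fields) < 4:
        return fstab_line
    options = fields[3].split(",")
    if "noatime" not in options:
        options.append("noatime")
    fields[3] = ",".join(options)
    return " ".join(fields)


# Hypothetical fstab entry for one data disk.
print(add_noatime("/dev/sdb1 /data/1 ext4 defaults 0 0"))
```

After the file is updated, each affected mount point still needs a remount (or a reboot) for the new option to take effect.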

Reran the tests and pow! About 30% better performance.

What is noatime?

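This flag tells Linux not to update the access time (atime) on the files that make up the HDFS blocks. Without it, reading a file can trigger a small metadata write to record the access, which accounts for the write activity I saw during a read-only test. There is no point in maintaining this information underneath Hadoop.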

Dave W