Tuesday, August 27, 2013
Simple way to get 30% better HDFS read performance
I've been working with a high-speed cluster this week, running some benchmarks.
I had assumed that the vendor engineering team was sharp enough to have set up the SLES 11 system with best practices.
Ran my first tests and noticed that the individual datanodes were performing some write activity during a purely read-only operation. This was puzzling, since I was trying to get maximum read numbers. I double-checked my tests to verify I wasn't doing some unintentional write op.
After spending about 30 minutes looking it over, it dawned on me to check the file system. Issued a 'mount' command and didn't see the 'noatime' flag set.
Looked at the /etc/fstab and sure enough it was missing.
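If you'd rather not eyeball the 'mount' output on every node, here is a rough sketch of the same check in Python. It assumes the mount table is in the /proc/mounts layout (device, mount point, type, options). The function name and the default list of file system types are my own picks for illustration, so adjust them to whatever your datanode disks use.

```python
def mounts_missing_noatime(mount_table, fs_types=("ext3", "ext4", "xfs")):
    """Return the mount points whose option list does not include noatime.

    mount_table is text in /proc/mounts format:
        device mount_point fs_type options dump pass
    fs_types is an assumed list of on-disk file systems worth checking.
    """
    missing = []
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type in fs_types and "noatime" not in options.split(","):
            missing.append(mount_point)
    return missing
```

On a Linux box you could feed it the contents of /proc/mounts, for example mounts_missing_noatime(open("/proc/mounts").read()), and anything it returns is a candidate for the fix.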
So I added the noatime flag to each mount and issued a 'mount -o remount'.
Reran the tests and pow! About 30% better performance.
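I made the fstab change by hand, but with a lot of nodes it is tempting to script it. Below is a minimal sketch of how that could look. It is not what I ran. The "/data" prefix for the HDFS data mounts is purely an assumption for the example, as is the function name. It only rewrites lines whose mount point starts with that prefix and leaves comments and everything else alone.

```python
def add_noatime(fstab_text, mount_prefix="/data"):
    """Return fstab text with noatime appended to matching entries.

    mount_prefix is a hypothetical prefix for the HDFS data mounts.
    Lines that are comments, malformed, or already have noatime are
    returned unchanged, so running this twice is harmless.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 4 and not line.lstrip().startswith("#")
        if is_entry and fields[1].startswith(mount_prefix):
            options = fields[3].split(",")
            if "noatime" not in options:
                options.append("noatime")
                fields[3] = ",".join(options)
                line = "\t".join(fields)  # rewritten entries become tab separated
        out.append(line)
    return "\n".join(out) + "\n"
```

Keep a backup of /etc/fstab before writing anything back, and remember that the running mounts don't change until the 'mount -o remount' step.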
What is noatime?
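This flag tells Linux not to update the access time (atime) on the files that make up the HDFS blocks. Without it, reading a file can trigger a small metadata write to record when it was last accessed, which is where the write activity in my read-only test came from. There is no point in maintaining this information underneath Hadoop.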
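If you want to see the effect for yourself, a quick probe like the one below can tell you whether a read bumps the access time in a given directory. Treat it as a sketch: the function name is made up, and the trick of pushing the timestamps back to the epoch first is my assumption for making the read show up on a default relatime mount as well.

```python
import os
import tempfile


def atime_changes_on_read(directory=None):
    """Return True if reading a file in `directory` advances its atime.

    Creates a temporary file, sets its atime and mtime to the epoch,
    reads it back, and compares the access time before and after.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"x" * 4096)
        os.utime(path, (0, 0))  # atime and mtime both set to the epoch
        before = os.stat(path).st_atime
        with open(path, "rb") as f:
            f.read()
        after = os.stat(path).st_atime
        return after > before
    finally:
        os.remove(path)
```

Pointing it at a directory on one of the data disks before and after the remount should show the difference: I would expect True without noatime and False with it.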
Dave W