The National Security Agency (NSA) has just made a new submission to the Apache Software Foundation. It’s called Accumulo, and it is a key/value data store based on the BigTable paper. It runs on top of Hadoop, ZooKeeper, and Thrift.
IBM Research has developed a hardware and software solution to join 200,000 hard disks together into a single 120 petabyte storage cluster. Here’s an article from ExtremeTech and here’s one from O’Reilly Radar with more details. As of last year, Facebook had the world’s largest Hadoop cluster at 21 petabytes. This IBM cluster is for a customer, likely a government agency.
MapR announced a $20 million second round of funding today. Their aim is to bring Hadoop to the enterprise. MapR will use the new funds to scale their operations. Here’s the VentureBeat article.
Strata describes the process Facebook recently went through to move 30 petabytes of Hadoop data from one data center to another.
If you’re interested in using Hadoop as a tool within your enterprise, getting started can be quite an endeavor: you have to figure out which software components you need, how to configure them, and what hardware they should run on. Lots of people are running different configurations, and while the community does share a lot of information, there aren’t many good recaps of the hardware being used. Monash Research has a good writeup that also compares how Hadoop hardware has changed over the past couple of years.
Scobleizer brings us this interview with Jack Levin, who talks about using NVIDIA GPU cards in their Hadoop environment and getting a 30x improvement in search performance.
Here’s a post from the Netflix Tech blog talking about their usage of NoSQL technologies including Amazon SimpleDB, Hadoop, HBase, and Cassandra. Of special interest was the mention of DataStax, a company offering commercial support for Cassandra.
Here’s a presentation from the QCon conference about the architecture used by Quantcast to process hundreds of terabytes of data daily using Hadoop.

