Check out this Computerworld article. It has stories about how the following companies and organizations use data:
- Library of Congress
- Amazon.com
- Mazda Motor Corp.
- The Nielsen Company
IBM Research has developed a hardware and software solution that joins 200,000 hard disks into a single 120-petabyte storage cluster. Here’s an article from ExtremeTech, and here’s one from O’Reilly Radar with more details. As of last year, Facebook had the world’s largest Hadoop cluster at 21 petabytes. The IBM cluster is being built for an unnamed customer, likely a government agency.
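To put those numbers in perspective, here is some back-of-the-envelope arithmetic. It assumes decimal units and spreads the 120 PB evenly across all 200,000 disks, so it gives an average raw capacity per disk; it says nothing about the actual drive models IBM used or about space lost to redundancy and file-system overhead.

```python
# Back-of-the-envelope arithmetic on the figures quoted above.
# Assumption: decimal (SI) units, 1 PB = 10**15 bytes and 1 GB = 10**9 bytes,
# with the 120 PB treated as raw capacity spread evenly over every disk.
PB = 10**15
GB = 10**9

ibm_capacity_bytes = 120 * PB
ibm_disk_count = 200_000
facebook_hadoop_bytes = 21 * PB

# Average raw capacity per disk, in gigabytes.
per_disk_gb = ibm_capacity_bytes / ibm_disk_count / GB

# How many times larger the IBM cluster is than Facebook's Hadoop cluster.
ratio_to_facebook = ibm_capacity_bytes / facebook_hadoop_bytes

print(f"Average capacity per disk: {per_disk_gb:.0f} GB")
print(f"IBM cluster vs. Facebook's Hadoop cluster: {ratio_to_facebook:.1f}x")
```

That works out to roughly 600 GB per disk, and a cluster about 5.7 times the size of Facebook’s.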
Check out this VentureBeat article about Facebook open-sourcing the hardware designs from its data center through the Open Compute Project.
Strata describes the process Facebook recently went through to move 30 petabytes of Hadoop data from one data center to another.
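Why is moving 30 petabytes such a big deal? A rough estimate of raw transfer time helps. The link speeds below are hypothetical values chosen for illustration, not figures from Facebook, and the sketch ignores protocol overhead and the fact that a live cluster keeps changing while it is being copied.

```python
# Rough estimate of how long copying 30 PB would take over one sustained link.
# Assumption: the link speeds are hypothetical illustration values and the
# link runs at full speed the whole time (no overhead, no retries).
PB = 10**15
SECONDS_PER_DAY = 86_400


def transfer_days(data_bytes: int, link_gbps: float) -> float:
    """Days needed to move data_bytes at a sustained link_gbps (gigabits/s)."""
    bits = data_bytes * 8
    seconds = bits / (link_gbps * 10**9)
    return seconds / SECONDS_PER_DAY


for gbps in (1, 10, 100):
    print(f"{gbps:>3} Gbit/s: {transfer_days(30 * PB, gbps):,.1f} days")
```

Even at a sustained 10 Gbit/s, a single straight copy would take roughly 278 days, which shows why a migration of this size calls for careful planning.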
If you’re interested in using Hadoop within your enterprise, getting started can be quite an endeavor: figuring out which software components you need, how to configure them, and what hardware to run them on. Lots of people are running different configurations, and while the community shares a lot of information, there aren’t many good recaps of the hardware in use. Monash Research has a good writeup that also compares how Hadoop hardware has changed over the past couple of years.
Scientists have estimated that storing the sum of all the world’s knowledge would take 295 exabytes. Check out the BBC article.
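For a sense of scale, this quick calculation compares the 295-exabyte estimate with the 120-petabyte IBM cluster mentioned above. It assumes decimal units and counts only raw capacity, with no room set aside for replication.

```python
# Scale comparison using two figures quoted in this post.
# Assumption: decimal units, 1 EB = 10**18 bytes and 1 PB = 10**15 bytes.
EB = 10**18
PB = 10**15

world_knowledge_bytes = 295 * EB
ibm_cluster_bytes = 120 * PB  # the IBM cluster described earlier

in_petabytes = world_knowledge_bytes // PB

# Integer ceiling division: a partly filled cluster still counts as a whole one.
clusters_needed = -(-world_knowledge_bytes // ibm_cluster_bytes)

print(f"295 EB = {in_petabytes:,} PB")
print(f"120 PB clusters needed: {clusters_needed:,}")
```

That is 295,000 petabytes, or about 2,459 clusters the size of IBM’s new one.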
Since derived data is by definition based on lower-level raw data, it can be derived again as needed. DBMS2 takes us through some thinking about how to handle derived data better.
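One simple way to act on that idea is to treat derived data as a disposable cache over the raw data: keep it around for speed, but feel free to throw it away and rebuild it. The sketch below is our own illustration, not DBMS2’s proposal; the sales records and the `DerivedCache` class are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records: (region, sale_amount). Only this layer is
# treated as the source of truth.
raw_sales = [("east", 120.0), ("west", 80.0), ("east", 30.0)]


def derive_totals(records):
    """Derived data: per-region sales totals computed from the raw records."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)


class DerivedCache:
    """Holds derived data for fast reads, but can drop and rebuild it."""

    def __init__(self, source, derive):
        self._source = source
        self._derive = derive
        self._value = None

    def get(self):
        # Recompute lazily, only when the derived value is missing.
        if self._value is None:
            self._value = self._derive(self._source)
        return self._value

    def invalidate(self):
        # Safe to discard: the raw data is still there to rebuild from.
        self._value = None


cache = DerivedCache(raw_sales, derive_totals)
print(cache.get())  # {'east': 150.0, 'west': 80.0}

raw_sales.append(("west", 20.0))
cache.invalidate()
print(cache.get())  # {'east': 150.0, 'west': 100.0}
```

The design choice here is that nothing downstream depends on the derived totals being stored permanently, so storing, expiring, or recomputing them becomes a pure performance trade-off.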
Cloudera has formed an integration alliance with Greenplum. Cloudera will integrate its Hadoop distribution with Greenplum’s Chorus product. Read more at ZDNet.
IBM has just announced that it intends to purchase Netezza. Check out the press release.
And check out DBMS2 and TechCrunch for some commentary.