Examples of how the enterprise uses Hadoop
12th October 2009
Predictive analytics, data mining, business intelligence and more. Information useful to analysts and data people of all kinds
Monash Research lists how some Cloudera customers are using Hadoop.
As the open source software movement continues to strengthen, questions abound about where the opportunities to create commercially viable solutions lie. Red Hat did it with Linux. Can Cloudera do it with Hadoop? Read this GigaOm article.
The guys from Cloudera put together the following executive overview of what Hadoop can do for big data.
Hadoop and Big Data 1: Challenging Old Assumptions from Cloudera on Vimeo.
MapReduce has drawn criticism following some recent performance tests. Don’t worry: the conclusion is essentially that MapReduce should not try to compete with a DBMS in areas where a DBMS is already strong. The article suggests MapReduce should be used to solve the following problem types:
text tokenization, indexing, and search
Creation of other kinds of data [...]
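To make the first problem type concrete, here is a minimal single-process sketch of tokenization, indexing, and search expressed as map, shuffle, and reduce steps. It does not use Hadoop or any of its APIs; the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`, `search`) and the sample documents are assumptions made up for illustration, and a real job would distribute these steps across a cluster.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def map_phase(doc_id, text):
    """Map step: emit one (token, doc_id) pair per token occurrence."""
    for token in tokenize(text):
        yield token, doc_id


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(token, doc_ids):
    """Reduce step: collapse doc ids into a sorted, de-duplicated posting list."""
    return token, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce in sequence to build an inverted index."""
    pairs = [pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(tok, ids) for tok, ids in shuffle(pairs).items())


def search(index, query):
    """Return ids of documents that contain every token in the query."""
    postings = [set(index.get(tok, [])) for tok in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []


if __name__ == "__main__":
    # Hypothetical sample documents, invented for this sketch.
    docs = {
        "d1": "Hadoop runs MapReduce jobs",
        "d2": "Pig compiles to MapReduce",
        "d3": "Hive queries Hadoop",
    }
    index = build_index(docs)
    print(index["mapreduce"])
    print(search(index, "hadoop mapreduce"))
```

The map step needs no knowledge of other documents and the reduce step needs only the values for one key, which is the property that lets this kind of workload spread across many machines.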
Cloudera put together this list of 10 MapReduce tips.
You might also want to check out their list of 5 common Hadoop questions.
Who has the biggest database? Due to the increasing amount of behavioral information tracked during a web browsing session, some internet properties are starting to rack up some pretty hefty databases.
eBay has a 6.5 petabyte Greenplum warehouse and a 2.5 petabyte Teradata warehouse. This system ingests hundreds of billions of new rows of [...]
Cloudera’s online Hadoop training videos now include two sessions on Apache Pig, thanks to some help from Alan Gates at Yahoo.
Introduction to Pig
Pig Tutorial
Pig is an open source platform for analyzing large data sets that works in conjunction with Hadoop clusters and MapReduce jobs. The Pig team recently announced their 0.20 release featuring a 5X performance gain over the previous version. Check out the details.
Here is a recent Hadoop and Hive presentation by Joydeep Sen Sarma of Facebook, delivered at IIT Delhi.
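For a rough sense of the kind of analysis Pig is aimed at, the sketch below mimics in plain Python what a group-then-count pipeline does (in Pig Latin, a `GROUP ... BY` followed by a `FOREACH ... GENERATE` with `COUNT`). It is not Pig code and does not reflect how Pig executes on a cluster; the record layout and the helper names `group_by` and `count_per_group` are hypothetical.

```python
from collections import defaultdict

# Hypothetical page-view records as (user, url) tuples, invented for illustration.
VIEWS = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/search"),
    ("alice", "/cart"),
    ("bob", "/search"),
]


def group_by(records, key_index):
    """Collect records into lists keyed by the field at key_index."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record)
    return dict(groups)


def count_per_group(groups):
    """Replace each group's record list with its size."""
    return {key: len(members) for key, members in groups.items()}


if __name__ == "__main__":
    views_by_user = group_by(VIEWS, 0)
    print(count_per_group(views_by_user))
```

On a real cluster the grouping would run as the shuffle stage of a MapReduce job and the count as the reduce; here both happen in memory on one machine.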
Amazon has announced the public beta of their hosted Hadoop framework. Using Elastic MapReduce, you can quickly launch as much processing power as needed for your analytics task. Data can be stored on the S3 platform. Sign in to the AWS Management Console to kick things off.