May 01

DBMS2 reports on the launch of mapreduce.org by Aster Data. The site contains lots of good MapReduce information and links.

Apr 19

Here’s a post from Joydeep Sen Sarma about the combination of HBase and MapReduce.

Apr 18

A lot is happening these days with open source solutions to data problems, and PostgreSQL and Hadoop are both at the center of those solutions. Each offers unique capabilities. Tim Sell from Last.fm has put together some information about how the two can be used together. Check out the slides and video.

Oct 12

Monash Research lists how some Cloudera customers are using Hadoop.

Oct 05

As the open source software movement continues to strengthen, questions abound about where the opportunities to create commercially viable solutions lie. Red Hat did it with Linux. Can Cloudera do it with Hadoop? Read this GigaOm article.

Jul 09

The guys from Cloudera put together the following executive overview of what Hadoop can do for big data.

Hadoop and Big Data 1: Challenging Old Assumptions from Cloudera on Vimeo.

May 20

MapReduce has been facing some criticism based on some recent performance tests. Don’t worry. The takeaway is basically that MapReduce should not compete with a DBMS in areas where a DBMS is already good. The article suggests MapReduce should be used to solve the following problem types:

  • Text tokenization, indexing, and search
  • Creation of other kinds of data structures (e.g., graphs)
  • Data mining and machine learning
  • Data transformation
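
To make the first item in the list concrete, here is a minimal sketch of tokenization and indexing written in the MapReduce style. It is plain Python, not the Hadoop API, and it runs in a single process. The function names (map_phase, shuffle, reduce_phase, build_index) and the sample documents are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Tokenize one document and emit (token, doc_id) pairs."""
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        yield token, doc_id


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(token, doc_ids):
    """Collapse the grouped document ids into a sorted, de-duplicated posting list."""
    return token, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce in sequence to build an inverted index."""
    pairs = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(token, ids) for token, ids in shuffle(pairs).items())


# Hypothetical sample input: document id -> document text.
docs = {
    "d1": "Hadoop runs MapReduce jobs",
    "d2": "MapReduce builds a search index",
}
index = build_index(docs)
```

Looking up a token in index returns the ids of the documents that contain it, which is the basic structure behind a search feature. In a real cluster the map and reduce functions would run in parallel across many machines, and the framework would handle the grouping step.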
May 20

Cloudera put together this list of 10 MapReduce tips.

You might also want to check out their list of 5 common Hadoop questions.

Apr 30

Who has the biggest database? Due to the increasing amount of behavioral information tracked during a web browsing session, some internet properties are starting to rack up some pretty hefty databases.

  • eBay has a 6.5 petabyte Greenplum warehouse and a 2.5 petabyte Teradata warehouse. This system ingests hundreds of billions of new rows of data every day.
  • Facebook has a 2.5 petabyte Hadoop system.
  • Yahoo has more than 1 petabyte running on their homemade system.

Apr 25

Cloudera’s online Hadoop training videos now include two sessions for Apache Pig thanks to some help from Alan Gates at Yahoo.

Introduction to Pig
Pig Tutorial
