And here is the TechCrunch commentary.
Check out Mallet, a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
Learn about what you can do with the open source language, Processing, to build visualizations.
Here’s a presentation from the QCon conference about the architecture Quantcast uses to process hundreds of terabytes of data daily with Hadoop.
Cassandra is an open source distributed database that’s now an Apache project. It was originally developed at Facebook. Twitter has announced that it will continue to use MySQL to store tweets but will use Cassandra to build a real-time analytics capability. Read the rest in the TechCrunch article.
As the open source software movement continues to strengthen, questions abound about where the opportunities lie to create commercially viable solutions. Red Hat did it with Linux. Can Cloudera do it with Hadoop? Read this GigaOM article.
Check out this TechCrunch article highlighting MySpace’s new recommendation system, Qizmt. It’s built on a MapReduce framework.
Here’s a video of Tom White from the Hadoop Summit talking about running Hadoop in the cloud. Tom is the author of Hadoop: The Definitive Guide.
NoSQL is about open source, distributed, non-relational databases. At a recent meetup in San Francisco, some of the following new technologies were discussed…
Voldemort
Cassandra
Dynomite
HBase
Hypertable
CouchDB
Here’s a Computerworld article with more info.
Axiis is a new open source visualization tool built in Flex.

