Check out Mallet, a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
Here’s the YouTube playlist of the Strata Conference keynote speeches, 45 videos in all.
And here’s the link to many of the slide decks.
And here’s a good recap of the StrataConf event.
Kaggle has hosted several data mining competitions similar to the Netflix Prize, and it recently announced a new, big one: the Heritage Health Prize, with the prize set at $3M. The goal of the contest is to predict when a person will need to go to the hospital before they actually make a visit. Here’s some more info from O’Reilly Radar. And here is Anthony Goldbloom of Kaggle announcing the contest at the Strata Conference…
Darren Vengroff, chief scientist at RichRelevance, explains how he is working to make recommendation systems smarter. Check out the Fast Company article.
Check out the 2010 INFORMS Data Mining Contest. Participants are challenged to predict stock prices at five-minute intervals. Visit the site to download the training data set. The submission deadline is October 10th, 2010.
A question posed recently on Quora, “How do I become a data scientist?”, has received tons of interesting and helpful feedback, including some recommended steps. Also check out coverage of the topic over at O’Reilly Radar.
SETI (Search for Extraterrestrial Intelligence) is fairly well known for its use of distributed computing. For years, it has let home computer users donate time and computing resources to analyze radio signals in the hope of identifying signs of extraterrestrial life. This article from O’Reilly Radar details some of the changes SETI has made.

