Check out this Boing Boing post to see how a social graph analysis from OrgNet shows that slumlords work together to avoid maintenance on the buildings they buy and sell.
If you’re into machine learning and data mining, then Waffles might be just what you need. Waffles is a collection of command-line tools for machine learning and data mining.
We’ve written about Kaggle before, but here’s a new post from the New York Times about the site and some new funding they’ve received.
Most discussions of data mining focus on systematically finding patterns or correlations in large data sets. Data mining can also be applied to images, comparing them to find similarities. Check out the explanation.
By mining the past 30 years of news stories, a new computer system can help identify when and where a revolution is likely to occur. The two main techniques used were sentiment analysis and “full-text geocoding,” which matches each story to a location. The process runs on the SGI Altix supercomputer Nautilus. Read the SingularityHub article for more info.
CLIPS (Computational Linguistics & Psycholinguistics) has released a new Python module for web mining called Pattern.
Quoting from the CLIPS site:
It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics) and data visualization (graph networks).
Visit their site to get the download.
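To give a feel for what the “tf-idf + cosine similarity” metrics in that list do, here is a minimal sketch in plain Python. It does not use the CLIPS module's own API; the function names `tf_idf_vectors` and `cosine_similarity` and the three sample sentences are made up for illustration, and the tokenizer is just a whitespace split.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Weight = relative term frequency * log(inverse document frequency).
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative sample documents, not taken from any real data set.
docs = [
    "data mining finds patterns in large data sets",
    "machine learning tools for data mining",
    "slumlords avoid maintenance on buildings",
]
vectors = tf_idf_vectors(docs)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(sim_related > sim_unrelated)
```

The first two sentences share the terms “data” and “mining,” so they score higher than the first and third, which share no words at all. A real toolkit would add proper tokenization, stop-word handling, and smoothing on top of this.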
Check out Mallet, a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
Kaggle has hosted several data mining competitions, similar to the Netflix Prize, but it recently announced a new, much bigger one. It’s called the Heritage Health Prize, and the prize has been set at $3M. The goal is to predict when a person will need to go to the hospital before they actually make a visit. Here’s some more info from O’Reilly Radar. And here is Anthony Goldbloom of Kaggle announcing the contest at the Strata Conference…

