Check out this Boing Boing post to see how a social graph analysis from OrgNet shows that slumlords work together to avoid maintenance on the buildings they buy and sell.
If you’re into machine learning and data mining, Waffles might be just what you need: it is a collection of command-line tools for both.
We’ve written about Kaggle before, but here’s a new post from the New York Times about the site and the new funding it has received.
Most discussions of data mining focus on systematically finding patterns or correlations in large data sets, but data mining can also be applied to images, for example by comparing them to find similarities. Check out the explanation.
By mining the past 30 years of news stories, a new computer system can help identify when and where a revolution is likely to occur. The two main techniques it uses are sentiment analysis and “full-text geocoding,” which matches a story to a location. The process runs on Nautilus, an SGI Altix supercomputer. Read the SingularityHub article for more info.
AOL has a new article on its job board about the data scientist, calling the role “the hottest job you haven’t heard of.” The term may be a bit of a buzzword, but the idea has been around for a while. Here are a few other references to the terms Data Scientist and Data Science:
- Data scientist: The hot new gig in tech – Fortune – September 2011
- EMC Data Scientist Summit
- What is Data Science – O’Reilly Media – June 2010
- Rise of the Data Scientist – FlowingData – June 2009
Check out what LinkedIn founder Reid Hoffman has to say to VentureBeat about the new world of data. He names a few companies that he thinks are making good use of data.
CLIPS (Computational Linguistics & Psycholinguistics) has released Pattern, a new web mining module for Python.
Quoting from the CLIPS site:
It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics) and data visualization (graph networks).
Visit their site to get the download.
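The quoted feature list includes tf-idf and cosine similarity among Pattern’s text-analysis metrics. The sketch below shows roughly what those two metrics compute. It does not use Pattern’s own API. It assumes a naive whitespace tokenizer, the common tf × log(N/df) weighting, and made-up sample sentences.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; real toolkits do much more)
    return text.lower().split()

def tf_idf_vectors(docs):
    """Return one sparse {term: weight} dict per document, weighted tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: how many documents contain each term at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical sample documents, for illustration only
docs = [
    "data mining finds patterns in large data sets",
    "web mining applies data mining to the web",
    "waffles is a collection of command line tools",
]
vecs = tf_idf_vectors(docs)
print(round(cosine_similarity(vecs[0], vecs[1]), 3))  # shares "data" and "mining"
print(round(cosine_similarity(vecs[0], vecs[2]), 3))  # no shared terms
```

The first two sentences share weighted terms, so their similarity is positive. The first and third share no terms, so their similarity is 0. A term that appears in every document gets weight 0, because log(N/N) = 0. This is the usual way tf-idf discounts words that carry no distinguishing information.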

