News and Blog Data 2010

Contains both raw and processed data for news articles and blogs from 04/26/10 through 05/29/10.

01-Politico News Raw Processed
02-Fox News Raw Processed
03-NYTimes Politics News Raw Processed
04-NYTimes Business News Raw Processed
05-Digg World_Business Raw Processed
06-Gothamist Raw Processed
07-Washington Post News Raw Processed
08-NY Daily News Raw Processed
09-CNet News Raw Processed
10-ABC Tech News Raw Processed
11-CNet Blog Raw Processed
12-Engadget Raw Processed

All Files Before Processing

Everything Processed Together

To process the raw text files, we used the MC toolkit program developed by a student at the University of Texas.

For more information on the toolkit and how to use it, please visit his website.