Words in Media
A system to harvest facts and their relationships while
eliminating editorial bias from the news in near-real-time.
The original site was a functional demonstration of the Words in Media framework. I've since moved the resources that powered the site to other projects, but I am making the site's entire codebase, the database and its data, and my thesis notes available to anyone interested.
Miscellaneous stats from the database:
- Articles scanned: 332,844
- Daily summaries for various words: 205,675
- Word histograms: 4,136,114
- Words encountered: 91,034