Name         Modified    Size    Downloads / Week
README.txt   2010-02-06  1.4 kB
parserd.php  2010-02-06  2.7 kB
logger.php   2010-02-06  2.0 kB
Totals: 3 items          6.1 kB  0

The project aims to do two things. First, track the total transit time a news story takes to
travel from site A to site Z across the internet (total transit time = t3, get it? :P)
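
As a rough illustration of the transit time idea, here is a minimal sketch in PHP. The function name
transit_seconds() and the ISO 8601 timestamp format are assumptions made for this example; the
project does not define them, and the timestamps are made up.

```php
<?php
// Hypothetical helper (not part of the project's code): total transit time
// (t3) in seconds between the first and last sighting of a story.
// Assumes both timestamps are ISO 8601 strings with a timezone offset.
function transit_seconds(string $firstSeen, string $lastSeen): int
{
    $start = new DateTimeImmutable($firstSeen);
    $end   = new DateTimeImmutable($lastSeen);

    // Comparing Unix timestamps keeps the result correct across timezones.
    return $end->getTimestamp() - $start->getTimestamp();
}

// Illustrative values only, not real data.
$t3 = transit_seconds('2010-02-06T12:00:00+00:00', '2010-02-06T15:30:00+00:00');
echo $t3, " seconds\n"; // prints "12600 seconds"
```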

Second, we'll index the content of the article at each hop and track how the text of the
article (or its summaries) morphs as it traverses the web. The goal here is to gather some
interesting data sets on how quickly news flows across the net, and how it changes based on
the genre of news site it hits (or if it changes at all).
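
One simple way to put a number on how much an article changed between two hops is PHP's built-in
similar_text(). This is only a sketch of one possible measure, not the method the project uses;
the wrapper name text_similarity() is made up for this example.

```php
<?php
// Hypothetical helper (not part of the project's code): percentage similarity
// between two versions of an article, using PHP's built-in similar_text().
function text_similarity(string $a, string $b): float
{
    // similar_text() writes the similarity percentage into its third argument.
    similar_text($a, $b, $percent);
    return (float) $percent;
}

// Illustrative strings only, not real article text.
$original  = 'Mayor opens the new bridge on Saturday';
$rewritten = 'New bridge opened by mayor this weekend';
printf("%.1f%% similar\n", text_similarity($original, $rewritten));
```

similar_text() compares every character, so for long article bodies a cheaper measure may be
worth considering.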

The technologies used are as follows:

Main parsing system: PHP-CLI interfacing with the Tokyo Cabinet / Tokyo Tyrant database system
Main Front End (web): PHP
Controller scripts (add/remove/edit news sites we watch): PHP-CLI / Perl

If you are interested in helping the project out with code, database design, or anything else that
may be of use, please visit our SourceForge page at: http://www.sourceforge.net/projects/t3study and
get in touch!

Files:

parserd.php - This is the main application. It creates a listening daemon on the server that the
(soon to come) fetching+dumping app will dump its data to. The daemon parses the required
information (author, title, body, post date) out of the dumped articles and writes it to the DB.
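
The parsing step could look something like the sketch below. The dump format shown here ("Key: value"
header lines, a blank line, then the body) is purely an assumption for illustration, since the
fetching+dumping app does not exist yet; parse_dump() is a hypothetical name, and the write to the DB
is left out.

```php
<?php
// Hypothetical dump format (an assumption, not the project's actual format):
//   Author: ...
//   Title: ...
//   Date: ...
//   <blank line>
//   body text
function parse_dump(string $raw): array
{
    // Split the header block from the body at the first blank line.
    $parts = preg_split('/\R\R/', $raw, 2);

    $fields = [
        'author'    => null,
        'title'     => null,
        'post_date' => null,
        'body'      => isset($parts[1]) ? trim($parts[1]) : '',
    ];

    // Map header names in the dump to the field names we return.
    $map = ['author' => 'author', 'title' => 'title', 'date' => 'post_date'];

    foreach (preg_split('/\R/', $parts[0]) as $line) {
        if (strpos($line, ':') === false) {
            continue; // not a "Key: value" line
        }
        // Limit of 2 keeps colons inside the value (e.g. times) intact.
        [$key, $value] = explode(':', $line, 2);
        $key = strtolower(trim($key));
        if (isset($map[$key])) {
            $fields[$map[$key]] = trim($value);
        }
    }

    return $fields;
}

// Illustrative input only.
$article = parse_dump("Author: Mike\nTitle: Hello\nDate: 2010-02-06\n\nBody text here.\n");
echo $article['title'], "\n"; // prints "Hello"
```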

logger.php - This serves as the function include file for the project.

README.txt - This file :-)

- Mike <mb1689@gmail.com>
Source: README.txt, updated 2010-02-06