Document Summarization / News: Recent posts

Docsum Similar Content

Revision 236 of the project source code reflects an effort to remove similar content from summaries. This is functional in the online demo.
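One way to picture the removal of similar content is the minimal sketch below. It is not the code from revision 236: the function name `drop_similar`, the 0.8 threshold, and the use of the standard library's `difflib.SequenceMatcher` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def drop_similar(passages, threshold=0.8):
    """Keep a passage only if it is not too similar to one already kept.

    Hypothetical helper: the threshold and the similarity measure are
    illustrative assumptions, not docsum's actual behavior.
    """
    kept = []
    for passage in passages:
        # Compare case-insensitively against every passage kept so far.
        is_duplicate = any(
            SequenceMatcher(None, passage.lower(), earlier.lower()).ratio() >= threshold
            for earlier in kept
        )
        if not is_duplicate:
            kept.append(passage)
    return kept
```

The first occurrence of a group of near-identical passages is kept and later ones are dropped, so the original order of the remaining passages is preserved.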

Posted by Terrence Pietrondi 2007-07-16

Document and content ranking in docsum

In past releases, the content presented in a summary was selected entirely by SQL queries: select all content, locate the phrase given by the user, and present the matching content alongside the other matches as a summary. The problem with this approach is that the database will likely return the matching content in the same order each time. Relevant content that is selected later than earlier matches is therefore left out of the summary because of page length limitations, even though it might be more relevant and more valuable to the user requesting the summary. ...
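The ordering problem described in this post can be sketched as follows. This is only an illustration under stated assumptions: the function `rank_matches` is hypothetical, and scoring a passage by how often the phrase occurs is a stand-in for whatever ranking docsum actually uses.

```python
def rank_matches(passages, phrase, limit):
    """Return up to `limit` passages containing `phrase`, best score first.

    Hypothetical helper: the score is simply the number of
    case-insensitive occurrences of the phrase in the passage.
    """
    needle = phrase.lower()
    if not needle:
        return []
    scored = []
    for position, passage in enumerate(passages):
        count = passage.lower().count(needle)
        if count > 0:
            # Negative count sorts high scores first; position breaks ties
            # in favor of the original (database) order.
            scored.append((-count, position, passage))
    scored.sort()
    # Apply the page length limit only after ranking.
    return [passage for _, _, passage in scored[:limit]]
```

Because the limit is applied after sorting, a strong match that the database happens to return late is no longer cut off by weaker matches that were returned earlier.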

Posted by Terrence Pietrondi 2007-04-09

Docsum Demo with Feeds

The demo now supports RSS feeds as part of the summary repository. Because the SF demo site does not allow access to external sources, I've added a few samples via local XML files. Check out the demo at:

Posted by Terrence Pietrondi 2007-03-14

Docsum 2.0 Released

Docsum 2.0 is available for download. Another release will come soon with some web fixes and code rework, but all functionality will remain. Enjoy. SVN revision for this release is 195.

Posted by Terrence Pietrondi 2007-03-03

Demo 2.0 Fix

The last post mentioned that I was taking down the SF demo, but I worked around it. You still can't add remote files, but I fixed the local file add/remove issues. The site demo is located at Enjoy.

Posted by Terrence Pietrondi 2007-02-28

Almost 2.0

Docsum is nearing the 2.0 release. The demo on SourceForge will no longer work since I recently realized that "For security reasons, outgoing internet connectivity is disabled on the shell server." The 2.0 release might be out next month, but if you are interested you can check out the most recent version in the SVN repository. Thanks for listening.

Posted by Terrence Pietrondi 2007-02-28

File deletion in demo

Thanks to some awesome person, the files have been removed from the demo. That's life I guess. I will add them when I get a chance.

Posted by Terrence Pietrondi 2007-02-14

docsum-1.0 Now Available

Docsum 1.0 is now available for download. Please read the documentation available at Send any feedback to Demo URL is now The subversion revision for 1.0 is 156.

Posted by Terrence Pietrondi 2006-09-18

Demo URL Changed (again)

The demo URL has changed to The page used to be named demo.php; I realized that name would not make sense when the page is provided as a download.

Posted by Terrence Pietrondi 2006-06-24

Demo URL Changed

The demo URL has changed to The page used to be named demo.php; I realized that name would not make sense when the page is provided as a download.

Posted by Terrence Pietrondi 2006-06-17

Add file demo

The demo site now has the ability to attach files via the web. Check it out.

Posted by Terrence Pietrondi 2006-06-07

Demo site up

I've set up a demo site to display what I am doing here. Check it out at:

Posted by Terrence Pietrondi 2006-05-23

Check out the SVN

All development can be viewed in the SVN repository. Take a peek. The application is being developed using Python at the moment.

Posted by Terrence Pietrondi 2006-05-19

Docsum Project Opened

The Document Summarization (docsum) project has been opened on SF. This project is still in its planning phase, and many design considerations are still being made. At a high level, this system will be designed to construct summary information from large documents. For example, various research papers would be added to the system, and a user query would produce a summary of the information relevant to that query. Although it sounds simple, this is expected to be a complex task. Beginning stages will include database design, document handling and user interfaces. Stay tuned.

Posted by Terrence Pietrondi 2006-05-18