BlogTEX: Blog posts extraction for TREC.

Windows Mac Linux

Description

BlogTEX is an ad hoc blog-post extraction algorithm, written in Java, for the TREC Blog08 dataset. It includes an optimized sentence model that identifies the sentence boundaries in each blog post. The output format can be customized through a config file.
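
To give a rough idea of what sentence-boundary identification in a blog post involves, here is a minimal Java sketch. It does not reproduce BlogTEX's own sentence model, which is not described on this page; it swaps in the JDK's standard java.text.BreakIterator instead. The class name SentenceSplitSketch, the method splitSentences, and the sample text are all hypothetical and chosen only for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: this is NOT BlogTEX's sentence model.
public class SentenceSplitSketch {

    // Splits plain text into sentences with the JDK's locale-aware BreakIterator.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) span is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for an extracted blog post body.
        String post = "I upgraded my laptop today. It boots much faster now! "
                + "Would I recommend it? Yes.";
        for (String sentence : splitSentences(post)) {
            System.out.println(sentence);
        }
    }
}
```

A general-purpose splitter like this is only a baseline: blog text often contains abbreviations, emoticons, and missing punctuation, which is presumably why a tuned sentence model is worth having for this dataset.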


Additional Project Details

Languages

English

Intended Audience

Science/Research

Programming Language

Java

Registered

2011-07-13