Web pages do not offer reliable metadata concerning their creation date and time. However, getting the document creation time is a necessary step for allowing to apply temporal normalization systems to web pages. DCTFinder is a system that parses a web page and extracts from its content the title and the creation date of this web page. DCTFinder combines heuristic title detection, supervised learning with Conditional Random Fields (CRFs) for document date extraction, and rule-based creation time recognition.

DCTFinder is released under CeCILL free software license agreement.

The system is described in the following paper (see 'Files' section):
Xavier Tannier. "Extracting News Web Page Creation Time with DCTFinder". Proceedings of the 9th Language Resources and Evaluation Conference. Reykjavik, Iceland.

Project Activity

See All Activity >

License

Other License

Follow DCTFinder

DCTFinder Web Site

Other Useful Business Software
AI-generated apps that pass security review Icon
AI-generated apps that pass security review

Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
Try Retool free
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of DCTFinder!

Additional Project Details

Intended Audience

Information Technology, Science/Research

Programming Language

Java

Related Categories

Java Information Analysis Software, Java Linguistics Software

Registered

2013-04-06