DCTFinder

Web pages do not offer reliable metadata about their creation date and time. However, obtaining the document creation time (DCT) is a necessary step before temporal normalization systems can be applied to web pages. DCTFinder is a system that parses a web page and extracts its title and creation date from its content. DCTFinder combines heuristic title detection, supervised learning with Conditional Random Fields (CRFs) for document date extraction, and rule-based creation time recognition.
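
To give a rough idea of what rule-based date recognition over page text can look like, here is a minimal Java sketch. It is not taken from DCTFinder and does not reflect its actual rules, CRF features, or API: the class name `DateCandidateSketch`, the method `firstDate`, and the two date patterns are all assumptions made for illustration. It simply returns the earliest valid date found in a string of already-extracted page text.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the DCTFinder code base.
final class DateCandidateSketch {

    // Numeric dates such as 2013-04-06.
    private static final Pattern ISO_DATE =
            Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");

    // English long-form dates such as "6 April 2013".
    private static final Pattern LONG_DATE =
            Pattern.compile("\\b\\d{1,2} [A-Z][a-z]+ \\d{4}\\b");

    // Strict resolution rejects impossible dates such as "31 February 2013".
    private static final DateTimeFormatter LONG_FORMAT =
            DateTimeFormatter.ofPattern("d MMMM uuuu", Locale.ENGLISH)
                    .withResolverStyle(ResolverStyle.STRICT);

    // Returns the valid date that appears earliest in the text, if any.
    static Optional<LocalDate> firstDate(String text) {
        Pattern[] patterns = {ISO_DATE, LONG_DATE};
        DateTimeFormatter[] formats = {DateTimeFormatter.ISO_LOCAL_DATE, LONG_FORMAT};

        LocalDate best = null;
        int bestStart = Integer.MAX_VALUE;

        for (int i = 0; i < patterns.length; i++) {
            Matcher m = patterns[i].matcher(text);
            while (m.find()) {
                if (m.start() >= bestStart) {
                    break; // an earlier valid date is already known
                }
                try {
                    best = LocalDate.parse(m.group(), formats[i]);
                    bestStart = m.start();
                    break; // earliest valid match for this pattern
                } catch (DateTimeParseException e) {
                    // Looked like a date but is not a real one; keep scanning.
                }
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Under these assumptions, `DateCandidateSketch.firstDate("Posted on 6 April 2013")` yields an `Optional` holding the date 2013-04-06, and text without a recognizable date yields an empty `Optional`. A real system has to handle many more formats and decide which of several dates on a page is the creation date, which is where the learned component described above comes in.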

DCTFinder is released under the CeCILL free software license agreement.

The system is described in the following paper (see 'Files' section):
Xavier Tannier. "Extracting News Web Page Creation Time with DCTFinder". Proceedings of the 9th Language Resources and Evaluation Conference. Reykjavik, Iceland.

License

Other License

Additional Project Details

Intended Audience

Information Technology, Science/Research

Programming Language

Java

Related Categories

Java Information Analysis Software, Java Linguistics Software

Registered

2013-04-06
