iText is a free open source Java-PDF library released on SF under the MPL/LGPL; iText comes with a simple GUI: the iText toolbox. The original developers of iText want to publish this toolbox as a separate project under the more permissive MIT license.
Java to PDF. For now, its only function is to combine multiple image files (.jpg, .bmp or .png) into a single PDF file. Tools: NetBeans 6.9 and JDK 1.6
ElateXam is a complete tool suite for electronic exams. It includes several task types (multiple choice, cloze texts, free texts, mapping, drawing, autotool), correction tools, and analysis and export features. It is used at the University of Leipzig.
This project creates a command-line Java application that uses OpenOffice.org in headless mode to convert a document to the PDF file format. The source document must be in a file format that OpenOffice.org can open.
Android PDF Viewer is a viewer for PDF files on Android phones. The implementation will be a port of the pdf-renderer published by Sun under the LGPL: https://pdf-renderer.dev.java.net/. The first version will be very slow, so do not hurry...
SplitPDF (SplitPDF.jar) is a command-line Java program that splits a PDF file by its bookmarks into separate PDFs. Each bookmark is used as the title of the newly created PDF. Extremely useful and fast in a batch-processing environment.
ANts P2P implements a third-generation P2P network. It protects your privacy while you are connected and makes you untrackable by hiding your identity (IP address) and encrypting everything you send to or receive from others.
The COSMAT project provides a RESTful web service named COSMATService that extracts data from a PDF file and translates the content into several languages. The returned extractions and translations are encoded in the TEI format.
A Java library for rendering forms on PDF (may be extended for other formats), based on a Template File (PDF or other type), and an XML description of contents. This library uses the iText package (http://www.lowagie.com/iText/) for PDF manipulation.
Autshumato PTE (PDF Text Extractor) is a utility application which extracts the text from PDF documents with the aim of making it translatable. It is also able to extract the pages of the PDF document as PNG images.
CNV Workshop is a web-enabled platform for analyzing genome variation such as copy number variation (CNV). Learn about CNV Workshop in our associated BMC Bioinformatics manuscript: http://www.biomedcentral.com/1471-2105/11/74
Booletin is a search engine for official gazettes (BOE, BOCM, etc.) that includes an email alert system. It uses Apache Lucene to index the PDF content of Spain's official gazettes.
Reporting engine library written in C. Create one XML file and generate PDF, HTML, TXT, and CSV reports based on queries. Has support for MySQL, PostgreSQL, ODBC. Bindings for PHP, Java, Python.
PODR is a PHP mail-merge and conversion library designed mainly to parse ODT templates and convert them to DOC/PDF. Templating is based on Savant; conversion uses a JODConverter web service. A filter is available for including images generated at runtime.
Extractor and organizer of timetables in PDF documents. The general goal is to extract tables from a PDF document; the specific goal is to manage the schedules of the Faculty of Chemistry in Oviedo. (Final-year degree project, EUITIO)
The JODConverterService is written as a WCF application and provides functionality to convert documents such as .eml, .doc(x), .xls(x), etc. to the PDF format by using the Java library JODConverter which uses a service instance of OpenOffice.org.
Note as of 2013-09-13: I'm moving this project over to GitHub due to this:
http://www.gluster.org/2013/08/how-far-the-once-mighty-sourceforge-has-fallen/
Feel free to rejoin the more updated versions on
https://github.com/mnott/PDFOCRWrapper
Thanks.
Matthias
--
This is a wrapper written in Java that recursively iterates over a directory structure and calls an OCR engine on each PDF it finds, provided the engine has not already been called for that PDF. It works well with the ABBYY OCR Engine for Linux.
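The traversal described above (walk a directory tree, skip PDFs that were already handled, run OCR on the rest) could be sketched roughly as follows. This is only an illustration and not the project's actual code: the class name `PdfOcrWalker`, the `.ocrdone` marker-file convention used to remember processed PDFs, and the placeholder `echo` command standing in for the OCR engine are all assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PdfOcrWalker {

    // Assumed convention: a file "<name>.pdf.ocrdone" next to a PDF marks it as processed.
    static Path markerFor(Path pdf) {
        return pdf.resolveSibling(pdf.getFileName().toString() + ".ocrdone");
    }

    // Recursively collect all PDFs under root that do not yet have a marker file.
    static List<Path> findUnprocessed(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".pdf"))
                .filter(p -> !Files.exists(markerFor(p)))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    // Run the given command with the PDF path appended; write the marker only on exit code 0.
    static boolean process(Path pdf, List<String> ocrCommand)
            throws IOException, InterruptedException {
        List<String> cmd = new ArrayList<>(ocrCommand);
        cmd.add(pdf.toString());
        int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        if (exit == 0) {
            Files.createFile(markerFor(pdf));
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        // Placeholder command; replace with the real OCR engine invocation.
        List<String> ocrCommand = List.of("echo");
        for (Path pdf : findUnprocessed(root)) {
            process(pdf, ocrCommand);
        }
    }
}
```

Keeping the "already processed" state in a marker file beside each PDF means a rerun over the same tree only touches new files; a real wrapper might instead track this in a log or database.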
Toolkit e-formulieren is an open source toolkit for creating and maintaining e-forms in a user-friendly way.
The toolkit uses Orbeon and supports XForms-compliant e-forms, optionally with pre-filled fields.