Android PDF Viewer is a viewer for PDF files on Android mobile devices. It will be a port of the pdf-renderer published by Sun under the LGPL: https://pdf-renderer.dev.java.net/. The first version will be very slow, so please be patient.
SplitPDF (SplitPDF.jar) is a command-line Java program that splits a PDF file into separate PDFs at its bookmarks, using each bookmark as the title of the newly created PDF. It is extremely useful and fast in a batch-processing environment.
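The blurb does not say how SplitPDF turns bookmarks into files. Below is a minimal sketch of the page-range step only, assuming each top-level bookmark is known with its 1-based start page; reading and writing the PDF itself (done by a PDF library) is left out, and `Bookmark`, `split_ranges` and `safe_filename` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Bookmark:
    title: str
    start_page: int  # 1-based page the bookmark points to


def split_ranges(bookmarks, page_count):
    """Return (title, first_page, last_page) for each output PDF.

    Each section runs from its bookmark's page to the page before the next
    bookmark; the last section runs to the end of the document. Pages before
    the first bookmark are not assigned to any output file in this sketch.
    """
    ordered = sorted(bookmarks, key=lambda b: b.start_page)
    ranges = []
    for i, bm in enumerate(ordered):
        end = ordered[i + 1].start_page - 1 if i + 1 < len(ordered) else page_count
        if end >= bm.start_page:  # skip a bookmark that shares its page with the next one
            ranges.append((bm.title, bm.start_page, end))
    return ranges


def safe_filename(title):
    """Turn a bookmark title into a file name for the new PDF."""
    cleaned = "".join(c if c.isalnum() or c in " -_" else "_" for c in title).strip()
    return (cleaned or "untitled") + ".pdf"
```

For a 12-page document with bookmarks on pages 1, 4 and 9, `split_ranges` yields the ranges 1-3, 4-8 and 9-12.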
ANts P2P implements a third-generation P2P network. It protects your privacy while you are connected and makes you untraceable by hiding your identity (IP address) and encrypting everything you send to and receive from others.
The COSMAT project provides a RESTful web service, COSMATService, that extracts data from a PDF file and translates the content into several languages. The returned extractions and translations are encoded in the TEI format.
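The exact TEI structure COSMATService returns is not described. As a minimal sketch, assuming the extracted (or translated) text arrives as a list of paragraphs, this wraps it in a small TEI skeleton with Python's standard `xml.etree.ElementTree`; the header wording and the function names `tei` and `to_tei` are illustrative, not the service's own.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang

ET.register_namespace("", TEI_NS)  # emit TEI as the default namespace (no ns0: prefix)


def tei(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def to_tei(title, paragraphs, lang="en"):
    """Wrap extracted or translated paragraphs in a minimal TEI document."""
    root = ET.Element(tei("TEI"))
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, tei("titleStmt")), tei("title")).text = title
    # publicationStmt and sourceDesc are required parts of fileDesc; placeholder text here.
    ET.SubElement(ET.SubElement(file_desc, tei("publicationStmt")), tei("p")).text = "Generated automatically."
    ET.SubElement(ET.SubElement(file_desc, tei("sourceDesc")), tei("p")).text = "Extracted from a PDF file."
    # The language of the (translated) content goes on the <text> element.
    body = ET.SubElement(ET.SubElement(root, tei("text"), {XML_LANG: lang}), tei("body"))
    for para in paragraphs:
        ET.SubElement(body, tei("p")).text = para
    return ET.tostring(root, encoding="unicode")
```

One TEI document per target language keeps each translation self-describing through its `xml:lang` attribute.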
A Java library for rendering forms on PDF (it may be extended to other formats) based on a template file (PDF or another type) and an XML description of the contents. This library uses the iText package (http://www.lowagie.com/iText/) for PDF manipulation.
Autshumato PTE (PDF Text Extractor) is a utility application which extracts the text from PDF documents with the aim of making it translatable. It is also able to extract the pages of the PDF document as PNG images.
CNV Workshop is a web-enabled platform for analyzing genome variation such as copy number variation (CNV). Learn about CNV Workshop in our associated BMC Bioinformatics manuscript: http://www.biomedcentral.com/1471-2105/11/74
Booletin is a search engine for official gazettes (BOE, BOCM, etc.) that includes an e-mail alert system. It uses Apache Lucene to index the PDF content of Spain's official gazettes.
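Booletin itself relies on Apache Lucene; the Python sketch below is not Lucene but a plain inverted index standing in for it, to show how indexing and e-mail alerts can fit together. It assumes an alert fires when every term of a subscriber's query appears in a newly indexed gazette; text extraction from the PDFs and actual mail delivery are omitted, and `BulletinIndex` is a hypothetical name.

```python
import re
import unicodedata
from collections import defaultdict


def tokenize(text):
    """Lowercase, strip accents (so 'Resolución' matches 'resolucion'), split on non-alphanumerics."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.findall(r"[a-z0-9]+", plain)


class BulletinIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of gazettes containing it
        self.alerts = {}                  # e-mail address -> set of required terms

    def subscribe(self, email, query):
        self.alerts[email] = set(tokenize(query))

    def add_bulletin(self, bulletin_id, text):
        """Index a gazette and return the addresses whose alert it matches."""
        terms = set(tokenize(text))
        for term in terms:
            self.postings[term].add(bulletin_id)
        return sorted(email for email, required in self.alerts.items()
                      if required and required <= terms)

    def search(self, query):
        """Return ids of gazettes that contain every term of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

Checking alerts at indexing time means each new gazette is compared once against the stored queries, instead of rerunning every query over the whole collection.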
Reporting engine library written in C. Create one XML file and generate PDF, HTML, TXT, and CSV reports based on queries. Supports MySQL, PostgreSQL, and ODBC, with bindings for PHP, Java, and Python.
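The library's XML schema is not shown here, so the sketch below invents a tiny, hypothetical report definition (`<report>`, `<query>`, `<column>`) to illustrate the "one XML file, several outputs" idea. It uses Python's built-in `sqlite3` in place of MySQL, PostgreSQL or ODBC and renders only CSV and plain text; PDF and HTML output are left out.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical report definition; not the library's real format.
REPORT_XML = """<report title="Customers">
  <query>SELECT name, city FROM customers ORDER BY name</query>
  <column field="name" header="Name"/>
  <column field="city" header="City"/>
</report>"""


def run_report(xml_text, conn):
    """Run the report's query and return (title, [(field, header)], rows as dicts)."""
    spec = ET.fromstring(xml_text)
    cur = conn.execute(spec.findtext("query"))
    names = [d[0] for d in cur.description]
    rows = [dict(zip(names, r)) for r in cur.fetchall()]
    columns = [(c.get("field"), c.get("header")) for c in spec.findall("column")]
    return spec.get("title"), columns, rows


def to_csv(columns, rows):
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header for _, header in columns])
    for row in rows:
        writer.writerow([row[field] for field, _ in columns])
    return out.getvalue()


def to_txt(title, columns, rows):
    """Render a fixed-width text table sized to the widest value in each column."""
    widths = [max([len(h)] + [len(str(r[f])) for r in rows]) for f, h in columns]
    lines = [title, "  ".join(h.ljust(w) for (_, h), w in zip(columns, widths))]
    lines.append("  ".join("-" * w for w in widths))
    for r in rows:
        lines.append("  ".join(str(r[f]).ljust(w) for (f, _), w in zip(columns, widths)))
    return "\n".join(lines)
```

Because the query runs once and every renderer reads the same rows, adding another output format only means adding another rendering function.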
PODR is a PHP mail-merge and conversion library designed mainly to parse ODT templates and convert them to DOC/PDF. Templating is based on Savant; conversion uses a JODConverter web service. A filter is available for including images generated at runtime.
Extractor and organizer of timetables in PDF documents. The general goal is to extract tables from a PDF document; the specific goal is to handle the schedules of the Faculty of Chemistry in Oviedo. (Final-year project (PFC) at EUITIO)
Note as of 2013-09-13: I'm moving this project over to github due to this:
http://www.gluster.org/2013/08/how-far-the-once-mighty-sourceforge-has-fallen/
Feel free to follow the more up-to-date versions on
https://github.com/mnott/PDFOCRWrapper
Thanks.
Matthias
--
This is a Java wrapper that recursively walks a directory structure and calls an OCR engine on each PDF it finds, provided that the engine has not already been run on that PDF. It works well with the ABBYY OCR Engine for Linux.
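How the wrapper remembers which PDFs were already processed is not described. A minimal sketch, assuming a marker file next to each PDF (a hypothetical `.ocr-done` convention) and a caller-supplied `run_ocr` function, for example one that starts the OCR engine with `subprocess.run`; the engine's actual command line is not shown here.

```python
from pathlib import Path

# Hypothetical bookkeeping: a marker file next to each PDF records that OCR ran.
MARKER_SUFFIX = ".ocr-done"


def find_pdfs(root):
    """All PDF files below `root`, matched case-insensitively, in a stable order."""
    return sorted(p for p in Path(root).rglob("*")
                  if p.is_file() and p.suffix.lower() == ".pdf")


def ocr_pending_pdfs(root, run_ocr):
    """Call `run_ocr(path)` on every PDF under `root` that has no marker yet.

    The marker is written only after `run_ocr` returns, so a PDF whose OCR
    raised an exception is retried on the next run. Returns the PDFs handled now.
    """
    handled = []
    for pdf in find_pdfs(root):
        marker = pdf.with_name(pdf.name + MARKER_SUFFIX)
        if marker.exists():
            continue  # OCR already ran for this PDF
        run_ocr(pdf)
        marker.touch()
        handled.append(pdf)
    return handled
```

Collecting the file list before the loop keeps the walk unaffected by the marker files created during it, and running the function again (e.g. from cron) only touches new PDFs.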
Toolkit e-formulieren is an open-source toolkit for creating and maintaining e-forms in a user-friendly way.
The Toolkit uses Orbeon and supports XForms-compliant e-forms, optionally with pre-filled data.
An application that creates PDF documents on the fly from any source file format (PRN, HTML, TEXT, CSV), with a complete mailing system and a reports module. It is built on the following libraries: PDF: iText; web server: Simple Framework; database: H2.
CSSToXSLFO is a conversion utility from CSS2 to XSL-FO, which can be converted to PDF, PostScript, etc. It has special support for XHTML. The tool has a number of page-related CSS extensions. It comes with an API in the form of an XML filter.
The PDF Form Generator module currently works with properties files only, but additional formats (such as csv, xml, tab delimited etc) will soon be supported.
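Which keys the module expects is not documented here; assuming each key names a form field and its value is the text to fill in, the sketch below parses a simplified subset of the properties format into a dictionary. In Java, `java.util.Properties.load` handles the full format; this Python version supports only comments, `=`/`:` separators and backslash line continuations.

```python
def _split_pair(entry):
    """Split 'key=value' or 'key: value' at the first separator."""
    positions = [i for i in (entry.find("="), entry.find(":")) if i != -1]
    if not positions:
        return entry.strip(), ""  # a key with no value
    sep = min(positions)
    return entry[:sep].strip(), entry[sep + 1:].strip()


def parse_properties(text):
    """Parse a simplified subset of the .properties format into {field: value}.

    Not handled: unicode escapes, escaped separators, whitespace-only separators.
    """
    props = {}
    pending = None  # accumulates a logical line split with trailing backslashes
    for raw in text.splitlines():
        line = raw.strip()
        if pending is None and (not line or line[0] in "#!"):
            continue  # blank line or comment
        if line.endswith("\\"):
            pending = (pending or "") + line[:-1]
            continue
        key, value = _split_pair((pending or "") + line)
        props[key] = value
        pending = None
    if pending:  # input ended in the middle of a continued line
        key, value = _split_pair(pending)
        props[key] = value
    return props
```

Supporting CSV or XML later would then only require another parser that returns the same field-to-value dictionary.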
The goal of the project is to make documents readable for Java applications. We will first deliver an interface for PDF files. The project builds on FontBox and PDFBox, both under the BSD license. Many thanks for the groundwork.
A small tool that creates a PDF file with thumbnails of all images in a folder. The number of thumbnails per page and some other settings can be adjusted. Jpg2Pdf uses the iText library for PDF generation.
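Jpg2Pdf leaves the drawing to iText; the part that depends on the "thumbnails per page" setting is the placement arithmetic. A minimal sketch of that arithmetic, assuming a cols x rows grid, a fixed margin and gap in points, and a default page of 595 x 842 pt (roughly A4); the names `Cell`, `thumbnail_layout` and `fit_inside` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    page: int       # 0-based page index
    x: float        # left edge in points
    y: float        # bottom edge in points (PDF origin is the lower-left corner)
    width: float
    height: float


def thumbnail_layout(n_images, cols, rows, page_w=595.0, page_h=842.0,
                     margin=36.0, gap=10.0):
    """Place n_images thumbnails on a cols x rows grid per page, row by row from the top."""
    per_page = cols * rows
    cell_w = (page_w - 2 * margin - (cols - 1) * gap) / cols
    cell_h = (page_h - 2 * margin - (rows - 1) * gap) / rows
    cells = []
    for i in range(n_images):
        page, slot = divmod(i, per_page)
        row, col = divmod(slot, cols)
        x = margin + col * (cell_w + gap)
        # Row 0 is the top row, so measure down from the top margin.
        y = page_h - margin - (row + 1) * cell_h - row * gap
        cells.append(Cell(page, x, y, cell_w, cell_h))
    return cells


def fit_inside(img_w, img_h, box_w, box_h):
    """Scale an image to fit a cell while keeping its aspect ratio."""
    scale = min(box_w / img_w, box_h / img_h)
    return img_w * scale, img_h * scale
```

Each image would then be scaled with `fit_inside` and drawn at its cell's lower-left corner by the PDF library.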
dmachinery is a repository and generation framework for document templates and document building blocks such as images and business rules. Templates and parts may be stored in different technologies, e.g. PDF and XSL-FO. It supports on-demand and batch printing.