Note as of 2013-09-13: I'm moving this project over to github due to this:
http://www.gluster.org/2013/08/how-far-the-once-mighty-sourceforge-has-fallen/
Feel free to follow the more up-to-date versions at
https://github.com/mnott/PDFOCRWrapper
Thanks.
Matthias
--
This is a wrapper written in Java that recursively walks a directory structure and calls an OCR engine on each PDF it finds, provided the engine has not already been called for that PDF. It works well with the ABBYY OCR Engine for Linux.
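The walk-and-skip behaviour described above could be sketched roughly as follows. This is not the project's actual code: the class name `PdfOcrWalker`, the `.ocr-done` marker-file convention for remembering which PDFs were already processed, and the way the engine command is passed on the command line are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PdfOcrWalker {

    // Hypothetical convention: "a.pdf" counts as processed once
    // a file named "a.pdf.ocr-done" exists next to it.
    static Path markerFor(Path pdf) {
        return pdf.resolveSibling(pdf.getFileName() + ".ocr-done");
    }

    // Recursively collect every PDF under root that has no marker file yet.
    static List<Path> findUnprocessed(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".pdf"))
                .filter(p -> !Files.exists(markerFor(p)))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    // Run the OCR engine on one PDF; write the marker only if it exits with 0.
    // engineCommand is whatever command line the installed engine expects (assumed).
    static void ocr(Path pdf, List<String> engineCommand)
            throws IOException, InterruptedException {
        List<String> cmd = new ArrayList<>(engineCommand);
        cmd.add(pdf.toString());
        int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        if (exit == 0) {
            Files.createFile(markerFor(pdf));
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: PdfOcrWalker <root-dir> <ocr-command> [args...]");
            return;
        }
        Path root = Path.of(args[0]);
        List<String> engine = List.of(args).subList(1, args.length);
        for (Path pdf : findUnprocessed(root)) {
            ocr(pdf, engine);
        }
    }
}
```

Keeping the "already processed" state in a sibling file makes repeated runs cheap and needs no database; a real wrapper might instead inspect the PDF itself or keep a central log.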
Toolkit e-formulieren is an open-source toolkit for creating and maintaining e-forms in a user-friendly way.
The Toolkit uses Orbeon and supports XForms-compliant e-forms, optionally with pre-filled fields.
Application to create PDF documents on the fly from any source file format (PRN, HTML, TEXT, CSV), with a complete mailing system and reports module. It is built on the following libraries: PDF: iText; web server: Simple Framework; database: H2.
CSSToXSLFO is a conversion utility from CSS2 to XSL-FO, which can be converted to PDF, PostScript, etc. It has special support for XHTML. The tool has a number of page-related CSS extensions. It comes with an API in the form of an XML filter.
The PDF Form Generator module currently works with properties files only, but additional formats (such as CSV, XML, and tab-delimited) will soon be supported.
Small tool that creates a PDF file with thumbnails of all images in a folder. The number of thumbnails per page and some other settings can be adjusted. Jpg2Pdf uses the iText library to generate the PDF.
dmachinery is a repository and generation framework for document templates and document building blocks such as images and business rules. Templates and parts may be stored in different technologies, e.g. PDF and XSL-FO. It allows on-demand and batch printing.
FO3D describes an XSL-FO standard-compliant method for representing 3D content in FO documents and provides an exemplary extension for use with Apache FOP (Version 0.95), available under the Apache License v2.0, to create 3D PDF documents.
Optex Analyzer is software for analyzing and comparing algorithms that approximately solve optimization problems. Its GUI lets you select a set of input files containing raw algorithm results. The analysis is presented in tables and charts.
Agido is an extensible Agile Documentation Tool for Agile Development Projects. Documentation is written in plain text using a wiki-style markup and can be exported to different output formats such as PDF and HTML.
Events in Google calendars form the basis for bills, which are completed with rates, text and other items. The program outputs bills in PDF format, which can be mailed. It also offers simple debtors/creditors accounting.
XTRACT4J V2 is a stand-alone, pure-Java program that creates XML files from dependent or independent SQL queries. It is designed as a drop-in replacement for Oracle Reports to generate XML files. It also incorporates BI Publisher to create PDF reports.
TPMitigation is a transparent HTTP proxy for mitigating drive-by malware. Content is converted on the fly and/or replaced where there is a risk of infection by embedded drive-by malware. Also visit http://tpmitigation.sourceforge.net/
SPASE Model is a collection of tools for working with structured data model information. The tools can convert the relational version of the data model into various expressions, including XSD, XMI and PDF documentation.
A system to help with the distribution of Watchtower and Awake! magazines within congregations of Jehovah's Witnesses. The system builds on the freely available Express-C Edition of IBM DB2 Database. There are scripts available for setting up the db
Automatically embed Wikipedia topic information into PDF documents via pop-up annotations. This relies on the Wikipedia Miner service, which is also available on SourceForge.
pdfText is an open source library for creating and manipulating PDF files in Java. It's an iText project fork based on the 2.17 branch version. It's a full LGPL library and can be freely linked to any closed source or commercial project.
Small and simple Java library for working with Jasper Reports dynamically, enabling dynamic column creation and dynamic data sets using Apache DynaBeans. The project is developed by people at a small software company called Softberries (www.softberries.com).