DocInfoRetriever is a web-based full-text document search engine built on Lucene. It lets you search the contents and metadata of documents. Supported formats include doc, xls, pdf, odt, jpg, and others, as well as torrent files.
An easy-to-use presentation program with a focus on PDF documents. Supports presentations on an external display or projector: PDF, images, text, and a whiteboard with annotations. Offers an intuitive user interface and optimized mouse and pen input handling.
A simple program for viewing your favorite PDF (or CHM) documents. Select a document from the list, then choose "Open selected pdf" from the File menu, or start reading in the right display panel.
A pure Java reporting library and report designer for viewing, printing, and exporting (xls, pdf, html, rtf) professional-looking reports. Supports text, image, and chart fields, multiple grouping, XML templates, and different data sources.
A data import and export framework in Java. Data can be exported to and imported from XML, Excel, PDF, delimited files (CSV, TAB, user-defined delimiter), and database tables.
Xuse manages requirements, use cases & other artefacts that drive software design. Xuse focuses on clear documentation & communication. It defines an XML data model for requirements & use cases with XSLT providing multiple derived views: HTML/SVG/PDF.
Note that there is no GUI for entering requirements; another project (https://sourceforge.net/projects/xguse/) will provide one.
ElateXam is a complete tool suite for electronic exams. It includes several task types (multiple choice, cloze texts, free texts, mapping, drawing, autotool), correction tools, and analysis and export features. It is used at the University of Leipzig.
SplitPDF (SplitPDF.jar) is a command-line Java program that splits a PDF file by its bookmarks into separate PDFs. Each bookmark is used as the title of the newly created PDF. It is extremely useful and fast in a batch-processing environment.
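Splitting by bookmarks comes down to turning bookmark positions into page ranges. The following is a minimal Python sketch of that step only, not SplitPDF's actual code. The function name `bookmark_page_ranges`, the `(title, first_page)` input shape, and the rule that each section ends on the page before the next bookmark are all assumptions made for illustration.

```python
def bookmark_page_ranges(bookmarks, total_pages):
    """Turn (title, first_page) bookmarks into (title, first_page, last_page).

    Hypothetical helper: pages are 1-based and ranges are inclusive. Each
    section runs up to the page before the next bookmark; the last section
    runs to the end of the document.
    """
    ordered = sorted(bookmarks, key=lambda b: b[1])
    ranges = []
    for i, (title, first) in enumerate(ordered):
        if i + 1 < len(ordered):
            last = ordered[i + 1][1] - 1
        else:
            last = total_pages
        # Two bookmarks on the same page still yield a one-page section.
        ranges.append((title, first, max(first, last)))
    return ranges
```

Each resulting tuple would then name one output file and the pages to copy into it with whatever PDF library is in use.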
ANts P2P implements a third-generation P2P network. It protects your privacy while you are connected and makes you untraceable by hiding your identity (IP address) and encrypting everything you send to or receive from others.
A Java library for rendering forms on PDF (may be extended for other formats), based on a Template File (PDF or other type), and an XML description of contents. This library uses the iText package (http://www.lowagie.com/iText/) for PDF manipulation.
Autshumato PTE (PDF Text Extractor) is a utility application which extracts the text from PDF documents with the aim of making it translatable. It is also able to extract the pages of the PDF document as PNG images.
CNV Workshop is a web-enabled platform for analyzing genome variation such as copy number variation (CNV). Learn about CNV Workshop in our associated BMC Bioinformatics manuscript: http://www.biomedcentral.com/1471-2105/11/74
Note as of 2013-09-13: I'm moving this project over to github due to this:
http://www.gluster.org/2013/08/how-far-the-once-mighty-sourceforge-has-fallen/
Feel free to follow the more up-to-date versions at
https://github.com/mnott/PDFOCRWrapper
Thanks.
Matthias
--
This is a wrapper written in Java that recursively iterates over a directory structure and calls an OCR engine on each PDF it finds, provided the engine has not already been called for that PDF. It works well with the ABBYY OCR Engine for Linux.
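The "process each PDF only once" behavior can be sketched as follows. This is an illustrative Python sketch, not the project's Java code: the function name `ocr_pending_pdfs`, the `.ocr_done` marker-file convention, and the `run_ocr` callback are assumptions, and how the real wrapper remembers processed files is not stated in the description.

```python
from pathlib import Path


def ocr_pending_pdfs(root, run_ocr, marker_suffix=".ocr_done"):
    """Call run_ocr(pdf_path) once for every PDF under root lacking a marker.

    Hypothetical design: an empty sidecar file next to each PDF records that
    the OCR step has already been run for it.
    """
    processed = []
    # sorted() materializes the file list before any marker is created.
    for pdf in sorted(Path(root).rglob("*.pdf")):
        marker = pdf.with_name(pdf.name + marker_suffix)
        if marker.exists():
            continue  # already handled on an earlier run
        run_ocr(pdf)
        marker.touch()  # record completion so the next run skips this file
        processed.append(pdf)
    return processed
```

In practice `run_ocr` would invoke the OCR engine's command line, for example through `subprocess.run`; the exact command depends on the engine and is left out here.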
Toolkit e-formulieren is an open-source toolkit for creating and maintaining e-forms in a user-friendly way.
The toolkit uses Orbeon and supports XForms-compliant e-forms, optionally with prefilled data.
CSSToXSLFO is a conversion utility from CSS2 to XSL-FO, which can be converted to PDF, PostScript, etc. It has special support for XHTML. The tool has a number of page-related CSS extensions. It comes with an API in the form of an XML filter.
The PDF Form Generator module currently works with properties files only, but additional formats (such as CSV, XML, and tab-delimited files) will soon be supported.
A small tool that creates a PDF file with thumbnails of all images in a folder. The number of thumbnails per page, along with some other settings, can be adjusted. Jpg2Pdf uses the iText library for PDF generation.
dmachinery is a repository and generation framework for document templates and document building blocks such as images and business rules. Templates and parts may be stored in different technologies, e.g. PDF and XSL-FO. It supports on-demand and batch printing.
Optex Analyzer is a program for analyzing and comparing algorithms that solve optimization problems approximately. Its GUI lets you select a set of input files containing raw algorithm results. The analysis is presented in tables and charts.
Agido is an extensible Agile Documentation Tool for Agile Development Projects. The documentation can be written in plain text in a wiki style markup and exported to different output formats such as PDF and HTML.
XTRACT4J V2 is a stand-alone, pure-Java program that creates XML files from dependent or independent SQL queries. It is designed as a drop-in replacement for Oracle Reports for generating XML files. It also incorporates BI Publisher to create PDF reports.
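One way to read "dependent" queries is that a detail query is run once per row of a master query, with the master row's key as its parameter. The sketch below illustrates that idea in Python with SQLite; it is not XTRACT4J code. The function name `rows_to_xml`, the element names (`rowset`, `row`, `details`, `detail`), and the use of SQLite are all assumptions chosen for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, master_sql, detail_sql, key):
    """Build XML from a master query plus a detail query run per master row."""
    conn.row_factory = sqlite3.Row  # lets columns be read by name
    root = ET.Element("rowset")
    for master in conn.execute(master_sql):
        row_el = ET.SubElement(root, "row")
        for col in master.keys():
            ET.SubElement(row_el, col).text = str(master[col])
        # Dependent query: parameterized by the key column of this master row.
        details = ET.SubElement(row_el, "details")
        for detail in conn.execute(detail_sql, (master[key],)):
            d_el = ET.SubElement(details, "detail")
            for col in detail.keys():
                ET.SubElement(d_el, col).text = str(detail[col])
    return ET.tostring(root, encoding="unicode")
```

An independent query would simply be a second top-level loop whose SQL takes no parameter from the first.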