PDF Constructor
Using an XML grammar incorporating features of XHTML, CSS, and SVG, PDF Constructor creates single- or multi-page PDF documents from existing or dynamically created raster, vector, and text content. Build PDFs with content that is ready to go to print. Use CMYK and spot colors. Specify the bleed and trim. Use Type 1, TrueType, or OpenType fonts, always embedded and optionally subset. Produce web- or screen-ready documents with bookmarks, hyperlinks, actions, and JavaScript. You can even build complete Acrobat Forms dynamically. Include JPEG and TIFF images in any colorspace and resolution. Apply your choice of transformations to ensure each image fits correctly into your layout. Include SVG drawings directly or by reference. Specify individual pages or entire PDF documents as new content or as a template on which to add new elements. Paragraph and character styles based on CSS2 can be specified for flowable content.
pdf2docx
pdf2docx is a Python library that uses PyMuPDF to extract data from PDF files, parse their layouts according to rules, and generate corresponding .docx files via python-docx. It supports conversion of text, images, tables, and other structural elements; it includes tools to extract tables, handle formatting, and preserve layout as much as possible. It offers both a command-line interface and a graphical user interface. The internal architecture is modular; it includes packages for handling pages, layout, tables, images, shape paths, text spans/blocks, and other elements, enabling fine control over how PDF content is mapped into Word documents. Developers can use the API for batch conversions or integrate it into workflows; there's documentation on installation (from PyPI or source), usage, and technical details of layout-parsing, table extraction, and internal modules. The project is open source, hosted on GitHub, and made available under its license with no warranty.
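As a rough illustration of the batch-conversion use mentioned above, here is a minimal sketch built on pdf2docx's `Converter` class (`Converter(pdf_file)`, `convert(docx_file)`, `close()`). The helper names `docx_target` and `convert_all` and the directory layout are hypothetical, chosen only for this example; they are not part of the library.

```python
from pathlib import Path


def docx_target(pdf_path, out_dir):
    """Return the .docx path a given PDF would be written to (hypothetical helper)."""
    return Path(out_dir) / (Path(pdf_path).stem + ".docx")


def convert_all(pdf_dir, out_dir):
    """Convert every *.pdf in pdf_dir and return the list of .docx paths written."""
    # Imported here so the path helper above can be used without pdf2docx installed.
    from pdf2docx import Converter

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        target = docx_target(pdf, out)
        cv = Converter(str(pdf))
        try:
            cv.convert(str(target))  # converts all pages by default
        finally:
            cv.close()  # release the underlying PDF handle even on failure
        written.append(target)
    return written
```

Calling `convert_all("incoming", "converted")` would walk a hypothetical `incoming` folder and write one .docx per PDF into `converted`. Closing the converter in a `finally` block keeps one failed file from leaking handles across a long batch.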
Adobe PDF Services API
Create a PDF from Microsoft Office documents, protect the content, and convert to other formats. Programmatically alter a document, such as reordering, inserting, and rotating pages, as well as compressing the file. Access the same cloud-based APIs that power Adobe's end-user applications to quickly deliver scalable, secure solutions. Extract text, images, tables, and more from native and scanned PDFs into a structured JSON file. PDF Extract API leverages AI technology to accurately identify text objects and understand the natural reading order of different elements such as headings, lists, and paragraphs spanning multiple columns or pages. Extract font styles with identification of metadata such as bold and italic text and their position within your PDF. The extracted content is output in a structured JSON file format with tables in CSV or XLSX and images saved as PNG.
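To give a feel for working with the structured JSON output described above, the sketch below pulls headings out of an already-downloaded result. It assumes the JSON has a top-level `elements` list whose items carry `Path`, `Text`, and `Page` fields; treat that shape, the inline sample data, and the `headings` helper as assumptions made for illustration, and check them against the actual output of your extraction job.

```python
import json
import re


def headings(structured):
    """Return (page, tag, text) for each heading element in reading order.

    Assumes each element has a "Path" such as "//Document/H1" or
    "//Document/Sect/H2[2]"; this shape is an assumption, not a guarantee.
    """
    found = []
    for el in structured.get("elements", []):
        last = el.get("Path", "").rsplit("/", 1)[-1]
        match = re.fullmatch(r"(H\d)(\[\d+\])?", last)
        text = el.get("Text")
        if match and text:
            found.append((el.get("Page"), match.group(1), text.strip()))
    return found


# Made-up sample data in the assumed shape, not real API output.
sample = json.loads("""
{"elements": [
  {"Path": "//Document/H1", "Text": "Overview ", "Page": 0},
  {"Path": "//Document/P", "Text": "Body text.", "Page": 0},
  {"Path": "//Document/Sect/H2[2]", "Text": "Details", "Page": 1},
  {"Path": "//Document/Figure", "Page": 1}
]}
""")

for page, tag, text in headings(sample):
    print(page, tag, text)
```

Filtering on the last path segment keeps the helper independent of how deeply a heading is nested in sections.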
Filestar
Do anything to any file, with tens of thousands of skills at your fingertips. Convert files in a few clicks, choosing from over 30,000 conversions across both common and unusual file formats, for single files or in bulk. Merge two or more files at once, including documents, video, audio, Visio, and many other file types. Split large multi-page files into several separate ones, for document and text formats such as .pdf, .doc, and .txt. Change or alter files: rotate, add filters, replace file names, add watermarks, add text to images, and much more, one file at a time or many at once. Compress files to reduce their size, with a wide selection of compression formats and zip options to choose from. Extract selected pages or elements from a document, or pull all images or text out of a file.