pdf2docx
pdf2docx is a Python library that uses PyMuPDF to extract data from PDF files, parse their layouts according to rules, and generate corresponding .docx files via python-docx. It supports conversion of text, images, tables, and other structural elements; it includes tools to extract tables, handle formatting, and preserve layout as much as possible. It offers both a command-line interface and a graphical user interface. The internal architecture is modular; it includes packages for handling pages, layout, tables, images, shape paths, text spans/blocks, and other elements, enabling fine control over how PDF content is mapped into Word documents. Developers can use the API for batch conversions or integrate it into workflows; there's documentation on installation (from PyPI or source), usage, and technical details of layout-parsing, table extraction, and internal modules. The project is open source, hosted on GitHub, and made available under its license with no warranty.
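The batch-conversion use mentioned above could look roughly like the following sketch. It assumes pdf2docx is installed from PyPI and uses its `Converter` class with `convert()` and `close()`. The helper names `docx_target` and `convert_folder` are hypothetical, introduced only for illustration.

```python
from pathlib import Path
from typing import List


def docx_target(pdf_path: Path, out_dir: Path) -> Path:
    """Hypothetical helper: map <name>.pdf to <out_dir>/<name>.docx."""
    return out_dir / f"{pdf_path.stem}.docx"


def convert_folder(pdf_dir: str, out_dir: str) -> List[Path]:
    """Convert every *.pdf in pdf_dir; return the .docx paths written."""
    # Imported lazily so docx_target() stays usable without pdf2docx installed.
    from pdf2docx import Converter

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        target = docx_target(pdf_path, out)
        cv = Converter(str(pdf_path))
        try:
            cv.convert(str(target))  # converts all pages by default
        finally:
            cv.close()  # release the underlying PDF handle
        written.append(target)
    return written
```

Calling `convert_folder("pdfs", "docx")` (both directory names are placeholders) would write one .docx per input PDF. Conversion quality depends on how well the layout rules fit each document.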
Adobe Acrobat
Adobe Acrobat is a versatile PDF solution that allows users to create, edit, and sign PDF documents seamlessly. Whether you're working on a desktop, mobile device, or online, Acrobat enables you to convert files to and from PDFs, edit text and images, and organize your documents with ease. The platform also offers advanced features like e-signatures, document protection, and PDF comparison, making it ideal for both personal and business use. Acrobat integrates with popular cloud storage services, allowing for easy document sharing and collaboration from anywhere.
Filestar
Do anything to any file, with tens of thousands of skills at your fingertips. Convert files in a few clicks, choosing from over 30,000 conversions across both common and unusual formats, for single files or in bulk. Merge documents, video, audio, Visio, and many other file types, one or many at once. Split large multi-page files in text formats such as .pdf, .doc, and .txt into separate parts. Alter files by rotating, adding filters, replacing file names, adding watermarks, adding text to images, and much more, one at a time or many at once. Compress files to reduce their size, with a wide selection of compression formats and zip options. Extract selected pages or elements from a document, such as all of its images or text.
PDFBox
The Apache PDFBox® library is an open-source Java tool for working with PDF documents. It allows the creation of new PDF documents, the manipulation of existing documents, and the extraction of content from documents, and it includes several command-line utilities. Apache PDFBox is published under the Apache License v2.0. With it you can extract Unicode text from PDF files; split a single PDF into many files or merge multiple PDF files; extract data from PDF forms or fill a PDF form; validate PDF files against the PDF/A-1b standard; print a PDF file using the standard Java printing API; create a PDF from scratch with embedded fonts and images; save PDFs as image files such as PNG or JPEG; and digitally sign PDF files. See also the export control information related to the encryption features included in Apache PDFBox.