Extracts plain text from a variety of different file types
TextExtractor extracts plain text from hundreds of different file types, storing the text extracted in suitably named text files.
TextExtractor 1.10 works in six different modes:
Instant Mode - Just select any file and extract the text from it.
Batch Mode - Select a group of files and extract the text from all of them in one go.
Polling Mode - Watch a folder location, processing new files as they appear there.
Hierarchical Mode - Extract Text from files in a directory...
ConcatPDF is a tool for concatenating PDF files. It can concatenate, extract, encrypt, decrypt, and configure PDF files, and convert image files to PDF. GUI and CUI versions are both available.
iText.NET is a port of iText to the .NET Framework, written in J#. This library allows you to generate PDF, (X)HTML, XML, and RTF files on the Microsoft .NET Framework, including ASP.NET.
NO LONGER MAINTAINED, NO LONGER SUPPORTED
Xena transforms files into open data formats for long-term digital preservation, encoding content in Base64 and wrapping it in XML metadata. Formats supported include MBOX, PST, MSG, DOC, XLS, PPT, RTF, PNG, XML, PDF, JPG, TIFF, PCX, WAV, MP3 and more.
CoolReader is a fast and small cross-platform XML/CSS-based eBook reader for desktops and handheld devices. Supported formats: FB2, TXT, RTF, DOC, TCR, HTML, EPUB, CHM, PDB, MOBI. Platforms: Win32, Linux, Android. It has also been ported to some eInk-based devices.
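The "Base64 inside an XML wrapper" idea described for Xena can be illustrated with a minimal Python sketch. This is not Xena's actual output format: the element names (`wrapper`, `metadata`, `filename`, `size`, `content`) and the function `wrap_binary` are hypothetical, chosen only to show the general technique.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(data: bytes, filename: str) -> str:
    """Wrap raw bytes in a small XML envelope (hypothetical schema)."""
    root = ET.Element("wrapper")

    # Descriptive metadata travels alongside the payload.
    meta = ET.SubElement(root, "metadata")
    ET.SubElement(meta, "filename").text = filename
    ET.SubElement(meta, "size").text = str(len(data))

    # Base64 keeps arbitrary binary content safe inside a text-only format.
    content = ET.SubElement(root, "content", encoding="base64")
    content.text = base64.b64encode(data).decode("ascii")

    return ET.tostring(root, encoding="unicode")


def unwrap_binary(xml_text: str) -> bytes:
    """Recover the original bytes from the envelope."""
    root = ET.fromstring(xml_text)
    return base64.b64decode(root.find("content").text)
```

Because the payload is plain Base64 text, the original bytes can be recovered later with nothing more than an XML parser and a Base64 decoder, which is the property that makes this kind of wrapping attractive for preservation.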
A RESTful/JSON Web Service for text and metadata extraction
An open source RESTful Web Service for text and metadata extraction and analysis.
oss-text-extractor supports various binary formats:
Word processor (doc, docx, odt, rtf)
Spreadsheet (xls, xlsx, ods)
Presentation (ppt, pptx, odp)
Publishing (pdf, pub)
Web (rss, html/xhtml)
Media (audio, images)
Others (vsd, text)
The goal of this project is to provide a reusable library to transform common file formats to content objects and ContentProvider plugins to common file repositories like Filesystem, CMIS and others for iQser GIN Semantic Middleware (www.iqser.com).
Regain is a Java search engine based on Jakarta Lucene. It indexes and searches files in many formats (HTML, XML, doc(x), xls(x), ppt(x), oo, PDF, RTF, mp3, mp4, Java). A TagLibrary eases integrating search results into your JSP-based web page.
NRtfTree library is a set of classes written entirely in C# which may be used to manage (read and write) RTF documents in your own applications.
A Java port of the library can be found at http://www.sgoliver.net/blog/?page_id=92
With MajiX you can automatically transform RTF files (Microsoft Word files) into XML. MajiX is Java compliant. You can convert headings, lists (numbered or not), tables, bold, italics, underline and more.
File Type Checker inspects a file's data to determine its actual file type. As of this writing, filetypechecker supports doc, rtf, xls, pdf, jpg, jpeg, and gif. More file support will be added soon.
Calenco is a Web collaborative platform that enables remote teams of writers, proofreaders, graphic designers, translators, etc. to produce XML documents together, such as user guides and security procedures.
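Detecting a type from file data rather than from the extension is commonly done by comparing the first bytes against known signatures ("magic numbers"). The Python sketch below illustrates that general technique only; it is not File Type Checker's code, and the `SIGNATURES` table and `sniff_type` function are hypothetical names. Note that legacy doc and xls files share the same OLE2 container header, so the header alone cannot tell them apart.

```python
# Hypothetical signature table: (leading bytes, label).
# The byte patterns are the well-known file headers for each format.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"{\\rtf", "rtf"),
    (b"\xff\xd8\xff", "jpg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    # OLE2 compound file: container used by legacy .doc and .xls alike.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2 (doc/xls)"),
]


def sniff_type(data: bytes) -> str:
    """Return a label for the first signature the data starts with."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the leading bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_type(handle.read(16))
```

A file renamed from `report.pdf` to `report.jpg` would still be reported as `pdf` by `sniff_file`, since only the content is examined.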
JODConverter automates conversions between office document formats using OpenOffice.org. Supported formats include OpenDocument, PDF, RTF, Word, Excel, PowerPoint, and Flash. It can be used as a Java library, a command line tool, or a Web application.
Includes tools for creating ebooks in XML format. xTrans helps create an XML ebook from text sources such as RTF and TXT, and converts XML ebooks into a final format such as PDF, HTML, RTF, PDB (various forms), ...
TagPrint is a DOM serialization library. It prints DOM documents in various formats, such as XML, HTML, PDF, RTF, etc., and makes writing these documents very easy.