DocWire SDK, a C++17/20 data processing tool, has received an award from SourceForge and backing from Microsoft. It handles nearly 100 file types and supports text extraction, web data extraction, and document analysis.
Planned adoption of C++17 and C++20 features is expected to bring more advanced functionality, particularly in HTTP capabilities and web data extraction.
For businesses, DocWire SDK offers comprehensive document format support and the ability to extract insights from mailboxes, databases, and websites using AI.
DocWire SDK aims to expand its capabilities, focusing on versatile data extraction, platform support, and seamless integration with various systems.
DocWire SDK is dedicated to streamlining data processing, reducing development time and costs, and harnessing the potential of AI. Its advancements promise a superior experience compared to its predecessor, DocToText.

Features

  • Able to extract/import and export text, images, formatting, and metadata along with annotations
  • Data can be transformed between import and export (filtering, aggregating, etc.)
  • Equipped with multiple importers:
      • Microsoft Office new Office Open XML (OOXML): DOCX, XLSX, PPTX files
      • Microsoft Office old binary formats: DOC, XLS, XLSB, PPT files
      • OpenOffice/LibreOffice Open Document Format (ODF): ODT, ODS, ODP files
      • Portable Document Format: PDF files
      • Webpages: HTML, HTM, and CSS files
      • Rich Text Format: RTF files
      • Email formats with attachments: EML files, MS Outlook PST, OST files
      • Image formats: JPG, JPEG, JFIF, BMP, PNM, PNG, TIFF, WEBP with OCR capabilities
      • Apple iWork: PAGES, NUMBERS, KEYNOTE files
      • ODFXML (FODP, FODS, FODT)
      • Archives (ZIP, TAR, RAR, GZ, BZ2, XZ)
      • Scripts and source codes: ASM, ASP, ASPX, BAS, BAT, C, CC, CMAKE, CS, CPP, CXX, D, F, FPP, FS, GO, H, HPP, HXX, JAVA, JS, JSP, LUA, PAS, PHP, PL, PERL, PY, R, SH, TCL, VB, VBS, WS files
      • XML format family: XML, XSD, XSL files
      • Comma-Separated Values: CSV files
      • Other structured text formats: JSON, YML, YAML, RSS, CONF files
      • Other unstructured text formats: MD, LOG files
      • DICOM (DCM) as an additional commercial plugin
  • Equipped with multiple exporters:
      • Plain text: Easily extract and export text content.
      • HTML: Export content in HTML format for web use.
      • CSV: Export data to Comma-Separated Values format.
      • XLSX and more are coming: Additional export formats for diverse use cases.
  • Facilitate seamless communication with external HTTP APIs or services, enabling data exchange and integration with external systems
  • Integration with OpenAI API:
      • TranslateTo: Translate text to different languages.
      • Summarize: Generate summarized content from longer texts.
      • ExtractEntities: Extract entities and key information from text.
      • Classify: Perform text classification and categorization.
      • ExtractKeywords: Identify and extract keywords from text.
      • DetectSentiment: Analyze and detect sentiment in text.
      • AnalyzeData: Perform data analysis on text content.
      • Chat: Conduct chat-based interactions and conversations.
  • Equipped with a high-grade, scriptable, and trainable OCR engine whose character recognition is based on LSTM neural networks
  • Incremental parsing returning data as soon as they are available
  • Cross-platform: Linux, Windows, macOS (with more to come)
  • Can be embedded in your application (SDK)
  • Can be integrated with other data mining and data analytics applications
  • The parsing process can be designed by connecting objects into a chain with the pipe (|) operator
  • Communication between parsing chain elements is based on Boost Signals
  • Custom parsing chain elements can be added (importers, transformers, exporters)
  • Small binaries, fast native C++ code
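
The chaining idea in the list above (importers, transformers, and exporters connected with the | operator, with data delivered incrementally) can be illustrated with a minimal, self-contained C++17 sketch. Every name in it (Producer, Transformer, Exporter, from_chunks, run_example_chain) is hypothetical and is not part of the DocWire SDK API. The sketch also uses plain std::function callbacks as a stand-in, whereas the SDK is described as using Boost Signals.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; NOT the DocWire SDK API.
using Consumer = std::function<void(const std::string&)>;

// Stands in for an importer: pushes chunks to a consumer one at a time.
struct Producer {
    std::function<void(const Consumer&)> emit;
};

// Stands in for a transformer: maps one chunk to another.
struct Transformer {
    std::function<std::string(const std::string&)> apply;
};

// Stands in for an exporter: receives each finished chunk.
struct Exporter {
    Consumer write;
};

// Build a producer from in-memory chunks (a real importer would parse a file).
inline Producer from_chunks(std::vector<std::string> chunks) {
    return Producer{[chunks = std::move(chunks)](const Consumer& next) {
        for (const auto& chunk : chunks) next(chunk);
    }};
}

// Producer | Transformer yields a new producer whose chunks are transformed.
inline Producer operator|(Producer p, Transformer t) {
    assert(p.emit && t.apply);
    return Producer{[p = std::move(p), t = std::move(t)](const Consumer& next) {
        p.emit([&](const std::string& chunk) { next(t.apply(chunk)); });
    }};
}

// Producer | Exporter runs the chain; each chunk is exported as it is emitted.
inline void operator|(const Producer& p, const Exporter& e) {
    assert(p.emit && e.write);
    p.emit(e.write);
}

// Example chain: two chunks, upper-cased, collected into one string.
inline std::string run_example_chain() {
    std::string out;
    Transformer upper{[](const std::string& s) {
        std::string r = s;
        std::transform(r.begin(), r.end(), r.begin(), [](unsigned char ch) {
            return static_cast<char>(std::toupper(ch));
        });
        return r;
    }};
    Exporter to_string{[&out](const std::string& s) {
        out += s;
        out += '\n';
    }};
    from_chunks({"first page", "second page"}) | upper | to_string;
    return out;
}
```

Calling run_example_chain() returns the two chunks upper-cased, one per line. Each chunk reaches the exporter as soon as the producer emits it, which mirrors the incremental behavior described in the feature list.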

License

GNU General Public License version 2.0 (GPLv2)

User Ratings

5 stars: 1
4 stars: 0
3 stars: 0
2 stars: 1
1 star: 0

Ease: 3 / 5
Features: 3 / 5
Design: 3 / 5
Support: 3 / 5

User Reviews

  • hard to build. Hard to use.
  • Great software if set up correctly!

Additional Project Details

Operating Systems

Linux, Mac, Windows

Intended Audience

Advanced End Users, Developers, End Users/Desktop

User Interface

Command-line

Programming Language

C++

Related Categories

C++ Text Processing Software, C++ Libraries, C++ Data Recovery Software, C++ OCR Software, C++ Data Analytics Tool

Registered

2008-10-29