Apache Parquet
The Apache Software Foundation

DeepSeek-OCR
DeepSeek

About

We created Parquet to make the advantages of compressed, efficient columnar data representation available to any project in the Hadoop ecosystem. Parquet is built from the ground up with complex nested data structures in mind, and uses the record shredding and assembly algorithm described in the Dremel paper. We believe this approach is superior to simple flattening of nested namespaces. Parquet is built to support very efficient compression and encoding schemes. Multiple projects have demonstrated the performance impact of applying the right compression and encoding scheme to the data. Parquet allows compression schemes to be specified on a per-column level, and is future-proofed to allow adding more encodings as they are invented and implemented. Parquet is built to be used by anyone. The Hadoop ecosystem is rich with data processing frameworks, and we are not interested in playing favorites.
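As a rough illustration of the record shredding and assembly idea mentioned above, the sketch below stripes one repeated field into a single column annotated with repetition and definition levels, then rebuilds the original records from that column. It is a simplified, hypothetical example for a single top-level repeated string field (called `tags` here, a name chosen for illustration), not Parquet's actual implementation or file layout.

```python
from typing import List, Optional, Tuple

# One column entry: (value, repetition_level, definition_level).
# For a single top-level repeated field, both levels max out at 1.
Entry = Tuple[Optional[str], int, int]


def shred_tags(records: List[List[str]]) -> List[Entry]:
    """Shred each record's repeated `tags` field into one flat column."""
    column: List[Entry] = []
    for tags in records:
        if not tags:
            # Empty list: emit a placeholder whose definition level 0
            # says "the field is not defined in this record".
            column.append((None, 0, 0))
            continue
        for i, tag in enumerate(tags):
            # Repetition level 0 starts a new record; 1 continues the list.
            column.append((tag, 0 if i == 0 else 1, 1))
    return column


def assemble_tags(column: List[Entry]) -> List[List[str]]:
    """Rebuild the original records from the shredded column."""
    records: List[List[str]] = []
    for value, rep_level, def_level in column:
        if rep_level == 0:
            records.append([])          # a new record begins here
        if def_level == 1 and value is not None:
            records[-1].append(value)   # a real value, not a placeholder
    return records


records = [["a", "b"], [], ["c"]]
column = shred_tags(records)
restored = assemble_tags(column)
```

Because the levels record where each record starts and whether a value exists, the column can be stored and compressed on its own without losing the nested structure. Deeper nesting would need higher level values, which this sketch does not cover.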

About

DeepSeek-OCR is an open source model for Contexts Optical Compression, built to explore the boundaries of visual-text compression and investigate the role of vision encoders from an LLM-centric viewpoint. It is designed to compress long contexts through optical 2D mapping, using DeepEncoder as the core engine and DeepSeek3B-MoE-A570M as the decoder. DeepEncoder maintains low activations under high-resolution input while achieving high compression ratios, keeping the number of vision tokens manageable for document understanding. The model supports OCR and document parsing workflows for images and PDFs, with inference through vLLM or Transformers. Users can run image OCR with streaming output, process PDFs with high concurrency, or run batch evaluation for benchmarks. DeepSeek-OCR can convert documents to Markdown, perform free OCR without layouts, parse figures, describe images in detail, and locate referenced text inside an image.
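To make the notion of a compression ratio concrete, here is a minimal arithmetic sketch. It assumes the ratio is taken as the number of text tokens on a page divided by the number of vision tokens used to represent that page. The function name and the token counts are hypothetical and are not measured results for DeepSeek-OCR.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Text tokens represented per vision token (assumed definition)."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


# Hypothetical page: 1,000 text tokens encoded as 100 vision tokens.
ratio = compression_ratio(text_tokens=1000, vision_tokens=100)
```

A higher ratio means fewer vision tokens are spent per page of text, which is the trade-off the model is described as exploring.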

Platforms Supported

Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook

Platforms Supported

Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook

Audience

Anyone who needs a columnar storage format that works with any project in the Hadoop ecosystem

Audience

AI researchers and document-processing engineers who need an open OCR model for efficient document parsing, Markdown conversion, and vision-text compression experiments

Support

Phone Support
24/7 Live Support
Online

Support

Phone Support
24/7 Live Support
Online

API

Offers API

API

Offers API

Pricing

No information available.
Free Version
Free Trial

Pricing

Free
Free Version
Free Trial

Reviews/Ratings

Overall 0.0 / 5
ease 0.0 / 5
features 0.0 / 5
design 0.0 / 5
support 0.0 / 5

This software hasn't been reviewed yet.

Reviews/Ratings

Overall 0.0 / 5
ease 0.0 / 5
features 0.0 / 5
design 0.0 / 5
support 0.0 / 5

This software hasn't been reviewed yet.

Training

Documentation
Webinars
Live Online
In Person

Training

Documentation
Webinars
Live Online
In Person

Company Information

The Apache Software Foundation
Founded: 1999
United States
parquet.apache.org

Company Information

DeepSeek
Founded: 2023
China
github.com/deepseek-ai/DeepSeek-OCR

Alternatives

Apache Iceberg
Apache Software Foundation

Alternatives

GLM-OCR
Z.ai

DeepSeek-VL
DeepSeek

DeepSeek-V2
DeepSeek

DeepSeek-V4
DeepSeek

Integrations

3LC
APERIO DataWise
Amazon Data Firehose
CSViewer
DeepSeek
Gravity Data
IBM Db2 Event Store
Indexima Data Hub
MLJAR Studio
Mage Platform
PuppyGraph
Querri
Semarchy xDI
Sliq
StarfishETL
Streamkap
Tad
Tictable
Timeplus
e6data

Integrations

3LC
APERIO DataWise
Amazon Data Firehose
CSViewer
DeepSeek
Gravity Data
IBM Db2 Event Store
Indexima Data Hub
MLJAR Studio
Mage Platform
PuppyGraph
Querri
Semarchy xDI
Sliq
StarfishETL
Streamkap
Tad
Tictable
Timeplus
e6data