Showing 71 open source projects for "simd"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 1
    SIMD

    SIMD

    C++ wrappers for SIMD intrinsics

    SIMD is a C++ library that provides portable abstractions over SIMD (Single Instruction, Multiple Data) instructions, enabling developers to write high-performance vectorized code without dealing directly with architecture-specific intrinsics. SIMD instructions allow a single operation to be applied to multiple data elements simultaneously, significantly accelerating numerical and data-parallel computations.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    Simd Library

    Simd Library

    C++ image processing and machine learning library with using of SIMD

    ...The algorithms are optimized with using of different SIMD CPU extensions. In particular, the library supports the following CPU extensions: SSE, AVX, AVX-512, and AMX for x86/x64, and NEON for ARM. The Simd Library has C API and also contains useful C++ classes and functions to facilitate access to C API. The library supports dynamic and static linking, 32-bit and 64-bit Windows and Linux, MSVS, G++ and Clang compilers, MSVS projects, and CMake build systems.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    HLSL++

    HLSL++

    Math library using HLSL syntax with multiplatform SIMD support

    ...It provides vector, matrix, and math operations with a syntax identical or very similar to HLSL, allowing seamless transition between shader code and application code. The library is optimized for performance and supports SIMD instructions across multiple architectures, including SSE, AVX, AVX2, AVX512, and ARM NEON, ensuring high efficiency on modern hardware. It also extends beyond standard HLSL capabilities by introducing additional features such as quaternion support, advanced matrix operations, and extended vector types like float8. The library is particularly valuable for game developers who need consistency between CPU and GPU computations, reducing errors and improving maintainability.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Google Highway

    Google Highway

    Performance-portable, length-agnostic SIMD with runtime dispatch

    Google Highway is a high-performance C++ library designed to provide portable SIMD (Single Instruction, Multiple Data) vectorization across multiple CPU architectures while maintaining predictable and efficient behavior. It abstracts low-level vector intrinsics into a consistent API that maps closely to hardware instructions, allowing developers to write high-performance code without relying heavily on compiler auto-vectorization.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    More flexibility. More control.

    Generate interest, access liquidity without selling, and execute trades seamlessly. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 5
    LoopVectorization.jl

    LoopVectorization.jl

    Macro(s) for vectorizing loops

    LoopVectorization.jl is a Julia package for accelerating numerical loops by automatically applying SIMD (Single Instruction, Multiple Data) vectorization and other low-level optimizations. It analyzes loops and generates highly efficient code that leverages CPU vector instructions, making it ideal for performance-critical computing in fields such as scientific computing, signal processing, and machine learning.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Datalevin

    Datalevin

    A simple, fast and versatile Datalog database

    Datalevin is an open-source Datalog-based database written in Clojure that runs natively on top of LMDB. It supports full ACID transactions, schema-less EDN data storage, vector search with SIMD acceleration, and text-based querying. Usable as an embedded library or as a client/server database with RBAC access, it acts like SQLite or Datomic on a ledger of immutable datoms, plus modern features like vector and full-text search.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    ispc

    ispc

    Intel SPMD Program Compiler

    ...Under the SPMD model, the programmer writes a program that generally appears to be a regular serial program, though the execution model is actually that a number of program instances execute in parallel on the hardware. ispc compiles a C-based SPMD programming language to run on the SIMD units of CPUs and GPUs; it frequently provides a 3x or more speedup on architectures with 4-wide vector SSE units and 5x-6x on architectures with 8-wide AVX vector units, without any of the difficulty of writing intrinsics code. Parallelization across multiple cores is also supported by ispc, making it possible to write programs that achieve performance improvement that scales by both numbers of cores and vector unit size. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Claude-Flow

    Claude-Flow

    The leading agent orchestration platform for Claude

    ...The platform supports both quick swarm tasks and persistent multi-agent sessions known as hives, facilitating distributed AI collaboration with persistent contextual memory. At its core, Claude-Flow integrates Dynamic Agent Architecture (DAA) for self-organizing agent management, neural pattern recognition accelerated by WebAssembly SIMD, and a SQLite-based memory system for context retention and knowledge persistence across tasks. It automates development workflows via pre- and post-operation hooks, providing seamless coordination, code formatting, validation, and performance optimization.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 9
    node-rs

    node-rs

    Node.js bindings Rust crates

    When Node.js meets Rust. Make rust crates binding to Node.js use napi-rs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Stop vibe-debugging. Icon
    Stop vibe-debugging.

    Plug Claude into your app's actual errors.

    AppSignal's MCP server hands Claude, Cursor, or Zed your real errors, traces, and the deploy that shipped them. AI writes the fix; you review the diff.
    Free 30 days.
  • 10
    QSV

    QSV

    Blazing-fast Data-Wrangling toolkit

    qsv is a fast, command-line CSV data toolkit written in Rust that extends the capabilities of xsv. It’s designed to make working with CSV files at scale easy and efficient, offering over 40 powerful subcommands for tasks like querying, sampling, splitting, deduplicating, and more. qsv is ideal for data engineers, analysts, and developers who need high-performance CSV manipulation on the command line.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 11
    torchvision

    torchvision

    Datasets, transforms and models specific to Computer Vision

    The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision. We recommend Anaconda as Python package management system. Torchvision currently supports Pillow (default), Pillow-SIMD, which is a much faster drop-in replacement for Pillow with SIMD, if installed will be used as the default. Also, accimage, if installed can be activated by calling torchvision.set_image_backend('accimage'), libpng, which can be installed via conda conda install libpng or any of the package managers for debian-based and RHEL-based Linux distributions, and libjpeg, which can be installed via conda conda install jpeg or any of the package managers for debian-based and RHEL-based Linux distributions. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Sonic JSON

    Sonic JSON

    A blazingly fast JSON serializing & deserializing library

    A blazingly fast JSON serializing & deserializing library, accelerated by JIT (just-in-time compiling) and SIMD (single-instruction-multiple-data).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Polars

    Polars

    Dataframes powered by a multithreaded, vectorized query engine

    Polars is a high-performance, multi-language DataFrame library built in Rust using Apache Arrow. It delivers blazing-fast, vectorized, and parallel data manipulation with both eager and lazy execution, making it an excellent tool for data processing in Python, Rust, Node.js, R, and SQL contexts.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    ReverseDiff

    ReverseDiff

    Reverse Mode Automatic Differentiation for Julia

    ReverseDiff is a fast and compile-able tape-based reverse mode automatic differentiation (AD) that implements methods to take gradients, Jacobians, Hessians, and higher-order derivatives of native Julia functions (or any callable object, really). While performance can vary depending on the functions you evaluate, the algorithms implemented by ReverseDiff generally outperform non-AD algorithms in both speed and accuracy.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    Compute Library

    Compute Library

    The Compute Library is a set of computer vision and machine learning

    The Compute Library is a set of computer vision and machine learning functions optimized for both Arm CPUs and GPUs using SIMD technologies. The library provides superior performance to other open-source alternatives and immediate support for new Arm® technologies e.g. SVE2.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    Numba

    Numba

    NumPy aware dynamic Python compiler using LLVM

    Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN. You don't need to replace the Python interpreter, run a separate compilation step, or even have a C/C++ compiler installed. Just apply one of the Numba decorators to your...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    wllama

    wllama

    WebAssembly binding for llama.cpp - Enabling on-browser LLM inference

    ...Built as a binding for the llama.cpp inference engine, the project allows developers to run LLM models locally without requiring a server backend or dedicated GPU hardware. The library leverages WebAssembly SIMD capabilities to achieve efficient execution within modern browsers while maintaining compatibility across platforms. By running models locally on the user’s device, wllama enables privacy-preserving AI applications that do not require sending data to remote servers. The framework provides both high-level APIs for common tasks such as text generation and embeddings, as well as low-level APIs that expose tokenization, sampling controls, and model state management.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    StringZilla

    StringZilla

    10x faster string search, split, sort, and shuffle for long strings

    ...It matches the first few letters of words with hyper-scalar code to achieve memcpy speeds. The implementation fits into a single C 99 header file and uses different SIMD flavors and SWAR on older platforms. The Str is designed to replace long Python str strings and wrap our C-level API. On the other hand, the File memory-maps a file from persistent memory without loading its copy into RAM. The contents of that file would remain immutable, and the mapping can be shared by multiple Python processes simultaneously. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    simdjson

    simdjson

    Parsing gigabytes of JSON per second

    JSON is everywhere on the Internet. Servers spend a *lot* of time parsing it. We need a fresh approach. The simdjson library uses commonly available SIMD instructions and microparallel algorithms to parse JSON 4x faster than RapidJSON and 25x faster than JSON for Modern C++. The simdjson library uses three-quarters less instructions than state-of-the-art parser RapidJSON. To our knowledge, simdjson is the first fully-validating JSON parser to run at gigabytes per second (GB/s) on commodity processors. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    XNNPACK

    XNNPACK

    High-efficiency floating-point neural network inference operators

    ...Rather than serving as a standalone ML framework, XNNPACK provides high-performance computational primitives—such as convolutions, pooling, activation functions, and arithmetic operations—that are integrated into higher-level frameworks like TensorFlow Lite, PyTorch Mobile, ONNX Runtime, TensorFlow.js, and MediaPipe. The library is written in C/C++ and designed for maximum portability, efficiency, and performance, leveraging platform-specific instruction sets (e.g., NEON, AVX, SIMD) for optimized execution. It supports NHWC tensor layouts and allows flexible striding along the channel dimension to efficiently handle channel-split and concatenation operations without additional cost.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    Zerocopy

    Zerocopy

    Zerocopy makes zero-cost memory manipulation effortless

    Zerocopy is a Rust library designed to make zero-cost memory manipulation both safe and effortless. It allows developers to reinterpret or convert raw byte sequences into structured types—and vice versa—without writing unsafe code directly. The crate provides safe abstractions for transmuting data while preserving Rust’s strict safety guarantees, removing the need for manual memory manipulation. Zerocopy introduces a suite of conversion traits such as TryFromBytes, FromBytes, IntoBytes, and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    OpenGL Mathematics

    OpenGL Mathematics

    Highly Optimized Graphics Math (glm) for C

    Highly optimized 2D|3D math library, also known as OpenGL Mathematics (glm) for `C`. cglm provides lot of utils to help math operations to be fast and quick to write. It is community-friendly, feel free to bring any issues, bugs you faced. Almost all functions (inline versions) and parameters are documented inside the corresponding headers. OpenGL-related functions are dropped to make this lib platform/third-party independent. Make sure you have the latest version and feel free to report...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23

    UniSIMD-assembler

    SIMD macro assembler unified for ARM, MIPS, PPC and x86

    UniSIMD assembler is a high-level C/C++ macro assembler framework unified across ARM, MIPS, POWER and x86 architectures. It establishes a subset of both BASE and SIMD instruction sets with clearly defined common API, so that application logic can be written and maintained in one place without code replication. The assembler itself isn't a separate tool, but rather a collection of C/C++ header files, which applications need to include directly in order to use. At present, Intel SSE/SSE2/SSE4 and AVX/AVX2/AVX-512 (32/64-bit x86 ISAs), ARMv7 NEON/NEONv2, ARMv8 AArch32 and AArch64 NEON, SVE (32/64-bit ARM ISAs), MIPS 32/64-bit r5/r6 MSA and POWER 32/64-bit VMX/VSX (little/big-endian ISAs) are mostly implemented (/w horizontal reductions) although scalar improvements, wider SIMD vectors with zeroing/merging predicates in 3/4-operand instructions are planned as extensions to current 2/3-operand SPMD-driven vertical SIMD ISA. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24

    QuadRay-engine

    Realtime raytracer using SIMD on ARM, MIPS, PPC and x86

    QuadRay engine is a realtime raytracing project aimed at full SIMD utilization on ARM, MIPS, POWER and x86 architectures. The efficient use of SIMD is achieved by processing four rays at a time to match SIMD register width (hence the name). The rendering core of the engine is written in a unified SIMD assembler allowing single assembler code to be compatible with different processor architectures, thus reducing the need to maintain multiple parallel versions. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25
    Ultralight

    Ultralight

    Lightweight, high-performance HTML renderer for game developers

    ...Official API for C and C++, with bindings for more. Render web-content on the GPU via Direct3D, Metal, OpenGL, or your own engine for unmatched visual performance. Render web-content on the CPU via SIMD/parallel for incredibly easy integration with any environment (including server-side!). Ultralight is engineered for peak performance, ensuring minimal CPU and memory usage. Customize low-level platform functionality, integrate JavaScript directly with native code, dive deep into performance tuning, and more. Built for maximum portability, optimized for PCs, game consoles, TVs, and embedded systems.
    Downloads: 4 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
Auth0 Logo