Showing 103 open source projects for "hadoop"

  • 1
    ClickHouse

    A fast open-source OLAP database management system

    ClickHouse® is a fast, open-source, column-oriented database management system that generates analytical reports from SQL queries in real time. In several independent benchmarks it outperforms comparable column-oriented database management systems, in some cases by up to 1,000 times. It can process hundreds of millions to more than a billion rows, and tens of gigabytes of data, per server per second. Apart from its speed, ClickHouse is highly...
    Downloads: 33 This Week
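
    A minimal sketch of querying ClickHouse from Python. It assumes the third-party clickhouse-driver package and a server on the default native TCP port; the host, table name, and schema are placeholders.

        from datetime import datetime
        from clickhouse_driver import Client  # community client, not part of ClickHouse itself

        client = Client(host="localhost")  # default native port 9000 assumed

        client.execute(
            "CREATE TABLE IF NOT EXISTS hits (ts DateTime, url String) "
            "ENGINE = MergeTree ORDER BY ts"
        )
        client.execute("INSERT INTO hits (ts, url) VALUES", [(datetime.now(), "/index")])
        print(client.execute("SELECT url, count() FROM hits GROUP BY url"))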
  • 2
    Apache HBase

    Get random, realtime read/write access to your Big Data

    ... HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. It offers a Thrift gateway and a RESTful web service supporting XML, Protobuf, and binary data encodings; export of metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX; and convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
    Downloads: 6 This Week
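
    A small sketch of talking to the Thrift gateway mentioned above from Python. It assumes the third-party happybase client, a Thrift server on its default port 9090, and an existing table named metrics with a cf column family.

        import happybase  # third-party client for the HBase Thrift gateway

        conn = happybase.Connection("hbase-thrift-host", port=9090)  # default Thrift port
        table = conn.table("metrics")               # table assumed to exist
        table.put(b"row-1", {b"cf:value": b"42"})   # random write by row key
        print(table.row(b"row-1"))                  # random read by row key
        conn.close()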
  • 3
    ANTLR

    Parser generator to read, process, or translate structured text

    ... and Pig, the data warehouse and analysis systems for Hadoop, both use ANTLR. Lex Machina uses ANTLR for information extraction from legal texts. Oracle uses ANTLR within SQL Developer IDE and their migration tools. NetBeans IDE parses C++ with ANTLR. The HQL language in the Hibernate object-relational mapping framework is built with ANTLR.
    Downloads: 11 This Week
  • 4
    syslog-ng

    Log management solution that improves the performance of SIEM

    ... to Hadoop, Elasticsearch, MongoDB, and Kafka, as well as many others. syslog-ng flexibly routes log data from a wide range of sources to a wide range of destinations. Instead of deploying multiple agents on hosts, organizations can unify their log data collection and management. syslog-ng Store Box provides automated archiving, tamper-proof encrypted storage, and granular access controls to protect log data. The largest appliance can store up to 10 TB of raw logs.
    Downloads: 6 This Week
  • 5
    Apache Impala

    Impala provides low latency and high concurrency for BI/analytic queries on the Hadoop ecosystem, including Iceberg, open data formats, and most cloud storage options. Impala also scales linearly, even in multitenant environments. Impala is integrated with native Hadoop security and Kerberos for authentication, and via the Ranger module, you can ensure that the right users and applications are authorized for the right data. Utilize the same file and data formats and metadata, security...
    Downloads: 0 This Week
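
    A minimal sketch of running a BI-style query against Impala from Python. It assumes the third-party impyla client, an impalad on the default HiveServer2-compatible port 21050, and a table named web_logs; all of these are placeholders.

        from impala.dbapi import connect  # impyla, a DB-API client for Impala

        conn = connect(host="impala-host", port=21050)
        cur = conn.cursor()
        cur.execute("SELECT status_code, count(*) FROM web_logs GROUP BY status_code")
        for row in cur.fetchall():
            print(row)
        conn.close()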
  • 6
    Apache Phoenix

    Mirror of Apache Phoenix

    Apache Phoenix is a SQL skin over HBase, delivered as a client-embedded JDBC driver targeting low-latency queries over HBase data. Apache Phoenix enables OLTP and operational analytics in Hadoop for low-latency applications by combining the best of both worlds: the power of standard SQL and JDBC APIs with full ACID transaction capabilities, and the flexibility of late-bound, schema-on-read access from the NoSQL world, using HBase as its backing store. Apache Phoenix is fully...
    Downloads: 0 This Week
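
    A brief sketch of the same SQL-over-HBase idea from Python. It uses the phoenixdb package, which talks to the Phoenix Query Server rather than the client-embedded JDBC driver described above; the URL and port 8765 are the assumed Query Server defaults.

        import phoenixdb  # Python driver for the Phoenix Query Server

        conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
        cur = conn.cursor()
        cur.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name VARCHAR)")
        cur.execute("UPSERT INTO users VALUES (?, ?)", (1, "alice"))  # Phoenix writes use UPSERT
        cur.execute("SELECT id, name FROM users")
        print(cur.fetchall())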
  • 7
    SageMaker Spark

    A Spark library for Amazon SageMaker

    ... trained models, and, if you have your own ML algorithms built into SageMaker compatible Docker containers, you can use SageMaker Spark to train and infer on DataFrames with your own algorithms -- all at Spark scale. SageMaker Spark depends on hadoop-aws-2.8.1. To run Spark applications that depend on SageMaker Spark, you need to build Spark with Hadoop 2.8. However, if you are running Spark applications on EMR, you can use Spark built with Hadoop 2.7.
    Downloads: 0 This Week
  • 8
    HugeGraph

    A graph database that supports more than 100 billion data records

    HugeGraph is a convenient, efficient, and adaptable graph database compatible with the Apache TinkerPop3 framework and the Gremlin query language. HugeGraph offers fast import of graphs with more than 10 billion vertices and edges, millisecond-level OLTP query capability, and can be integrated with big data platforms like Hadoop or Spark for OLAP analysis. Its main scenarios include correlation search, fraud detection, and knowledge graphs. It not only supports...
    Downloads: 1 This Week
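
    Because HugeGraph is TinkerPop3-compatible, a Gremlin traversal from Python is one way to exercise its OLTP queries. This sketch assumes the gremlinpython package and a HugeGraph server exposing a Gremlin WebSocket endpoint on port 8182 with a traversal source named g; those endpoint details are assumptions.

        from gremlin_python.process.anonymous_traversal import traversal
        from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

        conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")  # endpoint assumed
        g = traversal().withRemote(conn)

        g.addV("person").property("name", "alice").iterate()      # write one vertex
        print(g.V().hasLabel("person").values("name").toList())   # read it back
        conn.close()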
  • 9
    Luigi

    Python module that helps you build complex pipelines of batch jobs

    Luigi is a Python (3.6, 3.7, 3.8, 3.9 tested) package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more. The purpose of Luigi is to address all the plumbing typically associated with long-running batch processes. You want to chain many tasks, automate them, and failures will happen. These tasks can be anything, but are typically long running things like Hadoop...
    Downloads: 1 This Week
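
    A minimal sketch of a Luigi pipeline: one task whose output feeds another, with dependency resolution handled by Luigi. File names are arbitrary, and the local scheduler keeps the example self-contained.

        import luigi

        class MakeInput(luigi.Task):
            def output(self):
                return luigi.LocalTarget("numbers.txt")

            def run(self):
                with self.output().open("w") as f:
                    f.write("\n".join(str(i) for i in range(10)))

        class SumNumbers(luigi.Task):
            def requires(self):
                return MakeInput()  # dependency resolved automatically

            def output(self):
                return luigi.LocalTarget("total.txt")

            def run(self):
                with self.input().open() as f:
                    total = sum(int(line) for line in f if line.strip())
                with self.output().open("w") as f:
                    f.write(str(total))

        if __name__ == "__main__":
            luigi.build([SumNumbers()], local_scheduler=True)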
  • 10
    Apache Drill

    Apache Drill is a distributed MPP query layer for self-describing data

    Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel. Get faster insights without the overhead (data loading, schema creation and maintenance, transformations, etc.). Analyze the multi-structured and nested data in non-relational datastores directly without transforming or restricting the data. Leverage your existing SQL skillsets and BI tools including Tableau...
    Downloads: 0 This Week
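
    A small sketch of Drill's schema-free querying over its REST API. It assumes a drillbit with the web/REST interface on the default port 8047; cp.`employee.json` is a sample file Drill ships on its classpath.

        import requests

        payload = {
            "queryType": "SQL",
            "query": "SELECT full_name, position_title FROM cp.`employee.json` LIMIT 3",
        }
        resp = requests.post("http://localhost:8047/query.json", json=payload)
        resp.raise_for_status()
        print(resp.json())  # response includes the result rows and column metadata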
  • 11
    Apache Hudi

    Upserts, Deletes And Incremental Processing on Big Data

    Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics. Hudi provides...
    Downloads: 0 This Week
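
    A rough PySpark sketch of writing a Hudi table, patterned on the project's quickstart. It assumes a Spark session launched with a matching hudi-spark bundle on the classpath; the table name, record key, precombine field, and paths are placeholders.

        from pyspark.sql import SparkSession

        # Typically launched via:
        #   spark-submit --packages org.apache.hudi:hudi-spark3-bundle_2.12:<version> ...
        spark = (SparkSession.builder
                 .appName("hudi-upsert-sketch")
                 .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                 .getOrCreate())

        df = spark.createDataFrame([("id-1", "2024-01-01", 9.5)], ["uuid", "ts", "fare"])

        hudi_options = {
            "hoodie.table.name": "trips",
            "hoodie.datasource.write.recordkey.field": "uuid",
            "hoodie.datasource.write.precombine.field": "ts",
        }
        (df.write.format("hudi")
           .options(**hudi_options)
           .mode("overwrite")   # later writes in "append" mode upsert by record key
           .save("/tmp/hudi/trips"))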
  • 12
    Genie

    Distributed Big Data Orchestration Service

    Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides REST-ful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them.
    Downloads: 0 This Week
  • 13
    SeaweedFS

    Distributed storage system for blobs, objects, files, and data lake

    SeaweedFS is a distributed storage system for blobs, objects, files, and data lake, to store and serve billions of files fast! Blob store has O(1) disk seek, local tiering, cloud tiering. Filer supports cross-cluster active-active replication, Kubernetes, POSIX, S3 API, encryption, Erasure Coding for warm storage, FUSE mount, Hadoop, WebDAV. SeaweedFS is an independent Apache-licensed open source project with its ongoing development made possible because of the community. SeaweedFS is a simple...
    Downloads: 1 This Week
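
    One way to exercise the S3 API mentioned above from Python is boto3 pointed at SeaweedFS's S3 gateway. The endpoint URL, port 8333, and credentials here are assumptions; match them to how the gateway is actually configured.

        import boto3

        s3 = boto3.client(
            "s3",
            endpoint_url="http://localhost:8333",   # weed s3 gateway, port assumed
            aws_access_key_id="any",
            aws_secret_access_key="any",
        )
        s3.create_bucket(Bucket="logs")
        s3.put_object(Bucket="logs", Key="2024/01/app.log", Body=b"hello seaweedfs")
        print(s3.get_object(Bucket="logs", Key="2024/01/app.log")["Body"].read())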
  • 14
    XGBoost

    Scalable and Flexible Gradient Boosting

    ... can be used for Python, Java, Scala, R, C++ and more. It can run on a single machine, Hadoop, Spark, Dask, Flink and most other distributed environments, and is capable of solving problems beyond billions of examples.
    Downloads: 1 This Week
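
    A single-machine sketch of the core training loop on synthetic data; the same booster parameters carry over when training is distributed through the Spark, Dask, or other integrations.

        import numpy as np
        import xgboost as xgb

        # toy binary-classification data
        X = np.random.rand(1000, 10)
        y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

        dtrain = xgb.DMatrix(X, label=y)
        params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.1}
        booster = xgb.train(params, dtrain, num_boost_round=50)

        preds = booster.predict(dtrain)            # probabilities in [0, 1]
        print("train accuracy:", ((preds > 0.5) == y).mean())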
  • 15
    Apache IoTDB

    Apache IoTDB (Database for Internet of Things) is an IoT-native database with high performance for data management and analysis, deployable on the edge and in the cloud. Due to its lightweight architecture, high performance, and rich feature set, together with its deep integration with Apache Hadoop, Spark, and Flink, Apache IoTDB can meet the requirements of massive data storage, high-speed data ingestion, and complex data analysis in industrial IoT fields. In factory scenarios...
    Downloads: 0 This Week
  • 16
    Jupyter Enterprise Gateway

    Enables Jupyter Notebooks to share resources across clusters

    ... that enables launching kernels on behalf of remote notebooks. This leads to better resource management, as the web server is no longer the single location for kernel activity; it essentially exposes a kernel-as-a-service model. By default, the Jupyter framework runs kernels locally, potentially exhausting the server's resources. Enterprise Gateway avoids this by leveraging underlying resource managers such as Hadoop YARN and Kubernetes.
    Downloads: 0 This Week
  • 17
    TensorFlowOnSpark

    TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters

    By combining salient features from the TensorFlow deep learning framework with Apache Spark and Apache Hadoop, TensorFlowOnSpark enables distributed deep learning on a cluster of GPU and CPU servers. It enables both distributed TensorFlow training and inferencing on Spark clusters, with a goal to minimize the amount of code changes required to run existing TensorFlow programs on a shared grid.
    Downloads: 0 This Week
  • 18

    JRecord

    Read Cobol data files in Java

    Provides Java record-based IO routines for fixed-width (including text, mainframe, Cobol, and binary) and delimited flat files via a record layout (Cobol, CSV, or XML). The source is now available at https://github.com/bmTas/JRecord. Projects using JRecord include https://github.com/thospfuller/rcoboldi (Cobol files in R), https://github.com/tmalaska/CopybookInputFormat (Cobol files in Hadoop), https://github.com/gss2002/copybook_formatter, and https://github.com/gss2002/ftp2hdfs, which has some...
    Downloads: 29 This Week
  • 19
    OpenTSDB

    A scalable, distributed time series database

    OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable. Store and serve massive amounts of time series data without losing granularity. Generate graphs from the GUI, pull from the HTTP API, choose an open source front-end. OpenTSDB...
    Downloads: 0 This Week
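
    A tiny sketch of writing one data point over OpenTSDB's HTTP API. It assumes a TSD listening on the default port 4242; the metric name and tags are placeholders.

        import time
        import requests

        datapoint = {
            "metric": "sys.cpu.user",
            "timestamp": int(time.time()),   # seconds since the epoch
            "value": 42.5,
            "tags": {"host": "web01"},
        }
        resp = requests.post("http://localhost:4242/api/put", json=datapoint)
        resp.raise_for_status()              # 204 No Content on success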
  • 20
    spatial-framework-for-hadoop

    The Spatial Framework for Hadoop allows developers to use Hadoop for spatial data analysis

    The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis. For tools, samples, and tutorials that use this framework, head over to GIS Tools for Hadoop. At the root level of this repository, you can build a single jar with everything in the framework using Apache Ant. Alternatively, you can build a jar at the root level of each framework component. Custom MapReduce jobs that use the Esri Geometry API require...
    Downloads: 0 This Week
  • 21
    geometry-api-java

    The Esri Geometry API for Java enables developers to write apps

    The Esri Geometry API for Java can be used to enable spatial data processing in 3rd-party data-processing solutions. Developers of custom MapReduce-based applications for Hadoop can use this API for spatial processing of data in the Hadoop system. The API is also used by the Hive UDFs and could be used by developers building geometry functions for 3rd-party applications such as Cassandra, HBase, Storm and many other Java-based “big data” applications.
    Downloads: 1 This Week
  • 22
    Open Source Data Quality and Profiling

    World's first open source data quality & data preparation project

    ..., metadata discovery, anomaly discovery, data cleansing, reporting, and analytics. It also has Hadoop (big data) support to move files to and from a Hadoop grid, and to create, load, and profile Hive tables. This project is also known as "Aggregate Profiler". A RESTful API for this project is being built (beta) at https://sourceforge.net/projects/restful-api-for-osdq/ and an Apache Spark based data quality tool is being built at https://sourceforge.net/projects/apache-spark-osdq/
    Downloads: 64 This Week
  • 23

    Custom Apache Big data Distribution

    A Custom Apache Distribution including Spark and Hadoop, for Windows.

    This distribution has been customized to work out of the box: just download it and unzip it, set the PATH variable to include the bin folders, and set HADOOP_HOME, SPARK_HOME, and JAVA_HOME. That's it! Use Hadoop and Spark natively on Windows.
    Downloads: 0 This Week
  • 24
    Oryx

    Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning

    Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large-scale machine learning. It is a framework for building applications but also includes packaged, end-to-end applications for collaborative filtering, classification, regression and clustering. The application is written in Java, using Apache Spark, Hadoop, Tomcat, Kafka, Zookeeper and more. Configuration uses a single Typesafe Config config file, wherein...
    Downloads: 0 This Week
  • 25

    HSRA

    Hadoop spliced read aligner for RNA-seq data

    HSRA is a MapReduce-based parallel tool for mapping reads from RNA sequencing (RNA-seq) experiments. RNA-seq analyses typically begin by mapping reads to a reference genome in order to determine the location from which the reads originated, which is a very time-consuming step. This tool allows bioinformatics researchers to efficiently distribute their mapping tasks over the nodes of a cluster by combining a fast multithreaded spliced aligner (HISAT2) with Apache Hadoop, which...
    Downloads: 0 This Week
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next