Showing 103 open source projects for "hadoop"

  • 1
    ClickHouse

    A fast open-source OLAP database management system

    ClickHouse® is a fast, open-source, column-oriented database management system that generates analytical reports from SQL queries in real time. In several independent benchmarks it outperforms comparable column-oriented database management systems, in some cases by up to 1,000 times. It can process hundreds of millions to more than a billion rows, and tens of gigabytes of data, per server per second. Apart from its speed, ClickHouse is highly...
    Downloads: 33 This Week
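
    A minimal sketch of querying ClickHouse from Python. It assumes the third-party clickhouse-driver package and a server on the default native TCP port; the host, table name, and schema are placeholders.

        from datetime import datetime
        from clickhouse_driver import Client  # community client, not part of ClickHouse itself

        client = Client(host="localhost")  # default native port 9000 assumed

        client.execute(
            "CREATE TABLE IF NOT EXISTS hits (ts DateTime, url String) "
            "ENGINE = MergeTree ORDER BY ts"
        )
        client.execute("INSERT INTO hits (ts, url) VALUES", [(datetime.now(), "/index")])
        print(client.execute("SELECT url, count() FROM hits GROUP BY url"))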
  • 2
    Apache HBase

    Get random, realtime read/write access to your Big Data

    ... HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. It offers a Thrift gateway and a RESTful web service supporting XML, Protobuf, and binary data encodings; export of metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX; and convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
    Downloads: 6 This Week
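
    A small sketch of talking to the Thrift gateway mentioned above from Python. It assumes the third-party happybase client, a Thrift server on its default port 9090, and an existing table named metrics with a cf column family.

        import happybase  # third-party client for the HBase Thrift gateway

        conn = happybase.Connection("hbase-thrift-host", port=9090)  # default Thrift port
        table = conn.table("metrics")               # table assumed to exist
        table.put(b"row-1", {b"cf:value": b"42"})   # random write by row key
        print(table.row(b"row-1"))                  # random read by row key
        conn.close()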
  • 3
    ANTLR

    Parser generator to read, process, or translate structured text

    ... and Pig, the data warehouse and analysis systems for Hadoop, both use ANTLR. Lex Machina uses ANTLR for information extraction from legal texts. Oracle uses ANTLR within SQL Developer IDE and their migration tools. NetBeans IDE parses C++ with ANTLR. The HQL language in the Hibernate object-relational mapping framework is built with ANTLR.
    Downloads: 11 This Week
  • 4
    syslog-ng

    Log management solution that improves the performance of SIEM

    ... to Hadoop, Elasticsearch, MongoDB, and Kafka, as well as many others. syslog-ng flexibly routes log data from a wide range of sources to a wide range of destinations. Instead of deploying multiple agents on hosts, organizations can unify their log data collection and management. syslog-ng Store Box provides automated archiving, tamper-proof encrypted storage, and granular access controls to protect log data. The largest appliance can store up to 10 TB of raw logs.
    Downloads: 6 This Week
  • 5
    Apache Impala

    Impala provides low latency and high concurrency for BI/analytic queries on the Hadoop ecosystem, including Iceberg, open data formats, and most cloud storage options. Impala also scales linearly, even in multitenant environments. Impala is integrated with native Hadoop security and Kerberos for authentication, and via the Ranger module, you can ensure that the right users and applications are authorized for the right data. Utilize the same file and data formats and metadata, security...
    Downloads: 0 This Week
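
    A minimal sketch of running a BI-style query against Impala from Python. It assumes the third-party impyla client, an impalad on the default HiveServer2-compatible port 21050, and a table named web_logs; all of these are placeholders.

        from impala.dbapi import connect  # impyla, a DB-API client for Impala

        conn = connect(host="impala-host", port=21050)
        cur = conn.cursor()
        cur.execute("SELECT status_code, count(*) FROM web_logs GROUP BY status_code")
        for row in cur.fetchall():
            print(row)
        conn.close()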
  • 6
    Apache Phoenix

    Mirror of Apache Phoenix

    Apache Phoenix is a SQL skin over HBase, delivered as a client-embedded JDBC driver targeting low-latency queries over HBase data. Apache Phoenix enables OLTP and operational analytics in Hadoop for low-latency applications by combining the best of both worlds: the power of standard SQL and JDBC APIs with full ACID transaction capabilities, and the flexibility of late-bound, schema-on-read access from the NoSQL world, using HBase as its backing store. Apache Phoenix is fully...
    Downloads: 0 This Week
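
    A brief sketch of the same SQL-over-HBase idea from Python. It uses the phoenixdb package, which talks to the Phoenix Query Server rather than the client-embedded JDBC driver described above; the URL and port 8765 are the assumed Query Server defaults.

        import phoenixdb  # Python driver for the Phoenix Query Server

        conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
        cur = conn.cursor()
        cur.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name VARCHAR)")
        cur.execute("UPSERT INTO users VALUES (?, ?)", (1, "alice"))  # Phoenix writes use UPSERT
        cur.execute("SELECT id, name FROM users")
        print(cur.fetchall())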
  • 7
    SageMaker Spark

    A Spark library for Amazon SageMaker

    ... trained models, and, if you have your own ML algorithms built into SageMaker compatible Docker containers, you can use SageMaker Spark to train and infer on DataFrames with your own algorithms -- all at Spark scale. SageMaker Spark depends on hadoop-aws-2.8.1. To run Spark applications that depend on SageMaker Spark, you need to build Spark with Hadoop 2.8. However, if you are running Spark applications on EMR, you can use Spark built with Hadoop 2.7.
    Downloads: 0 This Week
  • 8
    HugeGraph

    A graph database that supports more than 100 billion data records

    HugeGraph is a convenient, efficient, and adaptable graph database compatible with the Apache TinkerPop3 framework and the Gremlin query language. HugeGraph offers fast import of graphs with more than 10 billion vertices and edges, millisecond-level OLTP query capability, and can be integrated with big data platforms like Hadoop or Spark for OLAP analysis. Its main scenarios include correlation search, fraud detection, and knowledge graphs. It not only supports...
    Downloads: 1 This Week
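
    Because HugeGraph is TinkerPop3-compatible, a Gremlin traversal from Python is one way to exercise its OLTP queries. This sketch assumes the gremlinpython package and a HugeGraph server exposing a Gremlin WebSocket endpoint on port 8182 with a traversal source named g; those endpoint details are assumptions.

        from gremlin_python.process.anonymous_traversal import traversal
        from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

        conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")  # endpoint assumed
        g = traversal().withRemote(conn)

        g.addV("person").property("name", "alice").iterate()      # write one vertex
        print(g.V().hasLabel("person").values("name").toList())   # read it back
        conn.close()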
  • 9
    Luigi

    Python module that helps you build complex pipelines of batch jobs

    Luigi is a Python (3.6, 3.7, 3.8, 3.9 tested) package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more. The purpose of Luigi is to address all the plumbing typically associated with long-running batch processes. You want to chain many tasks, automate them, and failures will happen. These tasks can be anything, but are typically long running things like Hadoop...
    Downloads: 1 This Week
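
    A minimal sketch of a Luigi pipeline: one task whose output feeds another, with dependency resolution handled by Luigi. File names are arbitrary, and the local scheduler keeps the example self-contained.

        import luigi

        class MakeInput(luigi.Task):
            def output(self):
                return luigi.LocalTarget("numbers.txt")

            def run(self):
                with self.output().open("w") as f:
                    f.write("\n".join(str(i) for i in range(10)))

        class SumNumbers(luigi.Task):
            def requires(self):
                return MakeInput()  # dependency resolved automatically

            def output(self):
                return luigi.LocalTarget("total.txt")

            def run(self):
                with self.input().open() as f:
                    total = sum(int(line) for line in f if line.strip())
                with self.output().open("w") as f:
                    f.write(str(total))

        if __name__ == "__main__":
            luigi.build([SumNumbers()], local_scheduler=True)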
  • 10
    Apache Drill

    Apache Drill is a distributed MPP query layer for self-describing data

    Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel. Get faster insights without the overhead (data loading, schema creation and maintenance, transformations, etc.). Analyze the multi-structured and nested data in non-relational datastores directly without transforming or restricting the data. Leverage your existing SQL skillsets and BI tools including Tableau...
    Downloads: 0 This Week
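
    A small sketch of Drill's schema-free querying over its REST API. It assumes a drillbit with the web/REST interface on the default port 8047; cp.`employee.json` is a sample file Drill ships on its classpath.

        import requests

        payload = {
            "queryType": "SQL",
            "query": "SELECT full_name, position_title FROM cp.`employee.json` LIMIT 3",
        }
        resp = requests.post("http://localhost:8047/query.json", json=payload)
        resp.raise_for_status()
        print(resp.json())  # response includes the result rows and column metadata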
  • 11
    Apache Hudi

    Upserts, Deletes And Incremental Processing on Big Data

    Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics. Hudi provides...
    Downloads: 0 This Week
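
    A rough PySpark sketch of writing a Hudi table, patterned on the project's quickstart. It assumes a Spark session launched with a matching hudi-spark bundle on the classpath; the table name, record key, precombine field, and paths are placeholders.

        from pyspark.sql import SparkSession

        # Typically launched via:
        #   spark-submit --packages org.apache.hudi:hudi-spark3-bundle_2.12:<version> ...
        spark = (SparkSession.builder
                 .appName("hudi-upsert-sketch")
                 .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                 .getOrCreate())

        df = spark.createDataFrame([("id-1", "2024-01-01", 9.5)], ["uuid", "ts", "fare"])

        hudi_options = {
            "hoodie.table.name": "trips",
            "hoodie.datasource.write.recordkey.field": "uuid",
            "hoodie.datasource.write.precombine.field": "ts",
        }
        (df.write.format("hudi")
           .options(**hudi_options)
           .mode("overwrite")   # later writes in "append" mode upsert by record key
           .save("/tmp/hudi/trips"))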
  • 12
    Genie

    Distributed Big Data Orchestration Service

    Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides REST-ful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them.
    Downloads: 0 This Week
  • 13
    SeaweedFS

    Distributed storage system for blobs, objects, files, and data lake

    SeaweedFS is a distributed storage system for blobs, objects, files, and data lake, to store and serve billions of files fast! Blob store has O(1) disk seek, local tiering, cloud tiering. Filer supports cross-cluster active-active replication, Kubernetes, POSIX, S3 API, encryption, Erasure Coding for warm storage, FUSE mount, Hadoop, WebDAV. SeaweedFS is an independent Apache-licensed open source project with its ongoing development made possible because of the community. SeaweedFS is a simple...
    Downloads: 1 This Week
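
    One way to exercise the S3 API mentioned above from Python is boto3 pointed at SeaweedFS's S3 gateway. The endpoint URL, port 8333, and credentials here are assumptions; match them to how the gateway is actually configured.

        import boto3

        s3 = boto3.client(
            "s3",
            endpoint_url="http://localhost:8333",   # weed s3 gateway, port assumed
            aws_access_key_id="any",
            aws_secret_access_key="any",
        )
        s3.create_bucket(Bucket="logs")
        s3.put_object(Bucket="logs", Key="2024/01/app.log", Body=b"hello seaweedfs")
        print(s3.get_object(Bucket="logs", Key="2024/01/app.log")["Body"].read())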
  • 14
    XGBoost

    Scalable and Flexible Gradient Boosting

    ... can be used for Python, Java, Scala, R, C++ and more. It can run on a single machine, Hadoop, Spark, Dask, Flink and most other distributed environments, and is capable of solving problems beyond billions of examples.
    Downloads: 1 This Week
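
    A single-machine sketch of the core training loop on synthetic data; the same booster parameters carry over when training is distributed through the Spark, Dask, or other integrations.

        import numpy as np
        import xgboost as xgb

        # toy binary-classification data
        X = np.random.rand(1000, 10)
        y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

        dtrain = xgb.DMatrix(X, label=y)
        params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.1}
        booster = xgb.train(params, dtrain, num_boost_round=50)

        preds = booster.predict(dtrain)            # probabilities in [0, 1]
        print("train accuracy:", ((preds > 0.5) == y).mean())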
  • 15
    Apache IoTDB

    Apache IoTDB (Database for Internet of Things) is an IoT-native database with high performance for data management and analysis, deployable on the edge and in the cloud. Due to its lightweight architecture, high performance, and rich feature set, together with its deep integration with Apache Hadoop, Spark, and Flink, Apache IoTDB can meet the requirements of massive data storage, high-speed data ingestion, and complex data analysis in industrial IoT fields. In factory scenarios...
    Downloads: 0 This Week
  • 16
    Jupyter Enterprise Gateway

    Enables Jupyter Notebooks to share resources across clusters

    ... that enables launching kernels on behalf of remote notebooks. This leads to better resource management, as the web server is no longer the single location for kernel activity; it essentially exposes a kernel-as-a-service model. By default, the Jupyter framework runs kernels locally, potentially exhausting the server's resources. Enterprise Gateway avoids this by leveraging underlying resource managers such as Hadoop YARN and Kubernetes.
    Downloads: 0 This Week
  • 17
    TensorFlowOnSpark

    TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters

    By combining salient features from the TensorFlow deep learning framework with Apache Spark and Apache Hadoop, TensorFlowOnSpark enables distributed deep learning on a cluster of GPU and CPU servers. It enables both distributed TensorFlow training and inferencing on Spark clusters, with a goal to minimize the amount of code changes required to run existing TensorFlow programs on a shared grid.
    Downloads: 0 This Week
  • 18

    JRecord

    Read Cobol data files in Java

    Provides Java record-based IO routines for fixed-width (including text, mainframe, Cobol, and binary) and delimited flat files via a record layout (Cobol, CSV, or XML). The source is now available at https://github.com/bmTas/JRecord. Projects using JRecord include https://github.com/thospfuller/rcoboldi (Cobol files in R), https://github.com/tmalaska/CopybookInputFormat (Cobol files in Hadoop), https://github.com/gss2002/copybook_formatter, and https://github.com/gss2002/ftp2hdfs, which has some...
    Downloads: 29 This Week
  • 19
    OpenTSDB

    A scalable, distributed time series database

    OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable. Store and serve massive amounts of time series data without losing granularity. Generate graphs from the GUI, pull from the HTTP API, choose an open source front-end. OpenTSDB...
    Downloads: 0 This Week
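
    A tiny sketch of writing one data point over OpenTSDB's HTTP API. It assumes a TSD listening on the default port 4242; the metric name and tags are placeholders.

        import time
        import requests

        datapoint = {
            "metric": "sys.cpu.user",
            "timestamp": int(time.time()),   # seconds since the epoch
            "value": 42.5,
            "tags": {"host": "web01"},
        }
        resp = requests.post("http://localhost:4242/api/put", json=datapoint)
        resp.raise_for_status()              # 204 No Content on success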
  • 20
    spatial-framework-for-hadoop

    The Spatial Framework for Hadoop allows developers to use Hadoop for spatial data analysis

    The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis. For tools, samples, and tutorials that use this framework, head over to GIS Tools for Hadoop. At the root level of this repository, you can build a single jar with everything in the framework using Apache Ant. Alternatively, you can build a jar at the root level of each framework component. Custom MapReduce jobs that use the Esri Geometry API require...
    Downloads: 0 This Week
  • 21
    geometry-api-java

    The Esri Geometry API for Java enables developers to write apps

    The Esri Geometry API for Java can be used to enable spatial data processing in 3rd-party data-processing solutions. Developers of custom MapReduce-based applications for Hadoop can use this API for spatial processing of data in the Hadoop system. The API is also used by the Hive UDFs and could be used by developers building geometry functions for 3rd-party applications such as Cassandra, HBase, Storm and many other Java-based “big data” applications.
    Downloads: 1 This Week
  • 22
    Open Source Data Quality and Profiling

    World's first open source data quality & data preparation project

    ..., metadata discovery, anomaly discovery, data cleansing, reporting, and analytics. It also has Hadoop (big data) support to move files to and from a Hadoop grid, and to create, load, and profile Hive tables. This project is also known as "Aggregate Profiler". A RESTful API for this project is being built (beta) at https://sourceforge.net/projects/restful-api-for-osdq/ and an Apache Spark based data quality tool is being built at https://sourceforge.net/projects/apache-spark-osdq/
    Downloads: 64 This Week
  • 23

    Custom Apache Big data Distribution

    A Custom Apache Distribution including Spark and Hadoop, for Windows.

    This distribution has been customized to work out of the box: just download it and unzip it, set the PATH variable to include the bin folders, and set HADOOP_HOME, SPARK_HOME, and JAVA_HOME. That's it! Use Hadoop and Spark natively on Windows.
    Downloads: 0 This Week
  • 24
    Oryx

    Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning

    Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large-scale machine learning. It is a framework for building applications but also includes packaged, end-to-end applications for collaborative filtering, classification, regression and clustering. The application is written in Java, using Apache Spark, Hadoop, Tomcat, Kafka, Zookeeper and more. Configuration uses a single Typesafe Config config file, wherein...
    Downloads: 0 This Week
  • 25

    HSRA

    Hadoop spliced read aligner for RNA-seq data

    HSRA is a MapReduce-based parallel tool for mapping reads from RNA sequencing (RNA-seq) experiments. RNA-seq analyses typically begin by mapping reads to a reference genome in order to determine the location from which the reads originated, which is a very time-consuming step. This tool allows bioinformatics researchers to efficiently distribute their mapping tasks over the nodes of a cluster by combining a fast multithreaded spliced aligner (HISAT2) with Apache Hadoop, which...
    Downloads: 0 This Week
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next