hdfs free download - SourceForge

Showing 33 open source projects for "hdfs"

1

Apache Druid

A high performance real-time analytics database

Druid is designed for workflows where fast ad-hoc analytics, instant data visibility, or support for high concurrency is important. As such, Druid is often used to power UIs where an interactive, consistent user experience is desired. Druid streams data from message buses such as Kafka and Amazon Kinesis, and batch-loads files from data lakes such as HDFS and Amazon S3. Druid supports most popular file formats for structured and semi-structured data. Druid has been benchmarked to greatly...

Downloads: 11 This Week

Last Update: 2024-09-05
2

Apache Impala

Impala provides low latency and high concurrency for BI/analytic queries on the Hadoop ecosystem, including Iceberg, open data formats, and most cloud storage options. Impala also scales linearly, even in multitenant environments. Impala is integrated with native Hadoop security and Kerberos for authentication, and via the Ranger module, you can ensure that the right users and applications are authorized for the right data. Utilize the same file and data formats and metadata, security, and...

Downloads: 4 This Week

Last Update: 2024-08-20
3

DVC

Data Version Control | Git for Data & Models

DVC is built to make ML models shareable and reproducible. It is designed to handle large files, data sets, machine learning models, and metrics as well as code. Version control machine learning models, data sets and intermediate files. DVC connects them with code and uses Amazon S3, Microsoft Azure Blob Storage, Google Drive, Google Cloud Storage, Aliyun OSS, SSH/SFTP, HDFS, HTTP, network-attached storage, or disk to store file contents. Version control machine learning models, data sets...

Downloads: 1 This Week

Last Update: 2024-08-30
4

Apache Drill

Apache Drill is a distributed MPP query layer for self-describing data

..., Qlikview, MicroStrategy, Spotfire, Excel and more. Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores. For example, you can join a user profile collection in MongoDB with a directory of event logs in Hadoop.

Downloads: 2 This Week

Last Update: 2024-05-17
5

CubeFS

cloud-native file store

CubeFS is a new-generation cloud-native storage system that supports access protocols such as S3, HDFS, and POSIX. It applies to a wide range of scenarios, including big data, AI/LLMs, container platforms, separation of storage and computing for databases and middleware, and data sharing and protection. Access is interoperable across the supported protocols. Support replicas and erasure coding engines, users can choose flexibly...

Downloads: 0 This Week

Last Update: 2024-04-23
6

Apache HBase

Get random, realtime read/write access to your Big Data

... HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options. Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX. Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.

Downloads: 1 This Week

Last Update: 2024-07-24
7

JuiceFS

JuiceFS is a distributed POSIX file system built on top of Redis

A POSIX-, HDFS-, and S3-compatible distributed file system for the cloud. JuiceFS is designed to bring the good old experience of local-disk file systems back to the cloud. JuiceFS is POSIX compliant and is fully compatible with HDFS and S3. Building or migrating cloud apps and sharing files across regions and clouds has become easier than ever before. Whether it's a public cloud, private cloud, or hybrid cloud, JuiceFS is available on any cloud of your choice and delivers flexibility...

Downloads: 0 This Week

Last Update: 2024-09-02
8

Apache Hudi

Upserts, Deletes And Incremental Processing on Big Data

Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics. Hudi provides...

Downloads: 0 This Week

Last Update: 2024-06-04
9

Luigi

Python module that helps you build complex pipelines of batch jobs

... jobs, dumping data to/from databases, running machine learning algorithms, or anything else. You can build pretty much any task you want, but Luigi also comes with a toolbox of several common task templates that you can use. It includes support for running Python MapReduce jobs in Hadoop, as well as Hive and Pig jobs. It also comes with file system abstractions for HDFS and local files that ensure all file system operations are atomic.

Downloads: 0 This Week

Last Update: 2024-09-04
10

TensorFlowOnSpark

TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters

By combining salient features from the TensorFlow deep learning framework with Apache Spark and Apache Hadoop, TensorFlowOnSpark enables distributed deep learning on a cluster of GPU and CPU servers. It enables both distributed TensorFlow training and inferencing on Spark clusters, with a goal to minimize the amount of code changes required to run existing TensorFlow programs on a shared grid.

Downloads: 0 This Week

Last Update: 2024-08-05
11

Apache MXNet (incubating)

A flexible and efficient library for deep learning

Apache MXNet is an open source deep learning framework designed for efficient and flexible research prototyping and production. It contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations. On top of this is a graph optimization layer, overall making MXNet highly efficient yet still portable, lightweight and scalable.

Downloads: 0 This Week

Last Update: 2023-12-13
12

JRecord

Read Cobol data files in Java

... some code that allows ftping RDW files directly from the Mainframe into Hadoop/HDFS as a mapreduce job or standalone client.

9 Reviews

Downloads: 33 This Week

Last Update: 2024-08-22
13

MyCAT

Active, high-performance open source database middleware

MyCAT is open-source software, “a large database cluster” oriented to enterprises. MyCAT is an enhanced database that can replace MySQL and supports transactions and ACID. Regarded as an enterprise-grade MySQL cluster, MyCAT can take the place of an expensive Oracle cluster. MyCAT is also a new type of database, resembling a SQL Server integrated with memory cache technology, NoSQL technology, and HDFS big data. And as a new modern enterprise database product, MyCAT...

Downloads: 2 This Week

Last Update: 2021-06-28
14

HSRA

Hadoop spliced read aligner for RNA-seq data

... is a distributed computing framework for scalable Big Data processing. HSRA currently supports single-end and paired-end read alignments from FASTQ/FASTA datasets. Moreover, our tool uses the Hadoop Sequence Parser (HSP) library (link above) to efficiently read the input datasets stored on the Hadoop Distributed File System (HDFS), being able to process datasets compressed with Gzip and BZip2 codecs.

Downloads: 0 This Week

Last Update: 2019-01-23
15

hmrjp-maven-plugin

Hadoop MapReduce Maven plugin

hmrjp-maven-plugin is a Maven plugin that helps create, run, and verify Hadoop MapReduce jobs remotely, just like any other Java project built with Maven.

Downloads: 0 This Week

Last Update: 2015-09-30
16

CliqueSquare

Distributed RDF Processing over Hadoop

CliqueSquare is a system for storing and querying large RDF graphs relying on Hadoop’s distributed file system (HDFS) and Hadoop’s MapReduce open-source implementation. It provides a novel partitioning and storage scheme that permits 1-level joins to be evaluated locally using efficient map-only joins. In addition, CliqueSquare is equipped with a unique optimization algorithm based on graphs and cliques capable of generating highly parallelizable flat query plans relying on n-ary equality joins.

Downloads: 0 This Week

Last Update: 2016-11-16
17

HDFSFileTransfer

File transfer from local FS to HDFS

The HDFSFileTransfer project was developed to help Hadoop users quickly copy varied files (flat, structured, unstructured, big, and small) from Linux to the Hadoop Distributed File System (HDFS). It allows users to transfer files (1) within the same physical machine, from the local file system (Linux) into HDFS, and (2) between two physical machines, from the local file system (Linux) of a machine with an HDFS cluster installed to another HDFS cluster. Sample - one can have two single clustered Hadoop...

Downloads: 0 This Week

Last Update: 2014-09-03
18

pydoop

Pydoop is a Python MapReduce and HDFS API for Hadoop.

Downloads: 0 This Week

Last Update: 2014-07-30
19

Flamingo Project

Workflow Designer, Hive Editor, Pig Editor, File System Browser

Flamingo is an open-source Big Data platform that combines an Ajax rich web interface + Workflow Engine + Workflow Designer + MapReduce + Hive Editor + Pig Editor. 1. An easy tool for big data. 2. Comfortable to use with Hadoop ecosystem projects. 3. Licensed under GPL v3. Supports Pig IDE, Hive IDE, HDFS Browser, Scheduler, Hadoop Job Monitoring, Workflow Engine, Workflow Designer, MapReduce.

3 Reviews

Downloads: 0 This Week

Last Update: 2016-11-29
20

Standalone HDFS

Hadoop is a great project for deep analytics based on the MapReduce features. It also includes a powerful distributed file system designed to ensure that analytics workloads can locally access the data to be processed, minimizing the network bandwidth impact. I found this filesystem very useful for leveraging storage from all my PCs and even from some of my online storage such as S3. However, I did not want to deploy the full Hadoop stack. Hence my decision to create a standalone...

Downloads: 0 This Week

Last Update: 2016-07-27
21

epiHadoop

A next-gen sequencing analysis pipeline designed to run on Hadoop/HDFS, written in Java and Pig. For more info, contact Zack Ramjan at USC.

Downloads: 0 This Week

Last Update: 2014-04-15
22

Jxtadoop

This project aims to provide P2P capabilities with Hadoop DFS.

Hadoop is designed to work in large datacenters with thousands of servers connected to each other in the Hadoop cloud. This project focuses on the Distributed File System part of Hadoop (HDFS). The goal of this project is to provide an alternative to the direct IP connectivity required by Hadoop. Instead, the DFS layer has been modified to use a Peer-2-Peer framework which allows direct connectivity in datacenters as well as indirect connectivity to bypass firewall constraints. The typical...

Downloads: 0 This Week

Last Update: 2015-08-12
23

DynamicMR

A Dynamic Slot Allocation and Scheduling System for MapReduce Clusters

DynamicMR is a dynamic slot allocation and scheduling framework that aims to improve the performance of Hadoop under the Hadoop Fair Scheduler (HFS) by maximizing slot utilization while guaranteeing fairness across pools. It consists of three levels of scheduling components, namely, Dynamic Hadoop Fair Scheduler (DHFS), Dynamic Speculative Task Scheduler (DSTS), and Data Locality Maximization Scheduler (DLMS).

Downloads: 0 This Week

Last Update: 2013-11-23
24

BroadData

Integrated to system status data based on the HDFS

Downloads: 0 This Week

Last Update: 2013-05-31
25

HadoopFileManager

Console File Manager for Hadoop, written in Java.

Console File Manager for Hadoop, written in Java. For Linux only. The left panel shows local files; the right panel shows files from HDFS. To run, execute: hadoop jar HadoopFileManager-0.1.0-DEMO.jar. The UI uses the Lanterna library, which is included in the main jar to avoid additional classpath setup. The current version is just a demo for checking display capability.

Downloads: 0 This Week

Last Update: 2013-05-30