Showing 33 open source projects for "hdfs"

  • 1
    Apache Druid

    A high performance real-time analytics database

    Druid is designed for workflows where fast ad-hoc analytics, instant data visibility, or support for high concurrency is important. As such, Druid is often used to power UIs where an interactive, consistent user experience is desired. Druid streams data from message buses such as Kafka and Amazon Kinesis, and batch-loads files from data lakes such as HDFS and Amazon S3. Druid supports most popular file formats for structured and semi-structured data. Druid has been benchmarked to greatly...
    Downloads: 11 This Week
  • 2
    Apache Impala

    Impala provides low latency and high concurrency for BI/analytic queries on the Hadoop ecosystem, including Iceberg, open data formats, and most cloud storage options. Impala also scales linearly, even in multitenant environments. Impala is integrated with native Hadoop security and Kerberos for authentication, and via the Ranger module, you can ensure that the right users and applications are authorized for the right data. Utilize the same file and data formats and metadata, security, and...
    Downloads: 4 This Week
  • 3
    DVC

    Data Version Control | Git for Data & Models

    DVC is built to make ML models shareable and reproducible. It is designed to handle large files, data sets, machine learning models, and metrics as well as code. Version control machine learning models, data sets and intermediate files. DVC connects them with code and uses Amazon S3, Microsoft Azure Blob Storage, Google Drive, Google Cloud Storage, Aliyun OSS, SSH/SFTP, HDFS, HTTP, network-attached storage, or disc to store file contents. Version control machine learning models, data sets...
    Downloads: 1 This Week
  • 4
    Apache Drill

    Apache Drill is a distributed MPP query layer for self describing data

    ..., Qlikview, MicroStrategy, Spotfire, Excel and more. Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores. For example, you can join a user profile collection in MongoDB with a directory of event logs in Hadoop.
    Downloads: 2 This Week
  • 5
    CubeFS

    cloud-native file store

    CubeFS is a new-generation cloud-native storage system that supports access protocols such as S3, HDFS, and POSIX, with interoperable access between protocols. It is widely applicable in scenarios such as big data, AI/LLMs, container platforms, separation of storage and computing for databases and middleware, and data sharing and protection. It supports both replica and erasure-coding engines, so users can choose flexibly...
    Downloads: 0 This Week
  • 6
    Apache HBase

    Get random, realtime read/write access to your Big Data

    ... HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. Features include a Thrift gateway and a RESTful web service that supports XML, Protobuf, and binary data encoding options; support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX; and convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
    Downloads: 1 This Week
  • 7
    JuiceFS

    JuiceFS is a distributed POSIX file system built on top of Redis

    A POSIX-, HDFS-, and S3-compatible distributed file system for the cloud. JuiceFS is designed to bring the good old experience of local-disk file systems back to the cloud. JuiceFS is POSIX compliant and fully compatible with HDFS and S3. Building or migrating cloud apps and sharing files across regions and clouds has become easier than ever. Whether it's a public cloud, private cloud, or hybrid cloud, JuiceFS is available on any cloud of your choice and delivers flexibility...
    Downloads: 0 This Week
  • 8
    Apache Hudi

    Upserts, Deletes And Incremental Processing on Big Data

    Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics. Hudi provides...
    Downloads: 0 This Week
  • 9
    Luigi

    Python module that helps you build complex pipelines of batch jobs

    ... jobs, dumping data to/from databases, running machine learning algorithms, or anything else. You can build pretty much any task you want, but Luigi also comes with a toolbox of several common task templates that you can use. It includes support for running Python MapReduce jobs in Hadoop, as well as Hive and Pig jobs. It also comes with file system abstractions for HDFS and local files that ensure all file system operations are atomic.
    Downloads: 0 This Week
  • 10
    TensorFlowOnSpark

    TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters

    By combining salient features from the TensorFlow deep learning framework with Apache Spark and Apache Hadoop, TensorFlowOnSpark enables distributed deep learning on a cluster of GPU and CPU servers. It enables both distributed TensorFlow training and inference on Spark clusters, with the goal of minimizing the code changes required to run existing TensorFlow programs on a shared grid.
    Downloads: 0 This Week
  • 11
    Apache MXNet (incubating)

    A flexible and efficient library for deep learning

    Apache MXNet is an open source deep learning framework designed for efficient and flexible research prototyping and production. It contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations. On top of this is a graph optimization layer, overall making MXNet highly efficient yet still portable, lightweight and scalable.
    Downloads: 0 This Week
  • 12

    JRecord

    Read COBOL data files in Java

    ... some code that allows FTPing RDW files directly from the mainframe into Hadoop/HDFS as a MapReduce job or standalone client.
    Downloads: 33 This Week
  • 13
    MyCAT

    Active, high-performance open source database middleware

    MyCAT is open-source software, “a large database cluster” oriented toward enterprises. MyCAT is an enforced database which is a replacement for MySQL and supports transactions and ACID. Regarded as an enterprise-grade MySQL cluster, MyCAT can take the place of an expensive Oracle cluster. MyCAT is also a new type of database that resembles a SQL Server integrated with memory cache technology, NoSQL technology, and HDFS big data. And as a new modern enterprise database product, MyCAT...
    Downloads: 2 This Week
  • 14

    HSRA

    Hadoop spliced read aligner for RNA-seq data

    ... is a distributed computing framework for scalable Big Data processing. HSRA currently supports single-end and paired-end read alignments from FASTQ/FASTA datasets. Moreover, our tool uses the Hadoop Sequence Parser (HSP) library (link above) to efficiently read the input datasets stored on the Hadoop Distributed File System (HDFS), being able to process datasets compressed with Gzip and BZip2 codecs.
    Downloads: 0 This Week
  • 15
    hmrjp-maven-plugin

    Hadoop MapReduce Maven plugin

    hmrjp-maven-plugin is a Maven plugin that helps create, run, and verify Hadoop MapReduce jobs remotely, just like any other Java project built with Maven.
    Downloads: 0 This Week
  • 16
    CliqueSquare

    Distributed RDF Processing over Hadoop

    CliqueSquare is a system for storing and querying large RDF graphs relying on Hadoop’s distributed file system (HDFS) and Hadoop’s MapReduce open-source implementation. It provides a novel partitioning and storage scheme that permits 1-level joins to be evaluated locally using efficient map-only joins. In addition, CliqueSquare is equipped with a unique optimization algorithm based on graphs and cliques capable of generating highly parallelizable flat query plans relying on n-ary equality joins.
    Downloads: 0 This Week
  • 17

    HDFSFileTransfer

    File transfer from local FS to HDFS

    The HDFSFileTransfer project was created to help Hadoop users quickly copy varied files (flat, structured, unstructured, big, and small) from Linux to the Hadoop file system (HDFS). It allows users to transfer files: - within the same physical machine - from the local file system (Linux) into HDFS - between two physical machines - from a local file system (Linux) with an HDFS cluster installed to another HDFS cluster. Sample - one can have two single clustered Hadoop...
    Downloads: 0 This Week
  • 18
    Pydoop is a Python MapReduce and HDFS API for Hadoop.
    Downloads: 0 This Week
  • 19
    Flamingo Project

    Workflow Designer, Hive Editor, Pig Editor, File System Browser

    Flamingo is an open-source big data platform that combines an Ajax rich web interface + Workflow Engine + Workflow Designer + MapReduce + Hive Editor + Pig Editor. 1. Easy tool for big data 2. Comfortable to use with Hadoop ecosystem projects 3. Based on the GPL v3 license. Supports Pig IDE, Hive IDE, HDFS Browser, Scheduler, Hadoop Job Monitoring, Workflow Engine, Workflow Designer, MapReduce.
    Downloads: 0 This Week
  • 20
    Standalone HDFS
    Hadoop is a great project for deep analytics based on its MapReduce features. It also includes a powerful distributed file system designed to ensure that analytics workloads can access the data to be processed locally, minimizing the network bandwidth impact. I found this filesystem very useful for leveraging storage from all my PCs and even from some of my online storage such as S3. However, I did not want to deploy the full Hadoop stack. Hence my decision to create a standalone...
    Downloads: 0 This Week
  • 21
    A next-gen sequencing analysis pipeline designed to run on Hadoop/HDFS, written in Java and Pig. For more info, contact Zack Ramjan at USC.
    Downloads: 0 This Week
  • 22
    Jxtadoop

    This project aims to provide P2P capabilities with Hadoop DFS.

    Hadoop is designed to work in large datacenters with thousands of servers connected to each other in the Hadoop cloud. This project focuses on the Distributed File System part of Hadoop (HDFS). The goal of this project is to provide an alternative to the direct IP connectivity required by Hadoop. Instead, the DFS layer has been modified to use a peer-to-peer framework which allows direct connectivity in datacenters as well as indirect connectivity to bypass firewall constraints. The typical...
    Downloads: 0 This Week
  • 23

    DynamicMR

    A Dynamic Slot Allocation and Scheduling System for MapReduce Clusters

    DynamicMR is a dynamic slot allocation and scheduling framework that aims to improve the performance of Hadoop under the Hadoop Fair Scheduler (HFS) by maximizing slot utilization while guaranteeing fairness across pools. It consists of three levels of scheduling components, namely, Dynamic Hadoop Fair Scheduler (DHFS), Dynamic Speculative Task Scheduler (DSTS), and Data Locality Maximization Scheduler (DLMS).
    Downloads: 0 This Week
  • 24

    BroadData

    Integrated to system status data based on the HDFS

    Downloads: 0 This Week
  • 25

    HadoopFileManager

    Console file manager for Hadoop, written in Java.

    For Linux only. The left panel contains local files, the right panel files from HDFS. To run, execute: hadoop jar HadoopFileManager-0.1.0-DEMO.jar. The Lanterna library provides the UI; to avoid an additional classpath, it is included in the main jar. The current version is just a demo to check display capability.
    Downloads: 0 This Week