Scala Data Profiling Tools

Browse free open source Scala Data Profiling Tools and projects below.

  • 1
    DISTOD

    Distributed discovery of bidirectional order dependencies

    The DISTOD data profiling algorithm is a distributed algorithm that discovers bidirectional order dependencies (in set-based form) from relational data. DISTOD is based on the single-threaded FASTOD-BID algorithm [1], but it scales elastically to many machines and outperforms FASTOD-BID by up to orders of magnitude. Bidirectional order dependencies (bODs) capture order relationships between lists of attributes in a relational table. For example, they can express that sorting books by publication date in ascending order also sorts them by age in descending order. Knowledge of such order relationships is useful for many data management tasks, such as query optimization, data cleaning, and consistency checking. Because the bODs of a specific dataset are usually not given explicitly, they need to be discovered. Discovering all minimal bODs (in set-based canonical form) has exponential complexity in the number of attributes.
    Downloads: 0 This Week
  • 2
    apache spark data pipeline osDQ

    An osDQ subproject for creating Apache Spark based data pipelines from JSON

    This is an offshoot of the open source data quality (osDQ) project: https://sourceforge.net/projects/dataquality/

    This subproject creates an Apache Spark based data pipeline in which a JSON metadata file drives data processing, data pipeline, data quality, data preparation, and data modeling features for big data. It uses the Java API of Apache Spark and can also run in local mode.

    Get a JSON example at https://github.com/arrahtech/osdq-spark

    How to run

    Unzip the zip file, then run:

    Windows:
        java -cp .\lib\*;osdq-spark-0.0.1.jar org.arrah.framework.spark.run.TransformRunner -c .\example\samplerun.json

    Mac/UNIX:
        java -cp ./lib/*:./osdq-spark-0.0.1.jar org.arrah.framework.spark.run.TransformRunner -c ./example/samplerun.json

    On Windows, you need a Hadoop distribution unzipped on a local drive and HADOOP_HOME set. Also copy winutils.exe into HADOOP_HOME\bin
    Downloads: 0 This Week
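
The DISTOD entry above describes bidirectional order dependencies with the books example: sorting by publication date ascending also sorts by age descending. As a rough illustration of what it means for such a dependency to hold, here is a minimal brute-force check in Python. It is not DISTOD or FASTOD-BID (both work on a set-based canonical form and search the whole attribute lattice); it only validates one given candidate by comparing every pair of rows. The function names, the `(attribute, direction)` notation, and the sample table are assumptions made for this sketch, and values are assumed to be numeric so that descending order can be expressed by negation.

```python
def sort_key(row, spec):
    """Build a lexicographic sort key for one row.

    spec is a list of (attribute, direction) pairs, direction being
    "asc" or "desc". Descending attributes are negated, so this
    sketch only supports numeric values.
    """
    return tuple(row[attr] if direction == "asc" else -row[attr]
                 for attr, direction in spec)


def bod_holds(rows, lhs, rhs):
    """Check whether ordering rows by lhs also orders them by rhs.

    The dependency holds if, for every pair of rows (s, t),
    s <= t under lhs implies s <= t under rhs.
    Brute force over all pairs: O(n^2) comparisons.
    """
    for s in rows:
        for t in rows:
            if sort_key(s, lhs) <= sort_key(t, lhs) and \
                    not sort_key(s, rhs) <= sort_key(t, rhs):
                return False
    return True


# Hypothetical sample table: age is derived from the publication year.
books = [
    {"year": 1999, "age": 25},
    {"year": 2010, "age": 14},
    {"year": 2010, "age": 14},
    {"year": 2021, "age": 3},
]

# Year ascending orders the books by age descending.
year_asc_orders_age_desc = bod_holds(books, [("year", "asc")], [("age", "desc")])

# The same direction on both sides does not hold for this table.
year_asc_orders_age_asc = bod_holds(books, [("year", "asc")], [("age", "asc")])
```

On this table the first check succeeds and the second fails. Validating a single candidate is cheap; the exponential cost mentioned in the DISTOD description comes from the number of candidate attribute combinations that a discovery algorithm has to consider.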