SemaRule Navigator is an Integrated Suite of Open-Source and Free-License Software, placing Semantic and Text Analysis Technologies in the toolbox of Researchers, Students, and Enterprises.
DuruBI is an enterprise reporting tool that supports querying and reporting over DB (database), OLAP (online analytical processing), and DM (data mining) sources from various data sources.
OpenEphyra is an open framework for question answering (QA). It retrieves answers to natural language questions from the Web and other sources. Visit http://www.ephyra.info/ for more details and information on joining this open research initiative.
GeoMondrian is a "spatially-enabled" version of Mondrian. GeoMondrian brings to the Mondrian OLAP server what PostGIS brings to the PostgreSQL DBMS, i.e. consistent and powerful support for geospatial data. It also provides geo-extensions to MDX.
ActiveInsight provides real-time detection of and reaction to events and patterns. It is a platform that enables the detection of meaningful events within multiple high-frequency event streams.
DISMOD Core Open Source Project: DISMOD Core is the core library of DISMOD, an SCO application developed by Fraunhofer IML. DISMOD has been used for years by our specialists to solve optimization problems in the transportation domain.
XIForge is a team of IT volunteers exploring new free and open-source technology frameworks and platforms. We focus on Pentaho and OpenBravo ERP. Our currently hosted projects include the Pentaho Data Integration Parse JSON String plugin. The team founder is Reid Lai.
easyDE is an enterprise Business Intelligence platform that facilitates timely and effective business decisions, helping companies gain a competitive advantage. It allows a wide range of end users to quickly deploy rich analyses with a single integrated product.
Data mining tool for sequences (e.g. trajectories on a map, visited web pages, etc.) that creates a succinct description of the sequences, given a taxonomy (e.g. regions and sub-regions in the map, categories and sub-categories of pages, etc.).
The aim of ALIVE is to develop new approaches to the engineering of flexible, adaptable distributed service-oriented systems based on the adaptation of social coordination and organisation mechanisms.
Open Force QST is a Query and Schema Tool for Salesforce. View and keep historical records of your schema. Compare schema with history to find changes. Query your Salesforce instance using SOQL and display results. Create reports from saved Queries.
A generic SQL-driven data audit tool for detecting differences between any JDBC-accessible database tables and other data sources. Platform independent. It's a Unix-like diff for databases. It reports the key values of differing rows together with the differing column name and data.
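As a rough illustration of the kind of output described above (key value, differing column name, and data), here is a minimal sketch. It is not this tool's implementation: it uses Python's built-in `sqlite3` in place of JDBC, and the function name `diff_tables` and the sample tables `a` and `b` are hypothetical. It compares only rows whose key exists in both tables.

```python
import sqlite3


def diff_tables(conn, left, right, key, columns):
    """Return a list of (key_value, column, left_value, right_value) tuples,
    one per differing cell, for rows whose key exists in both tables."""
    cols = ", ".join([key] + columns)
    # Identifiers are interpolated directly, so pass trusted names only.
    left_rows = {r[0]: r[1:] for r in conn.execute(f"SELECT {cols} FROM {left}")}
    right_rows = {r[0]: r[1:] for r in conn.execute(f"SELECT {cols} FROM {right}")}
    diffs = []
    for k in sorted(left_rows.keys() & right_rows.keys()):
        for col, lv, rv in zip(columns, left_rows[k], right_rows[k]):
            if lv != rv:
                diffs.append((k, col, lv, rv))
    return diffs


# Hypothetical sample data: the two tables differ only in qty for id 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER);
CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER);
INSERT INTO a VALUES (1, 'bolt', 10), (2, 'nut', 5);
INSERT INTO b VALUES (1, 'bolt', 12), (2, 'nut', 5);
""")
result = diff_tables(conn, "a", "b", "id", ["name", "qty"])
print(result)
```

Rows present in only one table are ignored in this sketch; a real audit tool would likely report those as well.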
The MCAS Project (Metrics Correlation and Analysis Service) provides an integrated solution for system operators or VO users to uniformly access, transform, and represent disjoint metrics data generated by distributed middleware or user services.
PanBI is a collection of analytics modules for existing information systems. For each IS, it provides data extraction, transformation and loading logic coupled with an OLAP schema, delivering OLAP functionality to an unprecedented user base.
Advanced Analysis Services is a Business Intelligence (BI) tool that lets users analyze OLAP sources such as Pentaho, Mondrian, Microsoft Analysis Services (MSAS), or Hyperion in an intuitive way, based on analysis templates such as Pareto, Ranking, and BCG.
iSURF: An Interoperability Service Utility for Collaborative Supply Chain Planning across Multiple Domains Supported by RFID Devices. The iSURF project (http://www.srdc.com.tr/isurf/) is funded under the ICT-2007-1.3 objective of FP7 of the European Commission.
SplitPDF (SplitPDF.jar) is a command-line Java program that splits a PDF file by its bookmarks into separate PDFs. Each bookmark is used as the title of the newly created PDF. Extremely useful and fast in a batch-processing environment.
Ajanta is a Java API for solving linear programming problems. Linear programming is a method for determining how to achieve the best outcome (such as maximum profit or lowest cost) subject to a given list of linear constraints.
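To make the idea of linear programming concrete, here is a small worked sketch in Python. It does not use or reflect Ajanta's API; the function `solve_lp_2d` and the sample problem are illustrative assumptions. It handles only two variables with integer coefficients and assumes the feasible region is non-empty and bounded, relying on the fact that such a problem attains its optimum at a vertex of the feasible region.

```python
from fractions import Fraction
from itertools import combinations


def solve_lp_2d(objective, constraints):
    """Maximize c1*x + c2*y subject to a*x + b*y <= r for each (a, b, r).

    Brute force: intersect every pair of constraint boundary lines and keep
    the best intersection point that satisfies all constraints.
    Returns (value, x, y), or None if no feasible vertex exists.
    """
    c1, c2 = objective
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries have no unique intersection
        # Cramer's rule, with exact rational arithmetic.
        x = Fraction(r1 * b2 - r2 * b1, det)
        y = Fraction(a1 * r2 - a2 * r1, det)
        if all(a * x + b * y <= r for a, b, r in constraints):
            value = c1 * x + c2 * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best


# Sample problem: maximize 3x + 5y
# subject to x <= 4, 2y <= 12, 3x + 2y <= 18, x >= 0, y >= 0.
constraints = [(1, 0, 4), (0, 2, 12), (3, 2, 18), (-1, 0, 0), (0, -1, 0)]
best = solve_lp_2d((3, 5), constraints)
print(best)  # optimum value 36 at x = 2, y = 6
```

Enumerating vertices grows quickly with the number of constraints, which is why practical solvers use methods such as simplex instead.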
The Pentaho Personalizer is based on the "dead" project PentahoLooker. The Pentaho Personalizer is sponsored by Lizacom and has a commercially financed core of developers to ensure that the project does not "die" from lack of time or interest.
A group of subprojects for data cleaning, mainly as a step of a data mining project. Visit www.datacleaningopensource.com to review our current applications or if you want to add yours. NOTE: PROGRAMMING SKILLS ARE REQUIRED.
This project aims to provide a centralized system to store, retrieve, and execute BIRT reports in a server environment, so that applications using BIRT reports do not have to store the reports themselves and can rely on this project for management.
weka outlier is an implementation of outlier detection algorithms for WEKA.
The CODB (Class Outliers: Distance-Based) algorithm is the first algorithm developed using the WEKA framework.
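For readers unfamiliar with distance-based outlier detection, the following is a minimal generic sketch in Python. It is not CODB and does not use WEKA: it simply scores each point by the distance to its k-th nearest neighbour, so that isolated points receive high scores. The function name `knn_distance_scores` and the sample points are assumptions made for illustration.

```python
import math


def knn_distance_scores(points, k):
    """Score each point by the Euclidean distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Four points on a unit square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = knn_distance_scores(points, k=2)
# The point with the largest score is the strongest outlier candidate.
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(outlier_index, scores)
```

This pairwise approach costs quadratic time in the number of points, which is acceptable for a sketch but not for large data sets.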