The application Crimeblips provides up-to-date crime statistics for Berlin (Germany). It maps and visualizes crimes, allowing users to identify crime hot spots, trends and general patterns. Bayesian algorithms are used to extract relevant information.
A real-time graph plotter. While your application is computing and logging results to a CSV file using the LiveGraph Writer API, the plotter lets you visualise and monitor the results live - by instantly plotting charts and graphs of the data.
RDF-DocMan is a document manager based on a Sesame (RDF repository) backend. Documents are stored in the filesystem and their metadata in a Sesame repository.
It was developed for the porQual web content generator (also on sf.net).
KNeTS (Knowledge Elicitation Tools) is a survey tool for creating multi-agent models from local knowledge. It uses pattern analysis to identify rules, which are iteratively validated with the informant. The final output is a knowledge-based multi-agent model.
ESSE is a flexible, efficient and easy to use search engine for data mining in environmental data archives. ESSE will help you find useful data even if you don't know exactly what you are looking for.
This is a suite of software agents that provides a complete lexical base architecture, as proposed in Didier Schwab's PhD thesis. It will be used for automatic translation, information retrieval and other natural language processing tasks.
Automotive quality complaint management for recording customer incidents with all associated data, including reporting and easy searching. It is designed for technical automotive customer complaints and uses as few screens as possible. (German: Automobil Reklamation)
GDS is a project for checking generic measured earth science data. It is also an interactive database management system that stores all data in a MySQL database. GDS works as a project-based management system with an unlimited number of stations. Each station manages 20 sensors.
Visualization of the contact network and user data from the popular business network XING.com. The web-based software can be used by any registered XING user.
oBrowse is a web-based ontology browser developed in Java. oBrowse parses the OWL files of an ontology and displays the ontology in a tree view. It is built with the Protege API and JSF.
A Trial Workbench for Facilitating Best Practices from Prospective to Acute Care in Respiratory Medicine. We aim to provide a set of information management toolboxes that facilitates decision support applications in medical information systems.
Similarity Evaluator is a tool for analysing similarity function implementations and algorithms. It makes it possible to compare several APIs on performance, best result, similarity and discernibility values.
febrl-gen is a Java-based frontend to Febrl, an open-source data linkage system written in Python. Users can configure the parameters of a linkage project through the frontend, and febrl-gen will generate a Febrl-ready configuration file.
OntoExtractor is a way of building ontologies that proceeds in a bottom-up fashion, defining concepts as clusters of concrete XML objects. From a set of XML documents the application generates a taxonomy. OntoExtractor has been developed so far by the Kn
LACE means "Lucene Analyzer for CJK (Chinese/Japanese/Korean) & English". It's a simple tokenizer that can handle English-CJK mixed text. Chinese words are handled using a dictionary based method.
The aim of MIEX (Metadata and Information Extractor from small XML documents) is to create a wrapper for the Stanford Parser, to extract and store metadata (syntactic structures, relationships among words...) from simple XML documents.
The UIMA Annotator (called BRUTUS - Business Rules from Unstructured Text and Unstructured Sources) is a component for the UIMA Framework that allows for capturing business knowledge formalized in Structured English syntax (based on OMG's SBVR) with MOF
Visualization of finite state machines as a network graph. Accepted input files at the moment are: net files exported from xfst (Xerox Finite-State Tool) and lexc files (Finite-State Lexicon Compiler).
hypKNOWsys aims at developing a Java-based workbench for knowledge discovery and knowledge management. Currently, hypKNOWsys has released two intermediate tools: DIAsDEM Workbench (text mining for semantic tagging) and WUMprep (Web mining pre-processing)
Trauma registry suite: a data collection application and server scripts for building a trauma data warehouse and performing web-based analysis and reporting. Cross-platform compatible with Windows, Apple, Unix, and Linux.
JWebPro: A Java tool that can interact with Google search and then process the returned Web documents in a couple of ways. The outputs can serve as inputs for NLP, IR, information extraction, Web mining, and online social network extraction/analysis applications.