Crawl websites, sync to vector databases, and power RAG applications. Pre-built integrations for LLM pipelines and AI assistants.
Build data pipelines that feed your AI models and agents without managing infrastructure. Crawl any website, transform content, and push directly to your preferred vector store. Use 10,000+ tools for RAG applications, AI assistants, and real-time knowledge bases. Monitor site changes, trigger workflows on new data, and keep your AIs fed with fresh, structured information. Cloud-native, API-first, and free to start until you need to scale.
Powerful cloud-based licensing solution designed for fast-growing software businesses.
A single point of license control for desktop, SaaS, and mobile applications, APIs, VMs and devices.
10Duke Enterprise is a cloud-based, scalable and flexible software licensing solution enabling software vendors to easily configure, manage and monetize the licenses they provide to their customers in real-time.
Bigram applications based on language models produced by SRILM from a Chinese Wikipedia corpus, including a Chinese word segmenter, a word-based (not character-based) Traditional-Simplified Chinese converter, and a Chinese syllable-to-word converter.
aaid (short for Assignment Aid) is a template tool designed with school (esp. high school & college) assignments in mind. Its design is closely wrapped in HTML, giving you freedom in your templates without requiring you to type numerous tags.
Count words, or find words with specified counts, in a file.
A tiny but handy program written in standard C++, it is meant to complement the 'wc' command from another angle.
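The two operations named above (counting words, and finding the words that occur a given number of times) can be sketched as follows. This is a minimal Python illustration, not the project's C++ code; the function names `word_counts` and `words_with_count` are hypothetical, and splitting on whitespace is an assumption about what counts as a word.

```python
from collections import Counter


def word_counts(text):
    """Count occurrences of each whitespace-separated word (assumed definition of a word)."""
    return Counter(text.split())


def words_with_count(text, n):
    """Return, in sorted order, the words that occur exactly n times."""
    return sorted(word for word, count in word_counts(text).items() if count == n)
```

To apply it to a file, read the file's contents into a string and pass that string to either function.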
Manipulation of text-based documents using the stdin/stdout stream, designed to complement sed, lex and awk. It has no help other than the readme file and the example.
Generative AI is shaping brand discovery. AthenaHQ ensures your brand leads the conversation.
AthenaHQ is a cutting-edge platform for Generative Engine Optimization (GEO), designed to help brands optimize their visibility and performance across AI-driven search platforms like ChatGPT, Google AI, and more.
sedlexlist is a stream editor: it reads from stdin and writes to stdout, similar to sed, lex and awk. The difference is that sedlexlist works with lists of words and is designed to complement the other tools.
PerlPoint generates presentation slides, docs, brochures and more from an easy to learn but powerful text format. It allows distributed team documentation and can be extended to produce any target format you like.
The project is intended to be a text preprocessor driven by user-generated XML files that describe the rules to apply. It is meant for general use on any text file where a pattern can be specified for an "XML rule" to act on.
Chordeus is yet another Chord Pro to PDF converter that creates nice-looking guitar chord sheets. You can create single sheets or a whole songbook using the command-line tool or a simple wizard-like GUI.
Next-Gen Encryption for Post-Quantum Security | CLEAR by Quantum Knight
Lock Down Any Resource, Anywhere, Anytime
CLEAR by Quantum Knight is a FIPS-140-3 validated encryption SDK engineered for enterprises requiring top-tier security. Offering robust post-quantum cryptography, CLEAR secures files, streaming media, databases, and networks with ease across over 30 modern platforms. Its compact design, smaller than a single smartphone image, ensures maximum efficiency and low energy consumption.
Random name generator library with rule files. Fast, lightweight and easy. It can generate rule files from custom text; you can easily integrate it into your application or use the command-line tool to generate the most suitable name for your needs.
This is a small command-line program that splits a phone bill into several separate ones to categorise the calls (e.g. to show how much each family member phones). It generates HTML files and can send them to specified email addresses.
Tabua is a very, very simple Python script for table creation and manipulation. It is intended to be an easy way to build, change, manipulate and extract tables in many (language) formats.
EsTexte is a text-to-HTML converter based on an intuitive text format akin to various wiki formats and ASCII text files. Written in Java, it can be used from the command line or from other Java programs.
Strip out useless tags and other junk from HTML files. Shrink files, enhance readability of HTML source, promote privacy, and clean HTML exported from Microsoft Word (MS-Word). Run HTMLStrip as-is or customize it with your own regular expressions.
A set of LaTeX packages for different purposes. facsimile is for creating faxes with LaTeX, blacklettert1 lets you use Fraktur fonts, and retro is for typewriter-based LaTeX documents.
The converter automatically performs the full process of converting the files of a C project into the equivalent C++ files. Classes are created, variables and functions become attributes and methods, and the changes are propagated to all files.
Tabfmt is a command-line utility to format tabular data. It reads lines from one or more files or from standard input, breaks the lines into fields given a set of field delimiters, and prints a table with constant-width columns to standard output.
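The formatting step described above can be sketched roughly as follows. This is a minimal Python illustration of the general technique, not Tabfmt's own code or interface; the names `split_fields` and `format_table`, the default delimiter set, and the single-space column gap are all assumptions made for the example.

```python
def split_fields(line, delimiters):
    """Break a line into fields at every character found in `delimiters`."""
    fields, current = [], []
    for ch in line:
        if ch in delimiters:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def format_table(lines, delimiters=",;\t"):
    """Return the lines re-rendered as a table with constant-width columns."""
    rows = [split_fields(line.rstrip("\n"), delimiters) for line in lines]
    ncols = max((len(row) for row in rows), default=0)
    # Pad short rows so every row has the same number of fields.
    rows = [row + [""] * (ncols - len(row)) for row in rows]
    # Each column is as wide as its longest field.
    widths = [max(len(row[i]) for row in rows) for i in range(ncols)]
    return [
        " ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]
```

Passing `sys.stdin` (or an open file object) as `lines` and printing each returned string approximates the read-from-input, write-to-output flow in the description.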
Xindent is a small standalone XML indenter.
It's written in pure ANSI C (C90) and released under the MIT License.
I began writing it because I couldn't find a useful tool to format DocBook
files with a text editor like vim.
Got any emails with obnoxious inline text? Long text stories with bad formatting? Files that an OCR didn't quite translate right? RTF format files and no easy way to read or modify them? Then eBookFormatter is for you!
This project uses the iPod's ability to store and display short text files to let you view RSS feeds, weather forecasts, movie showtimes, and other text documents on your iPod when you are away from your computer.
A graphical MS Windows version of the ever-useful "tail" command in *nix. Features RegEx highlighting, multiple notification methods (Flash, Beep, Email, Balloon), alternating line colors for readability, Threshold Separators, and simple XML Config.