With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.
You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
Try free now
Cloud tools for web scraping and data extraction
Deploy pre-built tools that crawl websites, extract structured data, and feed your applications. Reliable web data without maintaining scrapers.
Automate web data collection with cloud tools that handle anti-bot measures, browser rendering, and data transformation out of the box. Extract content from any website, push to vector databases for RAG workflows, or pipe directly into your apps via API. Schedule runs, set up webhooks, and connect to your existing stack. Free tier available, then scale as you need to.
Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
cpDetector is a proxy for codepage detection of documents. It delegates to multiple instances that try to detect the codepage by different techinques. A command line executeable is shipped that allows to sort documents by codepage.
This package contains different tools to add NLP capabilities for Lucene 4.x (it has been tested using Lucene version from 4.6.x to 4.8.1). Although it was originally developed for German, it is, mostly, language independent.
It allows the user to lemmatize words to be indexed, to weight termy ba their parts of speech (e.g. weighting nouns mor hevaily than pronouns), and to add synonyms taken from GermaNet or a list you provide to the search index and thereby increase recall of lucene.
Open Source SEO & SEM Text Creation Tools for free Article Writer
Open Source Tool for Search Engine Optimization (SEO & SEM) used for automatic content processing.
These SEO Content Genrators and Article Writers based on Text Writer:
https://www.artikelschreiber.com/en/
https://www.unaique.net/en/
https://www.unaique.com/
https://www.artikelschreiben.com/
https://www.buzzerstar.com/
https://googleduplicatecontentsolver.sourceforge.io/
https://inkassos.github.io/inkasso/
https://www.artikelschreiber.com/opensource/
https://www.sebastianenger.com/
https://www.artikelschreiber.com/marketing/review/
https://muckrack.com/markus-muller
https://linktr.ee/textgenerator
Code Contains:
- Perl Source code, language databases and more
This project aims to build a suite of Natural Language Processing tools. Modules will include corpus indexing and access tools, a part-of-speech tagger, tokenisers, text classification software, etc.
The Wikipedia Miner toolkit provides simplified access to Wikipedia. This open encyclopedia represents a vast, constantly evolving multilingual database of concepts and semantic relations; a promising resource for nlp and related research.
A system to perform analysis of large documents for the purpose of cataloging similar documents. Similarity is based upon contextual analysis of these documents done by identifying common words and proper nouns.
Provide a robust and efficient implementation of n-gram based classifiers to Java. N-Gram algorithms have shown to be surprisingly good at tasks like guessing the language/encoding from an arbitrary text file. And there are many more applications.
Pypes is a framework which allows users to break complex data processing logic down into a series of smaller less complex tasks. These tasks, referred to as components, can then be connected so that the output of one becomes the input to another.
Create and convert pipeline at scale through industry leading SMS campaigns, automation, and conversation management.
TextUs is the leading text messaging service provider for businesses that want to engage in real-time conversations with customers, leads, employees and candidates. Text messaging is one of the most engaging ways to communicate with customers, candidates, employees and leads. 1:1, two-way messaging encourages response and engagement. Text messages help teams get 10x the response rate over phone and email. Business text messaging has become a more viable form of communication than traditional mediums. The TextUs user experience is intentionally designed to resemble the familiar SMS inbox, allowing users to easily manage contacts, conversations, and campaigns. Work right from your desktop with the TextUs web app or use the Chrome extension alongside your ATS or CRM. Leverage the mobile app for on-the-go sending and responding.
SYRAH si propone di far emergere e rappresentare i concetti espressi per mezzo di un linguaggio naturale. SYRAH aims to discover and represent concepts expressed in natural languages. NLP, lemma, lemmario, italiano, rete, semantica, clustering, semantic
PDFBox is a Java PDF Library. This project will allow access to all of the components in a PDF document. More PDF manipulation features will be added as the project matures. This ships with a utility to take a PDF document and output a text file.
Switchboard is a conceptual-level interface to many web and network related functions (SOAP, REST, XML parsing, screen-scraping, FTP, network sniffing), designed for the Processing environment.
(Almost) all a scholar in the Humanities needs (polytonic Greek fonts, stylistic and metrical analysis tools, search engines on TLG and PHI) concentrated in only one Linux Live CD, ready to use everywhere at home or at University, without installation
Estraier is a personal full-text search system for web sites, local file systems, mail boxes, and so on. Estraier has flexible interface and it can handle multilingual documents and various file formats with external plug-ins.
The Infomap NLP software performs automatic indexing of words and documents from free-text corpora, using a variant of LSA to enable information retrieval and other applications. It was developed by the Infomap Project at Stanford University's CSLI.
TM4J is a topic map engine implemented entirely in Java. Topic maps are a standard paradigm for the interchange of knowledge structures. This project aims to produce a complete suite of tools for creating, processing and publishing topic map information.
SimpleRDF/XSL template simplifies RDF/XML sources as much as possible to allow easy processing. SimpleRDF/PHP5 parser takes advantage of SimpleRDF/XSL. It has extremly simple API. You can parse any RDF/XML compatible document (incl. RSS) and much more...
The "Universal Content Evaluation and Categorisation Software" is a program for analysing a websites, or more generally, a texts content. The text is arranged in dozens of categories, permitting more efficient web searches and information processing.
The DocConversion project provides a distributed document conversion solution with a well defined API which makes use of existing convstion tools and/or a centralized conversion server. This is part of the PRONIR research at http://www.pronir.nl
This code supplies miniature pedagogical Java implementations of information retrieval, spidering, and text-processing software. It was initially developed for an introductory course on Intelligent Information Retrieval and Web Search in UT Austin.
palbum is a perl script which turns a directory (or directory hierarchy) full of images into a nice image gallery. It generates thumbnails and index.html files, and requires no configuration. It uses the ubiquitous netpbm library for image processing.
...Uses regular expressions to search a set of DOM nodes, and transparently handles highlighting matches that span multiple elements. Highlight events are passed to a user defined highlighter for processing.