DICOM Tag Slayer lets you view, modify, export, and find differences in DICOM-format files. The program is multiplatform and written in Python with PyQt4 and pydicom. It provides both a GUI and a CLI for easier script integration.
HTTP functional and non-functional (load and performance) testing toolkit based on Jython and The Grinder (http://grinder.sf.net). It includes support for SOA services, REST, JSON/XML encoding, AES and WS security, and a stub to collect requests.
This is a CSV (Comma Separated Values) to ARFF file format converter script written in Python. Save your CSV file as 'test.csv' and the script will convert it to 'test.arff'.
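As a rough illustration of what such a conversion involves, here is a minimal sketch in Python. It is not the project's script: the function name `csv_to_arff`, the rule that an all-numeric column becomes `NUMERIC` and anything else becomes a nominal attribute, and the default relation name are all assumptions made for this example. It does not handle missing values or attribute names that need quoting.

```python
import csv
import io


def _is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation="test"):
    """Convert CSV text (first row = header) to ARFF text.

    Hypothetical helper: columns whose values all parse as numbers
    are declared NUMERIC; every other column is declared nominal,
    listing its distinct values in sorted order.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    lines = [f"@RELATION {relation}", ""]
    for i, name in enumerate(header):
        column = [row[i] for row in data]
        if all(_is_number(v) for v in column):
            lines.append(f"@ATTRIBUTE {name} NUMERIC")
        else:
            values = sorted(set(column))
            lines.append(f"@ATTRIBUTE {name} {{{','.join(values)}}}")

    lines.append("")
    lines.append("@DATA")
    lines.extend(",".join(row) for row in data)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "sepal,species\n5.1,setosa\n7.0,versicolor\n"
    print(csv_to_arff(sample))
```

To mirror the workflow described above, one could read 'test.csv' into a string, pass it to `csv_to_arff`, and write the result to 'test.arff'.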
BibteXML is a bibliography schema for XML that expresses the content model of BibTeX – the bibliographic system for use with LaTeX. Stylesheets and conversion tools are provided.
Now hosted at https://github.com/plastex/plastex
plasTeX is a Python-based LaTeX document processing framework. It gives DOM-like access to a LaTeX document, as well as the ability to generate multiple output formats (e.g. HTML, DocBook, tBook, etc.).
Redland is a set of object-based, modular, and portable C RDF libraries providing RDF APIs for the graph and triple storage (librdf), RDF/XML parsing and serializing (Raptor), and SPARQL RDF querying (Rasqal). Language APIs are available in Perl, PHP, Python, Ruby, and others.
html2wordml is a Python application for converting HTML pages to a WordML (Microsoft Word XML) document. The application can be used to create a new WordML document or to merge content into an existing template.
This is an ETL tool that loads data from DBF/XBase files into MySQL. The utility has a command-line interface and is designed to work without user interaction.
This Python 3.1 tool manipulates the coordinate system of CNC G-code for machining or engraving. It can flip or mirror the X, Y, or Z coordinates, flip or mirror both X and Y coordinates together, or insert Z motions into a G-code file that has none.
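The kind of coordinate rewrite described here can be sketched in a few lines of Python. This is not the tool's code: the function name `mirror_axis` is hypothetical, and the sketch assumes that mirroring an axis means negating every coordinate word on that axis (reflection about the origin). It skips text after a `;` comment marker and does not handle parenthesized comments.

```python
import re


def mirror_axis(gcode, axis="X"):
    """Negate every coordinate on the given axis letter.

    Hypothetical helper: 'X10.5' becomes 'X-10.5' and 'X-3' becomes
    'X3'. Zero values are left unchanged. Anything after a ';' on a
    line is treated as a comment and passed through untouched.
    """
    pattern = re.compile(rf"({axis})([-+]?\d*\.?\d+)", re.IGNORECASE)

    def negate(match):
        letter, number = match.group(1), match.group(2)
        if float(number) == 0:
            return match.group(0)  # nothing to flip
        if number.startswith("-"):
            flipped = number[1:]
        elif number.startswith("+"):
            flipped = "-" + number[1:]
        else:
            flipped = "-" + number
        return letter + flipped

    out = []
    for line in gcode.splitlines():
        code, sep, comment = line.partition(";")
        out.append(pattern.sub(negate, code) + sep + comment)
    return "\n".join(out)


if __name__ == "__main__":
    program = "G1 X10.5 Y-2 ; move X5\nG0 X-3 Y0"
    print(mirror_axis(program, "X"))
```

Working on the text of each coordinate word, rather than parsing and reformatting the number, keeps the original digits and precision intact.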
Make AsciiDoc part of your literate programming tool set. With eWEB you can weave and tangle literate programs written as AsciiDoc documents, using embedded WEB code snippets.
Add-ons to the ECMWF GRIB API.
This project is about developing and maintaining add-ons to the GRIB API, like language bindings or documentation.
The main GRIB API page is at http://www.ecmwf.int/products/data/software/grib_api.html
Goldify is a set of tools for automatically adding links to electronic documents. Its main purpose is to add links to the IUPAC GoldBook (http://goldbook.iupac.org) in documents that reference it.
libiptcdata is a standalone C library for reading and writing the International Press Telecommunications Council (IPTC) metadata contained in various data files such as images.
TeXas assists the building process of LaTeX files and provides useful scripted features. It mainly acts as an automated build system. TeXbooklet creates booklets out of LaTeX files while TeXlayout creates a LaTeX file with a standard layout in it.
This project provides scripts for automatically generating man pages from wiki-based sources. The scripts download wiki source files from a wiki web server, convert them from wiki to roff format, and then build an archive of man pages.