Showing 693 open source projects for "government web scrape"

  • 1
    Web Experience Toolkit

    Open source code library for building innovative websites

    Web Experience Toolkit (WET): Open source code library for building innovative websites that are accessible, usable, interoperable, mobile-friendly and multilingual. This collaborative open source project is led by the Government of Canada. It provides a collection of flexible and themeable templates and reusable components, and takes an HTML5-first approach (leveraging native HTML5 support and filling support gaps with “polyfills”). Supporting a wide...
    Downloads: 0 This Week
  • 3
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 24 This Week
  • 4
    Roach

    The complete web scraping toolkit for PHP

    Roach is a complete web scraping toolkit for PHP. It is a shameless clone heavily inspired by the popular Scrapy package for Python. Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler, but includes an entire pipeline to clean, persist and otherwise process extracted data as well. It’s your all-in-one resource for web scraping in PHP. Roach doesn’t depend on a specific framework. Instead, you can use the core package...
    Downloads: 4 This Week
  • 5
    jsoup

    Java library for working with real-world HTML

    ... attempt to create a clean parse from the HTML you provide, regardless of whether the HTML is well-formed or not. You have HTML in a Java String, and you want to parse that HTML to get at its contents, or to make sure it's well formed, or to modify it. The String may have come from user input, a file, or from the web.
    Downloads: 3 This Week
  • 6
    Parsera

    Lightweight library for scraping web-sites with LLMs

    Scrape data from any website with only a link and column descriptions. Parsera is a tool designed to scrape web content, specifically handling poorly structured or messy websites.
    Downloads: 0 This Week
  • 7
    CKAN

    CKAN is an open-source DMS for powering data hubs

    CKAN is the world’s leading open-source data portal platform. CKAN makes it easy to publish, share and work with data. It's a data management system that provides a powerful platform for cataloging, storing and accessing datasets with a rich front-end, full API (for both data and catalog), visualization tools and more. CKAN is used by national and regional government organizations throughout the European Union, the Americas, Asia, and Oceania to power a variety of official and community data...
    Downloads: 2 This Week
  • 8
    rvest

    Simple web scraping for R

    rvest helps you scrape (or harvest) data from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, inspired by libraries like Beautiful Soup and RoboBrowser. If you’re scraping multiple pages, I highly recommend using rvest in concert with polite. The polite package ensures that you’re respecting the robots.txt and not hammering the site with too many requests.
    Downloads: 0 This Week
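The idea of respecting robots.txt and spacing out requests, as described for rvest with polite, can be sketched in Python using only the standard library. This is an illustration of the general technique, not the polite package's API: the robots.txt content, the URLs, and the `example-bot` user agent are all made-up assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed in memory so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

def build_rules(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

def polite_targets(parser, urls, user_agent="example-bot"):
    """Return only the URLs that robots.txt allows for this user agent."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]

def crawl_delay(parser, user_agent="example-bot", default=1.0):
    """Seconds to wait between requests; falls back to a default if unset."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default

rules = build_rules(ROBOTS_TXT)
urls = [
    "https://example.org/data/page1",     # hypothetical allowed page
    "https://example.org/private/page2",  # hypothetical disallowed page
]
allowed = polite_targets(rules, urls)
wait = crawl_delay(rules)
```

A real crawler would then fetch each URL in `allowed` and sleep for `wait` seconds between requests.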
  • 9
    crawlee

    A web scraping and browser automation library for Node.js

    Crawlee is a web scraping and browser automation library. It helps you build reliable crawlers. Fast. Crawlee won't fix broken selectors for you (yet), but it helps you build and maintain your crawlers faster. When a website adds JavaScript rendering, you don't have to rewrite everything, only switch to one of the browser crawlers. When you later find a great API to speed up your crawls, flip the switch back. It keeps your proxies healthy by rotating them smartly with good fingerprints...
    Downloads: 0 This Week
  • 10
    Ferret

    Declarative web scraping

    A web scraping system aiming to simplify data extraction from the web. Ferret has a declarative query language that makes it easy to focus on the data that you need to get. Ferret can scrape JS-rendered pages, handle all page events, and emulate user interactions. Ferret was designed as a library from the ground up, so it can be easily embedded into any Go application. Ferret helps you to focus on the data you need using an easy-to-learn declarative language. Ferret uses Chrome...
    Downloads: 0 This Week
  • 11
    JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    Scrape job websites into a single spreadsheet with no duplicates. Automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command arguments. By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can be easily automated to run nightly with crontab. If you have...
    Downloads: 0 This Week
  • 12
    Rod

    A Devtools driver for web automation and scraping

    Rod is a high-level driver for DevTools Protocol. It's widely used for web automation and scraping. Rod can automate most things in the browser that can be done manually. Chained context design, intuitive to timeout or cancel the long-running task. Auto-wait elements to be ready. Debugging friendly, auto input tracing, remote monitoring headless browser. Thread-safe for all operations. Automatically find or download browser. High-level helpers like WaitStable, WaitRequestIdle, HijackRequests...
    Downloads: 0 This Week
  • 13
    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, was to easily build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from terminal/shell/command-line by supplying URLs, the output filename of your choice and the paths to your Python scripts to the dude scrape command.
    Downloads: 0 This Week
  • 14
    SurveyJS

    JavaScript Survey and Form Library

    SurveyJS Form Library is distributed as npm packages and as scripts and style sheets that you can reference on your page. You can use it in any React, Angular, Vue, Knockout, or jQuery application. React, Angular, Knockout, and Vue3 are supported natively. To communicate with the server, the libraries use JSON objects that represent form schemas (content and layout of a form) and form results (answers). You have the option to build dynamic JSON-driven forms using our free full-featured...
    Downloads: 0 This Week
  • 15
    AutoScraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    This project is made for automatic web scraping to make scraping easy. It gets a URL or the HTML content of a web page and a list of sample data that we want to scrape from that page. This data can be text, URL or any HTML tag value of that page. It learns the scraping rules and returns similar elements. Then you can use this learned object with new URLs to get similar content or the exact same element of those new pages.
    Downloads: 0 This Week
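The learn-from-examples approach described above can be approximated with a small standard-library sketch: record the tag path of each sample value on a training page, then return every text node that shares such a path on any page. This is a simplified illustration, not AutoScraper's own algorithm or API; `learn`, `extract`, and the sample pages are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}

class _TextPaths(HTMLParser):
    """Collect (tag path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open tags, each as (tag, class attribute)
        self.found = []   # list of (path tuple, stripped text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, dict(attrs).get("class")))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerate unclosed tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.found.append((tuple(self.stack), text))

def text_paths(html):
    parser = _TextPaths()
    parser.feed(html)
    parser.close()
    return parser.found

def learn(html, samples):
    """Return the tag paths under which the sample values appear."""
    wanted = set(samples)
    return {path for path, text in text_paths(html) if text in wanted}

def extract(html, rules):
    """Return every text node whose tag path matches a learned rule."""
    return [text for path, text in text_paths(html) if path in rules]

TRAIN = '<ul><li class="t">Alpha</li><li class="t">Beta</li></ul><p>Footer</p>'
NEW = '<ul><li class="t">Gamma</li><li class="t">Delta</li></ul><p>Other</p>'

rules = learn(TRAIN, ["Alpha"])
similar = extract(TRAIN, rules)   # similar elements on the training page
fresh = extract(NEW, rules)       # same rules applied to a new page
```

One sample ("Alpha") is enough here because both list items share the same tag path; pages with less regular markup would need more samples or a looser matching rule.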
  • 16
    Nagios Core

    Nagios network monitoring software is enterprise server monitoring

    Nagios network monitoring software is a powerful, enterprise-class host, server, application, and network monitoring tool. Designed to be fast, flexible, and rock-solid stable. Nagios runs on *NIX hosts and can monitor Windows, Linux/Unix/BSD, Netware, and network devices.
    Downloads: 2,002 This Week
  • 17
    OpenKM Document Management - DMS

    Document Management System and Content Management System

    OpenKM is an electronic document management system and record management system EDRMS (DMS, RMS, CMS). It provides a modern and flexible architecture that meets today's IT demands, based on open technology (Java, Tomcat, GWT, Lucene, Hibernate, Spring and jBPM), in a powerful and scalable multiplatform application. OpenKM is a Web 2.0 application that works with Internet Explorer, Firefox, Safari and Opera. It can be configured with major DBMSs like Oracle, PostgreSQL and MySQL among others...
    Downloads: 1,008 This Week
  • 18
    Pentaho from Hitachi Vantara

    End to end data integration and analytics platform

    Pentaho Community Edition can now be downloaded from https://www.hitachivantara.com/en-us/products/pentaho-platform/data-integration-analytics/pentaho-community-edition.html Join the Community at https://community.hitachivantara.com/communities/community-pentaho-home?CommunityKey=e0eaa1d8-5ecc-4721-a6a7-75d4e890ee0 Pentaho couples data integration with business analytics in a modern platform to easily access, visualize and explore data that impacts business results. Use it as a full...
    Downloads: 930 This Week
  • 19
    OpenMRS

    Open source Health IT for the planet

    OpenMRS is a community-developed, open source, enterprise electronic medical record system. Our mission is to improve health care delivery in resource-constrained environments by coordinating a global community to create and support this software.
    Downloads: 670 This Week
  • 20
    openMAINT

    Open source solution for the Property & Facility Management

    openMAINT is an enterprise open source solution for the Property & Facility Management (CMMS). openMAINT helps to know and manage the inventory, maintenance, logistic and economic information related to buildings, plants and movable assets. openMAINT is a ready-to-use solution, configured with databases, workflows, reports and dashboards. The software can be gradually activated according to the needs of each organization and the available resources. openMAINT includes the...
    Downloads: 321 This Week
  • 21
    LogicalDOC Document Management - DMS

    smart and open source document management system

    ... to reduce costs significantly. Check out https://www.logicaldoc.com to learn more. The design of LogicalDOC is based on best-of-breed Java technologies in order to provide a reliable DMS platform. The main interface is web-based, no need to install anything else; users can access the system through their browser. LogicalDOC CE is 100% free software and is packaged with an open source database; while it supports all major DBMS, developers still recommend MySQL for production systems.
    Downloads: 350 This Week
  • 22
    CMDBuild -Platform for Asset Management

    Environment for configuring customized applications Asset Management

    CMDBuild is the open source web environment for configuring custom Asset Management applications. With CMDBuild you can build and extend your own CMDB, modeling it according to the needs of your organization. You can configure workflows, reports, dashboards, schedule operations and checks, manage documents, georeference your assets on maps or view them in 3D models. You can also interoperate with external solutions through webservices. Or you can choose one...
    Downloads: 201 This Week
  • 23
    Network Security Toolkit (NST)

    A network security analysis and monitoring toolkit Linux distribution.

    ... in the toolkit. An advanced Web User Interface (WUI) is provided for system/network administration, navigation, automation, network monitoring, host geolocation, network analysis and configuration of many network and security applications found within the NST distribution. In the virtual world, NST can be used as a network security analysis, validation and monitoring tool on enterprise virtual servers hosting virtual machines.
    Downloads: 211 This Week
  • 24
    Bonita

    A DPA process-based application platform with a workflow engine

    Bonitasoft fully supports digital operations and IT modernization with Bonita, an open-source and extensible platform for automation and optimization of business processes. The Bonita platform accelerates development and production with clear separation between capabilities for visual programming and for coding. Bonita integrates with existing information systems, orchestrates heterogeneous systems, and provides deep visibility into processes across the organization. Learn more at...
    Downloads: 147 This Week
  • 25

    htmLawed

    PHP code to purify & filter HTML

    The htmLawed PHP script makes HTML more secure and standards- & policy-compliant. The customizable HTML filter/purifier can balance tags, ensure proper nestings, neutralize XSS, restrict HTML, beautify code like Tidy, implement anti-spam measures, etc.
    Downloads: 90 This Week