Showing 28 open source projects for "python web crawler"

  • 1
    EpiDoc: Epigraphic Documents in TEI XML

    XML text markup for ancient documents

    The EpiDoc Collaborative is developing specifications and tools for standards-based, digital publication and interchange of scholarly and educational editions of documentary and literary texts like inscriptions and papyri. The link below will take you to the EpiDoc home page on this site.
    Downloads: 18 This Week
  • 2
    PyXB (“pixbee”) is a pure Python package that generates Python source code for classes that correspond to data structures defined by XMLSchema. In concept it is similar to JAXB for Java and CodeSynthesis XSD for C++.
    Downloads: 20 This Week
  • 3
    DoCookBook

    Cookbook Style Document for DocBook Customizations

    This project has been moved to GitHub: https://github.com/tomschr/dbcookbook/ The DoCookBook project aims to create an open source book about DocBook and the DocBook XSL stylesheets written as a cookbook and released under a Creative Commons license.
    Downloads: 0 This Week
  • 4
    Meresco is both an OAI Data Provider and a Service Provider. SourceForge is used only to host the source control (Subversion). Sources: http://sources.meresco.org/ Binaries: http://repository.cq2.org/ Mail: http://groups.google.com/group/meresco
    Downloads: 0 This Week
  • 5
    Aurora Application Server is a new Python Web Application Server and Framework. The main goal of the project is to provide the developer with a complete set of tools to speed up the application development process. See project wiki for more information.
    Downloads: 0 This Week
  • 6
    Python XML Serialization
    pyxser stands for Python XML Serialization. It is a Python-object-to-XML serializer that validates every XML deserialization against the pyxser 1.0 XML Schema. pyxser is written entirely in C as a Python extension.
    Downloads: 0 This Week
  • 7
    The London Datastore (http://data.london.gov.uk) was created by the Greater London Authority (GLA) as an innovation towards freeing London’s data. This SourceForge project will be used to open-source our development efforts surrounding data formats.
    Downloads: 0 This Week
  • 8
    Methanol is a scriptable multi-purpose web crawling system with an extensible configuration system and speed-optimized architectural design. Methabot is the web crawler of Methanol.
    Downloads: 1 This Week
  • 9
    The Java Sitemap Parser can parse a website's Sitemap (http://www.sitemaps.org/). This is useful for web crawlers that want to discover URLs from a website that is using the Sitemap Protocol. This project has been incorporated into crawler-commons (https://github.com/crawler-commons/crawler-commons) and is no longer being maintained.
    Downloads: 1 This Week
  • 10
    XForms Validator is the open source version of the online XForms Validator, available at http://xformsinstitute.com/validator/
    Downloads: 0 This Week
  • 11
    sabnzbd-xmlgui is an Ajax-based frontend built around sabnzbdplus. It also provides an XML-based API that lets other applications connect to sabnzbd easily, while keeping the existing web-based Ajax GUI.
    Downloads: 1 This Week
  • 12
    This project will provide translation of mathematical content, from TeX to MathML and vice versa, and to graphics formats, as a web service. TeX, running as a daemon, is used for mathematical typography.
    Downloads: 0 This Week
  • 13
    4Suite is a platform for XML processing and knowledge management, consisting of a library of integrated tools for XML processing and an XML data repository and server with a rules-based engine.
    Downloads: 10 This Week
  • 14
    BugEye is an XML-based unit test creation framework. Being XML-based, it can be easily translated to almost any language. The current translations are C#, Java, JavaScript, and Visual Basic. Future translations include C++, Python, Perl, and PHP.
    Downloads: 0 This Week
  • 15
    Cathnet is developing the infrastructure for the Catholic Semantic Web. Technologies involved include, but are not limited to, XML, RDF, NLP, Zope, Plone and Plone products.
    Downloads: 0 This Week
  • 16
    XSDB: XML is to DATA as HTML is to DOCUMENT. Publish and combine data as easily as HTML and web browsers publish and view documents. Implementations in Python, JavaScript, Java, and C#/.NET.
    Downloads: 1 This Week
  • 17
    wxBrowser is an application browser based on the wxWidgets GUI framework. It is similar to a regular web browser, except that instead of reading HTML and displaying content, it reads XML and executes presentation logic (wxPython) in a client-side application.
    Downloads: 0 This Week
  • 18
    Modular, network based system for integrating separate multimedia systems
    Downloads: 0 This Week
  • 19
    Splice is a Python-based content aggregation and publishing platform. It provides all of the features of a common weblog combined with synchronization capabilities, allowing content to be slurped in from external sources, classified, and published.
    Downloads: 1 This Week
  • 20
    Python wrappers for the way(s) you think. Mindwrapper is a framework for the rapid development of custom, data-centric, GUI applications with wxPython.
    Downloads: 1 This Week
  • 21
    PySOS is a Python-based implementation of the OGC SOS standard. It is a lightweight set of scripts that work in conjunction with a web server to serve data from a relational database.
    Downloads: 0 This Week
  • 22
    This is a collection of REST specifications, and implementations of those specs, for very low-level information sharing and workflow operations using REST actions over HTTP. Implementations are in various languages, mainly Java, Python, and Ruby.
    Downloads: 0 This Week
  • 23
    Planning of events and commercials for TV broadcast stations, with interfaces to PBS(tm) and Pinnacle-Vortex(tm) and a web frontend.
    Downloads: 0 This Week
  • 24
    A replicator system written in Python for heterogeneous databases.
    Downloads: 0 This Week
  • 25
    PHP + mod_python GUI for athenaCL
    Downloads: 0 This Week