A fast, high-level web crawling and web scraping framework
Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD.
Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
Extract content from websites and local documents using LLMs. ScrapeGraphAI is a Python web scraping library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
The best free open source website change detection and restock service
Loved by smart shoppers, data journalists, research engineers, data scientists, security researchers, and more. From simply monitoring website pages that have a change (such as watching prices, and restocking notifications), to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get alerts when those products are back in stock, get restock alerts via Discord, Slack, email, and many other platforms. Using the...
LinkChecker is a free, GPL-licensed website validator. LinkChecker checks links in web documents or full websites. It runs on Python 3 systems, requiring Python 3.8 or later. The version in the pip repository may be old; to find out how to get the latest code, plus platform-specific information and other advice, see doc/install.txt in the source code archive. If you do not want to install any additional libraries/dependencies you can use the Docker image which is published on GitHub...
HyperSQL is like a doxygen plus javadoc for SQL, hypermapping SQL views, packages, procedures, and functions to HTML source code listings and showing all code locations where these are used.
pyMantis is a data-management system for (systems) biology built on the web2py framework. It features: tree-based file explorer, relational DB table wizard with automated creation of user interfaces, internal and external access management, wiki, ...
New Homepage: http://wummel.github.io/linkchecker/
Linkchecker features:
- recursive and multithreaded checking and site crawling
- output in colored or normal text, HTML, SQL, CSV, XML or a sitemap graph in different formats
- HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file links support
- restrict link checking with regular expression filters for URLs
- proxy support
- username/password authorization for HTTP, FTP
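The regular expression filtering listed above can be illustrated with a small standard-library sketch. This is not LinkChecker's own code or API; the names `LinkExtractor`, `extract_links` and `filter_links` are hypothetical and only show the general idea of collecting links from a page and dropping the ones that match an ignore pattern.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    """Return all link targets found in html_text, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    return parser.links


def filter_links(links, ignore_patterns):
    """Drop every URL that matches at least one ignore pattern."""
    compiled = [re.compile(p) for p in ignore_patterns]
    return [u for u in links if not any(rx.search(u) for rx in compiled)]
```

A checker built this way would call `extract_links` on each fetched page and pass the result through `filter_links` before queuing URLs for checking.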
Reporting engine library written in C. Create one XML file and generate PDF, HTML, TXT, and CSV reports based on queries. Has support for MySQL, PostgreSQL, ODBC. Bindings for PHP, Java, Python.
This Python script takes an exported WordPress XML file and outputs a single HTML document containing all posts in order of entry, and a table of contents broken down by category. CSS classes are added for easy formatting.
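A rough sketch of this kind of conversion, using only the standard library, might look like the following. It is not the project's actual script: the function name `export_to_html`, the `post-N` anchors and the `toc`/`post` CSS class names are assumptions made for illustration, and the input is assumed to be an RSS-style export with `item`, `title`, `category` and `content:encoded` elements.

```python
import html
import xml.etree.ElementTree as ET

# Namespace-qualified tag for the post body (assumed RSS content module).
CONTENT_TAG = "{http://purl.org/rss/1.0/modules/content/}encoded"


def export_to_html(xml_text):
    """Build one HTML fragment: a per-category TOC, then all posts in order."""
    root = ET.fromstring(xml_text)

    posts = []
    for i, item in enumerate(root.iter("item")):
        title = item.findtext("title", default="(untitled)")
        body = item.findtext(CONTENT_TAG, default="")
        cats = [c.text for c in item.findall("category") if c.text]
        posts.append((f"post-{i}", title, body, cats or ["Uncategorized"]))

    # Group post anchors by category for the table of contents.
    toc = {}
    for anchor, title, _body, cats in posts:
        for cat in cats:
            toc.setdefault(cat, []).append((anchor, title))

    parts = ['<div class="toc">']
    for cat in sorted(toc):
        parts.append(f"<h2>{html.escape(cat)}</h2><ul>")
        for anchor, title in toc[cat]:
            parts.append(f'<li><a href="#{anchor}">{html.escape(title)}</a></li>')
        parts.append("</ul>")
    parts.append("</div>")

    # Post bodies are already HTML, so they are embedded without escaping.
    for anchor, title, body, _cats in posts:
        parts.append(
            f'<div class="post" id="{anchor}"><h1>{html.escape(title)}</h1>{body}</div>'
        )
    return "\n".join(parts)
```

Posts keep the order in which they appear in the export file, and the class names give a stylesheet something to hook onto.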
An HTML scraper that uses machine learning frameworks to extract labelled fields from raw HTML. The project also involves the development of a tool to display the semi-structured data generated by the scraper component.
zSearch is a simple Python-based crawler and search engine. Raw HTML is stored in bzip2 archives, the index is created using PyLucene, and Twisted is used to provide an internal HTTP server. Results are sent back as XML over HTTP.
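Storing raw HTML in bzip2 form can be sketched with Python's standard `bz2` module. This is only an illustration of the storage idea, not zSearch's code: the helper names `store_page` and `load_page` and the one-file-per-page layout are assumptions.

```python
import bz2
from pathlib import Path


def store_page(archive_dir, name, html_text):
    """Write one crawled page as a bzip2-compressed file and return its path."""
    path = Path(archive_dir) / f"{name}.html.bz2"
    with bz2.open(path, "wt", encoding="utf-8") as fh:
        fh.write(html_text)
    return path


def load_page(path):
    """Read a compressed page back as text, e.g. for indexing."""
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        return fh.read()
```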
XSDB XML is to DATA as HTML is to DOCUMENT. Publish and combine data as easily as HTML format and web browsers publish and view documents. Implementations in Python, JavaScript, Java, C#/.NET.
wxBrowser is an application browser based on the wxWidgets GUI framework. It is similar to a regular web browser, except that instead of reading HTML and displaying content, it reads XML and executes presentation logic (wxPython) in a client-side application.
A content generation engine written in Python used for generating content for HTML and textual output. Integrates with Apache to form a web framework that uses XML templates and can embed Python.
LegionWeb is a Python (CGI) website engine. Initially designed simply to allow a site to be templated, it is now developing more mature features. Data is stored in HTML and XML files, although MySQL support is planned.
POST (Python Obviously Simple Text) provides support for simple, flexible dynamic document generation in multiple output formats. Supports inputs in text or XML, outputs in HTML, PDF, RTF, LaTeX source, nroff source, PostScript, and plain text.
SchemaDoc is an XML-based markup language for documenting XML schemas. The work products include both the vocabulary and a set of tools for combining it with the schema source (e.g. a DTD) to produce documentation in HTML, XML DocBook, LaTeX, etc.
The Python Active Scripting Server provides an environment for creating dynamic web content as a combination of Python and pure HTML. Additionally, it provides uniform database access, persistent sessions and XML-based configuration files.
Reptile is a web server made in Python. It supports server side scripting with "Embedded Python", PHP, and CGI scripts.
It has an integrated HTML/XML validator that checks the pages before publication, and other handy features.
This project produces software to convert the Moby Shakespeare texts to XML and HTML. It also provides miscellaneous software for maintaining literature Web sites. The software is currently used to produce the Web site http://tech-two.mit.edu/Shakespeare/
This is a collection of REST specifications, and implementations of those specs, for very low-level information sharing and workflow operations using REST actions over HTTP.
Implementations are in various languages, mainly Java, Python, and Ruby.
A standalone cross-platform GUI for retrieving files from the web via web services, APIs, RSS feeds, HTML, and XML. Makes use of plugins to parse the source data and make it easily retrievable. Utilizes Python and wxPython technologies. Cur