Search Results for "ofn-extract-objects.py"

Showing 19 open source projects for "ofn-extract-objects.py"

1

Critical

Extract & Inline Critical-path CSS in HTML pages

Critical extracts and inlines critical-path (above-the-fold) CSS from HTML. It can generate critical-path CSS on its own, or generate it and then inline it, minify it, or both, and it can return the output via a callback or a promise. When your site is adaptive and you want to deliver critical CSS for multiple screen resolutions, this is a useful option. Note: (your final output will be minified so as to eliminate duplicate...

Downloads: 0 This Week

Last Update: 2024-09-23
2

jsoup

Java library for working with real-world HTML

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make...

Downloads: 0 This Week

Last Update: 2025-08-24
3

WebHarvest - web data extraction tool

Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text-processing technologies to easily extract useful data from arbitrary web pages.

14 Reviews

Downloads: 22 This Week

Last Update: 2025-10-25
4

pandas-datareader

Extract data from a wide range of Internet sources

Up-to-date remote data access for pandas that works with multiple pandas versions. Install it with pip, then import it and use one of the data readers. The project's example reads five years of 10-year constant-maturity yields on U.S. government bonds. Stable documentation is available on github.io, and a second copy is hosted on Read the Docs.

Downloads: 0 This Week

Last Update: 2023-04-20
5

Tailwindo

Convert Bootstrap CSS code to Tailwind CSS code

This tool converts CSS framework classes (currently Bootstrap) in HTML, PHP, or any other files of your choice into equivalent Tailwind CSS classes. It is designed so that support for more CSS frameworks can be added in the future. It can convert single files, code snippets, or whole folders, and it can extract the changes to a separate CSS file as Tailwind components while keeping the old class names.

Downloads: 0 This Week

Last Update: 2023-04-28
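The core idea of a class-for-class conversion can be sketched in a few lines of Python. This is a minimal illustration, not Tailwindo's implementation: the five-entry mapping table is an illustrative subset chosen for this sketch, and the regular expression only handles double-quoted `class` attributes.

```python
import re

# Illustrative subset of Bootstrap 4 -> Tailwind CSS equivalents.
# This table is an assumption for the sketch, not Tailwindo's actual mapping.
BOOTSTRAP_TO_TAILWIND = {
    "d-flex": "flex",
    "d-none": "hidden",
    "justify-content-center": "justify-center",
    "align-items-center": "items-center",
    "font-weight-bold": "font-bold",
}

# Matches double-quoted class attributes only.
CLASS_ATTR = re.compile(r'class="([^"]*)"')


def convert_classes(html: str) -> str:
    """Rewrite every class="..." attribute, mapping known Bootstrap classes
    and leaving unknown classes untouched."""

    def replace(match: re.Match) -> str:
        classes = match.group(1).split()
        converted = [BOOTSTRAP_TO_TAILWIND.get(c, c) for c in classes]
        return 'class="' + " ".join(converted) + '"'

    return CLASS_ATTR.sub(replace, html)


print(convert_classes('<div class="d-flex align-items-center">Hi</div>'))
```

Unknown classes are passed through rather than dropped, so an incomplete mapping never silently removes styling; a fuller converter would also need rules for parameterised classes such as grid columns.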
6

unfluff

Automatically extract body content (and other cool stuff) from HTML

unfluff is a Node.js library designed to automatically extract the main content from an HTML document — stripping away navigation bars, ads, footers and other boilerplate to leave you with the “body content”, metadata (title, author, date) and other useful fields. It’s a tool very much aimed at content-analysis, web scraping, building datasets, or repurposing article text for downstream processing (like machine-learning or summarization).

Downloads: 0 This Week

Last Update: 2025-11-14
7

htmlpicker

Picks up text from a web page using an HTML template.

A Java HTML picker and text extractor. It picks up text from a web page using an HTML template, which is useful if you regularly need to extract data from the same site. You can reuse the same URL or build URLs with parameters, which are read from a text file.

Downloads: 0 This Week

Last Update: 2015-03-17
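The URL-building step described above can be sketched in Python. The `{param}` placeholder syntax and the one-value-per-line parameter file are assumptions made for this illustration; htmlpicker's own template and file formats are not described in this listing.

```python
from typing import Iterable, Iterator
from urllib.parse import quote


def build_urls(url_template: str, param_lines: Iterable[str]) -> Iterator[str]:
    """Yield one URL per non-empty line of parameters.

    Assumption (hypothetical format): the template contains a single
    "{param}" placeholder and each line holds one parameter value.
    """
    for line in param_lines:
        value = line.strip()
        if value:  # skip blank lines
            # Percent-encode so spaces and reserved characters are URL-safe.
            yield url_template.replace("{param}", quote(value))


# A list stands in for the lines of a parameter file here.
for url in build_urls("https://example.com/search?q={param}", ["new york\n", "\n", "paris\n"]):
    print(url)
```

Because iterating over an open text file yields its lines, the real parameter file can be passed directly, e.g. `build_urls(template, open("params.txt"))`, where `params.txt` is a hypothetical file name.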
8

HXPath

XPath HTML parser

HXPath is a command-line tool for extracting data from HTML documents. Like the standard xpath tool, it can select subtrees, but it can also read contents and attributes and output them in a Bash-friendly format. HTML Tidy and HTTP/HTTPS GET support are built in.

Downloads: 0 This Week

Last Update: 2016-05-26
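For comparison, Python's standard library can run a similar attribute-and-text query once a page is well-formed, which is the job a tidy step like HXPath's performs. This sketch uses `xml.etree.ElementTree`, which supports only a limited XPath subset; the sample markup and the tab-separated output are assumptions for illustration, not HXPath's actual output format.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Sample input, assumed to be already tidied into well-formed markup
# (no xmlns declaration, so plain tag names match).
XHTML = """<html><body>
<a href="/one">First</a>
<a>No link</a>
<a href="/two">Second</a>
</body></html>"""


def select_links(xhtml: str) -> List[Tuple[str, str]]:
    """Return (href, text) pairs for every <a> element that has an href."""
    root = ET.fromstring(xhtml)
    # ".//a[@href]" is within ElementTree's supported XPath subset:
    # any descendant <a> element that carries an href attribute.
    return [(a.get("href"), (a.text or "").strip()) for a in root.iterfind(".//a[@href]")]


# One record per line with tab-separated fields, easy to consume with
# shell tools such as `cut` or a `while read` loop.
for href, text in select_links(XHTML):
    print(f"{href}\t{text}")
```

Real XHTML that declares `xmlns="http://www.w3.org/1999/xhtml"` requires namespace-qualified paths in ElementTree, such as `.//{http://www.w3.org/1999/xhtml}a`.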
9

xWebScraper

This is an advanced web scraper with a user-friendly GUI that lets the user define rules and web addresses to extract data from, either once or periodically, as well as the target database field in which the data should be saved.

Downloads: 0 This Week

Last Update: 2014-07-13
10

HtmlList

A Python package that finds repetitive formatting patterns in HTML pages and uses them to extract information. The idea is that pages containing some kind of list show a pattern that visibly repeats to the human eye (the page format).

Downloads: 0 This Week

Last Update: 2013-04-23
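One simple way to detect the kind of repetitive pattern described above is to compare the structural "shape" of sibling elements and treat the largest group of identically shaped siblings as the list. The Python sketch below uses that heuristic; it is an assumption for illustration, not HtmlList's actual algorithm, and the `min_repeats` threshold of 3 is arbitrary.

```python
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # child elements only
        self.items = []     # child elements and text, in document order


class TreeBuilder(HTMLParser):
    """Build a minimal element tree, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current.items.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.items.append(data.strip())


def shape(node):
    """Structural signature: the tag plus the shapes of its children."""
    return node.tag + "(" + ",".join(shape(c) for c in node.children) + ")"


def text_of(node):
    parts = [i if isinstance(i, str) else text_of(i) for i in node.items]
    return " ".join(p for p in parts if p)


def find_records(html, min_repeats=3):
    """Return the text of the largest group of same-shaped siblings."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best = []
    stack = [builder.root]
    while stack:
        parent = stack.pop()
        stack.extend(parent.children)
        counts = Counter(shape(c) for c in parent.children)
        for sig, count in counts.items():
            if count >= min_repeats and count > len(best):
                best = [c for c in parent.children if shape(c) == sig]
    return [text_of(n) for n in best]


print(find_records("<ul><li><b>A</b> 1</li><li><b>B</b> 2</li><li><b>C</b> 3</li></ul><p>x</p>"))
```

Shapes are recomputed for every subtree, which is quadratic in the worst case; that keeps the sketch short but would call for memoisation on large pages.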
11

Textract

The Textract Project consists of C++ source code to extract text from a growing assortment of file formats. Output is indexing-ready. The Textract Project is intended as a foundation to support research-quality search engines.

Downloads: 0 This Week

Last Update: 2013-04-24
12

COM Markup Language Parser

Create or parse any markup-language file or string (HTML, XML, X3D, VRML, MathML, XAML, XDP, CDA, SCORM, COLLADA, XBRL) into a simple, versatile hierarchical object model of MLDocument, MLElement, and MLParameter objects. Written in VB 6 (Win32) as an alternative to using the DOM.

Downloads: 0 This Week

Last Update: 2013-04-15
13

Take notes!

This script lets you highlight, extract, and share content from a web page simply by selecting it with the mouse.

Downloads: 2 This Week

Last Update: 2013-04-11
14

Galateia HTML Extractor

An HTML scraper that uses machine-learning frameworks to extract labelled fields from raw HTML. The project also involves developing a tool to display the semi-structured data generated by the scraper component.

1 Review

Downloads: 0 This Week

Last Update: 2013-05-14
15

ASP .Net viewstate decoder / encoder +

viewstate is a decoder and encoder for ASP.NET viewstate data. It supports the different viewstate data formats and can extract viewstate data directly from web pages. viewstate will also show any hash applied to the viewstate data.

Downloads: 1 This Week

Last Update: 2013-04-24
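The first step such a tool performs, pulling the viewstate out of a page, can be sketched with Python's standard library. The sketch assumes the common hidden-field name `__VIEWSTATE` and checks only for the `0xFF 0x01` header that ASP.NET 2.0 and later write at the start of an unencrypted payload. It does not deserialize the contents or verify any appended hash, and it is not the viewstate project's code.

```python
import base64
from html.parser import HTMLParser
from typing import Optional


class ViewStateFinder(HTMLParser):
    """Record the value of the hidden <input name="__VIEWSTATE"> field."""

    def __init__(self):
        super().__init__()
        self.value: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        fields = dict(attrs)
        if tag == "input" and fields.get("name") == "__VIEWSTATE":
            self.value = fields.get("value")


def extract_viewstate(html: str) -> Optional[bytes]:
    """Return the base64-decoded viewstate bytes, or None if absent."""
    finder = ViewStateFinder()
    finder.feed(html)
    finder.close()
    if finder.value is None:
        return None
    return base64.b64decode(finder.value)


def looks_like_object_state_formatter(raw: bytes) -> bool:
    # Unencrypted ASP.NET 2.0+ payloads begin with format marker 0xFF, version 0x01.
    return raw[:2] == b"\xff\x01"


# Synthetic page for illustration; real payloads are much longer.
value = base64.b64encode(b"\xff\x01demo").decode("ascii")
page = f'<form><input type="hidden" name="__VIEWSTATE" value="{value}" /></form>'
raw = extract_viewstate(page)
print(raw, looks_like_object_state_formatter(raw))
```

Decoding the object graph itself, and checking the MAC when one is present, are the harder parts a dedicated decoder handles beyond this sketch.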
16

Syncopate

Syncopate is an extension module to the Apache JMeter testing tool. It enhances JMeter's HTTP proxy server by adding functionality to extract variables and create assertions during HTTP request recording.

Downloads: 0 This Week

Last Update: 2013-04-08
17

DataExtractor - HTMLtoXML

The DataExtractor (HTMLtoXML) extracts data from an HTML page according to a configuration file and stores it in an XML file with a specified structure.

Downloads: 0 This Week

Last Update: 2014-03-05
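The configuration-driven approach can be sketched in Python with the standard library. Here the "configuration file" is reduced to a hypothetical dictionary that maps output element names to HTML `class` values; DataExtractor's real configuration format and XML structure are not described in this listing, so both are assumptions.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical configuration: output XML element -> HTML class to read from.
CONFIG = {"title": "product-name", "price": "product-price"}

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class FieldCollector(HTMLParser):
    """Collect text inside elements whose class matches a configured value.

    Assumes reasonably balanced markup; void elements are skipped so they
    do not unbalance the stack of open elements.
    """

    def __init__(self, config):
        super().__init__()
        self.field_for_class = {cls: field for field, cls in config.items()}
        self.open_fields = []  # configured field (or None) per open element
        self.values = {field: [] for field in config}

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        field = next((self.field_for_class[c] for c in classes if c in self.field_for_class), None)
        self.open_fields.append(field)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.open_fields:
            self.open_fields.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute the text to the innermost configured ancestor, if any.
        for field in reversed(self.open_fields):
            if field is not None:
                self.values[field].append(text)
                break


def html_to_xml(html, config, record_tag="record"):
    collector = FieldCollector(config)
    collector.feed(html)
    collector.close()
    record = ET.Element(record_tag)
    for field in config:
        ET.SubElement(record, field).text = " ".join(collector.values[field])
    return ET.tostring(record, encoding="unicode")


page = '<div><h2 class="product-name">Widget</h2><span class="product-price">9.99</span></div>'
print(html_to_xml(page, CONFIG))
```

Keeping the selectors in data rather than code is what lets the same extractor be pointed at a new page layout by editing only the configuration.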
18

Xidel

Xidel is a cli webpage scraping tool supporting XPath/XQuery 3 and CSS

...The extracted values can then be exported as plain text, XML, or JSON, or assigned to variables for use in other extract expressions. It also provides an online CGI service for testing XPath/XQuery 3.0 expressions. (Xidel is part of the VideLibri project, so its project page just redirects there.)

3 Reviews

Downloads: 0 This Week

Last Update: 2017-05-12
19

pdftohtml

This is a tool to convert PDF files to HTML or text files and to extract images.

Downloads: 0 This Week

Last Update: 2014-06-28
© 2025 Slashdot Media. All Rights Reserved.
