PHP Simple HTML DOM Parser / News: Recent posts

PHP Simple HTML DOM Parser 2.0 Release Candidate 2

I am very happy to announce the second release candidate for the next major version of simplehtmldom. It brings very important bug fixes, performance improvements and a few new features.

Important: This is a release candidate, which means some features might not yet be stable or might behave unexpectedly. Please don't hesitate to report broken or unstable features.

Here are the most notable changes:...

Posted by LogMANOriginal 2019-11-09

Composer package

This has been requested many times and now it's here. The new composer package is available for current master:

composer require simplehtmldom/simplehtmldom dev-master

<?php

require_once 'vendor/autoload.php';
use simplehtmldom\HtmlWeb;

echo (new HtmlWeb())->load('https://google.com/')->find('title', 0)->plaintext;

Unfortunately, it doesn't seem possible to automate the package with SourceForge, so I connected it to the GitHub fork instead.

Posted by LogMANOriginal 2019-10-20

PHP Simple HTML DOM Parser 2.0 Release Candidate 1

I am happy to announce the first release candidate for the next major version of the parser. It brings exciting new features and performance improvements.

Important: This is a release candidate, which means some features might not yet be stable or might behave unexpectedly. Please don't hesitate to report broken or unstable features.

Here are the most notable changes:

  • Missing optional end tags like </tr> are now handled more efficiently. This results in much faster seek operations, especially on large documents. A performance boost of 10x or higher compared to version 1.9 is possible (when working with many omitted end tags)...
Posted by LogMANOriginal 2019-10-20

PHP Simple HTML DOM Parser 1.9.1 released

This is a bug fix release which fixes support for "text" selectors.

Download version 1.9.1 at https://sourceforge.net/projects/simplehtmldom/files/simplehtmldom/1.9.1/

Posted by LogMANOriginal 2019-10-20

PHP Simple HTML DOM Parser 1.9

I'm happy to announce the release of PHP Simple HTML DOM Parser 1.9!

This release is focused on bug fixes and updates to the manuals but also brings a few new functions.
Please note that this will be the last 1.x release (except, perhaps, for bug-fix releases). More details will be made available in the future.

Most notable changes in this version...

Posted by LogMANOriginal 2019-05-30

Secure transfer to simplehtmldom.sourceforge.io and more!

Great news for anyone who cares about secure data transmission!

The project page at http://simplehtmldom.sourceforge.net now redirects to https://simplehtmldom.sourceforge.io, which is much more secure (using HTTPS) and reliable (PHP 7.x) than the "old" server (HTTP + PHP 5.4)!

But there is more!

For the past weeks I've been working on updating the existing documentation.
It is not yet available on the main page, but you can take a look at https://simplehtmldom.sourceforge.io/docs...

Posted by LogMANOriginal 2019-04-16

PHP Simple HTML DOM Parser 1.8.1

Important: Version 1.8 was replaced by 1.8.1 in order to fix critical bugs.

PHP Simple HTML DOM Parser 1.8.1 is now officially available at https://sourceforge.net/projects/simplehtmldom/files/simplehtmldom/1.8.1/

This release introduces lots of bug fixes and adds support for many exciting CSS features we have been longing for!

Most notable changes:

  • Universal selectors (*) now work as expected.
  • All primary CSS features are now supported with the addition of these:
    • CSS combinators (>, +, ~)
    • Attribute selectors (|=, ~=)
    • Multiclass selectors (.class.class.class)
    • Multiattribute selectors ([attr1][attr2][attribute3])
    • Case sensitivity selectors in attributes (i and s)...
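
As a rough illustration of the selectors listed above, here is a minimal sketch. It assumes simple_html_dom.php from release 1.8.1 or later sits next to the script; the markup and variable names are made up for this example.

```php
<?php
// Assumption: simple_html_dom.php (1.8.1 or later) is in the same directory.
require_once __DIR__ . '/simple_html_dom.php';

// Made-up markup for illustration only.
$html = str_get_html(
    '<ul class="menu main">'
    . '<li lang="en-US"><a href="/a">A</a></li>'
    . '<li lang="de"><a href="/b" title="B page">B</a></li>'
    . '</ul>'
);

$links   = $html->find('ul > li > a');     // child combinator
$menus   = $html->find('ul.menu.main');    // multiclass selector
$english = $html->find('li[lang|=en]');    // |= attribute selector
$titled  = $html->find('a[href][title]');  // multiattribute selector

// Print how many elements each selector matched.
printf(
    "links=%d menus=%d english=%d titled=%d\n",
    count($links),
    count($menus),
    count($english),
    count($titled)
);
```

Each call returns an array of matching elements, so the counts give a quick check that a selector behaves as intended on your own markup.
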
Posted by LogMANOriginal 2019-01-13

PHP Simple HTML DOM Parser 1.7

PHP Simple HTML DOM Parser 1.7 is now officially available at https://sourceforge.net/projects/simplehtmldom/files/simplehtmldom/1.7/

This release introduces bug fixes to the DOM parser and, most importantly, makes the project compatible with PHP 7.3, the most recent PHP release, for which compatibility issues have been reported!

Most notable changes:

  • Compatible with PHP 7.3+
  • Major performance improvement of about 30% for the parser alone!
  • Improved handling of void tags and optional closing tags
  • Lots of bug fixes for the parser (selectors will be targeted in the next release)
  • Unit tests for reported bugs, using PHPUnit (so you can perform tests on your own fork now!)...
Posted by LogMANOriginal 2018-12-10

PHP Simple HTML DOM Parser 1.6

PHP Simple HTML DOM Parser 1.6, formerly located at https://sourceforge.net/projects/simplehtmldom/files/simple_html_dom.php/download (and labeled as 1.5), is now an official release located in the releases folder: https://sourceforge.net/projects/simplehtmldom/files/simplehtmldom/1.6/

This step is necessary for getting back on track with new releases. The upcoming release 1.7 will be made available shortly!

Posted by LogMANOriginal 2018-12-10

The PHP Simple HTML DOM Parser has just been migrated from SVN to Git!

Find the Git repository on the Repository tab.

With the Git repository you can fork the project, browse the commit history and open merge requests!...

Posted by LogMANOriginal 2018-11-26

Looking for a graphic design volunteer

I'd love to find someone to revamp the simplehtmldom.sourceforge.net "help"/"manual" HTML pages. I hate the current look, and would love to see a far more readable, easy-to-follow set of pages. Once the look and structure are overhauled, I have a number of currently undocumented features that I can add to the documentation.

I'm not looking for an ongoing commitment to this project, merely an overhaul of a set of HTML pages that are the "manual" for the project....

Posted by John Schlick 2012-10-16

An update pass...

I've migrated to the new SourceForge project format. It doesn't appear that anything was lost, and the documentation homepage has stayed the same. I'm planning on making a pass through the documentation sometime soon to bring it up to date. If anyone wants to help make the formatting of those pages nicer, I'd be happy to take some help. Email me at John_Schlick@hotmail.com

I've also changed the debugging code inside of simple_html_dom to support the SourceForge debugobject project (it's cool! Download it at https://sourceforge.net/projects/debugobject/)....

Posted by John Schlick 2012-10-10

More little changes.

I have added a lot of little features and enhancements over the last year.

A number of internal issues have been ironed out, and a few new features have been added (the ability to search for specific text inside of a tag, the ability to discover the original display size of an IMG tag, and a few other little things).

Please download the code from the repository, as that's ALWAYS the most current.

Many thanks to the person who emailed me the very comprehensive list of changes to support alternate character sets; the ->plaintext output is MUCH better now.
John.

Posted by John Schlick 2012-04-20

Simple_html_dom.php 1.5 is released.

Sourceforge just allowed me to take over the project. As such, I have updated the source that I have spent the last year working on.

Memory leak is fixed.

simple_html_dom now detects the character set.

plaintext looks better since it understands more about newlines in html and what things ought to look like.

All changes are fully configurable.

Many more little changes. Docs to come over the next week or two.

Posted by John Schlick 2011-07-14

PHP Simple HTML DOM Parser v1.11 is released

  1. Supports xpath generated from Firebug.
  2. New method "dump" of "simple_html_dom_node".
  3. New attribute "xmltext" of "simple_html_dom_node".
  4. Removed preg_quote on the selector match function: [attribute*=value].
  5. "Comment" elements are now treated as children.
  6. Fixed the problem with <pre>.
  7. Fixed bug #2207477 (does not load some pages properly).
  8. Fixed bug #2315853 (Error with character after < sign).
Posted by S. C. Chen 2008-12-18

PHP Simple HTML DOM Parser v1.10 is released

  1. The "find" method supports negative indexes, thanks to Vadim Voituk.
  2. The constructor automatically loads contents from either text or a file/URL, thanks to Antcs.
  3. Fully supports wildcard in selectors.
  4. Fixed the bug where the parser was confused by the < symbol inside the text.
  5. Fixed bug of dash in selectors.
  6. Fixed bug of <nobr>.
  7. Fixed bug #2155883 (Nested List Parses Incorrectly).
  8. Fixed bug #2155113 (error with unclosed html tags).
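
A small sketch of the negative index support mentioned in item 1. It assumes a 1.x simple_html_dom.php next to the script and uses the str_get_html() helper; the markup is invented for illustration.

```php
<?php
// Assumption: simple_html_dom.php (a 1.x release) is in the same directory.
require_once __DIR__ . '/simple_html_dom.php';

// Made-up markup for illustration only.
$html = str_get_html('<ul><li>one</li><li>two</li><li>three</li></ul>');

$first = $html->find('li', 0);   // index 0 selects the first match
$last  = $html->find('li', -1);  // index -1 counts from the end

echo $first->plaintext, ' / ', $last->plaintext, PHP_EOL;
```

Passing an index returns a single element instead of an array, so the result can be used directly without looping.
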
Posted by S. C. Chen 2008-10-25

PHP Simple HTML DOM Parser v1.00 is released

  1. New method "getAllAttributes" of "simple_html_dom_node".
  2. Fix the bug of selector in some critical conditions.
  3. Fix the bug of stripping php tags.
  4. Fix the bug of remove_noise().
  5. Fix the bug of noise in attributes.
  6. Supports full javascript string in selector: $e->find("a[onclick=alert('hello')]").
  7. Change selector filter: "*=" to case-insensitive.
Posted by S. C. Chen 2008-09-05

PHP Simple HTML DOM Parser v0.99 is released

  1. Performance tuning (boost 10%).
  2. Memory requirement reduced by 25%.
  3. Change function name from "file_get_dom()" to "file_get_html()".
  4. Change function name from "str_get_dom()" to "str_get_html()".
  5. Fixed bug #2011286 (Error with unclosed html tags).
  6. Fixed bug #2012551 (Error parsing divs).
  7. Fixed bug #2020924 (Error for missed tag.).
  8. Fixed bug (problem with <body> tag's innertext).
Posted by S. C. Chen 2008-08-03

PHP Simple HTML DOM Parser v0.98 is released

  1. Performance tuning (boost 20%).
  2. Supports "multiple class" selector feature: <div class="a b c"></div>.
  3. New "callback function" feature.
  4. New "multiple selectors" feature: $dom->find('p,a,b');
  5. New examples.
  6. Supports extract contents from HTML features: $dom->plaintext;
  7. Fix the bug of $dom->clear().
  8. Fix the bug of text nodes' innertext.
  9. Fix the bug of comment nodes' innertext.
  10. Fix the bug of descendant selector with optional tags.
  11. Change simple_html_dom_node method name from "text()" to "makeup()".
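
The "multiple selectors", "multiple class" and plaintext features above might be combined as in the sketch below. It assumes a later 1.x simple_html_dom.php next to the script and therefore uses str_get_html(), the name the loader received in v0.99; the markup is invented for illustration.

```php
<?php
// Assumption: simple_html_dom.php (a 1.x release) is in the same directory.
require_once __DIR__ . '/simple_html_dom.php';

// Made-up markup for illustration only.
$dom = str_get_html(
    '<div class="a b c"><p>Hello</p><a href="#">link</a><b>bold</b><i>skipped</i></div>'
);

// Multiple selectors: one call collects <p>, <a> and <b> elements.
$nodes = $dom->find('p,a,b');

// Multiple classes: a single class name matches class="a b c".
$boxes = $dom->find('div.b');

foreach ($nodes as $node) {
    // "tag" holds the element name, "plaintext" its text content.
    echo $node->tag, ': ', $node->plaintext, PHP_EOL;
}
echo count($boxes), ' element(s) matched div.b', PHP_EOL;
```
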
Posted by S. C. Chen 2008-06-24

PHP Simple HTML DOM Parser v0.97 is released

  1. Important!! file and class name changed (html_dom_parser->simple_html_dom)!
  2. Important!! $dom->save_file is no longer supported.
  3. New node type "comment" (eg. $dom->find('comment')).
  4. Add self-closing tags: 'base', 'spacer'.
  5. Fix the bug of outertext (th).
  6. Fix the bug of regular expression escaping chars ($dom->find).
  7. Fix the bug with line breaks and "\t" in tags.
  8. Remove example "example_customize_parser.php".
  9. New example "simple_html_dom_utility.php".
Posted by S. C. Chen 2008-05-09

PHP Simple HTML DOM Parser v0.96 is released

  1. (Request #1936000) New DOM operations(first_child, last_child, next_sibling, previous_sibling).
  2. New method to remove attribute.
  3. Add a solution for servers behind a proxy to the FAQ (Thanks to Yousuke Shaggy).
  4. Add traverse section in manual.
  5. Now file_get_dom supports full file_get_contents parameters.
  6. Fix the bug of self-closing tags in the end of file.
  7. Fix the bug of blanks in the end of tag.
  8. Add Reference section in manual.
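
A brief sketch of the traversal operations from item 1, assuming a 1.x simple_html_dom.php next to the script. The str_get_html() helper is the later name of the loader, and the markup is invented for illustration.

```php
<?php
// Assumption: simple_html_dom.php (a 1.x release) is in the same directory.
require_once __DIR__ . '/simple_html_dom.php';

// Made-up markup for illustration only.
$html = str_get_html('<ul id="list"><li>one</li><li>two</li><li>three</li></ul>');

$list   = $html->find('#list', 0);   // the <ul> element
$first  = $list->first_child();      // first <li>
$last   = $list->last_child();       // last <li>
$second = $first->next_sibling();    // the <li> right after the first one

echo $first->plaintext, ', ', $second->plaintext, ', ', $last->plaintext, PHP_EOL;
```

These methods walk element nodes only, which makes them convenient for stepping through list items or table rows without a second find() call.
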
Posted by S. C. Chen 2008-04-27

PHP Simple HTML DOM Parser v0.95 is released

  1. New attribute filters (Thanks to Yousuke Kumakura).
  2. Fix the bug of optional-closing tags.
  3. Fix the bug of parsing the line break next to the tag's name.
  4. Supports tag name with namespace.
Posted by S. C. Chen 2008-04-13

PHP Simple HTML DOM Parser v0.94 is released

  1. Stop infinite loop when the source content is BAD HTML.
  2. Fix the bug of adding new attributes to self closing tags.
  3. Fix the bug of customize parser without $dom->remove_noise();
  4. Add FAQ section in manual.
Posted by S. C. Chen 2008-04-06

PHP Simple HTML DOM Parser v0.93 is released

  1. Fix the bug of parsing end-tag.
  2. Fix the bug of endless "<".
  3. Fix the bug of "remove_noise" method while stripping out tags.
  4. Modify "example_customize_parser.php" with better regular expressions.
  5. Add some guidelines for parser customization.

Posted by S. C. Chen 2008-03-31

PHP Simple HTML DOM Parser v0.91 is released

  1. Fix the bug of <p></p> problem.
  2. New feature: "plaintext" attribute to scrape pure text.
  3. Three scraping examples.
Posted by S. C. Chen 2008-03-25