PHP Simple HTML DOM Parser / News: Recent posts

Looking for a graphic design volunteer

I'd love to find someone to revamp the "help"/"manual" HTML pages at simplehtmldom.sourceforge.net. I hate the current look and would love to see a far more readable, easy-to-follow set of pages. Once the look and structure are overhauled, I have a number of currently undocumented features that I can add to the documentation.

I'm not looking for an ongoing commitment to this project, merely an overhaul of the set of HTML pages that are the "manual" for the project...

Posted by John Schlick 2012-10-16

An update pass...

I've migrated to the new SourceForge project format. It doesn't appear that anything was lost, and the documentation homepage has stayed the same. I'm planning on making a pass through the documentation sometime soon to bring it up to date. If anyone wants to help make the formatting of those pages nicer, I'd be happy to take some help. Email me at John_Schlick@hotmail.com

I've also changed the debugging code inside of simple_html_dom to support the SourceForge debugobject project (it's cool! Download it at https://sourceforge.net/projects/debugobject/)...

Posted by John Schlick 2012-10-10

More little changes.

I have added a lot of little features and enhancements over the last year.

A number of internal issues have been ironed out, and a few new features have been added (the ability to search for specific text inside of a tag, the ability to discover the original display size of an IMG tag, and a few other little things).

Please download the code from the repository, as that's ALWAYS the most current.

Many thanks to the person who emailed me the very comprehensive list of changes to support alternate character sets. The ->plaintext output is MUCH better now.
John.

Posted by John Schlick 2012-04-20

Simple_html_dom.php 1.5 is released.

SourceForge just allowed me to take over the project. As such, I have updated the source with the changes I have spent the last year working on.

Memory leak is fixed.

simple_html_dom now detects the character set.

plaintext looks better, since it understands more about newlines in HTML and what things ought to look like.

All changes are fully configurable.

Many more little changes. Docs to come over the next week or two.

Posted by John Schlick 2011-07-14

PHP Simple HTML DOM Parser v1.11 is released

  1. Supports XPath generated by Firebug.
  2. New method "dump" of "simple_html_dom_node".
  3. New attribute "xmltext" of "simple_html_dom_node".
  4. Removed preg_quote from the selector match function: [attribute*=value].
  5. "Comment" elements are now treated as children.
  6. Fixed the problem with <pre>.
  7. Fixed bug #2207477 (does not load some pages properly).
  8. Fixed bug #2315853 (Error with character after < sign).
Posted by S. C. Chen 2008-12-18

PHP Simple HTML DOM Parser v1.10 is released

  1. The "find" method now supports negative indexes, thanks to Vadim Voituk.
  2. The constructor automatically loads contents from either text or a file/URL, thanks to Antcs.
  3. Fully supports wildcards in selectors.
  4. Fixed a bug where a < symbol inside the text confused the parser.
  5. Fixed a bug with dashes in selectors.
  6. Fixed a bug with <nobr>.
  7. Fixed bug #2155883 (Nested List Parses Incorrectly).
  8. Fixed bug #2155113 (error with unclosed html tags).
Posted by S. C. Chen 2008-10-25
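
As a rough illustration of items 1 and 2 in the v1.10 notes above, the sketch below loads markup through the constructor and picks matches from the end of the result list with negative indexes. It assumes simple_html_dom.php (v1.10 or later) sits next to the script; the sample markup and variable names are made up for the example.

```php
<?php
// Assumption: simple_html_dom.php (v1.10 or later) is in the same directory.
require_once 'simple_html_dom.php';

// Hypothetical sample markup, used only for this illustration.
$markup = '<ul><li>first</li><li>second</li><li>third</li></ul>';

// The constructor loads the string directly; no separate load() call is needed.
$dom = new simple_html_dom($markup);

// A negative index counts back from the end of the list of matches.
$last = $dom->find('li', -1);
$secondToLast = $dom->find('li', -2);

echo $last->innertext, "\n";
echo $secondToLast->innertext, "\n";
```

Note that find() with an index returns null when no element exists at that position, so real code should check the result before using it.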

PHP Simple HTML DOM Parser v1.00 is released

  1. New method "getAllAttributes" of "simple_html_dom_node".
  2. Fix a selector bug under certain critical conditions.
  3. Fix the bug with stripping PHP tags.
  4. Fix the bug in remove_noise().
  5. Fix the bug with noise in attributes.
  6. Supports full JavaScript strings in selectors: $e->find("a[onclick=alert('hello')]").
  7. Change the "*=" selector filter to be case-insensitive.
Posted by S. C. Chen 2008-09-05

PHP Simple HTML DOM Parser v0.99 is released

  1. Performance tuning (10% boost).
  2. Memory requirements reduced by 25%.
  3. Change function name from "file_get_dom()" to "file_get_html()".
  4. Change function name from "str_get_dom()" to "str_get_html()".
  5. Fixed bug #2011286 (Error with unclosed html tags).
  6. Fixed bug #2012551 (Error parsing divs).
  7. Fixed bug #2020924 (Error for missed tag.).
  8. Fixed bug (problem with <body> tag's innertext).
Posted by S. C. Chen 2008-08-03
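
For code written against earlier releases, the renames in items 3 and 4 of the v0.99 notes above amount to swapping the function names. A minimal sketch, assuming simple_html_dom.php (v0.99 or later) sits next to the script; the markup is invented for the example:

```php
<?php
// Assumption: simple_html_dom.php (v0.99 or later) is in the same directory.
require_once 'simple_html_dom.php';

// Before v0.99 this call was str_get_dom(); the argument is unchanged.
$html = str_get_html('<div id="title">Hello <b>World</b></div>');

// file_get_html() (formerly file_get_dom()) is the counterpart for a file
// path or URL; it is not called here so the sketch needs no network access.

$title = $html->find('div[id=title]', 0);
echo trim($title->plaintext), "\n";
```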

PHP Simple HTML DOM Parser v0.98 is released

  1. Performance tuning (20% boost).
  2. Supports the "multiple class" selector feature: <div class="a b c"></div>.
  3. New "callback function" feature.
  4. New "multiple selectors" feature: $dom->find('p,a,b');
  5. New examples.
  6. Supports extracting text content from HTML: $dom->plaintext;
  7. Fix the bug in $dom->clear().
  8. Fix the bug with text nodes' innertext.
  9. Fix the bug with comment nodes' innertext.
  10. Fix the bug with the descendant selector and optional tags.
  11. Change simple_html_dom_node method name from "text()" to "makeup()".
Posted by S. C. Chen 2008-06-24
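
The "multiple class", "multiple selectors", and plaintext items in the v0.98 notes above might be combined roughly as follows. This is only a sketch: it assumes a later release of simple_html_dom.php (one that provides str_get_html(), i.e. v0.99 or newer) next to the script, and the markup and variable names are hypothetical.

```php
<?php
// Assumption: simple_html_dom.php (v0.99 or later) is in the same directory.
require_once 'simple_html_dom.php';

// Hypothetical sample markup, used only for this illustration.
$dom = str_get_html(
    '<div class="a b c"><p>para</p><a href="#">link</a><b>bold</b></div>'
);

// Multiple selectors: a comma-separated list matches any of the tags.
$tags = array();
foreach ($dom->find('p,a,b') as $node) {
    $tags[] = $node->tag;
    echo $node->tag, ': ', $node->plaintext, "\n";
}

// Multiple classes: an element with class="a b c" matches each class name.
$byClass = $dom->find('div.b', 0);

// plaintext returns the text content with the markup removed.
echo $dom->plaintext, "\n";
```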

PHP Simple HTML DOM Parser v0.97 is released

  1. Important!! File and class names changed (html_dom_parser->simple_html_dom)!
  2. Important!! $dom->save_file is no longer supported.
  3. New node type "comment" (e.g. $dom->find('comment')).
  4. Add self-closing tags: 'base', 'spacer'.
  5. Fix the outertext bug (th).
  6. Fix the bug with escaping regular expression characters ($dom->find).
  7. Fix the bug with line breaks and "\t" in tags.
  8. Remove example "example_customize_parser.php".
  9. New example "simple_html_dom_utility.php".
Posted by S. C. Chen 2008-05-09

PHP Simple HTML DOM Parser v0.96 is released

  1. (Request #1936000) New DOM operations (first_child, last_child, next_sibling, previous_sibling).
  2. New method to remove an attribute.
  3. Add a solution for servers behind a proxy to the FAQ (thanks to Yousuke Shaggy).
  4. Add a Traverse section to the manual.
  5. file_get_dom now supports the full set of file_get_contents parameters.
  6. Fix the bug with self-closing tags at the end of a file.
  7. Fix the bug with blanks at the end of a tag.
  8. Add a Reference section to the manual.
Posted by S. C. Chen 2008-04-27

PHP Simple HTML DOM Parser v0.95 is released

  1. New attribute filters (Thanks to Yousuke Kumakura).
  2. Fix the bug with optional closing tags.
  3. Fix the bug when parsing a line break next to the tag's name.
  4. Supports tag names with namespaces.
Posted by S. C. Chen 2008-04-13

PHP Simple HTML DOM Parser v0.94 is released

  1. Stop the infinite loop when the source content is bad HTML.
  2. Fix the bug when adding new attributes to self-closing tags.
  3. Fix the bug when customizing the parser without $dom->remove_noise();
  4. Add a FAQ section to the manual.
Posted by S. C. Chen 2008-04-06

PHP Simple HTML DOM Parser v0.93 is released

  1. Fix the bug of parsing end-tag.
  2. Fix the bug of endless "<".
  3. Fix the bug of "remove_noise" method while stripping out tags.
  4. Modify "example_customize_parser.php" with better regular expressions.
  5. Add some guidelines for parser customization.
Posted by S. C. Chen 2008-03-31

PHP Simple HTML DOM Parser v0.91 is released

  1. Fix the bug of <p></p> problem.
  2. New feature: "plaintext" attribute to scrape pure text.
  3. Three scraping examples.
Posted by S. C. Chen 2008-03-25

PHP Simple HTML DOM Parser v0.8 is released

Posted by S. C. Chen 2008-03-14

PHP Simple HTML DOM Parser v0.6 is released

  1. Fix the bug with getting outertext on HTML optional closing tags (e.g. <hr>).
  2. Now supports string quotes in attribute selectors (e.g. $dom->find('div[id="title"]')).
Posted by S. C. Chen 2008-03-04