#24 DOM returns '@��V�' from Gawker Media Sites


I'm seeing a weird issue using the Simple HTML DOM file_get_html function.

I have the URL of an external page that I want to build a DOM for, and I load it like this:

$passedLink = 'http://kotaku.com/the-first-infamous-second-son-gameplay-is-the-last-i-n-512741703';
$html = file_get_html($passedLink);

On my site, $passedLink comes from a JavaScript bookmarklet, but that's the gist of it. This works perfectly fine 99% of the time. However, for a reason I haven't figured out yet, I intermittently get junk back instead of a proper DOM.

Here's a random line of the junk source dumped to the command line:

����Њ�Swl�Q��2T�@��V�|� ��n�4-�Y Zs44,SEk�FVg��1;��9PA�f(f�`$m� ��?�䯭�j�/������@�(��%�־�_�Mǫ��AK�}���X�!�@k��_�Z�q���b�GB[��[� �IW�*X�k��<l��k=^�C`��Ǿ�@�<�_<|�|�7��z�_�+O���^

(When it fails, it dumps many lines like this in a row.)

When this happens, triggering the bookmarklet again will usually succeed as if nothing was wrong. As far as I can tell, this is something odd about how Gawker Media serves its pages (it happens on sites like Kotaku, Gizmodo, etc.). I haven't been able to figure out why it fails so badly sometimes but not others. (I would say it fails about 2/3 of the time, but once it works, it keeps working for some short but undetermined period.)

Any tips on how to work around this? I'd be happy to run some tests if that would help track down the problem.
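
One test that might narrow this down, offered as an unconfirmed guess rather than a diagnosis: binary junk like the sample above can appear when a compressed (gzip) response body is handed to the parser without being decompressed. Nothing in this thread confirms that this is the cause. The sketch below assumes it is, and checks the raw body for the gzip magic bytes before parsing. The helper name decode_if_gzipped is hypothetical and not part of Simple HTML DOM; gzdecode() needs PHP's zlib extension.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM).
// Assumption: the junk is a gzip-compressed body that was never decompressed.
function decode_if_gzipped($body)
{
    // Every gzip stream starts with the two bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = @gzdecode($body); // returns false if the data is not valid gzip
        if ($decoded !== false) {
            return $decoded;
        }
    }
    // Not gzip (or decoding failed): hand the body back unchanged.
    return $body;
}

// Possible usage with the library's string parser instead of file_get_html():
//   $raw  = file_get_contents($passedLink);
//   $html = str_get_html(decode_if_gzipped($raw));
```

If the junk disappears with this in place, compression is the likely culprit. If it does not, the cause is something else and the helper simply passes the body through untouched.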



    Nick - 2014-01-22

    I never had time to test any changes on my end (at least none that I remember working). They must have changed something server-side to fix the Gawker Media problem, because I haven't had any issues recently. Sorry I can't help more.

