#24 DOM returns '@��V�' from Gawker Media Sites

Milestone: v1.0_(example)

Status: closed

Owner: nobody

Labels: None

Priority: 5

Updated: 2019-04-15

Created: 2013-07-06

Creator: Nick

Private: No

Discussion: Disabled

I'm seeing a weird issue using the Simple HTML DOM file_get_html function.

I've got a URL of an external site I want to create a DOM for, and I've loaded it up real simply like this:

$passedLink = 'http://kotaku.com/the-first-infamous-second-son-gameplay-is-the-last-i-n-512741703';
$html = file_get_html($passedLink);

On my site $passedLink comes from a JavaScript bookmarklet, but that's the gist of it. This works perfectly fine 99% of the time. However, for some reason I haven't figured out yet, I occasionally but regularly get junk back instead of a proper DOM.

Here's a random line of the junk source dumped to the command line:

����Њ�Swl�Q��2T�@��V�|� ��n�4-�Y Zs44,SEk�FVg��1;��9PA�f(f�`$m� ��?�䯭�j�/������@�(��%�־�_�Mǫ��AK�}���X�!�@k��_�Z�q���b�GB[��[� �IW�*X�k��<l��k=^�C`��Ǿ�@�<�_<|�|�7��z�_�+O���^

(It dumps many many lines like this in a row when failing.)

When this happens, triggering the bookmarklet again will usually work as if nothing was wrong. As far as I can tell, this is something wonky with how Gawker Media serves their pages (so it happens on sites like Kotaku, Gizmodo, etc.). I haven't been able to figure out why it fails so badly sometimes but not others, since everything usually works fine when I re-run it. (I would say it fails about 2/3 of the time, but once it works, it keeps working for some short but undetermined period.)

Any tips on how to work around this? I'd be happy to work and do some tests if that would help track down the problem.

Thanks!
-Nick
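
A possible workaround, offered as a guess rather than a confirmed diagnosis: the junk above resembles a compressed response body that was never decoded. Nothing in this thread verifies that, so treat the following as a minimal sketch under that assumption. decode_if_gzipped() is a hypothetical helper name, not part of Simple HTML DOM; it checks for the gzip magic bytes and decodes only when they are present. It assumes PHP 7 or later with the zlib extension.

```php
<?php
// Sketch only: assumes the junk is a gzip-compressed body that was never decoded.
// decode_if_gzipped() is a hypothetical helper, not part of Simple HTML DOM.

function decode_if_gzipped(string $body): string
{
    // A gzip stream starts with the two magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body; // Not gzip: pass the body through unchanged.
    }

    // gzdecode() needs the zlib extension; it returns false on corrupt input.
    $decoded = @gzdecode($body);

    // Fall back to the original bytes if decoding failed.
    return $decoded === false ? $body : $decoded;
}

// Possible usage with the library (requires simple_html_dom.php to be loaded):
// $raw  = file_get_contents($passedLink);
// $html = str_get_html(decode_if_gzipped($raw));
```

Fetching the raw body first and passing it to str_get_html, instead of calling file_get_html directly, leaves room to inspect or decode the bytes before parsing. If the body turns out not to be gzip, the helper returns it untouched, so it should be harmless on pages that already work.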

Discussion

winumber - 2014-01-22

Hello Nick

Did you find a solution for this? I am facing the same problem. The URL to test is: http://assaltnews.net/index.php?module=news&id=19274&category=71

Nick - 2014-01-22

I never had time to test any changes on my end (at least any that I remember working). They must have done something server side to fix the Gawker media problem because I haven't had any issues recently. Sorry I can't help more.

LogMANOriginal - 2019-04-15

status: open --> closed

discussion: enabled --> disabled

LogMANOriginal - 2019-04-15

From the discussion it looks like this topic is closed. Please don't hesitate to open a new request for further discussion.
