#3 ^M character in XML kills titles

Status: open
Priority: 5
Updated: 2001-02-23
Created: 2000-10-30
Creator: Pieter Krul
Private: No

I've received a lot of messages from Cron like this:

WebFetch: error:
not well-formed at line 54, column 51, byte 2084 at /usr/lib/perl5/site_perl/5.005/i386-linux/XML/Parser.pm line 185

In the generated HTML that follows, only some of the entries are displayed correctly:
* Trustix Security Advisory - ping gnupg ypbind
* _
* Conectiva Linux Security Announcement: gnupg
* KDE.org: The People behind KDE: Reginald Stadlbauer
* eWeek: Dev tool goes open source
* _

Each '_' is a valid link to an article, but without a title;
the link text is probably just a space.
On examining the LinuxToday XML file, this happens only where
the <title> contains a ^M (carriage return) character.
Example:

<story>
<title>ZDNet UK: Will free software come to the rescue of the UK's health^M
service?</title>
<url>http://linuxtoday.com/news_story.php3?ltsn=2000-10-30-015-21-PS-BZ-CY</url>
<time>Oct 30, 2000, 19:34:25</time>
<author>kreichard</author>
<topic>Press,Business,Community</topic>
<comments>0</comments>
</story>
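
To check which entries in a feed are affected, the raw XML can be scanned for titles that contain a carriage return. The following is only a rough sketch, not WebFetch code: `titles_with_cr` is a hypothetical helper, and it uses a simple regular expression rather than a real XML parser, which is assumed to be good enough for a flat feed like the one above.

```perl
use strict;

# Hypothetical helper (not part of WebFetch): return every <title>
# in a raw XML string whose text contains a carriage return (^M).
sub titles_with_cr {
    my ($xml) = @_;
    my @hits;
    # The /s modifier lets "." match newlines, so a title that is
    # split across two lines is still captured as one string.
    while ($xml =~ m{<title>(.*?)</title>}gs) {
        my $title = $1;
        push @hits, $title if $title =~ /\r/;
    }
    return @hits;
}

# Sample data modelled on the story above; "\r\n" stands in for ^M plus newline.
my $sample =
    "<story>\n<title>ZDNet UK: Will free software come to the rescue of the UK's health\r\nservice?</title>\n</story>\n"
  . "<story>\n<title>eWeek: Dev tool goes open source</title>\n</story>\n";

my @bad = titles_with_cr($sample);
printf "%d title(s) contain a carriage return\n", scalar @bad;
```
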

Because a .diff looks really awkward with the ^M character, I'm describing the change instead.
I've inserted the following in WebFetch.pm, line 1341:
$title =~ s/\r/ /g; # replace DOS carriage returns (^M) with a space

WebFetch still works with this change, but I can't tell whether it is
a proper fix, nor whether it should be fixed in WebFetch at all.
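
The substitution described above can be tried in isolation with a small script. This is a minimal sketch under assumptions: `clean_title` is a hypothetical name, it is not part of WebFetch, and it goes a little further than the one-line change by also folding the newline that follows the ^M into the same single space.

```perl
use strict;

# Hypothetical helper (not part of WebFetch): turn carriage returns and
# newlines inside a title into single spaces and trim the ends.
sub clean_title {
    my ($title) = @_;
    $title =~ s/\s*[\r\n]+\s*/ /g;   # CR, LF or CRLF plus adjacent blanks -> one space
    $title =~ s/^\s+|\s+$//g;        # trim leading and trailing whitespace
    return $title;
}

# "\r\n" stands in for the ^M and line break seen in the LinuxToday file.
my $raw = "ZDNet UK: Will free software come to the rescue of the UK's health\r\nservice?";
print clean_title($raw), "\n";
```

Whether this belongs in WebFetch or should be handled before the data reaches it is left open, as in the report.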

WebFetch 0.10 / XML-Parser-2.29 / Perl 5.005_03

Regards,

Pieter

Discussion

  • Ian Kluft - 2001-02-23
    • assigned_to: nobody --> ikluft
    • labels: 100100 --> WebFetch core
     
  • Ian Kluft - 2001-02-23

    Sorry about the delay in assigning this. We're getting the project moving again...

     
