Thanks Simon, a few thoughts...
On 20/05/13 22:10, Simon Kågedal Reimer wrote:
> Hi John and others!
> I agree that there are two issues here: 1. being more tolerant with
> invalid input; 2. giving useful error reports.
> As for #1, I guess opinions differ as to what extent parsers should
> deal with invalid input. In my opinion, the user experience should be
> the first priority - if the user finds that his favourite feed doesn't
> work in Liferea, but works in Google Reader (which I've verified that
> this particular feed does) or other common RSS aggregators, he or she
> will conclude that Liferea is bad. It would be interesting to hear the
> developers' view on this!
I put the URL into Firefox, and that doesn't display the feed, so
Liferea is not alone.
> If we, then, would like Liferea to be more liberal in this instance,
> can we get libxml2 to do this? I was browsing around the
> documentation, but I couldn't find any way to get it to behave more
> tolerantly towards invalid characters. An alternative is to strip out
> any invalid characters before passing them to the libxml2 parser. I
> have written some code now to do this, will submit a patch later.
Although I suggested the parser might be too strict, I'm also a little
wary. This strictness is part of the design of XML for good reasons. In
a different context, I gather Vint Cerf, one of the designers of IP, now
regrets the famous dictum "be liberal in what you accept and strict in
what you send" (usually credited to Jon Postel).
The proper place to solve this is in WordPress: it should strip invalid
characters from the feed before sending it. However, although I'm not an
XML expert, as far as I can tell the only valid characters below 0x20 in
XML 1.0 are carriage return, newline and tab (form feed and vertical tab
are not allowed). If that's so, I can't see any problem with stripping
anything else below 0x20 before parsing it (and perhaps the equivalent
bytes between 0x80 and 0x9F, as long as these aren't part of a unicode
> As for #2, I think what happens is: Liferea first tries to parse the
> input as a feed. If that fails, it goes on to try feed auto-discovery.
> When that fails, an error is reported. I think it would make more sense
> if it first looked at the input, concluded that "yep, this seems to be
> meant as RSS (or Atom or whatever)" (it could look at the MIME type
> and the beginnings of the content, see if there's an <rss> tag), and
> if parsing fails, just report that error.
> I've found a similar situation: if you give Liferea a subscription URI
> that 404s, it will report both the 404 and a parse error, that it
> couldn't find any embedded feeds on the 404 page. The latter seems
> unnecessary and confusing in what I otherwise think is a very nice
> system for error reports that Liferea has got.
Yes, I'm happy for Liferea to report an error, but its report in this
situation is misleading, which is unhelpful to say the least.
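
To make the ordering Simon describes concrete, here is a rough sketch in
Python. It is not how Liferea is actually structured: `looks_like_feed`,
`handle_response`, and the `parse_feed` / `discover_feeds` callbacks are
hypothetical names, and the MIME types and root tags checked are just my
guess at a reasonable heuristic. The point is only that each outcome
produces one message, so a 404 or a parse failure is not followed by a
second, misleading auto-discovery error.

```python
from typing import Callable, List

# Assumed heuristic inputs; a real aggregator would likely check more.
FEED_MIME_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/rdf+xml",
}
FEED_ROOT_TAGS = (b"<rss", b"<feed", b"<rdf:rdf")


def looks_like_feed(content_type: str, body: bytes) -> bool:
    """Guess whether the response is meant to be a feed at all."""
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in FEED_MIME_TYPES:
        return True
    # Fall back to sniffing the start of the content for a feed root tag.
    head = body[:512].lower()
    return any(tag in head for tag in FEED_ROOT_TAGS)


def handle_response(
    status: int,
    content_type: str,
    body: bytes,
    parse_feed: Callable[[bytes], None],          # hypothetical; raises ValueError on bad input
    discover_feeds: Callable[[bytes], List[str]],  # hypothetical; returns candidate feed URLs
) -> str:
    """Return exactly one user-facing message for a subscription attempt."""
    if status >= 400:
        # An HTTP error is the whole story; do not try auto-discovery on the error page.
        return f"HTTP error {status}"
    if looks_like_feed(content_type, body):
        try:
            parse_feed(body)
        except ValueError as exc:
            # It was meant as a feed, so report the parse error and stop here.
            return f"Feed parse error: {exc}"
        return "Feed parsed"
    # Only content that does not look like a feed goes to auto-discovery.
    links = discover_feeds(body)
    if links:
        return "Discovered feed: " + links[0]
    return "No feed found at this URL"
```

With this ordering the WordPress feed in question would yield a single
parse error naming the bad character, rather than a complaint that no
embedded feeds were found.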
> By the way, you can get your feed working, John, with a little hack - right
> click on the subscription, select properties, under the "Source" tab
> click "Use conversion filter" and put the following in the text box:
> perl -pe 's/\x1f//'
Thanks so much, that works - that's a byway of Liferea I've never
explored. I'm wondering if something like
perl -pe 's/[\x00-\x08\x0B\x0C\x0E-\x1F]//g' would be a more general work-around,
but it's a bit late at night to experiment.
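
For anything beyond a one-liner, the same idea can be written as a small
script. This is only a sketch under my own assumptions: the feed is
UTF-8, and the names `strip_invalid_xml_chars` and `filter_stream` are
made up for illustration, not anything from Liferea or libxml2. It
decodes first and then drops whatever falls outside the XML 1.0 `Char`
production, so bytes in the 0x80-0x9F range that belong to a multi-byte
character are left alone.

```python
import re

# XML 1.0 "Char" production: tab, LF, CR, then U+0020 upwards,
# excluding the surrogate block and U+FFFE / U+FFFF.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove every character that XML 1.0 does not allow in a document."""
    return _INVALID_XML_CHARS.sub("", text)


def filter_stream(src, dst) -> None:
    """Read bytes from src, strip invalid XML characters, write bytes to dst.

    Assumes UTF-8 input; undecodable bytes become U+FFFD instead of
    aborting the whole feed.
    """
    text = src.read().decode("utf-8", errors="replace")
    dst.write(strip_invalid_xml_chars(text).encode("utf-8"))


# To use this as a conversion filter, save it as a script, add
# "import sys" at the top, and end the file with:
#     filter_stream(sys.stdin.buffer, sys.stdout.buffer)
```

Saved that way, it could go in the same "Use conversion filter" box as
the Perl one-liner.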
> Simon Kågedal Reimer
> On 19 May 2013 at 17:02, "John Dablin" <jdablin@...> wrote:
> I forgot to mention, I'm using Liferea 1.8.10 on Ubuntu 13.04.
> John Dablin
> Liferea-devel mailing list