
#4781 http::geturl fails for at least 1 RSS feed

Group: obsolete: 8.6b1
Status: open
Assigned to: Pat Thoyts
Priority: 5
Created: 2011-01-25
Updated: 2011-01-25
Submitted by: Anonymous
Private: No

The following script fails when run on tcl-8.6.0.0b4:

package require http
package require tdom

set url {http://www.npr.org/templates/rss/podlayer.php?id=13}
set file "[ clock seconds ].xml"
set out [ open $file w ]
http::geturl $url -channel $out
close $out
set channel [ tDOM::xmlOpenFile $file ]
set doc [ dom parse -channel $channel ]
chan close $channel

The [dom parse ...] call fails because the file written by [http::geturl ...] is missing a chunk of data starting around line 67.

_____________________________________________________________________________________________

Configuration information:
----------------------------------------

OS: Windows 7 (Version 6.1 (Build 7600))
Tcl: ActiveTcl 8.6.0.0b4

% info tclversion
8.6
% info patchlevel
8.6b1.2
% package require http
2.8.2
% package require tdom
0.8.3

Discussion


  • Anonymous
    2011-01-25

    script which fails - http-tcl-8.6.0.0b4-defect.tcl

     

  • Anonymous
    2011-01-25

    corrupted version of the RSS feed

     

  • Anonymous
    2011-01-25

    valid version of the RSS feed

     
  • Pat Thoyts
    2011-03-06

    Your problem is in handling the encoding. You should probably set the file to binary; otherwise we apply the channel encoding to the data as we write it to the disk file. In this case you likely expect it to be utf-8 later on.

    However, when we retrieve the resource using the http package, we handle the encoding as declared by the remote site. In this case the HTTP Content-Type header carries no charset and the type is text/xml, so we treat the inbound stream as iso8859-1. The file internally declares utf-8 in its XML declaration, but we never look inside the file.

    I suggest you use [open $file wb] and see how that works.

     

  • Anonymous
    2011-03-08

    Thanks for the response.

    I'll take a look at the specified URLs.

    One thing that confuses me about this issue is that the application I excerpted the code from has been working fine on Tcl 8.5.6, 8.5.8, and 8.5.9. But when I tried it with an 8.6 beta release, I found a chunk of data missing near line 67 of the temporary file of captured data.

    Is the behavior of http::geturl expected to be different in 8.6?

     

  • Anonymous
    2011-03-08

    Notes:
    1) The attached XML file labeled "valid version of the RSS feed" was generated by running the code on Tcl 8.5.9.
    2) The attached XML file labeled "corrupted version of the RSS feed" was generated by running the code on Tcl 8.6.0.0b4 (ActiveTcl version #).
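Below is a minimal Tcl sketch of why the [open $file wb] suggestion helps. It assumes, as described in the reply above, that the http package decodes a text/xml body without a declared charset as iso8859-1. The sketch needs no network access. The variable wire is a hypothetical stand-in for the UTF-8 bytes a server would send, and the sample string is made up for illustration. Decoding as iso8859-1 turns each byte into exactly one character in the range U+0000 to U+00FF. A channel configured with -translation binary writes each such character back as a single byte, so the file ends up byte-identical to what was received.

```tcl
# Sketch only: simulates the byte path without contacting a server.
# Hypothetical payload: UTF-8 bytes as a server might send them.
set wire [encoding convertto utf-8 "It\u2019s"]

# Assumed http behaviour: text/xml with no charset is decoded as iso8859-1,
# so every byte becomes one character in U+0000..U+00FF.
set decoded [encoding convertfrom iso8859-1 $wire]

# Equivalent of [open $file wb]: a binary channel writes the low byte of
# each character, which reproduces the original bytes.
set out [file tempfile path]
fconfigure $out -translation binary
puts -nonewline $out $decoded
close $out

# Read the file back as raw bytes and compare with what was "received".
set in [open $path rb]
set bytes [read $in]
close $in
file delete $path

puts [expr {$bytes eq $wire}]   ;# prints 1
```

This only illustrates the byte round trip through a binary channel. It does not reproduce the 8.6b1 data loss reported in this ticket.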