
YARFRAW - Yet Another RSS Feed Reader A

Create a FeedReader with an InputStream

  1. 2007-09-13 18:59:27 UTC
    Hi,

    Your library looks good.
    Unfortunately, I need to be able to create a FeedReader object from an InputStream, because I want to read the remote feed myself with org.apache.commons.httpclient.

    FeedReader reader = new FeedReader(get.getResponseBodyAsStream());

    Do you plan to include such a feature in the near future?

    Thanks.

    Dominique
  2. 2007-09-20 18:12:22 UTC
    If you supply an HttpClient HttpURL object to the feed reader, it will automatically read the remote feed as a stream for you.

    FeedReader r = new FeedReader(new HttpURL("http://youtube.com/rss/global/recently_added.rss"));
    see http://yarfraw.sourceforge.net/io.html

    If you look at the source code, it does exactly what you want to do manually. Is there a reason why you want to open the remote stream yourself instead of having the reader class manage it for you?

    There is also a static method in FeedReader to read from any arbitrary stream, so you can use that too.

    public static ChannelFeed readChannel(FeedFormat format, InputStream inputStream) throws YarfrawException{
    ...
    }

    Note that the method above is static, so you need to specify the format of the feed yourself; FeedReader doesn't perform format detection with this method (see format detection under http://yarfraw.sourceforge.net/util.html). I don't think keeping a live reference to an open stream inside FeedReader is a good idea, because a stream should be closed and disposed of as soon as you are finished with it.
  3. 2007-09-20 18:51:05 UTC
    Hi,

    I want to open the remote stream myself because I want to implement conditional GET with httpclient and control various httpclient parameters such as the socket timeout and the connection timeout.

    I need the format detection too, so readChannel doesn't help :)

    Thank you.
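
The two timeouts mentioned above are separate settings: one bounds how long it takes to establish the connection, the other how long a read may block waiting for data. A minimal sketch of that distinction, using the JDK's own HttpURLConnection rather than commons-httpclient (a swapped-in library, chosen only so the example is self-contained); the class name TimeoutSketch and the method prepare are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class TimeoutSketch {
    /** Prepares (but does not open) a connection with separate connect and read timeouts. */
    static HttpURLConnection prepare(String url, int connectTimeoutMs, int readTimeoutMs) {
        try {
            // openConnection() only creates the object; no network traffic happens yet.
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectTimeoutMs); // limit on establishing the connection
            conn.setReadTimeout(readTimeoutMs);       // limit on a blocked read (socket timeout)
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as TimeoutSketch.prepare("http://example.com/feed.rss", 5000, 60000) would allow five seconds to connect and one minute per read.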


  4. 2007-09-21 04:58:15 UTC
    You can add httpclient parameters using the alternative constructor:

    public FeedReader(HttpURL httpUrl, HttpClientParams params) throws YarfrawException, IOException{}

    or
    add the params after construction by calling the setter method:

    public void setHttpClientParams(HttpClientParams httpClientParams) {}

    so if you want to change the socket timeout, write:

    HttpClientParams params = new HttpClientParams();
    params.setSoTimeout((int)DateUtils.MILLIS_PER_MINUTE);

    FeedReader reader = new FeedReader(new HttpURL("http://www.blah.com"), params);

    or

    reader.setHttpClientParams(params);

    I am not sure what you meant by conditional get...

    To do format detection, use the util class called "FeedFormatDetector". After you detect the format, call the static readChannel method with the detected format and the input stream.

    see
    http://yarfraw.sourceforge.net/util.html for notes about the FeedFormatDetector.
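
One practical detail of "detect first, then read" is that a stream can only be consumed once, so the bytes have to be buffered before detection. A minimal sketch of that idea follows. It does not use Yarfraw's FeedFormatDetector; the class DetectThenParse, its methods, and the returned format labels are hypothetical stand-ins that look only at the root element name:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class DetectThenParse {
    /** Copies the whole stream into memory and closes it, so it can be re-read. */
    static byte[] buffer(InputStream in) throws IOException {
        try (InputStream closing = in) {
            return closing.readAllBytes();
        }
    }

    /** Guesses the feed family from the name of the XML root element. */
    static String guessFormat(byte[] xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTD processing
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new ByteArrayInputStream(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return classify(reader.getLocalName());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Not well-formed XML: fall through and report UNKNOWN.
        }
        return "UNKNOWN";
    }

    private static String classify(String root) {
        if ("rss".equals(root)) return "RSS";      // <rss version="...">
        if ("feed".equals(root)) return "ATOM";    // <feed xmlns="...Atom">
        if ("RDF".equals(root)) return "RSS_1_0";  // <rdf:RDF ...>
        return "UNKNOWN";
    }
}
```

After detection, a fresh new ByteArrayInputStream(bytes) over the same buffer can be handed to the static readChannel method together with the format the real detector reports.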
  5. 2007-09-21 12:30:41 UTC
    Hi,

    For conditional get info, you can read http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers

    In order to implement it, I need to get back 2 fields from response header (Last-Modified and ETag).

    Dominique
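
For readers unfamiliar with the technique: the client saves the ETag and Last-Modified response headers and sends them back as If-None-Match and If-Modified-Since on the next request; a 304 Not Modified reply means the feed is unchanged. A minimal sketch using the JDK's HttpURLConnection instead of commons-httpclient (a swapped-in library, to keep the example self-contained); the class ConditionalGet and its method names are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

final class ConditionalGet {
    /** Request headers that ask the server to reply 304 if the feed is unchanged. */
    static Map<String, String> conditionalHeaders(String etag, String lastModified) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (etag != null) headers.put("If-None-Match", etag);
        if (lastModified != null) headers.put("If-Modified-Since", lastModified);
        return headers;
    }

    /** Returns the body bytes, or null when the server answers 304 Not Modified. */
    static byte[] fetchIfChanged(String url, String etag, String lastModified)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conditionalHeaders(etag, lastModified).forEach(conn::setRequestProperty);
        try {
            if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
                return null; // unchanged: keep using the copy you already have
            }
            // These two response headers are the values to save for the next poll.
            String newEtag = conn.getHeaderField("ETag");
            String newLastModified = conn.getHeaderField("Last-Modified");
            System.out.println("ETag=" + newEtag + ", Last-Modified=" + newLastModified);
            try (InputStream in = conn.getInputStream()) {
                return in.readAllBytes();
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

On the very first request both validators are null, so no conditional headers are sent and the server returns the full feed.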
  6. 2007-09-21 14:19:57 UTC
    i see.

    in that case there are a few ways to do that:

    1. you can use the static readChannel method
    2. you can extend the FeedReader class and override the method:
    protected InputStream getStream() throws IOException{}

    perhaps throw a special runtime exception when you decide not to read the stream.

    3. you can write a wrapper class to wrap the FeedReader class and manage the http stream in that class and pass it to FeedReader.

    again, holding a live stream within the FeedReader is not a good idea. my personal preference would be to use the static method, but the details are up to you.
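
Suggestion 2 above could look roughly like the following. This is only a sketch of the pattern: BaseReader is a hypothetical stand-in for a reader with an overridable getStream(), not Yarfraw's FeedReader, and StreamSource, ConditionalReader and NotModifiedException are likewise invented names. The subclass gets its stream from caller-managed HTTP code and throws a special runtime exception when there is nothing new to read:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Thrown when the caller's HTTP code decided there is nothing new to read (e.g. a 304). */
final class NotModifiedException extends RuntimeException {
}

/** Hypothetical source of a stream; returns null when the feed is unchanged. */
interface StreamSource {
    InputStream open() throws IOException;
}

/** Hypothetical stand-in for a reader whose stream source can be overridden. */
class BaseReader {
    protected InputStream getStream() throws IOException {
        throw new IOException("no default source in this sketch");
    }

    /** Reads the whole stream as text and closes it as soon as reading is done. */
    final String readAll() {
        try (InputStream in = getStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Subclass that delegates stream creation to caller-managed HTTP code. */
final class ConditionalReader extends BaseReader {
    private final StreamSource source;

    ConditionalReader(StreamSource source) {
        this.source = source;
    }

    @Override
    protected InputStream getStream() throws IOException {
        InputStream in = source.open();
        if (in == null) {
            throw new NotModifiedException(); // signal "unchanged" without reading
        }
        return in;
    }
}
```

The stream is opened only inside readAll() and closed before it returns, so the reader never holds a live stream between calls.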
  7. 2007-09-23 03:00:47 UTC
    I took my own suggestion (suggestion 2) and implemented a CachedFeedReader class that supports conditional get.
    This cached reader keeps a cached copy of the last feed it read and performs a conditional get against the remote server using the two HTTP headers. If the server responds with a 304 Not Modified status code, the reader returns the cached feed; otherwise it reads and parses the response as normal.

    the new CachedFeedReader is in the latest snapshot jar, if you want to try it, you can download the 09/22 snapshot jar at the snapshot repo:

    http://www.cs.drexel.edu/~zl25/maven2/snapshot/

    or you can take a look at the source code and do something similar:

    http://yarfraw.svn.sourceforge.net/viewvc/yarfraw/yarfraw/yarfraw/src/main/java/yarfraw/io/CachedFeedReader.java?view=log

    This class will be included in the 0.9 release. I still need to write some unit tests and documentation for it, and see what other features I can manage to add to 0.9, so 0.9 will be out in perhaps a week or two.

    Some example code showing how to use the cached reader:

    FeedReader cacheFeedReader = new CachedFeedReader(
        new HttpURL("http://fishbowl.pastiche.org/index.rdf"));
    ChannelFeed first = cacheFeedReader.readChannel();
    ChannelFeed second = cacheFeedReader.readChannel();
    System.out.println(first == second);

    first is the same reference as second because the cached feed is returned when the feed is read again immediately afterwards.
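
The caching behaviour described above can be sketched independently of any HTTP library. This is not the CachedFeedReader source; PollResult, Transport and CachingFetcher are hypothetical names, and the transport is an interface so the logic can be exercised without a network:

```java
/** One poll outcome: HTTP status, parsed feed (null on 304) and the validators. */
final class PollResult<T> {
    final int status;
    final T feed;
    final String etag;
    final String lastModified;

    PollResult(int status, T feed, String etag, String lastModified) {
        this.status = status;
        this.feed = feed;
        this.etag = etag;
        this.lastModified = lastModified;
    }
}

/** Hypothetical transport: receives the previous validators, performs the request. */
interface Transport<T> {
    PollResult<T> get(String etag, String lastModified);
}

/** Keeps the last feed and returns it again when the server says 304. */
final class CachingFetcher<T> {
    private final Transport<T> transport;
    private T cached;
    private String etag;
    private String lastModified;

    CachingFetcher(Transport<T> transport) {
        this.transport = transport;
    }

    T read() {
        PollResult<T> result = transport.get(etag, lastModified);
        if (result.status == 304 && cached != null) {
            return cached; // unchanged: same object as last time
        }
        // Changed (or first read): remember the feed and its validators.
        cached = result.feed;
        etag = result.etag;
        lastModified = result.lastModified;
        return cached;
    }

    String getEtag() {
        return etag;
    }

    String getLastModified() {
        return lastModified;
    }
}
```

With a transport that answers 304 whenever validators are supplied, two consecutive read() calls return the same reference, which is the behaviour the first == second check demonstrates.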
  8. 2007-10-21 22:08:27 UTC
    Hi,

    I'm back after one month. I kept working on many other aspects of my project, and I am still not happy with the RSS library I am using (Rome).

    However, I don't think the way you implemented conditional get will help me. I understand that you implemented a cache of feeds and that you keep in memory the Last-Modified and ETag values returned by each server.

    But if you stop and restart the process, the cache is lost!
    If you have to manage more than 100,000 feeds (as in my project), with some feeds checked every five minutes and others once a day, it is mandatory to store the Last-Modified and ETag values in a database. So I need to be able to get the HTTP headers back from the feeds and disable your cache.

    Bye.

    Dominique
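
The requirement above, keeping the validators across restarts, can be sketched as a small store keyed by feed URL. A java.util.Properties file stands in for the database here purely to keep the example self-contained; for 100,000 feeds a real database table would be the sensible choice. The class ValidatorStore and its key layout are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Persists the ETag and Last-Modified values per feed URL across restarts. */
final class ValidatorStore {
    private final Path file;
    private final Properties props = new Properties();

    /** Loads previously saved validators if the file already exists. */
    ValidatorStore(Path file) {
        this.file = file;
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Records the validators from the latest response and writes them to disk. */
    void save(String feedUrl, String etag, String lastModified) {
        set(feedUrl + "|etag", etag);
        set(feedUrl + "|last-modified", lastModified);
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "feed validators");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String etag(String feedUrl) {
        return props.getProperty(feedUrl + "|etag");
    }

    String lastModified(String feedUrl) {
        return props.getProperty(feedUrl + "|last-modified");
    }

    private void set(String key, String value) {
        if (value == null) {
            props.remove(key); // the server sent no such header
        } else {
            props.setProperty(key, value);
        }
    }
}
```

After a restart, a new ValidatorStore over the same file returns the saved values, which can then be sent as If-None-Match and If-Modified-Since on the next poll.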
  9. 2007-10-21 22:21:55 UTC
    Oops,

    sorry, as the Last-Modified and ETag values are available through the CachedFeedReader class, it looks like this class will allow me to implement what I need.

    Thank you

    Dominique
  10. 2007-10-22 17:12:04 UTC
    i am glad that it does what you need.

    I realize that I should be my own user and write a feed reader using Yarfraw so I can better adjust it to meet real user needs. However, I still can't find the time to do that. Your feedback is really valuable to the project, so please don't hesitate to let me know what other features you would like to see; I will be more than happy to add them to the library.

    patches are welcome too, of course.
  11. 2007-10-23 14:26:07 UTC
    Hi,

    I had to add two methods in order to be able to set the Last-Modified and ETag properties.
    Unfortunately, as the properties are private in CachedFeedReader, it was not possible to simply extend CachedFeedReader, so I copied and pasted CachedFeedReader into CachedFeedReader2.

    Dominique
  12. 2007-10-23 15:22:14 UTC
    I made a design decision to exclude the setters for those two header values, but since you are storing those values in a database, I guess you do need them. So I will add setters for them and push that to the next release.