From: Nick M. <mat...@ai...> - 2001-04-16 19:58:01
|
Greetings,

My name is Nick Matsakis, and I'm a graduate student at MIT's AI Laboratory. Lately, I've been working on some problems in information retrieval, and have become very interested in Slash-based sites as a potential source of data to try out some techniques.

I know that Slashdot (and others) serve lists of headlines in XML, and these are (presumably) the basis of many of the headline viewing programs out there. However, what I am interested in is the threaded comment discussions that each new article spawns. I've looked around a bit, and have not been able to find out whether or not Slash is able to provide third parties direct access to these discussions, rather than as HTML.

Ideally, what I would want would be an XML-like document which lists each comment with a unique identifier, the identifier of the comment it is a response to, and the text of the comment. Other metadata (author, rating, subject) is useful but not necessary.

I realize that this question is off-topic for a developers list, but it seemed like the most direct way to get an answer to the yes or no question of "is this possible?"

Regards,

Nick Matsakis |
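For illustration, the kind of document described above could be sketched as follows in Python. This is only a sketch under assumptions: the element and attribute names (`discussion`, `comment`, `id`, `parent`) and the sample records are hypothetical, and nothing here is an existing Slash export format.

```python
import xml.etree.ElementTree as ET

def comments_to_xml(comments):
    """Serialize flat comment records into one XML document.

    Each record carries a unique id, the id of the comment it answers
    (None for a top-level comment), and the comment text.
    """
    root = ET.Element("discussion")  # hypothetical element name
    for c in comments:
        attrs = {"id": str(c["id"])}
        if c.get("parent") is not None:
            attrs["parent"] = str(c["parent"])
        node = ET.SubElement(root, "comment", attrs)
        node.text = c["text"]  # ElementTree escapes <, > and & on output
    return ET.tostring(root, encoding="unicode")

# Made-up sample data, not taken from any real discussion.
sample = [
    {"id": 1, "parent": None, "text": "First post"},
    {"id": 2, "parent": 1, "text": "A reply to <1>"},
]
print(comments_to_xml(sample))
```

Optional metadata such as author, rating, or subject could be added as further attributes or child elements without changing the id/parent scheme.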
From: CertIndex.com W. <web...@ce...> - 2001-04-17 23:13:45
|
Even if this isn't a direct function of Slash, I think you could easily get the same result by writing a Perl/PHP/whatever script to spider the site, parse the articles' comments, and generate .xml files. What do you think of this?

ps: kudos on working in such an interesting field, glad to help ya wherever possible.

----- Original Message -----
From: "Nick Matsakis" <mat...@ai...>
To: <sla...@li...>
Sent: Monday, 16 April, 2001 12:58
Subject: [Slashcode-development] Q: Can Slash discussions be served in a format other than HTML? |
From: Brian A. <br...@ta...> - 2001-04-17 23:22:21
|
"CertIndex.com Webmaster" wrote: > > Even if this isn't a direct function of Slash, I think you could easily get this same end by writing > a perl/php/whatever script to spider the site, parse the articles' comments and generate .xml files. A lot of the information inside of a site can already be pulled out into XML. I would eventually like to see everything be able to work like this. I have been thinking about comments and how you would do an XML export. First, I d not think XML::RSS is the best option in this case since it has no ability to keep track of thread information. Anyone have any good ideas? -Brian |
From: Chris N. <pu...@po...> - 2001-04-18 19:25:26
|
At 16:13 -0700 2001.04.17, Brian Aker wrote:
>I have been thinking about comments and how you would do an XML export.
>First, I do not think XML::RSS is the best option in this case
>since it has no ability to keep track of thread information.

Actually, RSS 1.0 (and XML::RSS) can do this just fine.

http://groups.yahoo.com/group/rss-dev/files/Modules/Proposed/mod_threading.html

The problem is that you need an RDF module to describe your contents; as you can see, there is one for threading. You would also need one for describing the comments themselves. There has been discussion on the rss-dev list of how to do it (though I've not kept track in a while).

--
Chris Nandor pu...@po... http://pudge.net/
Open Source Development Network pu...@os... http://osdn.com/ |
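Whatever vocabulary ends up carrying it, the thread information discussed above comes down to knowing each comment's replies. Below is a rough Python sketch of recovering that from flat (id, parent) records. The function names and sample data are hypothetical, and the sketch does not reproduce the element names of the proposed threading module.

```python
from collections import defaultdict

def build_children(comments):
    """Map each parent id (None for top-level) to its replies, in input order."""
    children = defaultdict(list)
    for c in comments:
        children[c["parent"]].append(c["id"])
    return children

def thread_order(comments):
    """Return (depth, id) pairs in depth-first thread order."""
    children = build_children(comments)
    out = []
    # Push in reverse so the earliest comment is popped first.
    stack = [(0, cid) for cid in reversed(children[None])]
    while stack:
        depth, cid = stack.pop()
        out.append((depth, cid))
        for child in reversed(children.get(cid, [])):
            stack.append((depth + 1, child))
    return out

# Made-up records: comment 2 answers 1, comment 4 answers 2, and so on.
flat = [
    {"id": 1, "parent": None},
    {"id": 2, "parent": 1},
    {"id": 3, "parent": None},
    {"id": 4, "parent": 2},
    {"id": 5, "parent": 1},
]
for depth, cid in thread_order(flat):
    print("  " * depth + str(cid))
```

The per-comment reply list built by `build_children` is the piece an export would have to write out for each item, in whatever element the chosen module defines.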