[Jscheme-user] hack-of-the-day: URL directory listing


{

In one project, we keep our data in .zip files accessible through an
Apache web server.  While one can read from a particular URL easily,
there isn't a direct way to get a directory listing via HTTP.

However, you can get an HTML page containing the directory listing.
You could then use javax.swing.text.html.HTMLEditorKit to parse the
page and walk the resulting document, but that seems like a lot of
work.  Instead, I realized that (.split), given a regular expression
that matches HTML tags, could both remove the markup and break each
line into fields, making it easy to extract the information I wanted.

One drawback is that the extraction code has to be written separately
for each kind of web server, because each formats its listing page
differently.

The procedure (urlDir) grinds up the HTML page and returns the
directory listing as a list of strings, one per entry.  It works for
Apache and Tomcat servers, choosing the extraction code based on the
HTTP Server response header.  If you know of URLs that list
directories on other servers, let me know.
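
Since the choice of extractor depends only on the Server header, that
decision can be tried without any network access.  Here is a minimal
sketch, assuming a hypothetical helper serverKind that is not part of
the code below.  The header values are examples chosen for illustration.

```scheme
;; Hypothetical helper (not in the original): classify a Server header
;; value using the same tests, in the same order, as urlDir's cond.
(define (serverKind server)
  (cond ((.startsWith server "Apache-Coyote") 'tomcat)
        ((.startsWith server "Apache") 'apache)
        (else 'unknown)))

(serverKind "Apache-Coyote/1.1")      ; => tomcat
(serverKind "Apache/2.0.48 (Unix)")   ; => apache
(serverKind "Microsoft-IIS/6.0")      ; => unknown
```

Supporting another server would then mean adding one clause here and
one extractor like listApache or listTomcat.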

Example usage:
(urlDir "http://openmap.bbn.com/~kanderso/")
(urlDir "http://tat.cs.brandeis.edu:8090/sum04/cs2a/")
}

(load "elf/basic.scm")
(load "using/run.scm")

(define (urlDir url)
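  ;; Apache: rows that split into exactly 4 fields are directory
  ;; entries, and field 2 is the link text (the entry name).  The first
  ;; such row (presumably the Parent Directory link) is dropped.
  (define (listApache url)
    (map (lambda (v) (vector-ref v 2))
         (cdr (filter (lambda (v) (= (vector-length v) 4))
                      (stripHtml url)))))
  ;; Tomcat: skip the first 10 lines (page header), group the rest into
  ;; 6-line chunks, one per entry, drop the last chunk (page footer),
  ;; and take field 2 of the second line of each chunk.
  (define (listTomcat url)
    (map (lambda (s) (vector-ref (second s) 2))
         (reverse
          (cdr
           (reverse (by 6 (cdddr (cdddr (cdddr (cdr (stripHtml url)))))))))))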
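  ;; Choose the extractor from the HTTP Server response header.  Tomcat's
  ;; Coyote connector reports "Apache-Coyote/...", which also starts with
  ;; "Apache", so it has to be tested first.
  (let ((server (.get (.get (.getHeaderFields (.openConnection (URL. url)))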
                            "Server")
                      0)))
    (cond ((.startsWith server "Apache-Coyote") (listTomcat url))
          ((.startsWith server "Apache") (listApache url))
          (else (error {unknown server: [server]})))))

(define (stripHtml url)
  "Strip the HTML tags out of the page at url.  Returns a list of vectors
of strings, one vector per line of the page."
  (map*
   (lambda (line) (.split line " *<[^<]*> *"))
   (BufferedReader (.openStream (URL. url)))))
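
The example below shows why listApache keeps only 4-field rows and takes
field 2.  It applies the same split that stripHtml uses to a single line.
That line is made up, written in the style of an Apache fancy-indexed
listing, and real pages vary by Apache version and configuration.

```scheme
;; Illustrative line only, not copied from a real server.
(define sampleLine
  "<img src=\"/icons/compressed.gif\" alt=\"[   ]\"> <a href=\"data1.zip\">data1.zip</a>  12-Jan-2004 10:00  1.2M")

;; Same regular expression as stripHtml: a tag, plus any spaces
;; around it, acts as a field separator.
(define fields (.split sampleLine " *<[^<]*> *"))

(vector-length fields)   ; => 4
(vector-ref fields 2)    ; => "data1.zip"
```

The empty field 0 comes from the tag that starts the line, and the empty
field 1 sits between the two adjacent tags.  Java's String.split keeps
leading and interior empty strings and drops only trailing ones.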