I don't know if you already had this planned for the
future or not, but anyway...
When a website is grabbed, the program should also
follow the links on the page that was grabbed, grab
those pages, follow the links on them, and so on. This
would make it really crawl, if you know what I mean.
Obviously, it can't go to all of the links on the
original page at once, so maybe save them in a file for
later crawling: say the crawler hits a dead end, then it
picks a link from the file of saved links and starts
again there, thus travelling through more branches of
the tree, and forever extending those branches.
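
Something like this rough sketch is what I have in mind.
Python is just for illustration here; the in-memory queue
stands in for the file of saved links, and the href
pattern is a deliberately naive placeholder:

    import re
    import urllib.request
    from collections import deque

    def crawl(start_url, max_pages=100):
        frontier = deque([start_url])  # the "file of saved links", kept in memory here
        visited = set()                # never grab the same page twice
        while frontier and len(visited) < max_pages:
            url = frontier.popleft()   # pick a saved link and start again there
            if url in visited:
                continue
            visited.add(url)
            try:
                page = urllib.request.urlopen(url).read().decode("utf-8", "replace")
            except OSError:
                continue               # dead end: move on to the next saved link
            # ...save the page to disk here, however the grabber normally stores it...
            # naive: only picks up full http:// addresses (see the caveat below)
            for link in re.findall(r'href="(http://[^"]+)"', page):
                if link not in visited:
                    frontier.append(link)

    crawl("http://site.com/")
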
However, it may only be able to follow links that are
written out as their full address (such as a
href="http://site.com/page2.html") rather than a
shortened directory pointer (such as a
href="/page2.html" or a href="page2.html").
Let me know what you think; my email is
support@4lancer.net