I tried the "Set-Cookie" header and am still unable to crawl. Below are the request and response headers I gathered with IEInspector. What should the exact Set-Cookie syntax be for JCrawler to log in and crawl? Thanks.
Request POST header (for login):
(Request-Line):POST /web/guest/en/websys/webArch/login.cgi HTTP/1.1
Accept:image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/pdf, */*
User-Agent:Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; iOpus-I-M; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
Cookie:risessionid=074711352911134; cookieOnOffChecker=on; wimsesid=--
Response header:
(Status-Line):HTTP/1.0 302 Moved Temporarily
Date:Mon, 12 Jun 2006 20:18:14 GMT
Expires:Mon, 12 Jun 2006 20:18:14 GMT
irakli <irakli@...> wrote:
The following answers your questions in reverse order.
2. JCrawler does not support frames-based websites. New web standards do not encourage using frames and, in our experience, very few websites still use them.
1. JCrawler lets you set HTTP header information. "Logging in" on the web means setting a cookie, which is sent as part of the HTTP headers, so you can indirectly crawl a website that requires a login.
These are the steps to take:
1) Find out what cookie variables are set during authentication with
2) Find out their names and values as they are set during the
authentication of the user you are interested in
3) Edit conf/crawlerConfig.xml and tell it to set those cookies to those values.
4) Example setting:
Assuming authentication sets a cookie named "user" to
value "dmagda" and a cookie named "password" to value
"3497313EFDA923453", which stands for a password hashed with md5(),
sha1(), or whatever other algorithm is being used.
Then in the config you would add the following line:
<header name="Cookie">user=dmagda; password=3497313EFDA923453</header>
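Note that "Set-Cookie" is the header a server sends in its response, while "Cookie" is the header a client sends back in its requests, so the config line above uses "Cookie". As a rough sketch (not part of JCrawler), the Python snippet below shows one way to turn Set-Cookie lines captured from a login response into that single Cookie value. The function name `cookie_header_value` and the sample Set-Cookie lines, including their `Path=/` attributes, are assumptions made up for illustration; the cookie names and values are the ones from the example above.

```python
from http.cookies import SimpleCookie


def cookie_header_value(set_cookie_lines):
    """Join the name=value pairs from Set-Cookie lines into one Cookie value.

    Attributes such as Path or Expires are dropped, because a client only
    sends the name=value pairs back to the server.
    """
    jar = SimpleCookie()
    for line in set_cookie_lines:
        jar.load(line)  # parses one raw Set-Cookie value
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical Set-Cookie values, as they might appear in a login response.
captured = [
    "user=dmagda; Path=/",
    "password=3497313EFDA923453; Path=/",
]

# Print the line in the same form as the config example above.
print(f'<header name="Cookie">{cookie_header_value(captured)}</header>')
```

Replace the sample lines with the Set-Cookie values your own login response returns; only the name=value pairs end up in the config.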
Hope this solves your problem.