From: Umanda D. <abe...@gm...> - 2014-01-02 17:36:13
Hello,

I have crawled http://www.cse.mrt.ac.lk/ and I'm trying to recreate it in Wayback 1.8. The HTML contains the following JavaScript:

<script type="text/javascript">
RokStoriesImage.push('/images/stories/demo/rokstories/rs4.jpg');
RokStoriesImage.push('/images/stories/demo/rokstories/rs3.jpg');
RokStoriesImage.push('/images/stories/demo/rokstories/rs4.jpg');
window.addEvent('domready', function() {
    new RokStories('.feature-block', {'startElement': 0,'thumbsOpacity': 0.5,'mousetype': 'click','autorun': 0,'delay': 5000,'startWidth': 615 });
});
</script>
<div class="feature-block">
  <div class="image-container">
    <div class="image-full"></div>
    <div class="image-small">
      <img src="/images/stories/demo/rokstories/rs4_thumb.jpg" class="feature-sub" alt="image" />
      <img src="/images/stories/demo/rokstories/rs3_thumb.jpg" class="feature-sub" alt="image" />
      <img src="/images/stories/demo/rokstories/rs4_thumb.jpg" class="feature-sub" alt="image" />
    </div>
  </div>
  <div class="desc-container ">

When the web site starts up, rs4.jpg is loaded into the image-full div block. But this does not work in Wayback. Is there a particular reason for that? Please help me find the cause.

Regards

On Wed, Dec 18, 2013 at 5:31 AM, Noah Levitt <nl...@ar...> wrote:
>
> Hello,
>
> Basically yeah that's what hops means, except the seed is hop=0, and
> the links from seed are hop=1, I think.
>
> By "max-depth" do you mean the property maxPathDepth of
> org.archive.modules.deciderules.TooManyPathSegmentsDecideRule? If so,
> you have the right idea. "TooManyPathSegmentsDecideRule... Rule
> REJECTs any CrawlURIs whose total number of path-segments (as
> indicated by the count of '/' characters not including the first '//')
> is over a given threshold."
>
> http://builds.archive.org/javadoc/heritrix-3.x-snapshot/org/archive/modules/deciderules/TooManyPathSegmentsDecideRule.html
>
> On "Problem2", the wayback issue, the wayback mailing list might be a
> better place to ask.
> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
> You can cc this list if you want. Please include relevant information
> your wayback setup and the behavior you are seeing as precisely as you
> can.
>
> Noah
>
>
> On Sat, Dec 14, 2013 at 10:38 PM, Umanda Dikwatta <abe...@gm...> wrote:
> >
> > Hi Noah,
> >
> > Thank you so much for your reply. To get more clear idea, I have explained,
> > what I understood here. Please tell is it correct?
> >
> > Problem1
> >
> > If we consider http://www.mrt.ac.lk/web/ as a seed and then if we specify
> > max-hops = 3 and max-depth=7.
> >
> > Is it mean, http://www.mrt.ac.lk/web/ is hop=1. Then all the links in the
> > http://www.mrt.ac.lk/web/ has hop=2.
> > All the links inside those links has hop=3. Since max-hops=3, links inside
> > these will not crawled. Then what
> > is the max-depth? Is this the correct definition for hops?
> >
> > According to this hops definition
> > http://www.mrt.ac.lk/web/sites/default/files/styles/slideshow/public/field/slideshow/ERU%202013%204.jpg
> > is in http://www.mrt.ac.lk/web/ and therefore it is in hop=2. But if we
> > consider number of slashes, it has more than 7
> > (max-depth) slashes.
> > So is this slashes indicates the max-depth. As I could see in my crawl log,
> > number of slashes >=7 has not crawled.
> > Only other links have been crawled.
> >
> > Is this what do mean Noah?
> >
> > Problem2
> >
> > I tried this with wayback 1.6 and wayback 1.8. But still the issue is there
> > with the duplicate content. Is there any solution for this?
> >
> > Thank you and Regards
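
For reference, the path-depth question in the quoted thread can be checked by hand. The sketch below is only a literal reading of the Javadoc sentence quoted above (count the '/' characters, not including the first '//'); it is not Heritrix's own code, and the function names `count_path_slashes` and `exceeds_max_path_depth` are made up for illustration. How Heritrix treats slashes in query strings is not covered here. Note that this count is separate from hops: a URL linked directly from the seed is one hop away no matter how many slashes it contains.

```python
def count_path_slashes(url: str) -> int:
    """Count '/' characters in a URL, not including the first '//'.

    Literal reading of the quoted Javadoc sentence; hypothetical helper,
    not part of Heritrix.
    """
    # Drop only the first '//' (the one after the scheme), then count.
    return url.replace("//", "", 1).count("/")


def exceeds_max_path_depth(url: str, max_path_depth: int) -> bool:
    """True if the slash count is over the threshold (hypothetical helper)."""
    return count_path_slashes(url) > max_path_depth


seed = "http://www.mrt.ac.lk/web/"
image = (
    "http://www.mrt.ac.lk/web/sites/default/files/styles/slideshow/"
    "public/field/slideshow/ERU%202013%204.jpg"
)

for url in (seed, image):
    print(count_path_slashes(url), exceeds_max_path_depth(url, 7), url)
```

Under this reading the seed has 2 slashes and the image URL has 10, so with a threshold of 7 only the image URL is over the limit, even though it is linked directly from the seed page.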