<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to Home</title><link>https://sourceforge.net/p/crawlstat/wiki/Home/</link><description>Recent changes to Home</description><atom:link href="https://sourceforge.net/p/crawlstat/wiki/Home/feed" rel="self"/><language>en</language><lastBuildDate>Fri, 19 Oct 2018 10:17:59 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/crawlstat/wiki/Home/feed" rel="self" type="application/rss+xml"/><item><title>Discussion for Home page</title><link>https://sourceforge.net/p/crawlstat/wiki/Home/?limit=25#98df</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Crawlstat&lt;/p&gt;
&lt;p&gt;The crawlstat code has two major parts:&lt;br/&gt;
1 Backend, written in Python&lt;br/&gt;
2 Frontend, written in PHP and jQuery&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Backend&lt;/strong&gt; &lt;br/&gt;
The backend consists of several Python files. The file names are self-descriptive, so the name of each file indicates what its code does.&lt;br/&gt;
&lt;strong&gt;Path&lt;/strong&gt;&lt;br/&gt;
/home/hduser/crawlstatJobs_hafiz&lt;br/&gt;
1 make_dump.py =&amp;gt; gets url and cld2_str from Solr and writes a dump file.&lt;br/&gt;
2 python.py =&amp;gt; reads domains from dump5.txt and writes them to domains.txt.&lt;br/&gt;
3 crawlstatJobs.sh =&amp;gt; runs all the crawlstat jobs.&lt;br/&gt;
4 domains_DB.py =&amp;gt; reads domains from domains.txt and writes them to the DB.domains table.&lt;br/&gt;
5 sld_mapper.py &amp;amp; sld_reducer.py =&amp;gt; read the SLDs and write them to the DB.&lt;br/&gt;
6 tld_mapper.py &amp;amp; tld_reducer.py =&amp;gt; same as above, for TLDs.&lt;br/&gt;
7 Language_Extraction.py =&amp;gt; reads the first, second and third language from dump.txt and writes them to the first, second and third_language.txt files respectively.&lt;br/&gt;
8 first, second, third_language_DB.py =&amp;gt; read the first, second and third languages from the text files and save them in the DB.&lt;/p&gt;
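&lt;p&gt;The domain and TLD steps above can be sketched roughly as follows. This is only an illustration, not the project's actual code: it assumes the input is a plain list of URLs, and the function names extract_domain, top_level_domain and count_tlds are hypothetical. The real scripts read from the dump files and write to the DB, which is left out here.&lt;/p&gt;

```python
from collections import Counter
from urllib.parse import urlsplit


def extract_domain(url):
    """Return the lowercased host name of a URL, or None if it has none."""
    host = urlsplit(url.strip()).hostname
    return host.lower() if host else None


def top_level_domain(domain):
    """Return the last dot-separated label, e.g. 'pk' for 'example.com.pk'."""
    return domain.rsplit(".", 1)[-1]


def count_tlds(urls):
    """Map each URL to its TLD, then count how often each TLD occurs.

    This mirrors a mapper/reducer split in a single process; the input
    format (one URL per entry) is an assumption made for this sketch.
    """
    counts = Counter()
    for url in urls:
        domain = extract_domain(url)
        if domain:
            counts[top_level_domain(domain)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "http://example.com/page",
        "https://news.example.org/a/b",
        "http://example.com.pk/",
        "not a url",  # skipped: no host name can be parsed from it
    ]
    print(count_tlds(sample))
```

&lt;p&gt;Entries without a parsable host name are skipped rather than counted, which keeps malformed lines in a dump from affecting the totals.&lt;/p&gt;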
&lt;p&gt;&lt;strong&gt;Path&lt;/strong&gt;&lt;br/&gt;
/home/hduser/crawlstatJobs_hafiz/fetch_phase_stats/&lt;br/&gt;
1 daily_job_runner.sh =&amp;gt; runs the code that needs to run on a daily basis.&lt;br/&gt;
2 solrClass.py =&amp;gt; a class that provides a single point of access to Solr, so that in future the Solr path can be changed in a single line of code.&lt;br/&gt;
3 DB.py =&amp;gt; a class for the database. It lets us change the DB path in a single line of code.&lt;br/&gt;
4 index_doc_info.py =&amp;gt; gets the total number of documents indexed and the total number of web_group and stores them in the MySQL index_doc_info table.&lt;br/&gt;
5 language_detection_info.py =&amp;gt; gets cle_score and cld2_score from Solr and saves them in the MySQL Language_detector table.&lt;br/&gt;
6 main.py =&amp;gt; gets the total number of docs that appeared in fetch, the number of docs successfully fetched, low urdu_contents docs, timeouts and other errors from the jobHistory server and saves them to the fetch_info table in the DB.&lt;br/&gt;
7 jobCounterClass.py =&amp;gt; a helper class used in main.py.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">hafiz naser aslam</dc:creator><pubDate>Fri, 19 Oct 2018 10:17:59 -0000</pubDate><guid>https://sourceforge.netc8c7d3382ed1fb0806b62210a94821cb78f0647b</guid></item><item><title>Home modified by hafiz naser aslam</title><link>https://sourceforge.net/p/crawlstat/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Welcome to your wiki!&lt;/p&gt;
&lt;p&gt;This is the default page, edit it as you see fit. To add a new page simply reference it within brackets, e.g.: &lt;span&gt;[SamplePage]&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;The wiki uses &lt;a class="" href="/p/crawlstat/wiki/markdown_syntax/"&gt;Markdown&lt;/a&gt; syntax.&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;&lt;h6&gt;Project Members:&lt;/h6&gt;
	&lt;ul class="md-users-list"&gt;
		&lt;li&gt;&lt;a href="/u/hafiznaser/"&gt;hafiz naser aslam&lt;/a&gt; (admin)&lt;/li&gt;
		
	&lt;/ul&gt;&lt;br/&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">hafiz naser aslam</dc:creator><pubDate>Fri, 19 Oct 2018 02:25:31 -0000</pubDate><guid>https://sourceforge.net151265c0fb43daaaccec469a7babdcf2b7d81676</guid></item></channel></rss>