Discussion for Home page

hafiz naser aslam — Fri, 19 Oct 2018 10:17:59 -0000

Crawlstat

The crawlstat code has two major parts.
1 backend code in Python
2 frontend code in PHP and jQuery

Backend
The backend consists of several Python files. The file names are self-descriptive, so the name of each file indicates what its code does.
Path
/home/hduser/crawlstatJobs_hafiz
1 make_dump.py => gets url and cld2_str from Solr and writes them to a dump file.
2 python.py => reads domains from dump5.txt and writes them to domains.txt.
3 crawlstatJobs.sh => runs all the crawl stat jobs.
4 domains_DB.py => reads domains from domains.txt and writes them to the DB.domains table.
5 sld_mapper.py & sld_reducer.py => read the SLDs and write them to the DB.
6 tld_mapper.py & tld_reducer.py => same as above, for TLDs.
7 Language_Extraction.py => reads the first, second and third language from dump.txt and writes them to the first, second and third_language.txt files respectively.
8 first, second, third_language_DB.py => read the first, second and third languages from those text files and save them to the DB.
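
The domain and TLD steps above might look roughly like the sketch below. It is not the actual code in python.py or tld_mapper.py / tld_reducer.py. It assumes a hypothetical dump format of one `<url><TAB><cld2_str>` per line, and it assumes the mapper/reducer pair counts occurrences per TLD, which this page does not state. The function names are made up for illustration.

```python
from itertools import groupby
from urllib.parse import urlsplit


def extract_domain(dump_line):
    """Return the host name of the URL in one dump line, or None.

    Assumed (hypothetical) dump format: "<url>\t<cld2_str>".
    """
    url = dump_line.split("\t", 1)[0].strip()
    host = urlsplit(url).hostname  # None when the URL has no scheme/host part
    return host if host else None


def tld_mapper(domains):
    """Emit a (tld, 1) pair for each domain, e.g. "example.com.pk" -> ("pk", 1)."""
    for domain in domains:
        yield domain.rsplit(".", 1)[-1], 1


def tld_reducer(pairs):
    """Sum the counts per TLD. The pairs must already be sorted by key."""
    for tld, group in groupby(pairs, key=lambda pair: pair[0]):
        yield tld, sum(count for _, count in group)


if __name__ == "__main__":
    # Made-up sample lines, only to show the data flow.
    dump = [
        "http://www.example.com.pk/page\tURDU",
        "https://news.example.org/a\tENGLISH",
        "http://blog.example.pk/\tURDU",
    ]
    domains = [d for d in map(extract_domain, dump) if d]
    for tld, count in tld_reducer(sorted(tld_mapper(domains))):
        print(tld, count)
```

Sorting the mapper output before reducing stands in for the shuffle step that a map/reduce framework would perform between the two scripts.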

Path
/home/hduser/crawlstatJobs_hafiz/fetch_phase_stats/
1 daily_job_runner.sh => runs the code that needs to run on a daily basis.
2 solrClass.py => a class that gives a single point of access to Solr, so that in future the Solr path can be changed in a single line of code.
3 DB.py => a class for the database. Through this file the DB path can be changed in a single line of code.
4 index_doc_info.py => gets the total number of documents indexed and the total number of web_group, then stores them in the index_doc_info table in MySQL.
5 language_detection_info.py => gets cle_score and cld2_score from Solr and saves them in the Language_detector table in MySQL.
6 main.py => gets the total no. of docs that appeared in fetch, the no. of docs successfully fetched, docs with low Urdu content, timeouts and other errors from the jobHistory server, then saves them to the fetch_info table in the DB.
7 jobCounterClass.py => a helper class used in main.py.
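
The "single point" idea behind solrClass.py and DB.py can be sketched as below. This is only an illustration, not the real class: the class name `SolrConfig`, the function `num_found` and the base URL are hypothetical placeholders, since the actual Solr path is not given on this page. The only Solr details relied on are the standard `/select` handler with the `q`, `rows` and `wt` parameters, and the `response.numFound` field of a JSON response.

```python
import json
from urllib.parse import urlencode


class SolrConfig:
    """Hypothetical stand-in for solrClass.py: the one place that knows the Solr path."""

    # Placeholder value (assumption); change this one line to point at another Solr.
    BASE_URL = "http://localhost:8983/solr/example_core"

    @classmethod
    def select_url(cls, query="*:*", rows=0):
        """Build a /select URL so that callers never hard-code the base URL."""
        params = urlencode({"q": query, "rows": rows, "wt": "json"})
        return f"{cls.BASE_URL}/select?{params}"


def num_found(response_text):
    """Read the total number of matching documents from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


if __name__ == "__main__":
    print(SolrConfig.select_url())
    # Made-up response body, in the shape Solr returns for wt=json.
    sample = '{"response": {"numFound": 1234, "start": 0, "docs": []}}'
    print(num_found(sample))
```

A script such as index_doc_info.py could then ask this class for the URL instead of containing the path itself, and DB.py would play the same role for the database connection settings.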

Home modified by hafiz naser aslam

hafiz naser aslam — Fri, 19 Oct 2018 02:25:31 -0000

Welcome to your wiki!

This is the default page, edit it as you see fit. To add a new page simply reference it within brackets, e.g.: [SamplePage].

The wiki uses Markdown syntax.

Project Members:

hafiz naser aslam (admin)
