File Date Author Commit
 dist 2014-08-18 sandeepst sandeepst [ffe3b3] updated url.xlsx under dist folder
 Readme.txt 2016-05-02 sandeep sandeep [5ed855] minor changes
 htmlcomments.py 2014-08-06 sandeepst sandeepst [b9a2f4] Initial commit
 htmlcomments.spec 2014-08-06 sandeepst sandeepst [b9a2f4] Initial commit
 keysrch.py 2014-08-06 sandeepst sandeepst [b9a2f4] Initial commit
 url.xlsx 2014-08-18 sandeepst sandeepst [9fccfc] updated url.xlsx and corrected Readme.txt's wit...
 urlread.py 2014-08-06 sandeepst sandeepst [b9a2f4] Initial commit

Read Me

/***************************************************************
*
*
* First Author	: Sandeep Tuppad
* website	: https://in.linkedin.com/in/sandeep-tuppad-b4b87840
* Licence	: MIT
* contact	: sandeep.tuppad@gmail.com
*
***************************************************************/
About the software:
The software visits the user-provided URLs, extracts the comments (HTML style, JavaScript single-line and multi-line), and writes a summary to a file. 
It also searches the extracted comments for user-provided keywords (such as password, pswd, author) and writes the lines containing them to 
a second file. This is useful for testing whether sensitive information has been left in the comments. Visiting every page and checking its 
comments by hand is time-consuming; this software automates the check.



Source Files:
There are three source files, all written in Python. 
a) htmlcomments.py
	This is the main file. It uses the functions exported by the two modules below. The software reads the list of 
	URLs from the spreadsheet file. It visits a URL, extracts the comments from the page, and writes them to a file. It then searches the contents 
	of that file for each keyword specified in the spreadsheet. If a keyword is found, the line containing it is written to another file. This is repeated 
	for each URL and each keyword in the spreadsheet. The end result is two output log files: one containing the comments from each URL, and 
	another containing, for each URL, the comments that contain the keywords.
b) urlread.py
	A module with function(s) to read a specified column from a specified spreadsheet file.
c) keysrch.py
	A module with function(s) to read lines from a specified file, search them for a specified keyword, and write each line containing it 
	to a specified file.
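
The keyword search described for keysrch.py can be sketched roughly as below. This is only an illustration, not the actual code of the module: the function name search_keyword, the file names, and the choice of a case-sensitive substring match are all assumptions.

```python
import os
import tempfile


def search_keyword(in_path, keyword, out_path):
    """Append every line of in_path that contains keyword to out_path.

    Hypothetical helper; returns the number of matching lines.
    """
    matches = 0
    with open(in_path) as src, open(out_path, "a") as dst:
        for line in src:
            # Assumption: plain, case-sensitive substring match.
            if keyword in line:
                dst.write(line)
                matches += 1
    return matches


# Usage with throwaway files standing in for the real log files.
workdir = tempfile.mkdtemp()
comments_path = os.path.join(workdir, "comments.txt")
hits_path = os.path.join(workdir, "hits.txt")

with open(comments_path, "w") as f:
    f.write("todo: remove test pswd\nlayout fix\nauthor: someone\n")

# One pass over the comments file per keyword, as the description suggests.
for keyword in ["pswd", "author"]:
    search_keyword(comments_path, keyword, hits_path)

with open(hits_path) as f:
    hits = f.read()
print(hits)
```

Opening the output file in append mode lets the results for several keywords accumulate in one file, which matches the single keyword log described here.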
	

	
Input Files:
a) url.xlsx
	The spreadsheet contains a column of URLs on one sheet (column A of sheet 1) and a column of keywords on another (column A of sheet 2). Edit the variable WHAT in cell A[2] 
	of sheet 3 to configure the type of comments you would like to extract. This file is the input for the software.


	
Generated Log Files:
a) comlog
	A text file containing the comments extracted from each URL, in a formatted way.
b) keysearch 
	A text file containing, for each URL, the comments that contain each keyword.


	
Configuration:
a) The software finds comments by looking for the SOM (start of message) and EOM (end of message) markers selected in sheet 3 of the url.xlsx file. The supported settings are: 
	1) 0: HTML style comments: SOM="<!--" and EOM="-->"
	2) 1: JavaScript multi-line comments: SOM="/*" and EOM="*/"
	3) 2: JavaScript single-line comments: SOM="//" and EOM="\r\n"
	4) A custom option is planned for the future, in which SOM and EOM can be edited to extract any other line(s) of a page, not just the comments.
b) Create a file named url.xlsx. Fill sheet 1, column A with the URLs to be processed. Fill sheet 2, column A with the keywords to be searched for. Edit the variable 
   "WHAT" in cell A[2] of sheet 3 in url.xlsx to configure the type of comments you would like to extract: 0 for HTML comments, 1 for JavaScript multi-line 
   comments, and 2 for JavaScript single-line comments.
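
As an illustration of the SOM/EOM settings above, here is a minimal sketch of delimiter-based extraction. The names DELIMITERS and extract_comments are hypothetical, and ignoring an SOM with no matching EOM is an assumption; the actual script may behave differently.

```python
# Hypothetical mapping of the WHAT setting to the (SOM, EOM) pairs listed above.
DELIMITERS = {
    0: ("<!--", "-->"),   # HTML style comments
    1: ("/*", "*/"),      # JavaScript multi-line comments
    2: ("//", "\r\n"),    # JavaScript single-line comments
}


def extract_comments(text, what):
    """Return every span of text found between an SOM and the next EOM."""
    som, eom = DELIMITERS[what]
    found = []
    pos = 0
    while True:
        start = text.find(som, pos)
        if start == -1:
            break
        end = text.find(eom, start + len(som))
        if end == -1:
            break  # assumption: an SOM without a matching EOM is ignored
        found.append(text[start + len(som):end])
        pos = end + len(eom)
    return found


page = (
    "<p>hi</p><!-- todo: remove pswd -->\r\n"
    "<script>/* multi\r\nline */ var a = 1; // single\r\n</script>"
)
html_comments = extract_comments(page, 0)
multi_line = extract_comments(page, 1)
single_line = extract_comments(page, 2)
print(html_comments, multi_line, single_line)
```

Because the single-line setting uses "\r\n" as EOM, a page that ends its lines with "\n" alone would yield no single-line comments in this sketch.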
	

Folder structure:
1) The Python source files and the url.xlsx file are in the main folder. 
2) The folder "dist" contains the executable, the dependent DLLs, and the other files generated from the Python source files.

	
Limitations of the software:
a) The software searches for an SOM and then for the next EOM. Everything in between is treated as a comment. As a result, the software sometimes 
extracts lines as comments even though they are active code.
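
A small example of how this limitation can show up, using the single-line settings SOM="//" and EOM="\r\n". The helper first_span and the sample line are made up for illustration and are not taken from the software.

```python
def first_span(text, som, eom):
    """Return the text between the first SOM and the next EOM, or None."""
    start = text.find(som)
    if start == -1:
        return None
    end = text.find(eom, start + len(som))
    if end == -1:
        return None
    return text[start + len(som):end]


# Active code, not a comment: the "//" belongs to the URL inside a string.
line = 'var home = "http://example.com/index.html";\r\n'
false_positive = first_span(line, "//", "\r\n")
print(false_positive)
```

The extracted text is the tail of a string literal, so a plain marker search reports it as a comment although the line is live code.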

Dependencies:
1) Python 2.7 must be installed (only if running the Python source files).
2) The Python xlrd package, in a version compatible with Python 2.7, must be installed (only if running the Python source files).
3) The software was developed and tested on Windows 7. Porting it to other platforms should need few or no changes. 

How to Run:
1) Edit the url.xlsx file as described above.
2) Run the file htmlcomments.exe in the "dist\htmlcomments" folder. 
3) Alternatively, run the main Python script htmlcomments.py (if the dependencies are installed).
4) The output files comlog and keysrch are generated. These are text files. 