/***************************************************************
*
*
* First Author	: Sandeep Tuppad
* website	: https://in.linkedin.com/in/sandeep-tuppad-b4b87840
* Licence	: MIT
* contact	: sandeep.tuppad@gmail.com
*
***************************************************************/
About the software:
The software visits the user-provided URLs, extracts the comments (HTML style, C style single line and multi line) and writes a summary to a file.
It also searches the extracted comments for user-provided keywords (such as password, pswd, author) and writes the lines containing them to
a second file. This is useful for testing whether any sensitive information is part of the comments. Manually visiting every page and checking its
comments is time-consuming; this software automates the check.



Input Files:
a) url.xlsx
	The spreadsheet contains a column of URLs (column A of sheet 1) and a column of keywords (column A of sheet 2). The cell A[2]
	in sheet 3 specifies which kind of comments to extract. This file is the input for the software.


	
Generated Log Files:
a) comlog
	A text file containing the comments extracted from each URL's page, in a formatted layout.
b) keysearch
	A text file containing, for each URL, the comments that contain each keyword.
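
The keyword search behind the second log file can be sketched roughly as below. This is only an illustration in Python, not the actual source of the software: the function name search_keywords and the case-insensitive matching are assumptions.

```python
# Hypothetical sketch, not the real implementation of the software.
# Assumption: matching is a case-insensitive substring test.

def search_keywords(comments, keywords):
    """Return a dict mapping each keyword to the comments that contain it."""
    hits = {}
    for keyword in keywords:
        needle = keyword.lower()
        hits[keyword] = [c for c in comments if needle in c.lower()]
    return hits


comments = ["<!-- pswd: 1234 -->", "<!-- Author: someone -->", "<!-- todo -->"]
for keyword, found in search_keywords(comments, ["pswd", "author"]).items():
    print(keyword, found)
```

Each keyword gets its own list, so a comment that contains two keywords is reported under both.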


	
Configuration:
a) The software searches for comments based on the SOM (start of message) and EOM (end of message) markers selected in sheet 3 of the url.xlsx file. The options are:
	1) 0: HTML style comments: SOM="<!--" and EOM="-->"
	2) 1: C style multi line comments: SOM="/*" and EOM="*/"
	3) 2: C style single line comments: SOM="//" and EOM="\r\n"
	4) A custom option will be added in future, where SOM and EOM can be edited to extract any other line(s) of a URL page, not just comments.
b) Create a file named url.xlsx. Fill sheet 1, column A with the URLs to be processed. Fill sheet 2, column A with the keywords to search for.
c) Set the variable WHAT in cell A[2] of sheet 3 in url.xlsx to select the type of comments to extract: 0, 1 and 2 for HTML comments,
   C style multi line comments and C style single line comments respectively.
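
The SOM/EOM search described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code of the software itself: the names MARKERS and extract_comments are hypothetical, and keeping an unterminated comment up to the end of the text is an assumed behaviour.

```python
# Hypothetical sketch of SOM/EOM based extraction; names are illustrative only.

# WHAT value -> (SOM, EOM), as listed in the configuration section.
MARKERS = {
    0: ("<!--", "-->"),   # HTML style comments
    1: ("/*", "*/"),      # C style multi line comments
    2: ("//", "\r\n"),    # C style single line comments
}


def extract_comments(text, what):
    """Collect every span that starts at a SOM and ends at the next EOM."""
    som, eom = MARKERS[what]
    comments = []
    pos = 0
    while True:
        start = text.find(som, pos)
        if start == -1:
            break  # no further SOM in the text
        end = text.find(eom, start + len(som))
        if end == -1:
            # Assumption: an unterminated comment runs to the end of the text.
            comments.append(text[start:].rstrip("\r\n"))
            break
        # Keep the markers, but drop a trailing line break used as EOM.
        comments.append(text[start:end + len(eom)].rstrip("\r\n"))
        pos = end + len(eom)
    return comments


page = "<p>hi</p><!-- pswd: 1234 --><b>x</b><!-- todo -->"
print(extract_comments(page, 0))
# prints ['<!-- pswd: 1234 -->', '<!-- todo -->']
```

The search resumes after each EOM, so spans never overlap and a SOM that appears inside a comment is not counted twice.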
	

	
Limitations of the software:
a) The software searches for the SOM (start of message) and then for the next EOM (end of message). Everything in between is treated as a comment, so in some cases the software
wrongly extracts lines of active code as comments.
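
One way such a false positive can arise is shown below, as a Python illustration. The sample line and variable names are made up for this sketch, and the plain string search is an assumption about how the markers are matched: the "//" inside a URL is taken as the start of a single line comment.

```python
# Hypothetical illustration of the limitation; the sample line is invented.
line = 'var api = "http://example.com/data"; // real comment\r\n'

som, eom = "//", "\r\n"                   # single line comment markers
start = line.find(som)                    # first "//" is the one inside the URL
end = line.find(eom, start + len(som))
extracted = line[start:end]

print(extracted)
# prints //example.com/data"; // real comment
```

Part of the active string literal ends up in the extracted text, which is why the log files are best reviewed by hand before drawing conclusions.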

Dependencies:
1) The software executable is built and tested on the 64-bit Windows 7 platform. To run the HCE on a 32-bit Windows platform, you have to rebuild the source code on that platform with the necessary build tools and software dependencies installed. The source code is at "https://sourceforge.net/p/htmlcommentsparser/code/ci/master/tree/"


How to Run:
1) Edit the url.xlsx file as described above.
2) Run the file htmlcomments.exe.
3) The comlog and keysrch files are generated. They are text files.




Source: Readme.txt, updated 2016-05-02