OSPC Wiki

Open Software Plagiarism Checker

Status: Beta

Brought to you by: zaczek

HowItWorks

How it works

Each file is split into tokens and symbols by a tokenizer. Comments and whitespace are ignored.

Example:

if(a < b)
{
    /* Yes! */
    printf("Yes");
}

The result would be:

Token	Description
if	Token
(	Symbol
a	Token
<	Symbol
b	Token
)	Symbol
{	Symbol
printf	Token
(	Symbol
"Yes"	Token (as quoted string)
)	Symbol
;	Symbol
}	Symbol
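
The tokenizing step can be sketched in a few lines of Python. This is only an illustration, not the OSPC tokenizer: the names TOKEN_PATTERN and tokenize are made up here, and the rules (C-style comments, double-quoted strings, runs of letters, digits and underscores as tokens, every other character as a one-character symbol) are assumptions chosen to reproduce the table above.

```python
import re

# Hypothetical token pattern; alternatives are tried from top to bottom.
TOKEN_PATTERN = re.compile(
    r"""
    (?P<comment>/\*.*?\*/|//[^\n]*)     # block or line comment, dropped
  | (?P<space>\s+)                      # whitespace, dropped
  | (?P<string>"(?:\\.|[^"\\])*")       # quoted string, kept as one token
  | (?P<word>[A-Za-z_0-9]+)             # identifier, keyword or number
  | (?P<symbol>.)                       # any other single character
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(source):
    """Return (text, description) pairs, skipping comments and whitespace."""
    result = []
    for m in TOKEN_PATTERN.finditer(source):
        kind = m.lastgroup  # name of the alternative that matched
        if kind in ("comment", "space"):
            continue
        if kind == "symbol":
            result.append((m.group(), "Symbol"))
        elif kind == "string":
            result.append((m.group(), "Token (as quoted string)"))
        else:
            result.append((m.group(), "Token"))
    return result


SAMPLE = 'if(a < b)\n{\n    /* Yes! */\n    printf("Yes");\n}\n'

for text, description in tokenize(SAMPLE):
    print(text, description, sep="\t")
```

Run on the example, the sketch prints the same 13 rows as the table. A real tokenizer would also need to handle multi-character operators, character literals and preprocessor lines, which are left out here.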

Then each file is compared with all other files: each token is compared with all other tokens. Finally, the longest matches are selected as the result.
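
As a rough sketch of this comparison step, the Python below compares every pair of files and finds the longest chain of strictly equal tokens. It is a simplified stand-in, not the OSPC algorithm: it uses a standard longest-common-run table and ignores the tolerance for mismatching tokens. The names longest_common_run and compare_all, and the dictionary layout of the input, are assumptions made for the example.

```python
from itertools import combinations


def longest_common_run(a, b):
    """Return (length, start_a, start_b) of the longest run of equal tokens."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous token pair.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def compare_all(files):
    """files maps a file name to its token list; each pair is compared once."""
    return {
        (name_a, name_b): longest_common_run(files[name_a], files[name_b])
        for name_a, name_b in combinations(sorted(files), 2)
    }


FILES = {
    "A": ["if", "(", "a", "<", "b", ")", "{", "printf", "(", '"Yes"', ")", ";", "}"],
    "B": ["if", "(", "x", "<", "y", ")", "{", "printf", "(", '"True"', ")", ";", "}"],
}

for pair, (length, start_a, start_b) in compare_all(FILES).items():
    print(pair, "longest exact run:", length, "tokens at", start_a, "and", start_b)
```

With exact matching only, the two sample token lists share a longest run of four tokens, `) { printf (`. Allowing some tokens inside a chain to differ is what turns such short runs into one longer match.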

What is a match? A match is the longest chain of equal tokens, with some exceptions.

Every {max-match-distance}th token must match.
At least {min-common-token} % of the tokens must match.
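
One possible reading of these two rules is sketched below in Python. The interpretation is an assumption, not taken from the OSPC source: rule 1 is read as "no stretch of {max-match-distance} consecutive tokens may go without a match", and rule 2 as a minimum percentage of equal tokens. The function name, parameter names and the sample flag list are all hypothetical.

```python
def satisfies_match_rules(flags, max_match_distance, min_common_token):
    """flags[i] is True when the i-th aligned token pair is equal.

    min_common_token is given as a percentage (0 to 100).
    """
    if not flags:
        return False
    # Rule 1 (assumed reading): a run of max_match_distance consecutive
    # mismatches breaks the match.
    gap = 0
    for equal in flags:
        gap = 0 if equal else gap + 1
        if gap >= max_match_distance:
            return False
    # Rule 2: the share of equal tokens must reach the configured percentage.
    share = 100.0 * sum(flags) / len(flags)
    return share >= min_common_token


# Hypothetical candidate: 13 token pairs, 3 of them different.
CANDIDATE = [True, True, False, True, False, True, True,
             True, True, False, True, True, True]

print(satisfies_match_rules(CANDIDATE, 2, 75))  # single mismatches tolerated
print(satisfies_match_rules(CANDIDATE, 2, 85))  # too few equal tokens
print(satisfies_match_rules(CANDIDATE, 1, 75))  # every token must match
```

Under this reading, a value of 1 for {max-match-distance} demands that every token matches, while larger values allow isolated differences such as renamed variables.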

Example:

File A

if(a < b)
{
    /* Yes! */
    printf("Yes");
}

File B

if(x < y)
{
    // True
    printf("True");
}

A	B	Match
if	if	Yes
(	(	Yes
a	x	No
<	<	Yes
b	y	No
)	)	Yes
{	{	Yes
printf	printf	Yes
(	(	Yes
"Yes"	"True"	No
)	)	Yes
;	;	Yes
}	}	Yes
13	13	10 Yes, 3 No

10 of 13 tokens are the same, resulting in a similarity of 76.92 %. Whether this match counts as equal depends on your individual programming course.
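
The arithmetic can be reproduced with a short Python sketch. It assumes the two token lists are aligned position by position, as in the table above; the function name similarity_percent is made up for this example.

```python
def similarity_percent(tokens_a, tokens_b):
    """Share of positions that hold equal tokens, as a percentage."""
    pairs = list(zip(tokens_a, tokens_b))
    if not pairs:
        return 0.0
    equal = sum(1 for a, b in pairs if a == b)
    return 100.0 * equal / len(pairs)


FILE_A = ["if", "(", "a", "<", "b", ")", "{", "printf", "(", '"Yes"', ")", ";", "}"]
FILE_B = ["if", "(", "x", "<", "y", ")", "{", "printf", "(", '"True"', ")", ";", "}"]

# 10 equal pairs out of 13: 10 / 13 * 100 = 76.92 (rounded to two decimals)
print(f"{similarity_percent(FILE_A, FILE_B):.2f} %")
```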
