Mauro Cavalcanti

A cross-platform text analysis program written in Python that scans an entire text file (plain text, HTML, or EPUB) and ranks every word by frequency, performing a quantitative analysis of the text using the Shannon-Weaver information statistic and Zipf's power law. It counts words, characters, spaces, and syllables, and also computes readability indexes (Gunning Fog, Coleman-Liau, Automated Readability Index (ARI), SMOG grade, Flesch–Kincaid grade level, and Flesch Reading Ease).

Zipf's law states that the frequency of occurrence of any word is approximately inversely proportional to its rank in the frequency table. When Zipf's law applies, plotting the frequency table on a log-log scale (i.e., log(frequency) versus log(rank)) typically shows a roughly straight line.
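
As a rough illustration of the rank-frequency idea, the sketch below ranks words by count and estimates the slope of log(count) against log(rank) with an ordinary least-squares fit; a slope near -1 is what Zipf's law predicts. This is not the program's own code: the function names `rank_frequencies` and `zipf_slope` and the simple tokenizing regular expression are assumptions made for this example.

```python
import math
import re
from collections import Counter


def rank_frequencies(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


def zipf_slope(ranked):
    """Least-squares slope of log(count) versus log(rank).

    Needs at least two ranked words; a value close to -1 is
    consistent with Zipf's law.
    """
    xs = [math.log(rank) for rank, _, _ in ranked]
    ys = [math.log(count) for _, _, count in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Short texts rarely follow the law closely, so the fitted slope is only meaningful for reasonably long inputs.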

The Shannon-Weaver information statistic gives a measure of the entropy (or the average information content) of the text, expressed in bits.
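
The entropy in bits can be sketched as H = -Σ p_i · log2(p_i), where p_i is the relative frequency of the i-th distinct symbol. The example below applies this to a list of tokens; whether the program itself computes the statistic over words or over characters is not stated here, so treating words as the symbols is an assumption, as is the function name `shannon_entropy`.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Entropy in bits of the empirical distribution of `tokens`."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # Sum -p * log2(p) over every distinct token.
    return -sum(
        (count / total) * math.log2(count / total)
        for count in counts.values()
    )
```

Four equally frequent tokens give 2 bits, while a text that repeats a single token gives 0 bits.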

The Gunning Fog, Coleman-Liau, Automated Readability Index, SMOG, and Flesch–Kincaid readability tests are designed to indicate how difficult a written text is to comprehend.
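
To show how such indexes are built from the basic counts, here is a minimal sketch of three of them using their commonly published formulas. It takes the character, word, sentence, and syllable counts as inputs rather than deriving them, since syllable and sentence detection depend on heuristics not described here; the function names are hypothetical and the code is not taken from the program.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level (approximate US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def automated_readability_index(chars, words, sentences):
    """ARI, based on characters per word and words per sentence."""
    return 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43
```

For a passage of 100 words, 5 sentences, 150 syllables, and 450 characters, these give a Reading Ease of about 59.6, a Flesch-Kincaid grade of about 9.9, and an ARI of about 9.8.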

Project Members: