Vulgate-Frequency-List

Brought to you by: ros-languages

File	Date	Author	Commit
BibliaSacraVulgatae.txt	2022-08-28	ros	[4bc4fd] First working file set. Reduces inflected frequ...
README.md	2022-08-28	ros	[5598f3] Optimized output file generation from 1745 sec ...
main.jl	2022-08-30	ros	[996e83] Fixed comment referring to old solution method
output.txt	2022-08-29	ros	[937ed4] Increased number of endings, differentiated ste...
parseLatin.jl	2022-08-29	ros	[937ed4] Increased number of endings, differentiated ste...

Read Me

Vulgate Frequency List

A common way to brush up on vocabulary is to review the most common words in a
corpus. The Vulgate is a valuable resource for this kind of study, but I found
that the only available frequency lists were:

- published as parts of websites rather than in a convenient format such as a
  text file, and
- given in their inflected forms.

This project aims to improve on existing Vulgate frequency lists by reducing the
inflected frequency list (46,132 entries) to a frequency list of base words.
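At its core, the reduction sums the counts of all inflected forms that share a
base word. The following is a minimal Julia sketch of that step, not the code in
main.jl: the BASE_FORMS table, base_form, and reduce_frequencies are
hypothetical names, and a small hand-written lookup table stands in for a real
Latin parser.

```julia
# Minimal sketch, NOT the project's main.jl: collapse an inflected frequency
# list into a base-word frequency list by summing counts per base form.

# Hypothetical lookup table standing in for a real Latin parser.
const BASE_FORMS = Dict(
    "dixit"  => "dico",
    "dicens" => "dico",
    "deus"   => "deus",
    "dei"    => "deus",
    "deum"   => "deus",
)

# Map a word to its base form; unknown words count as their own base form.
base_form(word::AbstractString) = get(BASE_FORMS, lowercase(word), lowercase(word))

"""
    reduce_frequencies(inflected)

Sum the counts of all inflected forms that share a base form and return
`base => count` pairs, most frequent first (ties broken alphabetically).
"""
function reduce_frequencies(inflected::Vector{Pair{String,Int}})
    totals = Dict{String,Int}()
    for (word, n) in inflected
        key = base_form(word)
        totals[key] = get(totals, key, 0) + n
    end
    return sort(collect(totals); by = p -> (-p.second, p.first))
end
```

With made-up counts, reduce_frequencies(["dixit" => 10, "dicens" => 2, "et" => 20])
returns ["et" => 20, "dico" => 12]: both forms of dico are merged, and "et",
which is not in the table, is kept as its own base word.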

To Run

This project is written in the Julia programming
language. If you would like to generate the output file, you will need to have
Julia installed.

Once Julia is installed, start it from the project directory, then enter
include("main.jl") at the REPL (the interactive prompt).

Future Work

The Latin parsing module (parseLatin.jl) is basic: it recognizes only the most
common forms of words. It handles those reasonably well but struggles with rarer
forms. I would like to replace this ad-hoc parser with one that can handle any
word and return its principal parts.
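To illustrate why an ending-based approach only covers common forms, here is a
minimal Julia sketch of suffix stripping. It is an assumption about how such a
parser might work, not the contents of parseLatin.jl; the ENDINGS list and
strip_ending are hypothetical.

```julia
# Minimal sketch of ending-based stemming; NOT the project's parseLatin.jl.
# The ending list below is a small hypothetical sample.

# Longest endings first, so that "orum" is tried before "um".
const ENDINGS = ["ibus", "orum", "arum", "ius", "us", "um", "ae", "am",
                 "is", "os", "as", "i", "o", "a", "e"]

"""
    strip_ending(word; min_stem = 3)

Remove the first (longest) matching ending from `word`, keeping at least
`min_stem` characters so short words such as "sum" are left alone.
"""
function strip_ending(word::AbstractString; min_stem::Int = 3)
    w = lowercase(word)
    for ending in ENDINGS
        if endswith(w, ending) && length(w) - length(ending) >= min_stem
            # chop counts characters, so this is safe for non-ASCII text too
            return String(chop(w; tail = length(ending)))
        end
    end
    return w  # no known ending: return the word unchanged (e.g. "est")
end
```

With this table, dominus, domini and dominorum all reduce to domin, but the
irregular est is returned unchanged and never connected to sum. Forms like
these are what a parser that returns principal parts would need to handle.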

Acknowledgements

Vulgate text file sourced from https://github.com/ojjjo/BibliaSacraVulgatae.