| File | Date | Commit |
|---|---|---|
| BibliaSacraVulgatae.txt | 2022-08-28 | [4bc4fd] First working file set. Reduces inflected frequ... |
| README.md | 2022-08-28 | [5598f3] Optimized output file generation from 1745 sec ... |
| main.jl | 2022-08-30 | [996e83] Fixed comment referring to old solution method |
| output.txt | 2022-08-29 | [937ed4] Increased number of endings, differentiated ste... |
| parseLatin.jl | 2022-08-29 | [937ed4] Increased number of endings, differentiated ste... |
A common way to brush up on vocabulary is to review the most common words in a
corpus. The Vulgate is a valuable resource for this kind of study, but I found
that the only available frequency lists were lists of inflected forms, which count
each form of a word separately.
This project aims to improve on the state of Vulgate frequency lists by reducing
the inflected frequency list (46,132 words long) down to a base word frequency
list.
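Conceptually, the reduction sums the counts of every inflected form that maps to the same base word. The sketch below is a minimal illustration, not the code in `main.jl`: it assumes a hypothetical `lemma` function (here a tiny hand-written lookup table, `LEMMAS`) standing in for the real Latin parser, and runs on a toy token list rather than the Vulgate text.

```julia
# Count how often each inflected form appears in a list of tokens.
function inflected_counts(tokens::Vector{String})
    counts = Dict{String,Int}()
    for t in tokens
        counts[t] = get(counts, t, 0) + 1
    end
    return counts
end

# Collapse inflected counts into base-word counts.
# `lemma` is any function mapping an inflected form to its base word (hypothetical here).
function base_counts(inflected::Dict{String,Int}, lemma::Function)
    base = Dict{String,Int}()
    for (form, n) in inflected
        b = lemma(form)
        base[b] = get(base, b, 0) + n
    end
    return base
end

# Sort (word => count) pairs from most to least frequent, ties broken alphabetically.
ranked(counts::Dict{String,Int}) = sort(collect(counts); by = p -> (-p.second, p.first))

# Toy lookup table standing in for a real parser (hypothetical data).
const LEMMAS = Dict("dixit" => "dico", "dicit" => "dico",
                    "dominus" => "dominus", "domini" => "dominus",
                    "deus" => "deus")
lemma(form) = get(LEMMAS, form, form)  # unknown forms map to themselves

tokens = ["dixit", "dominus", "domini", "dicit", "deus", "dixit"]

for (word, n) in ranked(base_counts(inflected_counts(tokens), lemma))
    println("$word: $n")
end
# dico: 3
# dominus: 2
# deus: 1
```

Keeping the counting and the lemmatization separate means the parser can be swapped out without touching the aggregation step.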
This project is written in the Julia programming
language. If you would like to generate the output file, you will need to have
Julia installed.
Once Julia is installed, start it from the directory containing `main.jl`, and
then at the REPL (interactive prompt) enter `include("main.jl")`.
The Latin parsing module (`parseLatin.jl`) is very basic and only recognizes the
most common forms of words. It does a decent job but struggles with more obscure
forms. I would like to see this ad hoc parser replaced with something that can
handle any word and return its principal parts.
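The README does not describe the parser's internals beyond this. One common ad hoc approach, assumed here purely for illustration, is to strip a known inflectional ending and append a dictionary-form ending. The `RULES` table and `guess_base` function below are hypothetical and much smaller than anything in `parseLatin.jl`; the example is mainly useful for seeing why such a parser handles regular nouns but fails on neuters and irregular verbs.

```julia
# Hypothetical (ending => replacement) rules, longest endings first so that
# "orum" is tried before "um". This is NOT the table used in parseLatin.jl.
const RULES = [
    "ibus" => "is",  # 3rd-declension dative/ablative plural (approximate)
    "orum" => "us",  # 2nd-declension genitive plural
    "arum" => "a",   # 1st-declension genitive plural
    "ae"   => "a",   # 1st-declension genitive/dative singular, nominative plural
    "um"   => "us",  # 2nd-declension accusative singular (masculine)
    "i"    => "us",  # 2nd-declension genitive singular
]

# Guess a base form by replacing the first matching ending.
function guess_base(word::AbstractString)
    w = lowercase(word)
    for (ending, repl) in RULES
        # Require a stem of at least two letters so short words are left alone.
        if endswith(w, ending) && length(w) - length(ending) >= 2
            return chop(w; tail = length(ending)) * repl
        end
    end
    return w  # no rule matched: assume the word is already a base form
end

for w in ["Dominorum", "terrae", "omnibus", "verbum", "est"]
    println(w, " -> ", guess_base(w))
end
# Dominorum -> dominus
# terrae -> terra
# omnibus -> omnis
# verbum -> verbus   (wrong: verbum is a neuter noun)
# est -> est         (wrong: the base form is sum)
```

The last two lines show the failure modes described above: an endings table cannot tell genders or irregular verbs apart, which is why a parser that returns principal parts would be an improvement.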
Vulgate text file sourced from https://github.com/ojjjo/BibliaSacraVulgatae.