File Date Author Commit
 BibliaSacraVulgatae.txt 2022-08-28 ros ros [4bc4fd] First working file set. Reduces inflected frequ...
 README.md 2022-08-28 ros ros [5598f3] Optimized output file generation from 1745 sec ...
 main.jl 2022-08-30 ros ros [996e83] Fixed comment referring to old solution method
 output.txt 2022-08-29 ros ros [937ed4] Increased number of endings, differentiated ste...
 parseLatin.jl 2022-08-29 ros ros [937ed4] Increased number of endings, differentiated ste...

Read Me

Vulgate Frequency List

A common way to brush up on vocabulary is to review the most common words in a
corpus. The Vulgate is a valuable resource for this kind of study, but I found
that the only available frequency lists were

  1. published as parts of websites rather than in a convenient format like a text
    file, and
  2. published in their inflected forms.

This project aims to improve on the state of Vulgate frequency lists by reducing
the frequency list of inflected forms (46,132 entries) down to a frequency list
of base words.
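The reduction described above can be sketched roughly as follows. This is only a minimal illustration, not the code in parseLatin.jl: the ending list, the function names (inflected_counts, crude_stem, stem_counts), and the sample sentence are all hypothetical, and the real module uses its own, larger set of endings.

```julia
# Hypothetical, deliberately tiny ending list (longest endings first).
# The project's parseLatin.jl has its own endings, not reproduced here.
const ENDINGS = ["ibus", "orum", "arum",
                 "um", "us", "is", "am", "ae", "os", "as",
                 "a", "i", "o", "e"]

# Count how often each inflected form occurs in a text.
function inflected_counts(text::AbstractString)
    counts = Dict{String,Int}()
    for m in eachmatch(r"\p{L}+", lowercase(text))
        word = String(m.match)
        counts[word] = get(counts, word, 0) + 1
    end
    return counts
end

# Strip the first matching ending, keeping at least two characters of stem.
# Words with no matching ending are returned unchanged.
function crude_stem(word::AbstractString)
    for ending in ENDINGS
        if endswith(word, ending) && length(word) - length(ending) >= 2
            return String(chop(word, tail = length(ending)))
        end
    end
    return String(word)
end

# Merge the counts of all inflected forms that share a crude stem.
function stem_counts(counts::Dict{String,Int})
    merged = Dict{String,Int}()
    for (word, n) in counts
        stem = crude_stem(word)
        merged[stem] = get(merged, stem, 0) + n
    end
    return merged
end

# Made-up sample sentence, for illustration only.
sample = "Dominus dixit ad dominum et domini servus audivit verbum domini"
merged = stem_counts(inflected_counts(sample))

# Most frequent stems first; "dominus", "dominum" and "domini" collapse
# into the single stem "domin" with a combined count of 4.
top = sort(collect(merged), by = last, rev = true)
```

Stripping endings like this merges forms of the same word cheaply, but it also leaves verb forms such as "dixit" untouched and can merge unrelated words that happen to share a stem, which is the kind of limitation noted under Future Work.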

To Run

This project is written in the Julia programming language. If you would like to
generate the output file, you will need to have Julia installed.

Once Julia is installed, start it in the project directory, and then from the
REPL (interactive prompt), enter include("main.jl").

Future Work

The Latin parsing module is very basic and recognizes only the most common forms
of words. It does a decent job, but struggles with more obscure forms. I would
like to see this ad-hoc parser replaced with something that can handle any word
and return its principal parts.

Acknowledgements

Vulgate text file sourced from https://github.com/ojjjo/BibliaSacraVulgatae.
