
HebrewDeflector technical specification

Sergey A. Tachenov

The pattern recognition algorithm

Okay, here's the algorithm:

  1. Locate all possible combinations of the root letters. The rest will be the full pattern.
  2. Locate all possible combinations of prepositions, the article and endings. The rest will be the core pattern.
  3. Check the core pattern against the list of all known patterns. If no known pattern matches, assume the pattern is unknown.
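
Step 3 amounts to a table lookup. Here is a minimal Python sketch, assuming a core pattern is written as a string in which `_` stands for each root letter and the other letters are transliterated. The table contents, the labels and the name `classify` are illustrative assumptions, not part of this specification.

```python
# Hypothetical table of known core patterns ("_" marks a root letter slot).
KNOWN_PATTERNS = {
    "___": "paal",
    "n___": "nifal",
    "ht___": "hitpael",
}


def classify(core_patterns, known_patterns=KNOWN_PATTERNS):
    """Return the labels of all matching known patterns, or ["unknown"]."""
    matches = sorted(
        {known_patterns[p] for p in core_patterns if p in known_patterns}
    )
    # No candidate matched any known pattern: assume an unknown pattern.
    return matches or ["unknown"]
```

Because the earlier steps produce many candidate core patterns, the function takes a collection of candidates and reports every known pattern that any of them matches.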

Locating the root

  1. Take all possible combinations of two, three and four letters.
  2. Check each combination against all replacement and dropping rules. If any rule applies, add the resulting roots to the list of possible roots. Four-letter roots only get replacements. Three-letter roots get replacements and, possibly, restored dropped letters (in case they were originally four-letter roots). Two-letter roots only get letters added back, after which the two-letter combinations themselves are removed from the list of possible roots.
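
The two steps above can be sketched in Python as follows. This is only an illustration under several assumptions: words are transliterated into Latin letters, a pattern is written with `_` in place of each root letter, and the rule tables are passed in by the caller. The names `candidate_roots` and `expand_root` and the sample rules in the usage note are hypothetical. For brevity, only one replacement is applied at a time.

```python
from itertools import combinations


def candidate_roots(word, sizes=(2, 3, 4)):
    """Yield (root, pattern) for every ordered choice of 2, 3 or 4 letters."""
    seen = set()
    for size in sizes:
        for positions in combinations(range(len(word)), size):
            chosen = set(positions)
            root = "".join(word[i] for i in positions)
            # Letters not chosen as root letters form the full pattern.
            pattern = "".join(
                "_" if i in chosen else ch for i, ch in enumerate(word)
            )
            if (root, pattern) not in seen:
                seen.add((root, pattern))
                yield root, pattern


def expand_root(root, replacements, droppable):
    """Apply replacement and dropping rules to one letter combination.

    replacements maps a surface letter to the letters it may stand for;
    droppable lists letters that may have been dropped from the root.
    """
    results = set()
    if len(root) >= 3:
        # Three- and four-letter combinations are kept and get replacements.
        results.add(root)
        for i, ch in enumerate(root):
            for original in replacements.get(ch, ()):
                results.add(root[:i] + original + root[i + 1:])
    if len(root) <= 3:
        # Two- and three-letter combinations may have lost a letter.
        for i in range(len(root) + 1):
            for letter in droppable:
                results.add(root[:i] + letter + root[i:])
    return results
```

For example, `expand_root("bn", {}, "ny")` returns only three-letter candidates such as `"bny"`, and the two-letter combination itself is not in the result, as the last sentence of step 2 requires.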

At this point, the 11+11 rule can be applied. To make things simpler, it should be the 13+9 rule, so we don't have to deal with hitpael/nitpael/hitpual tet or dalet swapping at this point.
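
As an illustration, here is one way such a letter-class filter could look in Python. It assumes the 11+11 rule refers to the traditional division of the alphabet into 11 letters that may serve as pattern letters and 11 that occur only in roots, and that the 13+9 variant simply moves tet and dalet into the first group. That reading, the names below, and the use of `_` for a root letter slot are assumptions of this sketch, not definitions from this specification.

```python
# Assumption: the 11 letters that may act as pattern (non-root) letters.
PATTERN_LETTERS_11 = set("אבהויכלמנשת")
# Assumption: the 13+9 variant also lets tet and dalet be pattern letters.
PATTERN_LETTERS_13 = PATTERN_LETTERS_11 | set("טד")

# Map final letter forms to their regular forms before checking.
FINAL_FORMS = str.maketrans("ךםןףץ", "כמנפצ")


def pattern_is_plausible(pattern, pattern_letters=PATTERN_LETTERS_13):
    """Reject a candidate whose pattern contains a root-only letter."""
    normalized = pattern.translate(FINAL_FORMS)
    return all(ch == "_" or ch in pattern_letters for ch in normalized)
```

With the root צדק taken out of הצטדק, the remaining pattern contains a tet. The 13-letter set accepts it without any special handling of the swap, while the 11-letter set rejects it.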

It is also possible at this point to filter out the "impossible" patterns. For example, it is quite rare to have a pattern letter between root letters (except the aforementioned swapping in hitpael-ish binyans).
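
A check of this kind could be sketched in Python as below, assuming a pattern is a string in which `_` marks each root letter slot. The name `has_infix` is hypothetical, and the exception for the hitpael-like swapping is left to the caller.

```python
def has_infix(pattern, placeholder="_"):
    """Return True if any pattern letter sits between two root letters."""
    first = pattern.find(placeholder)
    last = pattern.rfind(placeholder)
    if first == -1:
        # No root letter slots at all, so nothing can be "between" them.
        return False
    return any(ch != placeholder for ch in pattern[first:last + 1])
```

Candidates for which `has_infix` returns True could be ranked lower rather than discarded outright, since the text calls such patterns rare, not impossible.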

Stripping the external morphemes

At this stage we're simply removing everything that looks like a preposition, the article or an ending. We should check all possible combinations.
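
A minimal Python sketch of this stage follows. It assumes the full pattern is a transliterated string with `_` for each root letter, and that the lists of prefixes (prepositions and the article) and endings are supplied by the caller. The name `strip_external` and the sample lists in the usage note are hypothetical.

```python
def strip_external(full_pattern, prefixes, endings):
    """Return every core pattern reachable by removing prefixes and an ending."""
    prefixes = [p for p in prefixes if p]  # ignore empty entries
    endings = [e for e in endings if e]

    def without_prefixes(text):
        # Keep the text as it is, then try peeling each matching prefix.
        yield text
        for prefix in prefixes:
            if text.startswith(prefix) and len(text) > len(prefix):
                yield from without_prefixes(text[len(prefix):])

    results = set()
    for rest in without_prefixes(full_pattern):
        results.add(rest)
        for ending in endings:
            if rest.endswith(ending) and len(rest) > len(ending):
                results.add(rest[:-len(ending)])
    return results
```

For instance, `strip_external("wh___ym", ["w", "h", "b"], ["ym", "wt"])` yields six candidates, ranging from the untouched `"wh___ym"` down to the bare `"___"`. Every one of them is kept because, at this stage, anything that merely looks like an external morpheme may still turn out to belong to the core pattern.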

