The following text, in HTML (or PDF):
A leading chocolate company recently announced that it
has succeeded in creating red chocolate. The company
_______ that no food coloring was involved. Instead, they
used a ________ type of cacao bean and a unique
manufacturing process.
was recognized as:
A leading chocolate company recently announced that it
has succeeded in creating red chocolate. The company
that no food coloring was involved. Instead, they
used a type of cacao bean and a unique
manufacturing process.
Issue 1: The first set of underscores was converted to an empty line that is not present in the original text.
Issue 2: The second set of underscores was simply deleted!
Question: is there a way to preserve the underscore character?
I tried adding _ alongside the English letters, but that did not work.
I have attached a screenshot of the input and the Capture2Text output.
Note that the two underscore sets contain 7 and 8 underscore characters in a row.
When I tried adding a _ to @ replacement, I noticed that only the second underscore set was recognized, and its 8 underscores were replaced by just two @ characters (@@).
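
To make the loss easier to quantify, here is a minimal sketch that counts runs of a character in the source text and in the recognized text. It is not part of Capture2Text: the helper name `underscore_runs` is hypothetical, and the sample strings assume the first blank has 7 underscores and the second has 8.

```python
import re

def underscore_runs(text, char="_"):
    """Return the length of each consecutive run of `char` in `text`."""
    pattern = re.escape(char) + "+"
    return [len(m.group()) for m in re.finditer(pattern, text)]

# Source text (assumed: 7 underscores in the first blank, 8 in the second).
source = ("The company " + "_" * 7 + " that no food coloring was involved. "
          "Instead, they used a " + "_" * 8 + " type of cacao bean")

# Text as recognized, with both blanks missing.
recognized = ("The company that no food coloring was involved. "
              "Instead, they used a type of cacao bean")

print(underscore_runs(source))                  # [7, 8]
print(underscore_runs(recognized))              # []
print(underscore_runs("used a @@ type", "@"))   # [2]
```

The last call mirrors the @ replacement case: a run of 8 underscores comes back as a run of only 2 characters, and the first run is missing entirely.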