Hi,
I wrote the following to someone who asked me about FarsiTeX development
privately. Here it is for anyone interested.
Forwarded message:
Hi,
Good to hear from you. I've been reading about all the FarsiTeX-related
efforts [1] on the web recently, and I'm honestly surprised by all the
energy out there and, more importantly, by the fact that none of them have
contacted the FarsiTeX maintainers or mailing lists at all. So, as I
said, good to hear.
[1] At least:
http://texfusion.blogfa.com/
http://farsilatex.blogfa.com/
http://platex.blogfa.com/
http://farsitex.blogfa.com/
https://sourceforge.net/projects/farsitek/
http://www.afzal.org.ir/irantex/
There's also Mr Vahedi's work, which I don't have a link for right now.
So, the basic question is: what's next for FarsiTeX? I've been thinking
about that too. I'll probably write about it in more detail in the
coming months, but in this mail I'll just jot down some random thoughts,
in no particular order.
- The biggest problem with FarsiTeX and the Iranian TeX Community [1]
is, as I said, that everyone is starting their own "let's fix FarsiTeX"
project, in most cases without studying what's already there. I guess
the reason this happens is that we don't have a common communication
channel. The FarsiTeX mailing lists were supposed to be that channel,
but for reasons beyond me, they didn't work. Back in the day
(~2000-2005) the persiancomputing list played the role of keeping all
the interested parties informed. We need to revive that list. With help
from Pooya Karimian I have already created such a list:
https://lists.sourceforge.net/lists/listinfo/rira-persiancomputing
It may well be that SourceForge's arcane list management is the reason
the FarsiTeX lists didn't work, but I'm not quite sure. Anyway, let's
see if I can get you guys on that list and start talking there. I don't
mind joining your Google Group either. The only thing that matters is
getting everyone there and getting them to talk about their work.
- The next problem is: which way to go? XeTeX+Arabi? ArabTeX? aleph?
Just port to LaTeX 2e? Make the input Unicode? My answer is: all of
them. Or, if you prefer: flexibility. Let me explain. The second biggest
issue in the Persian TeX ecosystem is that none of this work ever made
it into CTAN (and I feel guilty about that too). That is probably why
everyone reinvented their own system.
The rest of this message describes what I would do about FarsiTeX if I
had unlimited time and resources. I would love to see a group of people
start tackling these tasks. Before starting, make sure you have read my
BSc project report about FarsiTeX 1.0pre1. It's available from
farsitex.org's front page.
So, what are the main components that comprise a Persian TeX solution?
Let's see:
- Fonts: There are two families of solutions here: OpenType fonts and
METAPOST/Type1 fonts. For the former, FarsiWeb fonts and IranNastaliq
are pretty much all one needs. For the latter, FarsiTeX 1.0pre1 fonts
are a very decent start. With Omega, one idea was to write tools to
convert from the former to the latter. These days I consider such a
solution broken by design. Anyway.
- Output layer: This is what maps from the TeX engine's working
format to the font encoding (i.e., glyph indices). For FarsiTeX this
layer is nonexistent, since the working encoding is the same as the font
encoding. For XeTeX-based solutions this is built into the engine and
uses ICU or ATSUI. For aleph/omega-based solutions (like faanoos) this is
a set of font-specific OTP/OCP files. For TeX-e-Parsi this is, again,
built into the engine, if it exists at all.
- Input layer: How is the input processed? For FarsiTeX, it's the
external ftx2tex tool. Unicode-based solutions don't have much here.
For aleph/omega it's OTP/OCP files again. One part of the input layer
that some systems have and others don't is automatic marking of text as
Persian or non-Persian. FarsiTeX and TeX-e-Parsi do this; with the
others you have to explicitly write the macro to switch to Persian. In
my opinion, that's one of the biggest advantages of FarsiTeX.
- Arabic shaping: Choosing each letter's contextual form (isolated,
initial, medial, or final) depending on whether it joins its neighbors.
This is the main difference between Arabic and Hebrew typesetting.
Different solutions perform it in different parts of the pipeline. Some
do it as part of the input layer (FarsiTeX, TeX-e-Parsi), others in the
output layer (aleph/omega, XeTeX).
- LaTeX customizations: This means making LaTeX work right-to-left, or
bidirectionally, and speak Persian. TeX-e-Parsi and FarsiTeX 1.0pre1
are IMO the most complete solutions here, while most other solutions do
only the bare minimum needed to get something running. I don't know
much about TeX-e-Parsi, but I'm very proud of the work I did in FarsiTeX
1.0pre1 in this regard. See Chapter 2 of my FarsiTeX thesis,
particularly Section 2-5.
- Persian-specific macros: Things like the Iranian calendar or
formatting Persian poetry. In FarsiTeX 1.0pre1 I separated these from
the FarsiTeX-specific bits, so that texmf/tex/plain/misc/ircal.tex and
texmf/tex/latex/misc/poem.sty already work with any TeX/LaTeX system and
should be the first to go into CTAN. All they need is documentation. I'm
not sure how well other solutions do here, but I think that, other than
TeX-e-Parsi, they don't do much, at least not portably.
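To make the Arabic shaping and output layers above a bit more concrete,
here is a minimal Python sketch of contextual form selection followed by
a toy output-layer lookup. It is not taken from FarsiTeX, ftx2tex, or any
engine: the joining table (DUAL_JOINING, RIGHT_JOINING) covers only a
handful of letters, transparent marks and the lam-alef ligature are
ignored, and Unicode's Arabic Presentation Forms-B code points (GLYPHS)
stand in for a real font's glyph indices. All names are hypothetical.

```python
# Minimal sketch of Arabic-script contextual shaping plus a toy output layer.
# Assumption: a tiny joining table; everything not listed (Latin, digits,
# spaces, ZWNJ U+200C) is treated as non-joining.

# Dual-joining: beh, teh, seen, lam, meem, noon, heh, peh, keheh, gaf, farsi yeh
DUAL_JOINING = set("\u0628\u062A\u0633\u0644\u0645\u0646\u0647\u067E\u06A9\u06AF\u06CC")
# Right-joining: alef, dal, thal, reh, zain, waw, jeh
RIGHT_JOINING = set("\u0627\u062F\u0630\u0631\u0632\u0648\u0698")


def joins_to_next(ch):
    """True if ch can connect to the letter that follows it (logical order)."""
    return ch in DUAL_JOINING


def joins_to_prev(ch):
    """True if ch can connect to the letter that precedes it."""
    return ch in DUAL_JOINING or ch in RIGHT_JOINING


def shape(text):
    """Return (char, form) pairs; form is None for non-joining characters."""
    result = []
    for i, ch in enumerate(text):
        if not joins_to_prev(ch):
            result.append((ch, None))
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        linked_prev = joins_to_next(prev)
        linked_next = joins_to_next(ch) and joins_to_prev(nxt)
        if linked_prev and linked_next:
            form = "medial"
        elif linked_prev:
            form = "final"
        elif linked_next:
            form = "initial"
        else:
            form = "isolated"
        result.append((ch, form))
    return result


# Toy "font encoding": Presentation Forms-B code points stand in for the
# glyph indices a real font would use.
GLYPHS = {
    "\u0627": {"isolated": "\uFE8D", "final": "\uFE8E"},
    "\u0628": {"isolated": "\uFE8F", "final": "\uFE90",
               "initial": "\uFE91", "medial": "\uFE92"},
    "\u062A": {"isolated": "\uFE95", "final": "\uFE96",
               "initial": "\uFE97", "medial": "\uFE98"},
    "\u0633": {"isolated": "\uFEB1", "final": "\uFEB2",
               "initial": "\uFEB3", "medial": "\uFEB4"},
}


def output_layer(pairs):
    """Map (char, form) pairs to glyphs, falling back to the character itself."""
    return "".join(GLYPHS.get(ch, {}).get(form, ch) for ch, form in pairs)


if __name__ == "__main__":
    word = "\u0628\u0627\u0628"  # BEH ALEF BEH
    print([form for _, form in shape(word)])  # ['initial', 'final', 'isolated']
```

Note how a ZWNJ between two letters simply breaks the join, which is
what Persian text relies on. In FarsiTeX terms, shape() corresponds to
work done in the input layer, while in XeTeX both steps happen inside
the engine.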
That's it for the components. Now, what concrete tasks can one work on,
and which combinations of engines and tools can be made to work? Tasks
that need to be done anyway:
- Clean up, document, and submit the FarsiTeX 1.0pre1 fonts to CTAN.
I've called this set of fonts parsi-fonts. In 1.0pre1 my first goal was
to keep the fonts backward compatible, with a long-term plan to
rearrange the glyphs later when porting to LaTeX 2e. I don't see much
need to do that now. The only major issues I remember are 1) some
brackets/parentheses should be mirrored, and 2) the minus and dash
glyphs should be swapped. Something like that.
Bonus: Write FontForge scripts to convert the parsi-fonts fonts to
OpenType ones.
- Clean up, document, and submit poem.sty and ircal.tex to CTAN.
- Some macros in FarsiTeX and other systems are simply bidi
typesetting bug fixes to LaTeX. Submit those upstream. This is a huge
task, involving making most LaTeX packages bidi-aware, but it can start
small.
- Devise a set of macros for Persian typesetting. Work out the babel
integration and come up with a package interface and a stub
implementation. This is the interface FarsiTeX users will have to learn.
For example, a macro that sets footnotes right-aligned or left-aligned
based on the major direction of the footnote. The details will be
filled in per engine (more below).
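One plausible reading of "major direction" for the footnote example is
the first-strong-character rule (rule P2 of the Unicode Bidirectional
Algorithm, UAX #9). The Python sketch below assumes that reading;
first_strong_direction and footnote_alignment are hypothetical helpers,
not part of FarsiTeX, and the sketch does not skip isolate characters
the way P2 does.

```python
import unicodedata


def first_strong_direction(text, default="ltr"):
    """Direction of the first strongly directional character (cf. UAX #9, P2).

    Simplification: isolate initiators and their contents are not skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # right-to-left, including Arabic letters
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return default  # no strong character: fall back to the document default


def footnote_alignment(footnote_text, default="ltr"):
    """Hypothetical policy: right-align RTL footnotes, left-align LTR ones."""
    direction = first_strong_direction(footnote_text, default)
    return "right" if direction == "rtl" else "left"


if __name__ == "__main__":
    print(footnote_alignment("See Knuth, The TeXbook."))  # left
    print(footnote_alignment("2. \u0628\u0627\u0628"))    # right
```

Digits and punctuation are weak or neutral, so a footnote starting with
"2. " still takes its direction from the first Persian letter. Counting
strong characters instead would be another reasonable policy.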
Before I continue, let me introduce faanoos. It's a little Omega-based
system I wrote between 2002 and 2004. It was pretty advanced compared
to the other Omega-based solutions I found. Its main feature was that
it was transcription-based: you could input UTF-8 Arabic/Persian text,
but you could also write in faanoos's special "faargilisi" dialect.
Anyway, check it out:
http://behdad.org/download/faanoos-2008-01-27.tar.gz
Make sure you look at faanoos.txt, fandoc.ps, and the rest of the .ps
files and their source .tex files. Faanoos was an experiment, and I
think it shows how fantastic and powerful Omega's OTP mechanism was
(and is), going well beyond simple character set conversion. Where
Omega's OTP mechanism failed, though, was the output layer. As one can
see in faanoos, I had to patch Omega's Arabic output layer in many
places, simply because it was incomplete or buggy. That kind of
information really belongs in fonts, not engines.
Per-engine tasks:
- Update the ftx2tex tool to match the other changes, particularly the
new FarsiTeX macros for switching languages, etc. This keeps the
current FarsiTeX toolchain (ftexed -> ftx2tex -> 8bit eTeX) working.
Let's call the format of the resulting .tex file the legacy FarsiTeX
.tex format.
- Write a uftx2tex tool that converts from Unicode to the legacy
FarsiTeX .tex format. This tool will do Arabic shaping like ftx2tex
does, and will also insert bidi switching directives automatically.
- Extend XeTeX to be able to load and process omega/aleph OTP files.
- Write an OTP input layer that accepts the legacy FarsiTeX .tex format
and converts it to Unicode. This can be used by aleph/omega, or by XeTeX
via the extension above.
- Rip out the output layer of faanoos, and update the input layer.
- Write NFSS font descriptions for parsi-fonts and other Persian fonts.
This is mostly done in farsitex-1.0pre1.
- Write an OTP output layer for parsi-fonts, to be used by aleph/omega
or XeTeX. The OpenType fonts should be usable with XeTeX without any
additional output layer.
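The automatic insertion of bidi switching directives that uftx2tex
would do can be sketched as splitting the input into directional runs.
The Python sketch below is an illustration under stated assumptions,
not uftx2tex itself: it classifies characters with
unicodedata.bidirectional, lets characters without a strong direction
(spaces, digits, punctuation) join the preceding run, and wraps
right-to-left runs in e-TeX's TeX--XeT primitives \beginR/\endR (which
need \TeXXeTstate=1) as stand-ins for whatever switching macros the
legacy FarsiTeX format actually uses. Shaping and re-encoding are left
out, and split_runs and mark_bidi are hypothetical names.

```python
import unicodedata


def strong_direction(ch):
    """'rtl' or 'ltr' for strongly directional characters, None otherwise."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in ("R", "AL"):
        return "rtl"
    if bidi_class == "L":
        return "ltr"
    return None


def split_runs(text, base="ltr"):
    """Split text into (direction, chunk) runs.

    Simplification: characters without a strong direction join the run
    before them; at the very start they belong to the base direction.
    """
    runs = []
    for ch in text:
        direction = strong_direction(ch)
        if runs and (direction is None or direction == runs[-1][0]):
            runs[-1][1].append(ch)
        else:
            runs.append((direction or base, [ch]))
    return [(direction, "".join(chars)) for direction, chars in runs]


def mark_bidi(text):
    """Wrap right-to-left runs in e-TeX's \\beginR ... \\endR primitives."""
    out = []
    for direction, chunk in split_runs(text):
        if direction == "rtl":
            # TeX skips the space after \beginR; "{}" keeps \endR from
            # merging with a following letter into a longer control word.
            out.append("\\beginR " + chunk + "\\endR{}")
        else:
            out.append(chunk)
    return "".join(out)


if __name__ == "__main__":
    print(mark_bidi("TeX \u0648 LaTeX"))
```

A real tool would also need to decide where neutrals between runs of
different directions belong (UAX #9 resolves them by context), which is
exactly the kind of policy worth settling on the mailing list.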
That's the big picture. The idea is to be able to run FarsiTeX in the
following modes:
- .ftx -> ftx2tex -> 8bit eTeX (can be pdfeTeX) -> parsi-fonts
- UTF-8 -> uftx2tex -> 8bit eTeX -> parsi-fonts
- UTF-8 or faanoos or .ftx -> aleph -> parsi-fonts or OpenType fonts
- UTF-8 or faanoos or .ftx -> XeTeX -> parsi-fonts or OpenType fonts
I'll finish this mail now. I hope this gives you some things to think
about.
behdad
[1]
http://behdad.org/download/Presentations/farsitex-slides/ftexslides.pdf