scriptworx Code
Status: Beta
Brought to you by: scriptworx
| File | Date | Author | Commit |
|---|---|---|---|
| rtf2pdf | 2008-11-17 | scriptworx | [r1] Initial Check in |
| LICENSE | 2008-11-17 | scriptworx | [r1] Initial Check in |
| README | 2008-11-17 | scriptworx | [r1] Initial Check in |
| rtf2pdf.tar.gz | 2008-11-17 | scriptworx | [r1] Initial Check in |
| test.rtf | 2008-11-17 | scriptworx | [r1] Initial Check in |
* General Information
======================
This RTF2PDF converter was built by Scriptworx as an experiment and is now
released as open source under the Creative Commons Attribution 3.0
license.
* CGI Script
======================
This section describes the rtf2pdf.cgi CGI script.
The script runs on a CGI-enabled web server. Its dependencies are a Perl
interpreter, pdflatex with the TeX libraries, and the rtf2latex program.
After installation, the following lines of the script may need to be modified:
- 12. Supply a location where the www user (or whichever user the web server
runs as) can create files and directories.
- 125. Supply the path to the default.cls style file, relative to the location
from the previous item.
- 142. Set the path of rtflatex to "rtf2latex" (if the executable is in the
PATH) or to the correct absolute path.
- 150-152. The same as above. Also make sure the path to which pdflatex
redirects its output is writable if you want to read the output of the command.
* Dependencies
=======================
- Perl 5.8 or 5.10 with CPAN-modules CGI, CGI::Carp, Cwd and Digest::MD5
- pdflatex
- g++
- A webserver that supports CGI
* Parser and docstruct
========================
This section describes the files parser.cpp and docstruct. parser.cpp is a C++
source file that parses RTF and outputs LaTeX. The RTF file is parsed using the
strategy described in
http://latex2rtf.sourceforge.net/rtfspec_45.html.
Details on the parsing and dispatching of control words can be found in the
source code comments.
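As an illustration of what reading a control word involves, here is a minimal
sketch. It is not taken from parser.cpp: the names ControlWord and
read_control_word are hypothetical, and it only assumes the control word syntax
of the RTF specification (a backslash, letters, an optional signed numeric
parameter, and an optional single space as delimiter).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result of reading one control word such as "\fs24".
struct ControlWord {
    std::string name;  // letters after the backslash, e.g. "fs"
    bool has_param;    // true if a numeric parameter followed the name
    int param;         // parameter value, 0 if absent
};

// Reads the control word starting at input[pos], which must be a backslash.
// Advances pos past the word, its parameter, and a delimiting space.
ControlWord read_control_word(const std::string& input, std::size_t& pos) {
    assert(pos < input.size() && input[pos] == '\\');
    ++pos;  // skip the backslash
    ControlWord cw{"", false, 0};
    while (pos < input.size() &&
           std::isalpha(static_cast<unsigned char>(input[pos]))) {
        cw.name += input[pos++];
    }
    // Optional parameter: an optional '-' followed by one or more digits.
    bool negative = false;
    if (pos + 1 < input.size() && input[pos] == '-' &&
        std::isdigit(static_cast<unsigned char>(input[pos + 1]))) {
        negative = true;
        ++pos;
    }
    while (pos < input.size() &&
           std::isdigit(static_cast<unsigned char>(input[pos]))) {
        cw.has_param = true;
        cw.param = cw.param * 10 + (input[pos++] - '0');
    }
    if (negative) cw.param = -cw.param;
    // A single space after the control word is a delimiter, not text.
    if (pos < input.size() && input[pos] == ' ') ++pos;
    return cw;
}
```

The caller would then look up cw.name in a table of known control words and
dispatch on it; unknown words can simply be skipped.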
The parser fills an abstract document representation with the following
structure:
Element
- Environment
- Decoration: bold, italics, underlined
- List: ordered or unordered
- Paragraph: elements followed by a blank line
- Heading: heading of a specific level
- Text: plain text
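One way such a structure could be expressed in C++ is sketched below. This is
not the code from docstruct: the member names and the latex() method are
assumptions, and the sketch assumes that Decoration and Paragraph are kinds of
Environment while Text is a leaf. List and Heading are left out for brevity.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class: every node can render itself as LaTeX.
struct Element {
    virtual ~Element() = default;
    virtual std::string latex() const = 0;
};

// Text: plain text, a leaf of the tree.
struct Text : Element {
    std::string value;
    explicit Text(std::string v) : value(std::move(v)) {}
    std::string latex() const override { return value; }
};

// Environment: an element that contains other elements.
struct Environment : Element {
    std::vector<std::unique_ptr<Element>> children;
    void add(std::unique_ptr<Element> child) {
        assert(child != nullptr);
        children.push_back(std::move(child));
    }
    // Concatenates the LaTeX output of all children.
    std::string body() const {
        std::string out;
        for (const auto& c : children) out += c->latex();
        return out;
    }
    std::string latex() const override { return body(); }
};

// Decoration: bold, italics, or underlined.
struct Decoration : Environment {
    enum Kind { Bold, Italic, Underline } kind;
    explicit Decoration(Kind k) : kind(k) {}
    std::string latex() const override {
        const char* cmd = kind == Bold     ? "\\textbf{"
                          : kind == Italic ? "\\textit{"
                                           : "\\underline{";
        return cmd + body() + "}";
    }
};

// Paragraph: elements followed by a blank line.
struct Paragraph : Environment {
    std::string latex() const override { return body() + "\n\n"; }
};
```

With this layout, a paragraph holding the text "Say " and a bold decoration
around "hi" renders as "Say \textbf{hi}" followed by a blank line.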
The parser keeps the following state, which is modified as RTF control words
are dispatched:
    struct State {
        bool italic;             // italic text
        bool bold;               // bold text
        bool underline;          // underlined text
        char fontsize;           // font size
        int list;                // identifier for the current list
        int level;               // the current list level
        Destination destination; // the current destination
    } default_state, current_state;

    struct Content {
        Paragraph* paragraph;
        Destination destination;
        int list;
        int level;
    } content;
The current_state variable maintains the information necessary to process
upcoming content. The content variable maintains information about previous
content that is still being scanned.
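The sketch below illustrates how dispatching control words could modify such a
state. It is an assumption-laden example, not the code in parser.cpp: the
dispatch function, the Parser struct, and the two-value Destination enum are
hypothetical, and the save/restore of the state on "{" and "}" follows the
group handling of the RTF specification rather than anything stated in this
README.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the real Destination type.
enum Destination { DEST_BODY, DEST_SKIP };

// Same fields as the State struct above.
struct State {
    bool italic;
    bool bold;
    bool underline;
    char fontsize;
    int list;
    int level;
    Destination destination;
};

// Applies one control word to the state. In RTF a toggle such as \b is
// switched off by a parameter of 0 (\b0) and switched on otherwise.
void dispatch(State& s, const std::string& word, bool has_param, int param) {
    const bool on = !has_param || param != 0;
    if (word == "b") {
        s.bold = on;
    } else if (word == "i") {
        s.italic = on;
    } else if (word == "ul") {
        s.underline = on;
    } else if (word == "ulnone") {
        s.underline = false;
    } else if (word == "fs" && has_param) {
        s.fontsize = static_cast<char>(param);  // \fsN is in half-points
    } else if (word == "plain") {
        s.bold = s.italic = s.underline = false;  // reset character formatting
    }
    // Unknown control words are ignored in this sketch.
}

// Hypothetical group handling: "{" saves the state, "}" restores it.
struct Parser {
    State current_state;
    std::vector<State> saved;
    void open_group() { saved.push_back(current_state); }
    void close_group() {
        assert(!saved.empty());
        current_state = saved.back();
        saved.pop_back();
    }
};
```

Because the state is restored when a group closes, formatting set inside
"{\i ...}" does not leak into the text that follows the group.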
Lists are parsed by filling an array of 8 lists, one for each of the 8 levels
to which lists can be nested in RTF. Only after the highest level of the list is
completely parsed can the list be added to the abstract document