File            Date        Author      Commit
rtf2pdf         2008-11-17  scriptworx  [r1] Initial Check in
LICENSE         2008-11-17  scriptworx  [r1] Initial Check in
README          2008-11-17  scriptworx  [r1] Initial Check in
rtf2pdf.tar.gz  2008-11-17  scriptworx  [r1] Initial Check in
test.rtf        2008-11-17  scriptworx  [r1] Initial Check in

Read Me

* General Information
======================

This RTF2PDF converter was built by Scriptworx as an experiment and is now
released as open source under the Creative Commons Attribution 3.0 license.

* CGI Script
======================

This section describes the rtf2pdf.cgi CGI script.

The script runs on a CGI-enabled web server. It depends on a Perl interpreter,
pdflatex with its TeX libraries, and the rtf2latex program.

After installation the script may need to be modified on the following
lines:

 - 12. Supply a location where the www user (or whatever other user the web
   server runs as) can create files and directories.

 - 125. Supply the path to the default.cls style file relative to the location of
   the previous item.

 - 142. Edit the path of rtflatex to "rtf2latex" (if the executable is in the
   PATH), or to the correct absolute path.

 - 150-152. The same as above. Also make sure the path where pdflatex redirects
   its output to is writable if you want to read the output from the command.

* Dependencies
=======================

- Perl 5.8 or 5.10 with CPAN-modules CGI, CGI::Carp, Cwd and Digest::MD5
- pdflatex
- g++
- A webserver that supports CGI

* Parser and docstruct
========================

This section describes the files parser.cpp and docstruct. parser.cpp is a C++
source code file that parses RTF and outputs LaTeX. Parsing of the RTF-file is
done using a strategy described in

http://latex2rtf.sourceforge.net/rtfspec_45.html. 

Details on the parsing and dispatching of control words can be found in the 
source code comments.

The parser fills an abstract document representation data structure that has the
following structure:

Element
- Environment
  - Decoration: bold, italics, underlined
  - List: (un)ordered
  - Paragraph: elements followed by a blank line
- Heading: heading of a specific level
- Text: plain text
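
As an illustration of the hierarchy above, here is a minimal C++ sketch of how
such a document tree could be modelled and rendered to LaTeX. It is not taken
from docstruct: the to_latex(), add() and sample_latex() names, the member
layout and the LaTeX commands chosen for each element are all assumptions made
for this example.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// All names below are illustrative; the real classes live in docstruct.

// Base class for every node of the abstract document.
struct Element {
    virtual ~Element() = default;
    virtual std::string to_latex() const = 0;
};

// Text: plain text.
struct Text : Element {
    std::string value;
    explicit Text(std::string v) : value(std::move(v)) {}
    std::string to_latex() const override { return value; }
};

// Heading: heading of a specific level (1 = section, 2 = subsection, ...).
struct Heading : Element {
    int level;
    std::string title;
    Heading(int l, std::string t) : level(l), title(std::move(t)) {}
    std::string to_latex() const override {
        std::string cmd = "\\";
        // Levels deeper than 3 are clamped to \subsubsection.
        for (int i = 1; i < level && i < 3; ++i) cmd += "sub";
        return cmd + "section{" + title + "}\n";
    }
};

// Environment: an element that contains other elements.
struct Environment : Element {
    std::vector<std::unique_ptr<Element>> children;
    void add(std::unique_ptr<Element> e) { children.push_back(std::move(e)); }
    std::string children_latex() const {
        std::string out;
        for (const auto& c : children) out += c->to_latex();
        return out;
    }
};

// Decoration: bold, italic or underlined content.
struct Decoration : Environment {
    enum Kind { Bold, Italic, Underlined } kind;
    explicit Decoration(Kind k) : kind(k) {}
    std::string to_latex() const override {
        const char* cmd = kind == Bold     ? "\\textbf{"
                          : kind == Italic ? "\\textit{"
                                           : "\\underline{";
        return cmd + children_latex() + "}";
    }
};

// List: ordered or unordered; every child is rendered as one item.
struct List : Environment {
    bool ordered;
    explicit List(bool o) : ordered(o) {}
    std::string to_latex() const override {
        const std::string env = ordered ? "enumerate" : "itemize";
        std::string out = "\\begin{" + env + "}\n";
        for (const auto& c : children) out += "\\item " + c->to_latex() + "\n";
        return out + "\\end{" + env + "}\n";
    }
};

// Paragraph: elements followed by a blank line.
struct Paragraph : Environment {
    std::string to_latex() const override { return children_latex() + "\n\n"; }
};

// Builds the paragraph "Hello <bold>world</bold>" and renders it.
std::string sample_latex() {
    Paragraph p;
    p.add(std::make_unique<Text>("Hello "));
    auto bold = std::make_unique<Decoration>(Decoration::Bold);
    bold->add(std::make_unique<Text>("world"));
    p.add(std::move(bold));
    return p.to_latex();
}
```

In this sketch every Environment owns its children, so nested decorations and
list items are rendered by a single recursive call on the root element.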

The parser keeps the following state, which is modified as RTF control words
are dispatched:

struct State {
        bool italic; // italic text
        bool bold; // bold text
        bool underline; // underlined text
        char fontsize; // font size
        int list; // identifier for the current list
        int level; // the current list level
        Destination destination; // the current destination
} default_state, current_state;

struct Content {
        Paragraph* paragraph;
        Destination destination;
        int list;
        int level;
} content;

The current_state variable maintains the information necessary to process
upcoming content. The content variable maintains information about previous
content that is still being scanned.
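
To make the role of current_state more concrete, the following C++ sketch shows
one way control words could update such a state. It is not the code from
parser.cpp: the dispatch() function, the StateStack helper and the handling of
"{" and "}" are assumptions for illustration, and the Destination member is left
out so the example compiles on its own. The control words used (\b, \i, \ul,
\ulnone, \fs, \plain) are standard RTF.

```cpp
#include <stack>
#include <string>

// Simplified copy of the parser state; Destination is omitted here.
struct State {
    bool italic = false;
    bool bold = false;
    bool underline = false;
    char fontsize = 24;  // \fsN is given in half-points; RTF defaults to 24
    int list = 0;
    int level = 0;
};

// Hypothetical dispatcher: applies one control word to the current state.
// `param` is the numeric argument, or -1 when the control word has none.
// Returns false for control words this sketch does not handle.
bool dispatch(State& current, const State& defaults,
              const std::string& word, int param) {
    const bool on = (param != 0);  // "\b" switches on, "\b0" switches off
    if (word == "b") {
        current.bold = on;
    } else if (word == "i") {
        current.italic = on;
    } else if (word == "ul") {
        current.underline = on;
    } else if (word == "ulnone") {
        current.underline = false;
    } else if (word == "fs") {
        // Ignore missing or out-of-range sizes so the char cannot overflow.
        if (param > 0 && param <= 127) current.fontsize = static_cast<char>(param);
    } else if (word == "plain") {
        // \plain resets character formatting only; list state is kept.
        current.italic = defaults.italic;
        current.bold = defaults.bold;
        current.underline = defaults.underline;
        current.fontsize = defaults.fontsize;
    } else {
        return false;
    }
    return true;
}

// Assumption: "{" saves the state and "}" restores it, because RTF
// formatting changes are scoped to the enclosing group.
struct StateStack {
    State current;
    std::stack<State> saved;

    void open_group() { saved.push(current); }

    // Returns false on an unbalanced "}".
    bool close_group() {
        if (saved.empty()) return false;
        current = saved.top();
        saved.pop();
        return true;
    }
};
```

With this sketch, parsing "{\b\fs20 text}" would call open_group(), dispatch
"b" and "fs" with parameter 20, emit the text using the modified state, and
then call close_group() to return to the formatting that was active before the
group.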

The parsing of lists is done by filling an array of 8 lists, one for each of the
8 levels to which lists can be nested in RTF. Only after the highest level of the list is
completely parsed can the list be added to the abstract document