Thread: [perldoc2-developers] [RFC] Platform Specification Vs. 0.1

Hi everybody,

yes - it took a little longer than expected - but here it is:

*ta-da!*

The first version of the specification for the platform is ready for 
your perusal.

Please comment, add, edit, ask ... after all these are just a few 
thoughts out of my own twisted brain.

Imagine what six twisted brains could make it!

Hope you like it.

Joergen

P.S.: I will also upload it to the blog tonight.

below is line number one :o)
########################################################################
perldoc 2.0 Platform specification
########################################################################

Version:    0.1
Date:       November 14-16, 2006
Author:     Jørgen W. Lang
Email:      jw...@wo...

########################################################################
NOTE
########################################################################

Since this is the first version of the specification there probably are 
a lot of things that should be added, changed, improved, clarified, etc.

When referring to a particular part of this document, please try to also 
quote the line number. This way everybody should be able to easily 
identify the part you are talking about.

Thanks!

########################################################################
ABSTRACT
########################################################################

This document describes the components of the perldoc 2.0 translation 
platform and repository and the workflow between its parts.

########################################################################
GENERAL CONCEPTS
########################################################################

########################################################################
- Goals
########################################################################

The final goal is to provide complete translations of

- the core documentation(1)
- the documentation of the core modules(2)

Once finished the translations could be made available via 
language-specific subdomains of perldoc.perl.org, like 
'fr.perldoc.perl.org' and as part of the actual perl distribution e.g. 
in the ./pod/ directory.

In the meantime translated documents could and should be available via 
the platform website. This way they can be used and reviewed as soon as 
single documents are finished.

The main focus of the platform is the translation of the documentation 
for the programming language Perl into other natural languages. Since 
Perl6 is already in the making, the platform should be ready to handle 
its documentation as well.

Although it is neither the primary goal nor a prerequisite, the platform 
might support the translation of documentation for other 
projects/programming languages in the future.

(1) and (2) are explained under "Terms"

########################################################################
- Audience
########################################################################

- translators (translating the docs)
- end-users   (looking up documentation in their languages)
- developers  (who can help improve the platform)

The term 'end-users' is used to differentiate between developers, 
(P|p)erl hackers, etc. reading the documentation and those actively 
involved in the creation, maintenance and improvement of the platform 
itself (although these will sometimes be the same people).

########################################################################
- Multilingual
########################################################################

Since this is a translation platform, the contents and interfaces of the 
website should be available in as many natural languages as possible.

########################################################################
- Framework
########################################################################

We should not reinvent the wheel. There already is a good choice of web 
application frameworks out there. Choosing one written in Perl might 
help us - and Perl at the same time. ;o)

########################################################################
- Adoption
########################################################################

The 'adoption' method could be an essential part of the project. By 
assigning a complete document to one physical person, this person is 
encouraged to make the translation a personal effort instead of feeling 
like an anonymous gear within the big translation machine.

This method does not exclude the possibility for splitting up one 
document between multiple individuals. Maybe the platform should have 
support for this. Using po4a might help a lot.

########################################################################
- Quality
########################################################################

To ensure the best possible quality of translations a certain set of 
guidelines should be followed. A common glossary of terms should be used 
for each language.

########################################################################
COMPONENTS
########################################################################

The following key components are needed:

########################################################################
- repository
########################################################################

The repository is the storage area for documents to be translated. This 
could be a SVN repository, a database, a directory structure or whatever.

For the moment we use an SVN repository on sourceforge to collect and 
store the documents to be translated and the already existing 
translations. This might change during the development of the project 
as it might be more practical to store the documents within the database.

########################################################################
- database
########################################################################

The database is used to store information about the documents to be 
translated like the perl version they are based on, their translation 
status, timestamps, and other meta information.

The database will also keep track of 'available' languages. This could 
mean a general table of languages that are spoken on the planet today.

(It could also be used to store the documents themselves.)

########################################################################
- interface(s)
########################################################################

The platform might have several different interfaces: one for 
translators, one for end-users and one for administrators.

The primary interface is web-based. Access to documents and information 
about them will be done via a website.

The interface(s) provides the following key features:

- create and manage user accounts
- login/logout
- overview of documents available for translation
- translation status of these documents
   (see 'document specific status' for details)
- check-out of 'vacant' documents
- check-in of translated documents
   (marks document as 'pending')
- show various forms of statistics about translation status,
   available documents

- check-out for review of translated documents
- checking in after review to mark documents as 'finished'

- submit errata
- give general feedback

- maintain database/repository

The platform could also provide secondary interfaces in the form of web 
services, maybe for the communication with an editor program installed 
on the local machine of a translator.

########################################################################
- people
########################################################################

Although most of the processing is probably done more or less 
automatically, there are certain parts of the workflow that involve 
review and steering by 'real people'.

The obvious is the actual translation itself. Additionally some 
editorial staff might be needed to review and correct the translation.

Sometimes decisions have to be made whether a certain word should be 
translated one way or the other or not at all. This might need one or 
more 'referees' of some kind.

It also takes real people to review feedback and submitted errata.

To support the interaction between people, forums and other means of 
communication should be available. After all, the whole platform is 
powered by mutual help.

########################################################################
Design
########################################################################

Usage of the platform should be

- simple
- fun
- cool

########################################################################
Additional features
########################################################################

Tools/Helpers/Guidelines

- lists of available translation tools
   - like editors with .po mode
   - download links
   - scripts and programs to ease translator's lives

RSS

- RSS feeds for
-- statistics
-- news
-- ?

Resources

- glossaries
- dictionaries

Mutual help

- Forums
- IRC-Channel?
- Mailing lists

Multilevel adoptions

- one key person is the adopter for one document
- the document could then be shared among multiple
   translators who take care of various parts of the
   doc.

Multiple formats

- RSS
- PDF
- XHTML
- POD
- ...

Download the whole documentation for one language as one 'book'.

Sponsorships

- A possibility for individuals or companies to sponsor the translation
   of one or more particular documents. These documents could carry a
   special marker that identifies the sponsor to the reader.

########################################################################
Workflow
########################################################################

 From the translator's point of view:

- (register/create account)
- check for available ('vacant') documents/languages
- login
- pick a document to adopt
- check-out the document (marks the document as 'adopted')
- translate the document
- check-in the document (marks the document as 'updated')
- get the document reviewed

 From a reviewer's point of view:

- (register/create account)
- check for 'updated' documents
- login
- pick a document to review
- check-out the document
- review the document (remove 'fuzzy' markers)
- check-in the document (no fuzzy markers mark it as 'finished')

 From a user's point of view:

- check for documents/languages
- read the document in maybe one of several available formats
- leave feedback/errata

########################################################################
DETAILS
########################################################################

########################################################################
- Database
########################################################################

The database stores the following information:

- translation projects and their status
   (as we expect at least one more project in the future)

- registered translators
   (and some details about them)

For a project:

- list of 'supported' documents
   (plus the necessary details)

- maybe the documents themselves
- which languages the document was or is being translated to

For a document:

- status
- meta information

What else?

Furthermore, this or another database will very likely contain 
everything that's needed to run the web application itself.
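
To make the lists above a bit more tangible, here is a minimal sketch 
of how projects, translators and documents could be laid out. It uses 
Python's bundled sqlite3 module purely for illustration. All table and 
column names are assumptions and not part of this specification, and 
the choice of database engine is still open.

```python
import sqlite3

# Hypothetical schema: one row per project, translator and
# (document, language) pair. None of these names are fixed by the spec.
SCHEMA = """
CREATE TABLE project    (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE translator (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE document (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id),
    name       TEXT NOT NULL,
    language   TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'vacant',
    adopter_id INTEGER REFERENCES translator(id),
    UNIQUE (project_id, name, language)
);
"""


def create_database(path=":memory:"):
    """Create the sketch schema and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

A freshly inserted document starts out as 'vacant' with no adopter, 
which matches the first step of the translator workflow.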

########################################################################
- Status of project
########################################################################

Meta information like maintainer, contained subprojects, statistical 
information, etc.

For the translation of the Perl5 documentation the two subprojects are 
the translation of the core documents and the translation of the core 
modules.

The translation of the core documents could be further split by 
'importance' of translation.

########################################################################
- Status of document
########################################################################

A document, stored in an SVN repository, a database or wherever, has a 
certain status attached to it. Depending on its state of translation 
this could be one of the following:

vacant|adopted|updated|pending|finished|abandoned

vacant:       the document has not yet been assigned to a translator
adopted:      the document has been assigned to a translator
updated:      a partially translated document
pending:      the initial translation of the document has been finished
               but the document has not yet been reviewed.
finished:     the document has been translated and reviewed
abandoned:    documents that have been adopted but haven't been worked
               upon for a given time will be marked as 'abandoned' before
               they will be put back into 'vacant' mode. This mode is
               to give the translator time to change the document's
               status to 'updated'.

               The translator will have to be informed of this. If she
               does not update the document within a given time the
               status will change back to 'vacant' automatically.
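
The status values above could be modelled as a small state machine. 
The following is only a sketch: the status names come from this 
section, but the table of allowed transitions is an assumption read 
out of the descriptions and the workflow, not something this 
specification fixes.

```python
from enum import Enum


class Status(Enum):
    VACANT = "vacant"
    ADOPTED = "adopted"
    UPDATED = "updated"
    PENDING = "pending"
    FINISHED = "finished"
    ABANDONED = "abandoned"


# Assumed transition table (hypothetical, open for discussion).
ALLOWED = {
    Status.VACANT: {Status.ADOPTED},
    Status.ADOPTED: {Status.UPDATED, Status.PENDING, Status.ABANDONED},
    Status.UPDATED: {Status.UPDATED, Status.PENDING, Status.ABANDONED},
    Status.PENDING: {Status.FINISHED, Status.UPDATED},
    Status.FINISHED: set(),
    Status.ABANDONED: {Status.UPDATED, Status.VACANT},
}


def change_status(current, new):
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new
```

Keeping the transitions in one table makes it easy to argue about a 
single rule (for example whether 'finished' documents can be reopened) 
without touching any other code.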

########################################################################
- Meta information
########################################################################

For a document certain meta information will be stored:

- project this document is part of
- (Perl) version the document is based upon.
- time of adoption
- time of update
- name of adopter
- document is already translated partially
   (implied by 'updated' flag)
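
As a rough illustration, the meta information listed above could be 
held in a record like the following. The class and field names are 
made up for this sketch. Only the list of items comes from this 
section.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DocumentMeta:
    project: str                           # project this document is part of
    based_on_version: str                  # (Perl) version it is based upon
    status: str = "vacant"
    adopter: Optional[str] = None          # name of adopter
    adopted_at: Optional[datetime] = None  # time of adoption
    updated_at: Optional[datetime] = None  # time of update

    @property
    def partially_translated(self):
        # Not stored separately: implied by the 'updated' status.
        return self.status == "updated"
```

Deriving 'partially translated' from the status avoids keeping two 
fields that could contradict each other.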

########################################################################
- User registration
########################################################################

For reasons of security and consistency it is probably necessary for 
users to register with the project as a translator/reviewer/... (aha - 
we already have different roles!)

########################################################################
- Check-out
########################################################################

A registered user will be able to assign one (or more?) documents to 
himself. This person will be the "adopter" for this document for a 
certain amount of time (the "time to live", "TTL", see "Abandoned 
documents").

########################################################################
- Review
########################################################################

Probably the only way to achieve good quality and correctness of the 
translations (orthography, spelling, language and content) is mutual 
help, unless Mark Shuttleworth wants to sponsor this project.

A concept of peer review similar to that of wikipedia could be used. 
Other users are encouraged to review and to correct. Maybe this could 
follow the "buddy principle" as practiced with divers.

To mark a document as 'finished' it has to be reviewed by at least one 
person other than the translator. Maybe the review should involve 
marking the individual parts of the translation as 'reviewed' (maybe 
using the 'fuzzy' flag?)

To review a document the 'buddy' needs to check out the document and 
actually read it.  By re-submitting it the document will be marked as 
'finished'.

Using the 'fuzzy' marker avoids having to review the whole document 
which can be a great help, especially with lengthy and complex 
documents. This implies that the fuzzy marker is set by default.
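
The review rule could be sketched as below. This assumes a document 
is split into segments that each carry a 'fuzzy' flag, in the spirit 
of .po files. The names Segment and status_after_review are 
hypothetical, and returning 'pending' while fuzzy markers remain is an 
assumption as well.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    fuzzy: bool = True  # set by default, so every part starts unreviewed


def status_after_review(segments, translator, reviewer):
    """Return the document status after a reviewer checks it back in."""
    if reviewer == translator:
        raise ValueError("the reviewer must not be the translator")
    # Any remaining fuzzy marker means the review is not complete yet.
    if any(segment.fuzzy for segment in segments):
        return "pending"
    return "finished"
```

A reviewer would clear the flag on each part she has read, so a long 
document can be reviewed in several sittings.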

########################################################################
- Abandoned documents
########################################################################

Sometimes people adopt a document but do not have the 
time/motivation/resources to update it. To ensure that these documents 
do not become "zombies" they will have a certain 'time to live' based on 
their length and (maybe) complexity (perlopentut is more complex than 
perl588delta, etc.)

If an adopted document exceeds its time to live (TTL) the following 
could happen:

- the adopter will be informed that the document has not been updated
   within the given TTL. She will then be given a certain time to react.

- In this first stage it should not be necessary to submit actual
   changes to the document but to merely 'touch' the document. This is to
   confirm that the translator is still willing to work on the document.
   This will re-initialize the time to live. Maybe with a flag that
   indicates that this document has one 'reminder' to it.

- The second stage might require an actual update of the document. TTL
   keeps running, does not get reset.

- If the TTL has been exceeded with no reaction on the translator's
   side the document will be marked as 'vacant' again.

- If a partially translated document was abandoned this needs to be
   marked in meta information
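
One possible reading of the stages above, as a sketch only. The grace 
period, the single-reminder limit and all names are assumptions. How 
the TTL itself is derived from length and complexity is left open 
here, as it is in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Adoption:
    deadline: datetime                     # end of the current time to live
    reminders: int = 0                     # reminders sent so far
    notified_at: Optional[datetime] = None


def check(adoption, now, grace):
    """Return the action the platform would take at time `now`."""
    if now <= adoption.deadline:
        return "ok"
    if adoption.notified_at is None:
        adoption.notified_at = now         # inform the adopter once per stage
        adoption.reminders += 1
        return "notify"
    if now - adoption.notified_at > grace:
        return "vacant"                    # no reaction within the grace time
    return "waiting"


def touch(adoption, now, ttl):
    """First stage only: a bare 'touch' re-initializes the time to live."""
    if adoption.reminders > 1:
        return False                       # second stage: real update needed
    adoption.deadline = now + ttl
    adoption.notified_at = None
    return True
```

The reminder counter plays the role of the flag mentioned above: it 
records that a document has already used up its one 'touch'.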

########################################################################
- Check-in
########################################################################

The check-in or 'submit' of translated documents could be achieved in 
one of the following ways:

- Upload via HTML-Form
- email (to a special address handling the integration of the document
   into the database).

Using the database approach would enable us to allow check-ins of 
partially translated documents.

########################################################################
- Terms
########################################################################

This document uses the following terms as follows:

- adoption

The process of assigning a document to a particular person.

- project

The translation of the perl documentation (perldoc 2.0) is a project. 
The translation of the documentation for Catalyst is another.

- subproject

The translation of the core documentation is a subproject of perldoc 
2.0. The translation of the documentation for the core modules is another.

- core documentation (of perl)

A typical installation of perl from source creates the directory 
'perl-[version_number]'. Contained in this is a directory named 'pod'. 
All documents contained in this directory and ending in '.pod' are part 
of the core documentation.
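
Taken literally, this definition can be turned into a short helper 
that lists the core documentation of an unpacked perl source tree. The 
function name is made up for this sketch, and the path passed in is 
whatever 'perl-[version_number]' directory exists on the local 
machine.

```python
from pathlib import Path


def core_documents(source_dir):
    """Return the sorted names of all '.pod' files in <source_dir>/pod."""
    pod_dir = Path(source_dir) / "pod"
    # Only direct children ending in '.pod' count; other files are ignored.
    return sorted(p.name for p in pod_dir.glob("*.pod") if p.is_file())
```

The resulting list could serve as the initial set of 'vacant' 
documents for the core documentation subproject.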

- core modules

Modules that are installed with a typical perl installation from source 
by default.

########################################################################