Thread: [perldoc2-developers] [RFC] Platform Specification Vs. 0.1
From: Joergen W. L. <joe...@gm...> - 2006-11-16 00:22:31
Hi everybody,

yes - it took a little longer than expected - but here it is: *ta-da!*

The first version of the specification for the platform is ready for
your perusal.

Please comment, add, edit, ask ... after all these are just a few
thoughts out of my own twisted brain. Imagine what six twisted brains
could make of it!

Hope you like it.

Joergen

P.S.: I will also upload it to the blog tonight.

below is line number one :o)
########################################################################
perldoc 2.0 Platform specification
########################################################################
Version: 0.1
Date: November 14-16, 2006
Author: Jørgen W. Lang
Email: jw...@wo...

########################################################################
NOTE
########################################################################
Since this is the first version of the specification there probably are
a lot of things that should be added, changed, improved, clarified, etc.

When referring to a particular part of this document, please try to also
quote the line number. This way everybody should be able to easily
identify the part you are talking about. Thanks!

########################################################################
ABSTRACT
########################################################################
This document describes the components of the perldoc 2.0 translation
platform and repository and the workflow between its parts.

########################################################################
GENERAL CONCEPTS
########################################################################

########################################################################
- Goals
########################################################################
The final goal is to provide complete translations of

- the core documentation(1)
- the documentation of the core modules(2)

Once finished the translations could be made available via
language-specific subdomains of perldoc.perl.org, like
'fr.perldoc.perl.org' and as part of the actual perl distribution e.g.
in the ./pod/ directory.

In the meantime translated documents could and should be available via
the platform website. This way they can be used and reviewed as soon as
single documents are finished.

The main focus of the platform is the translation of the documentation
for the programming language Perl into other natural languages. Since
Perl6 is already in the making, the platform should be ready for this.

Although it is neither the primary goal nor a prerequisite, the platform
might support the translation of documentation for other
projects/programming languages in the future.

(1) and (2) are explained under "Terms"

########################################################################
- Audience
########################################################################
- translators (translating the docs)
- end-users (looking up documentation in their languages)
- developers (who can help improving the platform)

The term 'end-users' is used to differentiate between developers,
(P|p)erl hackers, etc. reading the documentation and those actively
involved in the creation, maintenance and improvement of the platform
itself (although these sometimes will be the same people).

########################################################################
- Multilingual
########################################################################
Since this is a translation platform, the contents and interfaces of the
website should be available in as many natural languages as possible.

########################################################################
- Framework
########################################################################
We should not reinvent the wheel. There already is a good choice of web
application frameworks out there. Choosing one written in Perl might
help us - and Perl at the same time. ;o)

########################################################################
- Adoption
########################################################################
The 'adoption' method could be an essential part of the project. By
assigning a complete document to one physical person, this person is
encouraged to make the translation a personal effort instead of feeling
like an anonymous gear within the big translation machine.

This method does not exclude the possibility for splitting up one
document between multiple individuals. Maybe the platform should have
support for this. Using po4a might help a lot.

########################################################################
- Quality
########################################################################
To ensure the best possible quality of translations a certain set of
guidelines should be followed. A common glossary of terms should be used
for each language.

########################################################################
COMPONENTS
########################################################################
The following key components are needed:

########################################################################
- repository
########################################################################
The repository is the storage area for documents to be translated.
This could be an SVN repository, a database, a directory structure or
whatever.

For the moment we use an SVN repository on sourceforge to collect and
store the documents to be translated and the already existing
translations. This might change during the development of the project
as it might be more practical to store the documents within the database.

########################################################################
- database
########################################################################
The database is used to store information about the documents to be
translated like the perl version they are based on, their translation
status, timestamps, and other meta information.

The database will also keep track of 'available' languages. This could
mean a general table of languages that are spoken on the planet today.

(It could also be used to store the documents themselves.)

########################################################################
- interface(s)
########################################################################
The platform might have several different interfaces. One for
translators, one for end users and one for administrators.

The primary interface is web-based. Access to documents and information
about them will be done via a website.

The interface(s) provide the following key features:

- create and manage user accounts
- login/logout
- overview of documents available for translation
- translation status of these documents
  (see 'document specific status' for details)
- check-out of 'vacant' documents
- check-in of translated documents (marks document as 'pending')
- show various forms of statistics about translation status,
  available documents
- check-out for review of translated documents
- checking in after review to mark documents as 'finished'
- submit errata
- give general feedback
- maintain database/repository

The platform could also provide secondary interfaces in the form of web
services, maybe for the communication with an editor program installed
on the local machine of a translator.

########################################################################
- people
########################################################################
Although most of the processing is probably done more or less
automatically there are certain parts of the workflow that involve
review and steering by 'real people'.

The obvious one is the actual translation itself. Additionally some
editorial staff might be needed to review and correct the translation.

Sometimes decisions have to be made whether a certain word should be
translated one way or the other or not at all. This might need one or
more 'referees' of some kind.

It also takes real people to review feedback and submitted errata.

To support the interaction between people, forums and other means of
communication should be available. After all, the whole platform is
powered by mutual help.

########################################################################
Design
########################################################################
Usage of the platform should be

- simple
- fun
- cool

########################################################################
Additional features
########################################################################
Tools/Helpers/Guidelines
- lists of available translation tools
- like editors with .po mode
- download links
- scripts and programs to ease translators' lives

RSS
- RSS feeds for
-- statistics
-- news
-- ?

Resources
- glossaries
- dictionaries

Mutual help
- Forums
- IRC-Channel?
- Mailing lists

Multilevel adoptions
- one key person is the adopter for one document
- the document could then be shared among multiple translators who take
  care of various parts of the doc.

Multiple formats
- RSS
- PDF
- XHTML
- POD
- ...
Download the whole documentation for one language as one 'book'.

Sponsorships
- A possibility for individuals or companies to sponsor the translation
  of one or more particular documents. These documents could have a
  special marker that identifies the sponsor to the reader.

########################################################################
Workflow
########################################################################
From the translator's point of view:

- (register/create account)
- check for available ('vacant') documents/languages
- login
- pick a document to adopt
- check-out the document (marks the document as 'adopted')
- translate the document
- check-in the document (marks the document as 'updated')
- get the document reviewed

From a reviewer's point of view:

- (register/create account)
- check for 'updated' documents
- login
- pick a document to review
- check-out the document
- review the document (remove 'fuzzy' markers)
- check-in the document (no fuzzy markers mark it as 'finished')

From a user's point of view:

- check for documents/languages
- read the document in maybe one of several available formats
- leave feedback/errata

########################################################################
DETAILS
########################################################################

########################################################################
- Database
########################################################################
The database stores the following information:

- translation projects and their status (as we expect at least one more
  project in the future)
- registered translators (and some details about them)

For a project:
- list of 'supported' documents (plus the necessary details)
- maybe the documents themselves
- which languages the document was or is being translated to

For a document:
- status
- meta information

What else?

Furthermore, this or another database will very likely contain
everything that's needed to run the web application itself.

########################################################################
- Status of project
########################################################################
Meta information like maintainer, contained subprojects, statistical
information, etc.
For the translation of the Perl5 documentation two subprojects are the
translation of the core documents and the translation of the core
modules. The translation of the core documents could be further split by
'importance' of translation.

########################################################################
- Status of document
########################################################################
A document, stored in an SVN repository, a database or wherever, has a
certain status attached to it. Depending on its state of translation
this could be one of the following:

vacant|adopted|updated|pending|finished|abandoned

vacant:    the document has not yet been assigned to a translator
adopted:   the document has been assigned to a translator
updated:   a partially translated document
pending:   the initial translation of the document has been finished
           but the document has not yet been reviewed.
finished:  the document has been translated and reviewed
abandoned: documents that have been adopted but haven't been worked
           upon for a given time will be marked as 'abandoned' before
           they will be put back into 'vacant' mode. This mode is to
           give the translator time to change the document's status to
           'updated'. The translator will have to be informed of this.
           If she does not update the document within a given time the
           status will change back to 'vacant' automatically.

########################################################################
- Meta information
########################################################################
For a document certain meta information will be stored:

- project this document is part of
- (Perl) version the document is based upon.
- time of adoption
- time of update
- name of adopter
- document is already translated partially (implied by 'updated' flag)

########################################################################
- User registration
########################################################################
For reasons of security and consistency it is probably necessary for
users to register with the project as a translator/reviewer/... (aha -
we already have different roles!)

########################################################################
- Check-out
########################################################################
A registered user will be able to assign one (or more?) documents to
himself. This person will be the "adopter" for this document for a
certain amount of time (the "time to live", "TTL", see "Abandoned
documents").

########################################################################
- Review
########################################################################
Probably the only way to good quality and correctness of the
translations (orthography, spelling, language and content) is mutual
help, unless Mark Shuttleworth wants to sponsor this project.

A concept of peer review similar to that of wikipedia could be used.
Other users are encouraged to review and to correct. Maybe this could
follow the "buddy principle" as practiced with divers.

To mark a document as 'finished' it has to be reviewed by at least one
person other than the translator. Maybe the review should involve
marking the several parts of the translation as 'reviewed' (maybe using
the 'fuzzy' flag?)

To review a document the 'buddy' needs to check out the document and
actually read it. By re-submitting it the document will be marked as
'finished'.

Using the 'fuzzy' marker avoids having to review the whole document,
which can be a great help, especially with lengthy and complex
documents. This implies that the fuzzy marker is set by default.
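
[Editor's sketch] The review rule described above (fuzzy markers set by default, cleared by a reviewer who is not the translator, document 'finished' once no fuzzy markers remain) could look roughly like this. It is only an illustration in Python, not part of the specification; the class names `Part` and `Document`, the `review` method and the sample names are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    text: str
    fuzzy: bool = True  # assumption: every part starts out fuzzy, as the spec suggests


@dataclass
class Document:
    name: str
    translator: str
    parts: list[Part] = field(default_factory=list)
    status: str = "pending"

    def review(self, reviewer: str, approved: list[int]) -> str:
        """Clear the fuzzy marker on the approved parts and return the new status."""
        if reviewer == self.translator:
            raise ValueError("the reviewer must not be the translator")
        for index in approved:
            self.parts[index].fuzzy = False
        # No fuzzy markers left: the document counts as 'finished'.
        if not any(part.fuzzy for part in self.parts):
            self.status = "finished"
        return self.status


doc = Document("perlintro", translator="alice",
               parts=[Part("first part"), Part("second part")])
print(doc.review("bob", approved=[0]))  # pending
print(doc.review("bob", approved=[1]))  # finished
```

Keeping the flag per part is what lets a reviewer work through a long document in several sittings instead of re-reading everything.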

########################################################################
- Abandoned documents
########################################################################
Sometimes people adopt a document but do not have the
time/motivation/resources to update it. To ensure that these documents
do not become "zombies" they will have a certain 'time to live' based on
their length and (maybe) complexity (perlopentut is more complex than
perl588delta, etc.)

If an adopted document exceeds its time to live (TTL) the following
could happen:

- the adopter will be informed that the document has not been updated
  within the given TTL. She will then be given a certain time to react.

- In this first stage it should not be necessary to submit actual
  changes to the document but to merely 'touch' the document. This is
  to confirm that the translator is still willing to work on the
  document. This will re-initialize the time to live. Maybe with a flag
  that indicates that this document has one 'reminder' to it.

- The second stage might require an actual update of the document. TTL
  keeps running, does not get reset.

- If the TTL has been exceeded with no reaction on the translator's
  side the document will be marked as 'vacant' again.

- If a partially translated document was abandoned this needs to be
  marked in meta information

########################################################################
- Check-in
########################################################################
The check-in or 'submit' of translated documents could be achieved in
one of the following ways:

- Upload via HTML-Form
- email (to a special address handling the integration of the document
  into the database).

Using the database approach would enable us to allow check-ins of
partially translated documents.
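
[Editor's sketch] The reminder stages listed under "Abandoned documents" can be read as a small state machine. The Python below is one possible reading, not a definitive design: the `Adoption` class, the `grace` period (the "certain time to react") and the concrete day counts are assumptions, and how the TTL itself is derived from length and complexity is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Adoption:
    ttl_started: datetime
    ttl: timedelta        # assumed to be computed elsewhere from length/complexity
    grace: timedelta      # assumed reaction time after the adopter was informed
    reminders: int = 0
    status: str = "adopted"

    def check(self, now: datetime) -> str:
        """Apply the TTL rules and return the resulting status."""
        if self.status not in ("adopted", "abandoned"):
            return self.status
        deadline = self.ttl_started + self.ttl
        if now > deadline + self.grace:
            self.status = "vacant"
        elif now > deadline:
            self.status = "abandoned"  # the adopter would be informed here
        return self.status

    def touch(self, now: datetime) -> bool:
        """First stage: confirm interest without submitting changes (allowed once)."""
        if self.status == "abandoned" and self.reminders == 0:
            self.ttl_started = now     # re-initialize the time to live
            self.reminders = 1         # flag: this document has one reminder
            self.status = "adopted"
            return True
        return False                   # second stage: only a real update counts

    def update(self) -> None:
        """An actual check-in of translated text."""
        self.status = "updated"


start = datetime(2006, 11, 16)
a = Adoption(ttl_started=start, ttl=timedelta(days=30), grace=timedelta(days=7))
print(a.check(start + timedelta(days=31)))  # abandoned
print(a.touch(start + timedelta(days=32)))  # True
print(a.check(start + timedelta(days=63)))  # abandoned
print(a.touch(start + timedelta(days=64)))  # False
print(a.check(start + timedelta(days=70)))  # vacant
```

Whether the second stage should get a grace period of its own is an open question in the text; the sketch simply reuses the same one.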

########################################################################
- Terms
########################################################################
This document uses the following terms as follows:

- adoption
  The process of assigning a document to a particular person.

- project
  The translation of the perl documentation (perldoc 2.0) is a project.
  The translation of the documentation for Catalyst is another.

- subproject
  The translation of the core documentation is a subproject of
  perldoc 2.0. The translation of the documentation for the core
  modules is another.

- core documentation (of perl)
  A typical installation of perl from source creates the directory
  'perl-[version_number]'. Contained in this is a directory named 'pod'.
  All documents contained in this directory and ending in '.pod' are
  part of the core documentation.

- core modules
  Modules that are installed with a typical perl installation from
  source by default.

########################################################################
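
[Editor's sketch] The definition of "core documentation" under Terms (every file ending in '.pod' inside the 'pod' directory of the perl source tree) translates into a short directory scan. The Python below is illustrative only; the function name `core_documents` is an assumption, and the demo builds a throwaway directory rather than touching a real perl source tree.

```python
import tempfile
from pathlib import Path


def core_documents(source_root: str) -> list[str]:
    """Return the names of all '.pod' files in the 'pod' directory of a perl source tree."""
    pod_dir = Path(source_root) / "pod"
    return sorted(p.name for p in pod_dir.glob("*.pod") if p.is_file())


# Demo on a temporary stand-in for 'perl-[version_number]'.
with tempfile.TemporaryDirectory() as root:
    pod = Path(root) / "pod"
    pod.mkdir()
    for name in ("perlintro.pod", "perlfunc.pod", "Makefile"):
        (pod / name).write_text("")
    found = core_documents(root)

print(found)  # ['perlfunc.pod', 'perlintro.pod']
```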
From: Robert 'p. S. <rs...@47...> - 2006-11-16 16:32:14
Hi Joergen and everyone else listening;

Joergen W. Lang said:
> The first version of the specification for the platform is ready for
> your perusal.
>
> Please comment, add, edit, ask ... after all these are just a few
> thoughts out of my own twisted brain.

Thanks for the effort and the (convenient, for us) explicit
specification. Here are my first thoughts. It might get a bit hard to
keep all the ideas and decisions in memory. So I thought, maybe we
should set up a wiki or something along those lines for this process?
Anyway, here's my braindump:

---

As I have a rather lengthy answer of comments, I'll skip the line
numbers and refer to the sections instead. I will also keep it rather
short, as squirrelmail dropped my previous mail. I hereby officially
curse PHP once more.

** Section "Goals":

> The final goal is to provide complete translations of
>
> - the core documentation(1)
> - the documentation of the core modules(2)

As I understand it, we actually want to be "P6-proof," and able to
easily make this facility available to other projects or even
programming languages. I first thought this might be premature
optimization. But this more affects the categorization and meta data of
the projects themselves, rather than the actual translation process. So
I think it might be worth trying to make the application aware of
metadata in an abstract sense, and not "Perl version numbers" themselves
for example.

** Section "Multilingual"

> Since this is a translation platform, the contents and interfaces of the
> website should be available in as many natural languages as possible.

We might even start a translation project for it! (just a joke)
Seriously though, I18N shouldn't be a problem, independent of the web
framework used.

** Section "Framework"

> We should not reinvent the wheel. There already is a good choice of web
> application frameworks out there. Choosing one written in Perl might
> help us - and Perl at the same time. ;o)

Full Ack.
I'd still be happy to hear from other members what their favorite web
development platforms are and why.

** Section "Adoption"

> The 'adoption' method could be an essential part of the project. By
> assigning a complete document to one physical person, this person is
> encouraged to make the translation a personal effort instead of feeling
> like an anonymous gear within the big translation machine.

I'd also like to propose the role of a "language adopter," who can
oversee an entire language. The document adopters are document and
language specific, as I understand them.

** Section "COMPONENTS - repository"

> For the moment we use an SVN repository on sourceforge to collect and
> store the documents to be translated and the already existing
> translations. This might change during the developement of the project
> as it might be more practical to store the documents within the database.

++ for the database on my side. It makes enhancements, keeping track and
metadata much easier than trying to set that up above another layer.

** Section "COMPONENTS - interface(s)"

> The platform might have several different interfaces. One for
> translators, one for end users and for administrators.

I'd think two (web) interfaces might be enough: one for project members
and one for project users, as the administrator privileges could be
integrated in the general translation interface.

> - submit errata
> - give general feedback

I'd propose the terms "comment" and "errata" for submissions. These
could also be accepted from non-registered users (but with Captcha for
spam prevention, of course). Furthermore, I think it might be a good
idea to design these in form of a small trouble-ticket system.
Where these states would be possible:

  comments:
    - unread
    - acknowledged (with response)

  errata:
    - new
    - rejected (with reason)
    - acknowledged (or open)
    - closed (with comment)

A publicly viewable list of comments/errata per document and per
language would also help to prevent duplicate issues.

** Section "COMPONENTS - people"

> Sometimes decisions have to be made wether a certain word should be
> translated one way or the other or not a all. This might need one or
> more 'referees' of some kind.

I'd much rather prefer language-specific mailing-lists. This would also
provide a publicly viewable archive of reasonings behind certain things,
and a place for "outsiders" to address larger issues, which wouldn't fit
into "comments" or "errata" submissions.

** Section "Additional Features"

> - RSS feeds for
> -- statistics
> -- news
> -- ?

  -- documents with newly translated parts
  -- documents with newly approved parts
  -- documents with newly abandoned parts
  -- comments/errata

The first three could be shown in a changeset style of view.

> - IRC-Channel?

Of course! It has more than one advantage: It eases development of the
platform, it provides a direct source for help and to drop comments, it
is a nice meeting point and it generally keeps the involved parties
close together.

- Mailing lists

As stated above, I'd recommend per language lists additional to the
regular ones.

** Section "Workflow"

I would like to address a general issue here: There won't be core
project attendees for every language. This means that specific issues in
other languages can only be handled if there are "trusted" members of
the translation move. I therefore would propose a slightly more complex
hierarchy of roles and privileges:

  - Administrators
    (should be clear)

  - Language Adopters
    (Can "administrate" one language. These are the primary contacts
    for language global issues. Can influence all documents of one
    language)

  - Document Adopters
    (Should be per language.
    Can influence one document in a specific language)

  - Core Translators
    (Can check-out, check-in, etc. These are the "trusted" translators
    in the project. Can approve translations.)

  - Wingman/Reviewer
    (A Core Translator assigned to another one for the purpose of QA.)

  - Translators
    (Cannot approve. Can only checkout a limited number of document
    "parts". Can be promoted to Core Translators if found trustworthy
    enough. Maybe it should take one core translator to check-in
    changes, and a second one to approve to ensure quality.)

The Administrators would be the only language-independent roles. An
adopter role can only be carried by a core translator. We might also
want to allow the end-users to create accounts, so they can keep
public/private notes, store bookmarks etc.

** Section "DETAILS - Database"

> What else?

I would suggest to wait with details about the database schema until we
agreed on what has to go inside and what structure we need.

** Section "DETAILS - Status of document"

> A document, stored in an SVN repository, a database or whereever, has a
> certain status attached to it.

I'd rather attach states to the parts of a document, rather than the
whole document itself. By this we can dynamically accumulate states of
documents and languages. Though we'd need different states or state
indicators for documents and document parts then. How about something
along these:

  Document states:
    - vacant
    - adopted

  Document part states:
    - not translated
    - in translation
    - translated but not approved
    - translated and approved (read: finished)

I'd say it should be possible for a translator to check out specific
parts of a document (say, parts 15 to 23). These would then be in the
"in translation" state. The limit of parts to check out should be
influenced by the role (core translator or just translator) of the user.
As you stated, after a specified time, depending on the number of
checked out parts, this state should be revoked.
The claimed parts would be "not translated" again. The revocation
counter should be reset on any activity in the checked out parts, so we
would need to regard a checkout as an entity by itself.

Of course, if a translator has problems with a specific part, it should
be possible for him to "drop" the part back to "not translated" but keep
the others checked out.

** Section "DETAILS - Meta information"

> - (Perl) version the document is based upon.

Wouldn't it be more open-ended if we'd make the projects hierarchical?
So there wouldn't be an explicit need for perl version number metadata.
E.g.

  - Perl 5
    - Version 5.6
      - Core Documentation
      - Core Module Documentation
    - Version 5.8
      - Core Documentation
      - Core Module Documentation
  - Perl 6
    ...

** Section "DETAILS - User registration"

> For reasons of security and consistency it is probably neccessary for
> users to register with the project as a translator/reviewer/... (aha -
> we already have different roles!)

Of course!

** Section "DETAILS - Checkout"

> A registered user will be able to assign one (or more?) documents to
> himself. This person will be the "adopter" for this document for a
> certain amount of time (the "time to live", "TTL", see "Abandoned
> documents").

I wouldn't let the adopters expire, but rather make the adoption of
something an administrative/overseeing role. I'd actually propose a
primary and a secondary adopter per document and language. The secondary
one could be chosen by the primary adopter himself, in case he's not
available. For reasons of vacation, or too much work, for example.

** Section "DETAILS - Review"

> A concept of peer review similar to that of wikipedia could be used.
> Other users are encouraged to review and to correct. Maybe this could
> follow the "buddy principle" as practiced with divers.

I agree with the "buddy" idea, except that I find the term "wingman" way
cooler.
This might be a personal preference though ;)

> To review a document the 'buddy' needs to checkout the document and
> actually read it. By re-submitting it the document will be marked as
> 'finished'.

If we introduce per-document-part states, the reviewing process might
even go along with the translation of certain parts of the document.
When the translator stores a part of his checkout as "finished," his
wingman would get it on his todo list for review. He can then either
approve the translation, or reject it (with a reason). The latter drops
it onto the translators todo list for his checkout again.

** Section "DETAILS - Check-in"

> The check-in or 'submit' of translated documents could be acchieved in
> one of the following ways:
>
> - Upload via HTML-Form
> - email (to a special address handling the integration of the document
> into the database).

Well, I'd see some possibilities to check-out/check-in the document
parts. A first separation would be "online" or "offline." So I'd propose
to keep the checkout state canonically online, and allow status updates
by checking in offline edited documents.

Online translation would probably be a web interface. A saving of a
document part there would also be a request for review.

Offline editing would need some specified formats for export, which
should be able to carry metadata. These would come to mind:

* XML
  The metadata would be no problem. They could be edited directly, or
  with an editor. Maybe Herbert can write us something fancy for this in
  Wx ;) A client application could also check-out the todo-list of
  claimed items via a webservice.

* POD
  Of course I don't mean the original POD, but something specified. For
  example:

    =head1 Translation Todo List

    $some_generic_information

    =head2 ORIGINAL[$metadata]

    $original_text

    =head2 TRANSLATION[$metadata]

    =begin TRANSLATION PLACEHOLDER (please erase when translation finished)

    $original_text

    =end

    =head2 ORIGINAL[$metadata]
    ...

* po4a
  I don't really know po4a's metadata capabilities, so I can't comment
  on this.

The metadata in the formats allows for an easy synchronization. The
checked out document would contain the parts the translator has on his
translation todo-list. He can start translating them and occasionally
check the document back in. Through placeholders like "TRANSLATION
PLACEHOLDER" above, the system can just extract the finished parts and
send them to the wingman for review. It can also insert those parts the
wingman has already rejected, with his reasonings in "=begin REJECTION
REASON" blocks.

This would allow for offline editing and occasional synchronisation with
the online application to stay up to date.

---

So, please let me know what you think of these points.

gr.,
Robert

-- 
# Robert 'phaylon' Sedlacek
# Perl 5/Catalyst Developer in Hamburg, Germany
{ EMail => ' rs...@47... ', Web => ' http://474.at ' }
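
[Editor's sketch] The placeholder idea from the "Check-in" comments above (a TRANSLATION section counts as finished once its "TRANSLATION PLACEHOLDER" block has been erased) can be sketched as follows. This is an illustration in Python, not an agreed format: the function name `finished_parts`, the `part=15` metadata syntax and the sample text are assumptions, since the mail leaves `$metadata` unspecified.

```python
import re

# Matches the section headings of the proposed checkout format.
SECTION = re.compile(r"^=head2 (ORIGINAL|TRANSLATION)\[(.*?)\][ \t]*$", re.MULTILINE)
PLACEHOLDER = "=begin TRANSLATION PLACEHOLDER"


def finished_parts(checkout: str) -> dict[str, str]:
    """Map metadata to translated text for every TRANSLATION section without a placeholder."""
    pieces = SECTION.split(checkout)
    # pieces is [preamble, kind, metadata, body, kind, metadata, body, ...]
    finished = {}
    for kind, metadata, body in zip(pieces[1::3], pieces[2::3], pieces[3::3]):
        if kind == "TRANSLATION" and PLACEHOLDER not in body:
            finished[metadata] = body.strip()
    return finished


sample = """=head1 Translation Todo List

=head2 ORIGINAL[part=15]

Hello world.

=head2 TRANSLATION[part=15]

Hallo Welt.

=head2 ORIGINAL[part=16]

Goodbye.

=head2 TRANSLATION[part=16]

=begin TRANSLATION PLACEHOLDER (please erase when translation finished)

Goodbye.

=end
"""

print(finished_parts(sample))  # {'part=15': 'Hallo Welt.'}
```

Only part 15 would be handed to the wingman here; part 16 still carries its placeholder and stays on the translator's todo list.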
From: Joergen W. L. <joe...@gm...> - 2006-11-30 13:31:02
Hi y'all,

sorry for being so quiet. I was on tour with my band. Apparently I
marked Robert's mail about the specs as "read" before I actually did
read it.

I am currently preparing Version 0.2 of the specs.

Thanks, Robert, for taking the time to review the specs! Most of the
suggestions have found their way into Vs. 0.2 so I will not go into much
detail here and simply post the new version here and in the Blog.

I can see two issues that need to be discussed:

- what platform are we going to use?
- DB vs. SVN

I will start separate threads for these.

In principle I'd say we should try to get a working example on the way
ASAP so we have something to look at.

[storing meta data in document]
> * po4a
> I don't really know po4a's metadata capabilities, so I can't comment on
> this.

po4a allows the storage of meta data in two ways:

- by using certain "headers" in the form of .po comments
  These store the meta information about the document itself like
  charset, date of translation...

- by using so-called "addenda" files which are stored separate from the
  actual document and are merged into the finished translation before
  publishing. These contain information about translators involved,
  changes, etc.

Expect more, soon,

Joergen
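
[Editor's sketch] To show what the document-level meta data in a .po file can look like, the Python below reads the header entry of a PO file, which in the gettext format is the msgstr belonging to the empty msgid. It is a hand-rolled minimal parser for illustration, assuming the `msgstr ""` line directly follows `msgid ""`; the function name `po_header` and the sample content are made up, and real tooling would use po4a/gettext itself.

```python
def po_header(po_text: str) -> dict[str, str]:
    """Parse the header entry (the msgstr of the empty msgid) of a PO file."""
    lines = po_text.splitlines()
    fields: dict[str, str] = {}
    try:
        start = lines.index('msgid ""')
    except ValueError:
        return fields  # no header entry found
    for line in lines[start + 2:]:  # skip 'msgid ""' and 'msgstr ""'
        line = line.strip()
        if not (line.startswith('"') and line.endswith('"')):
            break  # the header ends at the first non-string line
        content = line[1:-1].replace("\\n", "")
        key, sep, value = content.partition(": ")
        if sep:
            fields[key] = value
    return fields


sample = '''# German translation of perlintro (illustrative sample)
msgid ""
msgstr ""
"PO-Revision-Date: 2006-11-30 13:31+0100\\n"
"Language-Team: German\\n"
"Content-Type: text/plain; charset=UTF-8\\n"

msgid "NAME"
msgstr "NAME"
'''

header = po_header(sample)
print(header["Content-Type"])  # text/plain; charset=UTF-8
```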