Architecture Information

1 Introduction

This page describes, at a moderate level of detail, the design and implementation of the Content Specification Processor, the Content Specification Builder and other key components. A call hierarchy is provided for each component, but it only includes the more important functions and omits reading from and writing to the database.

2 CSProcessor

2.1 Design

The processor is designed to read in all of the data first (performing only the validation needed to catch fatal errors), then validate everything, and lastly save all of the data. The processing is done primarily through processLine and processTopic.

The processLine function reads a line and uses regular expressions to determine whether it is a base level tag, chapter, section or topic. The line is then processed according to the expression it matched. The function also checks the indentation to ensure that it only increases by one level at a time. When a Chapter or Section is found, the program creates a new Level object, and the processor always keeps track of the current level. Every level must have a parent level, with the exception of the “Initial Level”. When a Chapter/Section is created, an Options object is also always created for the relevant level.
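The classification and indentation check described above can be sketched as follows. This is a minimal illustration, not the processor's actual code: the class name LineClassifier, the regular expressions, the assumed line syntax ("Chapter: Title", "Key = Value") and the two-spaces-per-level rule are all assumptions made for the example.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of line classification; the patterns are illustrative only. */
class LineClassifier {
    enum LineType { BASE_TAG, CHAPTER, SECTION, TOPIC }

    // Assumed syntax: "Chapter: Title", "Section: Title" and "Key = Value".
    private static final Pattern CHAPTER =
            Pattern.compile("^\\s*Chapter\\s*:.*$", Pattern.CASE_INSENSITIVE);
    private static final Pattern SECTION =
            Pattern.compile("^\\s*Section\\s*:.*$", Pattern.CASE_INSENSITIVE);
    private static final Pattern BASE_TAG =
            Pattern.compile("^\\s*[A-Za-z ]+=.*$");

    static LineType classify(String line) {
        if (CHAPTER.matcher(line).matches()) return LineType.CHAPTER;
        if (SECTION.matcher(line).matches()) return LineType.SECTION;
        if (BASE_TAG.matcher(line).matches()) return LineType.BASE_TAG;
        // Anything that matches none of the patterns is treated as a topic line.
        return LineType.TOPIC;
    }

    /** Counts indentation levels, assuming two spaces per level. */
    static int indentLevel(String line) {
        int spaces = 0;
        while (spaces < line.length() && line.charAt(spaces) == ' ') {
            spaces++;
        }
        return spaces / 2;
    }

    /** Indentation may decrease freely but may only increase one level at a time. */
    static boolean isValidIndent(int previousLevel, int currentLevel) {
        return currentLevel <= previousLevel + 1;
    }
}
```

Checking the chapter and section patterns before the generic tag pattern keeps the more specific matches from being shadowed, and falling through to TOPIC mirrors the way unmatched lines are handed on to processTopic.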

The processTopic function is called by the processLine function when the line doesn't match any of the generic or chapter related regular expressions. Once called, the function first creates a new SpecTopic object and then gets all of the variables in the line. From there the program determines what type of Topic is being processed and raises errors or warnings if certain information has been supplied (e.g. tags for an existing topic). After that, it processes the rest of the variables as Options and creates an Options object that is associated with the topic.
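As a rough illustration of the "get all of the variables in the line" step, the sketch below splits a topic line into a title and the comma-separated values inside the [] block. The class name TopicLineParser, the example line format and the assumption that the first variable is the topic ID are hypothetical and are not taken from the processor's source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split a topic line into its title and bracketed variables. */
class TopicLineParser {
    final String title;
    final List<String> variables = new ArrayList<>();

    TopicLineParser(String line) {
        int open = line.indexOf('[');
        int close = line.lastIndexOf(']');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("Topic line has no [...] block: " + line);
        }
        // Everything before the opening bracket is the title.
        title = line.substring(0, open).trim();
        // Everything between the brackets is a comma-separated variable list.
        for (String part : line.substring(open + 1, close).split(",")) {
            String value = part.trim();
            if (!value.isEmpty()) {
                variables.add(value);
            }
        }
    }

    /** By assumption, the first variable is the topic ID. */
    String id() {
        return variables.isEmpty() ? null : variables.get(0);
    }
}
```

For example, new TopicLineParser("  Installing the Client [N, Task, Writer = jdoe]") yields the title "Installing the Client" and three variables, the first of which is treated as the ID.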

2.1.1 ContentSpec Class

The ContentSpec class is used as a top level container to store the base level information such as Publican tags, ID, name, etc. It also stores the “Initial Level” that is used when processing. The ContentSpec class also stores an array that holds each line of text before it has been processed. This is used to create the Pre and Post content specifications. The main functions inside this class are: isValid, saveScope and generatePostContentSpecification.

The isValid function first checks that the information inside of the ContentSpec is valid. It then calls each level's isValid function, which in turn validates the topics as well. The saveScope function operates in a similar way to the isValid function. It first saves the base level information, then saves the data for each level, and each level in turn creates its new topic objects. The topic objects are added to a topic pool and are saved in one REST call once all the levels have been saved. If an error is caught while saving the data then a REST call is made to delete the information that had been saved.
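The "delete what was saved if an error occurs" behaviour can be sketched as below. This is a simplified illustration under stated assumptions: the Store interface is a hypothetical stand-in for the REST client, and items are created one at a time here rather than through a single pooled call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of saving with rollback; Store is not the real REST API. */
class SaveWithRollback {
    /** Stand-in for the REST client used to create and delete entities. */
    interface Store {
        int create(String name); // returns the new database ID
        void delete(int id);
    }

    /** Saves every item; if any save fails, deletes everything saved so far. */
    static boolean saveAll(List<String> names, Store store) {
        List<Integer> savedIds = new ArrayList<>();
        try {
            for (String name : names) {
                savedIds.add(store.create(name));
            }
            return true;
        } catch (RuntimeException e) {
            // Roll back: remove whatever had already been written.
            for (int id : savedIds) {
                store.delete(id);
            }
            return false;
        }
    }
}
```

Tracking the IDs returned by each successful save is what makes the cleanup possible, since only entities that were actually created need to be deleted.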

The saveScope function also calls the saveDuplicatedTopics function once all the levels have been saved. The saveDuplicatedTopics function sets the database IDs for the newly created topics. This has to be done after the topics are saved since no database IDs will exist until the topics that are being duplicated have been saved.

The generatePostContentSpecification function is called after everything is saved. It goes through each line that was processed, removes all topic options (anything inside the [] brackets) and then adds the new database ID inside of the [] brackets. Once the processing is done, it writes the result back to the database via a REST update call.
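The per-line rewrite can be illustrated with a small regular expression replacement. This is only a sketch: the class name PostSpecLine and the example line format are assumptions, and the real function works from the stored array of lines rather than a single string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: replace a topic's [options] with its database ID. */
class PostSpecLine {
    // Matches the first [...] block on a line.
    private static final Pattern OPTIONS = Pattern.compile("\\[[^\\]]*\\]");

    static String rewrite(String line, int databaseId) {
        Matcher m = OPTIONS.matcher(line);
        if (!m.find()) {
            return line; // no options block, e.g. a chapter line
        }
        return line.substring(0, m.start()) + "[" + databaseId + "]" + line.substring(m.end());
    }
}
```

With these assumptions, rewrite("  Installing [N, Task, Writer = jdoe]", 42) returns "  Installing [42]", while a line without brackets is returned unchanged.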

2.1.2 Level Class

The Level class is used to store the data for a chapter, appendix or section. A level may have zero or more levels stored inside of it and zero or more Topics; however, it must contain at least one level or topic to be valid. The Level class has three main functions: isValid, saveLevel and saveDuplicatedTopics.

The isValid function first checks all of the basic data stored in the level and then recursively calls itself for all of the sub-levels. After that it calls the isValid function for each topic. The saveLevel function saves the information by creating a ContentLevel tuple for the basic information. It then recursively calls itself for all sub-levels. Next it calls the saveTopic function inside of SpecTopic, but only for New and Cloned Topics. A Cloned Topic is first cloned using the exact original data and is then modified in saveTopic. The saveTopic function returns an initialised Topic object that is added to a pool of topics to be saved later.
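The recursive validation order (basic data, then sub-levels, then topics) can be sketched as follows. The class LevelSketch is hypothetical, and plain title strings stand in for SpecTopic objects to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of recursive level validation; not the real Level class. */
class LevelSketch {
    final String title;
    final List<LevelSketch> childLevels = new ArrayList<>();
    final List<String> topicTitles = new ArrayList<>(); // stand-in for SpecTopic objects

    LevelSketch(String title) {
        this.title = title;
    }

    boolean isValid() {
        // 1. Basic data: the level needs a title.
        if (title == null || title.trim().isEmpty()) return false;
        // 2. A level must contain at least one sub-level or topic.
        if (childLevels.isEmpty() && topicTitles.isEmpty()) return false;
        // 3. Recurse into the sub-levels.
        for (LevelSketch child : childLevels) {
            if (!child.isValid()) return false;
        }
        // 4. Validate the topics held directly by this level.
        for (String topic : topicTitles) {
            if (topic == null || topic.trim().isEmpty()) return false;
        }
        return true;
    }
}
```

Because each level validates its own children, calling isValid on the top level is enough to cover the whole tree.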

The last function is the saveDuplicatedTopics function, which was partially explained in the ContentSpec class section. This function loops through all the topics inside of the content specification and finds the topics whose ID is a Duplicated Topic ID. If one is found, it strips the X or XC off the front of the ID and adds an N in its place. This new ID is then used to find the original topic in a mapping of all topics inside the content specification. The original topic's database ID is then set on the duplicated topic object.
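The ID translation can be sketched as below. The exact ID format (for example "X3", "XC3" and "N3") is an assumption made for the example, as are the class and method names; only the "strip X or XC, add N, look up the original" logic comes from the description above.

```java
import java.util.Map;

/** Hypothetical sketch of resolving a duplicated topic to its original. */
class DuplicateResolver {
    /** Converts a duplicated-topic ID such as "X3" or "XC3" into "N3". */
    static String originalId(String duplicateId) {
        // Check the longer prefix first so "XC3" is not read as "X" + "C3".
        if (duplicateId.startsWith("XC")) return "N" + duplicateId.substring(2);
        if (duplicateId.startsWith("X")) return "N" + duplicateId.substring(1);
        throw new IllegalArgumentException("Not a duplicated topic ID: " + duplicateId);
    }

    /** Looks up the saved database ID of the original topic, or null if unknown. */
    static Integer resolveDatabaseId(String duplicateId, Map<String, Integer> savedTopics) {
        return savedTopics.get(originalId(duplicateId));
    }
}
```

The lookup only works once the original New topics have been saved, which is why this step runs after all the levels have been saved.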

2.1.3 SpecTopic Class

The SpecTopic class is a representation of an actual Topic; however, it differs slightly because it also contains information about the topic's place in the content specification. The SpecTopic class has only two main functions: isValid and saveTopic.

The isValid function is more complex than either of the Scope or Level functions. It first has to check that an ID was specified. After that it inspects the ID to find out what type of topic it is. If it's a New Topic then it checks that there is a Title, Type and an Assigned Writer. After that it checks the tags to validate that multiple Tags aren't specified for a mutually exclusive category. If the ID is for an existing topic then it checks that the ID and Title match the database. If a Type was specified then it also validates that against the database.

If the ID is for a duplicated topic then it checks that the original Topic exists in the content specification and, if it does, that the titles match. Lastly, if the ID is for a cloned topic then it checks that the title matches the original topic's title. It then checks whether a description or type has been specified and, if so, generates a warning that the value will be ignored. Finally it checks the tags to validate that multiple Tags aren't specified for a mutually exclusive category.
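The mutually exclusive category check can be illustrated with a simple counting pass. This is a sketch under assumptions: the class TagValidator, the tag-to-category map and the set of exclusive categories are hypothetical inputs, whereas the real processor obtains this information from the database.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the mutually exclusive category check. */
class TagValidator {
    /** Returns the exclusive categories that have more than one tag assigned. */
    static Set<String> violatedCategories(List<String> tags,
                                          Map<String, String> tagToCategory,
                                          Set<String> exclusiveCategories) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tag : tags) {
            String category = tagToCategory.get(tag);
            // Only tags in a mutually exclusive category are counted.
            if (category != null && exclusiveCategories.contains(category)) {
                counts.merge(category, 1, Integer::sum);
            }
        }
        Set<String> violated = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                violated.add(entry.getKey());
            }
        }
        return violated;
    }
}
```

Returning the set of offending categories, rather than a plain boolean, makes it straightforward to name each category in the resulting error message.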

The saveTopic function is only called when saving a New or Cloned Topic. This is because Existing Topics can't be modified from a content specification, and Duplicated Topics exist only to reuse one New Topic multiple times in a content specification. The function first creates the new topic entry in the database and sets the type if the topic hasn't already been created. It creates the topic object and initialises the base information. After that it sets the assigned writer for the topic. Next it iterates through all of the Tags and adds them to the topic with the AddItem property set. Cloned topics also have to iterate through a list of tags to be removed, so that any of those tags that already exist on the cloned topic are removed from its collection of tags. The same is then done for source URLs. However, it's not possible to remove source URLs, so the removal step is skipped for them.

2.2 Call Hierarchy

->readFileData
    ->processLine
        ->getLineVariables
        ->getOptions
            ->getExtraVariables
    ->processTopic
        ->getLineVariables
        ->getOptions
            ->getExtraVariables
    ->validateContentSpec
        ->ContentSpec.isValid
            ->Level.isValid
                ->Level.isValid
                ->Topics.isValid
        ->Scope.saveScope
            ->Level.saveLevel
                ->Level.saveLevel
            ->Topics.saveTopic
            ->TopicPool.savePool
            ->Level.saveDuplicatedTopics
            ->Scope.generatePostContentSpec

3 CSBuilder

3.1 Design

The CSBuilder is the component that handles building content specifications. It differs slightly from the Skynet builder in that it has its own Injection and Relationship rules. The first step the builder does is to create a Processor object in its constructor and process the Post Content Specification from the database. This is done so that very little is stored and allows for Topic Maps to be incorporated at a later stage.

From there the buildBook function can be called. This will start building the book and return the built ZIP archive. As the name suggests, the buildBookBase function builds all of the basic information using data from the processor and resources stored in the resource folder of the project. Next the builder recursively processes each topic, chapter and section. Topics are processed outside of the chapters and sections, as a Topic only needs to be processed once and can then be included one or more times in a section.

The createTopicXML function cleans the title and id from the Topic XML data and then ensures that it is wrapped in a section block. It then calls the addImagesToBookForNode function, which searches the XML data for image tags. If one is found then it calls the addImageToBook function, which gets the image from the database and adds it to the images folder of the book. Once the images are processed, the topic is processed for injections by calling the processInjections function. This function processes any Skynet injections while making sure that the injected topics exist inside of the content specification.

Once the injections are processed, the createTopicXML function counts how many times the topic is included in the content specification. If it is included more than once then the topic is added to the book multiple times, each with a unique ID and file name. Before adding the topic(s) to the book the builder also injects relationship links. These links aren't validated: they are <link> XML tags and as such can't be validated.
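One way to give each copy of a repeated topic a unique file name is to append an occurrence counter, as sketched below. The naming scheme (a "-1", "-2" suffix and an ".xml" extension) and the class name TopicCopies are assumptions for illustration; the builder's actual scheme is not described here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: unique file names for a topic included several times. */
class TopicCopies {
    static List<String> fileNames(String baseName, int occurrences) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= occurrences; i++) {
            // A topic used once keeps its plain name; repeats get a numeric suffix.
            names.add(occurrences == 1 ? baseName + ".xml" : baseName + "-" + i + ".xml");
        }
        return names;
    }
}
```

The same counter could be appended to the XML id attribute so that links can target one specific copy of the topic.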

Once the topics are processed, the chapters/appendices/sections are created and added to the book using the createChapters and createSectionXML functions. The chapter file names have their line numbers in the content specification appended to them. This is done so that duplicate chapter names can exist in a content specification. The createSectionXML function recursively iterates through the sections in a chapter and adds them, along with their topics, to the chapter XML file.
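The effect of appending the line number can be seen in a small sketch. The sanitising rule, the separator and the class name ChapterFileName are assumptions; only the idea of using the line number to keep same-named chapters apart comes from the text above.

```java
/** Hypothetical sketch: build a chapter file name that is unique per line. */
class ChapterFileName {
    static String forChapter(String title, int lineNumber) {
        // Replace anything that is not a letter or digit so the name is file-system safe.
        String safeTitle = title.trim().replaceAll("[^A-Za-z0-9]+", "_");
        return safeTitle + "_" + lineNumber + ".xml";
    }
}
```

Two chapters that are both titled "Getting Started" but sit on lines 12 and 40 would produce "Getting_Started_12.xml" and "Getting_Started_40.xml", so neither file overwrites the other.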

The last step is to check if any errors occurred and if any did then add an error chapter that details the errors that occurred.

3.2 Call Hierarchy

->buildBook
->buildBookBase
    ->createTopicXML
        ->addImagesToBookForNode
            ->addImageToBook
        ->TopicInjector.processInjections
            ->TopicInjector.processCustomInjectionPoints
        ->TopicValidator.validateTopicXML
->createChapters
    ->createSectionXML
        ->createSectionXML
->buildErrorChapter

4 Web Service

The Web Service is a RESTful service built with RESTEasy. There are two main classes for the Web Service: WebService and BaseWebService. BaseWebService holds the functions that are available to be used by the web service. The BaseWebService class also manages the queueing of Push and Build Requests. WebService extends BaseWebService and provides the URL paths to access the web service.

After those two main classes there are the REST object files contained in the com.redhat.contentspec.service.rest package. These classes are just for marshalling data from the processor into XML or JSON data that is returned by the web service. There is also an interface class that provides all of the available Web Service requests that can be used by the RESTEasy ProxyFactory class. There is also a csprocessor-rest.jardesc to package a JAR archive that can be distributed for use with CSProcessor clients.

5 Constants Files

Instead of accessing each file to change things such as error messages, regex ID, fixed file locations, fixed host addresses, string formatting and fixed object values, this information is put inside constant files for easy access and modification. Since the Processor is divided into different modules, the constants were divided in the same manner for relevance. There are four different constants files in the Processor. These files are as follows:

BuilderConstants.java
This file includes the constants for the CSBuilder. These constants include fixed file locations, docbook types and error messages for the CSBuilder. It also includes the regex IDs of objects to be replaced inside the resource XML files.
DTDConstants.java
This file contains the DTD schema as a byte string for easy comparison. Validating the XML against the schema as a constant is a lot faster than validating against an external resource. The constant can also be changed easily when the schemas are modified.
ProcessorConstants.java
This file contains all constants for the CSProcessor. These constants include error messages, regex IDs and fixed object values.
CSConstants.java
This file has the response error messages for the web service.

6 Client

The client is a basic Java program and doesn't do any of the processing work. It is just an interface to interact with the web service using the REST interface. It reads in arguments from the command line, then sends a REST request to the server based on the arguments passed. It then uses JAXB to unmarshal the response from the server back into Java objects. Using these objects, a response is then printed to the command line or written to a file. There is some error checking to ensure that the commands that are used contain valid arguments/parameters. The client also generates different exit values based on the outcome of the commands/REST request. These exit values can be found in the Constants.java file.
