Split and re-serialize graph (RSS)

Help
Dan Libby
2004-12-09
2013-03-14
  • Dan Libby

    Dan Libby - 2004-12-09

    Hi, I am trying to use RAP to parse and re-serialize some custom RSS 1.0 feeds (i.e., feeds that use some custom namespaces).

    I assume that each RSS file may contain any number of channels, and each channel any number of related items. Each channel and its items should be unrelated to the other channels.

    I would like to grab all the triples for a given channel and its elements and put them into a new model for serialization, effectively splitting the multi-channel file into several single-channel files. (Even handier would be a "serialize graph from this node" function.)

    I know how to grab the relevant channel resources, and I know that I can recursively walk the graph forward starting from the channel. However, I am concerned about endless loops if someone feeds me some bogus RDF. I guess I would have to keep a list of "already seen" nodes.

    So then for each statement that I find during the walk I would add it to a new model.  When finished walking, I serialize this new model, and voila, a file specific to this channel.
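
    To make the idea concrete, here is a minimal sketch of that walk. It uses plain PHP arrays of [subject, predicate, object] rather than RAP's classes, and the function name extract_subgraph and the sample data are made up purely for illustration. It also assumes that an object is worth following only if it appears as a subject somewhere, which sidesteps the resource-versus-literal distinction a real model would make.

```php
<?php
// Hypothetical helper, not part of RAP: triples are plain arrays
// [subject, predicate, object] with string values.
function extract_subgraph(array $triples, string $start): array {
    // Index triples by subject so a node's outgoing statements are one lookup.
    $by_subject = [];
    foreach ($triples as $t) {
        $by_subject[$t[0]][] = $t;
    }

    $seen   = [$start => true];  // the "already seen" list: guards against cycles
    $queue  = [$start];          // nodes whose statements still need copying
    $result = [];                // stands in for the new model

    while ($queue) {
        $node = array_shift($queue);
        foreach ($by_subject[$node] ?? [] as $t) {
            $result[] = $t;
            $obj = $t[2];
            // Follow the object only if it is a subject itself and is new to us.
            if (isset($by_subject[$obj]) && !isset($seen[$obj])) {
                $seen[$obj] = true;
                $queue[] = $obj;
            }
        }
    }
    return $result;
}

// Made-up sample data: two channels, with a cycle inside the first one.
$triples = [
    ['ex:chan1', 'rss:title', 'Channel one'],
    ['ex:chan1', 'rss:items', 'ex:seq1'],
    ['ex:seq1',  'rdf:_1',    'ex:item1'],
    ['ex:item1', 'rss:link',  'ex:chan1'],   // points back at the channel
    ['ex:chan2', 'rss:title', 'Channel two'],
];
$sub = extract_subgraph($triples, 'ex:chan1');
```

    Because each node is marked as seen before its statements are expanded, the back-reference from ex:item1 to ex:chan1 does not restart the walk, and nothing reachable only from ex:chan2 ends up in $sub.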

    Can anyone comment on the validity and efficiency of this approach or recommend something better?  I'd rather not reinvent the wheel. Does code for this already exist somewhere?

    regards,

    Dan Libby

     
    • Dan Libby

      Dan Libby - 2004-12-09

      In case it is useful to someone else, I have come up with the following code that seems to do the trick.

         function copy_node_to_model( &$model, $resource ) {

            // URIs visited so far, so that cycles in the graph cannot cause endless recursion.
            $seen_list = array();

            copy_node_to_model_worker( $model, $resource, $seen_list );

            $resource->setAssociatedModel( $model );

            return $resource;
         }

         function copy_node_to_model_worker( &$model, $resource, &$seen_list ) {

            $uri = $resource->getURI();
            if( !isset( $seen_list[$uri] ) ) {

               // Mark the node before descending, so a loop back to it stops here.
               $seen_list[$uri] = 1;

               $statements = $resource->listProperties();
               foreach( $statements as $s ) {

                  // Duplicate the statement, but we change the model for the sub, pred, and obj.

                  $sub = $s->getSubject();
                  $pred = $s->getPredicate();
                  $obj = $s->getObject();

                  $sub->model =& $model;
                  $pred->model =& $model;
                  $obj->model =& $model;

                  $model->addWithoutDuplicates( new Statement( $sub, $pred, $obj ) );

                  // Fetch the object again before recursing; the $obj above now points at the new model.
                  $obj = $s->getObject();
                  if( is_a( $obj, 'Resource' ) ) {
                     // Only resources can have further properties; literals end the walk.
                     copy_node_to_model_worker( $model, $obj, $seen_list );
                  }
               }
            }
         }

       
