Jason Harrop - 2009-06-15

The docx4j project has been using diffx for a good while now to diff two Microsoft Word OpenXML paragraph objects, or two content controls (aka sdt).

It works pretty well. Aside: maybe one of the reasons there are few posts in this forum is that diffx 'just works'? :-)

That said, I thought I'd post about an interesting case that just came up.

The case is a first paragraph containing an image X, diffed against the same paragraph containing an image Y.

In other words, the user deleted image X, and replaced it with image Y.

As an end user, you'd expect the resulting diff to show the image X deleted, and the image Y inserted.

However, diffx (correctly) reports only a handful of changed attribute values deep within the w:drawing element, each marked with a del: attribute holding the old value - see the XML output below. (<a:blip r:link="rId4" /> didn't even change.)
     
To meet user expectations, diffx could provide a way of saying: 'if there is any difference in any of the descendants of element x, treat it as if element x had been deleted and a new element inserted'.

Alternatively, you can handle this by post-processing, which is probably what we'll do in docx4j.
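
A minimal sketch of what such post-processing might look like, written in Python for brevity rather than as docx4j code. It assumes the diff output marks old attribute values with attributes in the Diff-X/Delete namespace, as in the output at the end of this post, and that the affected attributes are unqualified. The function names are made up for illustration, and attributes that exist only in the new version are not handled.

```python
import copy
import xml.etree.ElementTree as ET

DEL_NS = "http://www.topologi.org/2004/Diff-X/Delete"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
DEL_PREFIX = "{" + DEL_NS + "}"


def has_deleted_attrs(element):
    """True if the element or any descendant carries a del: attribute."""
    return any(
        name.startswith(DEL_PREFIX)
        for node in element.iter()
        for name in node.attrib
    )


def split_versions(element):
    """Return (old, new) copies of element with the del: markers resolved.

    old: each del:foo value replaces (or restores) the plain foo attribute.
    new: the del: attributes are simply dropped.
    """
    old, new = copy.deepcopy(element), copy.deepcopy(element)
    for node in old.iter():
        for name in [n for n in node.attrib if n.startswith(DEL_PREFIX)]:
            old_value = node.attrib.pop(name)
            node.set(name[len(DEL_PREFIX):], old_value)
    for node in new.iter():
        for name in [n for n in node.attrib if n.startswith(DEL_PREFIX)]:
            del node.attrib[name]
    return old, new


def changed_drawings(diff_root):
    """List an (old, new) pair for every w:drawing that differs anywhere inside."""
    tag = "{" + W_NS + "}drawing"
    return [
        split_versions(drawing)
        for drawing in diff_root.iter(tag)
        if has_deleted_attrs(drawing)
    ]


# Cut-down version of the diff output, for illustration only.
SAMPLE = (
    '<w:p xmlns:w="' + W_NS + '" xmlns:wp="' + WP_NS + '" xmlns:del="' + DEL_NS + '">'
    '<w:r><w:drawing><wp:inline>'
    '<wp:extent cx="819150" cy="438150" del:cx="561975" del:cy="495300" />'
    '<wp:docPr id="3" name="Picture 3" del:id="1" del:name="Picture 1" />'
    '</wp:inline></w:drawing></w:r>'
    '</w:p>'
)

if __name__ == "__main__":
    for old, new in changed_drawings(ET.fromstring(SAMPLE)):
        old_pr = old.find(".//{" + WP_NS + "}docPr")
        new_pr = new.find(".//{" + WP_NS + "}docPr")
        print("deleted:", old_pr.get("name"), "inserted:", new_pr.get("name"))
```

Each (old, new) pair can then be presented as the old drawing deleted and the new drawing inserted, instead of surfacing the individual attribute changes to the end user.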

cheers .. Jason

     
    <w:p xmlns:dfx="http://www.topologi.org/2004/Diff-X"
    xmlns:del="http://www.topologi.org/2004/Diff-X/Delete"
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
    xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
    xmlns:ns6="http://schemas.openxmlformats.org/schemaLibrary/2006/main"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
        <w:r>
            <w:drawing>
                <wp:inline>
                    <wp:extent cx="819150" cy="438150" del:cx="561975"
                        del:cy="495300" />
                    <wp:effectExtent b="0" l="19050" r="0" del:r="9525"
                        t="0" />
                    <wp:docPr id="3" name="Picture 3" del:id="1"
                        del:name="Picture 1" />
                    <wp:cNvGraphicFramePr>
                        <a:graphicFrameLocks noChangeAspect="true" />
                    </wp:cNvGraphicFramePr>
                    <a:graphic>
                        <a:graphicData
                            uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
                            <pic:pic>
                                <pic:nvPicPr>
                                    <pic:cNvPr id="0" name="Picture 3"
                                        del:name="Picture 1" />
                                    <pic:cNvPicPr>
                                        <a:picLocks
                                            noChangeArrowheads="true" noChangeAspect="true" />
                                    </pic:cNvPicPr>
                                </pic:nvPicPr>
                                <pic:blipFill>
                                    <a:blip r:link="rId4" />
                                    <a:srcRect />
                                    <a:stretch>
                                        <a:fillRect />
                                    </a:stretch>
                                </pic:blipFill>
                                <pic:spPr bwMode="auto">
                                    <a:xfrm>
                                        <a:off x="0" y="0" />
                                        <a:ext cx="819150" cy="438150"
                                            del:cx="561975" del:cy="495300" />
                                    </a:xfrm>
                                    <a:prstGeom prst="rect">
                                        <a:avLst />
                                    </a:prstGeom>
                                    <a:noFill />
                                    <a:ln w="9525">
                                        <a:noFill />
                                        <a:miter lim="800000" />
                                        <a:headEnd />
                                        <a:tailEnd />
                                    </a:ln>
                                </pic:spPr>
                            </pic:pic>
                        </a:graphicData>
                    </a:graphic>
                </wp:inline>
            </w:drawing>
        </w:r>
    </w:p>