Hi,
I have been using the OpenMCDF library to help with a project inspecting the contents of OLE format Office documents. Thank you for the effort in putting it together - it's saved me a huge amount of time that I would otherwise have spent invoking OLE methods!
One issue I have encountered is with the interpretation of the stream size field in directory entry structures for rather old files. As the MS link below notes, older writers did not always initialise the upper 4 bytes of the stream size to zero, and we have found some files where this is the case.
http://msdn.microsoft.com/en-us/library/dd942175.aspx
I have adapted my local copy to use the version number to determine how the size is interpreted; hopefully you will find this adapted version of DirectoryEntry.Read(...) of some use. The attached file exhibits the type of problem that can arise: we see an OutOfMemoryException when iterating entries and calling OpenElementStream.
Failing test template for attached file:
[Test]
public void OpenStreamOutOfMemoryException2()
{
    var testFile = "poWEr.prelim.doc";
    using (var cf = new CompoundFile(testFile))
    {
        var streamEntry = cf.RootStorage.GetStream("WordDocument");
        Assert.DoesNotThrow(() => streamEntry.GetData());
    }
}
Adapted version of DirectoryEntry.Read():
public void Read(Stream stream, int version)
{
    StreamRW rw = new StreamRW(stream);
    entryName = rw.ReadBytes(64);
    nameLength = rw.ReadUInt16();
    stgType = (StgType)rw.ReadByte();
    rw.ReadByte(); // Ignore color, only black tree
    //stgColor = (StgColor)br.ReadByte();
    leftSibling = rw.ReadInt32();
    rightSibling = rw.ReadInt32();
    child = rw.ReadInt32();
    // Thank you to bugaccount (BugTrack id 3519554)
    if (stgType == StgType.StgInvalid)
    {
        leftSibling = NOSTREAM;
        rightSibling = NOSTREAM;
        child = NOSTREAM;
    }
    storageCLSID = new Guid(rw.ReadBytes(16));
    stateBits = rw.ReadInt32();
    creationDate = rw.ReadBytes(8);
    modifyDate = rw.ReadBytes(8);
    startSetc = rw.ReadInt32();
    if (version == 4)
    {
        size = rw.ReadInt64();
    }
    else
    {
        size = rw.ReadInt32();
        // Upper 4 bytes may not be zeroed in older version files, therefore we ignore them
        // as per: http://msdn.microsoft.com/en-us/library/dd942175.aspx
        rw.ReadUInt32();
    }
}
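To show why the upper 4 bytes matter, here is a minimal standalone sketch of the same size interpretation, independent of OpenMCDF. The names StreamSizeDemo, ReadUInt32LE and ReadStreamSize are hypothetical helpers for illustration only, and the sample bytes are made up rather than taken from the attached file.

```csharp
using System;

public static class StreamSizeDemo
{
    // Hypothetical helper (not part of OpenMCDF): reads a little-endian
    // 32-bit unsigned value from the buffer at the given offset.
    public static uint ReadUInt32LE(byte[] b, int offset)
    {
        return unchecked((uint)(b[offset]
            | (b[offset + 1] << 8)
            | (b[offset + 2] << 16)
            | (b[offset + 3] << 24)));
    }

    // Hypothetical helper: interprets the 8-byte stream size field of a
    // directory entry according to the compound file major version.
    public static long ReadStreamSize(byte[] field, int version)
    {
        if (field == null || field.Length != 8)
            throw new ArgumentException("Expected an 8-byte size field.", "field");

        uint low = ReadUInt32LE(field, 0);
        uint high = ReadUInt32LE(field, 4);

        if (version == 4)
        {
            // Version 4 files use all 64 bits of the field.
            return unchecked((long)(((ulong)high << 32) | low));
        }

        // Older files: the upper 4 bytes may hold garbage, so only the
        // lower 32 bits are trusted.
        return low;
    }

    public static void Main()
    {
        // Made-up field: lower dword is 4096, upper dword is uninitialised junk.
        byte[] field = { 0x00, 0x10, 0x00, 0x00, 0xEF, 0xBE, 0xAD, 0x0B };

        Console.WriteLine(ReadStreamSize(field, 3)); // 4096
        Console.WriteLine(ReadStreamSize(field, 4)); // a huge bogus size
    }
}
```

Treating such a field as a full 64-bit value yields a size far beyond anything that can be allocated, which would be consistent with the OutOfMemoryException described above.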
Thank you very much, Gary,
your bug report is very accurate.
This type of reporting helps OpenMcdf become better software!
Your fix and test method will be added to the 1.5.5 fix release and to the trunk branch ASAP.
Best Regards,
Federico
I have also run into this bug many times on older OLE file formats. I have an alternative way to fix the problem. I'm not sure of the best place to submit a code change, as I'm new to this process.
Here is my proposal to handle older file formats. The change is to a method in CompoundFile.cs: