From: dw <lim...@ya...> - 2011-02-26 23:08:02

So, my first proposed modification is to replace the mdb_ole_read_full
routine in data.c. While the current routine does work, it has a number
of inefficiencies:

- Performs unnecessary memory allocation/memcpy of the Memo Field Definition
- Performs (repeated) unnecessary memcpys
- Performs (repeated) unnecessary reallocs (which also do memcpys)
- Allocates more memory than is needed to hold the data
- Uses an inconsistent memory allocator (malloc vs g_malloc)
- Leaves the col->bind_ptr in an unusable state

My replacement (below):

- Does zero memcpys
- Does zero reallocs
- Makes a single, correctly-sized allocation using g_malloc
- Preserves the Memo Field Definition in col->bind_ptr
- Is fully commented

My next proposed modification is more substantive, but this seemed like
a good place to start.

dw

/*
 * mdb_ole_read_full - reads the entire OLE field
 *
 * mdb - the database pointer
 * col - the column to read
 * size - outputs the size of the buffer returned (may be NULL)
 *
 * returns - the result in a big buffer. The number of data bytes is
 * returned in the size parameter. The returned buffer must be freed
 * using g_free().
 *
 * On return, col->bind_ptr still points to the 12 byte Memo Field
 * Definition, NOT the data. This means the OLE field can be
 * re-read if necessary.
 *
 */
void*
mdb_ole_read_full(MdbHandle *mdb, MdbColumn *col, size_t *size)
{
    void *pOldBind;
    unsigned char *result;
    size_t pos, chunk, iTotSize;

    // What's the total length of the field? Drop off flags
    iTotSize = mdb_get_int32(col->bind_ptr, 0) & 0x3fffffff;

    // Allocate room for the entire field.
    result = (unsigned char *)g_malloc(iTotSize);

    // Save the old pointer which points to the 12 byte Memo Field Definition
    pOldBind = col->bind_ptr;

    // mdb_ole_read & mdb_ole_read_next always write to col->bind_ptr.
    // So we adjust it to point to our full sized buffer
    col->bind_ptr = result;

    // Reads at most 1 data page
    pos = mdb_ole_read(mdb, col, pOldBind, iTotSize);

    // Is there more to read?
    while (pos < iTotSize)
    {
        // Adjust col->bind_ptr so the next call will write to
        // the appropriate offset in the buffer
        col->bind_ptr = result + pos;

        // Read the next chunk (at most 1 data page). Passing NULL
        // for the 3rd parameter avoids redundant error checking
        chunk = mdb_ole_read_next(mdb, col, NULL);

        // A zero-length chunk means there is nothing more to read;
        // stop here rather than loop forever on a truncated field
        if (chunk == 0)
            break;

        pos += chunk;
    }

    // assert pos == iTotSize (unless the field was truncated)

    if (size)
        *size = pos;

    // restore the 12 byte Memo Field Definition
    col->bind_ptr = pOldBind;

    return result;
}
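
The read-into-place idea above can also be tried outside of mdbtools. What follows is only a minimal sketch under stated assumptions: chunk_reader, mem_source, mem_source_read and read_all are hypothetical names invented for this illustration, not part of mdbtools, and the in-memory source merely stands in for a page-at-a-time reader such as mdb_ole_read_next. It uses plain malloc rather than g_malloc so that it builds without GLib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chunk source (not an mdbtools API): copies up to `room`
 * bytes to `dst` and returns the count, or 0 when nothing is left. */
typedef size_t (*chunk_reader)(void *ctx, unsigned char *dst, size_t room);

/* Demo source: serves an in-memory blob one "page" at a time. */
struct mem_source {
    const unsigned char *data;
    size_t len;   /* total bytes available */
    size_t pos;   /* bytes handed out so far */
    size_t page;  /* maximum bytes per call */
};

static size_t mem_source_read(void *ctx, unsigned char *dst, size_t room)
{
    struct mem_source *src = (struct mem_source *)ctx;
    size_t n = src->len - src->pos;

    if (n > src->page)
        n = src->page;
    if (n > room)
        n = room;
    /* This copy only simulates the page read of a real source. */
    memcpy(dst, src->data + src->pos, n);
    src->pos += n;
    return n;
}

/* Read up to `total` bytes into one correctly sized buffer. Each chunk
 * is written at its final offset, so no realloc or second copy is
 * needed. Returns NULL on allocation failure; *got (if non-NULL)
 * receives the number of bytes actually read. Free with free(). */
static unsigned char *read_all(chunk_reader next, void *ctx,
                               size_t total, size_t *got)
{
    unsigned char *buf;
    size_t pos = 0;

    assert(next != NULL);
    buf = (unsigned char *)malloc(total ? total : 1);
    if (buf == NULL)
        return NULL;

    while (pos < total) {
        size_t n = next(ctx, buf + pos, total - pos);
        if (n == 0)
            break;      /* source ran dry early: stop, don't spin */
        pos += n;
    }
    if (got)
        *got = pos;
    return buf;
}
```

The caller learns about a short read by comparing the returned count against the expected total, which plays the same role as the "assert pos == iTotSize" remark in the routine above.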
From: Jakob E. <jab...@gm...> - 2011-05-13 10:30:59

I have a general question about OLE fields: Once I read an OLE field from
the database, what do I do with it? Let's say the user put an image into a
field. Then the OLE field contains the jpeg/bmp/whatever file wrapped in
some strange binary format. How can I extract the actual files from this
binary OLE wrapper format? Or did I misunderstand something?

Does anybody have experience with this?

Best regards,
Jakob

On 27.02.2011, at 00:07, dw wrote:

> So, my first proposed modification is to replace the mdb_ole_read_full
> routine in data.c. [...]
From: Jakob E. <jab...@gm...> - 2011-05-17 14:25:29

Dear Mailing List,

I have done some research on my own, and I found a very interesting blog
article that describes part of the OLE field format:

http://jvdveen.blogspot.com/2009/01/ole-and-accessing-files-embedded-in.html

Additionally, I have done some reverse engineering of the format to find
out more details. Here's a summary of how data is stored in OLE fields in
Access 2000 databases, partially copied from the above blog article,
partially my own research.


HOW TO READ OLE FIELDS
======================

All testing has been done with a German version of Access 2000.

If there is a file inside an OLE field, the OLE field starts with a
little-endian short: 0x1c15

This is the format of OLE fields:

- Package header
- Ole header
- Data block length
- Data (which can be a structured storage, but it can also be the
  actual file)
- Sometimes a metafilepict block
- Ole footer


PACKAGE HEADER
==============

- A signature (short): this indicates that the file is a package.
  Always 0x1c15 (little endian, i.e. 0x15 0x1c)
- The header size (short)
- An object type (uint): 0 = linked, 1 = embedded, 2 = either
  (who came up with either?)
- The length of the friendly name in the header (short)
- The length of the class name in the header (short)
- The offset of the friendly name (short)
- The offset of the class name (short)
- The size of the object (int) (note: this is 0xFFFFFFFF for me)
- The friendly name (string, variable length)
- The class name (string, variable length)


OLE HEADER
==========

- The Ole version (uint) (for me this is always 0x01 0x05 0x00 0x00)
- The format (uint) (for me this is 0x02 0x00 0x00 0x00)
- The object type name length (int)
- The object type name (string, variable length)

If the object type name from the OLE header is "Package" (and the class
name from the package header is also "Package"), then the data is in a
special format (I'll call that a package stream). Otherwise, the data is
the file itself.

Files that are included directly are Word, Excel, Bitmap.
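
The two header layouts listed above could be decoded roughly as follows. This is a sketch, not a verified parser: the struct and function names (ole_pkg_header, ole_header, ole_parse_pkg_header, ole_parse_ole_header) are made up for this example, all integers are assumed to be little endian, and the two name offsets are assumed to count from the start of the field, which the notes above do not state. Where the OLE header begins is left to the caller for the same reason.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OLE_PKG_SIGNATURE 0x1c15u
#define OLE_PKG_FIXED_LEN 20u   /* bytes before the two name strings */

/* Hypothetical structs mirroring the field lists above. */
struct ole_pkg_header {
    uint16_t header_size;
    uint32_t object_type;     /* 0 = linked, 1 = embedded, 2 = either */
    uint16_t friendly_len, class_len;
    uint16_t friendly_off, class_off;
    uint32_t object_size;     /* observed as 0xFFFFFFFF */
};

struct ole_header {
    uint32_t version, format, type_name_len;
    const unsigned char *type_name;   /* points into the input buffer */
};

static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short, the signature is
 * missing, or a name range falls outside the buffer (assumption:
 * offsets are relative to the start of the field). */
static int ole_parse_pkg_header(const unsigned char *buf, size_t len,
                                struct ole_pkg_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < OLE_PKG_FIXED_LEN || rd16(buf) != OLE_PKG_SIGNATURE)
        return -1;
    out->header_size  = rd16(buf + 2);
    out->object_type  = rd32(buf + 4);
    out->friendly_len = rd16(buf + 8);
    out->class_len    = rd16(buf + 10);
    out->friendly_off = rd16(buf + 12);
    out->class_off    = rd16(buf + 14);
    out->object_size  = rd32(buf + 16);
    if ((size_t)out->friendly_off + out->friendly_len > len ||
        (size_t)out->class_off + out->class_len > len)
        return -1;
    return 0;
}

/* Parses an OLE header that starts at p. Returns 0 on success, -1 if
 * the fixed part or the type name does not fit into len bytes. */
static int ole_parse_ole_header(const unsigned char *p, size_t len,
                                struct ole_header *out)
{
    assert(p != NULL && out != NULL);
    if (len < 12)
        return -1;
    out->version       = rd32(p);
    out->format        = rd32(p + 4);
    out->type_name_len = rd32(p + 8);
    if (out->type_name_len > len - 12)
        return -1;
    out->type_name = p + 12;
    return 0;
}
```

Both functions only validate lengths and return pointers or offsets into the caller's buffer; whether the name lengths include a terminating zero byte is not assumed either way.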
Files that are put into package streams are JPEG, PNG, TIFF, text files, etc.


PACKAGE STREAM FORMAT
=====================

- package stream header
- packaged file data (payload)
- two NULL bytes


PACKAGE STREAM HEADER
=====================

- some kind of signature 0x02 0x00
- The filename as a zero terminated string
- The full MSDOS path to the file as a zero terminated string
- four unknown bytes 0x00 0x00 0x03 0x00
- the length of the MSDOS path as a four byte little endian integer
- Again the full MSDOS path to the file as a zero terminated string
- the length of the payload as a four byte little endian integer


Some examples: (friendly name depends on Access language)

Friendly Name     Class Name          Ole Type Name
----------------------------------------------------------------------
Paket             Package             Package
Dokument          Word.Document.8     Word.Document.8
Bitmap            Paint.Picture       PBrush
Arbeitsblatt      Excel.Sheet.8       Excel.Sheet.8


NOTE:
uint:   4 byte unsigned integer
int:    4 byte integer
short:  2 byte integer
string: array of single byte characters, probably in the system codepage


On 13.05.2011, at 12:30, Jakob Egger wrote:

> I have a general question about OLE fields: Once I read an OLE field from
> the database, what do I do with it? Let's say the user put an image into a
> field. Then the OLE field contains the jpeg/bmp/whatever file wrapped in
> some strange binary format. How can I extract the actual files from this
> binary OLE wrapper format? Or did I misunderstand something?
>
> Does anybody have experience with this?
> Best regards,
> Jakob
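
To get at the actual file, the package stream layout from the notes in this message could be walked like this. It is a sketch based only on those notes, so treat it as untested against real databases: pkg_payload and pkg_stream_find_payload are hypothetical names, the caller is assumed to already have a pointer to the start of the package stream, and the stored path length is skipped rather than trusted because it is unclear whether it counts the terminating zero byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type: pointers refer into the input buffer. */
struct pkg_payload {
    const char *filename;        /* zero terminated, inside the buffer */
    const unsigned char *data;   /* first payload byte */
    uint32_t size;               /* payload length in bytes */
};

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the byte after the next zero terminator, or NULL if the
 * string is not terminated before `end`. */
static const unsigned char *skip_cstring(const unsigned char *p,
                                         const unsigned char *end)
{
    const unsigned char *nul =
        (const unsigned char *)memchr(p, 0, (size_t)(end - p));
    return nul ? nul + 1 : NULL;
}

/* Walks the package stream header and locates the payload.
 * Returns 0 on success, -1 on a bad signature or truncated data. */
static int pkg_stream_find_payload(const unsigned char *buf, size_t len,
                                   struct pkg_payload *out)
{
    const unsigned char *p = buf, *end = buf + len, *name;
    uint32_t size;

    assert(buf != NULL && out != NULL);
    if (len < 2 || p[0] != 0x02 || p[1] != 0x00)
        return -1;                           /* signature 0x02 0x00 */
    p += 2;
    name = p;
    if ((p = skip_cstring(p, end)) == NULL)  /* filename */
        return -1;
    if ((p = skip_cstring(p, end)) == NULL)  /* full path */
        return -1;
    if (end - p < 8)
        return -1;
    p += 8;              /* four unknown bytes + stored path length */
    if ((p = skip_cstring(p, end)) == NULL)  /* full path, second copy */
        return -1;
    if (end - p < 4)
        return -1;
    size = rd32(p);
    p += 4;
    if ((size_t)(end - p) < size)
        return -1;                           /* payload is truncated */
    out->filename = (const char *)name;
    out->data = p;
    out->size = size;
    return 0;
}
```

On success the payload can be written to disk under the reported filename; the two trailing NULL bytes after the payload are not checked here.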