APFS not recognized

2023-08-27
2023-10-18
  • Robert Simpson

    Robert Simpson - 2023-08-27

    I've got a fairly substantial APFS file (compressed to 15 GB in 7z) that by default isn't recognized by 7-Zip as an APFS file. What's the best way to send you this file?

    I debugged it down to two places in the APFS handler code:

    The first, starting at line 2823, is your comment asking "is this possible?" in the code. The answer is yes, it's possible. Uncommenting the continue gets me past this issue and on to the next one.

          if (type == APFS_TYPE_DSTREAM_ID)
          {
            PRF(printf("  DSTREAM_ID"));
            if (pair.Key.Size() != 8)
              return S_FALSE;
            // j_dstream_id_val_t
            if (pair.Val.Size() != 4)
              return S_FALSE;
            const UInt32 refcnt = Get32(pair.Val);
    
            // The data stream record can be deleted when its reference count reaches zero.
            PRF(printf("  refcnt = %d", (unsigned)refcnt));
    
            if (vol.NodeIDs.IsEmpty())
              return S_FALSE;
    
            if (vol.NodeIDs.Back() != id)
            {
              // is it possible ?
               // continue; <----------- yes apparently it's possible!
              return S_FALSE;
            }
    
            CNode &inode = vol.Nodes.Back();
    
            if (inode.refcnt_defined)
              return S_FALSE;
    
            inode.refcnt = refcnt;
            inode.refcnt_defined = true;
            if (inode.refcnt != (UInt32)inode.nlink)
            {
              PRF(printf(" refcnt != nlink"));
              // is it possible ?
              // return S_FALSE;
            }
            continue;
          }
    

    The next issue is here in Open2() around line 2382:

        for (unsigned i = 0; i < sb.max_file_systems; i++)
        {
          const oid_t oid = sb2.fs_oid[i];
          if (oid == 0)
            continue;
          // for (unsigned k = 0; k < 1; k++) // for debug
          RINOK(OpenVolume(omap, oid))  <------- right here 
          const unsigned a = Vols.Back().Refs.Size();
          numRefs += a;
          if (numRefs < a)
            return S_FALSE; // overflow
        }
    

    sb.max_file_systems is 100, and apparently once i gets to 3, we come to OpenVolume around line 2563:

        const int index = map.Omap.FindKey(apfs.root_tree_oid);
        if (index == -1)
          return S_FALSE;
    

    Here index is -1 and the function returns S_FALSE, so the entire file is deemed unrecognizable as APFS, despite there being multiple volumes that parse OK.

    If I allow the code to continue past the loop and ignore the S_FALSE error return, the operations complete and I can actually parse and receive the contents of the APFS file.

    The APFS handler seems to treat any error at all, even a minor one, as a "not an archive" result, so you get nothing.
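
    To illustrate what I mean (this is just a sketch against the loop quoted above, not a tested patch), the per-volume loop in Open2() could skip a volume that fails to open instead of rejecting the whole container:

        for (unsigned i = 0; i < sb.max_file_systems; i++)
        {
          const oid_t oid = sb2.fs_oid[i];
          if (oid == 0)
            continue;
          // Sketch only: tolerate a volume that fails to open instead of RINOK().
          // A real change would also need to discard any partially-opened volume
          // state and surface a warning instead of silently skipping.
          if (OpenVolume(omap, oid) != S_OK)
            continue;
          const unsigned a = Vols.Back().Refs.Size();
          numRefs += a;
          if (numRefs < a)
            return S_FALSE; // overflow
        }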

     
    • Sam Tansy

      Sam Tansy - 2023-09-08

      Can you make or find a "minimal reproducible example" of this issue?

       
      • Robert Simpson

        Robert Simpson - 2023-09-08

        If you have VMWare, just run a Mac image, create a small volume, add some files to it, then delete a few files, and then add some more. Just enough to scramble the volume's btree a bit. If you want a real challenge, make a snapshot and then update some files. Then you'll have multiple INODE's with the same OID and different XID's.

        One of the fundamental problems is that the APFS filesystem isn't supposed to be read top-down from the start of the btree to the end. It's a hopscotch thing. You start at the two known INODEs (2 and 3), read their DIR_REC entries, recurse into directories, and for non-directories you read each INODE and its properties.
        Non-directory INODEs have a private_id which, if it's not the same as the INODE's OID, points to the OID where you'll find the DSTREAM file extents.

        There are also four kinds of compression algorithms (zlib, LZVN, LZFSE, LZBITMAP) if you find a decmpfs attribute record.
        Those attribute records can also point to DSTREAMs externally if the data is too big.
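
        For what it's worth, my reading of the decmpfs header layout is below; the struct, its field names, and the type values in the comments are my own assumption from public descriptions of the format, not something taken from the 7-Zip source:

            // Hedged sketch of the decmpfs xattr header as I understand it
            // (verify against Apple's headers before relying on it):
            struct DecmpfsHeader
            {
              UInt32 magic;            // 'cmpf'
              UInt32 compression_type; // e.g. 3/4 = zlib, 7/8 = LZVN; LZFSE and
                                       // LZBITMAP use higher type values
              UInt64 uncompressed_size;
              // compressed payload follows, either inline in the xattr
              // or in the resource fork, depending on the type value
            };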

        Also, in addition to symlink attribute records, you need to look for DREC_EXT_TYPE_SIBLING_ID records on a DIR_REC entry, which tell you that it's a hardlink. For example, a DIR_REC entry might name a file "123.png" and point to OID 9987, but INODE 9987's filename is "generic_image.png", and you find that several other DIR_REC entries in other directories also point to that same INODE 9987 and have filenames "789.png" and "456.png".

        Without writing too much pseudo-code, a recursion loop for reading the entire structure into a CObjectVector of Inode entries would look like this:

        struct CEntry : public ApfsInode
        {
            CEntry(const ApfsInode& other, const ApfsVol* vol,
                const unsigned int parentIndex, const char *actualName) : ApfsInode(other)
            { /* store vol, parentIndex and actualName alongside the inode data */ }
        };
        
        CObjectVector<CEntry> Items;
        
        void CHandler::RecurseAddItems(ApfsVol* vol, const UInt64 oidDirectory, const unsigned int parentVectorIndex = UINT32_MAX)
        {
            ApfsDir dir(vol);
            ApfsInode node;
        
            if (!dir.GetInode(node, oidDirectory))
                return;
        
            CEntry entry(node, vol, parentVectorIndex, NULL);
            unsigned int nodeindex = Items.Add(entry);
        
            CObjectVector<ApfsDirRec> items;
        
            if (!dir.Enumerate(items, oidDirectory))
                return;
        
            for (unsigned int n = 0; n < items.Size(); ++n)
            {
                const ApfsDirRec& item = items[n];
        
                if (item.flags != DT_DIR)
                {
                    ApfsInode subnode;
        
                    if (!dir.GetInode(subnode, item.file_id))
                        continue;
        
                    CEntry subentry(subnode, vol, nodeindex, item.has_sibling_id ? item.name : NULL);
                    Items.Add(subentry);
                }
                else
                {
                    RecurseAddItems(vol, item.file_id, nodeindex);
                }
            }
        }
        
        void CHandler::LoadData()
        {
            Items.Clear();
        
            for (int n = 0; n < NX_MAX_FILE_SYSTEMS; ++n)
            {
                ApfsVol* vol = m_container->GetVolume(n);
                if (!vol)
                    continue;
        
                RecurseAddItems(vol, ROOT_DIR_INO_NUM);
                RecurseAddItems(vol, PRIV_DIR_INO_NUM);
            }
        }
        

        For the btree itself, as you're reading it, you'll take the OID with the highest XID.
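
        As a rough illustration of that lookup (the names and types below are made up for the sketch, not taken from the 7-Zip source): among the entries for a given OID, pick the one with the largest XID that doesn't exceed the XID you're reading at.

            // Hypothetical key type with the two fields from the omap key.
            struct COmapKey { UInt64 oid; UInt64 xid; };

            // Hedged sketch: return the index of the entry for 'oid' whose xid is
            // the largest value that is still <= wantedXid, or -1 if none exists.
            int FindOmapEntry(const CRecordVector<COmapKey> &keys, UInt64 oid, UInt64 wantedXid)
            {
              int best = -1;
              for (unsigned i = 0; i < keys.Size(); i++)
              {
                const COmapKey &k = keys[i];
                if (k.oid != oid || k.xid > wantedXid)
                  continue;
                if (best < 0 || k.xid > keys[(unsigned)best].xid)
                  best = (int)i; // keep the candidate with the highest xid
              }
              return best;
            }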

         
  • Robert Simpson

    Robert Simpson - 2023-08-27

    Around line 1939:

        /* Object map B-trees are sorted by object identifier and then by transaction identifier
           but it's possible to have identical Ids in map ?
           do we need to look transaction id ?
           and search key with largest transaction id? */
        if (key.oid <= prev.oid)
          return false; <-------- this is returning false for index 3
        prev = key;
        Keys.Add(key.oid);
        Vals.Add(val);
    

    We have two matching OIDs in here with different values, which causes the loop to end prematurely; that's why FindKey returns -1 for volume index 3 in the loop and the open fails. I was missing about 400k items from the APFS file because of this.
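
    A sketch of what I think the loop should do instead (again, just an illustration against the code quoted above, not a tested patch): when the same OID shows up again, keep the value with the higher XID rather than bailing out. Since the b-tree is sorted by (oid, xid), the later entry is the newer one.

        if (key.oid < prev.oid)
          return false; // keys out of order: genuinely corrupt
        if (key.oid == prev.oid)
        {
          // Same OID seen again with a higher xid: replace the stored value.
          // (Ideally this would also respect the snapshot's xid bound.)
          Vals.Back() = val;
        }
        else
        {
          Keys.Add(key.oid);
          Vals.Add(val);
        }
        prev = key;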

    Line 2542:

      ReadObjectMap(apfs.omap_oid, &vol, map.Omap);
    

    This call doesn't check the return value, so the early abort went undetected.
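
    A one-line sketch of the missing check (assuming ReadObjectMap keeps returning a bool as in the snippet above):

      if (!ReadObjectMap(apfs.omap_oid, &vol, map.Omap))
        return S_FALSE; // or skip just this volume, as discussed earlier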

     
  • Igor Pavlov

    Igor Pavlov - 2023-08-29

    I don't want to download a big archive file.
    So for now I'm not ready to debug or change that code.

     
  • Robert Simpson

    Robert Simpson - 2023-08-29

    What size APFS file could I construct that'd be acceptable?

    The APFS spec definitely talks about multiple transaction IDs being assigned to a single OID, and says to take the highest one under normal circumstances.

     
  • Robert Simpson

    Robert Simpson - 2023-08-30

    So ... making some progress, but I've hit another barrier, this time when decompressing resource forks.
    For method = 8, we hit the LzfseDecoder, and the decompression loop exits properly, but the final check for 7 extra bytes fails on line 244:

    On entry, unpackSize = 2396, packSize = 1622
    On exiting the loop and checking afterward, unpackSize = 0 and packSize = 63

    On exiting, the packSize remainder is different for each file, but consistently not 7.

      // LZVN encoder writes 7 additional zero bytes
      if (packSize != 7)
        return S_FALSE;
      do
      {
        Byte b;
        if (!m_InStream.ReadByte(b))
          return S_FALSE;
        packSize--;
        if (b != 0)
          return S_FALSE;
      }
      while (packSize != 0);
    
      return S_OK;
    
     
    • Igor Pavlov

      Igor Pavlov - 2023-08-30

      I suppose I have changed that code already after v23.01.

       
  • Robert Simpson

    Robert Simpson - 2023-08-30

    The APFS file I'm testing looks like a hot mess. I have DIR_REC entries before INODEs, FILE_EXTENT records and DSTREAMs with no INODE, and INODEs with a private_id less than, equal to, and greater than the current ID ... it's just crazy.
    I also have FILE_EXTENT records with valid length and type, but a position of 0.

     
  • Igor Pavlov

    Igor Pavlov - 2023-08-30

    How was that APFS created?
    What software was used?

     
  • Robert Simpson

    Robert Simpson - 2023-08-30

    I've actually had a big breakthrough just now ... All this time I've been opening the file through the VMDK, and operating on the file within 7z through the VMDK handler. I extracted out the APFS file instead just a few minutes ago, and suddenly things are looking much better and I'm not getting the failures I was seeing before.

     
  • Igor Pavlov

    Igor Pavlov - 2023-08-30

    Did you use 7-zip to extract from vmdk?

     
  • Robert Simpson

    Robert Simpson - 2023-08-30

    Yes.

     
  • Robert Simpson

    Robert Simpson - 2023-08-30

    grrr ... may be a red-herring ... still digging. I had a defect in my code where it wasn't asking for streams until it got to subarchives, and the APFS file has no subarchives.

     
  • Robert Simpson

    Robert Simpson - 2023-08-30

    Bah, still a hot mess. I'll be less spammy in the future.

     
  • Robert Simpson

    Robert Simpson - 2023-08-30

    Still a hot mess. Attached are my code changes so far, which I'm sure still aren't right. Also included is a log from running that APFS file using SHOW_DEBUG_INFO.

     
  • Robert Simpson

    Robert Simpson - 2023-09-02

    I found a reference implementation of APFS here: https://github.com/Paragon-Software-Group/paragon_apfs_sdk_ce

    It's pretty clean, and seems to work really well. If anything, it's answered some questions where the Apple APFS documentation is open to interpretation.

     
