Yes, I think that would be an excellent first step. A new data type that is recognized as a string, and looks like a string within YAML when it is composed entirely of valid characters, but that can also accept invalid bytes, would allow the representation of strings from many sources that might contain invalid byte sequences. The canonical values (not the value of the data type or its scalar representation, but the representation that YAML uses when testing for equality) might need some normalization, but deciding that is part of defining the new data type.
I do disagree with having the % sign apply only if the next two characters form a hexadecimal number of 0x80 or greater. My objection, though, concerns unquoted scalar values, which don't have a native way to escape characters. I don't see a reason to deny them byte values below 0x80 if invalid bytes are being escaped anyway, but this is not a major limitation and I assume you have a reason.
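To make the escaping idea concrete, here is a rough Python 3 sketch of how a processor might expand % escapes in a scalar of the new type. The function name is made up for illustration, and the rule it applies (any % followed by two hex digits becomes one raw byte, including values below 0x80) is the variant I argued for above, not something any spec defines.

```python
import re

# Assumed escape syntax: "%" followed by exactly two hex digits
# stands for one raw byte. A "%" not followed by two hex digits
# is left alone.
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def decode_percent_escapes(text: str) -> bytes:
    """Hypothetical helper: turn scalar text into raw bytes.

    The text is first encoded as UTF-8, then every %XX escape is
    replaced by the single byte it names.
    """
    raw = text.encode("utf-8")
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
```

With this rule, `My%82Third File` yields a byte string containing the invalid byte 0x82, while a literal percent sign can be written as `%25`.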
Moreover, once it's known that we are not talking about the existing YAML string type but a different string type, the limitation that the decoded value must be a valid Unicode string (to avoid breaking existing applications) disappears. The new string type wouldn't have that contractual obligation to fulfill. Then it's worth asking the spec writers how the new data type's scalar representation can be made more efficient or aesthetic in a future version of YAML. And I have some ideas about that:
1) To address the tag issue, all that would be needed is a way to say scalars default to the new data type rather than to YAML strings. This is something I would like for other reasons and could perhaps be done with a directive. It also opens up the possibility for others to establish their own default data type for scalars. If the override were based on the type of scalar (quoted, plain, or block), then it would also be possible to do something like the following:
Of course, in this example, it isn't that big a deal to add the !!binary tag, but that's the basic idea (if a bit crude... %DEFAULTBLOCKSCALAR is rather wordy).
My own reason for wanting such a tag would look something more like:
   Name: Great Foyer
      You are standing in the great foyer. All around you
      are white marble statues. A warm fire flickers in the
      There are doors to the north, south, and east.
That doesn't illustrate why I suggested overriding based on the type of scalar, however. The observant among you will have noticed that the . has meaning. And my actual usage is far more complex, using a wiki-style markup (and not for a text game), but you get the idea. You'll also note that it's meant to be a custom data type, not a standard one.
And, for something similar to your problem (assuming the utf-8 data type gets defined):
- Filename: My First File
  Action: compile
- Filename: My Second File
  Action: copy
- Filename: My%82Third File
  Action: compile
2) My second idea, another way to handle the tag issue, would be to allow one of the special characters to be assigned as a shortcut for a complete tag using something similar to a preprocessor directive again. It would look something like the following:
- Filename: My First File
  Action: compile
- Filename: @ My Second File
  Action: copy
- Filename: @ My%82Third File
  Action: compile
In my example, the %ATTYPE directive is assigning the @ sign to be a shorthand for the tag (which could also be written !!utf-8 if the !! prefix hasn't been reassigned).
3) My third idea isn't actually my own... it's yours. Once it's been established that we are dealing with a data type other than the default string, there's no reason the scalar's value has to be a valid Unicode string (again, because it isn't breaking any contracts... it was always meant to have some invalid bytes). Of course, the details of how it is handled in each step of the YAML processor would still need to be carefully considered, but we'd know that the data is valid on both ends of that process. It is validly encoded within the YAML stream (or file) and it is a valid string as far as the application is concerned (because its type warned the application that it wasn't necessarily a valid Unicode string). That puts us a lot closer to finding something that works for everyone (well... at least more of us). In talking about how to encode such a data type, rather than a normal YAML string, I don't think you would encounter so much resistance. Even without a native escape sequence, though, having a standard data type that can store a string that might contain invalid bytes will give libraries and applications a way to encode that data in a way that is still human readable.
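For the encoding direction, a minimal Python 3 sketch of what a library could do when dumping such a value: valid UTF-8 stays readable and only the offending bytes are written as %XX. The function name is hypothetical, and escaping every literal % as %25 is my own assumption so that the text can be decoded back without ambiguity.

```python
def encode_with_escapes(raw: bytes) -> str:
    """Hypothetical helper: render bytes as mostly readable text.

    Bytes that are not valid UTF-8 are written as %XX. A literal
    "%" is written as %25 (an assumption made here so that the
    result can be decoded unambiguously).
    """
    # surrogateescape maps each undecodable byte 0xNN to the lone
    # surrogate U+DCNN, which lets us spot it character by character.
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append("%%%02X" % (cp - 0xDC00))
        elif ch == "%":
            out.append("%25")
        else:
            out.append(ch)
    return "".join(out)
```

Only the bad byte gets obscured; the rest of the name is still something a person can read and edit.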
The one big caveat to a new data type is that the application needs to know how to deal with it, but, actually, that isn't so hard. In Python, the most appropriate action for a Python processor may be to return a byte string (old style string, I think it's called?) if the scalar is of the new data type and a unicode string otherwise. In this case, the application using the processor would just get strings of one kind or the other. Or, a smart library writer (in Python or any other language) might expose an option so that both types are returned by the processor using the same data type as much as possible. Then, the application wouldn't need to deal with it as anything other than an invalid scalar if it didn't convert. Either way, the necessary signal is there for the application to decide what to do with it (most likely, using options in a library).
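As a sketch of that behaviour in Python 3 terms (where `bytes` plays the role of the byte string and `str` the unicode string): the function below, its `prefer_text` option, and the expanded form of the !!utf-8 tag are all assumptions for illustration, not part of any existing YAML library.

```python
import re

STR_TAG = "tag:yaml.org,2002:str"
# Hypothetical tag for the proposed type, i.e. what !!utf-8 would
# expand to under the default !! prefix.
UTF8_TAG = "tag:yaml.org,2002:utf-8"

_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def construct_scalar(tag: str, text: str, prefer_text: bool = False):
    """Hypothetical constructor: pick the Python type from the tag.

    Ordinary strings come back as str. Scalars of the new type come
    back as bytes, unless prefer_text is set and the decoded bytes
    happen to be valid UTF-8, in which case str is returned too.
    """
    if tag != UTF8_TAG:
        return text
    raw = _ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 16)]), text.encode("utf-8")
    )
    if prefer_text:
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            pass  # fall through and hand back the raw bytes
    return raw
```

With `prefer_text=True`, an application only ever sees `bytes` for the values that really do contain invalid sequences, which is the signal it needs to decide what to do with them.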
Now, for those who might wonder why not just use a binary, I think William was making an excellent point about its human readability. Why obscure the meaning of the entire message for just a few bytes? But, if your application demands that you can't lose those bytes, then you need some other alternative. I would add to this that it serves as a signal to the application and processor that the data is supposed to be a string, if they are capable of representing/handling it in that way. Otherwise, they can treat it as a binary, or whatever. The intent is known so the library and application can decide.