Description
Bug report
After the ignorechars parameter was added to the Base64 decoder (see #144001), decoding in non-strict mode is almost equivalent to decoding with ignorechars set to include all characters. There is one exception: in non-strict mode, the first valid padding stops decoding, and any data that follows is silently ignored. This leads to issues like #137687.
This contradicts RFC 4648, section 3.3, which only allows the pad character to be ignored if it is present before the end of the encoded data:
Furthermore, such specifications MAY ignore the pad
character, "=", treating it as non-alphabet data, if it is present
before the end of the encoded data. If more than the allowed number
of pad characters is found at the end of the string (e.g., a base 64
string terminated with "==="), the excess pad characters MAY also be
ignored.
b'YW==Jj' and b'YWJ=j' should both be decoded to b'abc', not to b'a' and b'ab' respectively.
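For illustration, here is a minimal sketch of the lenient behavior the RFC permits. It treats every pad character as non-alphabet data and re-pads only at the very end. The helper name `decode_ignoring_inner_padding` is hypothetical and not part of the `base64` module. It handles only the pad character, not other non-alphabet bytes.

```python
import base64


def decode_ignoring_inner_padding(data: bytes) -> bytes:
    """Hypothetical helper: ignore '=' wherever it appears, then decode.

    This is a sketch of the RFC 4648 section 3.3 reading, not the
    stdlib implementation.
    """
    # Treat every pad character as non-alphabet data and drop it.
    stripped = data.replace(b"=", b"")
    # Restore the padding the decoder needs to see at the end.
    padded = stripped + b"=" * (-len(stripped) % 4)
    return base64.b64decode(padded)


if __name__ == "__main__":
    for sample in (b"YW==Jj", b"YWJ=j", b"YWJj"):
        print(sample, decode_ignoring_inner_padding(sample))
```

With this sketch, both inputs from the report decode to the same bytes as the canonical encoding b'YWJj'.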
So, how are we going to fix this issue? There are three options. We can simply change the default behavior: this may be a breaking change, but it is a bug fix, and it only breaks incorrect behavior. We can start the long process of emitting a FutureWarning and then change the behavior a few releases later. Or we can add a new option to alter the behavior and start emitting a FutureWarning by default.
In 3.15+, we can simply pass the ignorechars argument to enable RFC 4648 compliant lenient behavior. The open question is what the default behavior should be.
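As a rough sketch of what the FutureWarning route could look like from the caller's side, the wrapper below warns whenever non-pad data follows a pad character and otherwise defers to the existing decoder. The name `b64decode_warning_on_trailing_data` and the warning text are hypothetical. The check is a simplification: it does not reproduce the decoder's exact notion of "first valid padding".

```python
import base64
import warnings


def b64decode_warning_on_trailing_data(data: bytes) -> bytes:
    """Hypothetical wrapper: warn if anything but padding follows '='."""
    first_pad = data.find(b"=")
    if first_pad != -1:
        # Drop pad characters and line breaks; anything left is data
        # that appears after the first pad character.
        trailing = data[first_pad:].translate(None, b"=\r\n")
        if trailing:
            warnings.warn(
                "data after Base64 padding may be decoded differently "
                "in a future release",
                FutureWarning,
                stacklevel=2,
            )
    # Decoding itself is left to the existing non-strict decoder.
    return base64.b64decode(data)
```

Well-formed input such as b'YQ==' passes through silently. Input such as b'YW==Jj' triggers the warning and is still decoded by the existing decoder.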