
gh-145980: Add support for alternative alphabets in the binascii module #145981

Open
serhiy-storchaka wants to merge 4 commits into python:main from serhiy-storchaka:binascii-alphabet

serhiy-storchaka commented Mar 15, 2026

  • Add the alphabet parameter to the functions b2a_base64(), a2b_base64(), b2a_base85() and a2b_base85().
  • Add a number of "*_ALPHABET" constants.
  • Remove b2a_z85() and a2b_z85().

📚 Documentation preview 📚: https://cpython-previews--145981.org.readthedocs.build/
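
To illustrate what an alternative alphabet means for the Base64 functions, here is a minimal sketch that runs on released Python versions, which lack the proposed alphabet parameter. It encodes with the standard alphabet and remaps the output with bytes.translate(). The helper names and the literal alphabet constants are assumptions for illustration only, not the PR's implementation.

```python
import binascii

# Standard RFC 4648 Base64 alphabet and its URL-safe variant, spelled out
# literally (assumption: the PR would expose such values as binascii constants).
STANDARD = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
URLSAFE = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def _check_alphabet(alphabet):
    # Mirror the kind of validation exercised by the PR's tests.
    if not isinstance(alphabet, bytes):
        raise TypeError("alphabet must be bytes")
    if len(alphabet) != 64:
        raise ValueError("alphabet must contain exactly 64 characters")


def b2a_base64_with_alphabet(data, alphabet):
    """Hypothetical helper: encode with the standard alphabet, then remap."""
    _check_alphabet(alphabet)
    encoded = binascii.b2a_base64(data, newline=False)
    return encoded.translate(bytes.maketrans(STANDARD, alphabet))


def a2b_base64_with_alphabet(data, alphabet):
    """Hypothetical helper: remap to the standard alphabet, then decode."""
    _check_alphabet(alphabet)
    remapped = data.translate(bytes.maketrans(alphabet, STANDARD))
    return binascii.a2b_base64(remapped)
```

Unlike a native implementation, this remapping leaves characters outside the given alphabet untouched, so it does not reject input that mixes alphabets.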


.. data:: BASE64_ALPHABET

The Base 64 alphabet according to :rfc:`4648`.

Let's add a ".. versionadded:: next" directive as well.


.. data:: UU_ALPHABET

The Uuencoding alphabet.

I think it should be lowercase "uuencoding" (for Unix-to-Unix encoding) rather than "Uuencoding". Maybe also link the Wikipedia page?


.. data:: XX_ALPHABET

The Xxencoding alphabet.

Ditto: it should be lowercase "xxencoding".

Comment on lines +156 to +157
return binascii.b2a_base64(s, newline=False,
                           alphabet=binascii.URLSAFE_BASE64_ALPHABET)

For clarity in the code, maybe have an _URLSAFE_BASE64_ALPHABET global variable?

Comment on lines +49 to +63
def test_constants(self):
    for name in ('BASE64_ALPHABET', 'URLSAFE_BASE64_ALPHABET',
                 'CRYPT_ALPHABET', 'BCRYPT_ALPHABET',
                 'UU_ALPHABET', 'XX_ALPHABET',
                 'BINHEX_ALPHABET'):
        value = getattr(binascii, name)
        self.assertIsInstance(value, bytes)
        self.assertEqual(len(value), 64)
        self.assertEqual(len(set(value)), 64)
    for name in ('BASE85_ALPHABET', 'ASCII85_ALPHABET',
                 'Z85_ALPHABET'):
        value = getattr(binascii, name)
        self.assertIsInstance(value, bytes)
        self.assertEqual(len(value), 85)
        self.assertEqual(len(set(value)), 85)

Maybe use some helper method:

def check_alphabet(self, name, size):
    with self.subTest(name=name):
        value = getattr(binascii, name)
        self.assertIsInstance(value, bytes)
        self.assertEqual(len(value), size)
        self.assertEqual(len(set(value)), size)
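
As a rough sketch of how test_constants() might call such a helper, the following self-contained example checks a single constant. Because released Python versions do not have the new constants, it uses a stand-in namespace (fake_binascii, an assumption for illustration) holding only the RFC 4648 alphabet; the real test would read from binascii and cover every constant.

```python
import types
import unittest

# Stand-in for the binascii module with the PR applied (assumption: only
# BASE64_ALPHABET is modelled here, spelled out per RFC 4648).
fake_binascii = types.SimpleNamespace(
    BASE64_ALPHABET=(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                     b"abcdefghijklmnopqrstuvwxyz"
                     b"0123456789+/"),
)


class AlphabetConstantsTest(unittest.TestCase):
    def check_alphabet(self, name, size):
        # subTest() reports each constant separately if a check fails.
        with self.subTest(name=name):
            value = getattr(fake_binascii, name)
            self.assertIsInstance(value, bytes)
            self.assertEqual(len(value), size)
            self.assertEqual(len(set(value)), size)

    def test_constants(self):
        # The real test would list all 64-character and 85-character names.
        for name in ('BASE64_ALPHABET',):
            self.check_alphabet(name, 64)
```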

Comment on lines +339 to +357
with self.assertRaises(TypeError):
    binascii.b2a_base64(data, alphabet=None)
with self.assertRaises(TypeError):
    binascii.a2b_base64(data, alphabet=None)
with self.assertRaises(TypeError):
    binascii.b2a_base64(data, alphabet=alphabet.decode())
with self.assertRaises(TypeError):
    binascii.a2b_base64(data, alphabet=alphabet.decode())
with self.assertRaises(TypeError):
    binascii.a2b_base64(data, alphabet=bytearray(alphabet))
with self.assertRaises(ValueError):
    binascii.b2a_base64(data, alphabet=alphabet[:-1])
with self.assertRaises(ValueError):
    binascii.a2b_base64(data, alphabet=alphabet[:-1])
with self.assertRaises(ValueError):
    binascii.b2a_base64(data, alphabet=alphabet+b'?')
with self.assertRaises(ValueError):
    binascii.a2b_base64(data, alphabet=alphabet+b'?')


Can we refactor this into a helper method, so that if we need to add a validation check later we don't have to update it in two places?

For instance, there is a missing test case for binascii.b2a_base64(data, alphabet=bytearray(alphabet)) (only a2b_base64 is tested here).
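
One possible shape for such a helper is sketched below. It runs the same table of invalid alphabets against every function, so a case added once applies to both directions. The stand-in functions (fake_b2a_base64, fake_a2b_base64) are assumptions that model only the alphabet checks; the real test would pass binascii.b2a_base64 and binascii.a2b_base64, and the sketch assumes both reject a bytearray with TypeError.

```python
import unittest


def _validate(alphabet):
    # Models only the argument checks, not real encoding or decoding.
    if not isinstance(alphabet, bytes):
        raise TypeError("alphabet must be bytes")
    if len(alphabet) != 64:
        raise ValueError("alphabet must contain exactly 64 characters")


def fake_b2a_base64(data, *, alphabet):
    _validate(alphabet)
    return data


def fake_a2b_base64(data, *, alphabet):
    _validate(alphabet)
    return data


class AlphabetValidationTest(unittest.TestCase):
    def check_invalid_alphabets(self, func, data, alphabet):
        # One table of (expected exception, bad alphabet) shared by all funcs.
        cases = [
            (TypeError, None),
            (TypeError, alphabet.decode()),
            (TypeError, bytearray(alphabet)),
            (ValueError, alphabet[:-1]),
            (ValueError, alphabet + b'?'),
        ]
        for exc, bad in cases:
            with self.subTest(func=func.__name__, alphabet=bad):
                with self.assertRaises(exc):
                    func(data, alphabet=bad)

    def test_invalid_alphabets(self):
        alphabet = (b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                    b"abcdefghijklmnopqrstuvwxyz"
                    b"0123456789+/")
        for func in (fake_b2a_base64, fake_a2b_base64):
            self.check_invalid_alphabets(func, b"data", alphabet)
```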

with self.assertRaises(TypeError):
    binascii.a2b_base85(data, alphabet=alphabet.decode())
with self.assertRaises(TypeError):
    binascii.a2b_base85(data, alphabet=bytearray(alphabet))

Ditto

static PyObject *
get_reverse_table(binascii_state *state, PyObject *alphabet, int size, int padchar)
{
    PyObject *reverse_table;

I'd suggest having a goto error for cleanup in case this function grows.
