Porting – Adding Support for Python 3

After you modernize your C extension to use the latest features available in Python 2, it is time to address the differences between Python 2 and 3.

The recommended way to port is keeping single-source compatibility between Python 2 and 3, until support Python 2 can be safely dropped. For Python code, you can use libraries like six and future, and, failing that, if sys.version_info >= (3, 0): blocks for conditional code. For C, the py3c library provides common tools, and for special cases you can use conditional compilation with #if IS_PY3.

To start using py3c, #include <py3c.h>, and instruct your compiler to find the header.

The Bytes/Unicode split

The most painful change for extension authors is the bytes/unicode split: unlike Python 2’s str or C’s char*, there is a sharp divide between human-readable strings and binary data. You will need to decide, for each string value you use, which of these two types you want.

Make the division as sharp as possible: mixing the types tends to lead to utter chaos. Function that takes both Unicode strings and bytes should be rare, and should generally be convenience functions in your interface; not code deep in the internals.

With py3c, the human-readable strings are PyStr_* (PyStr_FromString, PyStr_Type, PyStr_Check, etc.). They correspond to PyString on Python 2, and PyUnicode on Python 3. The supported API is the intersection of PyString_* and PyUnicode_*, except PyStr_Size (see below) and the deprecated PyUnicode_Encode; additionally PyStr_AsUTF8String is defined.

For binary data, use PyBytes_* (PyBytes_FromString, PyBytes_Type, PyBytes_Check, etc.). These correspond to PyString on Python 2, and Python 3 provides them directly. The supported API is the intersection of PyString_* and PyBytes_*,

Porting mostly consists of replacing “PyString_” to either “PyStr_” or “PyBytes_”; just see the caveat about size below.

You might meet two more string types. One is PyUnicode_*, which is provided by both Python versions directly, and should be used wherever you used PyUnicode in Python 2 code already. The other is PyString_*, the Python 2 type used to store both kinds of stringy data. This type is not in Python 3, and must be replaced.

To summarize:

String kind py2 py3 Use
PyStr_* PyString_* PyUnicode_* Human-readable text
PyBytes_* PyString_* Binary data
PyUnicode_* Unicode strings
PyString_* error In unported code

String size

When dealing with Unicode strings, the concept of “size” is tricky, since the number of characters doesn’t necessarily correspond to the number of bytes in the string’s UTF-8 representation.

To prevent subtle errors, this library does not provide a PyStr_Size function.

Instead, use PyStr_AsUTF8AndSize. This functions like Python 3’s PyUnicode_AsUTF8AndSize, except under Python 2, the string is not encoded (as it should already be in UTF-8), the size pointer must not be NULL, and the size may be stored even if an error occurs.

Ints

While string type is split in Python 3, the int is just the opposite: int and long were unified. PyInt_* is gone and only PyLong_* remains (and, to confuse things further, PyLong is named “int” in Python code). The py3c headers alias PyInt to PyLong, so if you’re using them, there’s no need to change at this point.

Argument Parsing

The format codes for argument-parsing functions of the PyArg_Parse family have changed somewhat.

In Python 3, the s, z, es, es# and U (plus the new C) codes accept only Unicode strings, while c and S only accept bytes.

Formats accepting Unicode strings usually encode to char* using UTF-8. Specifically, these are s, s*, s#, z, z*, z#, and also es, et, es#, and et# when the encoding argument is set to NULL. In Python 2, the default encoding was used instead.

There is no variant of z for bytes, which means htere’s no built-in way to accept “bytes or NULL” as a char*. If you need this, write an O& converter.

Python 2 lacks an y code, which, in Python 3, works on byte objects. The use cases needing bytes in Python 3 and str in Python 2 should be rare; if needed, use #ifdef IS_PY3 to select a compatible PyArg_Parse call.

Compare the Python 2 and Python 3 docs for full details.

Module initialization

The module creation process was overhauled in Python 3. py3c provides a compatibility wrapper so most of the Python 3 syntax can be used.

PyModuleDef and PyModule_Create

Module object creation with py3c is the same as in Python 3.

First, create a PyModuleDef structure:

static struct PyModuleDef moduledef = {
    PyModuleDef_HEAD_INIT,
    .m_name = "spam",
    .m_doc = PyDoc_STR("Python wrapper for the spam submodule."),
    .m_size = -1,
    .m_methods = spam_methods,
};

Then, where a Python 2 module would have

m = Py_InitModule3("spam", spam_methods, "Python wrapper ...");

use instead

m = PyModule_Create(&moduledef);

For m_size, use -1. (If you are sure the module supports multiple subinterpreters, you can use 0, but this is tricky to achieve portably.) Additional members of the PyModuleDef structure are not accepted under Python 2.

See Python documentation for details on PyModuleDef and PyModule_Create.

Module creation entrypoint

Instead of the void init<name> function in Python 2, or a Python3-style PyObject *PyInit_<name> function, use the MODULE_INIT_FUNC macro to define an initialization function, and return the created module from it:

MODULE_INIT_FUNC(name)
{
    ...
    m = PyModule_Create(&moduledef);
    ...
    if (error) {
        return NULL;
    }
    ...
    return m;
}

Other changes

If you find a case where py3c doesn’t help, use #if IS_PY3 to include code for only one or the other Python version. And if your think others might have the same problem, consider contributing a macro and docs to py3c!

Building

When building your extension, note that Python 3.2 introduced ABI version tags (PEP 3149), which can be added to shared library filenames to ensure that the library is loaded with the correct Python version. For example, instead of foo.so, the shared library for the extension module foo might be named foo.cpython-33m.so.

Your buildsystem might generate these for you already, but if you need to modify it, you can get the tags from systonfig:

>>> import sysconfig
>>> sysconfig.get_config_var('EXT_SUFFIX')
'.cpython-34m.so'
>>> sysconfig.get_config_var('SOABI')
'cpython-34m'

This is completely optional; the old filenames without ABI tags are still valid.

Done!

Do your tests now pass under both Python 2 and 3? (And do you have enough tests?) Then you’re done porting!

Once you decide to drop compatibility with Python 2, you can move to the Cleanup section.