What are the main topics that distinguish an advanced developer from a merely effective Python programmer? You are good at Python when the code you write is elegantly simple and idiomatic.

Each language and community has its own way of solving certain kinds of problems. That specific way of doing things is what we call idiomatic. We want our code to be idiomatic because not only will we be writing code that is easier to understand, but we will also be solving problems with well-known and tested techniques. Being idiomatic means writing simple code that relies on existing solutions for common problems. We don't reinvent the wheel.

In this post I will describe the main topics that can make your code more idiomatic, and some advanced functionalities you need to be familiar with as an advanced Python developer.

Multiple Python versions

There will be situations where you need multiple versions of Python. You may be just fine using the default Python 2 or 3 of your system, but there are situations where a client or project requires a very specific version. You may also need to work on several projects, each of which uses a different version. In these scenarios you need a way to manage your Python versions. And this is not the same as managing dependency versions: I'm talking about the version of the Python language itself.

The solution to this problem is very simple: just use pyenv. With it, you will have any version you want at your disposal, very easily.

$ pyenv versions # lists all installed versions
$ pyenv install 3.7.4 # installs a specific version
$ pyenv global 3.7.4 # activates the specific version
$ pyenv local 3.7.4 # version for a directory

pyenv also installs the development headers that you will need when writing C/C++ extensions. But you shouldn't worry about their exact path: CMake's find_package will take care of that.

find_package(Python3 COMPONENTS Development)
target_include_directories(<project_name>
    PUBLIC ${Python3_INCLUDE_DIRS})

Dunder methods

Dunder, or magic, methods are methods that start and end with a double underscore, like __init__ or __str__. These methods are the mechanism we use to interact directly with Python's data model. A language's data model describes:

  • How values are stored in memory.
  • Object identity, equality and truthiness.
  • Name resolution, function/method dispatching.
  • Basic types, type/value composition.
  • Evaluation order, eagerness/laziness.

Basically, dunder methods allow us to interact with core concepts of the Python language. You can also see them as a mechanism for implementing behaviours, i.e. interface methods.
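
As a quick illustration, here is a minimal sketch of a class that hooks into the data model by implementing a few dunders (the Playlist class is made up for this example):

class Playlist:
    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):             # len(playlist)
        return len(self._songs)

    def __getitem__(self, index):  # playlist[i], also makes it iterable
        return self._songs[index]

    def __contains__(self, song):  # song in playlist
        return song in self._songs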

For a detailed description of many useful dunders and related concepts, I recommend reading this guide.

@ Function decorators

Decorators are nothing more than a special case of higher-order functions with @ syntax support; applying one is quite similar to function composition. We can use decorators not only on normal functions but also on class methods.

from functools import wraps

def add10(f):
    @wraps(f)
    def g(*args, **kwargs):
        return f(*args, **kwargs) + 10
    return g

@add10
def add1(a):
    return a + 1

add1(0)  # 11

wraps from functools is itself another decorator; it preserves the metadata of the original wrapped function.
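
We can see what it preserves by inspecting the decorated function:

>>> add1.__name__  # without @wraps this would be 'g', the wrapper's name
'add1'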

Interfaces

Interfaces help us enforce the implementation of certain characteristics by other code that commits to doing so.

One of the characteristics we can enforce is the definition of specific methods, much like we would do when defining a normal Java interface:

from abc import ABC, abstractmethod

class Animal(ABC):
    @abstractmethod
    def make_sound(self):
        return "indistinguishable noise"

class Cat(Animal):
    def make_sound(self):
        return "miauu"

class Dog(Animal):
    def make_something(self):
        return "eat"

We used the @abstractmethod decorator to enforce the definition of a specific method in child classes:

>>> Cat().make_sound()
'miauu'
>>> Dog()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Dog with abstract methods make_sound

But the previous enforcement required us to try to create an instance in the first place. If we wanted to be even stricter, we could use a metaclass to make the script fail while the class definition is being loaded.

class Animal(type):
    def __new__(cls, name, bases, body):
        if 'make_sound' not in body:
            raise TypeError('no make_sound method')
        return super().__new__(cls, name, bases, body)

class Cat(metaclass=Animal):
    def make_sound(self):
        return "miauu"

class Dog(metaclass=Animal):
    def make_something(self):
        return "eat"
 
Traceback (most recent call last):
  [...] class Dog(metaclass=Animal):
TypeError: no make_sound method

In the same way that instantiating a class creates an object, instantiating a metaclass creates a class. Metaclasses are a way of controlling the creation of classes. This example also indirectly shows that the __new__ dunder is responsible for creating the instance, while __init__ initializes the instance previously created by __new__.
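
A minimal sketch of that division of labour (the Point class is hypothetical):

class Point:
    def __new__(cls, *args, **kwargs):
        print("__new__ creates the instance")
        return super().__new__(cls)   # builds and returns the bare instance

    def __init__(self, x, y):
        print("__init__ initializes it")
        self.x, self.y = x, y         # fills in the already-created instance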

Since Python 3.6, instead of writing a __new__ method inside a metaclass, we can just use __init_subclass__. Our interface example then becomes the following:

from _collections_abc import _check_methods

class Animal():
    def __init_subclass__(cls, *args, **kwargs):
        if _check_methods(cls, 'make_sound') is NotImplemented:
            raise TypeError("make_sound not implemented")

class Cat(Animal):
    [...] # same as in the previous example

class Dog(Animal):
    [...] # same as in the previous example

If you use _check_methods you get extra style points, although keep in mind that it is a private helper from CPython's _collections_abc module, so it may change without notice.

Context manager

Context managers, or in mundane words: classes with __enter__ and __exit__ methods. Context managers give us support for the RAII pattern through the with syntax. An important thing to remember when implementing __exit__ is that you should check the exception values, because you get to choose whether or not to propagate the exception that happened inside the with block. If you return a true value, the exception is suppressed. But under no circumstance are you expected to re-raise the exception inside the __exit__ method.
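
As an illustration, a minimal sketch of an __exit__ that inspects the exception and suppresses only one specific type (the class name is made up):

class IgnoreKeyErrors:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # returning True suppresses the exception; a falsy value propagates it
        return exc_type is not None and issubclass(exc_type, KeyError)

with IgnoreKeyErrors():
    {}['missing']  # the KeyError is swallowed here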

For example, let's suppose we had nothing better to do than to use the low-level http.client library; we could wrap HTTPConnection inside a context manager:

from http.client import HTTPConnection
from contextlib import AbstractContextManager # not really necessary but looks cool

class Conn(AbstractContextManager):
    def __init__(self, host):
        self.host = host

    def __enter__(self):
        self.conn = HTTPConnection(self.host, 80)
        return self.conn

    def __exit__(self, *args):
        self.conn.close()

with Conn('example.com') as conn:
    conn.request('GET', '/')
    res = conn.getresponse()
    print(res.status, res.reason)

That code is very verbose; we can fix that using the contextlib module (I recommend reading its whole documentation). If we want, we can use a generator instead of a full AbstractContextManager class implementation:

from http.client import HTTPConnection
from contextlib import contextmanager

@contextmanager
def Conn(host):
    conn = HTTPConnection(host, 80)
    try:
        yield conn
    finally:
        conn.close()

We can also get rid of the Conn class completely with closing:

from contextlib import closing

with closing(HTTPConnection("example.com", 80)) as conn:
    conn.request('GET', '/')
    res = conn.getresponse()
    print(res.status, res.reason)

Asynchronous programming

When we use async and await we are doing cooperative concurrency (not parallelism). You may want to check some documentation or tutorials if you are not familiar with those terms.
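
For reference, here is a minimal sketch of that cooperative model, with a hypothetical greet coroutine:

import asyncio

async def greet(name):
    await asyncio.sleep(1)  # yields control back to the event loop
    print("hello", name)

loop = asyncio.get_event_loop()
# both coroutines make progress during each other's sleep: ~1s total, not 2
loop.run_until_complete(asyncio.gather(greet("a"), greet("b")))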

In practice we have a bunch of async functions and an event loop, and that's it. But what happens when you actually want to perform a truly parallel operation? What if some important client wants some custom, crazy, highly performant network code? How can we write low-level parallel code that, from the point of view of Python, appears to be asynchronous code?

First we need a way to turn blocking code into a coroutine. The following make_async function does exactly that:

import asyncio
import inspect
from functools import wraps
from concurrent.futures import ThreadPoolExecutor

def make_async(g):
    @wraps(g)
    async def f(*args, **kwargs):
        loop = asyncio.get_event_loop()
        # run the blocking g in a worker thread and await its result
        return await loop.run_in_executor(
            ThreadPoolExecutor(),
            lambda: g(*args, **kwargs)
        )
    # rebind g's name in the caller's module to the async wrapper
    frm = inspect.stack()[1]
    mod = inspect.getmodule(frm[0])
    setattr(mod, g.__name__, f)
A very neat function, right? Yehaa! But this will only truly run in parallel if the g function releases the GIL. The following code, which uses pybind11, defines a C++ module x with a send_message function that releases the GIL internally.

#include <pybind11/pybind11.h>
#include <thread>
#include <chrono>

std::string send_message(std::string input)
{
    pybind11::gil_scoped_release release; // GIL RELEASE
    std::this_thread::sleep_for(std::chrono::seconds(1));
    return input + " done!";
}

PYBIND11_MODULE(x, m) {
    m.def("send_message", &send_message, 
          "sends something though the network");
}

The pybind11::gil_scoped_release class releases the GIL when it is constructed and reacquires it when it is destroyed, at the end of the function call.

import asyncio
from x import send_message

make_async(send_message) # using the function from above

async def send(msg):
    print("sending", msg)
    result = await send_message(msg)
    print("sent", result)
    return result

loop = asyncio.get_event_loop()
to_send = [loop.create_task(send(str(i))) for i in range(3)]
loop.run_until_complete(asyncio.wait(to_send))

And the output is what you would expect:

sending 0
sending 1
sending 2
sent 0 done!
sent 1 done!
sent 2 done!

Profiling

Profiling is one of those techniques we use when we have really fucked something up. It's a great tool to know for when you are suffering trying to figure out why you are getting esoteric crashes, or why something isn't working as it should.

For call trees and CPU time we can use cProfile and KCacheGrind:

$ python -m cProfile -o script.profile main.py
$ pyprof2calltree -i script.profile -o script.calltree
$ kcachegrind script.calltree

But cProfile, profile and hotshot aren't that useful if we have multi-threaded code, or if a bottleneck is generated by non-explicit function calls. A much more effective profiler is yappi; you won't go back to cProfile after playing around with it. Don't take my word for it: the PyCharm IDE uses yappi by default if you have it installed.

To use yappi we need to add some code to our script:

import yappi
yappi.start(builtins=True)

# YOUR CODE GOES HERE
# a context manager would be great for this

func_stats = yappi.get_func_stats()
func_stats.save('script.calltree', 'CALLGRIND')
yappi.stop()
yappi.clear_stats() 
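
Since that start/save/stop/clear dance is pure boilerplate, the context manager hinted at above fits nicely; a minimal sketch (the profiled name is made up):

from contextlib import contextmanager
import yappi

@contextmanager
def profiled(path):
    yappi.start(builtins=True)
    try:
        yield
    finally:
        yappi.get_func_stats().save(path, 'CALLGRIND')
        yappi.stop()
        yappi.clear_stats()

with profiled('script.calltree'):
    pass  # YOUR CODE GOES HERE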

After the profiling ends, we can open the resulting file with:

$ kcachegrind script.calltree

Another important aspect of profiling is recording memory usage.

We can take a memory snapshot at any moment with pympler:

from pympler import muppy, summary

all_objects = muppy.get_objects()
summary.print_(summary.summarize(all_objects), limit=100)

The main features of pympler can be accessed through ClassTracker.
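
For example, a minimal sketch of tracking the memory consumed by instances of a class over time (the Record class is made up):

from pympler.classtracker import ClassTracker

class Record:
    pass

tracker = ClassTracker()
tracker.track_class(Record)    # watch every Record instance
tracker.create_snapshot()

records = [Record() for _ in range(1000)]

tracker.create_snapshot()
tracker.stats.print_summary()  # prints the size evolution per snapshot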

Network analysis

Some applications are very hard to understand and we need to start treating them as black boxes. Or maybe we have a very obscure problem when we send data through the network.

The most practical way of analyzing this would be to modify your software to record every request it receives or sends; in a perfect world you would want to be able to turn that feature on and off in production.

But most mortals don't understand their own systems well enough, nor want to make that time investment. Even in that case, there are things we can do:

  • Incoming traffic: ensure that our server receives plain HTTP traffic. We can do this by sitting behind a load balancer or a reverse proxy, so that we keep serving HTTPS to the outside while we can simply use wireshark to read the incoming traffic.
  • Outgoing traffic: we need a proxy, and if we are making HTTPS requests we also need to install custom certificates so we can be "the man in the middle". This is what mitmproxy is for.

In normal scenarios, where you understand and control the codebase, you should be logging and analysing the traffic internally without relying on the previous tricks, especially because you may not be able to do "mitm attacks" on a production server under heavy load without slowing everything down.

Logs are your best friend...

Logging

Most of the time, logging just works and you shouldn't worry. But under heavy load we can approach logging in the following ways:

  • Don't log at all, and only use metrics instead. Or,
  • Send the logs through the network: logging locally will put your server and your code under load, and you may need code specially designed to handle the logging.
  • Avoid logging to a disk that you use for something else: don't put extra I/O load on a disk that other tasks depend on.
  • If you do want to write your local logs yourself, please ensure that you rotate them. You could use RotatingFileHandler (a minimal sketch follows this list), but logrotate is better.
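
Here is a minimal RotatingFileHandler sketch, with made-up file name and limits:

import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger("app")
# keep app.log at ~1 MB, plus up to 5 rotated files: app.log.1 ... app.log.5
handler = RotatingFileHandler("app.log", maxBytes=1_000_000, backupCount=5)
logger.addHandler(handler)
logger.warning("something happened")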

Obviously, how you approach any logging problem will depend on how often you log, and on how important and how big the logs are. In most cases you can just log and forget it exists.