It turns out that using str(), list(), dict() and tuple() for creating empty sequences isn't as fast as their shorthand counterparts ('', [], {}, ()). Plotting the differences with this code gives the following graphs:

from timeit import timeit
import matplotlib.pyplot as plt
import sys

title = '{type_} creation time in Python {version}'
version = '%d.%d.%d' % sys.version_info[:3]
executions = [10**i for i in range(10)]
comparisons = (
    (str, ('s=""', 's=str()')),
    (list, ('l=[]', 'l=list()')),
    (dict, ('d={}', 'd=dict()')),
    (tuple, ('t=()', 't=tuple()')),
)

for i, (type_, statements) in enumerate(comparisons):
    plt.figure(i)
    plt.title(title.format(type_=type_.__name__, version=version))
    plt.xlabel('Number of Executions')
    plt.ylabel('Execution Duration (s)')
    plt.yscale('log')
    plt.xscale('log')
    for statement in statements:
        xaxis = [timeit(statement, number=n) for n in executions]
        plt.plot(xaxis, executions, label=statement)
    plt.legend(statements)

plt.show()

Plot of the difference between d={} and d=dict() in Python 3.6.2 drawn with matplotlib

Plot of the difference between l=[] and l=list() in Python 3.6.2 drawn with matplotlib

Plot of the difference between s='' and s=str() in Python 3.6.2 drawn with matplotlib

Plot of the difference between t=() and t=tuple() in Python 3.6.2 drawn with matplotlib

There appears to be a trend. First off the number of objects created and the time it takes seems to be a linear function. Second off; It's clear that the shorthands are always faster. Running the code multiple times gives consistent results.

Why are they faster? It's actually fairly simple, why there is a difference. We can inspect what happens with the dis module:

>>> import dis
>>> def f(): s=''
>>> dis.dis(f)
  2           0 LOAD_CONST               1 ('')
              2 STORE_FAST               0 (s)
              4 LOAD_CONST               0 (None)
              6 RETURN_VALUE

>>> def f(): s=str()
>>> dis.dis(f)
  2           0 LOAD_GLOBAL              0 (str)
              2 CALL_FUNCTION            0
              4 STORE_FAST               0 (s)
              6 LOAD_CONST               0 (None)
              8 RETURN_VALUE

>>> def f(): d={}
>>> dis.dis(f)
  2           0 BUILD_MAP                0
              2 STORE_FAST               0 (d)
              4 LOAD_CONST               0 (None)
              6 RETURN_VALUE

>>> def f(): d=dict()
>>> dis.dis(f)
  2           0 LOAD_GLOBAL              0 (dict)
              2 CALL_FUNCTION            0
              4 STORE_FAST               0 (d)
              6 LOAD_CONST               0 (None)
8 RETURN_VALUE

The dis module disassembles CPython bytecode. What you see in all caps are the opcodes that CPython executes.

So when you call str(), list(), dict() or tuple() Python has to do two things: first load the global function, second call the function (and that function then has to execute). However, when you use the shothands, Python only has to call a fast opcode like LOAD_CONST or BUILD_MAP.

What can we learn from this? LOAD_GLOBAL is slow because it first has to check for the function (in this case) in the local namespace, then in the module namespace and lastly the global (builtin) namespace. This means, that if you have some code that loads multiple globals and you have to call it hundred of thousands of times, you can try to avoid the globals before opting for Cython, C, Rust or whatever is cool tomorrow (you should check out PyPy).

Not too long ago I had a discussion with Akuli about the fastest way to compute the length (number of digits) of a natural number (f(1)=1, f(2)=1, f(101)=3). Akuli was smarter than me and suggested using log10. Here I compare how eliminating global scope affects the running time:

>>> import math
>>> from timeit import timeit
>>> def akuli(n):
...    return math.floor(math.log10(n) + 1)
...
>>> def no_globals(n, floor=math.floor, log=math.log10):
...    return floor(log(n) + 1)
...
>>> timeit('[akuli(i) for i in range(1, 1111111)]', setup='from __main__ import akuli', number=1000)
767.7599000299997
...
>>> timeit('[no_globals(i) for i in range(1, 1111111)]', setup='from __main__ import no_globals', number=1000)
600.506064831

The function is a whooping ~21.79% faster with this simple speed up (compiled language advocates hate him). The naive version (and mine...) and perhaps the most intuitive runs like this with no globals:

>>> def naive(n, length=len, string=str):
...     return length(string(n))
>>> timeit('[naive(i) for i in range(1, 1111111)]', setup='from __main__ import naive', number=1000)
374.48799204826355

Wait, what ? It's faster! It's ~37.64% faster! I honestly don't know why. Perhaps someone can tell me my benchmarks are wrong, len and str are insanely optimised or that log10 from C somehow is slow.

Read part 2: How to find the length of a natural number in Python.

This post was inspired by Comparing blank string definition in Python3.