
How to Profile and Speed Up Your Python Code

Python Standard Library - This article is part of a series.
Part 6: This Article

Optimising Python code without profiling is like navigating a maze blindfolded: you might get lucky, but you’ll probably waste time.

Profiling is the process of measuring how your code performs, whether it’s tracking execution time or memory usage. Without it, you’re guessing where the bottlenecks are, and guesses are often wrong. In this post, we’ll explore how to profile Python code for both time and memory usage, interpret the results, and use that data to make your code faster and more efficient.

Why Profile Python Code?
#

Profiling helps you answer critical questions about your code’s performance:

  • Is your function slow because it’s doing too much work, or because it’s calling an inefficient library?
  • Is your code using too much memory, and if so, where is that memory being allocated?
  • Are there hidden inefficiencies in your algorithms or data structures?

By profiling your code, you can focus your optimisation efforts where they matter most, saving time and frustration.

Built-in Python Profiling Tools
#

Python provides two powerful built-in tools for profiling:

  1. cProfile: Measures execution time and function call statistics.
  2. tracemalloc: Tracks memory allocations and helps you track down memory leaks.
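
As a minimal sketch of each tool used on its own (the build_squares workload is a made-up example, not part of the utility built later in this post):

```python
import cProfile
import io
import pstats
import tracemalloc


def build_squares(n):
    """Hypothetical workload: build a list of n squares."""
    return [i * i for i in range(n)]


# Time profiling: record every function call made while the profiler is enabled.
profiler = cProfile.Profile()
profiler.enable()
build_squares(100_000)
profiler.disable()

# Write the report to a string buffer, sorted by cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumtime").print_stats(5)
time_report = buffer.getvalue()
print(time_report)

# Memory profiling: trace allocations made between start() and stop().
tracemalloc.start()
squares = build_squares(100_000)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"current={current / 1024:.1f} KiB, peak={peak / 1024:.1f} KiB")
```

The exact timings and sizes depend on your machine and Python version.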

Let’s combine these tools into a reusable context manager that profiles both time and memory in a single run.

The profile_code Context Manager
#

Here’s a context manager that profiles execution time and memory usage:

import cProfile
import io
import linecache
import pstats
import tracemalloc
from contextlib import contextmanager
from textwrap import dedent
from typing import Literal

@contextmanager
def profile_code(include: tuple[Literal["time", "memory"], ...] = ("time", "memory")):
    """
    Profile execution time with cProfile and memory usage with tracemalloc.

    Args:
        include: A tuple of strings specifying what to profile ("time", "memory", or both).
    """
    print("=" * 60)
    print(f"{' & '.join(include).upper()} PROFILING")
    print("-" * 60)

    # Create profiler
    profiler = cProfile.Profile()

    # Start profiling
    if "memory" in include:
        tracemalloc.start()

    if "time" in include:
        profiler.enable()

    yield

    # Stop profiling and print results
    if "time" in include:
        profiler.disable()

        # Get execution time statistics
        string_io = io.StringIO()
        stats = pstats.Stats(profiler, stream=string_io)
        _ = stats.strip_dirs()
        _ = stats.sort_stats("cumtime")
        _ = stats.print_stats(20)
        print(dedent(string_io.getvalue()).strip())

    if "memory" in include:
        # Get memory statistics
        current, peak = tracemalloc.get_traced_memory()

        print(f"Current memory usage: {current / 1024 / 1024:.2f} MB")
        print(f"Peak memory usage: {peak / 1024 / 1024:.2f} MB\n")

        # Get top memory allocations
        snapshot = tracemalloc.take_snapshot()
        tracemalloc.stop()
        snapshot = snapshot.filter_traces(
            (
                tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
                tracemalloc.Filter(False, "<unknown>"),
            )
        )
        top_stats = snapshot.statistics("lineno")

        print("Top 10 memory-consuming lines:")
        for index, stat in enumerate(top_stats[:10], 1):
            frame = stat.traceback[0]
            print(
                f"#{index}: {frame.filename}:{frame.lineno}: {stat.size / 1024:.1f} KiB"
            )
            line = linecache.getline(frame.filename, frame.lineno).strip()
            if line:
                print(f"    {line}")

        other = top_stats[10:]
        if other:
            size = sum(stat.size for stat in other)
            print(f"{len(other)} other lines: {size / 1024:.1f} KiB")
        total = sum(stat.size for stat in top_stats)
        print(f"Total allocated size: {total / 1024:.1f} KiB")

How It Works
#

The profile_code context manager works as follows:

  1. Start Profiling: When you enter the context manager, it initialises cProfile for time profiling and tracemalloc for memory profiling, based on the include parameter.
  2. Execute Your Code: The code inside the with block runs while being profiled.
  3. Stop Profiling and Print Results: When you exit the context manager, it stops profiling and prints:
    • Execution time statistics (top 20 functions by cumulative time).
    • Memory usage statistics (current and peak memory usage, top 10 memory-consuming lines).
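
One caveat: in the version above, the code after yield only runs if the with block finishes without raising. If the body raises, the profiler is never disabled and no report is printed. A minimal sketch of a more defensive, timing-only variant, assuming you want the report even when the body fails (the name profile_time and the failing_task workload are hypothetical):

```python
import cProfile
import io
import pstats
from contextlib import contextmanager


@contextmanager
def profile_time(limit=20):
    """Hypothetical timing-only variant that always disables the profiler."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        yield profiler
    finally:
        # Runs whether the with block finished normally or raised.
        profiler.disable()
        buffer = io.StringIO()
        stats = pstats.Stats(profiler, stream=buffer)
        stats.strip_dirs().sort_stats("cumtime").print_stats(limit)
        print(buffer.getvalue().strip())


def failing_task():
    """Made-up workload that does some work and then fails."""
    sum(range(100_000))
    raise ValueError("boom")


try:
    with profile_time():
        failing_task()
except ValueError as exc:
    print(f"Body raised {exc!r}, but the report was still printed.")
```

The same try/finally pattern could be applied to the tracemalloc half so that tracing is always stopped.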

Example Usage
#

Let’s use the profile_code context manager to profile a slow function:

import time


def slow_function(duration):
    """A function that simulates a time-consuming task."""
    print(f"Running slow_function for {duration} seconds...")
    time.sleep(duration)
    print("slow_function finished.")


def fast_function():
    """A function that performs a quick task."""
    print("Running fast_function...")
    total = 0
    for i in range(10000):
        total += i
    print("fast_function finished.")


def process_data():
    """A function that calls other functions."""
    print("Starting data processing...")
    slow_function(2)
    for _ in range(3):
        fast_function()
    print("Data processing finished.")

if __name__ == '__main__':
    with profile_code():
        process_data()

This will output:

  • The top 20 functions by cumulative execution time.
  • The current and peak memory usage.
  • The top 10 lines where memory is allocated.

Profiling Results
#

============================================================
TIME & MEMORY PROFILING
------------------------------------------------------------
Starting data processing...
Running slow_function for 2 seconds...
slow_function finished.
Running fast_function...
fast_function finished.
Running fast_function...
fast_function finished.
Running fast_function...
fast_function finished.
Data processing finished.
20 function calls in 2.038 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    2.038    2.038 test.py:103(process_data)
     1    0.000    0.000    2.005    2.005 test.py:87(slow_function)
     1    2.005    2.005    2.005    2.005 {built-in method time.sleep}
     3    0.032    0.011    0.033    0.011 test.py:94(fast_function)
    10    0.000    0.000    0.000    0.000 {built-in method builtins.print}
     1    0.000    0.000    0.000    0.000 contextlib.py:145(__exit__)
     1    0.000    0.000    0.000    0.000 {built-in method builtins.next}
     1    0.000    0.000    0.000    0.000 test.py:12(profile_code)
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
Current memory usage: 0.02 MB
Peak memory usage: 0.02 MB

Top 10 memory-consuming lines:
#1: /opt/homebrew/Cellar/python@3.13/3.13.11/Frameworks/Python.framework/Versions/3.13/lib/python3.13/pstats.py:230: 2.1 KiB
    fragment = fragment[:-1]
#2: /opt/homebrew/Cellar/python@3.13/3.13.11/Frameworks/Python.framework/Versions/3.13/lib/python3.13/pstats.py:229: 1.5 KiB
    dict[fragment] = tup
#3: /Users/toby/dev/projects/tobydevlin.com-3.0/test.py:32: 1.4 KiB
    profiler.enable()
#4: /opt/homebrew/Cellar/python@3.13/3.13.11/Frameworks/Python.framework/Versions/3.13/lib/python3.13/pstats.py:264: 1.1 KiB
    stats_list.append((cc, nc, tt, ct) + func +
#5: /opt/homebrew/Cellar/python@3.13/3.13.11/Frameworks/Python.framework/Versions/3.13/lib/python3.13/pstats.py:547: 1.1 KiB
    return os.path.basename(filename), line, name
#6: /opt/homebrew/Cellar/python@3.13/3.13.11/Frameworks/Python.framework/Versions/3.13/lib/python3.13/pstats.py:289: 1.1 KiB
    newcallers[func_strip_path(func2)] = caller
#7: /Users/toby/dev/projects/tobydevlin.com-3.0/test.py:46: 1.1 KiB
    print(dedent(string_io.getvalue()).strip())
#8: /opt/homebrew/Cellar/python@3.13/3.13.11/Frameworks/Python.framework/Versions/3.13/lib/python3.13/pstats.py:296: 1.0 KiB
    newstats[newfunc] = (cc, nc, tt, ct, newcallers)
#9: /Users/toby/dev/projects/tobydevlin.com-3.0/test.py:38: 0.9 KiB
    profiler.disable()
#10: /opt/homebrew/Cellar/python@3.13/3.13.11/Frameworks/Python.framework/Versions/3.13/lib/python3.13/cProfile.py:59: 0.8 KiB
    entries = self.getstats()
80 other lines: 9.2 KiB
Total allocated size: 21.4 KiB

The most important section is the table sorted by cumtime (cumulative time). This column shows the total time spent in a function, including all the functions it calls. It’s the best indicator of where your program is spending the most time overall.

  1. Top Bottleneck: Look at the first few rows. process_data is at the top because its cumtime includes everything it calls, but nearly all of that time comes from slow_function, which calls time.sleep directly. The 2.005 seconds of cumtime for slow_function is spent almost entirely inside time.sleep, whose tottime is also 2.005 seconds.

  2. Function Calls: The ncalls column tells you how many times a function was called. fast_function was called 3 times, but its cumulative time of 0.033 seconds is negligible next to the 2.005 seconds spent in slow_function.
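
To see the difference between cumtime and tottime for yourself, one option is to sort the same statistics both ways. This sketch uses made-up outer and inner functions and the pstats.SortKey enum:

```python
import cProfile
import io
import pstats
from pstats import SortKey


def inner():
    """Made-up function that does the actual work."""
    return sum(i * i for i in range(200_000))


def outer():
    """Made-up function that only delegates to inner()."""
    return inner()


profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()


def report(sort_key):
    """Return the profile report as text, sorted by the given key."""
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.strip_dirs().sort_stats(sort_key).print_stats(5)
    return buffer.getvalue()


# cumtime: outer() ranks high because it includes the time spent in inner().
by_cumulative = report(SortKey.CUMULATIVE)
# tottime: outer() drops down because it does almost no work itself.
by_own_time = report(SortKey.TIME)
print(by_cumulative)
print(by_own_time)
```

Sorting by tottime is often the quicker way to find the function where the time is actually burned, while cumtime shows which call path leads there.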

The memory usage for this script is very low (0.02 MB). The “Top 10 memory-consuming lines” mostly show memory used by the profiler itself (pstats.py, cProfile.py) rather than by the code under test, because the snapshot is taken after the timing report has been built. For this particular run, memory is not a concern.
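
If you want the memory report to reflect mainly your own code, one approach (a sketch, not what profile_code does) is to take a snapshot before and after the workload and compare them, so that allocations made while building a timing report are not included. Here load_records is a made-up workload:

```python
import tracemalloc


def load_records(n):
    """Made-up workload that allocates a list of n small dicts."""
    return [{"id": i, "name": f"record-{i}"} for i in range(n)]


tracemalloc.start()
before = tracemalloc.take_snapshot()

records = load_records(50_000)

# Snapshot immediately after the workload, before any report is built.
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Each entry describes how much one source line's allocations changed.
differences = after.compare_to(before, "lineno")
for diff in differences[:3]:
    frame = diff.traceback[0]
    print(f"{frame.filename}:{frame.lineno}: {diff.size_diff / 1024:+.1f} KiB")
```

The line inside load_records should dominate the output, since the comparison is sorted by the size of the change.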

Key Takeaways
#

In this post, we explored how to profile Python code for time and memory usage. Here’s what you should remember:

  • Profiling is essential: It helps you identify bottlenecks in your code so you can optimise the right parts.
  • Use cProfile for time profiling: It measures execution time and function call statistics.
  • Use tracemalloc for memory profiling: It tracks memory allocations and helps you track down memory leaks.
  • The profile_code context manager: Combines both tools into a reusable utility for profiling time and memory in a single run.
  • Always profile before optimising: Don’t guess where the bottlenecks are; let the data guide you.

Try It Yourself!
#

Now that you know how to profile your Python code, it’s time to put it into practice:

  1. Profile your own code: Use the profile_code context manager above to identify bottlenecks in your projects.
  2. Experiment with optimisations: Try different approaches (e.g., list comprehensions, generator expressions) and measure the impact.
  3. Share your results: Let me know in the comments what you discovered. Did profiling reveal any surprises?
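
For the second step, a quick way to measure the impact of a small change is the standard library's timeit module. The sketch below compares an explicit loop with the built-in sum; both functions are made-up examples, and the timings will vary by machine:

```python
import timeit


def total_with_loop(n):
    """Made-up baseline: accumulate with an explicit for loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def total_with_sum(n):
    """Made-up alternative: let the built-in sum() do the iteration."""
    return sum(range(n))


# Check both versions agree before comparing their speed.
assert total_with_loop(10_000) == total_with_sum(10_000)

# Time 200 calls of each version; timeit returns the total in seconds.
loop_seconds = timeit.timeit(lambda: total_with_loop(10_000), number=200)
sum_seconds = timeit.timeit(lambda: total_with_sum(10_000), number=200)
print(f"loop: {loop_seconds:.4f}s, sum: {sum_seconds:.4f}s for 200 runs each")
```

Use a profiler to find where the time goes, then a focused measurement like this to confirm that a specific change actually helps.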

Happy profiling!
