Python Memory Management

Seema Thapa
Published in DataDrivenInvestor
4 min read · Nov 29, 2019


Have you ever wondered why memory management is significant?

Well, memory is like the brain, where everything that happens to us is stored. Likewise, data is saved in the computer’s memory, its mechanical brain. If memory is lost, we can no longer remember anything and are left questioning our own existence.

First, what would happen if memory weren’t managed properly? Most likely there would be a memory leak, which means that objects or allocated memory that are no longer needed keep lingering around. This wastes memory and can slow the system down. Understanding memory management helps us write efficient code and become better at solving problems, troubleshooting, and debugging.

In Python, everything is an object. This means that everything has attributes and methods, can be assigned to a variable, and can be passed as an argument to a function. Types can also be sub-classed (inherited).

Two prerequisites that we need to consider are:

  1. The sizes of basic Python objects.
  2. How Python manages its memory internally.

Python code is compiled into bytecode, a set of instructions in a more machine-readable format. I remember adding .pyc files and the __pycache__ folder to my .gitignore file without knowing why they were created alongside my Python project. Their actual purpose is to cache the bytecode so that modules load faster on later runs.
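As a rough illustration of the bytecode step, the standard dis module can list the instructions CPython generates for a function, and importlib.util.cache_from_source reports where the cached .pyc file would be stored. The add function and the example.py file name are made up for this sketch, and the exact instructions differ between Python versions.

```python
import dis
import importlib.util


def add(a, b):
    """Hypothetical function used only to have something to disassemble."""
    return a + b


# Collect the names of the bytecode instructions CPython compiled for add().
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)

# Ask where the cached bytecode for a (hypothetical) example.py would live.
cache_path = importlib.util.cache_from_source("example.py")
print(cache_path)
```

The printed path normally points inside a __pycache__ folder, which is why that folder shows up next to a project’s source files.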

The Python interpreter receives its share of memory for execution. How much memory is allocated depends on the Python version, platform, and environment settings.

We can easily get the size of objects by importing the sys module. For example, on a 64-bit build of Python 3.6.8, sys.getsizeof() returns the following sizes in bytes (the exact numbers vary between versions):

>>> import sys
>>> sys.getsizeof({})
240
>>> sys.getsizeof([])
64
>>> string = "Seema"
>>> sys.getsizeof(string)
54
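One detail worth keeping in mind is that sys.getsizeof() is shallow: it counts the object itself, not the objects it refers to. A minimal sketch, with made-up variable names, that avoids hard-coding sizes because they depend on the Python version and platform:

```python
import sys

empty = []
numbers = [1, 2, 3]
inner = list(range(1000))
nested = [inner]          # a one-element list holding the big list

size_empty = sys.getsizeof(empty)
size_numbers = sys.getsizeof(numbers)
size_inner = sys.getsizeof(inner)
size_nested = sys.getsizeof(nested)

# A list with items is larger than an empty one, but the outer list only
# stores a pointer to the inner list, so it stays small.
print(size_empty, size_numbers, size_inner, size_nested)
```

Here nested reports a much smaller size than inner, even though it "contains" it, because only the reference is counted.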

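Similarly, we can check the identity of an object by using id(). Here, i and j are reference variables and 10 is an object. Both names refer to the same object (CPython caches small integers such as 10), so both calls return the same identity.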

>>> i = 10
>>> j = 10
>>> id(i)
10914784
>>> id(j)
10914784
>>> id(i) == id(j)
True
>>> hex(id(i))  # in CPython, the memory address in hexadecimal
'0xa68be0'
>>> hex(id(j))
'0xa68be0'
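Identity is not the same thing as equality. The short sketch below, using illustrative list names, shows two names bound to one object next to a separate object that merely holds equal values:

```python
first = [1, 2]
same_object = first        # a second name for the same list
equal_copy = [1, 2]        # a different list with equal contents

shares_identity = first is same_object       # same object, same id()
copy_is_equal = first == equal_copy          # equal values
copy_shares_identity = first is equal_copy   # but a different object

print(shares_identity, copy_is_equal, copy_shares_identity)
print(id(first) == id(same_object), id(first) == id(equal_copy))
```

The is operator compares identities, which is what id() exposes, while == compares values.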

Internally, the memory reserved for a Python program can be divided into two areas:

Stack Memory for static memory allocation.

Heap Memory for dynamic memory allocation.

[Figure: A simple example of stack and heap memory allocation]

Method calls and references are created in stack memory, whereas objects are created in heap memory.

The stack memory follows LIFO (last in, first out) order. The frame of the most recently called function or method is removed from the stack after it returns a value or goes out of scope.
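The split between frames and objects can be observed from Python itself. This is a small sketch with hypothetical functions inner and outer: traceback.extract_stack() lists the active frames from oldest to newest, and the list created inside outer() survives after its frame is gone.

```python
import traceback


def inner():
    # Frames are listed oldest first, so the current call comes last.
    return [frame.name for frame in traceback.extract_stack()]


def outer():
    local_items = [1, 2, 3]   # the name lives in outer()'s frame
    return inner(), local_items


call_names, items = outer()
print(call_names[-2:])  # the two most recent frames when inner() ran
print(items)            # the list object outlives the frame that made it
```

Once outer() returns, its frame and the name local_items disappear, but the list stays alive because items still refers to it.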

The Python heap is managed internally by the Python memory manager. (This heap is a region of memory and is unrelated to the heap data structure that the heapq module implements.) Python objects (PyObject) usually consist of three things:

a. Object Type

b. A Reference Count

c. Object Value

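Some people may not be familiar with references. References are container objects or variable names that point to another object. The reference count increases by 1 every time an object is referenced. The counts in the comments below are conceptual: they track only the references created in this snippet.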

>>> x = 20
# Reference count equals 1
>>> y = 20
# Reference count equals 2
>>> three_twenties = [20, 20, 20]
# Reference count equals 5

We can also count the reference using the sys module:

>>> import sys
>>> message = "Hello"
>>> sys.getrefcount(message)
2
# We get a reference count of 2: 1 from the message variable and 1 from the temporary argument of getrefcount
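To watch the count move in both directions, it helps to use a freshly created object and compare against a baseline rather than expect absolute numbers. A minimal sketch, with names chosen only for illustration:

```python
import sys

payload = ["a", "fresh", "list"]
baseline = sys.getrefcount(payload)

alias = payload                      # one more name for the same list
with_alias = sys.getrefcount(payload)

holder = [payload, payload]          # two more references from a container
with_holder = sys.getrefcount(payload)

del alias                            # drop the extra name
holder.clear()                       # drop the container's references
after_cleanup = sys.getrefcount(payload)

print(baseline, with_alias, with_holder, after_cleanup)
```

The count rises by one for the alias, by two more for the container, and returns to the baseline once those references are removed.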

The reference implementation of Python, CPython, is written in the C programming language, so it follows CPython’s standard garbage collection. Garbage collection in Python consists of two components: reference counting and generational garbage collection. Early versions of Python used only reference counting for memory management, and reference counting alone can’t detect reference cycles. A reference cycle occurs when one or more objects refer to each other, directly or indirectly, so their reference counts never drop to zero.

The simplest examples are:

>>> cycle = []
>>> cycle.append(cycle)
>>> i = 0
>>> obj1 = { }
>>> obj2 = { }
>>> obj2[i+1] = obj1
>>> obj1[i+1] = obj2
# Here, obj2 references obj1 and vice versa
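A cycle like this can be seen surviving reference counting and then being reclaimed by the cycle collector. The sketch below assumes a hypothetical Node class and uses a weak reference as a probe, since a weak reference does not keep its target alive:

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at a partner."""

    def __init__(self, name):
        self.name = name
        self.partner = None


gc.disable()                 # keep automatic collection out of the way
a = Node("a")
b = Node("b")
a.partner = b                # a -> b
b.partner = a                # b -> a, completing the cycle

probe = weakref.ref(a)       # lets us check whether the object still exists
del a, b                     # no names are left, but the cycle remains

alive_before = probe() is not None
gc.collect()                 # run the cycle collector explicitly
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after)
```

Deleting both names is not enough to free the objects, because each still holds a reference to the other; only the explicit collection removes them.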

Garbage collection is done automatically in Python. Generational garbage collection is based on the theory that most objects die young. It consists of three generations. Garbage collection runs periodically based on a threshold. When threshold0 (the first threshold value) is set to zero, automatic garbage collection is disabled.

What is the threshold?

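For generation 0, the threshold is a limit on the number of container object allocations minus deallocations since the last collection. For the older generations, it is the number of collections of the next-younger generation, so the thresholds simply set how often each generation is collected.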

Pseudocode for starting garbage collection based on the threshold:

if (number of allocations - number of deallocations) > threshold0:
    start garbage collection
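The counter behind this check can be read with gc.get_count(). The following is a sketch that assumes a standard CPython build, with automatic collection switched off so the numbers are easier to follow; the variable names are illustrative and the exact counts will differ from run to run.

```python
import gc

gc.disable()            # stop automatic runs so the counter is easy to read
gc.collect()            # a full collection resets the counters
count_before = gc.get_count()[0]

# Allocate container objects and keep them alive in a list.
kept_alive = [[] for _ in range(100)]
count_after = gc.get_count()[0]

# The same comparison the collector makes for generation 0.
threshold0 = gc.get_threshold()[0]
would_trigger = count_after > threshold0
gc.enable()

print(count_before, count_after, threshold0, would_trigger)
```

The first element of gc.get_count() grows as container objects are allocated, and a generation 0 collection becomes due once it passes threshold0.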

By default, the threshold value is (700, 10, 10): 700 for generation 0 (the youngest) and 10 for each of the two older generations.

>>> import gc
>>> gc.get_threshold()
(700, 10, 10)

We can also set the threshold value:

gc.set_threshold(threshold0[, threshold1[, threshold2]])

>>> gc.set_threshold(860, 12, 12)
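When experimenting with thresholds, it is safer to remember the current values and put them back afterwards. A minimal sketch of that pattern, with illustrative variable names; note that newer Python versions may report default values other than (700, 10, 10):

```python
import gc

original = gc.get_threshold()      # remember the current settings

gc.set_threshold(860, 12, 12)      # same values as in the example above
updated = gc.get_threshold()

gc.set_threshold(*original)        # restore the previous settings
restored = gc.get_threshold()

print(original, updated, restored)
```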

With the default thresholds, this means that the automatic garbage collector runs once the above calculation exceeds 700 (or 860 after the call above). If two or more of the three generations exceed their thresholds, the GC picks the oldest of them, which also collects the younger generations.

Initially, only generation 0 is examined. If generation 0 has been examined more than threshold1 times since generation 1 was last examined, then generation 1 is examined as well. Similarly, threshold2 controls the number of collections of generation 1 before generation 2 is collected.

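To find cycles, the GC iterates over each container object and temporarily removes all references to all container objects it references. After this pass, objects whose count is still above zero are referenced from outside the group and are kept, together with everything reachable from them. The remaining objects are only referenced by each other and can be freed.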
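The gc module exposes two helpers that make the idea of a "container object" concrete: gc.is_tracked() tells whether the collector follows an object, and gc.get_referents() lists what an object refers to. A short sketch with made-up names:

```python
import gc

inner = []
outer = [inner, 42, "text"]

# Containers such as lists are tracked; atomic values like ints and
# strings cannot form cycles, so the collector ignores them.
tracked_list = gc.is_tracked(outer)
tracked_int = gc.is_tracked(42)
tracked_str = gc.is_tracked("text")

# The objects directly referenced by outer, as the collector sees them.
referents = gc.get_referents(outer)
inner_is_referent = any(item is inner for item in referents)

print(tracked_list, tracked_int, tracked_str, inner_is_referent)
```

Only tracked objects take part in the cycle search, which keeps the work proportional to the number of containers rather than to every object in the program.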
