The Python struct module is a built-in library that converts Python values into raw bytes and back again. If you have ever wondered how a network packet is built, how a binary game save file stores data, or how a sensor sends readings over a serial port — the answer almost always involves the same idea that struct implements.
Python, by default, wraps every value in a high-level object. When you write score = 9500, Python handles memory management, type tagging, and reference counting invisibly. That overhead is great for application logic, but useless when you need to send exactly two bytes over a network socket or write a fixed-size binary record to a file. The struct module strips all of that away and gives you direct control over raw byte layout.
You do not need to install anything. The struct module ships with every version of Python as part of the standard library.
import struct
print(struct.calcsize("i"))
4
That single line tells you that a native C int takes exactly four bytes when packed with struct on your platform. That precision is the whole point — and we are going to explore every part of it.
Every operation in the struct module starts with a format string. This string is the instruction manual that tells Python exactly how to read or write a sequence of bytes. Understanding the struct format string is the single most important skill you need before writing any pack or unpack code.
A format string is a regular Python string made up of characters that each represent a data type. For example, "iHf" means: a signed integer, then an unsigned short, then a float. Each character maps directly to a C data type with a fixed, predictable byte size.
You can also prefix a format string with a byte order indicator to control how multi-byte values are laid out in memory. Without a prefix, Python defaults to the platform's native byte order, which can vary between machines. In most real-world code you will want to be explicit.
import struct
# ">iHf" = big-endian, signed int (4), unsigned short (2), float (4)
fmt = ">iHf"
size = struct.calcsize(fmt)
print(f"This format uses {size} bytes: int(4) + short(2) + float(4) = {size}")
This format uses 10 bytes: int(4) + short(2) + float(4) = 10
The > prefix says "use big-endian byte order." The three format characters describe three values in order. Every struct call you write will begin with a format string exactly like this — get comfortable reading them and the rest of the module becomes straightforward.
struct.pack() takes Python values and converts them into a bytes object — a fixed-size sequence of raw bytes. This is the function you use when you want to write binary data to a file or send a structured payload over a network connection.
The syntax is:
struct.pack(format_string, value1, value2, ...)
You pass the format string first, then each value in order. The number of values must match the number of format characters in the string. The result is always a bytes object.
import struct
# Pack a sensor reading: sensor_id (H = unsigned short), temperature (f = float), humidity (f = float)
sensor_id = 7
temperature = 23.5
humidity = 64.2
packed = struct.pack(">Hff", sensor_id, temperature, humidity)
print(f"Packed bytes: {packed.hex()}")
print(f"Byte count: {len(packed)}")
Packed bytes: 000741bc000042806666
Byte count: 10
Ten bytes total: two for the sensor ID, four for temperature, four for humidity. Those twenty hex characters are exactly what would get written to a binary log file or transmitted to a hardware device. No labels, no overhead, just raw binary data. Notice how the sensor ID 7 shows up as 0007 — that is big-endian zero-padding at work.
struct.unpack() is the reverse of pack. It takes a bytes object and a format string and returns a tuple of Python values. This is how you read binary data from any source — a file, a socket, a buffer — and turn it into something your program can use.
struct.unpack(format_string, buffer)
The buffer must be exactly struct.calcsize(format_string) bytes long. Pass the wrong size and you get a struct.error immediately. This strictness is actually useful — it means a size mismatch fails loudly rather than silently producing garbage values.
import struct
raw = struct.pack(">Hff", 7, 23.5, 64.2)
sensor_id, temperature, humidity = struct.unpack(">Hff", raw)
print(f"Sensor ID : {sensor_id}")
print(f"Temperature : {temperature:.1f}°C")
print(f"Humidity : {humidity:.1f}%")
Sensor ID : 7
Temperature : 23.5°C
Humidity : 64.2%
The pack/unpack round trip works cleanly. The values come back exactly as they went in, within the precision limits of 4-byte floats. One thing beginners often miss: struct.unpack() always returns a tuple, even for a single value. So struct.unpack(">H", raw) returns (7,), not 7. The comma in sensor_id, temperature, humidity = ... is what unpacks that tuple into individual variables.
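Both points are easy to verify: the single-element tuple described above, and the loud struct.error from the size check mentioned earlier. A short sketch:

```python
import struct

# A single value still comes back as a one-element tuple.
raw = struct.pack(">H", 7)
print(struct.unpack(">H", raw))  # (7,)

# A buffer of the wrong size raises struct.error immediately.
try:
    struct.unpack(">Hff", raw)  # format needs 10 bytes, buffer has 2
except struct.error as exc:
    print(f"struct.error: {exc}")
```

Catching struct.error at the boundaries where bytes enter your program is usually all the validation a fixed-size format needs.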
The struct format characters are the alphabet of binary data. Each one maps to a specific C type with a guaranteed byte size. Here is the practical set you will use most often, as documented in the official Python struct reference:
| Character | Python Type | Size | Description |
|---|---|---|---|
| B | int | 1 byte | Unsigned byte (0–255) |
| b | int | 1 byte | Signed byte (−128 to 127) |
| H | int | 2 bytes | Unsigned short |
| h | int | 2 bytes | Signed short |
| I | int | 4 bytes | Unsigned int |
| i | int | 4 bytes | Signed int |
| Q | int | 8 bytes | Unsigned long long |
| q | int | 8 bytes | Signed long long |
| f | float | 4 bytes | Single-precision float |
| d | float | 8 bytes | Double-precision float |
| ? | bool | 1 byte | True or False |
| s | bytes | varies | Raw byte string |
| x | — | 1 byte | Pad byte (skipped) |
The s character works a little differently. You prefix it with a count to set its length — "10s" means 10 bytes of raw data. When packing, Python automatically zero-pads or truncates to that exact length. When unpacking, you get back a bytes object that you can decode and strip.
import struct
# Pack a username as a 10-byte fixed-width field
data = struct.pack("10s", b"alice")
username, = struct.unpack("10s", data)
print(repr(username))
print(username.rstrip(b"\x00").decode())
b'alice\x00\x00\x00\x00\x00'
alice
The five-byte string alice is zero-padded to fill all ten bytes. On unpack you get those ten bytes back including the padding, which is why rstrip(b"\x00") removes the null bytes before decoding to a clean string. This fixed-width string pattern appears constantly in binary file formats and network protocol headers.
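Truncation works just as automatically in the other direction. A quick check, packing a value longer than its field:

```python
import struct

# A five-byte value packed into a 3-byte field is silently cut short.
print(struct.pack("3s", b"alice"))  # b'ali'
```

There is no error here either, so pick your field widths with the longest realistic value in mind.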
Byte order — also called endianness — determines how multi-byte values are arranged in memory. Big-endian stores the most significant byte first; little-endian stores it last. When you share binary data between systems or read an existing binary file format, getting byte order wrong produces silently incorrect values — no exceptions, just wrong numbers.
The struct format string supports five byte order prefixes, all documented in the struct module docs:
| Prefix | Meaning | Typical use |
|---|---|---|
| @ | Native order, native size | Platform-specific local use |
| = | Native order, standard size | Portable local use |
| < | Little-endian | x86 files, Windows formats |
| > | Big-endian | Many binary standards |
| ! | Network order (big-endian) | TCP/IP sockets |
Here is a concrete demonstration of how the same integer looks completely different depending on byte order:
import struct
value = 0x0A0B0C0D # A memorable hex pattern
le_bytes = struct.pack("<I", value)
be_bytes = struct.pack(">I", value)
print(f"Value : {hex(value)}")
print(f"Little-endian : {le_bytes.hex()}")
print(f"Big-endian : {be_bytes.hex()}")
Value : 0x0a0b0c0d
Little-endian : 0d0c0b0a
Big-endian : 0a0b0c0d
The bytes are in reverse order. If you read big-endian data with a little-endian format string, struct will decode the bytes without complaint and give you a completely wrong number. No warning, no error — silent failure. Always know your byte order before writing unpack code, and encode it explicitly in your format string rather than relying on @.
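That silent failure is worth seeing once. A minimal demonstration, misreading a big-endian buffer with a little-endian format string:

```python
import struct

# Pack the number 1 in big-endian, then misread it as little-endian.
be = struct.pack(">I", 1)           # b'\x00\x00\x00\x01'
wrong = struct.unpack("<I", be)[0]
print(wrong)  # 16777216, with no warning of any kind
```

One wrong character in the format string turns 1 into 16777216 (2**24): the low byte and the high byte have swapped roles.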
struct.calcsize() tells you exactly how many bytes a format string requires without packing any data. This is invaluable for reading binary files and streams in structured chunks — you calculate the record size once, then use it to read and unpack records in a loop.
import struct
formats = ["B", "H", "I", "Q", "f", "d", "10s", ">HIdH8s"]
for fmt in formats:
size = struct.calcsize(fmt)
print(f" {fmt:<12} = {size} bytes")
 B            = 1 bytes
 H            = 2 bytes
 I            = 4 bytes
 Q            = 8 bytes
 f            = 4 bytes
 d            = 8 bytes
 10s          = 10 bytes
 >HIdH8s      = 24 bytes
struct.calcsize() shines when you define your format strings as constants at the top of a module and calculate their sizes once at startup. Everything that follows — reading, seeking, slicing — can use those pre-computed sizes instead of hard-coded magic numbers.
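One natural extension of that pattern is struct.Struct, which compiles a format string once and exposes the size alongside pack and unpack methods. A minimal sketch:

```python
import struct

# Compile the format once at module level; reuse it everywhere.
RECORD = struct.Struct(">HIB")  # score, player_id, level

print(RECORD.size)  # 7, so no magic numbers later

raw = RECORD.pack(9500, 1001, 5)
print(RECORD.unpack(raw))  # (9500, 1001, 5)
```

Besides tidiness, a compiled Struct skips re-parsing the format string on every call, which matters in tight parsing loops.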
A perfect real-world use of struct is reading fixed-size records from a binary buffer. This is exactly how game save files, sensor logs, and network captures store their data. Each record is a fixed number of bytes; you read that many bytes, unpack them, and repeat.
import struct
import io
# Each record: score (H = 2 bytes), player_id (I = 4 bytes), level (B = 1 byte)
RECORD_FMT = ">HIB"
RECORD_SIZE = struct.calcsize(RECORD_FMT)
# Write several records to an in-memory binary buffer
buffer = io.BytesIO()
game_records = [(9500, 1001, 5), (7200, 1002, 3), (12000, 1003, 8)]
for score, pid, level in game_records:
buffer.write(struct.pack(RECORD_FMT, score, pid, level))
# Rewind and read every record back
buffer.seek(0)
total_records = len(buffer.getvalue()) // RECORD_SIZE
print(f"Record size : {RECORD_SIZE} bytes")
print(f"Total records: {total_records}")
print()
while True:
raw = buffer.read(RECORD_SIZE)
if not raw:
break
score, pid, level = struct.unpack(RECORD_FMT, raw)
print(f" Player {pid} | score: {score} | level: {level}")
Record size : 7 bytes
Total records: 3
Player 1001 | score: 9500 | level: 5
Player 1002 | score: 7200 | level: 3
Player 1003 | score: 12000 | level: 8
The read(RECORD_SIZE) pattern is the foundation of all binary file parsing. You always read exactly as many bytes as the format needs, unpack in one clean call, and move on. Replace io.BytesIO with an actual file opened in "rb" mode and this code works identically on any binary file on disk.
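A minimal sketch of that swap, writing and reading the same records through a real file opened in binary mode (the temporary-file path handling is just for the example):

```python
import os
import struct
import tempfile

RECORD_FMT = ">HIB"
RECORD_SIZE = struct.calcsize(RECORD_FMT)

# Write two records to a real binary file on disk.
path = os.path.join(tempfile.mkdtemp(), "scores.bin")
with open(path, "wb") as f:
    for record in [(9500, 1001, 5), (7200, 1002, 3)]:
        f.write(struct.pack(RECORD_FMT, *record))

# Read them back in fixed-size chunks, exactly as with io.BytesIO.
with open(path, "rb") as f:
    while chunk := f.read(RECORD_SIZE):
        score, pid, level = struct.unpack(RECORD_FMT, chunk)
        print(f"Player {pid} | score: {score} | level: {level}")
```

The reading loop is byte-for-byte identical to the in-memory version; only the source of the bytes changed.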
Here is a complete program that builds and parses a custom binary network packet using the struct module. It brings together the format string, byte order control, struct.pack, struct.unpack, and struct.calcsize in a single realistic scenario — the kind of code you would write when implementing a simple binary protocol over a socket.
import struct
# Packet header layout (network byte order = big-endian):
# H = packet type (2 bytes, unsigned short)
# I = sequence number (4 bytes, unsigned int)
# d = timestamp (8 bytes, double)
# H = payload length (2 bytes, unsigned short)
# 8s = source ID (8 bytes, fixed-width string)
HEADER_FMT = "!HIdH8s"
HEADER_SIZE = struct.calcsize(HEADER_FMT)
def create_packet(ptype, seq, timestamp, source, message):
payload = message.encode("utf-8")
src_padded = source.encode("utf-8").ljust(8)[:8]
header = struct.pack(HEADER_FMT, ptype, seq, timestamp, len(payload), src_padded)
return header + payload
def decode_packet(raw):
ptype, seq, ts, plen, src = struct.unpack(HEADER_FMT, raw[:HEADER_SIZE])
payload = raw[HEADER_SIZE : HEADER_SIZE + plen].decode("utf-8")
return {
"type": ptype,
"sequence": seq,
"timestamp": ts,
"source": src.rstrip().decode("utf-8"),
"message": payload,
}
# Build a status packet
packet = create_packet(
ptype=1,
seq=100,
timestamp=1714000000.0,
source="SRV-A",
message="STATUS OK",
)
print(f"Header size : {HEADER_SIZE} bytes")
print(f"Total packet : {len(packet)} bytes")
print()
# Decode it back
fields = decode_packet(packet)
for key, value in fields.items():
print(f" {key:<12}: {value}")
Header size : 24 bytes
Total packet : 33 bytes
type : 1
sequence : 100
timestamp : 1714000000.0
source : SRV-A
message : STATUS OK