The Python struct module is a built-in library that converts Python values into raw bytes and back again. If you have ever wondered how a network packet is built, how a binary game save file stores data, or how a sensor sends readings over a serial port — the answer almost always involves the same idea that struct implements.
Python, by default, wraps every value in a high-level object. When you write score = 9500, Python handles memory management, type tagging, and reference counting invisibly. That overhead is great for application logic, but useless when you need to send exactly two bytes over a network socket or write a fixed-size binary record to a file. The struct module strips all of that away and gives you direct control over raw byte layout.
You do not need to install anything. The struct module ships with every version of Python as part of the standard library.
import struct
print(struct.calcsize("i"))
4
That single line tells you that a native C int takes exactly four bytes when packed with struct on your platform. That precision is the whole point — and we are going to explore every part of it.
Every operation in the struct module starts with a format string. This string is the instruction manual that tells Python exactly how to read or write a sequence of bytes. Understanding the struct format string is the single most important skill you need before writing any pack or unpack code.
A format string is a regular Python string made up of characters that each represent a data type. For example, "iHf" means: a signed integer, then an unsigned short, then a float. Each character maps directly to a C data type with a fixed, predictable byte size.
You can also prefix a format string with a byte order indicator to control how multi-byte values are laid out in memory. Without a prefix, Python defaults to the platform's native byte order, which can vary between machines. In most real-world code you will want to be explicit.
import struct
# ">iHf" = big-endian, signed int (4), unsigned short (2), float (4)
fmt = ">iHf"
size = struct.calcsize(fmt)
print(f"This format uses {size} bytes: int(4) + short(2) + float(4) = {size}")
This format uses 10 bytes: int(4) + short(2) + float(4) = 10
The > prefix says "use big-endian byte order." The three format characters describe three values in order. Every struct call you write will begin with a format string exactly like this — get comfortable reading them and the rest of the module becomes straightforward.
struct.pack() takes Python values and converts them into a bytes object — a fixed-size sequence of raw bytes. This is the function you use when you want to write binary data to a file or send a structured payload over a network connection.
The syntax is:
struct.pack(format_string, value1, value2, ...)
You pass the format string first, then each value in order. The number of values must match the number of format characters in the string. The result is always a bytes object.
import struct
# Pack a sensor reading: sensor_id (H = unsigned short), temperature (f = float), humidity (f = float)
sensor_id = 7
temperature = 23.5
humidity = 64.2
packed = struct.pack(">Hff", sensor_id, temperature, humidity)
print(f"Packed bytes: {packed.hex()}")
print(f"Byte count: {len(packed)}")
Packed bytes: 000741bc000042806666
Byte count: 10
Ten bytes total: two for the sensor ID, four for temperature, four for humidity. Those twenty hex characters are exactly what would get written to a binary log file or transmitted to a hardware device. No labels, no overhead, just raw binary data. Notice how the sensor ID 7 shows up as 0007 — that is big-endian zero-padding at work.
struct.unpack() is the reverse of pack. It takes a bytes object and a format string and returns a tuple of Python values. This is how you read binary data from any source — a file, a socket, a buffer — and turn it into something your program can use.
struct.unpack(format_string, buffer)
The buffer must be exactly struct.calcsize(format_string) bytes long. Pass the wrong size and you get a struct.error immediately. This strictness is actually useful — it means a size mismatch fails loudly rather than silently producing garbage values.
import struct
raw = struct.pack(">Hff", 7, 23.5, 64.2)
sensor_id, temperature, humidity = struct.unpack(">Hff", raw)
print(f"Sensor ID : {sensor_id}")
print(f"Temperature : {temperature:.1f}°C")
print(f"Humidity : {humidity:.1f}%")
Sensor ID : 7
Temperature : 23.5°C
Humidity : 64.2%
The pack/unpack round trip works cleanly. The values come back exactly as they went in, within the precision limits of 4-byte floats. One thing beginners often miss: struct.unpack() always returns a tuple, even for a single value. So struct.unpack(">H", raw) returns (7,), not 7. The comma in sensor_id, temperature, humidity = ... is what unpacks that tuple into individual variables.
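Both points are easy to verify: the single-element tuple described above, and the loud struct.error from the size check mentioned earlier. A short sketch:

```python
import struct

# A single value still comes back as a one-element tuple.
raw = struct.pack(">H", 7)
print(struct.unpack(">H", raw))  # (7,)

# A buffer of the wrong size raises struct.error immediately.
try:
    struct.unpack(">Hff", raw)  # format needs 10 bytes, buffer has 2
except struct.error as exc:
    print(f"struct.error: {exc}")
```

Catching struct.error at the boundaries where bytes enter your program is usually all the validation a fixed-size format needs.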
The struct format characters are the alphabet of binary data. Each one maps to a specific C type with a guaranteed byte size. Here is the practical set you will use most often, as documented in the official Python struct reference:
| Character | Python Type | Size | Description |
|---|---|---|---|
| B | int | 1 byte | Unsigned byte (0–255) |
| b | int | 1 byte | Signed byte (−128 to 127) |
| H | int | 2 bytes | Unsigned short |
| h | int | 2 bytes | Signed short |
| I | int | 4 bytes | Unsigned int |
| i | int | 4 bytes | Signed int |
| Q | int | 8 bytes | Unsigned long long |
| q | int | 8 bytes | Signed long long |
| f | float | 4 bytes | Single-precision float |
| d | float | 8 bytes | Double-precision float |
| ? | bool | 1 byte | True or False |
| s | bytes | varies | Raw byte string |
| x | — | 1 byte | Pad byte (skipped) |
The s character works a little differently. You prefix it with a count to set its length — "10s" means 10 bytes of raw data. When packing, Python automatically zero-pads or truncates to that exact length. When unpacking, you get back a bytes object that you can decode and strip.
import struct
# Pack a username as a 10-byte fixed-width field
data = struct.pack("10s", b"alice")
username, = struct.unpack("10s", data)
print(repr(username))
print(username.rstrip(b"\x00").decode())
b'alice\x00\x00\x00\x00\x00'
alice
The five-byte string alice is zero-padded to fill all ten bytes. On unpack you get those ten bytes back including the padding, which is why rstrip(b"\x00") removes the null bytes before decoding to a clean string. This fixed-width string pattern appears constantly in binary file formats and network protocol headers.
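Truncation works just as automatically in the other direction. A quick check, packing a value longer than its field:

```python
import struct

# A five-byte value packed into a 3-byte field is silently cut short.
print(struct.pack("3s", b"alice"))  # b'ali'
```

There is no error here either, so pick your field widths with the longest realistic value in mind.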
Byte order — also called endianness — determines how multi-byte values are arranged in memory. Big-endian stores the most significant byte first; little-endian stores it last. When you share binary data between systems or read an existing binary file format, getting byte order wrong produces silently incorrect values — no exceptions, just wrong numbers.
The struct format string supports five byte order prefixes, all documented in the struct module docs:
| Prefix | Meaning | Typical use |
|---|---|---|
| @ | Native order, native size | Platform-specific local use |
| = | Native order, standard size | Portable local use |
| < | Little-endian | x86 files, Windows formats |
| > | Big-endian | Many binary standards |
| ! | Network order (big-endian) | TCP/IP sockets |
Here is a concrete demonstration of how the same integer looks completely different depending on byte order:
import struct
value = 0x0A0B0C0D # A memorable hex pattern
le_bytes = struct.pack("<I", value)
be_bytes = struct.pack(">I", value)
print(f"Value : {hex(value)}")
print(f"Little-endian : {le_bytes.hex()}")
print(f"Big-endian : {be_bytes.hex()}")
Value : 0x0a0b0c0d
Little-endian : 0d0c0b0a
Big-endian : 0a0b0c0d
The bytes are in reverse order. If you read big-endian data with a little-endian format string, struct will decode the bytes without complaint and give you a completely wrong number. No warning, no error — silent failure. Always know your byte order before writing unpack code, and encode it explicitly in your format string rather than relying on @.
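That silent failure is worth seeing once. A minimal demonstration, misreading a big-endian buffer with a little-endian format string:

```python
import struct

# Pack the number 1 in big-endian, then misread it as little-endian.
be = struct.pack(">I", 1)           # b'\x00\x00\x00\x01'
wrong = struct.unpack("<I", be)[0]
print(wrong)  # 16777216, with no warning of any kind
```

One wrong character in the format string turns 1 into 16777216 (2**24): the low byte and the high byte have swapped roles.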
struct.calcsize() tells you exactly how many bytes a format string requires without packing any data. This is invaluable for reading binary files and streams in structured chunks — you calculate the record size once, then use it to read and unpack records in a loop.
import struct
formats = ["B", "H", "I", "Q", "f", "d", "10s", ">HIdH8s"]
for fmt in formats:
size = struct.calcsize(fmt)
print(f" {fmt:<12} = {size} bytes")
 B            = 1 bytes
 H            = 2 bytes
 I            = 4 bytes
 Q            = 8 bytes
 f            = 4 bytes
 d            = 8 bytes
 10s          = 10 bytes
 >HIdH8s      = 24 bytes
struct.calcsize() shines when you define your format strings as constants at the top of a module and calculate their sizes once at startup. Everything that follows — reading, seeking, slicing — can use those pre-computed sizes instead of hard-coded magic numbers.
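One natural extension of that pattern is struct.Struct, which compiles a format string once and exposes the size alongside pack and unpack methods. A minimal sketch:

```python
import struct

# Compile the format once at module level; reuse it everywhere.
RECORD = struct.Struct(">HIB")  # score, player_id, level

print(RECORD.size)  # 7, so no magic numbers later

raw = RECORD.pack(9500, 1001, 5)
print(RECORD.unpack(raw))  # (9500, 1001, 5)
```

Besides tidiness, a compiled Struct skips re-parsing the format string on every call, which matters in tight parsing loops.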
A perfect real-world use of struct is reading fixed-size records from a binary buffer. This is exactly how game save files, sensor logs, and network captures store their data. Each record is a fixed number of bytes; you read that many bytes, unpack them, and repeat.
import struct
import io
# Each record: score (H = 2 bytes), player_id (I = 4 bytes), level (B = 1 byte)
RECORD_FMT = ">HIB"
RECORD_SIZE = struct.calcsize(RECORD_FMT)
# Write several records to an in-memory binary buffer
buffer = io.BytesIO()
game_records = [(9500, 1001, 5), (7200, 1002, 3), (12000, 1003, 8)]
for score, pid, level in game_records:
buffer.write(struct.pack(RECORD_FMT, score, pid, level))
# Rewind and read every record back
buffer.seek(0)
total_records = len(buffer.getvalue()) // RECORD_SIZE
print(f"Record size : {RECORD_SIZE} bytes")
print(f"Total records: {total_records}")
print()
while True:
raw = buffer.read(RECORD_SIZE)
if not raw:
break
score, pid, level = struct.unpack(RECORD_FMT, raw)
print(f" Player {pid} | score: {score} | level: {level}")
Record size : 7 bytes
Total records: 3
Player 1001 | score: 9500 | level: 5
Player 1002 | score: 7200 | level: 3
Player 1003 | score: 12000 | level: 8
The read(RECORD_SIZE) pattern is the foundation of all binary file parsing. You always read exactly as many bytes as the format needs, unpack in one clean call, and move on. Replace io.BytesIO with an actual file opened in "rb" mode and this code works identically on any binary file on disk.
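A minimal sketch of that swap, writing and reading the same records through a real file opened in binary mode (the temporary-file path handling is just for the example):

```python
import os
import struct
import tempfile

RECORD_FMT = ">HIB"
RECORD_SIZE = struct.calcsize(RECORD_FMT)

# Write two records to a real binary file on disk.
path = os.path.join(tempfile.mkdtemp(), "scores.bin")
with open(path, "wb") as f:
    for record in [(9500, 1001, 5), (7200, 1002, 3)]:
        f.write(struct.pack(RECORD_FMT, *record))

# Read them back in fixed-size chunks, exactly as with io.BytesIO.
with open(path, "rb") as f:
    while chunk := f.read(RECORD_SIZE):
        score, pid, level = struct.unpack(RECORD_FMT, chunk)
        print(f"Player {pid} | score: {score} | level: {level}")
```

The reading loop is byte-for-byte identical to the in-memory version; only the source of the bytes changed.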
Here is a complete program that builds and parses a custom binary network packet using the struct module. It brings together the format string, byte order control, struct.pack, struct.unpack, and struct.calcsize in a single realistic scenario — the kind of code you would write when implementing a simple binary protocol over a socket.
import struct
# Packet header layout (network byte order = big-endian):
# H = packet type (2 bytes, unsigned short)
# I = sequence number (4 bytes, unsigned int)
# d = timestamp (8 bytes, double)
# H = payload length (2 bytes, unsigned short)
# 8s = source ID (8 bytes, fixed-width string)
HEADER_FMT = "!HIdH8s"
HEADER_SIZE = struct.calcsize(HEADER_FMT)
def create_packet(ptype, seq, timestamp, source, message):
payload = message.encode("utf-8")
src_padded = source.encode("utf-8").ljust(8)[:8]
header = struct.pack(HEADER_FMT, ptype, seq, timestamp, len(payload), src_padded)
return header + payload
def decode_packet(raw):
ptype, seq, ts, plen, src = struct.unpack(HEADER_FMT, raw[:HEADER_SIZE])
payload = raw[HEADER_SIZE : HEADER_SIZE + plen].decode("utf-8")
return {
"type": ptype,
"sequence": seq,
"timestamp": ts,
"source": src.rstrip().decode("utf-8"),
"message": payload,
}
# Build a status packet
packet = create_packet(
ptype=1,
seq=100,
timestamp=1714000000.0,
source="SRV-A",
message="STATUS OK",
)
print(f"Header size : {HEADER_SIZE} bytes")
print(f"Total packet : {len(packet)} bytes")
print()
# Decode it back
fields = decode_packet(packet)
for key, value in fields.items():
print(f" {key:<12}: {value}")
Header size : 24 bytes
Total packet : 33 bytes
type : 1
sequence : 100
timestamp : 1714000000.0
source : SRV-A
message : STATUS OK