
numpy Part 11 – Structured numpy Arrays

Today we’ll learn how to create structured numpy arrays.

structured numpy arrays

All the arrays we’ve used so far held homogeneous data, that is, data of a single type, such as all ints or all floats. But we can also define structured numpy arrays whose columns (called fields) have different types. Let’s say we want to implement the following table as an array:

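| Name         | Occupation | Age | Income |
|--------------|------------|-----|--------|
| John Smith   | dentist    | 47  | 30534  |
| Mike Turner  | driver     | 50  | 24130  |
| Lisa Steep   | teacher    | 28  | 29547  |
| Jennifer Lee | nurse      | 31  | 25478  |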

dtype

Using dtype we can define a different data type for each column. As you can see, we’ll need strings for the first two columns and integers for the other two.

First, let’s define a single column, for example the Age column, and populate it. The type string 'i1' stands for a 1-byte signed integer, which is enough for ages up to 127. We can do it like so:

import numpy as np

ages = np.array([(47,), (50,), (28,), (31,)],
                dtype = np.dtype([('age', 'i1')]))

# We access the data in the age column by key.
print(ages['age'])

Here’s the output:

[47 50 28 31]

For better readability we could assign the data type to a variable:

age = np.dtype([('age', 'i1')])

ages = np.array([(47,), (50,), (28,), (31,)],
                dtype = age)

# We access the data in the age column by key.
print(ages['age'])
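
To see what such a dtype actually describes, we can inspect it. The following is a minimal sketch that reuses the age and ages variables from above; it only reads standard attributes of the dtype and the array:

```python
import numpy as np

age = np.dtype([('age', 'i1')])
ages = np.array([(47,), (50,), (28,), (31,)], dtype=age)

# The field names defined by the dtype.
print(age.names)           # ('age',)

# The size of one record in bytes: a single 1-byte integer.
print(age.itemsize)        # 1

# The array is one-dimensional, with one record per row.
print(ages.shape)          # (4,)

# The extracted column is an ordinary array of 1-byte integers.
print(ages['age'].dtype)   # int8

# The largest value an 'i1' field can hold.
print(np.iinfo(np.int8).max)   # 127
```

So a structured array with four records has the shape (4,), not (4, 1): the fields live inside each record.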

Creating Structured numpy Arrays

Now we’re ready to create the structured array for the table shown at the beginning. It will have four columns, and we’ll populate it with the data from that table.

Here’s the code in which we create and use the structured array:

import numpy as np

# Let's define a data type and assign it to a variable.
# We want the strings to hold up to 20 Unicode characters,
# so we use the type string U20. The age should be a 1-byte
# integer and the income should be a 4-byte integer.
worker = np.dtype([('name', 'U20'), 
                   ('occupation', 'U20'), 
                   ('age', 'i1'), 
                   ('income', 'i4')])

workers = np.array([
    ('John Smith', 'dentist', 47, 30534),
    ('Mike Turner', 'driver', 50, 24130),
    ('Lisa Steep', 'teacher', 28, 29547),
    ('Jennifer Lee', 'nurse', 31, 25478)],
    dtype = worker)

# We can access the data by row, by column or individually.
# If you need a whole row, just use the index of that row.
# So, if you need the second row, you should use the index 1.
print("Second row:")
print(workers[1])
print()

# You can access a whole column by key, like before. Let's access the 
# occupation column for example.
print("The occupation column:")
print(workers['occupation'])
print()

# And now let's access the third item in the names column.
print("The third name:")
print(workers['name'][2])
print()

# Or the other way around: first the row, then the key.
print("The third name again:")
print(workers[2]['name'])

If we run this program, we’ll get the following output:

Second row:
('Mike Turner', 'driver', 50, 24130)

The occupation column:
['dentist' 'driver' 'teacher' 'nurse']

The third name:
Lisa Steep

The third name again:
Lisa Steep
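
Because each column is a regular numpy array, we can also use it to filter or sort the rows. Here’s a short sketch that goes one step beyond the example above; it uses the same worker dtype and data, and the variable names over_30 and by_income are just chosen for this illustration:

```python
import numpy as np

worker = np.dtype([('name', 'U20'),
                   ('occupation', 'U20'),
                   ('age', 'i1'),
                   ('income', 'i4')])

workers = np.array([
    ('John Smith', 'dentist', 47, 30534),
    ('Mike Turner', 'driver', 50, 24130),
    ('Lisa Steep', 'teacher', 28, 29547),
    ('Jennifer Lee', 'nurse', 31, 25478)],
    dtype=worker)

# A boolean mask built from one column selects whole rows.
over_30 = workers[workers['age'] > 30]
print(over_30['name'])    # ['John Smith' 'Mike Turner' 'Jennifer Lee']

# np.sort can order the rows by a field name.
by_income = np.sort(workers, order='income')
print(by_income['name'])  # ['Mike Turner' 'Jennifer Lee' 'Lisa Steep' 'John Smith']

# Numeric columns support the usual array methods.
print(workers['income'].mean())   # 27422.25
```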

A More Complex Example

And now a slightly more complex example. Let’s implement the following table as a structured numpy array:

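| Box ID | Size: Length | Size: Width | Size: Height | Content: Item | Content: Amount | Content: Price |
|--------|--------------|-------------|--------------|---------------|-----------------|----------------|
| 210316 | 49.52        | 40.25       | 11.89        | book          | 11              | 128.65         |
| 211541 | 39.22        | 21.15       | 18.36        | pencil        | 480             | 240.00         |
| 199520 | 51.50        | 19.47       | 15.45        | eraser        | 1200            | 400.25         |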

We’ll use the following types:

– int32 for the box id

– float64 for length, width, height and price

– int16 for amount

– unicode (30 characters) for item

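This time we won’t use the string representations of the data types, but rather the full type names, so np.int32 instead of 'i4', for example. For a fixed-length unicode field we add the type and the length to the field tuple, using the following syntax: np.str_, 30. The older alias np.unicode is no longer available in current numpy releases.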
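
The two notations describe the same types. As a quick check, the sketch below compares the type strings with the full type names; it assumes a recent numpy version, where the unicode string type is spelled np.str_:

```python
import numpy as np

# Each type string is equivalent to the corresponding full type name.
print(np.dtype('i4') == np.dtype(np.int32))     # True
print(np.dtype('i2') == np.dtype(np.int16))     # True
print(np.dtype('f8') == np.dtype(np.float64))   # True

# A 30-character unicode string: 'U30' or the pair (np.str_, 30).
print(np.dtype('U30') == np.dtype((np.str_, 30)))   # True

# Inside a field definition the length is the third tuple element.
item = np.dtype([('item', np.str_, 30)])
print(item['item'] == np.dtype('U30'))   # True
```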

Here’s our implementation:

import numpy as np

box = np.dtype([('boxID', np.int32),
                ('size', [('length', np.float64), 
                          ('width', np.float64), 
                          ('height', np.float64)]),
                ('content', [('item', np.str_, 30),
                             ('amount', np.int16),
                             ('price', np.float64)])])

boxes = np.array([
    (210316, (49.52, 40.25, 11.89), ('book', 11, 128.65)),
    (211541, (39.22, 21.15, 18.36), ('pencil', 480, 240.00)),
    (199520, (51.50, 19.47, 15.45), ('eraser', 1200, 400.25))],
    dtype = box)


# last row
print("Last row:")
print(boxes[-1])
print()

# the box id column
print("The box id column:")
print(boxes['boxID'])
print()

# the width column
print("The width column:")
print(boxes['size']['width'])
print()

# the content columns
print("The content columns:")
print(boxes['content'])
print()

# the price in the second row
print("The price in row 2:")
print(boxes['content']['price'][1])
print()

And here’s the output:

Last row:
(199520, (51.5, 19.47, 15.45), ('eraser', 1200, 400.25))

The box id column:
[210316 211541 199520]

The width column:
[40.25 21.15 19.47]

The content columns:
[('book',   11, 128.65) ('pencil',  480, 240.  ) ('eraser', 1200, 400.25)]

The price in row 2:
240.0
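
Nested fields can be used in calculations and modified just like simple ones. The following sketch is not part of the example above; it rebuilds the same boxes array (with np.str_ for the item field, assuming a recent numpy version) and the variable names size, volume and priciest are made up for this illustration:

```python
import numpy as np

box = np.dtype([('boxID', np.int32),
                ('size', [('length', np.float64),
                          ('width', np.float64),
                          ('height', np.float64)]),
                ('content', [('item', np.str_, 30),
                             ('amount', np.int16),
                             ('price', np.float64)])])

boxes = np.array([
    (210316, (49.52, 40.25, 11.89), ('book', 11, 128.65)),
    (211541, (39.22, 21.15, 18.36), ('pencil', 480, 240.00)),
    (199520, (51.50, 19.47, 15.45), ('eraser', 1200, 400.25))],
    dtype=box)

# The nested 'size' field is itself a structured array.
size = boxes['size']

# Element-wise arithmetic on the nested columns: one volume per box.
volume = size['length'] * size['width'] * size['height']
print(volume)

# The id of the box with the highest price.
priciest = boxes['boxID'][np.argmax(boxes['content']['price'])]
print(priciest)   # 199520

# Field access returns views, so assigning through them
# changes the original array.
boxes['content']['amount'][0] = 12
print(boxes[0]['content'])
```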
