In the previous part of the openpyxl series we were iterating over the rows and columns of a spreadsheet using indices and ranges. In this part we’ll see how to do that using Python iterators.
We’ll be working on the same workbook as before, so let’s import it:
>>> from openpyxl import load_workbook
>>> workbook = load_workbook(filename="wb1.xlsx")
>>> sheet = workbook.active
Now we can use the iter_rows and iter_cols methods to iterate over the rows and columns respectively. They return a generator:
>>> sheet.iter_rows()
<generator object Worksheet._cells_by_row at 0x000001F632992D48>
If you want to learn more about generators in Python, I have a series of articles about them, starting with the very basics of generators, so feel free to have a look at it.
You can iterate over the rows in a loop. The cells in each row will be returned in a tuple:
>>> for row in sheet.iter_rows():
... print(row)
...
(<Cell 'Sheet'.A1>, <Cell 'Sheet'.B1>, <Cell 'Sheet'.C1>)
(<Cell 'Sheet'.A2>, <Cell 'Sheet'.B2>, <Cell 'Sheet'.C2>)
You can also iterate over the columns like that. Again, the cells in each column will be returned in a tuple:
>>> for col in sheet.iter_cols():
... print(col)
...
(<Cell 'Sheet'.A1>, <Cell 'Sheet'.A2>)
(<Cell 'Sheet'.B1>, <Cell 'Sheet'.B2>)
(<Cell 'Sheet'.C1>, <Cell 'Sheet'.C2>)
Using the min_row, max_row, min_col and max_col arguments, with pretty self-explanatory names, you can specify the ranges you want to iterate over, for example:
>>> for row in sheet.iter_rows(min_row=2, max_row=3, min_col=2):
... print(row)
...
(<Cell 'Sheet'.B2>, <Cell 'Sheet'.C2>)
(<Cell 'Sheet'.B3>, <Cell 'Sheet'.C3>)
As you can see, both methods return tuples of cells. If you need the values, not the cells, you can set the values_only argument to True:
>>> for row in sheet.iter_rows(values_only=True):
... print(row)
...
('hey!', None, None)
(None, None, 'ho!')
(None, None, None)
As you can see, if a cell is empty, like most cells in our example, its value is None.
An even shorter way of iterating over the rows and columns, especially if you don’t need to specify the ranges, but rather want to iterate over the whole workbook, is to use just the rows and columns attributes:
>>> for row in sheet.rows:
... print(row)
...
(<Cell 'Sheet'.A1>, <Cell 'Sheet'.B1>, <Cell 'Sheet'.C1>)
(<Cell 'Sheet'.A2>, <Cell 'Sheet'.B2>, <Cell 'Sheet'.C2>)
(<Cell 'Sheet'.A3>, <Cell 'Sheet'.B3>, <Cell 'Sheet'.C3>)
>>> for col in sheet.columns:
... print(col)
...
(<Cell 'Sheet'.A1>, <Cell 'Sheet'.A2>, <Cell 'Sheet'.A3>)
(<Cell 'Sheet'.B1>, <Cell 'Sheet'.B2>, <Cell 'Sheet'.B3>)
(<Cell 'Sheet'.C1>, <Cell 'Sheet'.C2>, <Cell 'Sheet'.C3>)
In the next part we’ll see how to convert spreadsheet data to a dictionary.