
How to Use Python itertools for Time Series Feature Engineering

Python itertools helps you quickly create features for time series. This article covers practical examples of efficient feature engineering, with ready-to-use code.

Source: KDnuggets. Collage: Hamidun News.

Python's itertools is a built-in module for working with iterators, and it is an indispensable tool for time series feature engineering. If you build features with loops and copy datasets along the way, itertools can make the job several times faster and far more memory-efficient.

What is itertools and why you need it

The itertools module provides functions for creating fast, memory-efficient iterators. For time series work this is critical: instead of holding whole lists in memory, we generate features on the fly. On datasets with millions or billions of points, this isn't just convenient — it's what makes the work possible at all.

Main itertools functions for feature engineering:

  • `combinations()` — all pairs and combinations of elements (lag features, variable interactions)
  • `permutations()` — all permutations (testing orders and feature influence)
  • `islice()` — extracting subsequences and sliding windows without copying
  • `zip_longest()` — combining series of different lengths with missing value handling
  • `chain()` — combining iterators into one stream without data copying
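As a quick illustration, here is what these functions produce on small inputs (the feature names below are hypothetical):

```python
from itertools import combinations, zip_longest, chain

features = ['price', 'volume', 'volatility']

# All unordered feature pairs, e.g. for interaction features
pairs = list(combinations(features, 2))
# [('price', 'volume'), ('price', 'volatility'), ('volume', 'volatility')]

# Align two series of different lengths, padding the shorter one
a = [1, 2, 3]
b = [10, 20]
aligned = list(zip_longest(a, b, fillvalue=None))
# [(1, 10), (2, 20), (3, None)]

# Concatenate two sources lazily, without building an intermediate list
stream = list(chain(a, b))
# [1, 2, 3, 10, 20]
```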

Practical examples: from naive approach to scalable

A classic task: given a time series of prices, build lag features for lags 1, 2, and 3.

Naive approach (slow):

```python
import pandas as pd

data = pd.read_csv('prices.csv')

# Each shift() allocates a new lagged copy of the column
for lag in range(1, 4):
    data[f'price_lag_{lag}'] = data['price'].shift(lag)
```

Problem: each `shift()` allocates a fresh copy of the column in memory. On 10 million rows, three lags mean tens of millions of copied values and several extra column-sized allocations on top of the original data.

With itertools (fast and efficient):

```python
from itertools import islice

prices = [100, 102, 105, 103, 107, 110]

for lag in range(1, 4):
    # islice skips the first `lag` elements lazily, without copying
    lagged = list(islice(prices, lag, None))
    print(f'lag {lag}: {lagged}')
```

`islice` doesn't copy the list — it lazily skips to the required position and yields elements on demand. On large datasets, this saves hours of computation.
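The same idea extends to building several lags at once. A minimal sketch that aligns the lagged views with `zip_longest`, padding the shorter ones with `None`:

```python
from itertools import islice, zip_longest

prices = [100, 102, 105, 103, 107, 110]

# One lazy view per lag (lag 0 is the original series)
lags = [islice(prices, lag, None) for lag in range(0, 4)]

# Align the views into rows; shorter views are padded with None
rows = list(zip_longest(*lags))
# rows[0] == (100, 102, 105, 103)
```

Each row holds the current value alongside its shifted neighbors, ready to feed into a model.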

Second example — sliding windows:

```python
from itertools import tee

def sliding_window(iterable, n):
    # tee() creates n independent iterators over the same data
    iterables = tee(iterable, n)
    # Advance the i-th iterator by i steps so the views are staggered
    for i, it in enumerate(iterables):
        for _ in range(i):
            next(it, None)
    return zip(*iterables)

prices = [100, 102, 105, 103, 107, 110]
windows = list(sliding_window(prices, 3))
```

This creates sliding windows of size 3, using lazy iterators instead of copying the array.
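A common follow-up is computing rolling statistics from these windows without pandas; a minimal sketch (repeating the helper so the snippet is self-contained):

```python
from itertools import tee

def sliding_window(iterable, n):
    # n staggered views over the same data, zipped into windows
    iterables = tee(iterable, n)
    for i, it in enumerate(iterables):
        for _ in range(i):
            next(it, None)
    return zip(*iterables)

prices = [100, 102, 105, 103, 107, 110]

# Rolling mean over windows of size 3
rolling_mean = [sum(w) / len(w) for w in sliding_window(prices, 3)]
# The third window is (105, 103, 107), so rolling_mean[2] == 105.0
```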

Performance: where memory and time are saved

On datasets with billions of points:

  • Lazy evaluation — elements are generated on demand instead of being loaded all at once
  • Operation chains — `chain()` combines sources without intermediate copies
  • Composition — by combining itertools functions, we build complex pipelines with minimal memory overhead

Result: memory consumption can drop by 10-100x. In the cloud, that means a smaller bill; on a local machine, you can process data that previously didn't fit in RAM.
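The points above can be sketched as a lazy pipeline. The batch sources here are hypothetical stand-ins for real files or database cursors:

```python
from itertools import chain, islice

def batches():
    # Stand-ins for several large sources (files, DB cursors, ...);
    # ranges are lazy, so nothing is materialized up front
    yield range(0, 1_000_000)
    yield range(1_000_000, 2_000_000)

# chain.from_iterable flattens the sources lazily into one stream
stream = chain.from_iterable(batches())

# islice takes the first few elements without copying or exhausting the rest
window = list(islice(stream, 5))
# [0, 1, 2, 3, 4]
```

Only the five consumed elements ever exist as Python objects; the rest of the two million points are never generated.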

What this means for you

For data scientists: you can work with datasets that previously simply didn't fit in memory.

For ML engineers: a production model requires fewer resources and is more stable.

itertools isn't magic — it's a built-in way to work with data more intelligently. Start with lag features and sliding windows, then look for other places where lazy evaluation applies.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.