Kaggle Notes -- Time Series as Features

09/18/25 on Learning

Core Idea: Instead of fitting a dedicated time-series model, extract temporal features from the timestamps and past values, then train a standard ML model (Random Forest, XGBoost) on them. Practical and easy to implement.

1. Key Feature Engineering

1.1 Time Index Features

Capture cycles (hour, weekday) and long-term trends via timestamps.

import pandas as pd
# Convert, index and sort (must-do for time series)
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp').sort_index()
# Extract features
df['hour'] = df.index.hour
df['dayofweek'] = df.index.dayofweek  # 0=Monday
df['days_since_start'] = (df.index - df.index.min()).days

1.2 Rolling Window Features

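Capture short-term trends with sliding-window statistics. Match the window to the data frequency and cycle (e.g., '7D' for daily data with a weekly pattern), and keep the current row out of the window so the feature never contains the value being predicted.

# 7-day rolling stats; closed='left' excludes the current row (no target leakage)
window = df['target'].rolling(window='7D', closed='left')
df['rolling_7d_mean'] = window.mean()
df['rolling_7d_std'] = window.std()
# Drop only the warm-up rows where the window has too little history
df = df.dropna(subset=['rolling_7d_mean', 'rolling_7d_std'])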
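
A quick check of what the window actually covers, as a minimal sketch on made-up daily data (the `demo` frame and its column names are hypothetical). By default a time-based rolling window includes the current row; `closed='left'` restricts it to earlier rows only.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: target = 1.0, 2.0, ..., 10.0
idx = pd.date_range("2025-01-01", periods=10, freq="D")
demo = pd.DataFrame({"target": np.arange(1.0, 11.0)}, index=idx)

# Default window is closed on the right: the current row is included
demo["mean_incl_today"] = demo["target"].rolling("7D").mean()

# closed="left" drops the current row, so only earlier days contribute
demo["mean_past_only"] = demo["target"].rolling("7D", closed="left").mean()

print(demo.head(4))
```

On the third day the first column averages 1, 2, 3 while the second averages only 1, 2. The first row of the past-only column is NaN because no history exists yet.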

1.3 Lag Features

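Use past values (t-1, t-7) as predictors. Each row must only see earlier observations, otherwise the model trains on leaked future data.

# shift() moves by rows, so these are 1-day and 7-day lags only if there is exactly one row per day
df['lag_1'] = df['target'].shift(1)  # previous row
df['lag_7'] = df['target'].shift(7)  # 7 rows back
df = df.dropna(subset=['lag_1', 'lag_7'])  # Remove shift-generated NaNs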
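
Row-based shifting can go wrong when days are missing. A small sketch on made-up data (the `demo` frame and column names are hypothetical) comparing a row shift with a calendar shift via `shift(freq=...)`:

```python
import pandas as pd

# Hypothetical daily series with 2025-01-03 missing
idx = pd.to_datetime(["2025-01-01", "2025-01-02", "2025-01-04", "2025-01-05"])
demo = pd.DataFrame({"target": [10.0, 20.0, 40.0, 50.0]}, index=idx)

# Row-based lag: on 2025-01-04 this picks up the value from two days earlier
demo["lag_1_rows"] = demo["target"].shift(1)

# Calendar-based lag: move every timestamp forward one day, then align on the original index
demo["lag_1_day"] = demo["target"].shift(freq="1D").reindex(demo.index)

print(demo)
```

The calendar-based column is NaN on 2025-01-04 because no observation exists for the day before. The row-based column fills that slot with the 2025-01-02 value and labels it a one-day lag.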

2. Simple Workflow

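Clean Data: Fill missing values, e.g., with interpolation (it also reads the next observed value, so prefer forward fill where strict causality matters)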
Engineer Features: Combine the three feature types above
Time-Based Split: Train on earlier rows, test on later rows; never shuffle
Train Model: Use standard ML models (e.g., RandomForestRegressor)
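
The four steps above could look roughly like this end to end. A minimal sketch on synthetic data, not a tuned pipeline: the series, the 80/20 cutoff, the feature list and the hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical data: 200 days with a weekly pattern plus noise
rng = np.random.default_rng(0)
idx = pd.date_range("2025-01-01", periods=200, freq="D")
weekly = 3 * np.sin(2 * np.pi * idx.dayofweek.to_numpy() / 7)
df = pd.DataFrame({"target": 10 + weekly + rng.normal(0, 0.5, len(idx))}, index=idx)

# 1. Clean: blank out two values, then fill them.
#    interpolate() also looks at the next observed value; ffill() is a stricter alternative.
df.iloc[[20, 75], 0] = np.nan
df["target"] = df["target"].interpolate(method="time")

# 2. Engineer features, all built from past values only
df["dayofweek"] = df.index.dayofweek
df["lag_1"] = df["target"].shift(1)
df["lag_7"] = df["target"].shift(7)
df["rolling_7d_mean"] = df["target"].rolling("7D", closed="left").mean()
df = df.dropna()

# 3. Time-based split: first 80% of rows for training, last 20% for testing
features = ["dayofweek", "lag_1", "lag_7", "rolling_7d_mean"]
cutoff = int(len(df) * 0.8)
train, test = df.iloc[:cutoff], df.iloc[cutoff:]

# 4. Train and evaluate on the held-out tail
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(train[features], train["target"])
mae = mean_absolute_error(test["target"], model.predict(test[features]))
print(f"Test MAE: {mae:.3f}")
```

Because the frame is sorted by time, slicing by position is enough to keep every test row later than every training row.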

3. Key Takeaways

Avoids dedicated time-series models (e.g., ARIMA) in favor of familiar ML tools
Golden Rules: Sort by time; split by time; no future data leakage
Great for: Demand forecasting, energy prediction, sales trends