Data Loading System¶
Technical guide to HAEO's unified time series loading architecture.
Overview¶
The data loading system transforms Home Assistant sensor data into time series aligned with optimization horizons. Input entities call this system to load and expose forecast data.
The system addresses three core challenges:
- Heterogeneous data sources: Sensors provide different formats (simple values vs forecasts from various integrations)
- Temporal alignment: Forecast timestamps rarely match optimization periods
- Partial coverage: Forecasts often don't span the entire optimization horizon
The system uses a three-stage pipeline: extraction → combination → fusion.
See the Home Assistant documentation for background on entities and sensors.
Architecture¶
The data loading pipeline consists of four stages:
- Orchestration (TimeSeriesLoader) - Coordinates the entire loading process
- Extraction (sensor_loader.py) - Reads Home Assistant entities and detects formats
- Combination (forecast_combiner.py) - Merges multiple sensors into unified data
- Fusion (forecast_fuser.py) - Aligns data to the optimization horizon using interpolation
Input entities call TimeSeriesLoader.load_intervals() or load_boundaries() when they need to refresh their data.
The coordinator reads the already-loaded values from input entities.
```mermaid
graph LR
    subgraph "Input Entity"
        IE[HaeoInputNumber]
    end
    subgraph "Loading Pipeline"
        TSL[TimeSeriesLoader]
        Ext[Extraction]
        Comb[Combination]
        Fuse[Fusion]
    end
    subgraph "Runtime Data"
        RD[runtime_data.inputs]
    end

    IE --> TSL
    TSL --> Ext
    Ext --> Comb
    Comb --> Fuse
    Fuse --> IE
    IE --> RD
```
Each stage has a single responsibility and clear interfaces, making the system testable and extensible.
Design Decisions¶
Why separate extraction and fusion? Extraction handles heterogeneous input formats, while fusion handles temporal alignment. This separation allows adding new forecast formats without changing alignment logic.
Why trapezoidal integration for fusion? Linear programming optimization works with energy (power × time), not instantaneous power. Trapezoidal integration accurately computes interval averages from point samples.
Why additive sensor combination? Physical intuition: multiple solar arrays sum their power output, multiple price components sum to total cost. This matches real-world energy network behavior.
TimeSeriesLoader¶
The TimeSeriesLoader orchestrates the complete loading pipeline.
Input entities instantiate and call this loader to refresh their forecast data.
Responsibilities¶
- Validate that all referenced sensors exist and are available
- Coordinate extraction, combination, and fusion stages
- Convert results to HAEO base units (see Units for details)
- Handle single sensors and sensor lists uniformly
Interface Design¶
The loader provides an availability check plus two data-loading methods:

- available() - Checks if sensors exist without loading data (used during config validation)
- load_intervals() - Returns n interval averages for n+1 fence post timestamps
- load_boundaries() - Returns n+1 point-in-time values at each fence post timestamp
Intervals vs Fence Posts: Optimization horizons are defined by n+1 timestamps (fence posts) creating n periods (intervals). Different physical quantities require different loading approaches:
- Interval values (n values): Power, efficiency, costs - values that represent averages over time periods
- Fence post values (n+1 values): Capacity, SOC limits - values that represent states at specific points in time
All methods accept flexible value parameters (single sensor string, list, or constant) to support different configuration field types.
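For illustration, a call might look like the following sketch. Only the method names are documented here; the constructor and parameter names are assumptions:

```python
# Hypothetical usage sketch - only the method names appear on this page;
# constructor arguments and parameter names are assumptions.
loader = TimeSeriesLoader(hass)

if loader.available("sensor.solar_forecast"):
    # A single sensor, a list of sensors, or a constant all work as the value.
    power_kw = loader.load_intervals(
        ["sensor.solar_east", "sensor.solar_west"],  # sensors combine additively
        timestamps=fence_posts,                      # n+1 horizon timestamps
    )
    capacity_kwh = loader.load_boundaries(13.5, timestamps=fence_posts)
```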
Return Behavior¶
The loading methods return lists of floats:
- load_intervals() returns n values (one per optimization period)
- load_boundaries() returns n+1 values (one per fence post timestamp)
Both handle constant values by broadcasting to the appropriate length.
Both support a default parameter for optional fields with fallback values.
Values use HAEO base units: kilowatts (kW) for power, kilowatt-hours (kWh) for energy, $/kWh for prices. See Units documentation for conversion details.
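The fence post arithmetic and constant broadcasting can be illustrated with a small runnable example:

```python
from datetime import datetime, timedelta, timezone

# A 3-hour horizon with hourly periods: 4 fence posts define 3 intervals.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
fence_posts = [start + timedelta(hours=h) for h in range(4)]  # n+1 = 4
n = len(fence_posts) - 1                                      # n = 3 intervals

# A constant value broadcasts to the appropriate length:
intervals = [5.0] * n          # shape of a load_intervals() result (3 values)
boundaries = [5.0] * (n + 1)   # shape of a load_boundaries() result (4 values)
```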
Sensor Extraction¶
The sensor_loader.py module extracts data from Home Assistant entities.
Payload Types¶
A sensor provides either:
- Present value (float): Current reading at query time
- Forecast series (list of timestamp/value tuples): Future predictions
This distinction drives all downstream processing: present values repeat across horizons, forecast series interpolate and cycle.
Format Detection¶
The system uses duck typing to identify forecast formats. Each integration has distinct attribute structures, allowing automatic detection without configuration.
The detection logic tries all known parsers (see Extractors) and returns the first match. If no parser matches, the system falls back to extracting the numeric state value.
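A minimal sketch of that first-match loop; the KNOWN_PARSERS registry name is an assumption, while detect() and extract() are described under Extractors:

```python
def extract_payload(state):
    """Duck-typed format detection: the first matching parser wins."""
    for parser in KNOWN_PARSERS:          # hypothetical registry of parser modules
        if parser.detect(state):          # does this state look like our format?
            return parser.extract(state)  # list of (Unix timestamp, value) tuples
    # No forecast format matched: fall back to the numeric state value.
    return float(state.state)
```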
Why Automatic Detection?¶
Users shouldn't need to specify integration types in configuration. Automatic detection reduces configuration complexity and prevents errors from misconfiguration. Adding new forecast formats requires only a new parser module, not changes to user configuration.
Extractors¶
The extractor system (extractors/) handles integration-specific forecast formats.
Supported Integrations¶
| Integration | Use Case | Parser Module |
|---|---|---|
| Amber Electric | Electricity pricing | amberelectric.py |
| AEMO NEM | Wholesale pricing | aemo_nem.py |
| HAEO | Chaining HAEO sensor outputs | haeo.py |
| Solcast Solar | Solar forecasting | solcast_solar.py |
| Open-Meteo Solar Forecast | Solar forecasting | open_meteo_solar_forecast.py |
Parser Design¶
Each parser is a standalone module with two responsibilities:
- Detection: Identify if a sensor state matches the integration's format
- Extraction: Parse forecast data into (Unix timestamp, value) tuples
Parsers declare expected units and device classes for automatic unit conversion.
Adding New Formats¶
To support a new forecast integration:
- Create a parser module in extractors/
- Implement detect() and extract() static methods
- Declare DOMAIN, UNIT, and DEVICE_CLASS class attributes
- Add tests to tests/data/loader/extractors/
The system automatically discovers and uses new parsers without configuration changes. See existing parsers for implementation patterns.
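As a reference point, a hypothetical parser might look like this sketch. The class attributes and static methods follow the list above; the payload shape is assumed:

```python
class ExampleWeatherParser:
    """Hypothetical parser for an imaginary 'example_weather' integration."""

    DOMAIN = "example_weather"   # integration this parser recognizes
    UNIT = "kW"                  # expected native unit, used for unit conversion
    DEVICE_CLASS = "power"       # expected device class

    @staticmethod
    def detect(state) -> bool:
        """Duck-type the sensor: does it carry this integration's forecast attribute?"""
        return isinstance(state.attributes.get("forecast"), list)

    @staticmethod
    def extract(state) -> list[tuple[float, float]]:
        """Parse forecast entries into (Unix timestamp, value) tuples."""
        return [(e["timestamp"], e["value"]) for e in state.attributes["forecast"]]
```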
Combining Payloads¶
The forecast_combiner.py module merges multiple sensor payloads.
Combination Strategy¶
Multiple sensors combine additively:
- Present values sum together
- Forecast series interpolate to shared timestamps, then sum
This matches physical reality: two solar arrays produce combined power, multiple price components sum to total cost.
Timestamp Alignment¶
When combining forecast series with different timestamps, the system:
- Creates a union of all timestamps from all sensors
- Interpolates each sensor's values to this common timestamp set
- Sums interpolated values at each timestamp
This ensures no information loss when sensors report forecasts at different intervals (e.g., 30-minute vs hourly).
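A runnable NumPy sketch of those three steps (HAEO's internal implementation may differ):

```python
import numpy as np

# Two forecast series on different grids: (Unix timestamps, values).
half_hourly = ([0, 1800, 3600], [1.0, 2.0, 3.0])
hourly = ([0, 3600], [10.0, 20.0])

# 1. Union of all timestamps from all sensors.
times = np.union1d(half_hourly[0], hourly[0])  # [0, 1800, 3600]

# 2. Interpolate each sensor's values to the common timestamps.
a = np.interp(times, *half_hourly)             # [1.0, 2.0, 3.0]
b = np.interp(times, *hourly)                  # [10.0, 15.0, 20.0]

# 3. Sum the interpolated values at each timestamp.
combined = a + b                               # [11.0, 17.0, 23.0]
```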
Mixed Payloads¶
When some sensors provide present values and others provide forecasts, the present values become the initial forecast value. The combination then proceeds as pure forecast series merging.
Fusion to Horizon¶
The forecast_fuser.py module aligns combined forecasts to optimization horizons.
Fusion Functions¶
The fuser provides two functions for different data types:
- fuse_to_intervals() - Produces n interval averages using trapezoidal integration
- fuse_to_boundaries() - Produces n+1 point-in-time values via interpolation
Interval Fusion Strategy¶
The fuse_to_intervals() function produces values for each optimization period:
- Uses trapezoidal integration to compute accurate period averages
- Accounts for value changes within periods
This matches optimization requirements: linear programming operates on energy quantities (power × time), not instantaneous values.
Fence Post Fusion Strategy¶
The fuse_to_boundaries() function produces values at each timestamp boundary:
- Uses linear interpolation to get values at exact fence post times
- Preserves point-in-time nature of quantities like capacity and SOC limits
This is appropriate for energy storage values that represent states at specific moments, not averages over periods.
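A runnable sketch of boundary interpolation, using np.interp as a stand-in for the internal implementation:

```python
import numpy as np

# Combined forecast points: battery capacity (kWh) at two Unix timestamps.
forecast_t = [0, 7200]
forecast_v = [10.0, 12.0]

fence_posts = [0, 3600, 7200]  # n+1 = 3 horizon timestamps

boundaries = np.interp(fence_posts, forecast_t, forecast_v)
# [10.0, 11.0, 12.0] - one point-in-time value per fence post
```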
Interval Averaging¶
The system uses trapezoidal integration to compute accurate interval averages from point forecasts. This accounts for value changes within optimization periods, producing more accurate results than simple point sampling or nearest-neighbor approaches.
Why trapezoidal integration? Energy (kWh) = Power (kW) × Time (h). Optimization operates on energy quantities, so we need accurate power-over-time averaging. Trapezoidal integration provides the best balance of accuracy and simplicity for linear interpolation.
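A runnable example of the difference this makes:

```python
import numpy as np

# Point forecast: power ramps linearly from 0 kW to 4 kW over one hour.
t = np.array([0.0, 1800.0, 3600.0])  # seconds
p = np.array([0.0, 2.0, 4.0])        # kW

# Trapezoidal rule: energy = integral of power over time.
energy_kw_s = np.trapezoid(p, t)        # np.trapz on NumPy < 2.0
avg_kw = energy_kw_s / (t[-1] - t[0])

print(avg_kw)  # 2.0 kW - the true average; sampling at t=0 would report 0.0 kW
```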
Forecast Cycling¶
When forecasts don't cover the full horizon, the system cycles them using natural period alignment. See Forecast Cycling for details.
Cycling happens before fusion, so the fusion stage always has data extending beyond the last horizon timestamp.
Forecast Cycling¶
The forecast_cycle.py module handles partial forecast coverage.
Natural Period Alignment¶
Forecasts cycle at their natural period (24 hours for daily patterns, 7 days for weekly patterns, etc.). The system identifies this period automatically by detecting when forecast timestamps align with 24-hour boundaries.
Why natural periods? Different forecast types have different inherent cycles:
- Electricity prices often have daily patterns (time-of-use pricing)
- Some forecasts span multiple days (7-day solar forecasts)
- Cycling should preserve the forecast's intended pattern
Time-of-Day Preservation¶
When cycling, the system maintains time-of-day alignment. A 6-hour forecast from 2pm-8pm repeats at the same times each day, not offset by arbitrary amounts.
This ensures realistic patterns: expensive electricity in the evening stays expensive in the evening on subsequent days.
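A minimal sketch of period-preserving cycling, assuming a 24-hour natural period (the actual forecast_cycle.py logic may differ):

```python
from datetime import datetime, timedelta, timezone

PERIOD = timedelta(hours=24)  # natural period for a daily pattern

def cycle_back(ts: datetime, forecast_start: datetime) -> datetime:
    """Map a horizon timestamp back by whole periods, preserving time of day."""
    whole_periods = (ts - forecast_start) // PERIOD
    return ts - whole_periods * PERIOD

start = datetime(2024, 1, 1, 14, 0, tzinfo=timezone.utc)  # forecast begins 2pm
tomorrow_3pm = start + timedelta(hours=25)
print(cycle_back(tomorrow_3pm, start))  # 2024-01-01 15:00:00+00:00 - same time of day
```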
Design Rationale¶
Cycling is necessary because users configure long optimization horizons (48-168 hours) but integrations provide shorter forecasts. Simply repeating the last value would lose time-of-day patterns. Wrapping to zero would produce unrealistic gaps. Natural period cycling preserves patterns while extending coverage.
Error Handling¶
The data loading system uses ValueError for all data problems.
Coordinators catch these and convert them to appropriate Home Assistant exceptions based on context.
Error Strategy¶
- Transient errors (sensor offline, API timeout) → UpdateFailed (coordinator retries)
- Permanent errors (invalid sensor ID, wrong device class) → ConfigEntryError (user must fix configuration)
This separation ensures temporary issues don't require user intervention while permanent problems surface immediately.
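A sketch of the coordinator-side mapping; UpdateFailed and ConfigEntryError are the real Home Assistant exception classes, while is_transient() stands in for the context-based decision:

```python
from homeassistant.exceptions import ConfigEntryError
from homeassistant.helpers.update_coordinator import UpdateFailed

try:
    values = loader.load_intervals("sensor.grid_price", timestamps=fence_posts)
except ValueError as err:
    if is_transient(err):  # hypothetical helper: sensor offline, API timeout, ...
        raise UpdateFailed(f"Sensor data unavailable: {err}") from err
    raise ConfigEntryError(f"Invalid sensor configuration: {err}") from err
```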
Error Messages¶
All error messages include specific sensor entity IDs and actionable guidance. Users should be able to identify the problem sensor and understand how to fix it without reading code or logs.
Testing¶
Tests are organized by component:
- tests/data/loader/ - Extraction and loading tests
- tests/data/util/ - Combination and fusion tests
- tests/data/loader/extractors/ - Format-specific parser tests
Test Strategy¶
Unit tests cover individual functions (extraction, combination, fusion, cycling) in isolation. Integration tests verify the complete pipeline from sensor IDs to horizon-aligned values.
Each test uses realistic fixtures based on actual integration data formats. This ensures parsers handle real-world edge cases (missing fields, unexpected value ranges, timezone handling).
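A hypothetical parser test in that spirit, reusing the ExampleWeatherParser sketch from Adding New Formats with a SimpleNamespace stand-in for a Home Assistant state object:

```python
from types import SimpleNamespace

def test_detects_and_extracts_forecast():
    # Minimal stand-in for a Home Assistant state with a forecast attribute.
    state = SimpleNamespace(
        state="1.5",
        attributes={"forecast": [{"timestamp": 1704067200, "value": 1.5}]},
    )
    assert ExampleWeatherParser.detect(state)  # format recognized
    assert ExampleWeatherParser.extract(state) == [(1704067200, 1.5)]
```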
Running Tests¶
```bash
# All data loading tests
uv run pytest tests/data/ -v

# Specific component
uv run pytest tests/data/loader/test_time_series_loader.py -v

# With coverage report
uv run pytest tests/data/ --cov=custom_components.haeo.data
```
Related Documentation¶
- Input Entities - How input entities use the loading system.
- Forecasts and Sensors guide - User-facing documentation for sensor behavior.
- Units - Unit conversion system and base units.
- Coordinator - How the coordinator reads loaded data.