Using Hierarchical Recursive Temporal Neural Aggregation and Predictive Modelling Systems

Simon Bergeron
6 min read · Nov 30, 2024

Here in Toronto, anyone who has had their birthday "ruined" by the forecast will tell you the same thing, as I'm sure you've heard repeatedly: "The weatherman is always wrong." That's why I've decided to create my own logistical temporal regressive neural network and predictive modelling masterpiece.

In this article, I will detail how you too can create a hierarchical recursive temporal neural aggregation and predictive modelling system (HRTNAPS) and, in addition, how to keep up with the rapidly changing field of AI.

What even is a Hierarchical Recursive Temporal Neural Aggregation and Predictive Modelling System?

The concept of HRTNAPS can be broken up into multiple elements that come together to make accurate predictions.

Hierarchical

Organizes data or relationships into a multi-level structure. This could be used to recognize patterns across global trends → regional trends → local patterns.

Recursive

The system uses feedback loops to refine its understanding or predictions. This means that predictions at one stage can be re-evaluated and adjusted based on new data.

Temporal

Focuses on time-series data or sequences where the order and timing of events are crucial, like stock prices or heart rate data over time.

Neural Aggregation

Combines neural network techniques to merge information from multiple sources at levels of the hierarchy. This could be used for things like predicting overall power grid demand based on energy usage for an individual household.

Predictive Modeling

The system generates forecasts or predictions based on learned patterns in historical data. For example, predicting future sales or, in my case, weather conditions.

Workflow

For my particular project, I decided to plug various data points into my HRTNAPS. Let's walk through an example workflow which, although simplified, should still illustrate what an HRTNAPS really is.

Data Hierarchy

Here we create several levels of data. This helps us understand how events at one level affect another, which is why we have both a global and a regional field.

data = {
    "global": {"temperature": [20, 21, 22], "humidity": [40, 42, 43]},
    "regional": {
        "north": {"temperature": [15, 16, 17], "humidity": [35, 36, 37]},
        "south": {"temperature": [25, 26, 27], "humidity": [45, 46, 47]}
    }
}
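
To feed this hierarchy into a neural model, the nested lists eventually need to become tensors. Here is a minimal sketch of how that conversion might look; the helper name to_tensor and the (time steps × features) layout are my own assumptions, not part of the original project.

import torch

def to_tensor(node):
    # Stack temperature and humidity into a (time_steps, features) tensor
    return torch.tensor(
        list(zip(node["temperature"], node["humidity"])), dtype=torch.float32
    )

global_tensor = to_tensor(data["global"])            # shape: (3, 2)
north_tensor = to_tensor(data["regional"]["north"])  # shape: (3, 2)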

Temporal Modeling

Here, let's create a class that consumes the time-series data and initialize the model.

import torch
import torch.nn as nn

class TemporalModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(TemporalModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        lstm_out, _ = self.lstm(x)
        output = self.fc(lstm_out[:, -1, :])  # Use the last time step's output for prediction
        return output

model = TemporalModel(input_size=2, hidden_size=32, output_size=1)
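
As a quick sanity check, here is a hedged example of a forward pass on dummy data; the batch size and sequence length are arbitrary choices of mine, matching the (batch, time, features) layout that batch_first=True expects.

# Dummy batch: 4 sequences, 3 time steps, 2 features (temperature, humidity)
dummy_input = torch.rand((4, 3, 2))
prediction = model(dummy_input)
print(prediction.shape)  # torch.Size([4, 1])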

Neural Aggregation

Here we merge information from the different levels of the hierarchy by averaging each level's features over time and concatenating them into a single feature vector.

import torch

def aggregate_features(global_data, regional_data):
    # Average each source over time, then concatenate into one feature vector
    global_features = torch.mean(global_data, dim=0)
    regional_features = torch.mean(regional_data, dim=0)
    aggregated_features = torch.cat([global_features, regional_features], dim=0)
    return aggregated_features

global_data = torch.rand((3, 2))
regional_data = torch.rand((3, 2))

aggregated = aggregate_features(global_data, regional_data)
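
The aggregated vector can then be handed to any downstream predictor. As a minimal sketch (the linear head here is my own illustrative choice, not the project's actual model), a single nn.Linear layer could map the four aggregated features to one prediction:

import torch.nn as nn

head = nn.Linear(4, 1)         # 2 global + 2 regional features -> 1 prediction
prediction = head(aggregated)  # aggregated has shape (4,)
print(prediction.shape)        # torch.Size([1])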

Recursive Feedback

This is our feedback loop. The predictions are compared against the ground truth and nudged toward it over several iterations, which lets the system refine its future predictions.

def recursive_refinement(predictions, ground_truth, learning_rate=0.01):
    for _ in range(5):  # Iterate 5 times for refinement
        error = ground_truth - predictions
        adjustments = learning_rate * error
        predictions += adjustments
    return predictions

# Example refinement
predictions = torch.tensor([0.5, 0.6])
ground_truth = torch.tensor([1.0, 0.9])
refined_predictions = recursive_refinement(predictions, ground_truth)

Predictive Modeling

Here we are going to combine the temporal model with the aggregated features.

class HRTNAPS(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(HRTNAPS, self).__init__()
        self.temporal_model = TemporalModel(input_size, hidden_size, hidden_size)
        self.fc = nn.Linear(hidden_size * 2, output_size)  # Combine global and regional features

    def forward(self, global_data, regional_data):
        global_features = self.temporal_model(global_data)
        regional_features = self.temporal_model(regional_data)
        combined_features = torch.cat([global_features, regional_features], dim=1)
        return self.fc(combined_features)

hrt_model = HRTNAPS(input_size=2, hidden_size=32, output_size=1)
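
To tie the pieces together, here is a hedged sketch of how the combined model might be trained on dummy tensors with mean-squared-error loss. The batch size, learning rate, and number of epochs are placeholder choices of mine, not values from the original project.

import torch
import torch.nn as nn

# Dummy data: 8 sequences, 3 time steps, 2 features per level; 1 target each
global_batch = torch.rand((8, 3, 2))
regional_batch = torch.rand((8, 3, 2))
targets = torch.rand((8, 1))

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(hrt_model.parameters(), lr=0.001)

for epoch in range(10):
    optimizer.zero_grad()
    outputs = hrt_model(global_batch, regional_batch)  # shape: (8, 1)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()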

To sum up, what is the logic? We simulate the data, evaluate how accurate the simulation is against the ground truth, and then feed that error back in to refine the next round of predictions.

Future Outlook: How to Learn

“Give a man a fish and you feed him for a day. Teach him how to fish and you feed him for a lifetime”
- Philosopher Lao Tzu

While the field is changing rapidly, one thing stays constant: how we structure and write code. We may swap out modules, pipelines, and various functions, but the general format stays the same.

Pipelines: General Overview

Pipelines in the context of data processing, machine learning, or system design represent a structured, step-by-step process to transform raw inputs into actionable outputs. They ensure a streamlined workflow, modularize processes, and make it easier to debug, maintain, and scale systems.

Where to Find Pipelines

When starting a project, use pipelines to get your data into the desired shape and to generate predictions. Once you have a roadmap of what your project should do, I recommend heading over to Perplexity and having it suggest resources to check out. That could lead you to YouTube tutorials on the module you might use, or to its documentation.

How Should Pipelines Be Formatted?

Data-science pipelines can be formatted in one of four ways. The first is code-based pipelines, which allow for maximum control and customization.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

pipeline = Pipeline([
    ('scaler', StandardScaler()),             # Preprocessing
    ('classifier', RandomForestClassifier())  # Model
])

pipeline.fit(X_train, y_train)  # X_train and y_train are your training features and labels
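
Once fitted, the same object handles inference end to end. A hedged usage sketch, assuming a held-out X_test and y_test exist alongside the training split:

predictions = pipeline.predict(X_test)     # Scales then classifies in one call
accuracy = pipeline.score(X_test, y_test)  # Mean accuracy on the test split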

There are also workflow-oriented pipelines. They allow for tasks to be completed in a sequence with clear dependencies. These are ideal for complex workflows.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_data():
    pass  # Extraction logic

def transform_data():
    pass  # Transformation logic

def load_data():
    pass  # Loading logic

with DAG('etl_pipeline', start_date=datetime(2024, 1, 1), schedule_interval='@daily') as dag:
    extract = PythonOperator(task_id='extract', python_callable=extract_data)
    transform = PythonOperator(task_id='transform', python_callable=transform_data)
    load = PythonOperator(task_id='load', python_callable=load_data)

    extract >> transform >> load  # Define the task order

The third type is the visual pipeline. It's fairly intuitive for those who are unfamiliar with coding. Visual pipelines are mainly used for prototyping and play a supporting role, so I won't go into detail on them here.

Documentation Pipelines

Documentation Pipelines are used to plan out a new Python project or feature.

from graphviz import Digraph

dot = Digraph(comment='Simple Python Workflow')

dot.node('A', 'Start')
dot.node('B', 'Data Preprocessing')
dot.node('C', 'Train Model')
dot.node('D', 'Evaluate Model')
dot.node('E', 'Deploy Model')

dot.edges(['AB', 'BC', 'CD', 'DE'])
dot.render('workflow', format='png', cleanup=True) # Generates a flowchart

Saving Model Training Data

When working with machine learning models, it’s important to save both the training data and the model itself to ensure reproducibility, scalability, and ease of deployment.
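
The format options below cover the training data; for the model itself, a common approach with PyTorch (which the HRTNAPS code above uses) is to save the state dict. A minimal sketch, assuming the hrt_model defined earlier:

import torch

# Save the trained weights
torch.save(hrt_model.state_dict(), "hrtnaps_weights.pt")

# Later: rebuild the architecture and load the weights back in
restored_model = HRTNAPS(input_size=2, hidden_size=32, output_size=1)
restored_model.load_state_dict(torch.load("hrtnaps_weights.pt"))
restored_model.eval()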

What format should you use?

The type and intended use of the data determine the optimal format.

  1. Parquet: Highly efficient for large datasets
  2. Joblib: Great for Python-specific tasks
  3. Pickle: Great for Python objects
  4. HDF5: Great for large hierarchical or multidimensional data
  5. JSON: Great for structured, human-readable data
  6. CSV: Best for table-like data that humans will read

Parquet

Parquet is highly efficient for large datasets due to its columnar storage format. It supports compression to reduce file size without significant performance loss.

import pandas as pd

# df is an existing pandas DataFrame of training data
df.to_parquet("training_data.parquet", compression="snappy")

df = pd.read_parquet("training_data.parquet")

Joblib

Joblib is great for optimized, memory-efficient storage of large objects. If the Python object isn't very large, pickle might be better.

from joblib import dump, load

# Save data
dump(data, "training_data.joblib")

# Load data
data = load("training_data.joblib")

Pickle

Pickle is great for smaller Python objects where Joblib isn’t faster. It directly supports Python data structures, but it can be insecure if loading untrusted files.

import pickle

# Save data
with open("training_data.pkl", "wb") as f:
pickle.dump(data, f)

# Load data
with open("training_data.pkl", "rb") as f:
data = pickle.load(f)

HDF5

HDF5 is great for large hierarchical or multidimensional data.

import h5py

# Save data (array is an existing NumPy array)
with h5py.File("training_data.h5", "w") as f:
    f.create_dataset("dataset", data=array)

# Load data
with h5py.File("training_data.h5", "r") as f:
    array = f["dataset"][:]

JSON

JSON is great for structured data like dictionaries. It is very readable and portable across platforms.

import json

# Save data
with open("training_data.json", "w") as f:
json.dump(data, f)

# Load data
with open("training_data.json", "r") as f:
data = json.load(f)

CSV

Finally, there is CSV (Comma-Separated Values). This format is much easier for humans to read, but it struggles as file sizes increase, and it has limited support for complex data structures.

import pandas as pd

# Save data to CSV
df.to_csv("training_data.csv", index=False)

# Load data
df = pd.read_csv("training_data.csv")

The math

For those who are curious and want a more mathematically rigorous explanation of what I did for my project, here is the link: [https://acrobat.adobe.com/id/urn:aaid:sc:VA6C2:ccc7aecb-dde2-4a3c-b4db-8a8c2e9c091c].

Last Words

How to Find the Information

  a. Start with foundational resources: Use reliable sources like research papers, books, or official APIs.
  b. Ask the right questions: Break problems into smaller, focused questions to guide exploration.
  c. Build a system: Use tools like Notion, PARA, or spaced repetition for organized, effective learning.
  d. Experiment: Apply what you learn in a hands-on way to reinforce concepts.


