
Using Hierarchical Recursive Temporal Neural Aggregation and Predictive Modelling Systems
Here in Toronto, there's a phrase you'll hear repeatedly from anyone whose birthday was "ruined" by a bad forecast: "The Weatherman is always wrong." That's why I decided to create my own hierarchical, recursive, temporal neural network and predictive modelling masterpiece.

In this article, I will detail how you too can create a hierarchical recursive temporal neural aggregation and predictive modelling system (HRTNAPS) and, in addition, how to keep up with the rapidly changing field of AI.
What even is a Hierarchical Recursive Temporal Neural Aggregation and Predictive Modelling System?
The concept of HRTNAPS can be broken up into multiple elements that come together to make accurate predictions.
Hierarchical
Organizing data or relationships in a multi-level structure. This could be used to recognize patterns across levels: global trends → regional trends → local patterns.
Recursive
The system uses feedback loops to refine its understanding or predictions. This means that predictions at one stage can be re-evaluated and adjusted based on new data.
Temporal
Focuses on time-series data or sequences where the order and timing of events are crucial, like stock prices or heart rate data over time.
Neural Aggregation
Combines neural network techniques to merge information from multiple sources at different levels of the hierarchy. This could be used for things like predicting overall power grid demand based on the energy usage of individual households.
Predictive Modeling
The system generates forecasts or predictions based on learned patterns in historical data. For example, predicting future sales or, in my case, weather conditions.
Workflow
For my particular project, I plugged various data points into my HRTNAPS. Let's walk through an example workflow which, although simplified, still illustrates what an HRTNAPS really is.
Data Hierarchy
Here we create multiple levels of data. This helps us understand how events at one level impact another, which is why we have both a global and a regional field.
data = {
    "global": {"temperature": [20, 21, 22], "humidity": [40, 42, 43]},
    "regional": {
        "north": {"temperature": [15, 16, 17], "humidity": [35, 36, 37]},
        "south": {"temperature": [25, 26, 27], "humidity": [45, 46, 47]}
    }
}
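The temporal model in the next step expects tensors of shape (batch, time steps, features). As a minimal sketch of bridging the dictionary above into that format (the helper name and the choice of temperature + humidity as the two features are my own assumptions), you could do:
import torch

def dict_to_tensor(node):
    # Stack the temperature and humidity series into a (1, time_steps, 2) tensor
    temp = torch.tensor(node["temperature"], dtype=torch.float32)
    hum = torch.tensor(node["humidity"], dtype=torch.float32)
    return torch.stack([temp, hum], dim=-1).unsqueeze(0)

global_tensor = dict_to_tensor(data["global"])            # shape: (1, 3, 2)
north_tensor = dict_to_tensor(data["regional"]["north"])  # shape: (1, 3, 2)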
Temporal Modeling
Here, let's create a class that models the time-series data with an LSTM and then initialize the model.
import torch
import torch.nn as nn

class TemporalModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(TemporalModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        lstm_out, _ = self.lstm(x)
        output = self.fc(lstm_out[:, -1, :])  # Use the last output for prediction
        return output

model = TemporalModel(input_size=2, hidden_size=32, output_size=1)
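As a quick sanity check, here is a sketch of a forward pass on a random dummy batch (the batch size of 4 is my own choice; 3 time steps and 2 features match the data above):
# Dummy batch: 4 samples, 3 time steps, 2 features (temperature, humidity)
dummy_input = torch.rand((4, 3, 2))
prediction = model(dummy_input)
print(prediction.shape)  # torch.Size([4, 1])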
Neural Aggregation
Here we aggregate the global and regional information into a single feature vector by averaging each source over time and concatenating the results.
import torch

def aggregate_features(global_data, regional_data):
    global_features = torch.mean(global_data, dim=0)
    regional_features = torch.mean(regional_data, dim=0)
    aggregated_features = torch.cat([global_features, regional_features], dim=0)
    return aggregated_features

global_data = torch.rand((3, 2))
regional_data = torch.rand((3, 2))
aggregated = aggregate_features(global_data, regional_data)
Recursive Feedback
This is our refinement loop. The predictions are nudged toward the actual data over several iterations, which lets the system correct itself and make better future predictions.
def recursive_refinement(predictions, ground_truth, learning_rate=0.01):
    for _ in range(5):  # Iterate 5 times for refinement
        error = ground_truth - predictions
        adjustments = learning_rate * error
        predictions += adjustments
    return predictions

# Example refinement
predictions = torch.tensor([0.5, 0.6])
ground_truth = torch.tensor([1.0, 0.9])
refined_predictions = recursive_refinement(predictions, ground_truth)
Predictive Modeling
Here we combine the temporal model with feature aggregation: the same temporal model encodes the global and regional streams, and their outputs are concatenated for the final prediction.
class HRTNAPS(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(HRTNAPS, self).__init__()
        self.temporal_model = TemporalModel(input_size, hidden_size, hidden_size)
        self.fc = nn.Linear(hidden_size * 2, output_size)  # Combine global and regional

    def forward(self, global_data, regional_data):
        global_features = self.temporal_model(global_data)
        regional_features = self.temporal_model(regional_data)
        combined_features = torch.cat([global_features, regional_features], dim=1)
        return self.fc(combined_features)

hrt_model = HRTNAPS(input_size=2, hidden_size=32, output_size=1)
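To see the full model in action, here is a sketch of a forward pass on random dummy tensors (again, the batch size of 4 is my own assumption):
# Dummy global and regional batches: 4 samples, 3 time steps, 2 features each
global_batch = torch.rand((4, 3, 2))
regional_batch = torch.rand((4, 3, 2))
forecast = hrt_model(global_batch, regional_batch)
print(forecast.shape)  # torch.Size([4, 1])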
To sum up, what is the logic? We feed the hierarchical data through the temporal model, aggregate the global and regional features, evaluate the prediction accuracy against the actual values, and then recursively refine the predictions from that error.
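Putting the pieces together, here is a minimal end-to-end training sketch. The loss function, optimizer, epoch count, and the dummy target tensor are all assumptions on my part rather than the exact setup from my project:
import torch.optim as optim

# Hypothetical targets for the dummy batches above
targets = torch.rand((4, 1))

criterion = nn.MSELoss()
optimizer = optim.Adam(hrt_model.parameters(), lr=0.001)

for epoch in range(10):
    optimizer.zero_grad()
    forecast = hrt_model(global_batch, regional_batch)  # predict
    loss = criterion(forecast, targets)                 # evaluate accuracy
    loss.backward()                                     # compute adjustments
    optimizer.step()                                    # refine the model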
Future Outlook: How to Learn
“Give a man a fish and you feed him for a day. Teach him how to fish and you feed him for a lifetime”
- Attributed to Lao Tzu
While things are changing rapidly, one thing stays constant: how we format and write code. The modules, pipelines, and functions we use may change, but the general structure stays the same.
Pipelines: General Overview
Pipelines in the context of data processing, machine learning, or system design represent a structured, step-by-step process to transform raw inputs into actionable outputs. They ensure a streamlined workflow, modularize processes, and make it easier to debug, maintain, and scale systems.
Where to Find Pipelines
When starting a project, you should use pipelines to get your data into the desired format and to generate predictions. Once you have a roadmap of what your project should do, I recommend going over to Perplexity and having it suggest resources to check out. That could lead you to YouTube tutorials or documentation for the modules you might use.
How Should Pipelines Be Formatted?
Data-science pipelines can be built in one of four ways. The first is the code-based pipeline, which allows for maximum control and customization.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

pipeline = Pipeline([
    ('scaler', StandardScaler()),             # Preprocessing
    ('classifier', RandomForestClassifier())  # Model
])
pipeline.fit(X_train, y_train)
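The snippet above assumes X_train and y_train already exist. To try it end to end on placeholder data, here is a sketch using scikit-learn's synthetic dataset helper (the dataset sizes are arbitrary):
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, purely for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipeline.fit(X_train, y_train)
print(pipeline.score(X_test, y_test))  # Accuracy on the held-out split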
There are also workflow-oriented pipelines. They allow for tasks to be completed in a sequence with clear dependencies. These are ideal for complex workflows.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_data():
    pass  # Extraction logic

def transform_data():
    pass  # Transformation logic

def load_data():
    pass  # Loading logic

with DAG('etl_pipeline', start_date=datetime(2024, 1, 1), schedule_interval='@daily') as dag:
    extract = PythonOperator(task_id='extract', python_callable=extract_data)
    transform = PythonOperator(task_id='transform', python_callable=transform_data)
    load = PythonOperator(task_id='load', python_callable=load_data)

    extract >> transform >> load  # Define the task order
The third type is the visual pipeline. It's fairly intuitive for those who are unfamiliar with coding, but visual pipelines are mainly used for prototyping and play a supporting role. For those who are interested, I have attached a video.
Documentation Pipelines
Documentation Pipelines are used to plan out a new Python project or feature.
from graphviz import Digraph
dot = Digraph(comment='Simple Python Workflow')
dot.node('A', 'Start')
dot.node('B', 'Data Preprocessing')
dot.node('C', 'Train Model')
dot.node('D', 'Evaluate Model')
dot.node('E', 'Deploy Model')
dot.edges(['AB', 'BC', 'CD', 'DE'])
dot.render('workflow', format='png', cleanup=True) # Generates a flowchart
Saving Model Training Data
When working with machine learning models, it’s important to save both the training data and the model itself to ensure reproducibility, scalability, and ease of deployment.
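Before getting into data formats, note that the trained model itself should be saved too. As a sketch using PyTorch's state_dict approach (since the project's model is a PyTorch module; the file name is arbitrary):
import torch

# Save the trained weights
torch.save(hrt_model.state_dict(), "hrtnaps_weights.pt")

# Reload them into a freshly constructed model
restored = HRTNAPS(input_size=2, hidden_size=32, output_size=1)
restored.load_state_dict(torch.load("hrtnaps_weights.pt"))
restored.eval()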
What format should you use?
The type and intended use of the data determine the optimal format.
- Parquet: Highly efficient for large datasets
- Joblib: Great for Python-specific tasks
- Pickle: Great for Python objects
- HDF5: Great for large hierarchical or multidimensional data
- JSON: Great for structured, human-readable data
- CSV: Best for table-like data that humans will read
Parquet
Parquet is highly efficient for large datasets due to its columnar storage format. It supports compression to reduce file size without significant performance loss.
import pandas as pd

df.to_parquet("training_data.parquet", compression="snappy")  # Save (df: existing DataFrame)
df = pd.read_parquet("training_data.parquet")  # Load
Joblib
Joblib is optimized for memory-efficient storage of large objects. If the Python object isn't very large, pickle might be better.
from joblib import dump, load
# Save data
dump(data, "training_data.joblib")
# Load data
data = load("training_data.joblib")
Pickle
Pickle is great for smaller Python objects where Joblib isn’t faster. It directly supports Python data structures, but it can be insecure if loading untrusted files.
import pickle

# Save data
with open("training_data.pkl", "wb") as f:
    pickle.dump(data, f)

# Load data
with open("training_data.pkl", "rb") as f:
    data = pickle.load(f)
HDF5
HDF5 is great for large hierarchical or multidimensional data.
import h5py

# Save data (assumes array is an existing NumPy array)
with h5py.File("training_data.h5", "w") as f:
    f.create_dataset("dataset", data=array)

# Load data
with h5py.File("training_data.h5", "r") as f:
    array = f["dataset"][:]
JSON
JSON is great for structured data like dictionaries. It is very readable and is portable cross-platform.
import json

# Save data
with open("training_data.json", "w") as f:
    json.dump(data, f)

# Load data
with open("training_data.json", "r") as f:
    data = json.load(f)
CSV
CSV (Comma-Separated Values) is much easier for humans to read, but it struggles as file sizes increase and has limited support for complex data structures.
import pandas as pd
# Save data to CSV
df.to_csv("training_data.csv", index=False)
# Load data
df = pd.read_csv("training_data.csv")
The math
For those who are curious and want a more mathematically rigorous explanation of what I did for my project, here is the link: [https://acrobat.adobe.com/id/urn:aaid:sc:VA6C2:ccc7aecb-dde2-4a3c-b4db-8a8c2e9c091c].
Last Words
How to Find the Information
- Start with foundational resources: Use reliable sources like research papers, books, or official APIs.
- Ask the right questions: Break problems into smaller, focused questions to guide exploration.
- Build a system: Use tools like Notion, PARA, or spaced repetition for organized, effective learning.
- Experiment: Apply what you learn in a hands-on way to reinforce concepts.