nickapos@home:~$

SRE and predictive analysis

2026-04-06T06:00:00+01:00

Table of contents

Introduction
Predictive Analysis
What is our goal
Analysis
Putting everything into practice
Linking predictive analysis with SRE

Introduction

Site Reliability Engineering (SRE) as a practice has enabled us to consistently and reliably detect and react to significant events affecting our services over the years. However, as a process, SRE is largely reactive. Even though its value cannot be overstated, there is always room for improvement.

The best way to prevent an outage is to anticipate it — and to never allow it to happen in the first place.

This transforms SRE from a reactive process to a predictive one.

Even if we cannot predict with 100% certainty when a problem will occur, being able to estimate it approximately — within a reasonable level of confidence — is still valuable. It allows us to raise awareness in advance, so that the person on call can anticipate potential issues, remain alert, and intervene early to mitigate or even prevent the problem before it escalates.

This is what we will explore in this article: building a bridge between traditional Site Reliability Engineering and the scientific discipline of predictive analysis.

Predictive Analysis

Predictive analysis is the process of using statistical methods to analyse data related to a specific product, market, service, or phenomenon, and then applying those insights to predict future events. It employs a range of techniques that have been extensively documented across various industries.

One of the best resources available online is the NIST/SEMATECH e‑Handbook of Statistical Methods, which provides a solid overview of these techniques and includes step‑by‑step guidance on how to evaluate and fine‑tune each method in order to identify the most appropriate one for a given case.

What is our goal

Our goal is to use the data produced by our service to create a predictive model that allows us to extrapolate from past behaviour and forecast future outcomes.

This can be achieved in several ways. In this article, we will focus on one approach, with alternative methods to be discussed in future articles.

Analysis

Collecting data

To perform this analysis, we need a dataset capturing the past behaviour of a specific aspect of our service. This dataset can take many forms — for example, logs or metrics.

Generally speaking, we need to divide our dataset into two parts: good or acceptable events versus bad or unacceptable events.

For instance, let’s say we want to focus on the latency of an interface. Latency is a continuous metric; it is not a binary one that easily allows us to assign a value of 0 to “bad” and 1 to “good” and split our dataset accordingly. However, we can simulate this binary behaviour by defining a threshold and classifying any value above it as “bad” and any value below it as “good.”

This approach allows us to isolate all the bad events, including their corresponding timestamps.
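
As a minimal sketch of this step, assume our latency samples arrive as (timestamp, milliseconds) pairs and that we have picked 500 ms as the threshold. Both the sample data and the threshold are illustrative assumptions, not values taken from a real service:

```python
from datetime import datetime

# Hypothetical latency samples: (timestamp, latency in milliseconds).
samples = [
    (datetime(2026, 3, 1, 10, 0, 5), 120.0),
    (datetime(2026, 3, 1, 10, 0, 9), 740.0),
    (datetime(2026, 3, 1, 10, 1, 2), 95.0),
    (datetime(2026, 3, 1, 10, 1, 30), 1310.0),
]

THRESHOLD_MS = 500.0  # assumed cut-off between "good" and "bad"


def is_bad(latency_ms: float, threshold_ms: float = THRESHOLD_MS) -> bool:
    """Classify a sample as bad when its latency exceeds the threshold."""
    return latency_ms > threshold_ms


# Keep only the timestamps of the bad events for the next steps.
bad_timestamps = [ts for ts, latency in samples if is_bad(latency)]
print(bad_timestamps)
```

The choice of threshold matters: it decides what counts as a bad event, so it should normally come from the service's own latency objective rather than be picked arbitrarily.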

Aggregating our data

After filtering out the bad events over time, we now need to transform the data further to determine their frequency within a specific period. This step effectively groups and counts the bad events over defined intervals, giving us a rate — for example, events per second, per minute, per hour, and so on.

We can think of these periods as “buckets” that collect all events occurring within their duration.

This process allows us to produce a histogram showing the distribution of these events over time.
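
The bucketing itself can be sketched in a few lines. The example below assumes hourly buckets and a hypothetical list of bad-event timestamps; it also fills in empty hours with a count of zero, which is a design choice that keeps the series evenly spaced for the fitting step:

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical timestamps of bad events.
bad_timestamps = [
    datetime(2026, 3, 1, 10, 5),
    datetime(2026, 3, 1, 10, 40),
    datetime(2026, 3, 1, 12, 15),
]


def hourly_buckets(timestamps):
    """Count events per hour, including empty hours between first and last."""
    counts = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    start, end = min(counts), max(counts)
    buckets = []
    hour = start
    while hour <= end:
        buckets.append((hour, counts.get(hour, 0)))
        hour += timedelta(hours=1)
    return buckets


for hour, count in hourly_buckets(bad_timestamps):
    print(hour.isoformat(), count)
```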

Fitting our data

Now we come to the core of the methodology. We have our event histogram — is there a way to find a mathematical function that fits our dataset and can be used for forecasting?

This is where regression methods and curve fitting come into play. We use these techniques to find a mathematical function or numerical model that expresses the behaviour of our service, either mathematically or numerically.

This task is not new; there are well‑established methods for performing it. Several well‑known regression techniques can help us find the best match for our dataset across a variety of mathematical function families. This functionality even exists in some calculators, which can perform curve fitting and determine the best match across seven or eight categories of mathematical functions.

Modern mathematical and statistical software is even more powerful. We can perform extensive curve fitting using MATLAB, R, SageMath, Scilab, or even plain Python with libraries such as pandas, NumPy, or SciPy.

Dangers of overfitting

Whenever performing curve fitting, there is always a risk of overfitting the data. When this happens, the model hugs the dataset too closely, matching precisely all the variations in the histogram. While this might look impressive visually, it is not useful for forecasting. Such a model captures all the noise and fails to allow for natural variation, leading to poor generalisation, misleading conclusions, and unstable behaviour.

The NIST Engineering Statistics Handbook warns about the dangers of overfitting. It recommends testing residuals to ensure they do not follow recognisable patterns, and validating the model by using it to predict future values before rolling it out to production.
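
One simple numerical stand-in for inspecting residual plots is the lag-1 autocorrelation of the residuals. This is a rough sketch of that idea, not the handbook's full procedure: values near zero suggest no obvious pattern, while values near +1 or -1 suggest the model has missed some structure.

```python
def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation; values near 0 suggest no obvious pattern."""
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    denom = sum(c * c for c in centred)
    if denom == 0:
        return 0.0
    num = sum(centred[i] * centred[i + 1] for i in range(n - 1))
    return num / denom


# Residuals that drift steadily upwards are strongly patterned.
print(lag1_autocorrelation([1, 2, 3, 4, 5, 6, 7, 8]))
```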

In our example, we will test several models and compare their fit using the following criteria:

$R^2$ — coefficient of determination. The fraction of variance in our observed values that the model explains. It ranges from 0 to 1 (or 0–100%). A higher $R^2$ indicates a better in‑sample fit but does not guarantee accurate predictions or freedom from overfitting.
RMSE (Root Mean Square Error) — The square root of the average squared error; effectively the standard deviation of the residuals. Lower is better.
MAE (Mean Absolute Error) — The average of the absolute differences between actual and predicted values. Lower is better.
AIC (Akaike Information Criterion) — A model selection score that trades off fit and complexity. It is based on likelihood with a penalty for the number of parameters. When two models fit similarly, AIC tends to favour the simpler one, helping to avoid overfitting compared to relying solely on $R^2$ or RMSE.
BIC (Bayesian Information Criterion) — Similar to AIC but with a stronger penalty for model complexity (the penalty scales with the logarithm of the sample size). It tends to favour models with lower complexity.
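
These criteria are straightforward to compute. The sketch below assumes the Gaussian least-squares form of AIC and BIC, that is $n \ln(RSS/n) + 2k$ and $n \ln(RSS/n) + k \ln(n)$; other software may use a different but equivalent-up-to-a-constant definition, so scores should only be compared when they come from the same formula:

```python
import math


def fit_metrics(actual, predicted, n_params):
    """Return R^2, RMSE, MAE, AIC and BIC for a fitted model.

    AIC and BIC use the Gaussian least-squares form (an assumption).
    """
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    rss = sum(r * r for r in residuals)
    mean = sum(actual) / n
    tss = sum((a - mean) ** 2 for a in actual)
    log_term = n * math.log(rss / n)  # requires rss > 0
    return {
        "r2": 1.0 - rss / tss,
        "rmse": math.sqrt(rss / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "aic": log_term + 2 * n_params,
        "bic": log_term + n_params * math.log(n),
    }


metrics = fit_metrics([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5], n_params=2)
print(metrics)
```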

Using our model

Now that we have a working model that fits our data to a certain extent, we can use it for two purposes.

First, we can determine which part of the cycle we are currently in, and second, we can predict spikes of unwanted events within a certain level of confidence. Depending on the nature of the model, we may have more than one solution. If our model is periodic, this means that recurring cycles exist.

This can be tricky because, depending on the situation, our period may be too long; if we do not have sufficient data to feed into the model, we may be unable to detect it. For example, if our dataset spans one month but the period is six months, there is no way to extrapolate meaningful predictions from such limited data.

On the other hand, if the data exhibit a shorter period of a few days, then a one‑month dataset is usually more than enough to establish a clear pattern.

If we end up with a functional model that includes timestamps, we can use it to determine exactly where we are in the periodic cycle — which means we can easily predict when the next spike or spikes will occur and remain extra vigilant.
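
To make this concrete, here is a small sketch of locating the current position in a cycle and finding the next predicted spike. The weekly period, the reference start of the cycle, and the spike offsets are all hypothetical values chosen for illustration; in practice they would come from the fitted model:

```python
from datetime import datetime, timedelta

PERIOD_HOURS = 168  # assumed weekly cycle
CYCLE_START = datetime(2026, 2, 23, 0, 0)  # assumed reference point
SPIKE_OFFSETS_HOURS = [20, 44, 68]  # hypothetical offsets where the model peaks


def hours_into_cycle(now):
    """Return how many whole hours have elapsed in the current cycle."""
    elapsed = int((now - CYCLE_START).total_seconds() // 3600)
    return elapsed % PERIOD_HOURS


def next_spike(now):
    """Return the start time of the next predicted spike after `now`."""
    position = hours_into_cycle(now)
    this_hour = now.replace(minute=0, second=0, microsecond=0)
    cycle_begin = this_hour - timedelta(hours=position)
    for offset in SPIKE_OFFSETS_HOURS:
        if offset > position:
            return cycle_begin + timedelta(hours=offset)
    # No spike left in this cycle: take the first one of the next cycle.
    return cycle_begin + timedelta(hours=PERIOD_HOURS + SPIKE_OFFSETS_HOURS[0])


print(next_spike(datetime(2026, 2, 24, 3, 30)))
```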

Alternatively, if we do not have timestamps, we can rely on pattern matching. We can collect a small number of samples, group them into buckets, and then use the model to identify which part of the cycle we are currently in through pattern comparison.

Putting everything into practice

After describing the methodology, it makes sense to provide an example. We will apply the methodology to analyse one of my personal services — a GoToSocial instance with a single user.

For those unfamiliar with it, GoToSocial is an ActivityPub social networking server. It is written in Go and is lightweight enough to be hosted even on a Raspberry Pi. Despite the small size of the service, it is quite active and federates with hundreds of other ActivityPub instances, making it a realistic example of a small‑scale, real‑world application.

The data used are real, not synthetic. We will work with logs spanning the period from the 25th of February 2026 to the 1st of April 2026.

Presenting the error histogram

After collecting our logs for the above period, we apply the first step: identifying the errors and splitting our data into hourly buckets. This yields a dataset of 886 buckets.

We can see these buckets presented in the following histogram:

Even without any further processing, we can see macroscopically that there is some periodicity. So we expect our model to reflect that in some way.

The challenge now is to determine which category of functions is best for our model and, after selecting a family, what kind of fine-tuning we can perform to improve the fit even further.

Trying the trigonometric family

We can see that the dataset has a periodicity that appears to be daily, and it could potentially be fitted nicely with trigonometric functions. So this is what we will attempt first. For the curve fitting in this case, we use the NumPy library and, specifically, the linalg.lstsq function.
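
A sketch of what such a fit can look like is shown below. It builds a design matrix with a constant column plus one cosine and one sine column per harmonic, then solves for the coefficients with numpy.linalg.lstsq. The column ordering and the synthetic demo data are assumptions made for illustration; this is not the exact script used for the results that follow.

```python
import numpy as np


def harmonic_design_matrix(t_hours, period_hours, harmonics):
    """Columns: constant, then a cos/sin pair for each harmonic."""
    t = np.asarray(t_hours, dtype=float)
    columns = [np.ones_like(t)]
    for k in range(1, harmonics + 1):
        angle = 2.0 * np.pi * k * t / period_hours
        columns.append(np.cos(angle))
        columns.append(np.sin(angle))
    return np.column_stack(columns)


def fit_harmonics(t_hours, counts, period_hours=24.0, harmonics=4):
    """Least-squares fit; returns the coefficients and R^2."""
    X = harmonic_design_matrix(t_hours, period_hours, harmonics)
    y = np.asarray(counts, dtype=float)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coeffs
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return coeffs, 1.0 - ss_res / ss_tot


# Synthetic demo data (not the article's dataset): a clean daily cycle.
t = np.arange(24 * 7)
y = 200.0 + 40.0 * np.cos(2.0 * np.pi * t / 24.0)
coeffs, r2 = fit_harmonics(t, y)
print(len(coeffs), r2)
```

With four harmonics the model has nine coefficients: one constant and four cosine/sine pairs.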

24h period

Let’s try fitting a trigonometric function with four harmonics and a period of 24 hours. This gives us a fit with the following properties:

Period hours: 24.0
Harmonics: 4
R^2: 0.11478812632316349
Coefficients:
c0=206.5702885675613
c1=-45.11000222890022
c2=1.6343756945694634
c3=-2.0343489352492035
c4=-6.241474170882098
c5=-10.811558832427124
c6=-1.6575425592831072
c7=1.5152724170962195
c8=-0.8161734650591257

and the following fit:

We can verify both visually and from the fairly low $R^2$ value that the fit is not great: the model explains only about 11% of the variance in the observed values.

7-day (168h) period

Repeating the experiment with a 7-day period (168 hours):

Period hours: 168.0
Harmonics: 4
R^2: 0.03184541598545587
Coefficients:
c0=205.8633790794272
c1=16.795771629288875
c2=-8.342650759251661
c3=12.435800671861827
c4=8.168719034649968
c5=5.798148151755778
c6=3.0422911601659615
c7=-2.686076356710398
c8=1.0882103037176767

and the following fit:

Here $R^2$ drops to about 0.03, well below the 0.11 of the 24-hour fit, and a visual inspection confirms that the fit is worse. The projection misses much of the variability in our data.

Comparing different model families

After presenting the two trigonometric options, one may wonder what else we can do.

The answer is that we can create a script that compares many different model families and many different configurations within each family. I have done a test run with 43 different models from different families; here are the best performers:

Tested models: 43
Valid models: 43

Ranked valid models
===================
1. spline_s_auto
   R^2: 0.999896
   RMSE: 0.999504
   MAE: 0.435485
   AIC: 7.121179
   BIC: 26.263530
   Parameters: {'smoothing_factor': None}

2. spline_s_885
   R^2: 0.999896
   RMSE: 0.999504
   MAE: 0.435485
   AIC: 7.121179
   BIC: 26.263530
   Parameters: {'smoothing_factor': 885}

3. spline_s_850623.5396610169
   R^2: 0.900018
   RMSE: 30.999789
   MAE: 24.508399
   AIC: 6086.145327
   BIC: 6105.287677
   Parameters: {'smoothing_factor': np.float64(850623.5396610169)}

4. ets_add_add_s168
   R^2: 0.363532
   RMSE: 78.214129
   MAE: 41.467808
   AIC: 7736.227036
   BIC: 7784.082912
   Parameters: {'trend': 'add', 'seasonal': 'add', 'seasonal_periods': 168}

As we can see, the top three models belong to the same category but with different smoothing factors. The best performers belong to the spline category. The spline family of equations can be used for curve fitting, but these results are a textbook example of overfitting.

We can see that these models explain about 99.99% of the variance in our dataset ($R^2 = 0.999896$). If we graph one of these fits, it looks like this:

We can see that the spline fits even random noisy spikes. If we plot the residuals, they look like this:

We can see that for the majority of our samples, the residuals are essentially zero. This is exactly what the guidance in the NIST Engineering Statistics Handbook warns against, and something we must avoid. Splines are useful as visualisation tools, especially with a smoothing factor, but they are not great for forecasting.

This is why the best fit is really model no. 4, which belongs to a family of models called ETS (Error, Trend, Seasonal). This family is described in depth on OpenForecast by Ivan Svetunkov, a Senior Lecturer at Lancaster University Management School, who has written an excellent monograph covering not only ETS models but forecasting in general.

In this case, the ETS model with the best match has a seasonality of 168 hours (7 days or 1 week). If we plot this model, it looks like this:

Validating our ETS model

After selecting our model, we can now perform validation using data from our service that were not part of the training set and collected during a random period of the week. When we apply the model to this dataset, we get the following results:

Metric	Training	Validation	Status
Rows	885	30	✅ Good split
RMSE	~78.21	41.01	✅ ~48% improvement
MAE	~41.47	30.41	✅ 26% improvement

And if we plot it, we get the following:

As we can see visually, our predictions — though not perfect — follow the trend of the actual data and can be used for forecasting.
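
The mechanics of this kind of holdout validation can be sketched as follows. To keep the example self-contained, a seasonal-naive forecast (repeat the last full cycle) stands in for the ETS model, and the data are synthetic; only the chronological split and the error metrics mirror the procedure described above:

```python
import math


def seasonal_naive_forecast(history, horizon, period):
    """Forecast by repeating the last full seasonal cycle (a simple baseline)."""
    last_cycle = history[-period:]
    return [last_cycle[i % period] for i in range(horizon)]


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic hourly counts with a 24-hour pattern (not the article's data).
series = [100 + 50 * ((h % 24) in (9, 10, 11)) for h in range(24 * 10)]
train, holdout = series[:-30], series[-30:]  # chronological split, never shuffled

forecast = seasonal_naive_forecast(train, horizon=len(holdout), period=24)
print(rmse(holdout, forecast), mae(holdout, forecast))
```

A baseline like this is also a useful sanity check in its own right: if a more complex model cannot beat it on held-out data, the extra complexity is probably not paying for itself.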

Keeping things up to date

After creating a model that can be used for forecasting successfully, we need to ensure it is regularly refreshed with new incoming training data. This will keep our model reflecting current reality and ensure it can be relied upon for forecasting.

Linking predictive analysis with SRE

We now have a model that we can use to predict future events. In SRE, we define significant events as those with a high burn rate. A burn rate is essentially a normalised error rate: the observed error rate divided by the error rate that the SLO allows.

We can expand the use of burn rate from a tool used to detect significant events happening right now, to a tool that forecasts when significant events will occur. This is the crucial link between SRE and predictive analysis.

We can now use our model to predict when an event with a specific burn rate will happen and be on high alert, or alternatively perform proactive actions such as scaling up our clusters in advance.
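
As a rough sketch of this idea, the snippet below turns a forecast of hourly error counts into burn rates and flags the hours that exceed a threshold. The forecast values, the request volume, the 99.9% SLO, and the 14.4 threshold (a commonly quoted fast-burn value for a 30-day window) are all assumptions chosen for illustration:

```python
def burn_rate(errors, requests, slo_target):
    """Error ratio divided by the error budget implied by the SLO."""
    if requests == 0:
        return 0.0
    error_budget = 1.0 - slo_target
    return (errors / requests) / error_budget


def hours_to_watch(forecast_errors, expected_requests, slo_target, threshold):
    """Return indices of forecast hours whose burn rate exceeds the threshold."""
    pairs = zip(forecast_errors, expected_requests)
    return [
        hour
        for hour, (errors, requests) in enumerate(pairs)
        if burn_rate(errors, requests, slo_target) > threshold
    ]


# Hypothetical forecast for the next six hours.
forecast_errors = [5, 8, 150, 220, 12, 6]
expected_requests = [10_000] * 6
risky = hours_to_watch(
    forecast_errors, expected_requests, slo_target=0.999, threshold=14.4
)
print(risky)
```

The flagged hours are the ones where the person on call should be extra vigilant, or where proactive actions such as scaling up could be scheduled.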

Layers of monitoring and logging

2026-02-20T17:00:00+00:00

Table of contents

Introduction
Three Ways of DevOps Refresher
Layers of Automation Refresher
Multidimensionality of Monitoring
Aligning the layers of automation with the layers of monitoring
- Monitoring the automation layers
- Monitoring application layer
Logging
Network monitoring

Introduction

A few months ago, I published an article about the various Layers of Automation we can find in a modern environment.

In reality, this article did not start its life a few months ago; it started several years ago, and I have presented it in various shapes and forms both internally in the various companies I have worked for and also at various events in order to highlight the hidden complexity of a modern tech stack.

There is something missing from this article—or, to be precise, not exactly missing but implied. As I have presented in another article, The Three Ways of DevOps, the second way is about feedback loops. It is really about monitoring and alerting.

So how could I ever present the various layers of automation without having a companion article about the various layers of monitoring?

In reality, these two concepts are closely related. In the same way I have presented the various layers of automation in the past in various forums and circumstances, I have also presented the various layers of monitoring, although I did not call them that.

Often I would present this concept using the “right tool for the job,” where I would explain why Tool A may be good for infrastructure monitoring but not necessarily ideal for application monitoring, which implies the existence of an infrastructure layer and an application layer.

This is the focus of this article, where we will continue the discussion from where it was left in the Layers of Automation article and expand it to cover the layers of monitoring.

Three Ways of DevOps Refresher

The first way: Systems thinking. Promotes system-wide design and ultimately automation.
The second way: Amplifying feedback loops. Introduces telemetry to all stages of the system.
The third way: Continuous experimentation and improvement.

Layers of Automation Refresher

The Layers of Automation article is a consequence of the first way of DevOps. We need to think systemically.

This also applies to how SRE should be done. We should not be optimizing for our part of the system but for the whole system.

This means building horizontal collaborations across teams that include product, project management, security and trust, and infrastructure people.

When we think systemically, we can identify that modern services can be analyzed across multiple layers, and each layer needs its own automation.

We can see this represented in the following diagram.

We can see that we can slice our infrastructure into major horizontal layers that may contain additional layers themselves.

This means that monitoring is a multidimensional space. We will come back to this later.

We also see that there are vertical layers that can span multiple horizontal layers, and in this I have included monitoring and alerting automation.

The vertical layers represent common needs that span all horizontal layers, such as monitoring and service discovery.

This means that there is a need for these things (monitoring and service discovery) for each and every one of the horizontal layers.

The major horizontal layers we can identify are:

Infrastructure layer (e.g., creating a virtual machine or a virtual private cloud, a Kubernetes cluster, networking¹)
OS layer (e.g., a Linux VM)
Service layer (e.g., a database, a Docker engine, a Tomcat server)
Application layer (e.g., the actual service deployed in the Docker engine or Tomcat)

We can see that some of these layers can be broken down into additional layers. For example, a Kubernetes cluster has itself multiple layers—again an example of multidimensionality.

¹ Networking is a very special case; we will come back to this as well.

Multidimensionality of Monitoring

Before we move on to the alignment of automation and monitoring layers, let’s discuss briefly the concept of multidimensionality, since it will be needed later in the article to define relationships between layers.

According to Wikipedia, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it.

Another useful concept here is the hypercube. A hypercube is the n-dimensional generalisation of a square and a cube; its four-dimensional form is often drawn as a cube nested inside another cube.

In this case, we have an n-dimensional cube, and we need n coordinates to specify properly any point within it.

The reason why this is important is that this concept is analogous to object composability and encapsulation in software engineering.

This means that we can treat objects within objects as hypercubes and use the coordinates concept to navigate this hierarchy.

A good example of this is an LDAP traverse path such as CN=John Doe,OU=Sales,DC=example,DC=com, where we need 4 dimensions in order to reach the John Doe object.

The same thing can be done with programming language packages: in com.example.sales.UserManager, the class UserManager can be reached using 4 dimensions.

This is useful when we are designing the architecture of a service, but it is also very useful when we want to analyze the architecture of this service for monitoring purposes. Our monitoring effectively needs to match closely the design of the service.

This is the reason why seemingly simple questions such as “Is this service healthy?” often do not have a simple answer.

In order to answer this question, we need to take into account both the external dependencies of the service and the internal dimensions of the service.

Aligning the layers of automation with the layers of monitoring

After explaining the multidimensionality of monitoring, we are ready to discuss the alignment between the layers of automation and the layers of monitoring.

Monitoring the automation layers

It goes without saying that we need monitoring for the automation itself. This comes from the second way of DevOps. We need to know what our automation is doing; if we run automation without monitoring it, then we are in for a world of pain.

When things go wrong with automated systems, we usually experience disruption on a massive scale, because automation allows us to make changes at massive scale. Therefore, we need robust testing, monitoring, and alerting systems in place to catch any issues early.

This means that we need to have at least one layer of monitoring for each layer of automation.

For each of these layers, we have one dimension — time — and then, depending on which layer we are talking about, several other secondary dimensions.

We can arrange these dimensions in different ways. For example, at the infrastructure layer, if we have a fleet of virtual machines, we can represent these in several ways. Here are two examples:

As a fleet of systems, where the hierarchy is region -> cluster -> system -> metric, which we can then expand to reach an individual machine (please note the analogy with an LDAP traversal path).
As a collection of metrics that have the system properties encoded as labels. For example, in Prometheus we have
node_load15{region="us-east-1",vpc="vpc-12345678",instance="node1.example.com:9100",job="node-exporter"} 0.27
We need four dimensions plus time to reach the metric for this particular node. Again, please note the analogy with the LDAP traversal path. Because Prometheus keeps a separate time series for every unique combination of label values, these multiple dimensions lead to problems with high-cardinality metrics, which is why we need to be extra careful to structure our metrics to avoid this situation.

The difference is that in the first case, the root is the hierarchy of systems, whereas in the second example, the root is the metric, and we use the additional dimensions to locate the specific system we need.
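
The two representations, and the cardinality concern, can be sketched with plain data structures. The cluster name and the label counts below are hypothetical; the label set mirrors the Prometheus example above:

```python
from math import prod

# Representation 1: hierarchy region -> cluster -> system -> metric.
fleet = {
    "us-east-1": {
        "cluster-a": {  # hypothetical cluster name
            "node1.example.com": {"node_load15": 0.27},
        },
    },
}

# Representation 2: the metric is the root, system properties are labels.
series = {
    (
        "node_load15",
        frozenset({
            ("region", "us-east-1"),
            ("vpc", "vpc-12345678"),
            ("instance", "node1.example.com:9100"),
            ("job", "node-exporter"),
        }),
    ): 0.27,
}


def lookup(tree, path):
    """Walk the hierarchy one coordinate at a time."""
    node = tree
    for key in path:
        node = node[key]
    return node


def max_series(distinct_values_per_label):
    """Upper bound on time series for one metric: product of label cardinalities."""
    return prod(distinct_values_per_label.values())


print(lookup(fleet, ["us-east-1", "cluster-a", "node1.example.com", "node_load15"]))
print(max_series({"region": 3, "vpc": 10, "instance": 500, "job": 1}))
```

The product is only an upper bound, since not every combination of label values occurs in practice, but it shows how quickly a single unbounded label can inflate the number of series.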

Monitoring application layer

The application layer has all the complexities described in the previous section, but it also introduces some additional dimensions that do not usually exist when we are monitoring third-party infrastructure or services.

In these cases, we usually do not have first-party visibility into what is happening inside the service itself. For example, when using a managed database, we typically do not have full visibility into its internal state.

This is not the case for home-grown applications, where we have full control over the source code, the testing suite, the CI/CD pipeline, and the deployment automation.

These are all additional layers within the application layer. We need to monitor each of them to define quality gates, ensure that our application is fully tested before it reaches production, and maintain a paper trail of all test results for quality assurance and audit purposes.

After deployment, we need to have application telemetry in place to track the health and state of the application.

By instrumenting our application with one of the libraries provided by major monitoring solutions, we gain a wealth of information about the internal state of the service. Once again, we can see that the application itself behaves as an n-dimensional space.

So far, we have focused only on monitoring, but what applies to monitoring largely applies to logging as well. If we configure our log aggregation properly, logging becomes the second most valuable source of telemetry for our services and infrastructure.

Logging

In a very similar manner to the layers of monitoring, we can also have layers of logging. We have our log aggregators connected to all components of our infrastructure, services, and applications, and they are continuously shipping their logs to a centralised location where we can process them.

Even though the logging layers closely follow the monitoring layers, there are some notable differences.

First of all, in the majority of systems, we have first-party logging. While most services may not have a native integration with our monitoring system of choice, they do provide logging, which we can hook into our log forwarder.

This makes logging extremely valuable because, in many cases, it is the only source of first-party telemetry we have for a system.

Secondly, there is a difference in the dimensional analysis of the logs.

Log dimensional analysis

While monitoring is largely structured — with a well-defined format that allows us to locate specific metrics — logging is mostly unstructured.

Most of the time, logs are free‑form strings, and we need to use query languages to filter log messages and find those related to the service we are interested in.

Depending on the log aggregation system we use, we may be able to add some structure to the logs by classifying them according to severity, application, and subsystem.

By doing so, we add three dimensions in addition to the time dimension, making a total of four.

The complexity of logging, compared to monitoring metrics, shifts from the hierarchy of data to the contents of the logs themselves.

In practice, we often end up using regular expressions or query languages such as Lucene.

Because of their unstructured nature, most people use logs primarily during debugging to see what happened with a service, while relying on monitoring metrics to detect issues in the first place.

This is a valid use of logs, but we can do much more with them.

Having a data lake containing the logs of all your services is a gold mine. There are many ways to query this data lake for insights about your applications — but first, let’s discuss how we can convert unstructured data into semi‑structured data for easier processing.

Using AI with logs

First and foremost, we need to realise that we do not necessarily have to deal with unstructured logs. If we are talking about first‑party logs produced by our own services, then we can agree on a format that can be parsed relatively easily.

However, this will not solve every issue. We will still be using third‑party libraries that produce logs in a variety of formats, so we will not be able to create a single filter capable of extracting all the value from our logs.

Additionally, we will likely encounter the issue of misclassification: a developer logs useful information at DEBUG level without realising that it will be filtered out in a production environment, where only ERROR and above are forwarded to the log aggregator.

Furthermore, if we have a system with a relatively small deviation from a known format (e.g., Nginx or Apache logs), we can create regular expressions to target and extract the relevant data.

We can use AI for all these use cases — first, to detect misclassifications, and second, to identify log patterns.

After detecting misclassifications, we can either notify the developers who own the service so they can correct the classification, or, if the logs come from a third‑party service, create filters that ensure valuable operational data is not excluded.

Where logs meet monitoring

It is fair to ask — why are we doing this? The answer is that if we apply these filters, we can extract information from the logs and convert it into metrics.

Then we can feed those metrics into our monitoring solution and combine them with native monitoring data to create powerful queries.
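
As a small illustration of turning logs into metrics, the sketch below extracts the timestamp and status code from access-log lines with a regular expression and counts server errors per minute. The log format (a common/combined-style line) and the sample lines are assumptions; real logs will need their own pattern:

```python
import re
from collections import Counter

# Assumed pattern for a common/combined-style access log line.
LINE_RE = re.compile(
    r'\[(?P<day>[^:]+):(?P<hour>\d{2}):(?P<minute>\d{2}):\d{2} [^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)

log_lines = [
    '203.0.113.7 - - [01/Mar/2026:10:00:05 +0000] "GET /api/v1/timeline HTTP/1.1" 200 512',
    '203.0.113.9 - - [01/Mar/2026:10:00:41 +0000] "POST /inbox HTTP/1.1" 503 0',
    '203.0.113.9 - - [01/Mar/2026:10:01:02 +0000] "POST /inbox HTTP/1.1" 500 0',
    'this line does not match and is skipped',
]


def error_counts_per_minute(lines):
    """Count 5xx responses per (day, hour, minute) bucket."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # unparseable lines are ignored in this sketch
        if match["status"].startswith("5"):
            counts[(match["day"], match["hour"], match["minute"])] += 1
    return counts


print(error_counts_per_minute(log_lines))
```

Each key of the resulting counter is effectively a set of label values, so the output can be exported to a monitoring system as a regular counter metric.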

Of course, these derived metrics will not be as timely as real‑time monitoring metrics, but as I mentioned previously, sometimes logging is the only source of first‑party telemetry we have.

In addition to this, the content of logs can have an arbitrary number of dimensions. This means that a single log message can contain one or more valuable fields of information, and this number is not fixed.

It is therefore worth analysing how many dimensions we can extract from each log message and convert to metrics, so that they eventually provide valuable insights about the state of our service.

Again, AI can be a valuable assistant in detecting these patterns.

Network monitoring

Network equipment monitoring can be approached in two ways.

We can either treat it like regular infrastructure monitoring — tracking standard metrics from each network device such as memory usage, CPU load, and packet counts in/out — or as network tracing, where we focus on tracking interactions rather than device metrics.

Both of these approaches are valuable. Tracing, in particular, can reveal previously unknown relationships between services, misconfigurations, or even security issues.

We can also combine tracing with logging to better understand how services interact with one another and the state they were in at the time.

Cooking with Lui AI: An Intro to Retrieval-Augmented Generation

2026-02-14T02:00:00+00:00

Table of contents

Introduction
What is RAG - A refresher
The moving parts
- The model
- The frontend
  - Streamlit
  - LangChain
  - FAISS
The code
Running the app
Conclusion
References

Introduction

In my previous article, I mentioned Retrieval-Augmented Generation (RAG) briefly. RAG is one of the most important technologies we can use today to customize an LLM without having to retrain it. It helps us provide additional knowledge to a model, and we can use local LLM models in combination with RAG to query and process private documents.

In this article, we will present the equivalent of a very simple AI “hello world” agent using RAG. This code is mostly boilerplate and nothing special, but it will help us present and discuss all the moving parts that go into creating a custom AI-enabled bot with extended context using RAG.

We will introduce you to Lui, a very cute AI-driven mouse chef that has read The Boston Cooking-School Cookbook by Fannie Merritt Farmer and is happy to help us with our cooking adventures.

What is RAG - A refresher

As mentioned in my previous article, RAG is a method by which a language model retrieves relevant information from an external knowledge source and then uses it to generate its answer. But what does that mean exactly?

Let’s analyze it.

Usually, the input of RAG is a document of some form. Depending on the complexity of your model and the interface you are using, it may support multiple formats as input or just one.

In its most simple form, it is just a text file that is read and then converted to a suitable format that will then be parsed by the model.

This text is read and then split into small chunks with a small overlap between them so no data is lost.

Then the chunks are embedded and used to create a vector store that will serve as an index for our model input.

When a query is submitted, the vector store is searched, and the most relevant chunks are passed to the model alongside our query. Then the model uses this input to generate an answer.

Please notice that all of these steps are independent from the actual model itself, which can be any model, local or cloud-based.
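
The steps above can be sketched end to end without any model at all. In this toy version, a bag-of-words count vector stands in for a real embedding and a plain list stands in for the vector store; both are deliberate simplifications, and the sample text is made up for the example:

```python
import math
import re
from collections import Counter


def chunk_text(text, size=80, overlap=20):
    """Split text into fixed-size character chunks that overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text):
    """Toy embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, store, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


document = (
    "To boil an egg, place it in boiling water for several minutes. "
    "To bake bread, mix flour, water, yeast and salt, then knead the dough. "
    "To make tea, pour hot water over the leaves and let it steep."
)
store = [(chunk, embed(chunk)) for chunk in chunk_text(document)]  # toy "vector store"
question = "How do I bake bread with flour and yeast?"
context = retrieve(question, store)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
print(prompt)
```

The resulting prompt, with the retrieved chunks prepended to the question, is what would be sent to the model.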

The moving parts

The model

First of all, we need an LLM. It can be local or cloud-based, and it has to support some form of API.

In our example, we will use LM Studio in server mode as our backend, with Ministral loaded as our model.

LM Studio supports the OpenAI API, so we need to use some kind of client-side tool that also supports the OpenAI API.
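
To show what "supports the OpenAI API" means in practice, the sketch below builds an OpenAI-style chat completion request using only the standard library. The base URL (LM Studio's local server on what I understand to be its default port) and the model identifier are assumptions; adjust both to match your setup. The request is built but not sent, so the snippet runs without a server:

```python
import json
import urllib.request

# Assumptions: a local LM Studio server and a placeholder model identifier.
BASE_URL = "http://localhost:1234/v1"
MODEL = "ministral"  # hypothetical identifier


def build_chat_request(question, context=""):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": MODEL,
        "messages": [
            {
                "role": "system",
                "content": "Answer using the provided context.\n" + context,
            },
            {"role": "user", "content": question},
        ],
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("How long should I boil an egg?")
print(request.full_url)
# To actually call the server (requires LM Studio to be running):
# with urllib.request.urlopen(request) as response:
#     answer = json.load(response)["choices"][0]["message"]["content"]
```

Because the request shape is the same for any OpenAI-compatible backend, switching between a local and a cloud-based model is mostly a matter of changing the base URL and the model name.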

The frontend

Streamlit

Any OpenAI-compatible framework is suitable for this kind of thing, and indeed there are several types for several programming languages.

You can build CLI tools using CLI frameworks or web-based tools using web-based frameworks in various languages.

In our example, we will use the Streamlit framework and Python.

Streamlit is open source and can be installed as a Python pip package:

pip install streamlit

LangChain

Another technology we need is the framework that will allow us to do the vectorization of our external document and create the necessary embeddings the model needs. We will use LangChain for this, an open-source project that will allow us to split the external document, do the vectorization, and create the embeddings.

Embeddings in this case are a way to represent the segments of our external document as vectors that capture their meaning. These embeddings can then be compared with one another to determine how closely the meaning of two segments matches.

Embeddings can be used for all sorts of things, not only in AI but also in search engines, social graphs, and generally to search, sort, group, and compare data.
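To make the idea of comparing embeddings more concrete, here is a minimal sketch using cosine similarity, a common way to compare vectors. The three-dimensional vectors below are invented by hand for illustration; real embedding models produce vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, hand-picked so that the two cooking terms point the same way
bread = [0.9, 0.1, 0.0]
cake = [0.8, 0.3, 0.1]
git = [0.0, 0.2, 0.9]

cooking_score = cosine_similarity(bread, cake)
unrelated_score = cosine_similarity(bread, git)
```

With these made-up vectors, cooking_score comes out much higher than unrelated_score, which is the property a retriever relies on when it picks the chunks closest to a query.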

LangChain also requires the sentence-transformers framework.

To install all of these, all we need to do is:

pip install langchain langchain-community langchain-openai langchain-huggingface sentence-transformers

FAISS

We also need the FAISS library, which allows us to do similarity searches on dense vectors. It is also open source and is developed by Facebook AI Research, now part of Meta.

We can install all of the above with one command:

pip install langchain langchain-community langchain-openai langchain-huggingface streamlit faiss-cpu sentence-transformers

One word of warning: these libraries and frameworks have a lot of dependencies, and because the command above does not specify any versions, it may end up installing incompatible versions.

Instead of running the command above, I would suggest cloning the lui-ai repo, installing pyenv, and running:

pyenv virtualenv lui-ai
source ~/.pyenv/versions/lui-ai/bin/activate
pip install -r requirements.txt

The code

We can see all of the above combined in app.py.

Specifically, we can see all the libraries imported in the first 7 lines:

import streamlit as st
from langchain_openai import ChatOpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from pathlib import Path

We can see a function that sets up RAG and the backend LLM in lines 14–59.

# Function that loads the document and creates the RAG pipeline
def create_rag_chain(document_path):
    # Load the document
    with open(document_path, "r", encoding="utf-8") as f:
        document_text = f.read()

    # 1. Split the document into small "chunks"
    # This makes it easier for the model to find relevant information.
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=200,  # Chunk size (in characters)
        chunk_overlap=50,  # Overlap between chunks
        length_function=len,
    )
    docs = text_splitter.split_text(document_text)

    # 2. Create "embedding vectors" for each chunk
    # Embeddings convert text into numerical vectors that computers can understand semantically.
    # all-MiniLM-L6-v2 is a small, fast model specialized in converting text into vectors.
    embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

    # 3. Create a vector store (FAISS) to save and search embeddings
    # This is like creating a searchable index for our "textbook."
    db = FAISS.from_texts(docs, embeddings)

    # 4. Configure connection to the local LLM server (LM Studio)
    llm = ChatOpenAI(
        # ↓↓↓ Paste LM Studio's "API Identifier" here ↓↓↓
        model_name="local-model",  # Specify to use the local model
        base_url="http://p52:8001/v1",  # Address of the LM Studio server
        api_key="not-needed",  # No API key needed for a local server
        temperature=0.1,  # Low temperature to stick to reference text for reliable answers
    )

    # 5. Create the RetrievalQA chain
    # This chain combines a retriever (FAISS index) with the LLM.
    # When given a query, it first finds the most relevant text chunks,
    # then passes them along with the query to the LLM to generate an answer.
    retriever = db.as_retriever()
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",  # "stuff" means stuffing all relevant chunks into the prompt
        retriever=retriever,
        return_source_documents=True,
    )
    return qa_chain

Finally, we can see all of the above initialized and called in lines 62–98:

# Create the RAG chain using the cookbook text
rag_chain = create_rag_chain("the-boston-cooking-school-cookbook.txt")

# =============================
# --- Streamlit UI ---
# =============================

st.title("Lui AI")
st.write("Let's get cooking")
logo_path = Path() / "lui-logo.jpg"

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Redisplay messages from history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Respond to the user's input
if prompt := st.chat_input("Enter your question"):
    # Display the user's message
    with st.chat_message("user"):
        st.markdown(prompt)
    # Add the user's message to history
    st.session_state.messages.append({"role": "user", "content": prompt})

    # Get the LLM's response
    response = rag_chain.invoke({"query": prompt})
    answer = response["result"]

    # Display the assistant's response
    with st.chat_message("assistant"):
        st.markdown(answer)
    # Add the assistant's response to history
    st.session_state.messages.append({"role": "assistant", "content": answer})

Running the app

After everything is installed, we can run the app using:

streamlit run app.py

This starts a local web server where Lui is ready to answer our questions about cooking.

So I asked Lui for his favorite coffee cake recipe:

Conclusion

It is important to remember here that whatever additional context we provide does not replace the fundamental training of the model. So Lui is quite happy to answer questions about Git and Python as well as cooking.

Choose your model carefully. Some understand code, others are good with maths, others with reasoning, some support vision and so on and so forth.

As you can see, we can add multiple documents per chatbot and even have different chatbots with different types of contexts, all directing their questions to the same backend model.

This allows us to deploy specialized apps that will run for us as expert systems against the same LLM depending on our needs.

Also, please remember that in my previous article, I mentioned that the OpenAI API allows us to take the answer of a previous question, add it to our context, and feed it to the LLM alongside our new question using the assistant/user roles.

This is a very powerful concept, and we can use it to create pipelines that will combine different models with different specialized contexts for different things that can implement advanced workflows.


Finetuning an LLM for local execution

2026-02-08T05:00:00+00:00

Table of contents

Introduction
Terminology
OpenAI API
LM Studio UI
RAG
MCP
Conclusion

Introduction

In my previous article, I referred to LLMs specifically optimized for local execution.

One would naturally wonder how this can be achieved and, if it is possible, what the most common pitfalls are.

What are the pros and cons of local execution, and how does it compare against the major models?

These are the questions I will attempt to answer in this article.

Terminology

Before we begin our journey into fine-tuning LLMs for local execution, we need to define a few concepts and understand how they affect the performance and accuracy of our results. The first concept to analyze is the context length.

Context Length

Context length, or context window, is the maximum number of tokens (roughly words or subwords) a large language model (LLM) can process in a single input prompt, including conversation history.

But what does this mean? What happens if we have a context length of 4096 tokens and we run over the limit?

Memory

Context length is basically how much of our discussion an LLM can remember. If we go over it, there are a number of options for how to handle this exception.

For LM Studio, we have the following options:

Policy	Behavior	Best for
Stop at Limit	Halts generation when full (reason: `contextLengthReached`).	Strict limits; avoids bad outputs.
Truncate Middle	Drops middle conversation; keeps system prompt, first user message, and recent end. Can loop infinitely if not capped.	Tasks needing early + recent info.
Rolling Window	Drops oldest messages; prioritizes recency. Safest for most chats.	Long conversations; forgets irrelevant history.
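The rolling-window idea can be sketched as follows. This is not LM Studio's implementation; it is a toy version that assumes one token per whitespace-separated word and assumes that the system prompt is always kept. The names count_tokens and rolling_window are hypothetical.

```python
def count_tokens(text):
    # Crude approximation: one token per whitespace-separated word
    return len(text.split())


def rolling_window(messages, max_tokens):
    """Keep the system prompt plus the most recent messages that fit in max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)
    kept = []
    # Walk backwards from the newest message and stop at the first one that does not fit
    for message in reversed(others):
        cost = count_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))


history = [
    {"role": "system", "content": "be brief"},
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": "four five"},
    {"role": "user", "content": "six"},
]
trimmed = rolling_window(history, max_tokens=5)
```

In this example the oldest user message no longer fits the budget, so it is the one that gets dropped, which is exactly how a pasted script can silently fall out of a small context.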

It is important to understand that everything we inject into a discussion—e.g., copying and pasting a script—becomes part of the discussion context.

If we are asking questions about a script or document we pasted earlier and we have a small context length, then after a while the actual script or document will be forgotten, and any answers we receive will most likely be hallucinations.

Models that are optimized for local execution tend to have much smaller context length capabilities than the major cloud-based LLMs.

Here is how Mistral AI’s Ministral model compares against several major cloud-based LLMs:

Comparison Table

Model	Provider	Context Length (tokens)
Ministral 3/8B/14B	Mistral	128k (up to 256k)
GPT-4.1 Turbo	OpenAI	128k–1M
Claude Sonnet 4	Anthropic	1M
Gemini 3 Pro	Google	1M–2M
Llama 4 Scout	Meta	10M
Sonar Large	Perplexity	128k–200k
Grok 4.1	xAI	128k–2M
Magic LTM-2	Various	100M

Ministral is one of the most capable local models when it comes to context length, and its maximum context length is quite sufficient for document analysis or long conversations. But even it cannot really compete against the major online models, several of which can handle contexts of a million tokens or more.

However, when LM Studio starts, it loads the model with a default value of 4096, even though it can go up to 256k.

It is natural to wonder why.

Context length is directly linked to performance and resources consumed. If you have a large context length, then for every question asked, the model has to take the whole previous discussion into account and use it for its new answer.

If you add a document and the context length is big enough, then the whole document is added to the context and is scanned every time a question is asked.

This impacts performance and also the resources consumed.

Capacity

When we are increasing the context window, we are increasing how much the model can remember in each session. This means that the model needs more memory.

This memory can be either GPU or CPU memory, and we can choose how much of each is consumed. This technique is called GPU offloading, and it is a way to have some of the processing done on the GPU and the rest on the CPU.

This also affects the execution speed.

If your system has enough GPU memory to load the whole model plus the context window, then this is the most performant scenario. Otherwise, you need to choose how much of the execution is offloaded to the CPU.
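To get a feeling for why a larger context window needs more memory, we can estimate the size of the key/value cache that transformer models keep for every token in the context. The formula below is the usual back-of-the-envelope approximation. The architecture numbers are hypothetical and are not the real values of Ministral or any other specific model.

```python
def kv_cache_bytes(context_length, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Approximate cache size: one key and one value tensor per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_length * bytes_per_value


# Hypothetical architecture values, chosen only for illustration (16-bit cache values)
params = dict(n_layers=40, n_kv_heads=8, head_dim=128)

for ctx in (4096, 120_000):
    gib = kv_cache_bytes(ctx, **params) / 2**30
    print(f"context {ctx:>7}: about {gib:.1f} GiB of cache")
```

The cache grows linearly with the context length, so under these assumptions going from 4096 to 120,000 tokens multiplies the cache size by roughly thirty, on top of the memory needed for the model weights.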

Quantization

Quantization is a technique to reduce the computational and memory costs of running inference by representing the weights and activations with low-precision data types like 8-bit integer (int8) instead of the usual 32-bit floating point (float32).

Reducing the number of bits means the resulting model requires less memory storage, consumes less energy (in theory), and operations like matrix multiplication can be performed much faster with integer arithmetic. It allows us to run models on embedded devices, which sometimes only support integer data types.

It also means that any results we have might be less accurate. Many LLMs that are optimised for local execution are quantized.

The Ministral model used as the example in this article is not quantized, although quantized versions of it exist.
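The trade-off between size and accuracy can be illustrated with a minimal sketch of symmetric int8 quantization. This is a simplified scheme with a single scale for all values; real quantization formats are more elaborate. The weights below are made up for the example.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Made-up weights for illustration
weights = [0.123, -0.5, 0.337, 1.27, -0.014]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]
```

Each weight now fits in one byte instead of four, but the restored values differ slightly from the originals. The rounding error per value is at most half of the scale, and this is the accuracy we give up in exchange for the smaller footprint.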

Putting everything together

LM Studio has been moving toward a direction where it can be used without its user interface. After installation, it provides a command called lms. With lms, we can start LM Studio as a service and, while doing so, configure a number of parameters.

If your system does not have enough resources and you try to load a model, LM Studio might crash. For this reason, lms has a dry-run flag that does not actually load the model with the specified parameters; it just tells you whether it thinks the load is possible:

lms load --estimate-only mistralai/ministral-3-14b-reasoning --context-length 120000

But even this method is not bulletproof. You need to start small and perform many experiments:

lms load mistralai/ministral-3-14b-reasoning --context-length 12000 --gpu=0.2

This command offloads 20% of the operations to the GPU. If your GPU does not have much memory, then it won’t allow you to define a context window larger than this—of course, this is system-dependent.

If you have enough memory in the system, you can try doing everything on the CPU and increase the context length as much as you can:

lms load mistralai/ministral-3-14b-reasoning --context-length 15000 --gpu=0.0

In my test system with 76GB RAM and 4GB GPU memory, these two options allowed me to execute the same simple query of explaining a very small shell script in 1 minute and 1 second with GPU offloading at 20%, and 1 minute and 4 seconds with 0% GPU offloading.

Disabling the GPU offloading allowed me to go way higher in context length. It allowed me to extend the context length to 120,000 tokens, load a PDF of 600 pages, and ask the AI to summarize it. It took 1 hour and 30 minutes to do it at the breakneck speed of 0.4 tokens per second, but it worked.

OpenAI API

After loading the model, you can start the server. By default, LM Studio listens only on the localhost IP 127.0.0.1 and port 1234. This can be modified in order to allow external connections.

Of course, one must be careful and not expose their LLM to the world, so normal precautions apply.

This can be done both from the LM Studio UI and from the command line with the following command:

lms server start --bind 0.0.0.0 --port 1234

Now our server is up and running and can accept connections from other systems. We can verify its state by using:

lms status

Server: ON (port: 1234)

Loaded Models
  · mistralai/ministral-3-14b-reasoning:2 - 9.12 GB

Please note that you do not need to do this in order to use the LM Studio CLI. You can use it without starting LM Studio in server mode.

The reason to start the server is that the LM Studio CLI is quite restrictive. It does not support certain extensions that are available only via its UI. We will cover that in the next section.

If you just want to have a quick chat with the AI, all you need to run is:

lms chat

When the LM Studio server is running, it allows us to execute commands against it using an OpenAI-compatible API.

This is very important and very powerful because we can use it with any of the already available tools that are compatible with the OpenAI API.

We will analyze this even further in later sections, but for now, all you need to know is that at minimum you can use any HTTP-enabled library to submit queries to the AI.

We can see an example of a very minimalistic interaction below:

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [
      {"role": "system", "content": "You are an expert developer."},
      {"role": "user", "content": "Is Rust better than Python?"}
    ]
  }' | jq '.choices[0].message.content'

In this interaction, it is important to highlight that the JSON we are submitting to the AI server contains a messages array with several entries. Each entry identifies a role and the content for that role.

The OpenAI API supports three roles:

System role: This is the role where we give the AI its basic instructions (e.g., “You are an expert in Python”).
User role: This is where our questions come in. Any user interactions are done via the user role.
Assistant role: This is the role used for any responses from the AI to our questions.

This array is the context of our chat; it is saved in the context window alongside anything else we want to inject into it.
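The same request can be built from Python using only the standard library. This is a minimal sketch that assumes the server from the curl example is listening on localhost:1234. The helper names build_chat_request and send_chat_request are made up for this example, and the network call is left commented out so the snippet can be read and run without a server.

```python
import json
import urllib.request


def build_chat_request(messages, model="local-model"):
    """Return the JSON body for an OpenAI-style chat completion request."""
    return json.dumps({"model": model, "messages": messages})


def send_chat_request(body, base_url="http://localhost:1234/v1"):
    """POST the body to the chat completions endpoint and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]


messages = [
    {"role": "system", "content": "You are an expert developer."},
    {"role": "user", "content": "Is Rust better than Python?"},
]
body = build_chat_request(messages)

# With a running server, the reply is appended under the assistant role,
# so the next question is asked with the full conversation as context:
# reply = send_chat_request(body)
# messages.append({"role": "assistant", "content": reply})
```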

We can take this JSON array and transfer it between different AI chats, taking the output of specialized models for specific tasks and using it as input to other AI models, thus creating AI pipelines.

This is a very powerful concept where we can combine AI models/agents of different specializations and gradually build our context, taking the responses from one step of the pipeline and feeding them to the next.

Using this approach, we can optimize for skills and for cost. We can start an AI pipeline using local models that do not cost anything beyond the actual computational resources they use, and when it comes to doing something beyond their abilities, use one of the expensive major cloud-based models only for that specific step.
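A pipeline of this kind can be sketched as a loop over steps that share one growing message list. The two step functions below are hypothetical stand-ins that only manipulate strings; in a real pipeline, each of them would call a different model, for example a local one first and a cloud-based one last.

```python
def run_pipeline(steps, question):
    """Run each step in order, appending every reply to the shared context."""
    messages = [{"role": "user", "content": question}]
    for ask in steps:
        reply = ask(messages)
        messages.append({"role": "assistant", "content": reply})
    return messages


# Hypothetical stand-ins for real model calls
def local_draft(messages):
    return "draft: " + messages[-1]["content"]


def cloud_review(messages):
    return "reviewed " + messages[-1]["content"]


result = run_pipeline([local_draft, cloud_review], "summarise the report")
```

Because every step receives the whole list, the expensive model at the end of the pipeline sees both the original question and the cheaper model's draft.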

LM Studio UI

As mentioned in the previous section, LM Studio started its life as an application with a graphical user interface. Its headless operation is a recent addition.

As such, its GUI supports all the bells and whistles—even some things that are not supported via its CLI yet.

Via the UI, we can tweak the details of the model: its context size, its GPU offloading, how creative it can be, the port and IP the server will listen on. We can browse the available models and load one or more.

We can also have a collection of chat sessions that can be preserved between restarts of the application.

This is something that is not supported by the CLI. Under the hood, what the UI saves is our context JSON array, and the next time we interact with a chat, it loads the whole thing into the context.

The UI also has some unique features that are not supported by the CLI yet. It can allow us to mount several files in our context using RAG. We will explain what RAG is in the next section.

The UI allows us to use RAG by attaching up to 5 files with a maximum of 30MB total.

It also allows the model to execute JavaScript scripts in a sandboxed environment.

This is very important because, alongside RAG, it improves the precision of the responses and reduces the chances of hallucinations.

Here is an example of such a case:

This is one of my classic test scenarios I use when I want to evaluate the quality of a financial tool, calculator, spreadsheet, or—in this case—AI.

The correct answer is not £160.72 as the AI suggested here. The correct answer is £154.63.

Here is what a cloud-based AI has to say about this:

This is a very clever trick where an AI will try to use a programming language to answer any math-oriented question. It reduces the chances of hallucination considerably.

But in order to be able to do this, it needs a runtime environment, and since it is dangerous to allow an AI to run scripts on a system, this has to be in a sandbox.

This functionality does not exist in the LM Studio CLI yet, but it exists in its UI.

RAG

RAG in AI usually means Retrieval-Augmented Generation: a method where a language model first retrieves relevant information from an external knowledge source (like documents, a database, or search index) and then uses that retrieved content to generate its answer.

What It Does

RAG is used to make LLM outputs more accurate, up-to-date, and domain-specific by grounding responses in information outside the model’s training data.

How It Works (Typical Flow)

You ask a question.
A retrieval step searches a knowledge base for the most relevant passages.
Those passages are added to the prompt/context, and the model generates an answer using both the retrieved text and its general language abilities.

In the backend, your additional files are converted into vectors and stored in a special database that allows the AI to search them and include them in its reasoning. The following diagram illustrates this:

Why People Use It

RAG can reduce “hallucinations” and improve factual reliability because the model can base its response on retrieved source material rather than memory alone.

Availability

RAG is supported by the LM Studio UI but is not supported by its CLI. Even the UI only allows us to attach up to 5 documents per chat, with a maximum of 30MB in total.

Then we can use the AI to have a discussion about our documents.

Another way to use RAG is to use the OpenAI API that LM Studio supports. There are third-party tools that can act as a frontend and use LM Studio as a backend.

One of these tools is AnythingLLM. AnythingLLM is a competitor of LM Studio. It does almost everything LM Studio does, and in some cases, its capabilities exceed those of LM Studio.

When it comes to RAG, for example, AnythingLLM does not restrict how many files you can attach to a discussion or how big they can be. Of course, all of this has to be supported by the backend: you need a big enough context window and enough memory and compute power.

It is also worth mentioning that although I am using LM Studio as my primary example here, almost everything mentioned in this article can also be achieved by using AnythingLLM and Ollama, with Ollama being the backend and AnythingLLM the frontend.

MCP

The Model Context Protocol (MCP) is an open standard and open-source framework introduced by Anthropic in November 2024 to standardize the way artificial intelligence systems like large language models integrate and share data with external tools, systems, and data sources.

MCP provides a universal interface for reading files, executing functions, and handling contextual prompts. Following its announcement, the protocol was adopted by major AI providers, including OpenAI and Google DeepMind.

MCP is supported by both LM Studio and AnythingLLM. It allows us to define integrations between various services/components.

An LLM may use a search engine to retrieve fresh information about a topic, or connect to a database or an API in order to retrieve and analyze data.

This is one of the ways we can offload data parsing from our context to an external optimized service.

Instead of embedding the data to be processed in the context, we allow the model to retrieve it from an external service that supports MCP and then analyze it.
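Under the hood, MCP messages are JSON-RPC 2.0 requests and responses. As a rough sketch, this is what a request asking a server to run a tool looks like. The tool name query_database and its arguments are hypothetical; a real server advertises its own tools and their parameters.

```python
import json


def mcp_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# "query_database" and its "sql" argument are hypothetical
message = mcp_tool_call(1, "query_database", {"sql": "SELECT count(*) FROM orders"})
```

Only the small request and the tool's result travel through the model's context, while the heavy lifting happens inside the external service.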

This is also how agentic AI works. It uses MCP to interact with other systems/agents and perform actions.

It is also where any potential hallucinations become reality if the AI is left to run without supervision.

Many companies that handed over their customer service to unsupervised AI agents discovered why this is a bad idea.

Conclusion

In this article, we presented the various concepts that come into play when we are using an AI model and presented how we can combine them in order to fine-tune a model to be used for local execution.

We also presented various tools and explained how they can be used in an optimal way to query a local AI model or as part of a wider pipeline that combines both local models as well as cloud-based ones.

As we move forward in time and agentic AI becomes more prominent and more costly, being able to use optimized models—both local and cloud-based—will be key in order to keep our costs low and our quality high.

To AI or not to AI

2026-02-05T05:00:00+00:00

Table of contents

Introduction
An Attempt to See Through the Smoke Screen
Environmental and ethical concerns
- Environmental Issues
- Ethical Concerns
Cloud-Based and Local AI
Open Source AI
Conclusion

Introduction

AI has been one of the hottest topics of discussion over the last few years. It has been a prominent subject at major technological conferences and has driven sales worth billions of dollars in stock markets worldwide.

It is being hailed as one of the major breakthroughs of our time, and many people believe that computers will soon gain consciousness and stand alongside humans as fully sentient beings.

The long-promised Generative AI (GenAI) is expected to usher in a new era of prosperity for humanity.

Or is it?

This is what we will try to analyze in this article:

Is AI really what it’s advertised to be?
Will it deliver on all these promises?
What can we use it for?

An Attempt to See Through the Smoke Screen

First of all, we need a clarification: there is no such thing as AI as a distinct technology—it’s not a singular entity or a finished product. Rather, AI (Artificial Intelligence) is an umbrella term—a marketing label—that groups together a wide range of computational techniques aimed at mimicking aspects of human-like cognition.

For decades, we have used machines to perform non-deterministic computational tasks—tasks where outcomes aren’t fully predictable from the input alone. Over time, different terms have emerged to describe these capabilities: expert systems, neural networks, deep learning, machine learning, and so on—all significant milestones in the evolution of what is now branded AI.

Each of these technologies has its own significance and has contributed meaningfully to today’s AI revolution. The underlying approaches have evolved continuously: algorithms have become more sophisticated, data availability has surged, and hardware has scaled dramatically—yet substantial variation remains across methods, architectures, and use cases.

That said, the surge in public interest—and the popularization of the term AI—has largely been driven by Large Language Models (LLMs). These systems are capable of simulating human-like conversation in natural language, generating coherent text, code, and more.

LLMs—and AI systems broadly—come in many forms: some are general-purpose, others highly specialized. Some require massive computational resources (e.g., training or running billion-parameter models), while others can run efficiently on a standard laptop—even a mobile device.

There is still enormous variation among these systems in terms of:

underlying architecture (e.g., transformers vs. decision trees),
trained capabilities (e.g., reasoning vs. pattern recall),
data requirements,
and computational footprint.

Yet, despite all these differences, we collectively refer to them as AI.

Is It Actually Useful?

Absolutely yes.

As we mentioned in the previous section, AI, or ML (Machine Learning), has been in continuous use for many years.

It has been used for detecting trends (market or social), for optimizing search results, for predicting our next word and providing alternatives and spelling suggestions, for transcribing audio, and for translating text.

It has been used in improving our photos and beautifying our selfies.

It has been used everywhere. We just did not call it AI then; we usually called it “the algorithm.”

The algorithm of Facebook, of Google, of TikTok, of Snapchat, and so on and so forth.

It was built into almost all online platforms and almost all phones. All financial institutions used it to create profiles for their customers; all trading houses used it to predict the behavior of the markets.

All retail and hospitality used it to predict what the next best trend would be.

All advertising companies used it both to profile people and to sell ads.

It will eventually be used everywhere, from consumer devices such as mobile phones (already happening), all the way to industrial equipment where it will be used as part of production, quality assurance and manufacturing.

If you are a farmer, you could have a weeding robot that detects individual weeds and zaps them.

The influence of AI in modern professional environments cannot be ignored.

Is It Actually Sentient?

Absolutely not.

All AI does is try to identify the most likely answer to a question, within a certain level of uncertainty, using some initial training datasets.

This also applies to all scientific fields that use statistics to do their jobs, from social sciences to engineering, medicine, etc.

When working with stochastic models, you can never guarantee a 100% success rate. You can often provide pretty good results, though.

But in order to do so, you need to control all the parameters of the experiment.

If you have garbage in, you will most likely have garbage out. Many of the AI models available today were trained on unsafe datasets—datasets that contained factually incorrect information and implicit or explicit bias.

As such, even though the technology may be sound and could potentially produce good results, the final result cannot be trusted.

This means that the original promise of many AI companies—that it could replace their workforce—is false, and those people were hyping and mis-selling what AI could do.

AI is not sentient, and it cannot be held accountable for its actions, even if it gives false health advice and people are harmed because they followed it.

Even though these weaknesses can be addressed, and there are quite a few clever tricks a model can use to minimize the uncertainty of its answers, the above statement cannot be bypassed.

Will It Take Our Jobs?

Unfortunately, the answer to this question is a partial yes.

It is the same question that people asked during the Industrial Revolution and also when the first automated loom was introduced.

AI is most likely the next major productivity enhancer, which means that it will allow a user to do more with less.

This inevitably means that someone is losing business. Someone’s sales will drop, which means that someone will lose their job because of this drop in sales.

Is this a bad thing?

Of course it is.

How do we avoid it?

The answer is still the same as with all those previous times this question was asked. We need to adopt AI in our workflows and focus on training to do what AI cannot do.

Focus on solving problems people have that AI cannot solve independently.

Retrain and refocus.

Environmental and ethical concerns

Environmental Issues

AI companies are being accused of consuming too much power and too many resources.

This is absolutely true.

But then again, this is true of many industrial enterprises. It is interesting that it is only now that people have started wondering about datacenter power consumption.

They did not think to challenge how much power an Amazon, Google, or Microsoft datacenter consumed five years ago, when it was already using AI, just under a different label.

People calculate today how much energy an AI-based Google search has consumed but never bothered calculating this 5 years ago, even though Google was using AI even then to optimize search results.

Or they never asked how much energy went into curating their algorithmic Facebook or Twitter feed.

But then again, Machine Learning was not a hot topic then. AI—which, I might add, will become sentient any time now—is hot today.

Even non-technical people know of its existence, so it is a good topic for a newspaper article.

Having said all that, the foundation of this argument is true. Industrial enterprises such as AI, cryptocurrencies, large-scale datacenters, or even scientific enterprises such as the Large Hadron Collider need a lot of power, and we should make sure that the environmental impact of such enterprises is as low as possible.

It is in times like these, when the industry turns its attention to a problem for its own needs, that it is fully motivated to find interesting solutions.

In this particular case, I believe that the push for AI will also mean a boost for green energy.

We cannot discover new oil fields whenever and wherever we want, but we can easily take a previously unused industrial estate, build a solar farm on it, and have it produce the energy we need.

In case solar panels are not efficient enough, hey, we have a newfangled productivity enhancer called AI that we could use to make things better.

We can also use it to improve the efficiency of other green energy sources, including nuclear energy.

I believe that even though government policy introduced the migration to a carbon-neutral economy because of climate change, the need for extra energy for AI will accelerate the evolution of green technologies and eventually benefit the whole society.

Ethical Concerns

Oh boy, this is a real stinker.

I am not sure if there has ever been any significant technological advancement with as many ethical concerns as AI.

LLMs started their life by plagiarizing the whole internet, even without consent from the owners and creators of the content.

Major tech companies have taken advantage of terms and conditions agreements, written ages ago to let them scan people’s email and personal data for spam detection and security purposes, in order to train their models.

Artists find their works of art ungracefully scraped by AI companies and then reused without license or permission to create other works of art.

Journalists and authors find their work also scraped without license or permission and used for training these models.

People find their images edited without their consent and then posted again on the internet, with or without clothes on.

Scammers are using all of these AI tools to impersonate people and steal their identities and money.

Only recently I read about a scam where scammers impersonated people’s voices using AI in order to make money transfers or place orders using phone services that use voice recording as proof that the owner actually requested these transfers or orders.

There are even more concerns when it comes to AI-enabled social media that use psychological manipulation to increase engagement to the detriment of their users’ health, mental or physical.

I could go on and on and on, but the fundamental question is: who is responsible for the use of a tool—the tool itself or the user?

The answer is the user. Always.

So all of the above could have been avoided if the owners of these models took a more responsible approach.

But alas, this was never meant to be.

This is a gold rush, and in a gold rush, you do not have a calm discussion about where to dig. You have a stampede.

It is part of human nature.

But it didn’t have to be this way. The companies that participated in this gold rush are not half-starved diggers. They have balance sheets that are bigger than the national budgets of some countries.

They could behave as responsible adults, but they didn’t.

There is no excuse for that.

Unfortunately, the situation is still evolving, and there is no indication that the protagonists of the story are taking into account the risks of what they do at this breakneck pace.

People will be hurt because of this sloppy and haphazard approach, and the AI companies will have to answer for it. But I suppose they don’t care. They have enough cash to challenge anyone in court, and in the worst case, pay whatever fines they get slapped with.

Everyone in the AI race is busy making money.

Cloud-Based and Local AI

AI and machine learning do not necessarily have to be cloud-based. Nevertheless, in the last few years, they have been dominated by cloud-based operations.

It was simply too expensive to buy and own the necessary hardware to make LLM training possible.

As the technology evolves, however, we see models emerging optimized for local execution. These are often referred to as edge-optimized models.

Where cloud-based AI is often criticized for its high environmental footprint, local AI execution does not have this problem. It can be part of our local computational operations, which means that power consumption and overall environmental impact are considerably lower.

I do not believe that local AI models will replace cloud-based AI compute, because some things are just too large to run locally, even for big companies. These use cases—industrial and environmental simulations, protein/medical research—will still require cloud-based AI.

However, as technology evolves, local models should be able to cover the needs of most people. The ability to use a local model as an expert system exists today.

We can run local AI models that are multimodal, which means that they can understand multiple forms of input: text, video, or audio. These models are quite powerful and standalone; they do not depend on any external resources.

They can, of course, be extended by the use of external resources, but they can also operate offline.

This last property makes them ideal for isolated/sensitive environments.

The other major concern with cloud-based AI, apart from environmental impact, is privacy. Given the well-documented history of ignoring creators’ licenses and scraping content anyway, many people believe that any information submitted to an online AI system will eventually become part of its training set.

This is not the case for local AI models. These models are isolated, and as the saying goes, “what happens in your local AI stays in your local AI.”

So these models are ideal for environments that require discretion and privacy.

Another potential use case for these local models is their use in embedded devices. I fully expect, in a few years, to see AI-enabled embedded systems, either in the form of household/industrial devices or in the form of personal computing/communication devices such as phones, tablets, etc.

I expect a lot of the AI work to happen on-device for most major phones, thus preserving the privacy of their owners. The same thing will apply to smart home devices and to smart computing devices such as routers, firewalls, etc.

But we will also have AI-enabled field devices, like the robot weeder I mentioned previously, but also AI-enabled drones, AI-enabled irrigation systems, and so on and so forth.

Eventually, we will see specialized AI models in embedded devices that will partially replace the hardcoded logic that we see today and allow for more flexible operation in the field.

A good example of this is firewall rules in embedded firewall and router devices. Using an AI model that can do attack pattern recognition could potentially provide us with more flexibility than the hard-coded block lists we use today.

Open Source AI

At the periphery of the AI giants, we gradually see an AI open source community emerging.

This is not a new thing. Even OpenAI was initially meant to be a non-profit organization. There are several major companies that have released versions of their models as open source, even though we often do not have full visibility of their training set and what has gone into that training.

Despite all that, the existence of these open source models is crucial.

In the same way that the open source movement allowed people to learn, experiment, and innovate, which eventually created the amazing technological wealth we see today, we need a robust open source community for AI as well.

It is the only way to heal the wounds of the past where AI evolution hurt people in the process. We need ethical AI and open AI.

It is also the only way that niche groups could support their use cases. The major AI players will always focus on where the money is, but that does not mean that fringe ideas are not important.

Most of today’s scientific breakthroughs started their life as fringe ideas, and quite often it took a long time for the original scholar or entrepreneur to make the world understand the value of their innovation.

One of the most recent examples of this is mRNA vaccines.

So we see various communities and tools emerging that support and encourage the development of open source AI models, tooling, tests, and data sets.

We can name a few here.

Hugging Face is an online community where people can collaborate on building AI applications. From here, you can collaborate with people and download, upload, design, and host AI models, applications, tests, and data sets.

It reminds me of how GitHub has helped communities form around specific projects.

LM Studio is a free tool that allows you to download and run open source models. It supports the Hugging Face format and several models, offers a graphical interface for interacting with the model, and supports MCP integrations as well as RAG. It can also be used as a server, offering both a CLI mode and OpenAI API compatibility.

You can run it in headless operation, and it can work with both GPUs and CPUs.

You can also use it as an interface to online AI models if you have an API key for them and a subscription.

AnythingLLM is another tool that can be used as an interface to either local or online models. It can run models on its own, or it can be combined with Ollama as its backend for local execution of models.

Ollama’s purpose is to allow us to run automations using local AI models, so it can be used as the backend for various automation tasks.

Another tool focusing on automation and potentially local execution is OpenClaw. OpenClaw is a personal AI assistant that you can run on your own devices, and you can use it to create integrations across a number of services.

It is primarily designed to work with Anthropic and OpenAI models, but the important thing to remember here is that most of the tools we mentioned previously support the OpenAI API. So we could use OpenClaw with any one of those local models instead of the cloud-based ones.
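To make the OpenAI API compatibility point more concrete, here is a minimal sketch of how a script could talk to a locally running model over that API. It only uses the Python standard library. The base URL, the port, and the model name are assumptions for illustration: LM Studio commonly listens on port 1234 and Ollama on port 11434, but your setup may differ, and `my-local-model` is a placeholder for whatever model you have loaded.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, api_key="not-needed"):
    """Build an OpenAI-style chat completion request for a local server.

    Local servers usually ignore the API key, but the header is kept so the
    same code can be pointed at a hosted endpoint.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def ask(base_url, model, prompt):
    """Send the request and return the text of the first reply."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (needs a local server to be running; values are assumptions):
# print(ask("http://localhost:1234/v1", "my-local-model", "Say hello."))
```

Because only the base URL changes, the same script could be pointed at a different local backend or at a cloud endpoint, which is exactly why tools built around this API are easy to swap.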

With OpenClaw, someone could use an AI model to drive automations—for home or otherwise—using messaging systems as control planes, in the same way we can use Slack bots to trigger remote actions.

Of course, it can also be used to receive notification events from your various automated systems.

Conclusion

I understand that some people—quite a lot of people—are hurt by the use of AI, either directly or indirectly.

AI is a technology that can be used for good or bad. At the very minimum, some people will lose their jobs; it will be used extensively by scammers and crooks to impersonate people, steal their money, and steal their identities.

It will be used by bad actors to create fake videos and misinformation.

Not using AI for good purposes does not mean that these bad actors won’t use it for bad purposes.

My suggestion is that we should not let AI be used only by these bad actors. We can use it for good.

We can use it to improve our quality of life and create new solutions for existing hard problems.

We can also use it to detect and protect ourselves from any negative uses of AI.

The one thing we cannot do is ignore it. We cannot pretend it does not exist and go back to a pre-AI world.

The genie is out of the bottle.

Garmin vs Garmin

2026-01-15T05:00:00+00:00

Table of contents

Introduction
In Olden Times
The New Shiny
Modes of Operation
Optimizing for Battery Longevity 1
Optimizing for Battery Longevity 2
Conclusion

Introduction

Often in reviews, we see comparisons between big brands or even models of the same brand, so it is easy to find articles that compare Apple smartwatches vs. Garmin, Coros, or Casio, and it would be easy to do one of those as well.

But in this article, we will do something different. I want to highlight the different ways you can use the same watch and explain why it may make sense to do so. We will use the Garmin Instinct 2 Solar for this.

In Olden Times

Back in the day when watches were first invented, they could do one thing. They could tell time, and often even that was done poorly. So things were simple, and there was only one way to use your watch. Gradually, as time went by, watches became more elaborate, and complications started appearing.

So a date function was added, then a chronograph complication was added, and eventually GMT watches appeared that allowed people to track a second timezone if they wanted or even track daylight saving time in their own timezone.

Things went completely crazy when digital watches were introduced because of the flexibility of the LCD display. So now we can buy a simple digital watch that supports a perpetual calendar, multiple time zones with separate DST support for each one, multiple alarms, countdown and stopwatch timers with intervals or lap support — and on top of all of this, it can be solar- and potentially radio- or Bluetooth-calibrated for less than $100.

The New Shiny

When smartwatches were introduced, they took all that digital watches could do and evolved them to a different scale. You can do all of the above and much, much more. So the complexity has increased exponentially, and each new feature only adds to this.

Where there used to be only one way to use your watch, you can now pick from the plethora of features your watch supports and use them as you see fit. I would argue that today it is impossible to use all the features.

Smartwatches with sports profiles have such a large collection of them that it is impossible to do them all. You will only choose a handful. And this applies not only to sports-related features but to the health-related features and the main watch features as well.

All of this complexity and power, of course, needs to be satisfied by using fast CPUs and large batteries. Where in the past a watch would keep ticking for decades with some maintenance, the modern landscape is filled with obsolete smartwatches that have reached the end of their life because their non-replaceable battery died or their manufacturer stopped supporting them — and without their support, some of their features are disabled.

So one might ask: is there a way to combine the longevity of the past with the flexibility of the present?

This is the topic of this article.

Modes of Operation

The first thing that fails with these sealed, non-repairable devices is the battery. It is a consumable with a limited lifespan; it is supposed to fail after a certain number of recharge cycles, and it does. The second part that fails is the charging port. This is a mechanical component that is continuously connected and disconnected, and eventually it wears out.

The third most common failure is when the manufacturer stops supporting the device. Some devices can operate standalone, and others cannot. With Garmin, we are in luck because all the basic information can be accessed on-device, even though in order to get the full benefit of the supported features, you really need to link it with the Garmin Connect app and friends.

So assuming the third most popular failure mode is not a problem, how can we mitigate the first two and ensure that our device lives as long as possible?

The answer is simple: you preserve battery life as much as possible, and this means running your watch with the absolute minimum you need. Garmin allows us to fine-tune our watches to such an extent that with the Garmin Instinct 2 Solar and full health monitoring on — plus 3-4 training sessions per week — I can get more than 20 days between charges.

If I switch the watch to power-save mode and enable it only for training, plus step tracking and the occasional contactless payment, I get more than 60 days. All of this without any serious exposure to the sun, which means that you can get the same results with the Garmin Instinct 2 without the solar panel.

The Garmin Instinct 3 has even better battery life, and more premium models are even better than the Instinct 3.

All this is great, but so far we have not done much to prolong battery life beyond enabling power-save. Is there anything else we can do?

The answer is yes: we can preserve the charging port by not using it.

Many people think that the solar panel used in models like the Garmin Instinct 2 Solar is a gimmick because you have to be outdoors more than 4 hours per day under strong sun in order for this to make a difference. Not many people are, and even if we were, the sun in Scotland is not strong at all.

But there is another way.

During my experiments with my watches, I discovered that instead of waiting for the sun to make an appearance behind the clouds, you can actually use a desk LED lamp. If it is close enough to the solar panel, then the light intensity is significantly stronger than outdoor sunshine on a bright Mediterranean day — and without the heat.

This means that you can leave your Garmin under the desk lamp overnight, and it will charge right up and be ready to be picked up again in the morning.

In addition to preserving the charging port, this also preserves battery health.

Let’s break down how this works.

Optimizing for Battery Longevity 1

Modern lithium rechargeable batteries start deteriorating after about 300-500 full charge cycles, at which point their capacity is reduced to 80% or even lower.

Let’s see how fast a battery will reach that point. For our comparison, we can use three watches: an Apple Watch that needs daily charging, an average smartwatch that needs to be charged every week, and a Garmin Instinct 2 Solar with its two modes of operation — normal and power-save — that provide battery life of 20 days and 60 days, respectively.

Apple Watch

With daily recharging, an Apple Watch battery will start deteriorating at the 300-day mark — that’s less than a year.

Generic Smartwatch

A generic smartwatch with weekly charging will start deteriorating in $\frac{300}{\frac{365}{7}} \approx 5.75$ years. Not bad at all.

Garmin Instinct 2 Solar Regular Mode

A Garmin Instinct 2 Solar gives us 20+ days of battery life in normal mode, with heart rate tracking and 4-5 training sessions per week but not much GPS usage. As such, we will get $\frac{300}{\frac{365}{20}} \approx 16.4$ years out of it. Now we are getting somewhere.

Garmin Instinct 2 Solar Power-Save Mode

With power-save mode, the Instinct will give us $\frac{300}{\frac{365}{60}} \approx 49.3$ years. Not bad at all.

Optimizing for Battery Longevity 2

Is there anything else we can do?

Sure, this is where things get really interesting. You see, in the original statement above about battery charge cycles, the operative word was "full". A full cycle means the battery goes from 100% to 0% and then back.

But most people don’t let their devices go down to 0%; they charge them earlier, so they do partial charges. So what happens if we assume we are charging our watches when the battery is at 75%? This means we are only doing a 0.25 charge. We just need to multiply our previous results by 4.

In this scenario, we have the following results:

Watch	Years for 100% Charge	Years for 25% Charge
Apple Watch	0.82	3.29
Generic smartwatch	5.75	23.01
Garmin Instinct 2 Solar regular mode	16.44	65.75
Garmin Instinct 2 Solar power-save mode	49.32	197.26
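The arithmetic behind these figures can be sketched in a few lines of Python. This is only a back-of-the-envelope model under the assumptions used in this article: wear starts at 300 full-cycle equivalents, a 25% top-up counts as a quarter of a full cycle, and the number of days between charges stays the same in both columns. The function and variable names are made up for this illustration.

```python
DAYS_PER_YEAR = 365
FULL_CYCLES = 300  # lower bound of the 300-500 cycle range quoted earlier


def years_until_wear(days_between_charges, charge_fraction=1.0, cycles=FULL_CYCLES):
    """Years until `cycles` full-cycle equivalents have been used up.

    Assumes every charge consumes `charge_fraction` of a full cycle and that
    the interval between charges does not change.
    """
    charges_per_year = DAYS_PER_YEAR / days_between_charges
    cycles_per_year = charges_per_year * charge_fraction
    return cycles / cycles_per_year


# Days between charges for each watch, as used in the comparison above.
watches = {
    "Apple Watch": 1,
    "Generic smartwatch": 7,
    "Garmin Instinct 2 Solar regular mode": 20,
    "Garmin Instinct 2 Solar power-save mode": 60,
}

for name, days in watches.items():
    full = years_until_wear(days)
    partial = years_until_wear(days, charge_fraction=0.25)
    print(f"{name:<42} {full:7.2f} {partial:8.2f}")
```

Changing `FULL_CYCLES` to 500 gives the optimistic end of the range, and changing `charge_fraction` lets you try other top-up habits.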

And that is how we take a device with a known fixed lifespan and optimize it to be usable for way longer than expected.

At this point, I would like to point out that the Garmin Instinct numbers can also be achieved without the solar panel, but in this case, the charging port will eventually wear out long before the battery does.

Conclusion

The tl;dr is: if you have a Garmin Instinct 2 Solar, charge it using the solar panel under a desk lamp every 15 days, and the battery will last several lifetimes and your charging port will be in pristine condition for several decades.

Debugging a DMR hotspot

2026-01-02T10:00:00+00:00

Table of contents

Introduction
Getting a Hotspot
What Does WPSD Look Like?

Introduction

DMR stands for Digital Mobile Radio and is one of several digital modes available to amateur radio operators. Digital modes use encoding to convert either voice or data to a format that can be transmitted over the air via radio and then received and decoded by another radio.

Digital modes have some benefits over analog modes and, under certain circumstances, can deliver superior quality or even allow communication under very difficult conditions. They are usually more efficient than analog radio and allow multiple channels per frequency.

In addition to all the above, digital modes are also designed to use internet-enabled gateways that allow us to connect two remote gateways via the internet and enable users connected to one gateway to communicate with users connected to the other.

This can be done on a commercial as well as an amateur level.

For amateur networks, we have several digital modes, DMR being one of them. We have several digital mode networks all over the world, with BrandMeister being the biggest one. If you open the BrandMeister website, you will see some statistics. You will see how many network masters exist, how many repeaters, and how many hotspots.

The repeaters connect to the masters, which are responsible for call routing. You can connect directly to a repeater with your DMR radio and participate in the discussion if you are in range.

However, if you are not in range, you can roll your own single-user repeater called a hotspot.

This is a very common use case because a lot of people are not close to a repeater, and having a hotspot allows them to participate in digital mode amateur radio. With the cost of a Raspberry Pi and a HAT, you can make contacts across the world.

These hotspots are the focus of this article. We will cover how they can be configured to access digital mode radio networks and how to debug them if things go wrong.

Getting a Hotspot

Acquiring a hotspot is not complicated. There are several off-the-shelf products that you can buy, and this topic has been covered extensively by several people who have written blog posts or published extensive video tutorials. See here, here, here and here for example.

A DIY-minded person can just purchase a Raspberry Pi and an MMDVM Pi HAT, flash it with one of the specialized distributions that exist — such as WPSD — and hey presto, you have a digital modes hotspot. These hotspots support more digital modes than just DMR, but we will focus on DMR in this article.

A hotspot can connect to many DMR networks simultaneously, and the user can configure the hotspot to use specific prefixes for each network to indicate which one they want to access. In addition to multiple networks of a specific mode, you can have multiple modes enabled with multiple networks for each mode.

If you overdo it, things might become complicated, and it is very easy for a change to mess up the configuration of the hotspot.

In this article, we will explain how we can debug the hotspot service on a Raspberry Pi running WPSD, where to find its logs, and how to identify what is wrong with a particular setup.

What Does WPSD Look Like?

After installation, WPSD will present its user with a dashboard that provides information about how many networks it is connected to, what modes are enabled, and information about which channel it is listening to, along with current traffic for that channel.

Each call is identified by the caller’s callsign. We can also see some statistics about the call.

We can see that in the following screenshot:

We can see the mode and network details on the left-hand side of the dashboard. If a mode is enabled but a specific configuration is invalid, you will see it here as inactive.

Configuration Changes

In addition to the main dashboard, we can see that we have the option of accessing the administrative dashboard, which looks like the following screenshot:

In this dashboard, we can see various editors and tools that can be used to configure different components of our hotspot. These editors are a bit restrictive and do not allow us to edit all sections of the configuration.

These restrictions are removed when we access the Full Editor under the advanced section, which allows us to edit everything but at the risk of breaking our hotspot.

This is why when we click on the Full Editor option, we get the following warning:

The Full Editor allows us to edit the configuration files in free-form text.

This gives us full power to add or remove sections. Please keep in mind that if you use the Full Editor, save your configuration, and then go back to use the form-based editor, your changes will be overwritten, and this will break your hotspot configuration.

If you use the Full Editor once, all future changes have to be done with the Full Editor.

Logging

When something goes wrong, our first step is to check the logs. WPSD allows us to view the service logs from its web interface.

But if we try to access a log file that has too many entries, we get the message we see in the screenshot that the file is too large.

Of course, this is not really an issue because this is a Linux system. We usually don’t connect to its terminal, but it comes fully configured with SSH access.

There is a default user called pi-star with a default password of raspberry.

With these credentials, we can connect to the system:

> ssh pi-star@pi-star.local
pi-star@pi-star.local's password: 
X11 forwarding request failed on channel 0

This is...
 _      _____  _______ 
| | /| / / _ \/ __/ _ \
| |/ |/ / ___/\ \/ // /
|__/|__/_/  /___/____/

Version Status
---------------
  • WPSD Dashboard Web Software:
      Ver. # 07cef4a9c8
  • WPSD Support Utilites and Programs:
      Ver. # 45da274eae
  • WPSD Digital Voice and Related Binaries:
      Ver. # 20ab213163

[?] Your WPSD dashboard can be accesed from:
    • http://pi-star.local/
    • http://pi-star/
    • http://192.168.0.53/

[i] WPSD command-line tools are all prefixed with "wpsd-".
    Simply type wpsd- and then the TAB key twice to see a list.

WPSD Project: (C) Chip Cuccio, W0CHP -- Made in Winona, Minn. USA
[!] WPSD is Free Software, and comes with ABSOLUTELY NO WARRANTY.

Last login: Fri Jan  2 14:12:58 2026 from 192.168.0.72
pi-star@pi-star:~$ 

As the welcome message suggests, we can do a lot from the command line. There is a whole collection of WPSD commands that can be accessed by typing wpsd- and hitting Tab:

pi-star@pi-star:~$ wpsd-
wpsd-backup              wpsd-detectmodem         wpsd-gensslcert          wpsd-mode-manager        wpsd-modemreset          wpsd-p25link             wpsd-switch-profile      wpsd-update              wpsd-ysflink             
wpsd-bmapi               wpsd-dmr_jittertest      wpsd-hostfile-update     wpsd-modemcalibrate      wpsd-modemupgrade        wpsd-sendcw              wpsd-system-manager      wpsd-version             
wpsd-dapnetapi           wpsd-dstar-link          wpsd-mmdvmremote         wpsd-modem-flash_custom  wpsd-nxdnlink            wpsd-services            wpsd-tgifapi             wpsd-xlx_dmr_link        
pi-star@pi-star:~$ wpsd-

But in this article, we are focusing on debugging our service, so we want to find the logs. The logs for the WPSD service can be found in /var/log/pi-star/, and in there we can see the following files:

pi-star@pi-star:pi-star$ ls -l
total 2364
-rw-r--r-- 1 mmdvm mmdvm    1001 Jan  1 03:46 APRSGateway-2026-01-01.log
-rw-r--r-- 1 mmdvm mmdvm    1232 Jan  2 19:14 APRSGateway-2026-01-02.log
-rw-r--r-- 1 mmdvm mmdvm     615 Jan  3 02:00 APRSGateway-2026-01-03.log
-rw-r--r-- 1 mmdvm mmdvm       0 Jan  1 02:46 DMRGateway-2026-01-01.log
-rw-r--r-- 1 mmdvm mmdvm     102 Jan  2 22:35 DMRGateway-2026-01-02.log
-rw-r--r-- 1 mmdvm mmdvm       0 Jan  3 02:01 DMRGateway-2026-01-03.log
-rw-r--r-- 1 mmdvm mmdvm 1145712 Jan  1 23:59 MMDVM-2026-01-01.log
-rw-r--r-- 1 mmdvm mmdvm 1059026 Jan  2 23:59 MMDVM-2026-01-02.log
-rw-r--r-- 1 mmdvm mmdvm  195292 Jan  3 04:38 MMDVM-2026-01-03.log
pi-star@pi-star:pi-star$ tail DMRGateway-2026-01-02.log 
W: 2026-01-02 22:35:41.190 BM_2341_United_Kingdom, Login to the master has failed, retrying login ...
pi-star@pi-star:pi-star$ 

As we can see, the log files are broken down according to the subsystem. We can review each one of these files for potential issues. In this particular case, we can see that there was a connection issue to the British BrandMeister Gateway. Eventually, it was able to reconnect, so the service recovered.

MMDVM files keep extensive logs of all the activity happening in the hotspot, including statistics for every connection.
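Since the MMDVM logs grow past a megabyte per day, scrolling through them by hand gets tedious. The following is a minimal sketch of a script that walks the log directory and prints only the lines that look like problems. It assumes every log line follows the layout seen in the DMRGateway sample above (a one-letter level, a colon, a timestamp, then the message). Only the `W:` prefix appears in that sample; treating `E:` and `F:` as error and fatal levels is an assumption, so adjust the `levels` tuple to what your logs actually contain.

```python
import re
from pathlib import Path

# Matches lines such as:
# W: 2026-01-02 22:35:41.190 BM_2341_United_Kingdom, Login to the master ...
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]): "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Split one log line into level, timestamp and message, or return None."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def find_problems(log_dir, levels=("W", "E", "F")):
    """Yield (file name, entry) for every line whose level is in `levels`."""
    for path in sorted(Path(log_dir).glob("*.log")):
        with path.open(errors="replace") as handle:
            for line in handle:
                entry = parse_line(line)
                if entry and entry["level"] in levels:
                    yield path.name, entry


# Example usage on the hotspot itself:
# for name, entry in find_problems("/var/log/pi-star"):
#     print(name, entry["timestamp"], entry["message"])
```

Lines that do not match the pattern are skipped rather than treated as errors, so continuation lines or banners in the logs will not break the scan.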

Configuration Structure

Explaining the configuration of the WPSD service is beyond the scope of this article, but I would like to briefly explain the structure of the configuration file.

The configuration file is divided into sections. Each section sets up a different aspect of the service, and some are meant to be extended to include more networks or enable more modes.

We can see a snippet in the following example:

[General]
RptAddress=127.0.0.1
RptPort=62032
LocalAddress=127.0.0.1
LocalPort=62031
RuleTrace=0
Daemon=1
Debug=0
RFTimeout=20
NetTimeout=20
Suffix=R
Primary=1

[Log]
DisplayLevel=0
FileLevel=4
FilePath=/var/log/pi-star
FileRoot=DMRGateway

[Voice]
Enabled=1
Language=en_GB
Directory=/usr/local/etc/DMR_Audio

These sections are fairly static, and once you set them up, they won’t change. But when it comes to the DMR Gateway section, you can have several entries that look like the following:

[DMR Network 5]
Id=*****
Address=apollo.dmr.uk.pe
Password="*****"
Port=62031
Name=SystemX_Apollo
Enabled=0
TGRewrite0=2,4,2,9,1
PCRewrite0=2,44000,2,4000,1001
PCRewrite1=1,4009990,1,9990,1
PCRewrite2=2,4009990,2,9990,1
PCRewrite3=1,4000001,1,1,999999
PCRewrite4=2,4000001,2,1,999999
TypeRewrite1=1,4009990,1,9990
TypeRewrite2=2,4009990,2,9990
TGRewrite1=1,4000001,1,1,999999
TGRewrite2=2,4000001,2,1,999999
SrcRewrite1=1,9990,1,4009990,1
SrcRewrite2=2,9990,2,4009990,1
SrcRewrite3=1,1,1,4000001,999999
SrcRewrite4=2,1,2,4000001,999999
Location=0
Debug=0

[DMR Network 4]
Enabled=0
PCRewrite1=1,5009990,1,9990,1
PCRewrite2=2,5009990,2,9990,1
TypeRewrite1=1,5009990,1,9990
TypeRewrite2=2,5009990,2,9990
TGRewrite1=1,5000001,1,1,999999
TGRewrite2=2,5000001,2,1,999999
SrcRewrite1=1,9990,1,5009990,1
SrcRewrite2=2,9990,2,5009990,1
SrcRewrite3=1,1,1,5000001,999999
SrcRewrite4=2,1,2,5000001,999999
Location=0

These entries indicate where to connect, which credentials and IDs to use, whether the network is enabled, and some rewriting rules that define how you can access each network.
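Because the file is a plain INI-style document, a quick way to get an overview is to parse it and list every DMR network with its status. The sketch below uses Python's standard `configparser` module and takes the configuration text as a string, so it makes no assumption about where the file lives on your hotspot. The function name and the shortened sample are made up for illustration.

```python
import configparser


def list_dmr_networks(config_text):
    """Return a summary of every [DMR Network N] section in the config text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case, e.g. "TGRewrite1"
    parser.read_string(config_text)

    networks = []
    for section in parser.sections():
        if not section.startswith("DMR Network"):
            continue
        options = parser[section]
        networks.append({
            "section": section,
            "name": options.get("Name", "(unnamed)"),
            "address": options.get("Address", "(missing)"),
            "enabled": options.get("Enabled", "0") == "1",
            "rewrite_rules": sum(1 for key in options if "Rewrite" in key),
        })
    return networks


# Shortened sample based on the snippets shown above.
SAMPLE = """
[General]
RptPort=62032

[DMR Network 5]
Address=apollo.dmr.uk.pe
Name=SystemX_Apollo
Enabled=0
TGRewrite0=2,4,2,9,1
PCRewrite1=1,4009990,1,9990,1

[DMR Network 4]
Enabled=0
TGRewrite1=1,5000001,1,1,999999
"""

for network in list_dmr_networks(SAMPLE):
    print(network)
```

A section with no `Address` or `Name`, like the second one in the sample, shows up immediately in the output, which is a useful first check after editing the file by hand in the Full Editor.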

In DMR, we have several talk groups under the same frequency, and a hotspot can subscribe to more than one at the same time. That allows you to listen to incoming traffic for one or more talk groups.

However, if you want to answer one of them, you need to be able to differentiate between two talk groups with the same ID number that belong to different networks.

That is where the rewriting rules come into play: by assigning a prefix to each network. So if Network 5 has the prefix 5 and Network 4 has the prefix 4, and you have a talk group with number 2021 in both, then you can access talk group 2021 in Network 4 by dialing 42021 and Network 5 by dialing 52021.

This is where a misconfiguration is most likely to happen, so by combining the logic explained here with any errors found in the logs, we can find the potential issue and fix our hotspot.
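To reason about a rewrite rule before committing it, it can help to model it in a few lines of code. The sketch below is not DMRGateway's implementation. It assumes that the five comma-separated fields of a `TGRewrite` entry mean source slot, first source talk group, destination slot, first destination talk group, and range, which is how these rules are commonly described. Please verify this against the DMRGateway documentation for your version.

```python
from typing import NamedTuple, Optional, Tuple


class TGRewrite(NamedTuple):
    from_slot: int
    from_tg: int
    to_slot: int
    to_tg: int
    span: int  # how many consecutive talk groups the rule covers


def parse_tg_rewrite(value: str) -> TGRewrite:
    """Parse a value such as '1,4000001,1,1,999999' into its five fields."""
    return TGRewrite(*(int(part) for part in value.split(",")))


def apply_tg_rewrite(rule: TGRewrite, slot: int, tg: int) -> Optional[Tuple[int, int]]:
    """Return the (slot, talk group) sent to the network, or None if no match."""
    if slot != rule.from_slot:
        return None
    offset = tg - rule.from_tg
    if 0 <= offset < rule.span:
        return rule.to_slot, rule.to_tg + offset
    return None


# Rule taken from the [DMR Network 5] snippet above.
rule = parse_tg_rewrite("1,4000001,1,1,999999")
print(apply_tg_rewrite(rule, 1, 4002021))  # inside the prefixed range
print(apply_tg_rewrite(rule, 1, 5002021))  # belongs to another prefix
```

Running the talk group you dialed through every enabled network's rules shows which network, if any, would pick it up. When the logs show traffic going nowhere, an input that matches no rule or matches two networks at once is a good place to start looking.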

To self host or not to self host, that is the question

2025-12-31T17:00:00+00:00

Table of contents

Introduction
Why Self-Hosting?
Hosting

Introduction

I was recently asked about some technologies usually related to self‑hosting and, after I answered that I am already using most of these things, I was asked why I do not write about them. This made me start thinking about it.

The problem is that self‑hosting means something different for each of us, depending on each person’s needs, so a detailed article about what I use would probably be of interest only to technical people and not so much to people with a non‑technical background, but who might still need to self‑host some services.

So instead of going deeply into the technical details of what I use, I will explain which technologies I use and how I combine them, and I will try to keep the focus on some issues that most people face, in order to make this article as useful as possible, starting with the main question: why self‑hosting?

Why Self-Hosting?

The definition of self‑hosting varies depending on who you ask. For some people, it means self‑hosting a website. For others, it means significantly more. We can self‑host almost everything — we can replace most major cloud providers with locally hosted services. This includes data storage, multimedia, email, social media, photo‑sharing apps, collaboration apps — you name it.

Not everyone needs all of these, and a lot of people are quite happy using the services provided by the major tech companies. They might want to have only one or a few of these services locally hosted.

First of all, we need to answer the question: why would anyone want to host their own services?

Privacy

We will start by mentioning that, in almost every case, when we are using a publicly offered solution, our expectation of privacy should be zero. Even though some cloud providers support encryption, most major consumer solutions are not end‑to‑end encrypted.

This means that any government in the world can request that your cloud provider decrypt your files and hand them over. It also means that your favorite email provider (Google, Microsoft, Apple) is reading your email. Facebook is reading your private chats.

This includes data files, chat histories, photos, videos, and books. Everything stored in a public cloud can be handed over to the authorities. Of course, as an upstanding citizen, you might think, “I have nothing to hide; I am not afraid of the authorities.”

Things are not that simple. First of all, not all governments are democratic, and quite a few profile their citizens in order to control them. So if you say that you have nothing to be afraid of, then you are lucky to live in one of the countries where there is no racism or discrimination based on skin color, political ideas, or religion.

Last time I checked, there were only a handful of countries that fall into this category.

So I would suggest you reconsider and look around you. See if there are any incidents of racism and discrimination in your communities — if there are, then you are also vulnerable. It does not matter what color your skin is or your religion.

Trends change, and the people who hold the keys to power change as well. Today, you might not be targeted by those in power, but tomorrow you might be.

So one of the reasons you might want to self‑host is to have true privacy.

Independence

Another reason is that you may want to be independent. We have seen public cloud providers locking in their users and then asking for money; the users cannot leave and go to another platform, so they have to pay up. We have seen that major players in the tech industry (e.g., Google) do not allow you to download all of your photos in one go in order to move to another service.

You have to download them in batches, and if you used to host your data with Google and have several dozen GB, then you are out of luck. If you are technical, you can script the process. If you are not, then you are locked in.

Similarly, Flickr suddenly changed their pricing model and started asking for money from people who had photos on their platform. Anyone who had set up their workflow to upload their photos to Flickr automatically was effectively locked in and done for.

They kept threatening to delete my account and photos for months. I just ignored them, but I was so annoyed by this that I stopped taking photos for a long time. Eventually, I found that there was a tool that could export a Flickr library and import it to other platforms. This was the first step toward hosting my own Pixelfed instance.

Another fairly common issue is when someone is locked out of their own account.

We have seen several cases where the major tech companies lock people out of their accounts for various reasons. When this happens, you are screwed. You lose access to all of your apps, all your email, and all of your data.

It is a really good idea to keep your own copy of your data.

Common Patterns

These are common patterns that apply to everyone. Everyone has email, everyone has data, and everyone does messaging. Some people might be doing more, but every single person who uses modern technologies has a collection of data that they need to access and a group of people that they need to connect with, collaborate with, or talk to.

This is the first step toward self‑hosting: owning your own data.

Hosting

Data Hosting

This part of self‑hosting is actually the easiest of all. There are several major companies that offer Network Attached Storage (NAS) solutions.

Some of these are turnkey solutions. You just plug a box into power and network, and you have a full solution that can be used to store data, view photos, read emails, host web pages, write text files, use spreadsheets, back up phones, and collaborate with other people.

They are multi‑tenant solutions, which means that one device can be used for a whole family — each person with their own account — or an office/company.

Synology

The most prominent of these companies is Synology. Synology offers a full range of solutions — for both consumers and companies. I am a proud Synology owner/customer and have been for more than a decade.

Synology is not very cheap, but it offers a lot of value for your money. Another problem Synology solves is data access. It offers the QuickConnect service, which allows users to connect to their NAS from anywhere in the world.

Please remember that Synology offers apps for mobile device photo and data backup. These two combined offer similar functionality to native Android and iOS cloud support for media and backups — without the privacy and lock‑in issues mentioned earlier.

In addition to its main purpose as data storage, Synology offers beefier models that can be used to run services, either directly on the NAS itself or as containers. This means that you can host workloads on a Synology as if it were a public cloud provider.

My Synology systems are not powerful enough for this, so I am using an alternative method. We will come to that shortly.

QNAP

Another prominent company in this space is QNAP. QNAP offers a similar range of products to Synology — from small systems oriented toward consumers and home users all the way to systems oriented toward companies and offices.

They also provide an integrated solution, and they are better priced than Synology. The reason I have chosen Synology over QNAP is that, in my experience, Synology’s platform is better tested and more stable. You really do not want to risk messing up your data store with an unstable OS upgrade.

FreeNAS/TrueNAS

FreeNAS/TrueNAS is the open source/free software community’s response to the proprietary solutions offered by various companies. Originally developed back in 2005 based on FreeBSD, FreeNAS grew and matured into TrueNAS, an enterprise NAS solution with support for OpenZFS, jails, and more.

TrueNAS can be a more budget‑friendly solution but may require technical knowledge and/or an investment in time that might or might not be an option for the user.

Name Resolution

The second most common service for self‑hosting is a private name resolver. It may not be obvious, because name resolution works transparently and non‑technical users rarely wonder how it works, but the truth is that whoever controls name resolution controls where you connect to.

There are specific attacks that can redirect a user to a fake site, and a whole lot of infrastructure in place that allows us to check if a site is who it claims to be. Usually, we use our ISP’s name resolution service.

This allows them to control where we connect to, and it is a fact that one of the first types of blacklisting on a country level happens on this service. If your ISP or your government wants to prevent you from accessing a service, this is one of the first things they will do.

They will blacklist the service’s domain. In addition, ISP DNS servers are often not very stable, so it might be a good idea to avoid them anyway.

Luckily, it is easy to work around this issue by using one of the DNS services available from major providers (Google, Cloudflare, etc.). This solves the potential performance issues but moves the control from your ISP to a major tech company that might have their own agenda.

The solution to this is to run your own DNS server. If you have one of those fancy Synology NAS systems that can run workloads, you can use it to deploy a custom DNS server. Alternatively, you can use a cheap Raspberry Pi.

Pi-hole

There is a user‑friendly DNS software package called Pi-hole. Pi-hole is designed mainly to act as a caching nameserver and ad blocker. It can be deployed as a container or a regular Linux service on a Raspberry Pi or in a VM.

This means that you send your requests to it, and it asks the upstream DNS servers for an answer and then caches that answer for any future identical requests. In addition, it can block commonly known domains that are used to serve ads.
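To make the idea concrete, here is a toy Python model of a caching, blocking resolver. It is not Pi-hole's code: the class name, the fake upstream function and the choice of answering blocked names with 0.0.0.0 are assumptions of this sketch, and it ignores details such as TTL expiry.

```python
from typing import Callable, Dict, Set

SINKHOLE = "0.0.0.0"  # assumption of this sketch: blocked names get an unroutable address


class CachingBlockingResolver:
    """Toy model of a caching, ad-blocking resolver (illustrative only)."""

    def __init__(self, upstream: Callable[[str], str], blocklist: Set[str]) -> None:
        self.upstream = upstream
        self.blocklist = {domain.lower() for domain in blocklist}
        self.cache: Dict[str, str] = {}

    def resolve(self, name: str) -> str:
        name = name.lower().rstrip(".")
        if name in self.blocklist:
            return SINKHOLE  # never forwarded upstream
        if name not in self.cache:
            self.cache[name] = self.upstream(name)  # first request: ask upstream
        return self.cache[name]  # repeated requests: answered from the cache


calls = []


def fake_upstream(name: str) -> str:
    """Stand-in for a real upstream DNS query; records each call."""
    calls.append(name)
    return "192.0.2.10"  # documentation-range address, not a real lookup


resolver = CachingBlockingResolver(fake_upstream, {"ads.example.com"})
resolver.resolve("example.org")
resolver.resolve("example.org")      # served from the cache
resolver.resolve("ads.example.com")  # blocked
print(len(calls))  # 1
```

Only the first request for example.org reaches the upstream server; the repeat is answered locally and the ad domain never leaves the house.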

Why Blocking Telemetry and Ads Is a Good Idea

Pi-hole was originally designed as an ad blocker. It takes a leaf out of your ISP’s book and, instead of letting them control which domains resolve for you, lets you do it yourself. This means that you can blacklist certain domains that are used to track you or show you ads.

Online tracking and advertising go hand in hand. All of our interactions on the internet are being monitored by the major tech players. This is one of the reasons why you might not want to use one of their DNS servers. This telemetry is then used to profile you and serve you ads.

You might ask, why should you bother doing this?

Privacy

Well, you can scroll up a few paragraphs and read the section about privacy again. Why should Facebook or Google know who you are, what your favorite color is, or your shoe size? Why should they know what your favorite restaurant or sports team is?

It is very difficult to avoid this kind of tracking, but one of the ways you can get some privacy is by blocking trackers and ads.

But there is a second reason you might want to block ads.

Security

It is called malvertising. Ads are a known method of spreading malware, and one of the most common ones, most likely second only to phishing.

It is very easy for a non‑technical person to be tricked into clicking on an ad, intentionally or not, and then, instead of being taken to the advertiser’s website, end up with malware installed.

The major ad companies (again, Google, Facebook, and friends) do not really do any validation of the ad payload. They sort of assume that all of their advertisers are legitimate companies promoting a product, but this is not the case.

Performance

A final reason why you may want to block ads is performance. Ads consume resources. They consume memory, CPU, and bandwidth. Quite often, they overload an otherwise simple page. Most people notice an improvement in performance and browsing experience after blocking ads for this reason.

Media

Another common use case for self‑hosting — but not as common as data and DNS — is media hosting. There are a lot of people who own collections of digital media, either films or music (or both). Some people own audiobooks. So it is quite common for all these people to want to access their media on all of their devices.

Managing media collections across several devices is a tedious thing, especially when you want metadata like cover images, actor biographies, etc. So a lot of people self‑host a service for this.

Why Not Use Online Streaming?

Of course, most people will ask why they should bother hosting a service for media when they can just use Netflix.

The answer is that Netflix, Disney+, and Audible cost money, and they do not always have what you want. They have limited licenses to the majority of titles they host, which means you cannot watch your favorite series whenever you want (in my case, the series that started all this is Star Trek: The Next Generation).

Also, some of the platforms are quite busy enshittifying the whole experience by introducing ads even though you are already paying for the service. I have lost count of how many times my X-Files viewing was interrupted by Amazon Prime ads.

On the other hand, if you buy your movies as second‑hand DVDs from eBay, rip them, and watch them as many times as you want, you can do so freely. You can also download your podcasts locally and listen to or watch them on any of your devices without depending on YouTube, Spotify, or any other major platform.

You can even automate this.

There are also older films, presentations, and instructional videos that are in the public domain or that you have paid for, and you can download them for your own pleasure or training needs.

Plex

This is exactly what I have done. I have a collection of DVDs of favorite movies/series; I have ripped them and host them in my Plex. Again, if you are the happy owner of a fancy Synology, you can host the service and the data on your Synology.

If you do not have a powerful enough Synology, then again you can host Plex either on a Raspberry Pi or a Linux system (either physical or virtual). The process is fairly straightforward and well documented.

You can use it for all your videos, music, and audiobooks, and it will automatically detect the metadata of each of your files and show it to you when you want to view or listen to it.

It supports playlists and collections for the organization of your files.

Plex Issues

Not everything is rosy in the Plex world. Even though they offer apps for all platforms and you can also use the service from a browser, the apps are not free. You have to pay a subscription or buy a lifetime pass.

I paid for a lifetime Plex Pass many years ago and have been using it ever since.

In addition to this initial cost, Plex is gradually moving away from an open ecosystem that allows third‑party plugins toward a walled ecosystem that is ad‑supported. It will not show ads on your own media, but it allows you to stream movies from their own catalog while they show you ads.

As you might have noticed, I am not a fan of ads, so what applies to Google and Facebook also applies to Plex. If you block the ads, the service is not otherwise affected. It is hosted locally on your own system, after all.

Another potential issue you may have is that the metadata detection might be a bit finicky. It requires specific naming of your files; otherwise, it might not work as expected, and you may end up with misidentified films.
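A quick way to spot files that are likely to be misidentified is to check their names before adding them to the library. The sketch below assumes the commonly documented conventions of "Title (Year).ext" for films and "Show - s01e02.ext" for episodes; the regular expressions are my own approximation and not Plex's actual matcher.

```python
import re

# Approximate patterns for the naming conventions assumed above.
MOVIE = re.compile(r"^.+ \(\d{4}\)\.(mkv|mp4|avi)$", re.IGNORECASE)
EPISODE = re.compile(r"^.+ - s\d{2}e\d{2}( - .+)?\.(mkv|mp4|avi)$", re.IGNORECASE)


def classify(filename: str) -> str:
    """Return 'episode', 'movie' or 'unrecognised' for a media file name."""
    if EPISODE.match(filename):
        return "episode"
    if MOVIE.match(filename):
        return "movie"
    return "unrecognised"


for name in (
    "The Matrix (1999).mkv",
    "Star Trek The Next Generation - s01e01.mkv",
    "matrix_rip_final.mkv",
):
    print(name, "->", classify(name))
```

Anything reported as unrecognised is a candidate for renaming before the next library scan.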

Jellyfin

An alternative solution to Plex is Jellyfin. It is an open source, volunteer‑built media solution that does exactly what Plex tries to do — without the corporate agenda.

Jellyfin is younger than Plex, and it did not exist when I started setting up my personal media collection; otherwise, I would most likely have chosen it.

It uses a server/client architecture in the same way Plex does, where the server holds the collection and all metadata, and the clients connect to it and stream data from it. There is extensive support for clients on all major platforms.

Managing Your Services

So far, we have provided examples of three different services that someone might be interested in self‑hosting and given the reasons why they might want to. The list is non‑exhaustive, and most people host even more services.

After a while, even a technical person might find the management of all these services a chore, so the next question is: is there any way to make management of these services easier?

The answer is yes, and there are several solutions to this problem. One of these solutions is YunoHost.

YunoHost is a service management platform that can be used to automate and manage the lifecycle of services. It supports a large number of services, including Pi-hole, Plex, Jellyfin, and more.

It can be used to deploy the services, update them, create users, take backups, and finally uninstall the services.

It supports a Single Sign‑On (SSO) page, which means that you can access all your services by using one username and password.

You can also read more about YunoHost and self‑hosting in Elena Rossini’s very nice blog posts here and here.

Accessing Your Services

All right, so now you have quite a few services hosted at home: you have your Pi-hole that protects you from spammers and advertisers, you have your data on your NAS with your self‑hosted productivity suites, and your media library with Jellyfin or Plex.

But you can only access them when you are home. What happens when you are commuting or traveling abroad for work or holidays? Do you need to replicate your podcast setup on a different podcast player? Copy your films to your phone or your iPad so you can watch them when away?

Of course you could do that, but you do not really have to unless you happen to be somewhere without internet access, such as an airplane.

The Old Way

In the past, in order to use their services remotely, people had to use Dynamic DNS and open and forward ports on their ISP router to allow incoming connections to their services. This approach did not always work well, had a lot of moving parts, and involved some technical difficulty.

There is a better way today.

The Better Way

The better way to do this is to use Tailscale. Tailscale is a company that offers consumer and business VPN solutions. It is based on WireGuard point‑to‑point VPN technology. In a nutshell, it allows us to connect all of our devices together using a virtual private network called a tailnet.

Tailscale supports all major platforms, including Linux. After enabling Tailscale on your mobile device and your Plex/Jellyfin instance, you can access it as if you were at home.

The installation is straightforward, and Tailscale offers a free tier that, at the time of writing, allows you to connect up to 100 devices to your tailnet. This way, you can easily access all of your services without having to mess with Dynamic DNS or port‑forwarding rules on your router.

You can even configure one of your systems at home to act as a gateway, which means you can use Tailscale on your phone and iPad while away to browse the internet and appear as if you are connecting from home.

The whole communication is encrypted, so you can use this instead of a commercial VPN service. The main difference is that you do not have many exit points to choose from. But it is a cracking feature, since by using your domestic connection you can access all of your regular services even if they are geo‑fenced.

While regular VPN solutions might be blocked, your own network connection will not be, because it is a residential IP and not an IP owned by a VPN company. So if you want to access any streaming services or other geo‑fenced services when abroad, you will not have any issues.

Website

So far, we have covered services that people can host in their own home for their own use, but how about hosting a service for other people? A good example of this is a website.

If you read Elena’s blog post, you might have noticed that she is using YunoHost with a Virtual Private Server (VPS). This is usually the right way to go about it if you want to host a website, especially a commercial one.

But would it be possible to host a website at home, not open any ports on your router for the outside world, and yet make it available to everyone?

The answer is yes.

Cloudflare offers a free service called Cloudflare Tunnel, whose client is called cloudflared. It is an agent that allows us to create a tunnel between our own system and Cloudflare’s systems.

It is designed to use Cloudflare’s proxy service to forward any HTTP requests coming to their side to our system without us having to expose any of our internal network to either the outside world or to Cloudflare itself.

In order to use this service, you need two things: a domain registered with Cloudflare, which will be used to establish the tunnel, and the agent running on the system that is going to accept the connections. Then you can use this tunnel to host any HTTP service.

If you combine this with YunoHost and virtual hosting, you can use one system to host multiple services — your own Pixelfed, your own website, your own Mastodon — from one Cloudflare tunnel and one system in your closet, for free.

DIY home automation

2025-12-19T18:00:00+00:00

Table of contents

Introduction
The ISM Radio Bands
- 433 MHz Band
- Zigbee
Decoding the Gadgets
Using the data
- Architecture 1
- Architecture 2
Eye Candy
Potential Issues
Conclusion
References

Introduction

Smart homes, or home automation as a concept, have existed for a few decades now. The final result of the evolution of this market is that we have ended up with a patchwork of devices and protocols that may or may not talk to each other.

We have competing ecosystems, and you can find devices that use technologies such as Zigbee or Z‑Wave to talk to each other, but these two networks are incompatible with each other.

In addition to that, we may have devices that use proprietary protocols that we cannot use or control unless we buy into their ecosystem.

In this article I will present a DIY approach to how we can interact with these devices.

I will focus on devices using ISM radio bands (433 MHz, Zigbee). The same approach could be used for other home automation technologies as long as they are open and not proprietary.

The ISM Radio Bands

Many devices use what are called ISM radio bands for their communication, and within those bands they use specific protocols to talk to each other. According to Wikipedia:

The ISM radio bands are portions of the radio spectrum reserved internationally for industrial, scientific and medical (ISM) purposes, excluding applications in telecommunications.

and

Cordless phones, Bluetooth devices, near-field communication (NFC) devices, garage door openers, baby monitors, and wireless computer networks (Wi-Fi) may all use the ISM frequencies, although these low-power transmitters are not considered to be ISM devices.

These devices are going to be the focus of this article.

433 MHz Band

433 MHz is one of the bands reserved for ISM applications. But as we will see, it is also used for domestic applications and it is a very convenient way to expand your home automation if you are DIY-inclined and a bit handy with a soldering iron.

This band is a favourite with many manufacturers, hobbyists, engineers and IoT enthusiasts because it hits the sweet spot between range, signal penetration, power consumption and cost. In most countries, manufacturers do not need a licence to use it, and in others it is part of the ISM bands.

There are a huge number of devices that use this band, including temperature and humidity sensors, lighting, switches and remote controls. Apart from the off-the-shelf appliances that use this band, there are RF modules for both reception and transmission that can be purchased for use in DIY projects.

Any Arduino enthusiast can acquire a couple of these modules and the sensor of their preference and build their own standalone, wireless-enabled appliance.

We can see one of these transceivers in the screenshot below.

This is in fact what led me down this road in the first place. I wanted to create a custom light detector in order to monitor the hours of sunshine at home. I wanted to use this to understand if there was any point in me installing solar panels.

In the end I did not build this particular sensor, because I got the data I was looking for from a friend who already has solar panels, but by then I had already built the architecture I am going to present.

There are Arduino libraries, such as RF433any, that make it easy to encode and decode data on the 433 MHz band, as well as DIY articles with step-by-step instructions on how to use them.

Zigbee

Zigbee is a more modern, more powerful and more complicated technology than the ones used in the 433 MHz band. It also uses some of the ISM bands; the exact frequencies used vary from country to country.

From Wikipedia:

Zigbee is a low-power wireless mesh network standard delivering low-latency communication, and targeted at battery-powered devices in wireless control and monitoring applications. Zigbee chips are typically integrated with radios and with microcontrollers.

and

Zigbee operates in the industrial, scientific and medical (ISM) radio bands, with the 2.4 GHz band being primarily used for lighting and home automation devices in most jurisdictions worldwide.

It allows for more complicated methods of communication, a mesh network, higher transfer rates than the 433 MHz band and encryption.

It supports star, tree and mesh network topologies. Every network must have one coordinator device.

On the DIY side of things, Zigbee support exists for ESP boards.

Zigbee transceivers can be acquired in a similar manner to 433 MHz modules, although as we can see they are about 693% more expensive.

There is extensive support for Zigbee in the IoT and home automation market, so it may very well be that you already have Zigbee devices and you don’t know it.

Decoding the Gadgets

In this article we will not focus on how to build your own Zigbee or 433 MHz-enabled devices. Instead, we will assume that you have already acquired a few of them, quite possibly without intending to. Maybe you got a couple of weather stations or a smart switch.

This article will present a method of tapping into the network of your existing smart appliances and getting a copy of the data for yourself.

I will present two topologies that will allow you to monitor the data as well as enrich and transform data coming from multiple devices.

An extension of these topologies could potentially allow you to control your home automation.

We will cover both technologies but will provide examples only for the 433 MHz band.

Reading the Signals

At this point we have established that you have signals flying over your head that are either transmitted in the 433 MHz band or use Zigbee technology.

You may wonder if it would be possible in some way to intercept and decode that traffic.

As it happens, this is indeed possible. It can be done by using a receiver connected to some kind of computer that will be able to decode whatever traffic exists. There are specialised receivers for this kind of thing, or alternatively you can use a wideband receiver such as RTL-SDR for both.

One word of warning: if you want to decode Zigbee signals in the 2.4 GHz band, you will need some additional hardware, as mentioned here. The RTL-SDR’s maximum frequency is below 2.4 GHz.

Within its range, though, its wide coverage makes it suitable for a number of applications. You can use the same device to decode ADS-B signals for aeroplane traffic, listen to radio, decode terrestrial over-the-air TV, decode sub-1 GHz Zigbee signals or decode the 433 MHz band.

RTL-SDR is not considered to be a particularly good receiver. It is wide open to a big range of frequencies and, as such, sensitive to interference.

So if you wanted to use it to watch TV, for example, you could find much better receivers.

On the other hand, as a cheap all-in-one receiver it is good value for money. We can see one of those RTL-SDRs in the following screenshot.

After acquiring the SDR, all we need is a computer to decode the signals it receives. A Raspberry Pi 3 is powerful enough for this. All we need to do is connect our dongle to the Pi, install the appropriate software and off we go.

Decoding Zigbee

To decode Zigbee signals, we can use a tool such as sdr4iot-zigbee-rx. This tool allows us to capture and decode the signals for further processing.

Decoding 433 MHz

To decode 433 MHz band signals, we can use a tool such as rtl_433. rtl_433 can decode signals from multiple bands, not just 433 MHz, and supports a wide range of protocols and devices. This is the tool I have the most experience with and actively use in my setup.

When executed manually, we see something like the following screenshot:

The output shows how many signal types the tool can decode and begins decoding them automatically.

It reports the various sensors and data it receives in real time. This is useful when verifying that the setup works as expected, but we cannot do much post-processing while the data remain in this raw form.

MQTT

rtl_433 can decode signals and write them to stdout, a file, or alternatively forward them to an MQTT server.

MQTT is the go-to publish/subscribe (pub/sub) messaging protocol for machine-to-machine and IoT communication. Publishers send messages to a central broker, and subscribers receive them from it; in this case, the publisher is the rtl_433 decoder.

It submits its data under specific topics for each sensor, which can then be consumed by another tool for further processing.

One of the most popular MQTT brokers is called Mosquitto. It is lightweight and can be set up either on the same Raspberry Pi that performs signal decoding or on another system used for general monitoring.

Either way, the rtl_433 process can be configured to publish its data to Mosquitto with a command similar to:

/usr/bin/rtl_433 -vv -F mqtt://127.0.0.1:1883

which will produce output similar to the following:


Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: rtl_433 version 22.11 (2022-11-19) inputs file rtl_tcp RTL-SDR SoapySDR
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Use -h for usage help and see https://triq.org/ for documentation.
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Trying conf file at "rtl_433.conf"...
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Trying conf file at "(null)/.config/rtl_433/rtl_433.conf"...
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Trying conf file at "/usr/local/etc/rtl_433/rtl_433.conf"...
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Trying conf file at "/etc/rtl_433/rtl_433.conf"...
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Publishing MQTT data to 127.0.0.1 port 1883
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Publishing device info to MQTT topic "rtl_433/wsjtx-pi/devices[/type][/model][/subtype][/channel][/id]".
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Publishing events info to MQTT topic "rtl_433/wsjtx-pi/events".
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Publishing states info to MQTT topic "rtl_433/wsjtx-pi/states".
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Registering protocol [1] "Silvercrest Remote Control"
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Registering protocol [2] "Rubicson, TFA 30.3197 or InFactory PT-310 Temperature Sensor"
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Registering protocol [3] "Prologue, FreeTec NC-7104, NC-7159-675 Temperature Sensor"
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Registering protocol [4] "Waveman Switch Transmitter"
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Registering protocol [8] "LaCrosse TX Temperature / Humidity Sensor"
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Registering protocol [11] "Acurite 609TXC Temperature and Humidity Sensor"
Dec 14 16:42:30 wsjtx-pi rtl_433[25671]: Registering protocol [12] "Oregon Scientific Weather Sensor"

As shown, rtl_433 reports where it publishes its decoded data, which MQTT topic structure it uses, and which protocols it has registered.
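The bracketed topic template in the log can be read as follows: each bracketed segment is filled in from the decoded event and dropped when the event has no such field. Here is a minimal Python sketch of that reading. The sample event and the expand_topic helper are made up for illustration, and the real rtl_433 logic may differ in details such as character sanitising.

```python
import re

# Topic template copied from the log output above.
TEMPLATE = "rtl_433/wsjtx-pi/devices[/type][/model][/subtype][/channel][/id]"


def expand_topic(template: str, event: dict) -> str:
    """Fill each optional [/key] segment from the event, dropping missing keys."""

    def fill(match: re.Match) -> str:
        value = event.get(match.group(1))
        return "" if value is None else f"/{value}"

    return re.sub(r"\[/(\w+)\]", fill, template)


# Illustrative event, not captured from a real sensor.
event = {"model": "Acurite-609TXC", "id": 194, "temperature_C": 21.3, "humidity": 45}
print(expand_topic(TEMPLATE, event))
# rtl_433/wsjtx-pi/devices/Acurite-609TXC/194
```

Knowing the resulting topic is what lets a consumer subscribe to one specific sensor instead of the whole event stream.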

Using the data

The next step is to decide what to do with the data. In my case, I just wanted to visualize the data, but at this point, you could do much more. I’m going to present two alternative architectures that will allow you to do this.

One is simpler than the other. The more complex architecture allows data enrichment, transformation, and the addition of logic that can trigger specific actions.

I previously used the more complex architecture, which supported data transformation, but due to a hardware failure, I chose the simpler approach the second time around.

Architecture 1

The first architecture uses a Prometheus MQTT exporter to read data from MQTT and expose it for scraping by Prometheus. Then Grafana can connect to Prometheus for data visualization.

This is the simplest possible architecture with the fewest moving parts, but it does not allow for data transformation or enrichment. If your data does not conform to the desired format, there isn’t much you can do without adding additional components.
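To make the exporter's role concrete, here is a minimal Python sketch of the kind of translation such an exporter performs: it takes one JSON event, shaped the way rtl_433 might publish it, and renders it in the Prometheus text exposition format. The field names (`temperature_C`, `humidity`, `model`, `id`), the metric names and the sample payload are assumptions for illustration, not the configuration of any particular exporter.

```python
import json

# Hypothetical mapping from JSON fields to Prometheus metric names.
FIELD_TO_METRIC = {
    "temperature_C": "rtl433_temperature_celsius",
    "humidity": "rtl433_humidity_percent",
}


def to_exposition(payload: str) -> list:
    """Render one JSON sensor event as Prometheus exposition lines."""
    event = json.loads(payload)
    # Label values are used as-is; escaping of quotes and backslashes is
    # deliberately left out of this sketch.
    labels = 'model="{}",id="{}"'.format(
        event.get("model", "unknown"), event.get("id", "unknown")
    )
    lines = []
    for field, metric in FIELD_TO_METRIC.items():
        if field in event:
            lines.append(f"{metric}{{{labels}}} {float(event[field])}")
    return lines


sample = '{"model": "Acurite-609TXC", "id": 42, "temperature_C": 21.5, "humidity": 40}'
for line in to_exposition(sample):
    print(line)
```

Note that the values pass through unchanged. This is the limitation described above: whatever the sensor publishes is what ends up in Prometheus.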

All this is shown in the following diagram:

Architecture 2

A more powerful architecture that uses a different approach is shown in the following diagram:

In this case, we are not using Prometheus. Instead, we use InfluxDB. InfluxDB has a push architecture, whereas Prometheus uses a pull architecture. This means that Prometheus expects to read values from an exporter, while InfluxDB expects an agent to push data to it.

In this setup, the service acting as the InfluxDB agent is Node-RED.

Node-RED is a powerful low-code, event-driven middleware platform that can be used to create event-driven workflows. It can collect, transform, and visualize data, as well as drive devices by sending events to them.

In this particular case, Node-RED reads sensor data from MQTT, transforms it, and submits it to InfluxDB. Grafana then reads the data from InfluxDB for visualization.
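In Node-RED this transformation would normally live in a flow, but the logic is easy to sketch in Python. The example below takes one JSON event, normalises Fahrenheit readings to Celsius, and renders the result in InfluxDB line protocol. The field names, the `sensors` measurement name and the choice of tags are assumptions for illustration, not what my flow actually used.

```python
import json
from typing import Optional


def escape_tag(value: object) -> str:
    """Escape commas, equals signs and spaces, which are special in tag values."""
    text = str(value)
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(payload: str, measurement: str = "sensors") -> Optional[str]:
    """Turn one JSON sensor event into a line protocol record, or None."""
    event = json.loads(payload)
    # Transformation step: normalise Fahrenheit readings to Celsius.
    if "temperature_F" in event and "temperature_C" not in event:
        celsius = (float(event["temperature_F"]) - 32) * 5 / 9
        event["temperature_C"] = round(celsius, 2)
    tags = {key: event[key] for key in ("model", "id", "channel") if key in event}
    fields = {}
    if "temperature_C" in event:
        fields["temperature_c"] = float(event["temperature_C"])
    if "humidity" in event:
        fields["humidity"] = float(event["humidity"])
    if not fields:
        return None  # nothing worth storing
    tag_part = "".join(f",{k}={escape_tag(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part}"


print(to_line_protocol('{"model": "Oregon Scientific", "id": 7, "temperature_F": 212}'))
```

No timestamp is appended here, so the database would assign one when the record is written.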

This architecture is more capable than the previous one because Node-RED can read data from multiple sources, transform it, and even publish it back to MQTT for further processing. In the past, I used this approach to process data from my ZigBee-based smart meter and my ADS-B receiver, which does not support MQTT or InfluxDB by default but provides an API that can be queried.

Eventually, all data were visualized in Grafana.

In addition, Node-RED can be extended with the smart-nodes library to control smart devices and create more complex workflows based on incoming data—beyond simple visualization—but that is outside the scope of this document.

Eye Candy

After all of this, you can visualize your data like so:

Of course, you can define thresholds and create alerts when certain conditions occur — for example, if the temperature falls below a set threshold — and if you are using the Node-RED architecture, you can trigger actions such as turning on the heating.
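As a rough illustration of such a rule, here is a minimal Python sketch of a heating decision. The two thresholds and the function name are hypothetical, and the gap between them (hysteresis) is my own addition to stop the heating from flapping when the temperature hovers around a single value.

```python
def heating_should_be_on(temperature_c: float, currently_on: bool,
                         low: float = 18.0, high: float = 20.0) -> bool:
    """Decide the heating state from the latest temperature reading.

    Below `low` the heating turns on, above `high` it turns off, and in
    between it keeps its current state.
    """
    if temperature_c < low:
        return True
    if temperature_c > high:
        return False
    return currently_on


state = False
for reading in (19.0, 17.5, 19.0, 20.5):
    state = heating_should_be_on(reading, state)
    print(reading, "heating on" if state else "heating off")
```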

There’s no limit to what you can do.

Potential Issues

From time to time, you might encounter some issues. For example, there’s always the chance of a sensor failure or a sensor going haywire. These sensors are factory-calibrated, but you still need to keep an eye out for misbehavior — especially if you’re triggering actions based on their input.

In addition, you may experience interference from other sources — either from nearby sensors running on the same frequency or from other equipment operating in the same range. It’s quite likely that your rtl_433 decoder will pick up signals that don’t belong to you.

We can see this happening here:

In this graph, we can see an unfiltered view of all the sensors my RTL decoder detects. Most of these aren’t mine — I don’t own a soil sensor, and I’m not entirely certain what this “TwinPlus” sensor reporting -25 °C is.

Maybe someone in the neighborhood has a smart freezer?
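One way to deal with both problems is to filter events before they reach your dashboards or trigger anything. The Python sketch below keeps only events from an allow-list of sensors and drops readings outside a plausible range. The `(model, id)` pairs, the field names and the temperature limits are hypothetical values for illustration.

```python
# Hypothetical allow-list of (model, id) pairs for sensors you actually own.
KNOWN_SENSORS = {("Acurite-609TXC", 42), ("LaCrosse-TX", 7)}


def is_known(event: dict) -> bool:
    """True if the event comes from one of our own sensors."""
    return (event.get("model"), event.get("id")) in KNOWN_SENSORS


def is_plausible(event: dict, low_c: float = -40.0, high_c: float = 60.0) -> bool:
    """Reject temperature readings outside the range we expect to see."""
    temperature = event.get("temperature_C")
    return temperature is None or low_c <= temperature <= high_c


def accept(event: dict) -> bool:
    return is_known(event) and is_plausible(event)


events = [
    {"model": "Acurite-609TXC", "id": 42, "temperature_C": 21.5},
    {"model": "TwinPlus", "id": 99, "temperature_C": -25.0},   # not ours
    {"model": "LaCrosse-TX", "id": 7, "temperature_C": 140.0}, # haywire reading
]
print([e["model"] for e in events if accept(e)])
```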

Conclusion

The techniques presented in this article allow us to build our own DIY home automation systems and potentially re‑purpose and extend off‑the‑shelf commercial solutions in ways that go beyond their original design.

They empower us to take control of products and services that might otherwise be opaque and to understand exactly what is happening in our home environment.

As always, with power comes responsibility — it’s important to be considerate and respect other people’s privacy.

So act responsibly, and have fun with your home automation!

References

Keeping infrastructure up to date with the Phoenix server pattern

2025-12-15T21:30:00+00:00

Table of contents

The problem
- Pets vs cattle
The Phoenix server pattern
- The benefits
- The risks
Building the infrastructure
- Traditional architecture
  - Preparing and verifying the base image
  - Deploying the image
- Container-based architecture
  - Preparing and verifying the base container image
  - Deploying the image
Final thoughts
References

The problem

One of the classic challenges any infrastructure team faces is keeping its infrastructure up to date and secure.

This is typically addressed by using various scanning tools that detect vulnerabilities in the systems and by designing elaborate patching schedules.

Container-based workloads are also vulnerable to this issue, because most product teams focus on delivering their features and do not spend enough time thinking about the base containers they are using and how these can be kept up to date.

Pets vs cattle

The goal may seem simple, but it hides a lot of complexity. It has been discussed extensively, and many ideas have been put forward as potential solutions.

A classic idea is that we should be treating systems as cattle and not as pets.

This means that if a system is sick, we do not fix it. We just kill it and replace it with a new functional system. This idea gave rise to the concept of immutable infrastructure, which in turn is a foundational idea of container-based computing and of much of modern cloud computing.

This is a great concept, but it covers only the case where a system is broken; it does not cover how to make sure that our infrastructure is always secure and patched.

The pets vs cattle discussion suggests that instead of spending time fixing a system we should be replacing it. But replacing it with what exactly?

Also, what do we do with infrastructure that is not stateless or immutable? How do we ensure that it is secure as well?

In this article we will answer these questions by explaining the Phoenix server pattern for traditional as well as container-based workloads.

The Phoenix server pattern

The Phoenix server pattern is a flavour of the pets vs cattle idea. It suggests that instead of waiting for a system to break so we can replace it, we do this proactively.

We do this for our whole fleet at carefully defined intervals, so that every few months the whole fleet has been replaced with zero downtime.
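To get a feel for the pace this implies, a back-of-the-envelope calculation helps. The sketch below assumes a hypothetical fleet of 120 systems and a 90-day refresh period; both numbers are made up for illustration.

```python
import math


def replacements_per_day(fleet_size: int, period_days: int) -> int:
    """Minimum number of systems to replace each day to refresh the fleet in time."""
    if fleet_size < 0 or period_days <= 0:
        raise ValueError("fleet_size must be >= 0 and period_days must be > 0")
    return math.ceil(fleet_size / period_days)


print(replacements_per_day(120, 90))  # 2
```

Rounding up means the fleet is refreshed slightly ahead of schedule, which leaves some slack for days when a replacement fails and the rollout is paused.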

This concept can be applied both to traditional infrastructure based on virtual machines or physical systems and to container-based infrastructure.

The basic requirement in order to achieve this is to have a mature and robust automation framework with full testing and quality gates that will do some of the following in an automated way:

- Build base images (virtual machines and containers)
- Test base images
- Certify base images by running tests for all the components of our product at build time (unit testing)
- Deploy base images to a staging environment, apply prod-like configuration and datasets, and run tests (integration/E2E testing)
- Tag and release new base images for general use
- Assuming everything is approved, start a gradual replacement of existing systems by using these new base image releases
- Ensure that by the end of a certain period the whole fleet has been updated and refreshed

Of course, the exact implementation depends on the nature of the service, so this is an indicative list.
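The essential property of such a pipeline is that every step is a quality gate: if one fails, nothing after it runs and no image is released. Here is a minimal Python sketch of that control flow. The stage names mirror the list above, and the stage functions are hypothetical placeholders that a real pipeline would replace with calls to its build and test tooling.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first one that fails.

    Returns (success, names of the stages that completed).
    """
    completed = []
    for name, step in stages:
        if not step():
            return False, completed  # gate failed: later stages never run
        completed.append(name)
    return True, completed


# Placeholder stages; each returns True when its gate passes.
stages = [
    ("build base image", lambda: True),
    ("test base image", lambda: True),
    ("certify with component tests", lambda: True),
    ("staging integration tests", lambda: False),  # simulated failure
    ("tag and release", lambda: True),
]

ok, done = run_pipeline(stages)
print(ok, done)
```

In this run the simulated staging failure stops the pipeline before the image is tagged, so the fleet never sees it.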

The benefits

The benefits of following the Phoenix pattern are many.

- Your systems are always up to date with very few security issues.
- There is no snowflake configuration; everything is controlled.
- Your automation is exercised on a daily basis.
- Your testing suite and quality gates are top-notch.
- Your monitoring and alerting solutions are also top-notch.
- Your systems are always compliant.
- You can use the test results as part of your compliance submissions.
- Getting certified for PCI DSS is going to be a breeze.

The risks

As always, with great power comes great responsibility, so alongside the benefits there are also significant risks. As mentioned in the Three Ways of DevOps, if you make a mistake and your testing, monitoring and alerting are not robust enough to catch it, then the risk of a global outage is very high.

We have seen this happen to many of the top players in the field, including Amazon, Cloudflare, Twitter and Facebook.

Building the infrastructure

In this section I am going to present how I have done this in the past, but it is just one way of doing it. As always, the devil is in the details, so your implementation may have to differ because the architecture of your service dictates it.

But the pattern should hold and the eventual benefits should persist. I will cover two scenarios: one for traditional workloads (VM/physical-based) and one for container-based workloads.

I will use a fairly simple architecture for illustration purposes. For real multi-master, highly available services, these examples will often have to be redesigned to fit the needs of the specific service you want to maintain.

Traditional architecture

Preparing and verifying the base image

In a traditional architecture we usually have a service with several components that share the same underlying platform, and each component may have its own configuration management role, as described in my automated configuration management for non containerised workloads article.

For the sake of this example, let us assume that this common platform is an Ubuntu Linux distribution.

In order to implement the Phoenix server pattern we need to be able to build images for our target platforms from scratch. I have done this in the past by using HashiCorp Packer; our target platforms were Amazon EC2, VMware and Xen.

We were using VMware for our production environment, Xen for our development environment and Amazon EC2 for our DR environment.

We can see this architecture in the following diagram.

This is a base image creation and verification pipeline, but each step is meant to be a placeholder for domain-specific operations. This means that you can replace each step with the operations you need. For example, in the “run tests” step you can verify potential kernel drivers you need, and then run tests.

Even though this is a base image creation pipeline diagram, it can be extended further to produce complete appliance images that can be delivered to the customer as ready-to-install images to their platform of choice.

I have done this in the past with a pipeline that integrated with Jira and included steps for virus scanning and key signing of the artefacts it produced. It even generated print-ready PDFs for DVD cases, for customers that required the image to be delivered on physical media.

Deploying the image

After the creation of the base image we are ready to start the gradual rollout.

This is the most sensitive step of the operation, since it requires automation to prevent the execution of the next replacement if one replacement has failed, and robust monitoring and alerting to notify the owners of the systems that a replacement has failed.

The rollout also depends on the nature of the service.

For example, in a highly available service with load balancing you can spin up a new instance, deploy and configure it using your configuration management system (perhaps using something similar to the method described in my previous article), and run tests to verify that the new system is healthy. Once it is, destroy one of the older instances.

Repeat the process until all the older instances have been replaced.
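The loop below is a minimal Python sketch of this rollout, including the requirement that a failed replacement stops the next one from starting and notifies the owners. The `provision`, `is_healthy`, `destroy` and `notify` callables are hypothetical hooks standing in for your own provisioning, testing and alerting tooling.

```python
from typing import Callable, List, Tuple


def rolling_replace(
    old_instances: List[str],
    provision: Callable[[], str],
    is_healthy: Callable[[str], bool],
    destroy: Callable[[str], None],
    notify: Callable[[str], None],
) -> List[Tuple[str, str]]:
    """Replace instances one at a time and halt on the first failure.

    Returns the (old, new) pairs that were replaced successfully.
    """
    replaced = []
    for old in old_instances:
        new = provision()
        if not is_healthy(new):
            destroy(new)  # clean up the failed replacement, keep the old one
            notify(f"replacement for {old} failed; rollout halted")
            return replaced
        destroy(old)  # only remove the old instance once the new one is healthy
        replaced.append((old, new))
    return replaced


# Simulated run: the second new instance fails its health check.
names = iter(["new-1", "new-2", "new-3"])
result = rolling_replace(
    ["old-1", "old-2", "old-3"],
    provision=lambda: next(names),
    is_healthy=lambda instance: instance != "new-2",
    destroy=lambda instance: print("destroy", instance),
    notify=print,
)
print(result)
```

Because the old instance is destroyed only after the new one passes its checks, capacity never drops below the original level during the rollout.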

Container-based architecture

Preparing and verifying the base container image

If you are using containers instead of virtual machines or physical systems, then everything is a lot easier. You have only one target platform to worry about, and your application is meant to be stateless.

You can reuse the same pipeline to build your base image with all the latest patches installed, run your tests and publish the image to your artefact repository.

After testing, promote the snapshot version created to a released version which will then be used by other pipelines to produce the final artefact that contains your service.

Deploying the image

Kubernetes Deployments support a rolling update strategy out of the box, so all you need to do is bump the version of your artefact to the latest release and Kubernetes will gradually replace the running pods for you.
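As a rough sketch, the version bump can be expressed as a patch against the Deployment object. The Python below only builds the patch body; the container name, image reference and the `maxUnavailable`/`maxSurge` values are hypothetical, and how you apply it (for example with `kubectl patch` or from your CD tooling) depends on your setup.

```python
import json


def rolling_image_patch(container: str, image: str) -> dict:
    """Build a Deployment patch that bumps one container image.

    The strategy block asks for one extra pod at a time and no loss of
    capacity while pods are being replaced.
    """
    return {
        "spec": {
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "spec": {
                    "containers": [{"name": container, "image": image}],
                }
            },
        }
    }


# Hypothetical container name and image reference.
patch = rolling_image_patch("web", "registry.example.com/web:1.4.0")
print(json.dumps(patch, indent=2))
```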

Final thoughts

Using this approach we move the focus of operations from day-to-day management of long-running systems to making sure our automation and monitoring are robust. In these situations automation, monitoring and alerting are key to detecting faults as early as possible, ideally before a base image is even released.

There are huge benefits in doing so, because this means that all of our infrastructure is defined as code, there are no hidden configuration files that nobody knows anything about, and everything is peer reviewed and monitored.

This promotes standardisation and knowledge transfer with well-defined alerts and runbooks that will help resolve any issues with minimal disruption.

References

Phoenix server pattern