Machine learning (ML) models and Internet of Things (IoT) devices are a natural match. IoT devices collect a tremendous amount of data from the physical world, while ML models can analyze this data to derive actionable insights. However, deploying ML models on IoT devices in environments where cloud infrastructure is restricted or not desired introduces a unique set of challenges.

In this article, we explore how to deploy ML models directly to IoT devices using DevOps principles without managing traditional cloud infrastructure. We will walk through a practical example of deploying a model using containerization, edge orchestration, and CI/CD pipelines—all tailored for embedded or edge environments.

Why Avoid the Cloud in IoT ML Deployments?

There are several reasons why avoiding cloud infrastructure in an IoT ML deployment may be preferable:

  • Privacy and compliance: Sensitive environments (e.g., healthcare, defense, or industrial automation) require data to stay on-premises.

  • Low-latency requirements: Edge inference avoids round-trip latency to cloud services.

  • Intermittent connectivity: Remote locations or mobile environments may lack reliable internet.

  • Cost efficiency: Avoiding data transfer and cloud compute charges can save money at scale.

Instead of the cloud, you can use on-device or edge-local strategies, enhanced by DevOps automation and tooling to maintain model lifecycle and operational consistency.

Tools and Stack Overview

To implement a DevOps-style ML deployment pipeline on IoT devices without cloud infrastructure, you’ll use:

  • TensorFlow Lite / ONNX Runtime: Lightweight frameworks for model inference on edge devices.

  • Docker / Podman: Container runtimes for consistent environments.

  • Balena / K3s / MicroK8s: Lightweight orchestration platforms suitable for edge.

  • GitHub Actions / GitLab CI: Cloud-hosted CI/CD systems to automate builds.

  • rsync / scp / Ansible: Lightweight deployment tools for remote devices.

  • Systemd / Supervisor: For managing model inference services on devices.

Train and Optimize the Model

Start by training a model using your usual data science workflow (e.g., using TensorFlow, PyTorch, or Scikit-Learn). Once trained, convert it to an edge-compatible format.

python
# Convert a Keras model to TensorFlow Lite
import tensorflow as tf
model = tf.keras.models.load_model('my_model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

For PyTorch:

bash
# Convert to ONNX
python export_to_onnx.py
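
The export_to_onnx.py script is not shown here; a minimal version might look like the following sketch, where the MyModel class, checkpoint file, and input shape are placeholders for your own network:

python
# Minimal PyTorch -> ONNX export sketch (adjust model class, checkpoint, and input shape)
import torch
from my_model import MyModel  # hypothetical module containing your network definition

model = MyModel()
model.load_state_dict(torch.load('my_model.pt', map_location='cpu'))
model.eval()

# Dummy input with the same shape the model expects at inference time
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    'model.onnx',
    input_names=['input'],
    output_names=['output'],
    opset_version=13,
)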

Optimize the model with quantization if supported, reducing memory and compute requirements.
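
For TensorFlow Lite, post-training dynamic-range quantization is a small change to the converter shown earlier (a sketch; full integer quantization additionally requires a representative dataset for calibration):

python
# Post-training dynamic-range quantization for TensorFlow Lite
import tensorflow as tf

model = tf.keras.models.load_model('my_model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable quantization

with open('model_quant.tflite', 'wb') as f:
    f.write(converter.convert())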

Containerize the Inference Application

Package your model along with the inference logic into a Docker container. Here’s a basic Python Flask app serving predictions locally on the device.

Directory structure:

iot-inference/
├── Dockerfile
├── app.py
├── model.tflite
├── requirements.txt

app.py:

python
from flask import Flask, request, jsonify
import tensorflow as tf
import numpy as np
app = Flask(__name__)
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

@app.route('/predict', methods=['POST'])
def predict():
    data = np.array(request.json['data'], dtype=np.float32).reshape((1, -1))
    interpreter.set_tensor(input_details[0]['index'], data)
    interpreter.invoke()
    output = interpreter.get_tensor(output_details[0]['index'])
    return jsonify({'prediction': output.tolist()})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Dockerfile:

Dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]

requirements.txt:

text
Flask==2.2.5
numpy
tensorflow==2.13.0
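
The full tensorflow package is heavy for constrained devices. If the device only runs inference, the lighter tflite-runtime package can replace it; the swap changes only the interpreter import in app.py (a sketch; package availability depends on your device's architecture and Python version):

python
# app.py with tflite-runtime instead of full TensorFlow (the interpreter API is the same)
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()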

CI Pipeline for Model Build and Test

Using GitHub Actions (or GitLab CI), define a pipeline to automatically build and test the container.

.github/workflows/build.yml:

yaml
name: Build IoT Model Container

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Build Docker image
        run: docker build -t iot-inference:latest .

      - name: Run tests (optional)
        run: docker run --rm iot-inference pytest

This avoids manual interaction and ensures every model change leads to a consistent, tested container.
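
The pytest step assumes a test suite exists inside the image. A minimal smoke test might simply verify that the bundled model loads and exposes input and output tensors (a sketch; the file name and checks are illustrative, and pytest would need to be added to requirements.txt):

python
# tests/test_model.py: smoke test that the bundled model loads correctly
import tensorflow as tf

def test_model_loads():
    interpreter = tf.lite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()
    assert len(interpreter.get_input_details()) >= 1
    assert len(interpreter.get_output_details()) >= 1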

Deploy to the IoT Device via SSH + Systemd

Once the container is built and tested, push it to the IoT device via scp or deploy using Ansible.

Simple shell deployment script (deploy.sh):

bash
#!/bin/bash
DEVICE=192.168.1.50
USER=pi
# Copy files
scp -r . $USER@$DEVICE:~/iot-inference

# SSH in and rebuild
ssh $USER@$DEVICE <<EOF
cd iot-inference
docker build -t iot-inference .
docker stop iot-inference || true
docker rm iot-inference || true
docker run -d --restart unless-stopped -p 5000:5000 --name iot-inference iot-inference
EOF
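
If you prefer systemd to supervise the container (as the section title suggests) rather than relying on Docker's restart policy, a minimal unit file could look like this sketch; the paths and image name follow the example above:

ini
# /etc/systemd/system/iot-inference.service
[Unit]
Description=IoT inference container
After=docker.service
Requires=docker.service

[Service]
# Remove any stale container, then run the image in the foreground so systemd tracks it
ExecStartPre=-/usr/bin/docker rm -f iot-inference
ExecStart=/usr/bin/docker run --rm -p 5000:5000 --name iot-inference iot-inference
ExecStop=/usr/bin/docker stop iot-inference
Restart=always

[Install]
WantedBy=multi-user.target

Enable it with systemctl enable --now iot-inference.service; systemd will then restart the service if the container exits.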

For production-grade deployments, you could use Ansible playbooks:

yaml
- name: Deploy container to edge device
  hosts: iot
  tasks:
    - name: Transfer project
      synchronize:
        src: ./iot-inference
        dest: /home/pi/
        recursive: yes

    - name: Rebuild container
      shell: |
        cd /home/pi/iot-inference
        docker build -t iot-inference .
        docker rm -f iot-inference || true
        docker run -d --restart=always -p 5000:5000 --name iot-inference iot-inference
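
Running the playbook only requires an inventory listing the devices under the iot group (the file names here are illustrative); note that the synchronize task wraps rsync, so rsync must be installed on both the control machine and the device:

bash
# inventory.ini (illustrative):
#   [iot]
#   192.168.1.50 ansible_user=pi

ansible-playbook -i inventory.ini deploy.yml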

Enable Continuous Updates

Once the system is in place, you can enable version-controlled updates:

  1. Update the model in Git.

  2. CI/CD builds the new container.

  3. Deployment tool (Ansible, rsync, or even MQTT-triggered scripts; see the sketch after this list) pushes the update to devices.

  4. The device replaces the running container.
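
For the MQTT-triggered option in step 3, a small listener on the device can wait for a deploy message and rerun the rebuild steps. The sketch below uses the mosquitto_sub client and assumes the project is cloned on the device; the broker address, topic name, and paths are assumptions:

bash
#!/bin/bash
# Rebuild and restart the container whenever a message arrives on the deploy topic
BROKER=192.168.1.10
TOPIC=iot-inference/deploy

mosquitto_sub -h "$BROKER" -t "$TOPIC" | while read -r _msg; do
  cd /home/pi/iot-inference || continue
  git pull
  docker build -t iot-inference .
  docker rm -f iot-inference || true
  docker run -d --restart unless-stopped -p 5000:5000 --name iot-inference iot-inference
done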

For edge environments with container orchestrators like K3s, a simple Kubernetes manifest can be used to apply changes consistently.

inference-deployment.yaml:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: iot-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: iot-inference
  template:
    metadata:
      labels:
        app: iot-inference
    spec:
      containers:
        - name: iot-inference
          image: iot-inference:latest
          ports:
            - containerPort: 5000
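
Applying the manifest is then a single command (with K3s, kubectl is bundled). Note that a locally built image tagged iot-inference:latest must already be present on the node or pullable from a registry the cluster can reach:

bash
kubectl apply -f inference-deployment.yaml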

Monitor and Secure Your Deployment

Even without cloud infrastructure, you must still:

  • Log predictions and failures locally (e.g., with SQLite or file logs).

  • Monitor uptime using watchdog timers or local Prometheus exporters (see the sketch after this list).

  • Rotate keys and secure endpoints with mTLS or basic auth.

  • Regularly patch containers for vulnerabilities.
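
As a concrete example of the monitoring point above, the prometheus_client package can expose a /metrics endpoint directly from the Flask app (a sketch; the metric names are illustrative, and prometheus-client would need to be added to requirements.txt):

python
# Additions to app.py: expose a /metrics endpoint for a local Prometheus scrape
from prometheus_client import Counter, generate_latest, CONTENT_TYPE_LATEST

PREDICTIONS = Counter('iot_inference_predictions_total', 'Predictions served')
ERRORS = Counter('iot_inference_errors_total', 'Failed prediction requests')
# Call PREDICTIONS.inc() on success and ERRORS.inc() on failure inside predict()

@app.route('/metrics')
def metrics():
    return generate_latest(), 200, {'Content-Type': CONTENT_TYPE_LATEST}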

Advanced Enhancements

Here are some enhancements to increase robustness and scalability:

  • Use edge ML platforms like NVIDIA Jetson with the DeepStream SDK for video analytics.

  • Use BalenaOS for OTA (Over-the-Air) container updates across fleets of devices.

  • Integrate with MQTT brokers for telemetry and remote triggers.

  • Use git pull + systemd timer on device if outbound CI/CD is not allowed.
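
For that last option, a systemd timer can poll the repository on a schedule and reuse the same rebuild steps (a sketch; the unit names, schedule, and script path are assumptions):

ini
# /etc/systemd/system/iot-inference-update.service
[Unit]
Description=Pull the latest model and rebuild the inference container

[Service]
Type=oneshot
ExecStart=/home/pi/iot-inference/update.sh

# /etc/systemd/system/iot-inference-update.timer
[Unit]
Description=Periodic check for model updates

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target

Here update.sh would contain the same git pull and docker build/run commands as the MQTT listener above; enable the schedule with systemctl enable --now iot-inference-update.timer.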

Conclusion

Deploying machine learning models on IoT devices without managing cloud infrastructure is not only feasible—it can be robust, scalable, and secure when done using the right DevOps principles. The combination of lightweight containers, automated CI/CD pipelines, and device-native deployment tools allows teams to deliver high-performing, responsive ML applications directly to the edge.

By eliminating cloud dependencies:

  • You reduce latency,

  • Increase data privacy,

  • Lower costs,

  • And unlock ML use cases in remote or offline environments.

With proper tooling—like Docker, Ansible, GitHub Actions, and TensorFlow Lite—you can orchestrate production-grade ML deployments even to the tiniest edge devices.

This approach empowers organizations to focus on delivering intelligent functionality rather than managing cloud infrastructure overhead. Whether you’re operating smart farms, industrial machinery, or offline diagnostics systems, DevOps-driven edge ML lets you deploy fast, update continuously, and operate securely—all without touching a public cloud provider.