(in)Secure IT

The Great Collapse: Taming Context Sprawl into 4 Elite Core Lead Squads

Jamz Yaneza — Fri, 17 Jul 2026 04:07:05 GMT

If you've been following Act III of this threat intelligence odyssey, you know the decentralized circus of running 23 micro-agents across isolated bare-metal Proxmox containers blew up in my face. Token drift was rampant, context windows were fragmented, and my collection sentinels were actively having middle-management arguments with the parsing agents inside the PostgreSQL backend logs. The pipeline was completely bogged down by its own organizational sprawl, and the hypervisor processors were pinned at 90% utilization just managing internal REST API webhook handoffs.

We over-engineered the engine into oblivion, and the massive spike in my electricity bill was a physical symptom of pure architectural debt. Time for some radical consolidation surgery.

The Purge: Slashing the Container Sprawl

The realization was blunt: we didn't build an agile automation factory, we built a bloated corporate committee inside a local Docker network. I yanked the cords on the 23-agent sprawl, nuked the hyper-fragmented container footprints from the Proxmox VE cluster, and ruthlessly collapsed the entire workforce down to 4 elite Lead Squads.

We packed the playbooks, stripped out the micro-role overhead, and centralized operations under four distinct, high-density worker personas running inside optimized containers:

The Director: The centralized traffic cop managing execution states and task handoffs.
The Operator: Executing raw CLI commands, managing network scripts, and handling systems patching.
The Sentinel: Continuously plumbing the dark web feeds, processing log alerts, and monitoring indicators.
The Publisher: Drafting clean markdown content and orchestrating media production hooks.

The results on the dashboard? Immediate stabilization. CPU utilization dropped from a pinned 90% back to a cool 15%, memory allocation dropped by over 60GB across the host pool, and the server rack finally quieted down to a gentle purr.

But merging four completely different cognitive layers meant we had to solve the data handoff problem. If the Sentinel extracts a raw string, how does the Director pass it to the Operator without losing context or inducing token mutations?

The Fix: The Unified JSON Envelope Schema

The "Aha!" moment was recognizing that we needed a strict, agnostic data contract. Instead of letting agents dump raw, unformatted payload variables into each other's webhooks, we implemented a global messaging envelope inside our n8n Postgres automation engine.

Every data transmission packet across cti-net must now conform to a rigid, metadata-wrapped JSON layout. It forces the agents to speak the exact same language, preserving TLP classification boundaries and context structure across every transactional step.

Here is the exact production JSON envelope schema injected into our core workflows:

{
  "$schema": "https://threatresearcher.com/schemas/cti-envelope.v1.json",
  "metadata": {
    "vmid_ingress": 201,
    "timestamp_utc": "2026-05-17T04:12:09Z",
    "tlp_classification": "AMBER",
    "origin_conduit": "sentinel-darkweb-telegram"
  },
  "payload": {
    "actor_identity": "Unknown-Uncial-Heretic",
    "observed_ttps": [
      "T1071.001-Web-Protocols",
      "T1574.002-DLL-Side-Loading"
    ],
    "raw_indicators": {
      "ipv4": ["192.0.2.14", "198.51.100.83"],
      "sha256": ["e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"]
    }
  },
  "routing": {
    "current_lead": "director",
    "next_conduit": "operator-triage-blocklist"
  }
}

Voila! No more token drift, no more dropped webhooks, and zero context fragmentation. The Sentinel drops a TLP:AMBER alert package wrapped in the envelope, the Director routes it, the Operator triages it, and the system moves on without a single resource-wasting middle-management argument.
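For anyone wanting to enforce the contract outside of n8n, here is a minimal Python sketch of an envelope check. It assumes the three sections and field names shown in the sample above are all mandatory and that the TLP labels follow TLP 2.0; the function name `validate_envelope` and the error strings are made up for illustration, not part of the production workflow.

```python
import json
from datetime import datetime

# Assumption: TLP 2.0 labels; the post only shows AMBER in use.
ALLOWED_TLP = {"CLEAR", "GREEN", "AMBER", "AMBER+STRICT", "RED"}

# Assumption: every field in the sample envelope is required.
REQUIRED = {
    "metadata": ("vmid_ingress", "timestamp_utc", "tlp_classification", "origin_conduit"),
    "payload": ("actor_identity", "observed_ttps", "raw_indicators"),
    "routing": ("current_lead", "next_conduit"),
}


def validate_envelope(raw: str) -> list:
    """Return a list of problems; an empty list means the envelope passed."""
    try:
        env = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(env, dict):
        return ["top level must be a JSON object"]

    problems = []
    for section, keys in REQUIRED.items():
        body = env.get(section)
        if not isinstance(body, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            if key not in body:
                problems.append(f"missing field: {section}.{key}")

    meta = env.get("metadata") if isinstance(env.get("metadata"), dict) else {}
    tlp = meta.get("tlp_classification")
    if tlp is not None and (not isinstance(tlp, str) or tlp not in ALLOWED_TLP):
        problems.append(f"unknown TLP label: {tlp}")

    stamp = meta.get("timestamp_utc")
    if stamp is not None:
        try:
            datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        except (TypeError, ValueError):
            problems.append(f"timestamp_utc is not in YYYY-MM-DDTHH:MM:SSZ form: {stamp}")
    return problems


sample = json.dumps({
    "metadata": {
        "vmid_ingress": 201,
        "timestamp_utc": "2026-05-17T04:12:09Z",
        "tlp_classification": "AMBER",
        "origin_conduit": "sentinel-darkweb-telegram",
    },
    "payload": {
        "actor_identity": "Unknown-Uncial-Heretic",
        "observed_ttps": ["T1071.001-Web-Protocols"],
        "raw_indicators": {"ipv4": ["192.0.2.14"]},
    },
    "routing": {"current_lead": "director", "next_conduit": "operator-triage-blocklist"},
})
print(validate_envelope(sample))
```

Returning a list of problems instead of raising on the first one means a rejected packet can be logged once with everything wrong with it, which keeps the handoff logs short.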

The agent sprawl has been surgically tamed, the unified schema is live across the PostgreSQL cluster, and the hardware footprints are behaving themselves. Act III is moving into the final stretch. Next week in Post 8, we’re wrapping up the squad era by building native IDE agent plugins using local system junctions and symlink routing to tie our execution playbooks directly with the live Git workspace.

See ya later. Happy tinkering!

The 20-Agent Shift: Clawing Back Resource Footprint with Bare-Metal Containers

Jamz Yaneza — Thu, 16 Jul 2026 04:04:15 GMT

Act III: Part 6 of the ThreatLabs CTI stack is officially in the wild, and today we’re talking about what happens when you take a good architectural idea and engineer it straight into a decentralized circus. After yanking the cords on the monolithic "God Bot" to save my execution queues, I went full-scale decentralized factory and deployed a sprawling workforce of 23 hyper-focused micro-agents across my bare-metal Proxmox LXC containers.

The plan was beautiful on paper: complete functional decoupling, absolute role isolation, zero dropped webhooks.

Then the micro-agents started talking to each other, and the entire homelab slid into a high-latency middle-management nightmare.

The Workforce Blueprint: Decoupling the Brain

The initial design layout was divided into clean, isolated operational squads, each driven by a tight, single-purpose markdown playbook file hosted in the local Forgejo repository. No overlapping scopes, no shared variables, just pure telegraphic execution loops running across hand-me-down Intel Core hardware nodes.

We broke the 23-agent registry down into three distinct, hyper-focused divisions:

The Intel Core Squad: Running deep algorithmic sorting, matching incoming indicators against historical datasets, and parsing TLP metrics.
The Infra Operational Team: Micromanaging container system states, verifying network junctions, and triggering the fix-permissions.sh janitor loops.
The Collection Sentinel Division: Continuously scraping OSINT leaks, parsing Telegram feeds, and plumbing the deep web crawling queues.

To keep things pristine, every single agent was deployed into its own isolated, resource-constrained container instance.

# Mass-provisioning the micro-agent workforce across the cti-net fabric
# IDs 210 through 232 inclusive give exactly 23 containers
for AGENT_ID in {210..232}; do
  echo "Spinning up bare-metal micro-container instance ID: $AGENT_ID..."
  pct create $AGENT_ID local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst \
    -cores 1 -memory 1024 -swap 256 -features nesting=1 \
    -net0 name=eth0,bridge=vmbr0,tag=101,ip=dhcp -storage local-zfs -rootfs local-zfs:5
  pct start $AGENT_ID
done
echo "Sovereign workforce initialized. 23 containers on the grid."

Fired up the cluster, watched the containers initialize across the Proxmox VE dashboard, and it looked like a threat intelligence masterpiece.

For about forty-eight hours.

The Problem: Token Drift and Middle-Management Arguments

Then the data scaled, the context windows filled up, and the tools completely lost their collective minds.

When you create 23 individual micro-roles, your pipeline handoffs become intensely brittle. A single raw threat report dump would hit a Collection sentinel, which would extract a partial indicator string, pass it to an ingestion agent, which would then call a verification agent, which would loop back to an indexing subagent.

Because each subagent was running its own independent context window and parsing logic, small linguistic variances began compounding at every single layer of the pipeline. By the time a simple indicator packet traveled through six separate container handoffs, the original data structure had completely mutated.

Classic token drift.
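One way to catch this kind of mutation is to fingerprint the payload at ingress and re-check it after every hop. The sketch below is a toy model, not the lab's pipeline: the hop functions, the `defang` behaviour, and the name `first_drift` are all assumptions chosen to show the idea.

```python
import hashlib
import json
from typing import Callable, List, Optional


def fingerprint(payload: dict) -> str:
    """Hash a canonical JSON rendering so key order and spacing cannot matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def first_drift(original: dict, hops: List[Callable[[dict], dict]]) -> Optional[int]:
    """Return the index of the first hop whose output no longer matches ingress."""
    expected = fingerprint(original)
    current = original
    for index, hop in enumerate(hops):
        current = hop(current)
        if fingerprint(current) != expected:
            return index
    return None


def passthrough(payload: dict) -> dict:
    """A well-behaved hop that forwards the payload untouched."""
    return dict(payload)


def defang(payload: dict) -> dict:
    """A hypothetical hop that 'helpfully' rewrites indicators on the way through."""
    return {**payload, "ipv4": [ip.replace(".", "[.]") for ip in payload["ipv4"]]}


indicator = {"ipv4": ["192.0.2.14", "198.51.100.83"]}
print(first_drift(indicator, [passthrough, passthrough, defang, passthrough]))
```

Here the third hop (index 2) is flagged, because it is the first one to change the bytes of the indicator list.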

Worse, the micro-agents started falling into context fragmentation traps. An Intel Core sorting agent would argue with a Collection sentinel over whether a specific string qualified as a true threat actor TTP or a standard script artifact. They were micromanaging each other's execution loops, generating massive token overhead, burning through system memory, and dropping actual intelligence alerts while they logged their debates over formatting parameters into the Postgres backend logs.

The pipeline was completely bogged down by its own organizational sprawl. We didn't build a streamlined automation factory; we built a bureaucratic corporate committee inside a local Docker network.

The Realization: Breaking Under the Overhead

The breaking point was looking at the system metrics panel. The hyper-fragmented layout was chewing through processing cycles just to manage container-to-container webhook handoffs. My server processors were pinned at 90% utilization, the storage backplane was choking on internal REST API logs, and the fan noise out here was a physical symptom of pure architectural debt.

We over-engineered the engine into oblivion. Decentralization is an excellent design parameter, right up until the synchronization overhead starts eating your entire hardware compute budget.
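A quick back-of-the-envelope way to see why the overhead balloons: if any agent may hand off to any other (a worst-case assumption, since the real topology was not a full mesh), the number of distinct agent pairs grows quadratically with the head count. The helper name below is hypothetical.

```python
from math import comb


def max_handoff_channels(agents: int) -> int:
    """Upper bound on distinct agent-to-agent pairs in a fully meshed squad."""
    return comb(agents, 2)


for count in (23, 4):
    print(f"{count} agents -> up to {max_handoff_channels(count)} pairwise channels")
# 23 agents -> up to 253 pairwise channels
# 4 agents -> up to 6 pairwise channels
```

Even if only a fraction of those pairs ever talk, the gap between 253 and 6 potential conversations is the shape of the problem.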

The "Aha!" moment was recognizing that a granular 23-agent squad model is an unmitigated disaster for token stability. We needed to aggressively collapse the hierarchy, consolidate the windows, and stop the internal agent arguments before the system completely choked on its own context fragmentation.

The 23-agent sprawl trap is real, the token drift has been thoroughly mapped, and the container micro-roles are begging for a massive organizational purge. Act III is moving fast. Next week in Post 7, we are executing "The Great Collapse"—documenting the consolidation surgery where we ruthlessly collapse the sprawl of 23 micromanaging agents down to 4 elite Lead Squads running on a unified JSON envelope schema to kill off handoff overhead for good.

See ya later. Happy tinkering!

The Great Handoff Pivot: Defeating Monolithic Automation Memory Leaks

Jamz Yaneza — Wed, 15 Jul 2026 04:00:24 GMT

Act III: Part 5 of the ThreatLabs CTI pipeline is officially on the grid, and today we’re talking about the exact moment your automation stack chokes on its own success. It’s the classic architectural trap: you build a single, massive, monolithic n8n workflow (a "God Bot") and expect it to handle everything from dark web ingestion to automated report triage without losing its mind. It works fine for a couple of weeks. Then the metadata scales, concurrent data threads start colliding, and your beautiful centralized engine turns into an unmitigated infrastructure bottleneck.

The Monolithic Choke and "Handoff Fatigue"

The setup seemed logical on paper: a massive unified canvas inside our newly upgraded n8n PostgreSQL backend. The ingress webhook would grab a raw text dump from a Telegram monitor loop, slam it into an analysis thread, try to pass variables to an external container, scrape a related URL, and then format an intelligence brief.

Then came the silent failures.

A high-volume feed of OSINT leaks would hit the ingress, but the monolithic workflow would still be too busy flushing a heavy MISP database cache from an earlier thread to respond. Webhooks started dropping into the ether, database locks were throwing timeout errors, and the entire hypervisor node was stuck in a high-latency loop trying to swap memory to disk.

The system was suffering from classic handoff fatigue. When one monolithic engine tries to micromanage twenty disparate technical tasks sequentially, a single delayed response down the line cascades into a full-scale system lockout. The physical symptom of this architectural debt? My server array was running at a noisy full-tilt scream, pulling unnecessary wattage from the wall just to process stuck execution queues.

The Fix: Splitting the Monolith into Bare-Metal LXCs

The "Aha!" moment happened while tracking the I/O bottleneck on the storage array. Running big, generalized automation workers inside heavy, resource-bloated virtual machines is a huge waste of my limited power budget.

We needed to completely decouple the functions. Instead of one monolithic entity trying to execute everything, we broke the workflow down into small, specialized worker personas. More importantly, we yanked them out of standard full-blown VMs and migrated the entire automation layout into individual, bare-metal Proxmox Linux Containers (LXCs).

LXC sidesteps full hypervisor virtualization overhead by sharing the host kernel directly, letting us claw back big chunks of idle CPU cycles and keeping the RAM footprint down to bare-minimum levels.

Here is how you manually spin up a lightweight, bare-metal automation worker node inside Proxmox using a clean storage pool, bypassing the usual manual setup traps:

# Provision a clean, resource-constrained Debian LXC node directly from the host shell
# VMID: 201, Cores: 2, Memory: 2048MB, Network bound securely to cti-net
pct create 201 local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst \
  -cores 2 \
  -memory 2048 \
  -swap 512 \
  -features nesting=1 \
  -net0 name=eth0,bridge=vmbr0,tag=101,ip=dhcp \
  -storage local-zfs \
  -rootfs local-zfs:10

# Boot the container engine instantly
pct start 201

Inside the container, we spin up lightweight, single-purpose worker environments. No GUI, no bloated system services, no system memory waste.

Decoupling into Specialized Worker Personas

By moving to this bare-metal container architecture, we established a strict handoff protocol between separate, hyper-focused agent roles. If the ingestion worker gets slammed with 500 concurrent Telegram packets, it simply dumps them onto the database queue and exits. It doesn't wait for parsing, it doesn't care about asset routing, and it never blocks the ingress pipe.
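The enqueue-and-exit pattern can be sketched in a few lines of Python. This is an in-process stand-in only: `queue.Queue` plays the role of the database queue, and the worker names and the uppercase "parsing" step are placeholders, not the real n8n workflow.

```python
import queue
import threading


def ingest(packets, backlog: queue.Queue) -> int:
    """Enqueue every packet and return right away; no parsing happens here."""
    accepted_count = 0
    for packet in packets:
        backlog.put(packet)
        accepted_count += 1
    return accepted_count


def parse_worker(backlog: queue.Queue, parsed: list) -> None:
    """Drain the backlog at its own pace; None is the shutdown sentinel."""
    while True:
        packet = backlog.get()
        if packet is None:
            break
        parsed.append(packet.upper())  # stand-in for real parsing work


backlog = queue.Queue()
results = []
worker = threading.Thread(target=parse_worker, args=(backlog, results))
worker.start()

# 500 concurrent Telegram packets land on the ingress side.
accepted = ingest((f"telegram-packet-{i}" for i in range(500)), backlog)

backlog.put(None)  # tell the parser there is nothing more coming
worker.join()
print(accepted, len(results))
```

The ingest call never touches the parsing logic, so a slow parser can only grow the backlog; it cannot block the ingress pipe.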

We broke the workforce down into four core baseline squads:

The Director: The centralized traffic cop managing execution states.
The Operator: Handling raw CLI tasks, network scripts, and systems patching.
The Sentinel: Continuously plumbing the dark web feeds and log alerts.
The Publisher: Drafting clean markdown content and triggering media assets.

The rack has finally quieted down, the dropped webhooks are completely gone, and the automated pipeline doesn't skip a single beat under concurrent load. Voila!

The monolithic "God Bot" is officially dead, the bare-metal LXC layout is locked in, and the power bill might actually survive the month. Act III is officially rolling. Next week in Post 6, we're deep-diving into the messy reality of the "20-Agent Shift"—tracking what happens when you let these micro-roles multiply a bit too far, leading to severe token drift and context bloat across the container fabric.

See ya later. Happy tinkering!

The Enterprise Bridge: Transactional Database Unification and Compliant Outbound-Only Conduits

Jamz Yaneza — Mon, 13 Jul 2026 03:57:27 GMT

Type-flopping into the terminal here on the Fold4, 10:42 PM, garage temperature hovering somewhere north of uncomfortable because this North Texas summer heat wave refuses to quit and the PowerEdge rack is screaming at a noisy, full-tilt whine directly behind me. Look, it's been a minute.

By March 2026, the sovereign ThreatLabs CTI stack was functionally hardened, but it was still living like an isolated island. It felt wrong. I needed a way to bridge the gap between this garage-tinkerer playground and a professional, enterprise corporate endpoint setup without dropping the lab's strict isolation boundaries or drowning in static routing debt.

Then the backend orchestration engine hit a wall under heavy threat-report ingestion loops, and the whole plan shifted from architectural aesthetics to a straight-up data restoration battle.

Pitfall 9: The Outbound Enterprise Compliance Wall (SOAR & Teleport)

The challenge: Connecting a sovereign lab infrastructure straight into a corporate Cortex XSOAR tenant.

The naive approach says you just poke a hole through your network perimeter firewall, throw some dynamic DNS tracking at your residential public IP handle, and pray corporate compliance doesn't flag the erratic inbound connections. Good luck with that. The moment security compliance detects a random Texas residential node hitting enterprise API gateways, they're going to block the pipe and yank your network privileges.

The breakthrough: The Remote Engine (D1) architecture pattern. Instead of letting the corporate tenant call in to the garage lab, we built a dedicated outbound-only compliance conduit using a strict bridge container.

To make it professionally compliant and totally auditable, we guarded the bridge node behind Teleport running in "Recording Proxy" mode. Every action is logged, every API session is recorded, and the lab maintains complete isolation because the corporate perimeter never gets a look at the actual internal cti-net network topology. It was the exact moment this setup graduated from a simple homelab experiment into a legitimate threat intelligence factory.

Pitfall 10: The n8n SQLite Lockout Loop

With the outbound enterprise conduit locked down, the ingestion workflows started feeding heavy TLP:AMBER threat data loops back from our crawlers.

Then came the silent failures.

A heavy thread of dark web data would hit the webhook ingress, the browser would spin, and n8n would lock up completely before spitting out a generic database timeout or connection error.

The trap: Default container configurations rely on a flat SQLite file for application persistence. SQLite is beautiful for a lightweight single-user script, but the second your orchestration flows spin up concurrent multi-agent ingestion loops, the file system slams into structural I/O bottlenecks. SQLite locks the entire database file during writes, concurrent webhook threads start colliding, and your automation engine chokes on its own data backlog.
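The single-writer behaviour is easy to reproduce with nothing but the Python standard library. This sketch is not n8n code; the table name, file name, and the short timeout are arbitrary choices to make the collision show up quickly.

```python
import os
import sqlite3
import tempfile


def second_writer_error(timeout_s: float = 0.1) -> str:
    """Hold a write transaction on one connection and report what a second writer sees."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite")
        # isolation_level=None means autocommit, so transactions are managed by hand.
        first = sqlite3.connect(path, isolation_level=None)
        second = sqlite3.connect(path, isolation_level=None, timeout=timeout_s)
        try:
            first.execute("CREATE TABLE executions (id INTEGER PRIMARY KEY, body TEXT)")
            first.execute("BEGIN IMMEDIATE")  # takes the database-wide write lock
            first.execute("INSERT INTO executions (body) VALUES ('webhook-1')")
            try:
                second.execute("BEGIN IMMEDIATE")  # needs the same lock, so it waits
                second.execute("ROLLBACK")
                return "no contention"
            except sqlite3.OperationalError as exc:
                return str(exc)
        finally:
            first.close()
            second.close()


print(second_writer_error())
```

The second connection gives up after its timeout with a "database is locked" error, which is the same class of failure that surfaces as a generic timeout when concurrent webhook executions pile up.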

The workaround: Forceful migration to a transactional PostgreSQL backend tier. We unified the n8n application schemas onto a high-performance infra-postgres container cluster running on our local NVMe pool to eliminate the thread collisions once and for all.

# n8n postgresql persistence layout snippet
services:
  n8n:
    image: docker.n8n.io/n8nio/n8n:latest
    networks:
      - cti-net
    environment:
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=infra-postgres
      - DB_POSTGRESDB_PORT=5443
      - DB_POSTGRESDB_DATABASE=n8n_brain
      - DB_POSTGRESDB_USER=n8n_operator

Re-up the deployment, and database lockouts are instantly eradicated. Voila!

Pitfall 11: The Public Wiki Mirroring Trap

The final frustration for the week: code safety. The core repository and documentation wiki sit locally on our self-hosted Forgejo instance, but to ensure reliable offsite redundancy and coordinate updates with external collaborators, I needed a continuous mirror loop straight into a public GitHub repository.

Using standard continuous integration workflows means spinning up a blind runner that doesn't understand context, exposing plaintext credentials across systems.

The fix: A dedicated, lightweight synchronization subagent engine using a secure forgejo-mcp gateway loop. Wrote a strict automated janitor loop that securely handles our Git mirroring pipeline without dropping plaintext secrets onto the physical drive arrays.

#!/bin/bash
# mirror-repo.sh: Automated sovereign-to-public synchronization engine
echo "Initializing authenticated repository mirror loop..."

# Hard-set local workspace handles
LOCAL_REPO_DIR="/opt/stacks/forgejo-data/repositories/threatlabs/cti-stack.git"
GITHUB_TARGET="git@github.com:threatlabs-cti/mirror-stack.git"

# Quote the path so spaces or glob characters cannot break the cd
cd "$LOCAL_REPO_DIR" || { echo "Error: Local repository directory missing."; exit 1; }

echo "Pushing verified master branches to public redundancy target..."
# Force outbound-only synchronization mirror string securely
git push --prune --mirror "$GITHUB_TARGET"

echo "Synchronization complete. Redundancy array locked."

Wrote a cron task to tick this janitor script over every night at 2:00 AM. No manual tracking, no plaintext token exposure, and total external repository consistency. And profit!

The enterprise bridge is up, the automation engine database tier is unified on Postgres, and our local Forgejo registry is mirroring seamlessly into the public grid. Act II is locked and loaded. Next up in Act III: The Squad Era, we're tracking what happens when we decouple our single monolithic automation logic into a sprawling workforce of 20 distinct subagents deployed entirely inside bare-metal Proxmox LXC containers to reclaim our system resource footprints.

See ya later. Happy tinkering!

Hardening the Stack: Dynamic Machine Identity Injection Over Brittle Plaintext .env Files

Jamz Yaneza — Sun, 12 Jul 2026 03:51:22 GMT

Lessons learned? There's more where that came from. The problem here is that whatever is git committed is also getting deployed.

With the shared network provisioned and the database containers finally agreeing on port numbers, it was time to talk about security debt. The initial lab config was naked. Standard plaintext credentials sitting in stale text files on disk, and static OpenVPN keys that broke the second a client IP shifted. It felt wrong.

Time to move from simple, vulnerable configs to a zero-trust architecture without drowning the lab in an absolute nightmare of iptables firewall rules.

The expectation was standard DevOps laziness: "CI/CD will make our lives easier, every commit automatically deploys". Wrote some basic GitHub Action runner playbooks to push changes automatically straight to the production root.

The problem? The runner was blind. It didn't care about git tracking boundaries. I'd fix a database password locally on an experimental test branch, commit a minor markdown change, and—poof—the runner would push the experimental branch directly to the production root, overwriting our working infrastructure config with a templated draft. We were fighting ourselves.

The discovery: Environment Isolation. The runner needed to be context-aware. Wrote a branch-aware staging workaround that splits development into a separate sandbox root (/opt/cti-dev) while production is strictly guarded behind a main branch trigger.

The Workaround: Branch-Aware Deployment

#!/bin/bash
# deploy-cti.sh: Context-aware pipeline routing logic
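# Strip only the refs/heads/ prefix so a branch like feature/main is never mistaken for main
TARGET_BRANCH="${GITHUB_REF#refs/heads/}"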

if [ "$TARGET_BRANCH" == "main" ]; then
    echo "Routing to production baseline conduit..."
    DEPLOY_ROOT="/opt/stacks/cti-prod"
else
    echo "Sandbox branch detected ($TARGET_BRANCH). Routing to staging sandbox..."
    DEPLOY_ROOT="/opt/cti-dev"
fi

mkdir -p "$DEPLOY_ROOT"
rsync -avz --exclude='.git' ./config/ "$DEPLOY_ROOT/"
cd "$DEPLOY_ROOT" && docker compose up -d --remove-orphans
echo "Deployment synchronized at $DEPLOY_ROOT."

Sandbox problem solved. Staging happens in the sandbox, production is guarded.
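The routing decision itself is small enough to unit test. Below is a hedged Python sketch of the same rule, assuming GitHub-style refs of the form `refs/heads/<branch>`; the function name `deploy_root` is hypothetical. Matching on the full ref matters, because keeping only the last path segment would treat a branch such as `feature/main` as `main`.

```python
PROD_ROOT = "/opt/stacks/cti-prod"
DEV_ROOT = "/opt/cti-dev"


def deploy_root(github_ref: str) -> str:
    """Route only the exact main branch to production; everything else is sandboxed."""
    prefix = "refs/heads/"
    # Tags and pull-request refs do not start with refs/heads/, so they get no branch name.
    branch = github_ref[len(prefix):] if github_ref.startswith(prefix) else None
    if branch == "main":
        return PROD_ROOT
    return DEV_ROOT


for ref in ("refs/heads/main", "refs/heads/feature/main", "refs/tags/main"):
    print(ref, "->", deploy_root(ref))
```

Only the first ref lands in the production root; the lookalike branch and the tag both fall through to the sandbox.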

Pitfall 7: The Static VPN Tracking Debt

Remote access was the next brittle debt wall. Running standard OpenVPN with static profiles meant port-forwarding and constant dynamic IP tracking. One IP change on a client device broke the routing tables entirely, locking out the n8n automation ingress while I was away.

The drama: Teleport vs. Headscale. Teleport is incredibly shiny, but Headscale is lean, open-source, completely self-hosted, and fits the Proxmox DNA flawlessly.

The insight: The mesh network model. Re-routing our coordinator automation traffic through a dedicated Headscale node on LXC 137 allowed the stack to bridge from VLAN 107 (IoT) to VLAN 101 (CTI) via authenticated identity rules, not brittle network firewall parameters.

The Blueprint

graph LR
  subgraph Sovereign Mesh
    Client[Remote Client Device] -->|OIDC Identity Auth| HS[Headscale VPN LXC 137]
    HS -->|Secure Tunnel| CTI[CTI Stack Space VLAN 101]
  end

Zero-Trust isn't just an enterprise buzzword; it's the only way to scale orchestration access without losing your mind in a tangled mess of static routing tables.

Pitfall 8: Stale Text Files and plain-text `.env` Leaks

The final frontier for the week: the public repository trap. "The repository is public, my .env files are private, so how do I bridge them without committing a credential to source control by mistake?" Worse, managing 10+ identical plaintext configuration files across different Proxmox nodes was a certified recipe for sync failures.

The breakthrough: Machine Identities. Wrote off text file management entirely, deployed a self-hosted Infisical instance, and moved directly to runtime environment injection.

The gold standard of server hardening is never writing a password to a physical host disk at all—securely authenticating the node instance itself via token-free local identity verification to pull what it needs at initialization.

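# Authenticate the Infisical machine identity via Universal Auth and keep the
# returned access token in memory only (--silent --plain prints just the token)
export INFISICAL_TOKEN=$(infisical login --method=universal-auth \
  --client-id="$INFISICAL_CLIENT_ID" \
  --client-secret="$INFISICAL_CLIENT_SECRET" \
  --silent --plain)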

# Inject secrets straight into the container memory footprint at startup
infisical run --env=production -- docker compose up -d

And profit! Plaintext configurations completely expunged from the file system.
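On the consuming side, a worker can refuse to start unless its secrets were injected into the process environment. The sketch below assumes that convention; `load_secrets`, `MissingSecret`, and the variable name `DB_POSTGRESDB_PASSWORD` are illustrative and not taken from the lab's code.

```python
import os


class MissingSecret(RuntimeError):
    """Raised when an expected secret was not injected into the environment."""


def load_secrets(names, environ=None) -> dict:
    """Read secrets from the process environment only; nothing is read from or written to disk."""
    env = os.environ if environ is None else environ
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise MissingSecret("not injected at runtime: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Simulated environment, standing in for what a runtime injector would provide.
injected = {"DB_POSTGRESDB_PASSWORD": "example-only"}
secrets = load_secrets(["DB_POSTGRESDB_PASSWORD"], environ=injected)
print(sorted(secrets))
```

Failing fast on a missing variable beats falling back to a default, because a silent fallback is how stale `.env` values creep back in.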

The stack is officially hardened, the automatic deployments are context-aware, and the plain text credentials have been wiped from the drive matrix. Act II is locked down. Next up in Post 4, we're building the enterprise SOAR bridge and migrating our n8n automation cluster database directly onto a PostgreSQL transactional storage tier to stop database lockouts under heavy thread loads.

See ya later. Happy tinkering!

Taming the Ports: Debugging Infinite Redirect Loops and Paranoiac Wazuh Deployments

Jamz Yaneza — Sat, 11 Jul 2026 03:41:51 GMT

It’s going on 10:23 PM, and it looks like I'm making good time kicking these entries out.

Act I: Part 1 of the sovereign ThreatLabs CTI stack continues, but today we're talking about what happens right after you build the network roads and the actual containers start throwing absolute tantrums. You think containerization solves your deployment hurdles, right? Wrong. The second you drop enterprise security platforms behind a prosumer Traefik ingress, reality hits you fast in the form of infinite browser spins and kernel panics.

Time to pull back the curtain on why raw systems administration scar tissue always trumps basic theoretical architecture.

Pitfall 3: The Infinite Redirect Loop of MISP

So, the cti-net shared network was live, Docker was happy, and I fired up the MISP stack. Navigated to the interface page. Typed in the baseline credentials. Hit enter.

Browser immediately went into a loopy existential crisis before spitting back ERR_TOO_MANY_REDIRECTS.

Spent three hours furiously ripping apart my Traefik frontend configurations, tracking headers, and cursing under my breath. Here is the trap: Traefik was listening on external port 8443, handling the SSL termination, and passing clean traffic down to the internal proxy on port 80. But MISP's internal code is hyper-paranoid; it saw an incoming secure request but its internal Nginx webserver assumed it was supposed to be living on standard port 443. They couldn't agree on basic port reality, so they just kept bouncing the request back and forth forever.

The fix wasn't an ingress re-write—it was forcing MISP to look at the world through our lens. You have to explicitly inject the CORE_HTTPS_PORT environment variable straight into the container environment so the internal app engine stops guessing.

# Snippet from the isolated MISP stack configuration
services:
  misp:
    image: misp-os:latest
    networks:
      - cti-net
    environment:
      - Variable_Port_Mappings=True
      # Tell MISP's internal engine exactly how the public sees it:
      - CORE_HTTPS_PORT=8443

Save config. Re-up the stack. Voila! Browser settles down, the login registers instantly, and we are in.

Pitfall 4: The Elasticsearch Memory Hog Dilemma

With MISP behaving, it was time to spin up the logging engine—Elasticsearch—to drive the indexing for TheHive and our Wazuh SIEM components. Fired it up, watched the initial process strings, and then the entire node ground to a miserable, choking crawl.

Out of Memory (OOM) killed. Standard container deployment behavior when an enterprise app hits consumer bare-metal limits.

Everyone forgets that Elasticsearch is a ravenous data hoarder. It memory-maps its index files, and that takes far more mmap regions than a stock Linux kernel usually allows (vm.max_map_count commonly defaults to 65530, while Elasticsearch expects at least 262144). If your host OS kernel isn't tuned for that, the container drops dead on line one. Running a CTI pipeline isn't just about lazy containerized isolation; it's deep host-level systems engineering.

Had to jump directly onto the host terminal and forcefully alter the Linux kernel configurations on the fly to support the database indexing load.

# Temporarily patch the host kernel boundaries
sudo sysctl -w vm.max_map_count=262144

# Lock it down permanently so a power failure won't brick the stack
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf


Tinker Note: Always set explicit heap size constraints for your internal ES cluster (ES_JAVA_OPTS="-Xms2g -Xmx2g") inside the Compose environment definitions, or it will attempt to swallow every byte of RAM available in your rack space.
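A pre-flight check saves a restart loop here. This Python sketch reads the kernel setting from procfs and compares it with the 262144 value used above; the function name and the idea of running it before the stack comes up are assumptions, not part of the original deployment.

```python
from pathlib import Path

REQUIRED_MAP_COUNT = 262144  # same value passed to sysctl above
PROC_PATH = "/proc/sys/vm/max_map_count"


def map_count_ok(proc_path: str = PROC_PATH, required: int = REQUIRED_MAP_COUNT) -> bool:
    """Return True when the kernel's mmap limit is at least the required value."""
    current = int(Path(proc_path).read_text().strip())
    return current >= required


if __name__ == "__main__":
    # Only meaningful on a Linux host; skip quietly elsewhere.
    if Path(PROC_PATH).exists():
        print("vm.max_map_count is sufficient:", map_count_ok())
```

Taking the path as a parameter keeps the check testable on machines that have no procfs at all.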

Pitfall 5: The Wazuh Certificate Exception Tantrum

Then came the grand finale: adding Wazuh for centralized SIEM logging capabilities. Wazuh is rightfully paranoid; it flatly refuses to pass threat data over its internal APIs without mutual TLS (mTLS) verification.

Ran their automated certificate generation tool. Total failure.

The automated script generated default credentials bound strictly to localhost. But inside our internal cti-net fabric, the containers talk to each other using explicit hostnames like wazuh.indexer. The Java engine inside the platform took one look at the hostname mismatch and threw a massive CertificateException tantrum.

Never rely on magical black-box installation scripts when things fail. We yanked the automated tools, threw together a custom OpenSSL bash script—generate-certs.sh—and hand-crafted our own Subject Alternative Names (SANs) directly into the cryptographic extensions. Controlling the root CA yourself turns a broken deployment into an absolute security fortress.

#!/bin/bash
# generate-certs.sh: Hand-crafting TLP-compliant mTLS certificates with strict SAN definitions
echo "Generating authenticated certificates for wazuh.indexer..."

# Create custom openssl configuration inline
cat <<EOF > san.cnf
[req]
distinguished_name = req_distinguished_name
req_extensions = v3_req
[req_distinguished_name]
[v3_req]
keyUsage = digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth, clientAuth
subjectAltName = @alt_names
[alt_names]
DNS.1 = wazuh.indexer
DNS.2 = localhost
EOF

# Generate private key and sign the certificate with host SAN extensions
openssl req -new -newkey rsa:4096 -nodes -keyout wazuh-indexer.key \
  -out wazuh-indexer.csr -subj "/CN=wazuh.indexer" -config san.cnf

openssl x509 -req -in wazuh-indexer.csr -CA root-ca.crt -CAkey root-ca.key \
  -CAcreateserial -out wazuh-indexer.crt -days 365 -extensions v3_req -extfile san.cnf

echo "Cryptography verified. Host alignment locked down."

Injected the signed certificate files into the production volumes, restarted the deployment sequence, and the indexers initialized flawlessly on the first pass. And profit!
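The hostname mismatch boils down to a simple membership question: is the name the client dialed listed in the certificate's DNS SANs? The Python sketch below is a simplified model of that check (exact match plus a single-label wildcard), not the Java verifier Wazuh actually runs; `san_matches` is a hypothetical helper.

```python
def san_matches(hostname: str, dns_sans: list) -> bool:
    """Check a hostname against DNS SAN entries, case-insensitively."""
    host = hostname.lower().rstrip(".")
    for san in dns_sans:
        pattern = san.lower().rstrip(".")
        if pattern == host:
            return True
        # A leading "*." wildcard covers exactly one left-most label.
        if pattern.startswith("*.") and "." in host:
            if host.split(".", 1)[1] == pattern[2:]:
                return True
    return False


# Certificate from the automated tool: bound to localhost only.
print(san_matches("wazuh.indexer", ["localhost"]))
# Hand-crafted certificate with both SAN entries.
print(san_matches("wazuh.indexer", ["wazuh.indexer", "localhost"]))
```

The first call comes back False and the second True, which mirrors the difference between the failed automated certificate and the hand-rolled one.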

The infrastructure ports are officially tamed, the indexers are stable, and the internal cryptography isn't lying to itself anymore. Act I is officially wrapped up and behind us. Next month, we're moving into Act II: Zero-Trust and Secrets, mapping out how we re-routed coordination through self-hosted Headscale VPN infrastructure on LXC 137 and killed off brittle plaintext .env configurations via runtime Infisical dynamic machine identities.

Until then, see ya later. Happy tinkering!

Building the Sovereign Foundation: Why We Had to Build the Infrastructure Roads Before the Houses

Jamz Yaneza — Sat, 11 Jul 2026 03:10:04 GMT

It's a Saturday night but the new season doesn't start until October! Well, then, time to finally get this set of blog entries out from draft to live, instead.

The ThreatLabs CTI stack started simple enough: a standard open-source sovereign threat intelligence factory running MISP, OpenCTI, TheHive, and DFIR-IRIS. It was stable, purring along beautifully over a unified layout until we tried to hook it into the cloud.

The plan was to let an n8n workflow feed unstructured dark web posts and red team scripts to a cloud LLM provider to extract indicators of compromise (IOCs) and draft intelligence briefs automatically. It worked for about a month, right up until the automated pipeline hit a report detailing a cartel's operational security practices, and the corporate API model threw a safety tantrum and refused to process it. Then it happened again on a standard threat actor TTP analysis.

Corporate cloud LLMs are trained to act like overly sensitive customer-service reps. We needed a cynical forensic analyst who doesn't flinch at raw malicious text. The conclusion was obvious but highly annoying: we had to yank the cords, go completely local, and run our own intelligence fabric on bare metal where no hyperscaler could dictate what our threat data is allowed to look like.

But before we could even load a model, we had to build the roads. And that’s where the infrastructure design fallacies kicked down the door.

Pitfall 1: The "Simple" Network Fallacy (`cti-net`)

The expectation was classic homelab laziness: spin up a few separate Docker Compose stacks for each application, let them map their default bridges, and expect them to magically resolve each other by hostname.

The messy reality? Independent Compose stacks love creating isolated, siloed networks by default. TheHive couldn’t resolve MISP, the web crawlers were blind, data handoffs were dropping webhooks like crazy, and the whole stack was yelling network timeout errors back into the logs.

We debated the classic architecture fork: a monolithic docker-compose.yml that wraps everything under one giant config versus totally isolated stacks. Monoliths are an absolute nightmare to maintain or update independently; pure isolation breaks the integration entirely.

The fix required a fundamental mental shift—infrastructure has to come first. We had to manually define a shared external system network called cti-net across the host hypervisor space before deploying a single tool. You build the roads before you construct the houses.

The Blueprint

graph TD
  subgraph Host Network Space
    External[cti-net: Shared External Network]
  end
  MISP[MISP Container] --> External
  TheHive[TheHive Container] --> External
  OpenCTI[OpenCTI Container] --> External
  n8n[n8n Automation Ingress] --> External

# Manual network provision on the host command line:
# docker network create cti-net

networks:
  cti-net:
    external: true
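
The manual docker network create step fails if the network already exists, which gets annoying once it lives inside a deployment script. A minimal sketch of an idempotent version follows. The function name ensure_network is hypothetical, and it assumes the docker CLI is on the PATH:

```bash
# ensure_network NAME: create the external Docker network only if it is missing,
# so the step is safe to re-run before every `docker compose up`.
ensure_network() {
  if docker network inspect "$1" >/dev/null 2>&1; then
    echo "network $1 already exists"
  else
    docker network create "$1" >/dev/null && echo "network $1 created"
  fi
}

# Usage:
#   ensure_network cti-net
```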

Voila! Every service now registers onto the same sovereign highway, resolving each other seamlessly by their internal container handles. Network fallacy solved. Volume permissions? That was a far uglier beast.

Pitfall 2: The Volume Permission Nightmare

Docker makes downloading enterprise software incredibly lazy, right up until you try to persist the data onto physical host disks and Linux file permissions turn into absolute hell.

When you scale a CTI stack, you're dealing with disparate upstream container standards. PostgreSQL runs as UID 70 (in the Alpine-based image). ElasticSearch wants to run as UID 1000. Redis drops onto another custom UID entirely. The second you map these container directories to persistent host folders on your local NVMe storage array, the ownership structures mismatch, the containers drop into a crash-and-restart loop, and the console logs start screaming Permission denied.

I've seen folk try to bypass this by blindly running chmod 777 across their entire storage pool. Don't do that. It’s a lazy, shameful quick-fix that completely breaks file system security and leaves your underlying infrastructure naked.

We needed automation, not manual dirty hacks. The turning point was crafting a dedicated baseline janitor script—fix-permissions.sh—that executes at the host level right before the stack is brought up. It parses our targeted database and directory layouts, forcefully aligning the host folder ownership parameters to match exactly what the internal container engines require.

The Workaround: `fix-permissions.sh`

#!/bin/bash
# fix-permissions.sh: Host-level janitor script for the CTI storage layout
echo "Automating volume ownership parameters for cti-net stack..."

# Hard-set exact paths relative to your local storage mount
CTI_DATA_DIR="/opt/stacks/cti-data"

# PostgreSQL volume alignment (UID 70)
sudo chown -R 70:70 "${CTI_DATA_DIR}/postgres"

# ElasticSearch volume alignment (UID 1000)
sudo chown -R 1000:1000 "${CTI_DATA_DIR}/elasticsearch"

# Redis volume alignment (UID 999)
sudo chown -R 999:999 "${CTI_DATA_DIR}/redis"

echo "Volume ownership verified. Keys to the castle distributed safely."

Now, we wrap this step directly into our host deployment workflow. No manual directory tracking, no security compromises, and no more silent volume launch failures.
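
A recursive chown over a big data directory on every deploy can be slow. One possible refinement is to check first and only touch what needs touching. This is a sketch under assumptions: the helper names needs_chown and align_owner are hypothetical, it relies on GNU stat, and it only inspects the top-level directory rather than every file beneath it:

```bash
# needs_chown DIR UID:GID: succeeds when DIR is not yet owned by UID:GID.
# Uses GNU stat (-c), as found on typical Linux hosts.
needs_chown() {
  [ "$(stat -c '%u:%g' "$1")" != "$2" ]
}

# align_owner DIR UID:GID: chown recursively, but only when the top-level
# owner differs from what the container expects.
align_owner() {
  if needs_chown "$1" "$2"; then
    sudo chown -R "$2" "$1"
  else
    echo "ok: $1 already owned by $2"
  fi
}

# Usage:
#   align_owner /opt/stacks/cti-data/postgres 70:70
#   align_owner /opt/stacks/cti-data/elasticsearch 1000:1000
```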

With the roads laid and the volume maps behaving themselves, Act I is officially on the grid. Next up in Post 2, we’re tackling what happens when MISP and Traefik get into an existential fight over port mapping, Nginx falls into an infinite redirection loop, and ElasticSearch tries to eat every byte of system memory on the server array.

Until then, see ya later. Happy tinkering!

Revamping My Homelab Network with AI Assistance: A Year with the UniFi Cloud Gateway Max (Part 1 – The Brainstorming Spark)

Jamz Yaneza — Sat, 10 Jan 2026 22:33:26 GMT

Hey everyone, it's January 10, 2026, and looking back, it's been almost exactly a year since I unboxed that UniFi Cloud Gateway Max on January 17, 2025. What started as a quick upgrade has turned into a quiet game-changer for my setup—mostly because life got in the way and the homelab sat mostly idle for months. But now that things are calming down, I've had time to reflect, optimize, and even lean on AI to guide the rearchitecture. This ties right into my ongoing experiments with local LLMs and AI workflows (Ollama evals, GPT-4o vs. locals, etc.)—using chat models not just for quick answers, but to frame real decisions when time is short.

The unboxing was low-key: excited phablet-typing at low battery, but the real story is the upgrade path. I kept the old Cloud Key Gen2 Plus (UCK-G2-PLUS) since UniFi Protect was already set up perfectly—cameras, NVR, all humming. Instead of a full migration (which bombed on my first restore attempt—network reconfiguration fail, thumbtack reset drama), I just adopted it into the new Gateway Max as a pseudo-dedicated NVR. Smooth controller integration, offloads video processing, and keeps the main gateway focused on routing/firewall/visibility. Pro tip: Document everything—VLANs, firewall rules, port maps. I learned that the hard way.

No AI for choosing between Cloud Gateway Max and Ultra—that was standard research, heavily influenced by Evan McCann's excellent comparison charts. The Max's compact form, 2.5Gbps ports, and solid IDS/IPS headroom for home/SOHO fiber made it the clear pick for low upkeep.

But when it came to the deeper homelab rearchitecture—storage pools, ZFS tweaks, degrowth from power-hungry enterprise gear—I threw the initial prompt at several big chatbots: Perplexity, Grok, Copilot, and Gemini. Gemini Pro (specifically Gemini 3 in that session) stood out with the most comprehensive, context-aware response that built directly on my details without needing much re-prompting. It suggested ditching RAIDZ1 for striped mirrors, provided copy-paste commands, and spotted pitfalls I missed. We can dive deeper into why Gemini won (and how context size played a role) in a follow-up post—I'll link it here once it's live. For now, the key takeaway: AI framed the whole pivot perfectly, saving me hours during a quick sous vide wait (because why not?). Project done, celebrated with a nice steak and a glass of Argentinian Malbec—pure instant gratification.

Looking back at this past year, the real starting point of my homelab rearchitecture wasn't the hardware changes or the final ZFS commands. It was a single, focused chat session with Gemini Pro (Gemini 3) in November 2025, about half a year after I wrapped up the Proxmox VE three-part series (May 2025). I needed to solve a classic homelab dilemma: my existing ZFS pools (basin, pond, stank) were mismatched RAIDZ1 setups that throttled random-write IOPS for heavy log ingestion in Wazuh and OpenCTI, while the MD1200 DAS was a noisy, power-hungry relic I no longer needed for capacity. The goal was a cleaner, faster flashpool using striped mirrors, zero-downtime migration of all LXCs/VMs, and future-proofing without buying new drives.

Instead of spinning up n8n (general workflow automation with 300+ nodes, great for complex pipelines) or Flowise (LangChain-based low-code builder for RAG chatbots, vector search, chunking, and multi-agent flows), I just pasted my full hardware context into Gemini and let the conversation flow naturally. Modern LLMs' context windows (Gemini 3 Pro supports up to 1 million input tokens, though practical chat sessions often operate effectively in the 32K–128K range for speed) meant it remembered every detail across turns—no re-explaining hardware, no chunking strategies, no vector DB setup. I could discuss concerns ("Striped mirrors will sacrifice some capacity—worth it?"), refine aspirations ("Prioritize reliability for self-hosted sec tools over bulk storage"), and get grounded suggestions with built-in reasoning.

It felt like having a knowledgeable homelab partner at 2 a.m., when no one else in the house cares about RAID trade-offs or ashift values. I argued points, it pushed back with logic, and it architected the end state first (flashpool as the target), then handed over tactical steps, including the key storage migration I used to move every LXC (from Pi-hole to Immich) to rpool via Proxmox's "Move Disk" feature (in the GUI, or on the CLI with pct move-volume for containers and qm move-disk for VMs).

The biggest value here isn't just code generation (though the copy-paste commands were spot-on). It's the shift from "build your own RAG pipeline" to instant, conversational architecture. In the past, for a similar deep dive I'd spin up n8n or Flowise, requiring nodes, connectors, chunking strategies, vector DB setup, and testing loops: hours of plumbing before you even get useful output. With modern LLMs' massive context windows and native capabilities, I skipped all that. No pipeline setup, no chunking debates. Just describe the problem, argue concerns, refine aspirations, and get grounded suggestions, with search grounding if needed. It feels like a 2 a.m. homelab buddy who remembers your entire setup across turns, so no one else in the house needs to care.

This usability leap has opened doors for more efficient "code vibing" homelab projects—iterating fast on configs, scripts, or even full services without heavy orchestration. (That's fodder for another post: what's up with Antigravity, Cursor, and Claude Code in the agentic coding space? Early 2026 vibes are wild—Google's Antigravity as an agent-first IDE with autonomous planning/execution via Gemini 3, Cursor as the refined VS Code AI accelerator, Claude Code for terminal-deep reasoning and autonomous refactors. More on that soon.)

Here's the key part of that brainstorming flow (lightly edited for readability, but preserving the back-and-forth feel—full history attached for reference). I've pulled directly from the conversation to ground everything in truth: Gemini's suggestions were tailored to my described future requirements (high-IOPS for sec tools, single-node simplicity, power/noise reduction), and it emphasized migration to rpool first as a safety net.

My initial prompt (dumping hardware + goals, as in chat):

i'm rebuilding my tech stack and plan to host most everything in Proxmox VE via LXD as much as possible. this is currently in my plan: Shuffle aka Shuffler, Wazuh, OpenCTI, MISP, AIL Project

Gemini's response (key excerpts—framed the strategy, split into native vs. Docker-in-LXC based on my requirements):

This is a highly feasible plan. You can host nearly 100% of this stack in LXC containers on Proxmox... Prioritizes the Proxmox VE Helper-Scripts... Type A: Native LXC (Best Performance) Tools: Wazuh, MISP, AIL... Type B: Docker-in-LXC (Easiest Maintenance) Tools: Shuffle, OpenCTI...

Wazuh (SIEM) Recommendation: Use the Community Helper Script... Command: Run the community script command in your Proxmox shell to spawn a dedicated "Wazuh" LXC... Allocate at least 8GB RAM...

Shuffle (SOAR) Recommendation: Docker-in-LXC... Use the Docker LXC helper script... Caveat: Shuffle uses Opensearch. You must set vm.max_map_count=262144 on the Proxmox Host...

And so on for each tool, with detailed methods.
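
The vm.max_map_count caveat is easy to forget after a host reboot. Here is a minimal sketch of a check to run on the Proxmox host. The function name map_count_ok and the sysctl.d file name are assumptions for illustration, while the 262144 figure comes from the excerpt above:

```bash
# map_count_ok CURRENT [REQUIRED]: succeeds when CURRENT meets the minimum
# (262144 by default, the value quoted in the chat excerpt).
map_count_ok() {
  [ "$1" -ge "${2:-262144}" ]
}

# Usage on the Proxmox host (as root):
#   map_count_ok "$(cat /proc/sys/vm/max_map_count)" \
#     || sysctl -w vm.max_map_count=262144
#   # Persist across reboots (file name is a hypothetical choice):
#   echo 'vm.max_map_count=262144' > /etc/sysctl.d/99-opensearch.conf
```

LXC containers share the host kernel, which is why the value is set on the host rather than inside the container.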

Follow-up prompts/responses (iterating on Dockge, Podman, CasaOS—Gemini argued against Podman for my stack, suggested Dockge):

User: I also previously used Portainer CE but can forego that if there is a simpler solution... I've also started to experiment on Podman...

Gemini: This is a common pivot... Switch to Dockge... Why avoid Podman for Shuffle/OpenCTI? Networking Complexity... Socket Binding... Documentation... Revised Plan: ... bash -c "$(wget -qO - https://github.com/community-scripts/ProxmoxVE/raw/main/ct/docker.sh)" ... curl https://raw.githubusercontent.com/louislam/dockge/master/install.sh | sh

On TrueNAS/CasaOS/MD1200 pivot (Gemini suggested PCI passthrough for TrueNAS VM, warned on MD1200 issues like fan noise):

User: I'm also interested in Casa OS... re-install TrueNAS Scale as a container but physically tie it to control my Dell MD1200 array...

Gemini: This part of your plan needs a course correction... CasaOS: Skip it, redundant with Dockge... TrueNAS Scale: Must run as VM, not container... PCI Passthrough for HBA card... Warnings for MD1200 + SSDs: Fan Noise (The "Jet Engine" Problem)... 6Gbps Limit... Interposers...

Core storage rearchitecture (the big one—Gemini suggested migration first, then destroy/wipe/create based on my goals):

User: ...the script did not seem to detect the previous installation

Gemini: ...Before doing anything, stop the container and take a manual snapshot... Verify your Current Version: /var/ossec/bin/wazuh-control info... Manual Update Sequence: apt-get update... apt-get install --only-upgrade wazuh-indexer -y... etc.

Later turns focused on pool states:

Gemini: pool: basin... config: raidz1-0... (full before states as extracted)

After user shared zpool status: Suggest striped mirrors for flashpool, with exact create command matching my devices.

Migration: "Use Proxmox's Move Disk in the GUI to relocate containers/VMs to rpool first—zero downtime—before destroying old pools."

Full commands from Gemini (as I executed—always test/back up first!):

Bash

# Migration (per container/VM)
qm config <vmid>                   # Check current disk (pct config <vmid> for LXCs)
qm move-disk <vmid> <disk> rpool   # VMs; for LXCs: pct move-volume <vmid> <volume> rpool
                                   # Or GUI: Hardware > Disk > Move Disk
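
With more than a handful of containers, clicking through the GUI gets old fast. Below is a rough sketch of looping the move over every LXC on the node. It assumes the pct list and pct move-volume subcommands of recent Proxmox VE releases and that each container's root volume is named rootfs. The function names are hypothetical. Containers may need to be stopped for the move, so start with the dry run:

```bash
# ct_ids: reads `pct list` output on stdin and prints one container ID per line,
# skipping the header row.
ct_ids() {
  awk 'NR > 1 && $1 ~ /^[0-9]+$/ { print $1 }'
}

# move_all_rootfs STORAGE: move every container's root volume to STORAGE.
# Set DRY_RUN=1 to print the commands instead of running them.
move_all_rootfs() {
  pct list | ct_ids | while read -r id; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "pct move-volume $id rootfs $1"
    else
      pct move-volume "$id" rootfs "$1"
    fi
  done
}

# Usage:
#   DRY_RUN=1 move_all_rootfs rpool   # review the plan first
#   move_all_rootfs rpool
```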

Bash

# Destroy (post-migration)
zpool destroy basin
zpool destroy pond
zpool destroy stank

Bash

# Wipe
for disk in sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm; do
  wipefs -af /dev/$disk
  sgdisk --zap-all /dev/$disk
done
partprobe /dev/sd[c-m]

Bash

# Create flashpool
zpool create -f -o ashift=12 flashpool \
  mirror scsi-36b083fe0dffbfa002d21cca4172514ec scsi-36b083fe0dffbfa002d21ccbb187797e4 \
  mirror scsi-36b083fe0dffbfa002d21cf9227b92ed4 scsi-36b083fe0dffbfa002de694ac1f246e22 \
  mirror scsi-36b083fe0dffbfa002de694ab1f06447c scsi-36b083fe0dffbfa002de694ab1f10ce50 \
  mirror scsi-36b083fe0dffbfa002de694ac1f16b6a1 scsi-36b083fe0dffbfa002de694ac1f1e4a2c \
  mirror scsi-36b083fe0dffbfa002de694ad1f2bbbbf scsi-36b083fe0dffbfa002de694ad1f3282b5

Bash

# Tuning
zfs set compression=lz4 flashpool
zfs set atime=off flashpool
zfs set xattr=sa flashpool
zfs set recordsize=64k flashpool
# Quotas for sec datasets
zfs create flashpool/sec-stack
zfs set quota=4T flashpool/sec-stack
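
After tuning, a quick read-back catches typos before any data lands on the pool. Here is a small sketch, assuming the standard zfs get flags (-H for no header, -p for exact numeric values). The function name check_tuning is hypothetical:

```bash
# check_tuning DATASET: compare live ZFS properties against the values set above.
# With -p, recordsize=64k is reported as the exact byte count 65536.
check_tuning() {
  zfs get -Hp -o property,value compression,atime,xattr,recordsize "$1" |
  while read -r prop value; do
    case "$prop=$value" in
      compression=lz4|atime=off|xattr=sa|recordsize=65536) echo "ok   $prop=$value" ;;
      *) echo "DIFF $prop=$value" ;;
    esac
  done
}

# Usage:
#   check_tuning flashpool
```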

Bash

# Register
pvesm add zfspool flashpool --pool flashpool --content images,rootdir --sparse 1

Before/after pool states (direct from chat—Gemini echoed my zpool status, then suggested the new config):

Before (basin example): raidz1-0 with three drives, ONLINE.
After (flashpool): Five mirrors, ONLINE—matched my SSD count for striped performance.
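
To put rough numbers on the capacity trade-off, here is a back-of-the-envelope sketch. The 1000 GB drive size is a made-up figure for illustration (the post does not state the SSD sizes), and ZFS metadata overhead is ignored:

```bash
# Usable capacity in GB, ignoring ZFS metadata overhead.
# mirror_usable PAIRS DISK_GB: each 2-way mirror contributes one disk of space.
mirror_usable() { echo $(( $1 * $2 )); }
# raidz1_usable DISKS DISK_GB: one disk's worth of space goes to parity.
raidz1_usable() { echo $(( ($1 - 1) * $2 )); }

DISK_GB=1000   # hypothetical drive size, not taken from the post
echo "5 striped mirrors (10 disks):  $(mirror_usable 5 "$DISK_GB") GB usable"
echo "3-disk RAIDZ1 vdev:            $(raidz1_usable 3 "$DISK_GB") GB usable"
echo "10-disk RAIDZ1 (for contrast): $(raidz1_usable 10 "$DISK_GB") GB usable"
```

Mirrors give up half the raw space, but the pool gets five vdevs to spread random writes across instead of one, which is the property the log-heavy security tools care about.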

This collaborative vibe—argue, refine, confirm—turned a potential headache into a sous-vide-session project (steak + Malbec celebration still the highlight). No RAG pipeline, no chunking debates, just instant, context-aware architecture.

The payoff? A clean flashpool rebuild, MD1200 shutdown (kept as optional expansion), and a lighter, more reliable homelab—all sparked by one chat that felt collaborative, not mechanical. Sous-vide steak + Malbec celebration? Still unbeatable.

In a follow-up, we'll compare why Gemini Pro edged out Perplexity/Grok/Copilot here (context retention, native tools), and tease more on emerging coding agents.

Drop your own 2 a.m. AI partner moment below—what's the wildest late-night homelab win you've had with chat alone? 😄

Waiting for the Home Assistant CLI to be ready...

Jamz Yaneza — Sun, 07 Sep 2025 21:17:28 GMT

Let me start by saying that I've figured out a repeatable way to get around this issue which had stumped me all week! And, here's the proof:

Let's back up a second and talk about what happened and what got us here. Previously, these things happened (not in any specific order):

An urgent message was received one day from someone at home that the "internet was dead". That was true for about a full day, overnight included, since it required an ISP technician to come inspect their fiber hardware. I was able to get a very minimal set of IoT smart devices working, but nothing that needed the "cloud" worked.
At an average of about 11 cents/kWh, the servers typically cost $30~50/month before the onset of summer and, given their garage location in the North Texas heat, would more than likely register higher if I took a new Kill-a-Watt measurement. As it is, they're running at a noisy full tilt instead of the normal 20% speed.
A mix of both ARM64 and AMD64 machines in a Proxmox VE cluster is not the most optimal way to run things due to inconsistencies of storage media and other dependencies.
Recently de-clustering my high-availability setup due to the resulting heat, the aforementioned cost of running a #homelab, and a job function change.
Wanting to find a way to set up Frigate and leverage the Coral Dual Edge TPU with a PCIe x1 adapter, but finding all current market options too costly or power hungry.

These are our current cast of characters: old laptop hand-me-downs from my now no-longer teenage son in the forms of an Acer Aspire V5-122P (AMD) and an Acer Aspire E5-576 (Intel with QuickSync). These had previously been pre-installed with Windows 7, and I had flashed them with the latest Ubuntu Desktop available when I received them.

I then followed the appropriate hardware install instructions for the official Home Assistant Operating System Generic x86-64 which called for enabling UEFI Boot and disabling Secure Boot (because HassOS uses Buildroot and a customized version of Linux).

Let me digress here and mention that about two-(2) weeks prior I had been looking into converting at least one of the Nano Pi R5S into Home Assistant hosts but wasn't getting very far since I had to figure out how to merge separate Buildroot processes, one for HA and the other to incorporate the Rockchip drivers.

There are some older instructions out there that talk about setting EFI and all that, but in my experience those options only came into play when Secure Boot was initially enabled, in order for you to basically reset the options to factory defaults. But if you find yourself in a bind, then feel free to try it out just to remove variables:

I initially decided to go full steam ahead and use the Aspire E5-576 as the main HA unit, and I also wanted it to run Frigate, Jellyfin, Plex, and a whole host of different things to make full use of the overpowered hardware and Intel Quick Sync. However, the way HassOS is envisioned right now is that it needs to boot pristine, without any other things loaded with it during the back-end Docker boot-up. Anything more than that and the installation will complain: you will get the Unsupported nod from the Home Assistant observer, and that would not be a happy experience, trust me:

So, I pivoted and decided to use the Acer Aspire V5-122P, which only had 500GB of storage, to instead be a sort of head-end and call the Quick Sync enabled machine remotely as a service. Those of us dabbling with AI automation will recognize this as using Agents that do their own autonomous activities.

But, that wasn't before I had already been hitting my head on the wall no matter what I did as a work-around to get past booting into this frustrating message:

Waiting for the Home Assistant CLI to be ready...

I've been all around the support groups and discussions and even found a bug report, which did give me a clue!

Within those workarounds someone talked about renaming labels and partitions and that may be true in some situations where the installation has progressed to completion. I'm only repeating the post here because it taught me a little about getting a separate terminal out of HA without breaking it further: Ctrl-Alt-F2

However, the whole thing about workarounds and renaming labels falls apart when there is nothing to fix. By that I mean: what if the partitions and all that underlying stuff failed in various ways, and there's nothing to operate those fixes on? The key for me was noticing the Dependency failed part of the boot process:

This, my friends, was my aha! moment, the part where latent pieces of the boot and file structure might not have been correctly expunged from the system.

Whether it is a bug or not is debatable. What is important is that removing this leftover information is the key. So, back to the recommended Method 1 of the official HA installation: before I restored the HassOS image, I decided to wipe the disk with zeros as well as remove all partitions.

However, as you can see that was going to be more than a seven-(7) hour process for a 1TB disk:

I would not fault you, dear readers, if you haven't had any digital forensics background to understand the GUID Partition Table (GPT). But it should suffice to say that overwriting that minimum 1% (one percent) of the disk would be more than enough to clear out the data and disassociate most of the disk information from mapping. Back in the days of boot sector viruses this part of the disk was always under contention because it presented a way for malware to load prior to the operating system and circumvent security protections and many other shenanigans!

To prove my theory, I decided to do the same data clearing for the 500GB disk on the Acer V5:
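
For anyone who wants to reproduce the partial wipe from a Live Ubuntu session, here is a hedged sketch. It is not the exact command I ran: the helper names are hypothetical and the 16 MiB tail size is an arbitrary generous choice. GPT keeps a backup header at the very end of the disk, so this version also zeroes the tail. It is destructive, so triple-check the device name:

```bash
# one_percent_mib SIZE_BYTES: 1% of a disk in whole MiB, for dd's count=.
one_percent_mib() { echo $(( $1 / 100 / 1048576 )); }

# tail_seek_mib SIZE_BYTES MIB: dd seek= offset (in MiB) that starts MIB
# before the last whole MiB boundary, covering the backup GPT at the end.
tail_seek_mib() { echo $(( $1 / 1048576 - $2 )); }

# wipe_edges DEVICE: zero the first 1% and the tail of the disk. DESTRUCTIVE.
wipe_edges() {
  local size
  size=$(blockdev --getsize64 "$1")
  dd if=/dev/zero of="$1" bs=1M count="$(one_percent_mib "$size")" conv=fsync
  # No count= here: dd runs to the end of the device and stops with
  # "No space left on device", which is expected, hence the || true.
  dd if=/dev/zero of="$1" bs=1M seek="$(tail_seek_mib "$size" 16)" conv=fsync || true
}

# Usage (as root):
#   wipe_edges /dev/sdX
```

If the tools are available, sgdisk --zap-all followed by wipefs -a gets to the same place by removing the partition structures and signatures directly.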

Once completing the image restore you should get a layout similar to this:

The only things to do now are to shut down the Live Ubuntu session, disconnect all external drives, connect any special hardware, boot up, and (optionally) restore from backup.

I'm going to give it about a week of live testing for functionality as well as looking at the power consumption from the wall. If things are stable then I might just shut down the virtualized version and stick with this laptop version.

The next step of the puzzle is how to make use of the space beyond what the HassOS image allocates for itself by default, approximately 30GB. I'm going to try to tackle that in a follow-up:

There is a mismatch somewhere in terms of what HassOS reports compared to how things are actually mapped on-disk. Here's what GParted reports from a Live Ubuntu session:

As can be seen, and similarly claimed during the installation process, the hassos-data partition did, in fact, expand to the rest of the disk. Why then are the built-in Disk metrics only reporting about 10GB of free space left? If you happen to come by this post and have an explanation, feel free to add it in the comments.

As for the Aspire E5, I'm currently leaning towards making it into just a full blown Docker or Podman host. The reason being that if I installed Proxmox VE on top of it then I'd have to contend with the extra hypervisor overhead on an already system memory challenged system. A simple Ubuntu or Debian Bookworm bare metal server installation might net me just the right amount of processing utility that I need.

See you next time.

I restarted my homelab with Proxmox VE Helper-scripts and a phablet! - Part 3

Jamz Yaneza — Mon, 19 May 2025 22:15:40 GMT

Welcome back, it's literally been a minute since I published Part 2 of this series where I found a fix for the Pi-hole NTP sync errors.

I don't think there's going to be a separate need to discuss blogging on Ghost via mobile. It should suffice to say that copy-pasting images works better on Chrome mobile mode; you can get better native desktop-mode experience when using Firefox. The capability to flip back-and-forth seamlessly is a game changer for speed edits. I'm not sure about the experience using an actual tablet like the Google Pixel Tablet or a Chromebook like the Acer Chromebook Plus 14" (CB514-4HT-359X) but that's going to be one of the next experiments. My final edits have been on the old MacBook Pro, whenever it's still got a charge.

Part 3: Proxmox VE Helper scripts meets XDA Developers

Let me start with the elephant in the room, why use LXC instead of using the existing Docker VMs in my current stack? Well, let's talk about that.

I've been running MISP and OpenCTI for the longest time in a cluster for high availability. Specifically, ElasticSearch can become unwieldy when you try to give it the best possible scenario while you've got limited rack space for expansion and a power budget cap. Essentially, because of this I'm considering a re-do and clawing back the resources these projects have taken over. Plus, I've recently had some bad spells with old Portainer versions that over time have left a bad taste in my mouth. And then, sometime after I last touched my Docker setup, Podman entered the picture. So far, in the past few months, this might be the direction I take, if only so I can learn more about its pitfalls and because it is what you would get to use in a locked-down Red Hat enterprise image. There's a lot to unpack there, but my point is that I don't take this decision lightly. There's going to be a learning curve and trade-offs to be made. I'll have to be OK with that. So, this is why we're going back to individual containers in the form of LXC instead of full VMs if we don't have to.

Now, with that out of the way, a brief note on XDA Developers. Hailing from the Philippines, I've been on the cutting edge of mobile computing since back in the Nokia days. If I couldn't get new stuff locally then I could always hop on a plane to Singapore and visit Sim Lim Square. And, on one of these trips is where I got hold of my first O2 XDA (and a bunch of Havaianas). Mind you, at around this time I was sporting the latest Nokia 9500 Communicator. I was ready for another form factor, let those parts sink in. Rooting and hacking bleeding edge mobile devices is how XDA and I crossed paths. Years later after transformation I really enjoy the self hosted section topics for my use case.

About Proxmox VE Helper scripts, I discovered this group while looking to experiment with lightweight versions of my homelab projects. What I would do was learn from the scripts and incorporate or improve on them for my specific use case. Now, however, I plan to use several of the templates to get things restarted based on availability and if something gets mentioned over at XDA.

I've already shown what a basic phablet install experience and experiment looks like over in Part 1 of this series. Here's where I've landed so far:

There's a big part of myself that cringes at the auto-numbering since my original schema was to map IP to VMID. If it matters that much later then, in theory, it should just be a matter of a backup and then restore to a specific VMID. Well, in my mind, that's how I'm coping with the situation.

In Part 4, we can discuss the use cases and groupings of what's been installed. Or, I may just launch a new series of topic-specific group projects stemming from here.

Happy trails.

I restarted my homelab with Proxmox VE Helper-scripts and a phablet! - Part 2

Jamz Yaneza — Mon, 19 May 2025 13:48:15 GMT

Strike while the iron is hot? Well, I'm going to do a version of that and type on this phablet until it powers down, because it's now at 11%, and hopefully not lose any content. I guess that is part of the curse when you've got the ideas and time is measured in either bars of battery or wifi strength.

In Part 1, I talked about success in getting Pihole setup on ARM64. There might be some other lightweight resourced projects to install or migrate later, perhaps Uptime Kuma?

Part 2: Pi-hole NTP sync sidequest

There was an NTP setting that needed to be addressed related to permissions, which by some reports is a bug or a setting that doesn't make sense depending on how you have Pihole set-up:

Navigate to Settings, and notice the green Basic toggle:

Click on this and you'll enter Expert mode:

A new option will then appear called All settings:

Within All settings navigate to the Network Time Sync tab and disable most of the settings:

The most important edit is zeroing the setting for ntp.sync.interval:

Save your settings and you will then notice the error alert disappear:
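
If you would rather skip the web UI, the same key can probably be set from a shell inside the Pi-hole LXC. This is a sketch under assumptions: it relies on the pihole-FTL --config KEY VALUE form available in Pi-hole v6, and the wrapper name set_ntp_sync_interval is hypothetical:

```bash
# FTL_BIN can be pointed at a stub for dry runs; it defaults to the real binary.
FTL_BIN="${FTL_BIN:-pihole-FTL}"

# set_ntp_sync_interval SECONDS: write the NTP sync interval (0 mirrors the
# zeroed setting from the web UI steps above).
set_ntp_sync_interval() {
  "$FTL_BIN" --config ntp.sync.interval "$1"
}

# Usage (inside the Pi-hole LXC, as root):
#   set_ntp_sync_interval 0
#   pihole-FTL --config ntp.sync.interval   # read the value back
```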

With that out of the way, onward to Part 3.

Yes, I'm still doing my edits from the Flip4!

I restarted my homelab with Proxmox VE Helper-scripts and a phablet!

Jamz Yaneza — Sun, 18 May 2025 23:49:42 GMT

It's been a minute.

Well, several months in fact since getting most of the Lab situated. The Unifi gateway has proven to be a useful addition to complete the end-to-end visibility for my network. I've also walked myself back from over-engineering the access points, meaning to say I've held back deploying all of the nanoHD APs and instead forced myself to find the house layout and leverage the Unifi Designer page to map out some better configuration options given the limitations of where I can drill or line some CAT6a.

Yesterday, I looked at my simple stack of containers and decided it's about time to put back some of the tools that I've been missing given my current job function use cases. I still have MISP and OpenCTI running and they're due for a version refresh. While Pihole and Adblocker get updated, they're no longer part of my network's flow and I'll rectify that when I transition to using the new firewall zones feature in the latest Unifi firmware. Home Assistant is running but the Frigate integration won't be done until I decide on a spare machine to run the Coral TPU on. Smartthings needs a configuration update since I no longer use or host a Blue Iris integration server and set of IP cameras; I've peppered the place with cheap Wyze cameras instead.

For today's topic, the main goals I had were:

Install something that works on both of the NanoPi R5 installed with Proxmox VE on ARM64
Comb through the recent self-hosting posts from the XDA Developers site and select stuff that aligned with my past and future usage goals
Decide to use individual LXC containers and leverage those that exist in Proxmox VE Helper scripts site
Forego the temptation to use Docker or Podman, but install fresh containers for future experiments
Do all of this using my Samsung Galaxy Fold4 phone while testing the new U7 Pro Outdoor with omni-directional antennas I just installed the day before

Part 1: NanoPi R5 with Proxmox VE ARM64

This previous project is sort of a heartache for me. While I did get Proxmox VE installed, my ventures into installing a working VM and a basic OS have been unstable and mainly unusable, probably due to system requirements. And this is probably where the beauty of LXC shines: a container shares the host's kernel instead of emulating a whole machine, so it simply uses what's already there.

The helper script for Pi-hole will fail, complain to you, and suggest you use a different installation script. No need, since ARM64 is a supported architecture for Pi-hole itself:

And so, let's see if we can get Debian installed, copying the basic specs from:

But, we're going to use the default Proxmox VE container template to create it:
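For anyone who prefers the node's shell to the web UI, the same kind of container can be created with `pct create`. The sketch below only assembles the command so it can be reviewed before running anything. The VMID, template filename, storage name, and sizing are placeholders made up for illustration, not the specs used in this post.

```python
import shlex


def build_pct_create(vmid, template, hostname, *, cores=1, memory_mb=512,
                     rootfs="local:4", bridge="vmbr0"):
    """Assemble a `pct create` command line as a list of arguments.

    Every default here is an illustrative placeholder, not a value
    taken from this post.
    """
    return [
        "pct", "create", str(vmid), template,
        "--hostname", hostname,
        "--cores", str(cores),
        "--memory", str(memory_mb),      # RAM in MB
        "--rootfs", rootfs,              # <storage>:<size in GB>
        "--net0", f"name=eth0,bridge={bridge},ip=dhcp",
        "--unprivileged", "1",
    ]


# Hypothetical template name; list the real ones with `pveam list <storage>`.
cmd = build_pct_create(
    200,
    "local:vztmpl/debian-12-standard_arm64.tar.xz",
    "pihole",
)
print(shlex.join(cmd))
```

Printing the command first, rather than executing it, makes it easy to sanity-check the storage and bridge names against what the node actually has.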

This got created fine:

We now have a running Debian LXC to build upon!

Let's use the one-liner installer:

But first, get curl installed and update packages:

Configure the easy defaults:

Finalize checks:

Browse to the web address and enter credentials:

Update the rules:

Inspect alerts:

This NTP sync error has a fix, though it is somewhat controversial in its implementation. So, I'll post the specific steps I had to do separately.

Stay tuned for Part 2.

The 3rd Return from Being Offline

Jamz Yaneza — Sun, 26 Jan 2025 00:50:07 GMT

... and we're back !!!

Today just so happens to mark a standard month of setting up shop in the new digs. Most of the boxes have been unpacked except for the holiday stuff. I don't think we knew how many decorations we'd amassed over the years, and that's after having gotten rid of all the almost decade-old disintegrating stuff (which had actually been great for Halloween, by the way).

The only ISP offering fiber in my area had already installed their hardware before we moved in, drilling a hole through brick straight into the middle of the living room wall. That limited my options unless I wanted to make another hole to feed out from the ONT modem onto CAT6A. As much as I would have loved to forgo a separate server room, it felt like a blessing of silence when the MikroTik router was turned off.

And what did it take for me to do that?

I was hoping you'd ask. But first, here's my server rack hastily recreated in the garage just to get this blog back up and running:

Yes, I know, all those spaghetti cables!

What you don't see is a CAT6A shielded cable snaking 60 feet across open ground from the living room through brick. Instead, there's a black CAT6 cable coming from the UniFi 60W switch, down-linked from the PoE injector of the UniFi NanoHD access point, into the MikroTik CRS328-24P-4S+RM switch. Apparently, this works thanks to how UniFi currently configures Wi-Fi meshing on access points: what would have been just a power connection instead turns the AP into a network media bridge.

I'm not going to complain; the alternative is to pre-order the recently announced official UniFi Bridge, which is currently out of stock. I'm probably still going to order one when it becomes available, given some network stability issues caused by the signal having to travel through layers of brick to the detached garage. Or ... I might just consider some kind of upgrade to WiFi 6E/7.

It is pretty interesting starting from scratch. I am now realizing that I had over-engineered (of course!) last time by surrounding all corners of the house with access points, even if I did turn down their transmit power. This sort of explains why all my IoT devices still preferred to connect to the UniFi 6 Mesh that I'd placed on the roof rather than to the nearest access point. So, really, in terms of access point coverage, less is more: only add one if you really must.

In my next post I'll talk about my experience with finally getting a real UniFi Gateway onto the rack and the status of the rest of the gear as a result of that network hardware change. I'll be posting an update on progress towards hosting this Ghost blog platform on ARM64 as a result of my previous experiment, too!

Until then, Happy New Year of the "Dragon"!

Installing Proxmox VE 8.x on ARM64 (NanoPi R5s)

Jamz Yaneza — Thu, 05 Dec 2024 08:11:23 GMT

This little journey started the night before Thanksgiving as we were packing for a long-ago planned trip to Japan.

As is my usual wont, I found myself puttering around restlessly with the HomeLab trying to eke out a little bit more from the little bits of hardware I had accumulated since my most recent personal research project. The PCIe board for the Coral Dual Edge TPU had arrived, but my previous attempt at installing it was hijacked when I accidentally shut down the Proxmox VE node that was hosting several of my Docker projects, including about one(1)-TB worth of OpenCTI metadata that hadn't had time to settle and get written correctly to disk. How was I to know that the hardware backplane on the Dell R720xd it was hosted on was going to bork and make my ZFS cluster disappear into the ether?

That unfortunate incident wasn't all too bad. At this point, all that data happened to be not as critical or immediately usable for work stuff, and I'd already rebuilt the setup enough different ways that I've got a working project fork of OpenCTI of my own to help ease the recovery and deployment process. Then, too, I've gone back to having my router, switches, and WiFi on physical machines, so my whole house network and its entertainment side of the equation are not affected. But now I did have a pair of FriendlyELEC NanoPi R5s sitting cold and unused and just begging to be put back into use, not as FriendlyWrt travel routers but as Docker hosts. In fact, I mused, could I run an ARM64 version of Proxmox VE on these things?

The bigger plan, of course, was to see whether it would also be viable to install Proxmox VE onto a high-performance AArch64/ARM64 server and see compute gains on something like Apple Silicon.

It started well enough. I was mildly surprised to know that on the latest 2024-10-16 firmware release it literally said "Added Proxmox VE system". What luck!

Per the instructions, I downloaded the latest Official image from the link and used the Flash Official OS to eMMC option via TF Card.

Further perusing led me to the "Getting Started with Proxmox" page. I used balenaEtcher to write the file onto a 32-GB SDXC card (the multi-OS version takes up about 16GB decompressed), inserted it into the TF card slot, and plugged in the power brick. Then I watched from the old spare monitor connected via HDMI as the system kept booting in a loop for a few hours before I gave up, powered the whole thing down, took a nap, then left for the airport to Japan.

Returning from Nara then Osaka via Haneda to the Narita airport a few days ago, I suddenly wondered if the power brick that I'd used from the UniFi Flex Mini was the culprit. My assumption was that it provided 5V/2A, but to my surprise it was actually only rated for 5V/1000mA, and the boot loop immediately righted itself once I plugged the board into an Anker Prime Power Bank instead. Basic TF to eMMC installation took less than a minute for each SBC.

After that, I made sure to add all the pve8 repos and do an initial update of everything to bring the Proxmox VE version up from 8.2.x to 8.3.0.
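As a footnote to the boot loop: the gap between the rating that was assumed and the one on the label is easy to put in numbers. The snippet below is just that arithmetic (power = volts × amps). It says nothing about what the NanoPi actually draws, which isn't measured here.

```python
def watts(volts, amps):
    """Power in watts from voltage and current."""
    return volts * amps


assumed = watts(5.0, 2.0)   # the rating the brick was assumed to have
actual = watts(5.0, 1.0)    # 1000 mA = 1 A, per the label

print(f"assumed: {assumed:.0f} W, actual: {actual:.0f} W")
print(f"shortfall: {assumed - actual:.0f} W ({actual / assumed:.0%} of assumed)")
```

In other words, the brick could deliver only half the power budget that had been assumed.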

But there was one other problem: I wanted ZFS. Thankfully, I was able to find the source of the Proxmox-Port that FriendlyELEC had used to build their image, and it had part of the installation instructions.

In the case of the NanoPi R5s, I was able to find the header information and kernel that matched it from:

Here's that download:

I gave it a go with just the headers, initially:

And discovered that ZFS wasn't loading:

Checking the available kernels, I noticed that 6.1.57 wasn't even one of the available options:

I'll save you some tinker time and say make sure to install the Linux kernel headers and images first. Then, reconfigure zfs-dkms for good measure, and then reboot:
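A quick sanity check before rebuilding: DKMS compiles modules against `/lib/modules/<kernel release>/build`, which the matching headers package provides, so if that path is missing for the running kernel, ZFS has nothing to build against. Here is a minimal sketch of that check in Python. The helper name is made up for illustration, and it only looks for the directory; it does not verify that the headers are complete.

```python
import os
import platform


def headers_present(release=None, modules_root="/lib/modules"):
    """Return True if kernel headers for `release` look installed.

    DKMS builds against <modules_root>/<release>/build, normally a
    symlink created by the matching linux-headers package.
    """
    release = release or platform.release()   # same value as `uname -r`
    return os.path.isdir(os.path.join(modules_root, release, "build"))


if __name__ == "__main__":
    rel = platform.release()
    state = "found" if headers_present(rel) else "MISSING"
    print(f"kernel {rel}: headers {state}")
```

If this reports the headers as missing for the kernel you booted, reconfiguring `zfs-dkms` will not help until the matching headers are installed.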

And, now this error which was sort of weird:

This didn't fix things all the way and I found a reference to try and do a dkms autoinstall which would help bake-in the missing ZFS modules into the kernel. I tried two-(2) different approaches to see which sequence works best but in the end they both worked.

Sequence A

Sequence B

And profit!

Once the ZFS pool(s) are created to your liking then you can try some of the tips and tricks to create dataset folders. Because we're dealing with ZFS, it is these dataset folders that you can map from the Proxmox VE Datacenter -> Storage options to contain whatever image types you prefer.
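The same dataset-to-storage mapping can be done from the shell with `zfs create` and `pvesm add`. The sketch below just prints the two commands for review. The pool, dataset, and storage ID names are made up, and the content types shown are one reasonable choice rather than the ones used here.

```python
import shlex


def dataset_storage_commands(pool, dataset, storage_id,
                             content=("images", "rootdir")):
    """Return the commands that create a ZFS dataset and register it
    as a Proxmox VE storage of type `zfspool`."""
    full_name = f"{pool}/{dataset}"
    return [
        ["zfs", "create", full_name],
        ["pvesm", "add", "zfspool", storage_id,
         "--pool", full_name,
         "--content", ",".join(content)],
    ]


# "tank", "vmdata" and "tank-vmdata" are placeholder names.
for cmd in dataset_storage_commands("tank", "vmdata", "tank-vmdata"):
    print(shlex.join(cmd))
```

A `zfspool` storage holds VM disk images and container volumes. For ISOs or container templates, the usual route is a `dir` storage pointing at the dataset's mountpoint instead.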

My initial impression, based on a quick test, is that from a usability standpoint it's easier to create an ARM64 virtual machine inside a standard x86 host machine than the other way around. That may simply be due to the limitations of the SBC hardware that I have on hand.

So, in summary, this works thanks to the magic of Linux and hardware cross-compatibility, and all the hard work poured into this Proxmox VE fork. I'd really love to see it officially adopted, and that might even happen if a large enough volume of installs merited official developer support. As it is, Apple no longer makes servers, while Intel and AMD have finally caught up to Qualcomm since I last wrote about it a few months ago.

If I do find an Apple Silicon machine to play around with, you can be sure I'm going to give this build another try and update y'all. Happy tinkering!

Large Action Models vs the Rabbit R1 (and others)

Jamz Yaneza — Fri, 05 Jul 2024 23:24:00 GMT

The Rabbit R1 was introduced during CES 2024, and it was probably one of the promised products that helped make the term Large Action Models (LAM) more widely known compared to mainstream Large Language Models (LLM) and, to a degree, the future iteration which is Large Agentic Models (LAM).

This got me excited since this would essentially be a physical example of what an LLM stand-alone device could be. Yes, I knew it would be buggy. And, so, I pre-ordered as soon as it became possible to do so. I'll get to the unboxing later below.

Rabbit gives their definition of a Large Action Model as:

Whereas the newsletter Towards AI published a comparison last March of how, in their estimation, these currently map to each other:

Towards AI LLM vs LAM vs LAM comparison Q1 2024

However ....

I think there are some earlier examples of agentic model reasoning.

A paper published on arXiv towards the end of January 2024 talked about LLM Situational Awareness. One of the examples was a generated image and an action-flow based on what was seen, including a call to restock whatever was needed for refill.

The thing is, back in 2013 a similar function was available on some refrigerators, as reported on CNET: "The LG Smart Refrigerator knows what you have, knows what you need". But maybe that was a low-hanging-fruit choice of example. The other examples in the paper were more interesting, such as the potential to automatically call emergency services during life-threatening situations. I doubt my smart fridge, 10-year-old technology as it is, would do that. But perhaps an SOS from a smart watch? Or, in the case of vehicle accidents, the onboard computer (the old OnStar emergency services come to mind). Since everyone has a phone, then Android location services?

My point here is that any new use of GenAI should show improvements over things that already exist on many consumer devices.

Once the Rabbit R1 arrived, I let it sit on the table top for a few weeks. I gave everyone the chance to either love or pan it, as well as the developers a chance to fix the pre-release bugs they could. Today, I decided to review it. What are weekends for, right?

Rabbit R1 - Unboxing and First Impressions

I found it odd that Rabbit's service link calls out to a link called VPN Proxy. Maybe it's to circumvent or hedge against some network filtering? Whatever it is, the result was a wait to get to the log-in screen for each service.

In the case of DoorDash, it wouldn't even let me in because there were "numerous logins from my IP". Well, uh, oh! Methinks it's possible that the use of that transitional VPN is the culprit. I'll give it 24 hours, and if it still does the same then that more or less confirms my hunch.
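To illustrate the hunch: if a service counts logins per source address, then many different users arriving through one shared proxy address look like a single noisy client. The toy limiter below shows that effect. There is no visibility here into how DoorDash actually does it, so the class, the limit, and the window are all invented for illustration.

```python
from collections import defaultdict, deque


class LoginThrottle:
    """Toy per-IP limiter: at most `limit` logins per `window` seconds."""

    def __init__(self, limit=5, window=60.0):
        self.limit = limit
        self.window = window
        self._seen = defaultdict(deque)   # ip -> timestamps of recent logins

    def allow(self, ip, now):
        recent = self._seen[ip]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


throttle = LoginThrottle(limit=3, window=60.0)
# Four different users, all arriving through one shared proxy address.
shared_ip = "203.0.113.10"   # documentation-range address, not a real host
results = [throttle.allow(shared_ip, now=t) for t in (0, 1, 2, 3)]
print(results)   # [True, True, True, False]
```

The fourth login is refused even though each user only tried once, which is the kind of behavior a shared VPN egress would produce.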

LAM Agents

(to be continued!)

