DataAcuity — Architecture Overview
System-level picture of everything on .106, how it connects to the rest of The Geek Network, and the architectural patterns it follows. Read this before going deeper into any of the individual service docs.
1. The four servers (where things live)
The Geek Network production runs on four servers; this doc focuses on .106 but everything connects:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ .104 PRIMARY │ │ .105 STANDBY │ │ .106 DATAACUITY│ │ .118 LB / EDGE │
│ ───────────── │ │ ───────────── │ │ MAPS / BI │ │ ───────────── │
│ Windows Server │ │ Windows Server │ │ Ubuntu 24.04 │ │ Windows Server │
│ IIS + 36 APIs │ │ IIS + 36 APIs │ │ Docker (54 c.) │ │ IIS + ARR │
│ PostgreSQL R/W │ │ PostgreSQL R/O │ │ PostgreSQL, │ │ Let's Encrypt │
│ (primary) │ │ (replica) │ │ Valhalla, │ │ ButlerAPI │
│ │ │ │ │ GeoGlobal, │ │ CircleVpnAPI │
│ All TGN app │ │ Read replica │ │ dbt, Superset │ │ │
│ data lives │ │ for ETL + │ │ Grafana, etc. │ │ Traffic split │
│ here │ │ failover │ │ │ │ + SSL term │
└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘
197.97.200.104 197.97.200.105 197.97.200.106 197.97.200.118
.106 is a different beast than the other three. The other three run the TGN ecosystem (apps + APIs + LB). .106 runs a data/intelligence stack that consumes TGN data, enriches it, and surfaces it back as BI + map services.
2. .106 — at a glance
54 Docker containers across 8 logical groups:
┌────────────────────────────────────────────────────────────────────────┐
│ .106 — DataAcuity / Maps / BI │
│ ────────────────────────── │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ 1. MAPS STACK (DataAcuity Maps product) │ │
│ │ maps_api, maps_db, maps_nominatim, maps_osrm, maps_tiles, │ │
│ │ maps_prerender, maps_redis, dataacuity_portal │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ 2. GEO-GLOBAL STACK (new — geocode/POI/routing) │ │
│ │ geo_mcp (MCP server :5026), valhalla (:5027), │ │
│ │ geo_db (PostGIS, 13.4M places + 3.5M POIs) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ 3. APP-SUPPORT APIS (subset of TGN APIs that live here) │ │
│ │ tagme_api (:5023), transit_api (:5030) │ │
│ │ (most TGN APIs live on .104/.105; these two are exceptions) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ 4. DATA WAREHOUSE + BI (the heart of the BI strategy) │ │
│ │ data_warehouse (postgres :5001), dbt_transform │ │
│ │ superset (:5003) + superset_db │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ 5. MARKETS / FINANCIAL DATA │ │
│ │ markets_api (:8000), markets_db, markets_dashboard, │ │
│ │ markets_openbb_backend, markets_redis │ │
│ │ ⚠️ STATUS: scaffolding exists, live data ETL is broken │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ 6. WORKFLOW / AUTOMATION │ │
│ │ n8n (:5008) — actively used (55 MB sqlite, content unknown) │ │
│ │ automatisch (:5004) — empty, 0 flows │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ 7. AUTH / IDENTITY / GATEWAY │ │
│ │ keycloak (:8180) + keycloak_db │ │
│ │ api-gateway-external (:8084), api-gateway-internal (:8083), │ │
│ │ gateway-db, gateway-redis, api-docs (:8082) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ 8. AI / TOOLING / OBSERVABILITY │ │
│ │ ai_brain_webui (Open WebUI :5000), morph_convertx (:5011), │ │
│ │ sandbox_webstudio (:5012), sandbox_developer_ide (:5013), │ │
│ │ twenty_crm (:5005), bio_onelink (:5009), │ │
│ │ dashboard-backend (:5007), markets_dashboard (:5010), │ │
│ │ grafana (:5015), prometheus (:9090), loki (:3100), │ │
│ │ promtail, alertmanager (:9093), cadvisor (:8081), │ │
│ │ node-exporter (:9100), nginx-exporter, redis-exporters │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────────┘
3. Networks (Docker)
The 54 containers sit on 13 Docker networks. The main ones:
| Network | What lives there | Notes |
|---|---|---|
| data-warehouse_data_stack | Most of the .106 stack (geo_db, geo_mcp, valhalla, maps_api, dbt_transform, markets_*, superset, grafana, prometheus, loki, n8n, automatisch, twenty, bio, ...) | Service-name DNS works → geo_mcp reaches valhalla by hostname |
| maps_maps_network | Maps-specific subset (maps_api, maps_db, etc.) | Older / legacy partition; some maps services dual-homed |
| dataacuity_network | dataacuity_portal only | Public portal isolation |
| ai-brain_data_stack | ai_brain_webui only | LLM workload isolation |
| keycloak_network | keycloak + keycloak_db | Auth isolation |
| monitoring | grafana + prometheus + loki + alertmanager + exporters | Observability isolation |
Container-to-container traffic is unencrypted but contained within Docker bridge networks; it doesn't touch the host network interface. Anything published with `-p HOST:CONTAINER` is on the public NIC; see §5.
4. Data flow — three patterns
4.1 App-driven (user-initiated, real-time)
User (mobile/web)
→ .118 (Traefik → SSL term, ARR routing)
→ .104 or .105 (TGN APIs)
→ app's postgres DB on .104
→ on map-related calls: out to .106:5020 maps_api → geo_mcp → geo_db / valhalla
→ on B!/Butler calls: → maps_api or geo_mcp via MCP
4.2 ETL (scheduled, batch)
.104 postgres (primary, R/W) — write traffic from apps
⇣ streaming replication ⇣
.105 postgres (standby, R/O)
⇣ extract job pulls (15 min cadence for events, hourly for dims) ⇣
.106 data_warehouse (raw.*) — landing zone
⇣ dbt transformations (staging → intermediate → marts → analytics) ⇣
.106 data_warehouse (marts.*) — business-ready
⇣ push-back via postgres_fdw or logical replication ⇣
.104 analytics_db — per-app aggregates, app reads locally for low-latency in-app analytics
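The extract cadence in the flow above (15 minutes for events, hourly for dimensions) can be sketched as a small "which jobs are due" check. This is an illustration only: the job names `events` and `dims` and the function `due_jobs` are hypothetical, and the real extract job's scheduler is not described in this doc.

```python
from datetime import datetime, timedelta

# Cadences taken from the flow above; the job names are illustrative, not real job ids.
CADENCES = {
    "events": timedelta(minutes=15),
    "dims": timedelta(hours=1),
}

def due_jobs(last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Return the extract jobs whose cadence has elapsed since their last run."""
    due = []
    for job, cadence in CADENCES.items():
        previous = last_run.get(job)
        # A job that has never run is always due.
        if previous is None or now - previous >= cadence:
            due.append(job)
    return due
```

A scheduler tick would call `due_jobs` with the stored last-run timestamps and launch only the returned jobs, so dimension pulls are skipped on three out of four 15-minute ticks.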
4.3 Enrichment loop (cross-service, real-time during ETL)
.106 dbt (during a transformation)
→ .106 geo_mcp (reverse_geocode, interesting_nearby — enrich location events with city/POI context)
→ .106 valhalla (route distance/duration — enrich trip events with travel context)
→ external FX provider (when markets_api is repaired or via fallback)
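As a rough sketch of the valhalla step in this loop, the snippet below builds a request for Valhalla's `/route` endpoint and pulls distance and duration out of the response. It assumes the service-name URL `http://valhalla:8002` mentioned in §8.3 and the default `auto` costing; the function names are hypothetical, and how dbt actually invokes the enrichment is not specified here.

```python
import json
import urllib.request

# Service-name URL from §8.3; only resolvable from inside the Docker network.
VALHALLA_URL = "http://valhalla:8002/route"

def build_route_request(origin, destination, costing="auto"):
    """Build a Valhalla /route request body for two (lat, lon) pairs."""
    return {
        "locations": [
            {"lat": origin[0], "lon": origin[1]},
            {"lat": destination[0], "lon": destination[1]},
        ],
        "costing": costing,
        "units": "kilometers",
    }

def summarize_route(response):
    """Extract distance (km) and duration (seconds) from a Valhalla route response."""
    summary = response["trip"]["summary"]
    return {"distance_km": summary["length"], "duration_s": summary["time"]}

def fetch_route(origin, destination):
    """POST the request to Valhalla and return the trip summary (needs network access)."""
    body = json.dumps(build_route_request(origin, destination)).encode("utf-8")
    request = urllib.request.Request(
        VALHALLA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return summarize_route(json.load(reply))
```

Keeping request building and response parsing separate from the network call means both can be checked offline; only `fetch_route` needs to run inside the Docker network.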
5. Public exposure — what's reachable from the internet
.106 currently publishes 35+ ports directly to 0.0.0.0 (raw port mapping, no reverse proxy). This is documented in detail in DataAcuity_Security_Posture.md. The summary:
- ✅ Intended public (with their own auth): keycloak (8180), grafana (5015), superset (5003), api-gateway-external (8084), dataacuity_portal (5006)
- ⚠️ Public but should be gateway-only (currently bypassable): geo_mcp (5026), valhalla (5027), maps_api (5020), markets_api (8000), n8n (5008), automatisch (5004), twenty_crm (5005), morph_convertx (5011), ai_brain_webui (5000), bio_onelink (5009), dashboard-backend (5007)
- 🚨 Critical leaks (must be internal-only): data_warehouse postgres (5001), maps_db postgres (5433), loki logs (3100), cadvisor (8081), all the prometheus exporters (9100, 9113, 9121-3, 9188)
- 🟢 Already internal-only (good): geo_db, gateway-db, gateway-redis, keycloak_db, markets_db (internal port), maps_osrm, maps_prerender, maps_redis, superset_db, twenty_db, twenty_redis, automatisch_db, automatisch_redis, bio_db
See DataAcuity_Security_Posture.md for the hardening plan.
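To make the "critical leaks" tier checkable, here is a minimal sketch that scans the Ports column of `docker ps` output for bindings on `0.0.0.0` and flags any that hit the critical ports listed above. It assumes "9121-3" means 9121, 9122 and 9123, and it does not handle published port ranges; the function names are hypothetical.

```python
import re

# Ports the summary above marks as critical leaks ("9121-3" expanded, as an assumption).
CRITICAL_PORTS = {5001, 5433, 3100, 8081, 9100, 9113, 9121, 9122, 9123, 9188}

# Matches a single-port host binding on all interfaces, e.g. "0.0.0.0:5001->5432/tcp".
PUBLIC_BINDING = re.compile(r"0\.0\.0\.0:(\d+)->")

def public_ports(ports_field: str) -> set[int]:
    """Return host ports published on 0.0.0.0 in one container's Ports string."""
    return {int(port) for port in PUBLIC_BINDING.findall(ports_field)}

def critical_leaks(containers: dict[str, str]) -> dict[str, set[int]]:
    """Map container name to the critical ports it publishes publicly."""
    leaks = {}
    for name, ports_field in containers.items():
        hits = public_ports(ports_field) & CRITICAL_PORTS
        if hits:
            leaks[name] = hits
    return leaks
```

The input mapping could be built from `docker ps --format '{{.Names}}\t{{.Ports}}'`; an empty result from `critical_leaks` would be the target state for this tier.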
6. Storage layout
| Path on .106 | What | Approximate size |
|---|---|---|
| /var/lib/docker/volumes/ | All Docker volumes (DBs, caches) | ~50 GB |
| /home/geektrading/valhalla/tiles/ | Africa Valhalla tiles + tile.tar | 8.4 GB |
| /home/geektrading/geo-mcp/ | geo_mcp source code | ~50 KB |
| /home/geektrading/dbt/ | dbt project (DataAcuity DBT Project) | ~10 MB |
| /home/geektrading/api-gateway/ | API gateway config | small |
| /home/geektrading/suite/traefik/ | Traefik config + ACME certs | small |
| /home/geektrading/suite/keycloak/ | Keycloak config | small |
| /home/geektrading/monitoring/ | Grafana/Prometheus/Loki config | small |
| /home/geektrading/backups/ | Nightly DB dumps | ~10 GB rolling |
| /home/geektrading/airbyte/ | Airbyte (status: empty placeholder?) | TBD |
Disk: 352 GB total, ~270 GB used as of 2026-05-28, ~80 GB free.
7. Service catalog — at a glance
What every container does in one line:
| Container | Purpose | Status |
|---|---|---|
| geo_db | PostGIS: 13.4M places + 3.5M POIs | 🟢 LIVE |
| geo_mcp | MCP server v0.4, 6 geo tools | 🟢 LIVE (⚠️ no auth) |
| valhalla | Africa turn-by-turn routing | 🟢 LIVE (⚠️ no auth) |
| maps_api | DataAcuity Historical Maps API | 🟢 LIVE (⚠️ no auth) |
| maps_db | PostGIS for maps_api | 🟢 LIVE (🚨 public port) |
| maps_nominatim | OSM Nominatim geocoder | 🟢 LIVE |
| maps_osrm | OSRM router, but no data loaded | 🟡 EMPTY |
| maps_tiles | Vector/raster tile server | 🟢 LIVE |
| maps_prerender | SEO prerender for maps pages | 🟢 LIVE |
| maps_redis | Cache for maps_api | 🟢 LIVE |
| tagme_api | TagMe backend | 🟢 LIVE (but should be on .104/.105 per pattern) |
| transit_api | Transit routing | 🟢 LIVE |
| data_warehouse | Postgres (BI warehouse) | 🟢 LIVE (🚨 public port 5001) |
| dbt_transform | dbt CLI for transformations | 🟢 LIVE (cron-driven) |
| superset | BI dashboards | 🟢 LIVE |
| superset_db | Postgres for Superset | 🟢 LIVE |
| markets_api | Markets/financial REST API | 🟡 SCAFFOLDING (no data) |
| markets_db | Postgres for markets | 🟡 PARTIAL |
| markets_dashboard | Markets UI | 🟡 PARTIAL |
| markets_openbb_backend | OpenBB backend | 🔴 DEAD (port 8080 timeout) |
| markets_redis | Cache for markets | 🟢 LIVE |
| n8n | Workflow automation | 🟢 LIVE (content audit pending) |
| automatisch | Workflow automation (alternate) | 🟡 EMPTY |
| automatisch_db / automatisch_redis | Backing for automatisch | 🟡 |
| keycloak | Identity / SSO | 🟢 LIVE |
| keycloak_db | Postgres for keycloak | 🟢 LIVE |
| api-gateway-external | External API gateway | 🟢 LIVE |
| api-gateway-internal | Internal API gateway | 🟢 LIVE |
| gateway-db / gateway-redis | Backing for gateway | 🟢 LIVE |
| api-docs | API documentation portal | 🟢 LIVE |
| dataacuity_portal | Public DataAcuity portal | 🟢 LIVE |
| dashboard-backend | DataAcuity dashboard backend | 🟢 LIVE |
| ai_brain_webui | Open WebUI (local LLM frontend) | 🟢 LIVE |
| twenty_crm | Open-source CRM | 🟢 LIVE |
| twenty_db / twenty_redis | Backing for twenty | 🟢 LIVE |
| bio_onelink | Link-in-bio service | 🟢 LIVE |
| bio_db | Postgres for bio | 🟢 LIVE |
| sandbox_webstudio | Web dev sandbox | 🟢 LIVE |
| sandbox_developer_ide | IDE sandbox | 🟢 LIVE |
| morph_convertx | File conversion (ConvertX) | 🟢 LIVE |
| grafana | Dashboards (ops) | 🟢 LIVE |
| prometheus | Metrics scraping | 🟢 LIVE |
| loki | Log aggregation | 🟢 LIVE (🚨 public port 3100) |
| promtail | Log shipper | 🟢 LIVE |
| alertmanager | Alert routing | 🟢 LIVE |
| cadvisor | Container metrics | 🟢 LIVE (🚨 public port 8081) |
| node-exporter | Host metrics | 🟢 LIVE (🚨 public port 9100) |
| nginx-exporter | nginx metrics | 🟢 LIVE (🚨 public port 9113) |
| redis-exporter-* | Redis metrics (3 instances) | 🟢 LIVE (🚨 public ports) |
| postgres-exporter-* | Postgres metrics (2 instances) | 🟢 LIVE (🚨 public ports) |
| frosty_cray (gone) | Previous valhalla build container | - |
8. Architectural patterns we follow (or should)
8.1 Layered data
raw → staging → intermediate → marts → analytics (see DataAcuity_BI_Pipeline.md §8). Never skip a layer.
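One way to read "never skip a layer" is that a model may only depend on models in its own layer or the layer directly before it. The sketch below checks that over a dependency map. Both the `schema.model` naming and the allowance for same-layer references are assumptions, not rules stated in DataAcuity_BI_Pipeline.md.

```python
# Layer order from the rule above.
LAYERS = ["raw", "staging", "intermediate", "marts", "analytics"]

def skipped_layer_edges(dependencies: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (model, upstream) pairs where a model reads from a layer other than
    its own or the one directly before it. Names are assumed to be 'schema.model'."""
    violations = []
    for model, upstreams in dependencies.items():
        model_rank = LAYERS.index(model.split(".", 1)[0])
        for upstream in upstreams:
            upstream_rank = LAYERS.index(upstream.split(".", 1)[0])
            # Allowed: same layer, or exactly one layer upstream.
            if upstream_rank not in (model_rank, model_rank - 1):
                violations.append((model, upstream))
    return violations
```

A dependency map like this could be derived from the dbt manifest; a non-empty result would point at models that reach past their neighbouring layer.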
8.2 Read from replicas, write to primary
ETL reads only from .105. Pushes back to .104 go through analytics_db (a separate DB on .104 with its own write user). Never read from or write to .104 user DBs directly from .106.
8.3 Service-name DNS inside Docker
Containers reach each other by name (geo_mcp → http://valhalla:8002). The host network IP is for external traffic only. Anything internal that uses localhost:5026 instead of geo_mcp:8000 is a smell.
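The "smell" described here can be caught mechanically. As a minimal sketch, the function below flags configured URLs that point at the host (localhost, the loopback address, or the .106 public IP) instead of a container name. The setting keys in the usage note and the function name are hypothetical.

```python
from urllib.parse import urlsplit

# Hostnames that indicate host-level addressing rather than service-name DNS.
HOST_LEVEL_NAMES = {"localhost", "127.0.0.1", "197.97.200.106"}

def host_addressed_urls(settings: dict[str, str]) -> dict[str, str]:
    """Return the settings whose URL targets the host instead of a container name."""
    flagged = {}
    for key, url in settings.items():
        if urlsplit(url).hostname in HOST_LEVEL_NAMES:
            flagged[key] = url
    return flagged
```

For example, a config with `GEO_MCP_URL` set to `http://localhost:5026` would be flagged, while `http://geo_mcp:8000` would pass.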
8.4 Gateway > raw ports (target state)
Public traffic should hit Traefik, which routes it to the appropriate container internally. The current state, with 35+ raw ports published, is a transient dev convenience and a security gap (see DataAcuity_Security_Posture.md).
8.5 SSO via Keycloak (target state)
Anything that an internal user touches (Superset, Grafana, n8n, twenty_crm, ai_brain_webui, dashboard-backend) should authenticate against Keycloak. Today most of these use their own user/password.
8.6 PII gate at intermediate.* (BI pipeline)
The intermediate.* layer in the warehouse is the only place where PII is allowed to go from "present" to "absent". Every model in this layer undergoes compliance review. Every downstream model (marts.*, analytics.*, ml.*) has a PII-absence dbt test that fails the build if anything leaks.
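The real check is a dbt test; as a Python stand-in for its logic, the sketch below scans column metadata (for example, rows from `information_schema.columns`) for PII-looking names in the gated schemas. The name patterns are hypothetical placeholders, and a name-based check like this only catches obviously labelled columns.

```python
import re

# Hypothetical PII column-name patterns; the authoritative list belongs to compliance review.
PII_PATTERN = re.compile(
    r"(email|phone|msisdn|id_number|first_name|last_name|full_name)", re.IGNORECASE
)

# Schemas downstream of the intermediate.* gate, per the rule above.
GATED_SCHEMAS = {"marts", "analytics", "ml"}

def pii_leaks(columns: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Return (schema, table, column) rows in gated schemas whose column name looks like PII."""
    return [
        row
        for row in columns
        if row[0] in GATED_SCHEMAS and PII_PATTERN.search(row[2])
    ]
```

A build step would fail when `pii_leaks` returns anything, mirroring the "fails the build if anything leaks" behaviour described above.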
8.7 K-anonymity ≥ 5 on user aggregates
Any aggregated user-level mart (per-city, per-app, per-day) is checked for k ≥ 5 by dbt test. Suppress or generalise any group that falls below that threshold.
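The suppression half of this rule is simple enough to sketch directly. The snippet assumes each aggregate row carries a distinct-user count under a hypothetical `user_count` key; generalisation (merging small groups into a coarser bucket) is not shown.

```python
K_MIN = 5  # minimum group size from the rule above

def suppress_small_groups(rows: list[dict], count_key: str = "user_count", k: int = K_MIN) -> list[dict]:
    """Drop aggregate rows whose user count is below k."""
    return [row for row in rows if row[count_key] >= k]

def k_anonymity_violations(rows: list[dict], count_key: str = "user_count", k: int = K_MIN) -> list[dict]:
    """Return the rows a k-anonymity test should fail on."""
    return [row for row in rows if row[count_key] < k]
```

`k_anonymity_violations` corresponds to what the dbt test would report, while `suppress_small_groups` shows the remedy applied before a mart is published.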
9. Tech stack
| Layer | Choice | Notes |
|---|---|---|
| OS | Ubuntu 24.04 LTS | Standardised |
| Containerisation | Docker + docker-compose | Per-stack compose files in /home/geektrading/<stack>/ |
| Reverse proxy | Traefik (deployed, partially wired) | Target state owner of all public TLS |
| Identity | Keycloak | Realm: master + project-specific |
| DBs | PostgreSQL 14/15/16, PostGIS | Per-service version pinning is OK |
| Data warehouse | PostgreSQL 15 | Schema-based separation, dbt-managed |
| Transformations | dbt Core 1.7 (postgres adapter) | CLI in dbt_transform container |
| BI | Apache Superset (latest) | Wired to data_warehouse via SQLAlchemy |
| Workflow | n8n (active), Automatisch (empty, 0 flows) | Plus dbt cron for batch |
| Routing | Valhalla 3.5.1 (Africa tiles, gis-ops image) | OSRM is also installed but data-less |
| Geocoding | Custom geo_mcp + Nominatim (legacy) | geo_mcp is the new primary |
| Tile serving | tileserver-gl (maptiler) | For DataAcuity Maps frontend |
| LLM access | Open WebUI + Ollama | For local model access from B! |
| Monitoring | Prometheus + Grafana + Loki + Alertmanager | Standard observability stack |
| Logs shipping | Promtail | To Loki |
| Container introspection | cAdvisor | For Grafana dashboards |
| CRM | Twenty (open-source) | B2B pipeline |
| File conversion | ConvertX (morph_convertx) | |
10. What this folder does NOT cover
For these, look elsewhere in the repo:
| Topic | Where |
|---|---|
| TGN app code | code/Apps/*/ |
| TGN API code | code/APIs/*/ |
| TGN deployment scripts | Deployment/*.ps1 |
| Banking compliance rules | .claude-memory/banking-compliance-rules.md |
| PgBouncer pattern | .claude-memory/deploy-pattern-pgbouncer-cascade.md |
| Server credentials | Deployment/deployment-credentials.ps1 |
| TGN service registry | code/Config/TheGeekNetworkServices.json |
| App design guidelines | docs/DESIGN_GUIDELINES.md |
| Wolverine healing | docs/WOLVERINE_HEALING_PIPELINE.md |
11. Cross-references within this folder
DataAcuity_README.md ← you read this to find docs
DataAcuity_Architecture_Overview.md ← you are here (the picture)
DataAcuity_Security_Posture.md ← what's exposed, how we fix it
DataAcuity_BI_Pipeline.md ← the BI data pipeline design
GeoGlobal_README.md ← GeoGlobal service overview
GeoGlobal_API_Reference.md ← every endpoint
GeoGlobal_Integration_Guide.md ← C#/Blazor patterns
GeoGlobal_Data_Schema.md ← geo_db schema
GeoGlobal_Deployment.md ← GeoGlobal ops runbook