DataAcuity — Security Posture

Status: ⚠️ Current state is NOT production-ready for data traffic. See §3 for blockers and §6 for the hardening plan.
Last audited: 2026-05-28
Owner: Tinashe Bhengu

This document is the honest accounting of what's secure on .106, what isn't, and the prioritised path to making the box safe for real customer data flowing through the BI pipeline.

1. TL;DR

.106 runs 54 Docker containers. The infrastructure for proper security exists (Traefik for TLS, Keycloak for auth, fail2ban active, internal-only Docker networks), but most services are published directly on the public IP without authentication: they were set up for dev convenience and never hardened.

Before the BI pipeline carries real customer data, the following must be true:

No PostgreSQL port is reachable from the public internet
No internal-only service (Loki, cAdvisor, exporters, dashboards) is reachable from the public internet
Every customer-touching API has either Keycloak auth or a documented "this is intentionally public" decision
Backups of data_warehouse are verified restorable
Monitoring + alerting on auth failures, anomalous query volumes, and disk pressure is wired
A documented incident response runbook exists

We're at 0 of 6 right now. Estimated work: 1–2 weeks for the lockdown, 1 week for monitoring + runbooks.

2. Threat model

What we're defending against, in priority order:

Threat	Impact	Mitigation strategy
Unauthenticated read of customer PII from data warehouse	POPIA/GDPR breach, regulatory fines, brand damage	Anonymisation pipeline (§6 of BI Pipeline doc) + lock down `data_warehouse` to internal-only
Public PostgreSQL exposure with brute-force credential attack	Total compromise of warehouse data, lateral movement to other DBs	Close port 5001/5433 to public, require VPN or jump host for direct DB access
Unauthenticated MCP / API abuse (geo_mcp, valhalla, markets)	Rate-limit-free scraping, data exfiltration of POIs/places, denial of service	Front everything with Traefik + API key or Keycloak token
Container escape via cAdvisor / Docker socket	Host compromise, all-container compromise	Close cAdvisor public port, restrict Docker socket access
Log exfiltration via public Loki	PII visible in app logs leaks publicly	Close Loki public port
Admin UI takeover (n8n, twenty, ai_brain_webui, automatisch)	Workflow tampering, CRM data leak, LLM cost runaway	Put behind Keycloak SSO via Traefik
Credential leak via .env files / Docker inspect	Lateral movement	Keep secrets out of committed files; inject them at runtime, preferring Docker secrets where supported, since plain env vars remain readable via docker inspect
DDoS on public services	Service degradation, cost spike	Cloudflare in front of Traefik (TBD) + per-service rate limits
Backup compromise	Data loss + RTO blown	Encrypted backups, off-server replication, restore drills

Out of scope (covered elsewhere):

App-layer authn/authz (lives in TGN AuthAPI per CLAUDE.md)
App-layer business logic abuse (lives in each app's threat model)
Banking compliance specifics (lives in .claude-memory/banking-compliance-rules.md)

3. Current state — the audit

Audit run on .106 on 2026-05-28. Findings grouped by severity.

3.1 🚨 P0 — must fix before any production data traffic

#	Issue	Detail
1	`data_warehouse` PostgreSQL on port 5001 published to 0.0.0.0	Anyone on the internet can attempt to connect. Only the password protects the warehouse. Source: `docker ps` shows `0.0.0.0:5001->5432/tcp`
2	`maps_db` PostgreSQL on port 5433 published to 0.0.0.0	Same as above for the maps database
3	`loki` log aggregator on port 3100 published to 0.0.0.0	All container logs (potentially with PII in app log lines) reachable by anyone who knows the LogQL API. No auth required
4	`cadvisor` on port 8081 published to 0.0.0.0	Full container introspection — anyone can see images, env vars, resource usage, command lines
5	Prometheus exporters (node-exporter :9100, nginx-exporter :9113, redis-exporters :9121/9122/9123, postgres-exporters :9187/9188) all published to 0.0.0.0	Host and DB metrics leak, useful for attackers fingerprinting the system

3.2 ⚠️ P1 — must fix before customer-facing scale

#	Issue	Detail
6	`geo_mcp` on 5026 has no authentication	Anyone can call MCP tools (geocode, reverse_geocode, search_places, route, discover_quest, nearby POIs). Confirmed HTTP 200 on `/sse` with no credentials
7	`valhalla` on 5027 has no authentication	Africa routing free to anyone. Could be abused for free routing-as-a-service
8	`maps_api` on 5020 has no authentication on root or /docs	OpenAPI exposed, all endpoints callable
9	`markets_api` on 8000 has no authentication	Even though data is currently broken, the surface is public
10	Admin UIs published raw: n8n (5008), automatisch (5004), twenty_crm (5005), morph_convertx (5011), ai_brain_webui (5000), bio_onelink (5009), dashboard-backend (5007), api-docs (8082)	Each has its own auth (varies in strength); none are behind a unified SSO. Should all front through Traefik + Keycloak
11	No TLS certificates found at `/etc/letsencrypt/live/` on the host	HTTPS termination is happening somewhere (likely Cloudflare or `.118` ARR) but `.106` itself doesn't terminate TLS. Container-to-container is unencrypted. Internet-to-container plaintext if hit on the host IP directly
12	`tagme_api` (5023) and `transit_api` (5030) live on .106	Per `CLAUDE.md` all 36 APIs should be on `.104`/`.105`. These two are exceptions. Should either be moved or the rule updated and documented

3.3 🟡 P2 — should fix in next 30 days

#	Issue	Detail
13	Keycloak deployed but barely wired	Master realm exists with public_key, but most services don't authenticate against it. Wiring this is the cleanest way to fix #10
14	No verified backup restore drill for `data_warehouse`	`/home/geektrading/backups/` exists but no documentation of when restoration was last tested
15	No documented incident response runbook	If `data_warehouse` is compromised, what do we do? Who's paged? Where's the rotation key? No answer
16	No anomaly alerting on auth failures or query volumes	We have Prometheus/Grafana but no rules for "unusual query rate against geo_db" or "auth fail spike on Keycloak"
17	`replicator` credential in plaintext in `Deployment/deployment-credentials.ps1` (committed to repo)	Should be in a secret manager (Vault, AWS Secrets Manager, Azure Key Vault) — even though the repo is private, this is the wrong pattern
18	No automated PII-leak scanning on dbt models	The BI pipeline §6.3 specifies it; not yet implemented
19	n8n's 55 MB sqlite content is black-box to ops	Could contain credentials, workflows touching customer data, etc. Audit overdue
20	`maps_osrm` runs without data but still consumes resources and presents attack surface	Either load Africa OSM into it or decommission
21	`automatisch` has 0 flows but is publicly exposed	Same as #20 — either commit to using it or remove

3.4 🟢 P3 — nice to have

#	Issue	Detail
22	`fail2ban` is active but the policy isn't documented	Verify policy covers SSH + WebUI auth failures + Postgres
23	No CIS / hardening baseline scan ever run on the host	Run `lynis audit system` for a baseline
24	No SBOM / image vulnerability scan	Run `trivy` against every container image, schedule monthly
25	No automated TLS renewal monitoring	If Traefik's ACME fails silently, certs expire. Need an alert
26	No audit log for who SSHed and what they ran	OS-level audit (auditd, falco) not deployed

4. What's already good ✅

Credit where due — not everything is broken:

Docker network isolation is correctly used for DBs: geo_db, gateway-db, keycloak_db, superset_db, twenty_db, automatisch_db, bio_db, markets_db (internal port), maps_redis, gateway-redis, twenty_redis, automatisch_redis, markets_redis — none are publicly exposed
maps_osrm and maps_prerender correctly internal-only
fail2ban is active on the host (mitigates SSH brute force)
Keycloak is deployed and ready to be wired — the auth infrastructure exists, just isn't used
Traefik is deployed with ACME — TLS infrastructure exists
api-gateway-external / api-gateway-internal containers exist — the gateway pattern is set up
Monitoring stack is comprehensive — Prometheus + Grafana + Loki + Alertmanager + multiple exporters
dbt and the data warehouse infrastructure are sound — schema layering pattern works
Replication from .104 to .105 is healthy — ETL won't have to touch the primary

5. Compliance — what regulators care about

For each regulation the BI pipeline must satisfy, what's the current security gap?

Regulation	Requirement	Current gap
POPIA (SA)	"Reasonable technical and organisational measures to prevent unauthorised access"	Public Postgres ports (P0 #1, #2) — direct violation. Public Loki (P0 #3) — likely violation if any PII in logs
POPIA	"Limit access to what is necessary for the purpose"	No role-based access on warehouse (everyone reading uses `dwh_user`) — partial gap
POPIA	Breach notification to the Regulator and affected data subjects "as soon as reasonably possible" after discovery (Sec 22)	No incident response runbook (P2 #15): gap in capability to comply
FICA	"Records retained 5 years, securely stored"	Backups not verified restorable (P2 #14) — capability gap
GDPR Art. 32	"Pseudonymisation and encryption of personal data"	Anonymisation pipeline designed but not deployed; encryption-at-rest assumed but not verified
GDPR Art. 25	"Privacy by design and by default"	Public Postgres ports violate this on its face
GDPR Art. 30	"Records of processing activities"	No data processing register exists
GDPR Art. 33-34	"Breach notification 72 hours"	Same as POPIA — capability gap
PCI-DSS (if any card data)	Network segmentation, no card data in logs	Out of scope for `.106` (SDPKT handles cards, on `.104`) but verify no card data leaks into `data_warehouse` via the pipeline

Bottom line for compliance: the warehouse cannot accept production customer data until P0 items 1–3 are closed, full stop. Items 4–5 are also blockers given anti-fingerprinting expectations under "reasonable technical measures."

6. Hardening plan — phased

Sequenced so each phase is independently shippable and each phase reduces risk meaningfully.

Phase A — Lock down public ports (3-5 days)

Acceptance criteria: All 🚨 P0 issues from §3.1 closed.

Steps:

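Change the PostgreSQL port mappings for data_warehouse (host port 5001) and maps_db (host port 5433) so they bind to loopback instead of all interfaces, e.g. 0.0.0.0:5001->5432 becomes 127.0.0.1:5001->5432. External access then requires an SSH tunnel or VPN.
```
# in docker-compose.yml (data_warehouse service; do the same for maps_db on 5433)
ports:
  - "127.0.0.1:5001:5432"  # was "5001:5432", which publishes on 0.0.0.0
```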
Same change for loki, cadvisor, all *-exporter containers. These don't need any public access — Prometheus scrapes them from inside the Docker network already.
Update Prometheus scrape configs if any used host.docker.internal:9100 style — switch to container-name targets.
Verify with external nmap from a different IP that ports 5001, 5433, 3100, 8081, 9100, 9113, 9121-9123, 9187, 9188 no longer respond.
Document the new access path for the ops team (SSH tunnel command, Tailscale config, or bastion host).

Estimated time: 1-2 days. Risk: low (the consumers of these ports are all internal).
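
The external verification step above can be scripted. This is a minimal sketch, assuming it is run from a machine outside the server's network and that a plain TCP connect is an acceptable stand-in for the nmap scan. The function name and the default timeout are illustrative, not part of any existing tooling.

```python
import socket

# Ports from the verification step above, with 9121-9123 expanded.
PHASE_A_PORTS = [5001, 5433, 3100, 8081, 9100, 9113, 9121, 9122, 9123, 9187, 9188]


def reachable_ports(host, ports, timeout=2.0):
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    found = []
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or an unreachable host.
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            pass
    return found
```

Calling `reachable_ports(public_ip, PHASE_A_PORTS)` from an outside host should return an empty list once Phase A holds. Any port it does return is still published.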

Phase B — Front public APIs with Traefik + auth (5-7 days)

Acceptance criteria: All ⚠️ P1 issues from §3.2 closed. No service published raw to 0.0.0.0 except the few intentionally public.

Steps:

Inventory all currently-public services — categorise as: (a) intentionally public (api-gateway-external, dataacuity_portal), (b) needs auth wrapper (geo_mcp, valhalla, maps_api, markets_api), (c) admin (n8n, twenty, ai_brain_webui, etc.) needs Keycloak SSO.
Configure Traefik routes for each category, terminating TLS at Traefik. Issue Let's Encrypt certs via existing ACME setup.
For category (b): wrap with an API-key middleware initially (faster), wire to Keycloak token validation as Phase B.5.
For category (c): integrate Keycloak SSO via Traefik's ForwardAuth middleware (or use oauth2-proxy as the intermediate).
Remove the published host ports (the 0.0.0.0: bindings) from the docker-compose file of every wrapped service. Traefik reaches the containers by name over the internal Docker network, so no host port is needed.
Verify with nmap that only ports 80 (redirect), 443 (Traefik), 22 (SSH), and 8084 (api-gateway-external if intentionally separate) respond.

Estimated time: 1 week. Risk: medium (touches every public surface — needs careful change windows).

Phase C — Backups + DR drill (2-3 days)

Acceptance criteria: P2 #14 closed.

Verify the nightly backup job at /home/geektrading/backups/ is actually running and what it backs up
Run a restore drill into a fresh empty container — measure RTO
Set up off-server backup replication to either .118 or external S3
Document the runbook for "data_warehouse is gone, restore it"
Add backup-freshness alert to Prometheus + Alertmanager (warn if backup older than 26 h)
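
The 26 h threshold gives a nightly job two hours of slack before the alert fires. A minimal sketch of the check follows, assuming the backup job can report when it last succeeded. The metric name is hypothetical, and exposing it through node-exporter's textfile collector is one possible wiring, not the existing setup.

```python
from datetime import datetime, timedelta, timezone

MAX_BACKUP_AGE = timedelta(hours=26)  # threshold from the step above


def backup_is_stale(last_backup, now, max_age=MAX_BACKUP_AGE):
    """Return True when the newest backup is older than `max_age`."""
    return now - last_backup > max_age


def textfile_metric(last_backup):
    """Render one Prometheus text-format sample with the backup time as a Unix timestamp.

    The metric name is hypothetical; pick one that fits the existing naming scheme.
    """
    return f"backup_last_success_timestamp_seconds {int(last_backup.timestamp())}\n"
```

With a metric like this scraped, the alert condition could be written in PromQL as `time() - backup_last_success_timestamp_seconds > 26 * 3600`.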

Phase D — Incident response + monitoring (3-5 days)

Acceptance criteria: P2 #15, #16 closed; P1 PII alerting from BI pipeline §11.2 in place.

Write incident response runbook — who's paged, escalation chain, breach notification template
Wire Alertmanager to actually page someone (currently it just collects)
Add Prometheus rules for the alert classes in BI Pipeline §11.2 (P1 alerts page on-call, P2 alerts create tickets)
Auth-failure alerts: Keycloak login failure spike, postgres auth failure spike, fail2ban ban-rate spike
Anomaly alerts: warehouse query volume 3σ above baseline, disk free <20 GB, container restart loop
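
The 3σ rule for query volume can be pinned down with a short sketch. It assumes the baseline is a list of recent per-interval query counts and uses the sample standard deviation. How the baseline window is chosen is left open here.

```python
from statistics import mean, stdev


def is_volume_anomalous(baseline, current, sigmas=3.0):
    """Return True when `current` exceeds the baseline mean by more than `sigmas` standard deviations."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    threshold = mean(baseline) + sigmas * stdev(baseline)
    return current > threshold
```

A perfectly flat baseline has zero deviation, so any increase at all would trip the rule. A minimum absolute margin may be worth adding before this pages anyone.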

Phase E — n8n audit + secret rotation (2-3 days)

Acceptance criteria: P2 #17, #19 closed.

Copy n8n's sqlite out of the container, read it, document every workflow
For workflows touching customer data, ensure they go through the BI pipeline path (not direct DB access) — refactor if needed
Migrate replicator and other secrets out of Deployment/deployment-credentials.ps1 into a secret manager
Rotate any credential that was visible in committed files
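
A first pass over the copied n8n database could look like the sketch below. The table name `workflow_entity` and its `name`, `active` and `nodes` columns are assumptions about n8n's SQLite schema, so list the tables first and confirm them against the actual file before relying on the output.

```python
import json
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the SQLite file."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]


def workflow_inventory(conn):
    """Return [(workflow name, active flag, sorted node types)] for every workflow.

    Assumes a `workflow_entity` table whose `nodes` column holds a JSON list of
    node objects, each with a `type` key. Verify against the real schema.
    """
    inventory = []
    query = "SELECT name, active, nodes FROM workflow_entity ORDER BY name"
    for name, active, nodes_json in conn.execute(query):
        nodes = json.loads(nodes_json or "[]")
        node_types = sorted({node.get("type", "unknown") for node in nodes})
        inventory.append((name, bool(active), node_types))
    return inventory
```

Opening the copy read-only, for example `sqlite3.connect("file:database.sqlite?mode=ro", uri=True)` with the real file name substituted, avoids modifying the evidence. Node types that talk to a database or an HTTP endpoint are the ones to trace for customer data.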

Phase F — Hardening baseline + image scans (2 days)

Acceptance criteria: P3 #23, #24 closed.

Run lynis audit system on .106, document findings
Run trivy image against every image in use, document CVEs rated HIGH or CRITICAL
Schedule both as monthly Grafana-tracked jobs
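
Documenting the scan results is easier from Trivy's JSON output than from its table view. The sketch below assumes reports produced with `trivy image --format json` and treats HIGH and CRITICAL as the reportable severities, which is an assumption about the threshold intended above.

```python
import json

REPORTABLE = {"HIGH", "CRITICAL"}


def load_report(path):
    """Load one Trivy JSON report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def reportable_findings(report):
    """Return [(target, vulnerability id, package, severity)] for reportable entries."""
    findings = []
    # "Results" and "Vulnerabilities" can be absent or null when nothing was found.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in REPORTABLE:
                findings.append(
                    (
                        result.get("Target"),
                        vuln.get("VulnerabilityID"),
                        vuln.get("PkgName"),
                        vuln.get("Severity"),
                    )
                )
    return findings
```

Running this over one report per image gives a flat list that can go straight into the findings document or a monthly diff.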

Phase G — Decommission dead services (1 day)

Acceptance criteria: P2 #20, #21 closed.

Decide on maps_osrm: load Africa OSM data or decommission. It fills the same routing role as Valhalla, so it is redundant if Valhalla covers the use case
Decide: automatisch — start using it or decommission
Stop, remove, document the decision in a CHANGELOG

Total

Approximately 3-4 weeks of focused security work to get .106 to production-ready for data traffic.

7. Definition of "production-ready for data traffic"

For the BI pipeline to start carrying real customer data, ALL of these must be true:

All P0 issues from §3.1 are closed (no public Postgres, no public infra ports)
All P1 issues from §3.2 are closed (no unauthenticated API, no public admin UI)
Backups verified restorable (Phase C complete)
Alerting wired and tested (Phase D complete)
Incident response runbook exists and the on-call rotation knows where it is
PII-absence dbt tests are running on every marts.* model
Compliance sign-off recorded (a named compliance reviewer has audited a sample and signed off)
A documented "kill switch" — a single action that stops all data flow into the warehouse if something is wrong

Anything less is "dev / staging quality only" and the warehouse must contain only synthetic or already-anonymised data.

8. Specific service hardening notes

8.1 `geo_mcp`

Add X-API-Key header check before any tool call — keys issued per consumer (TagMe, Takemehome, Butler)
Rate limit: 600 calls/min per key by default, lower for browser-origin
Log every tool call (consumer, tool, args, latency) to Loki for audit
Behind Traefik with TLS; container itself only listens on Docker network
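
The key check and the per-key rate limit above can be sketched together. This is an illustration under stated assumptions, not the geo_mcp implementation: the key table, function names and status codes are hypothetical, headers are modelled as a plain dict, and a sliding window is one of several ways to enforce 600 calls/min.

```python
import hmac
import time
from collections import defaultdict, deque

# Hypothetical key table, one key per consumer named above.
# Real keys belong in a secret store, not in source.
API_KEYS = {
    "tagme-example-key": "TagMe",
    "takemehome-example-key": "Takemehome",
    "butler-example-key": "Butler",
}


def consumer_for(presented_key):
    """Return the consumer that owns `presented_key`, or None if it is unknown."""
    for known_key, consumer in API_KEYS.items():
        # compare_digest avoids leaking key contents through timing differences.
        if hmac.compare_digest(known_key.encode(), presented_key.encode()):
            return consumer
    return None


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each consumer."""

    def __init__(self, limit=600, window=60.0):
        self.limit = limit
        self.window = window
        self._calls = defaultdict(deque)  # consumer -> timestamps of recent calls

    def allow(self, consumer, now=None):
        now = time.monotonic() if now is None else now
        calls = self._calls[consumer]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.limit:
            return False
        calls.append(now)
        return True


def authorize(headers, limiter, now=None):
    """Return (HTTP status, consumer) for a request's headers."""
    consumer = consumer_for(headers.get("X-API-Key", ""))
    if consumer is None:
        return 401, None
    if not limiter.allow(consumer, now):
        return 429, consumer
    return 200, consumer
```

Keying the limiter on the consumer rather than the source IP keeps one noisy consumer from starving the others. The lower browser-origin limit would be a second limiter instance with a smaller `limit`.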

8.2 `valhalla`

Same API-key gate
Disable /expansion, /trace_attributes, /height if not needed (smaller attack surface)
Cache aggressively (24h on routes — they don't change)

8.3 `maps_api`

Already designed for it (slowapi rate limiter present) — verify the limit is sensible (currently 60/min/IP)
API-key auth for /api/v2/* (the new BI-relevant proxies)
Disable /docs in production unless authenticated

8.4 `data_warehouse`

Move to internal-only port
Role separation: etl_user (write to raw.* only), dwh_user (read all, write staging/intermediate/marts/analytics), analytics_user (read analytics.* only), compliance_user (read everything + access to compliance.token_vault)
Connection limit per role; warn if any role hits 80% of limit
pg_stat_statements enabled for query analytics
Audit logging enabled (or at minimum log all SELECT * FROM intermediate.* access)
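
The role separation above can be written down as a matrix and turned into GRANT statements. This sketch makes several assumptions: schema names are taken from the role descriptions, "read" maps to SELECT and "write" to INSERT/UPDATE/DELETE, "read all" for dwh_user excludes the compliance schema, and etl_user gets no read access because none is listed.

```python
# Schemas named in the role descriptions above; `compliance` is kept separate
# on the assumption that only compliance_user may see the token vault.
DATA_SCHEMAS = ["raw", "staging", "intermediate", "marts", "analytics"]

ROLE_MATRIX = {
    "etl_user": {"read": [], "write": ["raw"]},
    "dwh_user": {"read": DATA_SCHEMAS, "write": ["staging", "intermediate", "marts", "analytics"]},
    "analytics_user": {"read": ["analytics"], "write": []},
    "compliance_user": {"read": DATA_SCHEMAS + ["compliance"], "write": []},
}


def grants_for(role):
    """Return the PostgreSQL GRANT statements implied by ROLE_MATRIX for one role."""
    spec = ROLE_MATRIX[role]
    statements = []
    # A role needs USAGE on a schema before any table privilege inside it applies.
    for schema in sorted(set(spec["read"]) | set(spec["write"])):
        statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {role};")
    for schema in spec["read"]:
        statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};")
    for schema in spec["write"]:
        statements.append(
            f"GRANT INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA {schema} TO {role};"
        )
    return statements
```

`GRANT ... ON ALL TABLES` only covers tables that already exist, so tables created later by dbt would also need matching `ALTER DEFAULT PRIVILEGES` statements. A role that builds models would need CREATE on its schemas as well, which this sketch does not cover.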

8.5 `keycloak`

Master realm: admin password rotated and stored in secret manager
Per-app realm (or client) for: BigBruh!, n8n, Superset, Grafana, twenty_crm, dataacuity_portal, ai_brain_webui
Backup the realm exports nightly

8.6 `superset`

Behind Keycloak SSO via OAuth/OIDC
Row-level security on the warehouse connections so analysts only see appropriate marts
Disable Public role's access to any dataset

8.7 `n8n`

After Phase E audit, behind Keycloak
Encrypt the sqlite at rest (filesystem-level encryption on the volume)
Webhooks (the public attack surface) get their own API-key validation

9. Trusted access patterns (for ops + developers)

After hardening, how does a developer / ops engineer access .106 services?

Need	Path
Web UI access (Superset, Grafana, n8n, dataacuity_portal)	https://.dataacuity.co.za → Traefik → Keycloak SSO → service
API call from TGN app	https://maps.dataacuity.co.za/api/v2/... with `X-API-Key` header → Traefik → maps_api → backend
Direct Postgres access (DBA, analyst)	SSH tunnel: `ssh -L 5001:127.0.0.1:5001 geektrading@.106` (forwards to the host's loopback-bound port from Phase A) → connect locally to `localhost:5001`
Direct container shell (debugging)	SSH to `.106` → `docker exec -it <container> bash` (requires being in the `docker` group, which is locked to named users)
Read logs (Loki)	Grafana UI → Loki data source → LogQL queries. No direct Loki port access
Prometheus metrics	Grafana UI → Prometheus data source. No direct port access
Backup restore (DR drill)	Pull backup from off-server location → load into a fresh container per the documented runbook

10. Open questions — with findings and recommendations

Repo-wide search done 2026-05-28. For each question, what exists today and the recommendation:

10.1 Compliance reviewer / DPO — NOT ASSIGNED

Finding: No named DPO or compliance reviewer anywhere in the repo. Compliance rules are documented (.claude-memory/banking-compliance-rules.md, AppInfo/TrustSeal/TRUSTSEAL_IMPLEMENTATION_PLAN.md) but no person owns sign-off.

Why it matters: Anonymisation Standard §11 requires a named reviewer to sign off on every intermediate.* model before it ships. Without this role, the BI pipeline cannot legally start carrying production PII.

Recommendation: This is a hiring/appointment decision, not a technical one. Three options:

(a) Assign internally — likely the most senior backend/data lead with compliance training; lowest cost, real ongoing time commitment (~2 h / week)
(b) Contract external counsel — POPIA-specialist law firm in SA (Webber Wentzel, ENS, Bowmans all have practices). Higher cost, lower internal burden, more credible to regulators
(c) Hire dedicated DPO — only justifiable at scale; GDPR mandates this once you process EU PII at meaningful volume

Action: Pick (a) or (b) before BI Pipeline Phase 2 starts.

10.2 Incident response process — DOES NOT EXIST

Finding: No IR runbooks, no on-call rotation, no escalation chain, no PagerDuty / Opsgenie / similar. The only related artifacts are .claude-memory/security-audit-critical.md (which documents past P0 findings but no response procedure) and the "ONE connection attempt, then ask" rule in CLAUDE.md (a safeguard, not an IR plan).

Why it matters: POPIA Sec 22 requires breach notification as soon as reasonably possible after discovery, and GDPR Art. 33 sets a 72-hour deadline. We can't meet either without a defined process.

Recommendation: Build it. Doesn't need to be elaborate to start:

Pager: PagerDuty free tier (5 users free) OR Opsgenie free tier (5 users free) OR self-hosted Uptime Kuma, which we could deploy on .106 cheaply
Runbook: One markdown file covering: detection paths, severity classes, escalation chain (named humans with phone numbers), breach-notification template, post-incident review template
Rotation: Even a 1-person "always on-call" is better than nothing; expand to 2-3 once team grows
Tabletop exercise: Quarterly — pick a scenario, walk through the runbook, find gaps

Action: Build during Phase D. Estimate 1 week to ship v1.

10.3 Cloudflare in front of Traefik — PARTIAL USAGE TODAY

Finding: Cloudflare R2 (object storage) is the only Cloudflare service in use (AppInfo/Infrastructure/R2_QUICK_REFERENCE.md). No DNS, no CDN, no WAF.

Why it matters: DDoS mitigation is hard without a CDN. Traefik + fail2ban can handle slow/medium attacks but not real volumetric ones. WAF rules block common attack patterns (SQLi, XSS, path traversal) before they reach our services.

Recommendation: Yes, add Cloudflare in front of public TGN endpoints. Specifics:

Free tier covers most needs: DNS, basic DDoS (Layer 3/4), free SSL, basic WAF rules
Pro tier ($25/mo per zone) adds: image optimisation, advanced rate-limiting, WAF managed rules
Business tier ($200/mo per zone) adds: 100% uptime SLA, advanced WAF, bypass-cache rules
Most pragmatic: Free tier for most domains, Pro for maps.dataacuity.co.za once it carries paid traffic
TLS termination: Cloudflare terminates at the edge and re-encrypts to Traefik (Full (strict) mode). Traefik's ACME-issued certificate stays as the origin certificate that strict mode validates

Action: Onboard .106 services to Cloudflare during Phase B (Traefik wiring). Don't try to do it before Traefik is wired — Cloudflare-in-front of raw exposed ports is worse than current state.

10.4 Secret manager choice — NOTHING CENTRALISED TODAY

Finding: Secrets live in Deployment/deployment-credentials.ps1 (plaintext, committed-but-meant-to-not-be) and GitHub Actions secrets (~50 of them). No Vault, no AWS Secrets Manager, no Azure Key Vault, no Doppler.

Why it matters: The credentials file is committed to the repo. Even with the .gitignore warning, it stays in git history, so that horse has already bolted. Rotation is manual. Audit trail is git log only. This is a serious gap when banking compliance is in scope.

Recommendation: Pick one and consolidate. My ranking for this situation:

HashiCorp Vault (self-hosted on .118) — best feature set, full audit, transit encryption, dynamic credentials. Cost: ops time to run it. Steep learning curve.
Doppler (SaaS, free tier for small teams) — fastest to adopt, good DX, native Docker / CLI integration. Cost: $0–10/user/mo. Outsources your secrets to a third party.
AWS Secrets Manager / Azure Key Vault — only if you already use that cloud for other things; otherwise adds operational surface
Bitwarden Secrets Manager — newer, free self-hosted Vaultwarden + paid SM tier. Worth watching but young

For TGN's current scale and SA jurisdiction, my pick is Vault self-hosted on .118 — keeps secrets in-country, full control, no SaaS lock-in. Doppler is the second-best if simplicity matters more.

Action: Decide + start migration during Phase E (n8n audit + secret rotation are paired in the plan).

10.5 SOC2 readiness — NOT IN ACTIVE PROGRAM

Finding: Listed as a Phase 6 future item in AppInfo/TrustSeal/TRUSTSEAL_IMPLEMENTATION_PLAN.md with $25K budgeted. No active controls inventory, no auditor engagement, no timeline.

Why it matters: SOC2 is enterprise-customer table-stakes if TGN wants to sell DataAcuity / GeoGlobal / BI services to large companies. POPIA + GDPR alone are sufficient for B2C operations but limit B2B sales.

Recommendation: Don't pursue SOC2 now. Reasons:

The hardening work in this doc (Phases A-G) addresses ~70% of SOC2 Type II controls organically — defer formal audit until those land
SOC2 audit takes 6-12 months and ~$25K. Should be timed for when a specific big customer needs it
Trying to "build for SOC2" prematurely tends to over-engineer for hypothetical needs

If/when a deal demands it: revisit. Until then, the work we're doing aligns with future SOC2 readiness without paying the audit tax.

10.6 24/7 on-call — DOES NOT EXIST

Finding: No rotation schedule, no paging integration, no shift docs. De-facto policy is "best effort during business hours."

Why it matters: If the BI pipeline goes down at 2am and the morning's analytics are stale, that's mildly bad. If geo_db is breached at 2am and we don't notice until 9am, that is 7 hours of undetected exposure before the breach-notification process (72 hours under GDPR Art. 33 once we become aware) can even start.

Recommendation: Tier it. Full 24/7 with paid shifts is overkill for current scale. But:

Critical alerts only at night — page only on: confirmed PII breach, all-services-down, payment gateway outage. Everything else waits for morning.
Single-person rotation with PagerDuty/Opsgenie scheduling
Define what's "critical" so the page only fires for things that truly can't wait
Document a 30-min response SLA for critical pages; everything else is best-effort
Quarterly review of pages fired — too many false positives, tune the rules; too few, expand the criteria

Action: Set up during Phase D, paired with the incident response work in §10.2.

Summary of recommended decisions

#	Decision needed	My recommendation	Block on which phase
10.1	Compliance reviewer	Assign internal (a) for v1, retain external counsel (b) for audit	BI Pipeline Phase 2
10.2	Incident response	Build it (PagerDuty/Opsgenie free + 1 markdown runbook)	Security Phase D
10.3	Cloudflare	Yes, free tier; Pro for revenue-bearing domains	Security Phase B
10.4	Secret manager	HashiCorp Vault self-hosted on .118	Security Phase E
10.5	SOC2	Defer; revisit when a customer requires it	none — not blocking
10.6	24/7 on-call	Tiered: critical-only at night, single-person rotation	Security Phase D

These need a thumbs-up before the corresponding phase ships.

11. Cross-references

DataAcuity_BI_Pipeline.md §6 — the anonymisation framework that this security posture supports
DataAcuity_Architecture_Overview.md §5 — the public exposure summary this doc expands on
GeoGlobal_Deployment.md §11 — service-specific hardening for geo_mcp/valhalla
Deployment/deployment-credentials.ps1 — the credentials file that's part of P2 #17
.claude-memory/banking-compliance-rules.md — the SARB/FICA/POPIA rules informing the compliance map in §5
.claude-memory/deploy-pattern-pgbouncer-cascade.md — connection discipline that informs Phase A and B

12. Inspection findings after Phase A — what changed (2026-05-28 PM)

Post-Phase-A inspection of the wider stack turned up a few items that the original audit missed. They affect the upcoming phases.

12.1 Traefik is not running

Compose file at /home/geektrading/suite/traefik/docker-compose.yml ✅ exists
ACME cert data at /home/geektrading/suite/traefik/acme/acme.json (140 KB) ✅ exists
Dynamic config at /home/geektrading/suite/traefik/config/{middlewares,services,tls}.yml ✅ exists with routes for dataacuity.co.za, traefik.dataacuity.co.za etc.
Container itself: not running. docker ps -a --filter name=traefik returns empty
Image: not present locally. docker images | grep traefik returns empty

Implication for Phase B: First task is docker pull traefik:v3.0 then docker compose -f /home/geektrading/suite/traefik/docker-compose.yml up -d. Once the cert refreshes (or proves valid from acme.json), then add the API routes.

12.2 DNS is wired correctly for Traefik

Verified dig +short from .106:

Hostname	Resolves to
`dataacuity.co.za`	`197.97.200.106` (direct A record)
`maps.dataacuity.co.za`	CNAME → `dataacuity.co.za` → `.106`
`auth.dataacuity.co.za`	CNAME → `dataacuity.co.za` → `.106`
`traefik.dataacuity.co.za`	CNAME → `dataacuity.co.za` → `.106`

ACME HTTP-01 challenge will work once Traefik is up.

12.3 Restic backups verified healthy

/home/geektrading/backups/restic-repo inspection:

358 snapshots total, daily cadence verified
Most recent: 2026-05-28 03:00 (this morning's backup)
restic check returns "no errors were found"
Backup script (/home/geektrading/backups/scripts/backup-databases.sh) targets only markets_db + data_warehouse — missing geo_db, maps_db, keycloak_db, superset_db, gateway-db, twenty_db, automatisch_db, bio_db
Snapshots are LOCAL to .106: off-server replication is still outstanding (a Phase C step, tracked with P2 #14)

Implication for Phase C: Less work than expected; the restic infrastructure is solid. Two real gaps:

Expand backup-databases.sh to cover all warehouse-affecting DBs (especially geo_db)
Set up off-server replication of restic-repo to .118 or to an external S3-compatible target

12.4 .106 → .105 PostgreSQL connectivity blocked

docker run --network data-warehouse_data_stack postgres:15-alpine psql -h 197.97.200.105 ... times out — no successful connection in 60 seconds.

Likely causes (most probable first):

Windows Firewall on .105 blocks inbound TCP 5432 from .106's IP
Postgres on .105 (Windows IIS-hosted setup) is bound to localhost / specific IPs only
Network routing between Windows servers (.104/.105) and the Ubuntu DataAcuity server (.106) requires a specific path
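
How the connection fails helps narrow down these causes. A timeout usually means packets are being dropped (firewall or routing), while an immediate refusal usually means the host is reachable but nothing is listening on that address and port. The sketch below classifies a single TCP attempt. The function name is illustrative, and it should be run from .106 or from a container on the same Docker network as the failing test.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify one TCP connection attempt as 'open', 'refused', 'timeout', or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        # Covers cases such as "no route to host" or name resolution failure.
        return "error"
```

A `timeout` result against port 5432 on .105 would fit the firewall or routing explanations. A `refused` result would point at the Postgres listen address instead.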

Implication for BI Pipeline Phase 1: The extract framework cannot run until this is resolved. Needs a workstream on the .104/.105 side: DBA opens a firewall rule + pg_hba.conf entry for .106's IP, using the existing replicator user from Deployment/deployment-credentials.ps1.

12.5 dbt warehouse is empty scaffolding

Real row counts in data_warehouse.datawarehouse (verified 2026-05-28):

Schema	Tables	Real data?
`bronze`, `silver`, `gold` (medallion)	0 tables	Empty
`dbt_dev_marts`	2 tables, 3 rows each	Build-verification only
`dbt_dev_staging`	empty	—
`tgn`	13 monthly event partitions, all 0 rows except `tgn.events_2025_12` (4 rows)	Essentially empty
`public`	dbt metadata only	—

The dbt models listed in DataAcuity_BI_Pipeline.md §8.4 as "already running" exist as SQL but have never produced real output. The pipeline is greenfield from a data-flow perspective. BI Pipeline doc has been corrected.

12.6 Action items added to the hardening plan

Phase B: pull + start Traefik before adding routes (add to §6 Phase B steps)
Phase C: expand backup script to cover all DBs; off-server replication (add to §6 Phase C steps)
BI Phase 0: get DBA to open .105 firewall + pg_hba.conf for .106 (add as pre-req)
DataAcuity_BI_Pipeline.md: corrected to reflect empty-warehouse reality (done 2026-05-28)

13. Change log

Date	Change	By
2026-05-28 (am)	Initial document — audit findings + hardening plan	Tinashe Bhengu
2026-05-28 (pm)	Phase A executed; §10 open questions investigated with recommendations	Tinashe Bhengu
2026-05-28 (pm)	Added §12 inspection findings: Traefik state, restic health, .105 gap, warehouse reality	Tinashe Bhengu