DA DataAcuity by The Geek Network

GeoGlobal — Deployment & Operations

Where the service lives, how it's wired, and how to keep it healthy.

For app consumers: you don't need this. See GeoGlobal_README.md and GeoGlobal_Integration_Guide.md. For ops, SRE, on-call: read this top to bottom.


Topology

All GeoGlobal containers run on .106 — DataAcuity Maps server (197.97.200.106, Ubuntu 24.04.2 LTS).

                                  ┌────────────────────────────────┐
                                  │  maps.dataacuity.co.za (public) │
                                  │  Traefik / nginx                │
                                  └──────────────┬─────────────────┘
                                                 │
                                                 ▼
                            ┌────────────────────────────────────────┐
                            │  maps_api  (FastAPI :5020)              │
                            │  - existing geocode/route endpoints     │
                            │  - NEW: /api/v2/* proxies to geo_mcp    │
                            │       and valhalla                      │
                            └────────────┬───────────────┬────────────┘
                                         │               │
        ┌────────────────────────────────┘               └──────────────────────────────────┐
        ▼                                                                                    ▼
┌──────────────────────────┐                                          ┌────────────────────────────────────┐
│  geo_mcp  (FastMCP SSE   │                                          │  valhalla  (gis-ops 3.5.1          │
│  :8000 internal / :5026  │                                          │   :8002 internal / :5027 external) │
│  external)               │                                          │  - 38,822 Africa tiles             │
│  - 6 MCP tools           │                                          │  - 8.3 GB tile.tar mounted from    │
│  - reads geo_db          │                                          │     /home/geektrading/valhalla/    │
│  - calls valhalla for    │                                          │     tiles                          │
│    route / quest         │                                          └────────────────────────────────────┘
└────────────┬─────────────┘
             │
             ▼
┌──────────────────────────┐
│  geo_db  (PostGIS 16     │
│  :5432 internal)         │
│  - geonames    13.4 M    │
│  - interesting_locations │
│      3.5 M               │
└──────────────────────────┘

geo_db, geo_mcp, valhalla, and maps_api (which hosts the proxy endpoints) all sit on the data-warehouse_data_stack Docker network. Container names resolve via Docker DNS, so geo_mcp reaches valhalla by hostname.
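
To confirm that wiring on the host, a minimal sketch along these lines compares the containers attached to the network against the expected list. The helper name missing_from_network is hypothetical, and the container name maps_api is an assumption taken from the diagram rather than from docker ps output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each expected container that is NOT attached.
# Reads the attached container names (whitespace-separated) on stdin.
missing_from_network() {
  local attached name
  attached=" $(tr '\n' ' ') "          # pad with spaces so whole names match
  for name in "$@"; do
    case "$attached" in
      *" $name "*) ;;                  # attached, nothing to report
      *) echo "$name" ;;
    esac
  done
}

# Usage on .106 (needs Docker access):
#   docker network inspect data-warehouse_data_stack \
#     --format '{{range .Containers}}{{.Name}} {{end}}' \
#     | missing_from_network geo_db geo_mcp valhalla maps_api
#   docker exec geo_mcp getent hosts valhalla   # assumes getent exists in the image
```

Empty output means every expected container is on the network.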

Container inventory

| Container | Image | Internal port | Host port | Network | Restart policy |
|---|---|---|---|---|---|
| geo_db | postgis/postgis:16-3.4-alpine | 5432 | none (internal only) | data-warehouse_data_stack | unless-stopped |
| geo_mcp | geo-mcp:0.4 (locally built) | 8000 | 5026 | data-warehouse_data_stack | unless-stopped |
| valhalla | ghcr.io/gis-ops/docker-valhalla/valhalla:latest | 8002 | 5027 | data-warehouse_data_stack | unless-stopped |

Environment variables

geo_mcp

| Var | Default | Notes |
|---|---|---|
| DB_DSN | host=geo_db user=geo password=geoG10balInit2026 dbname=geoglobal | Connection string to geo_db |
| VALHALLA_URL | http://valhalla:8002 | In-cluster URL of the routing container |
| MCP_TRANSPORT | sse | Either stdio (for CLI invocation) or sse (HTTP) |

valhalla

| Var | Default | Notes |
|---|---|---|
| serve_tiles | True | Must be True for the serve container |
| server_threads | 2 | Increase if CPU headroom (server has 4 cores) |
| use_tiles_ignore_pbf | True | Skip rebuilding; consume the prebuilt tile.tar |
| build_tar | False | Don't rewrite the tar on each restart |
| force_rebuild | False | Don't trigger a rebuild even if config changes |

maps_api (the existing FastAPI proxy — /api/v2/* endpoints to be added)

| Var | Default | Notes |
|---|---|---|
| GEO_MCP_URL | http://geo_mcp:8000 | NEW: for /api/v2/* proxies |
| VALHALLA_URL | http://valhalla:8002 | NEW: for /api/v2/route |
| DATABASE_URL | (existing: maps_db) | Unchanged |
| REDIS_URL | (existing: maps_redis) | Unchanged |
| ENVIRONMENT | production | |
| Rate limit | 60/min/IP via slowapi | Bump via env if needed |
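
For orientation, the inventory and environment tables can be combined into a single launch command. The sketch below only prints that command; how geo_mcp is actually started on .106 (compose file, script, or manual run) is not described here, so treat the flag set as an assumption. The function name geo_mcp_run_cmd is hypothetical.

```shell
#!/usr/bin/env bash
# Print (do not execute) a docker run command for geo_mcp, assembled from the
# container inventory and env var tables. "-e DB_DSN" without a value makes
# Docker copy DB_DSN from the caller's environment, so the password stays out
# of the command line.
geo_mcp_run_cmd() {
  printf '%s ' docker run -d --name geo_mcp \
    --network data-warehouse_data_stack \
    --restart unless-stopped \
    -p 5026:8000 \
    -e "VALHALLA_URL=http://valhalla:8002" \
    -e "MCP_TRANSPORT=sse" \
    -e DB_DSN \
    geo-mcp:0.4
  printf '\n'
}
```

Calling geo_mcp_run_cmd prints the assembled command for review before anyone runs it by hand.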

Volumes / persistence

| Path on host | Mount point | Contents | Backup |
|---|---|---|---|
| /home/geektrading/valhalla/tiles/ | /custom_files on valhalla | valhalla_tiles.tar (8.3 GB) + admin_data/ (99 MB) + valhalla.json (7.5 KB) | No: rebuildable in ~5 h (see the rebuild timings) |
| /home/geektrading/geo-mcp/ | /app on geo_mcp (via docker cp) | server.py (v0.4), Dockerfile, helper scripts | Source is in this repo; backups in git |
| Postgres volume geo_db_data | /var/lib/postgresql/data on geo_db | 8.4 GB DB | Yes: nightly via dataacuity_backup_job |

Health checks

Quick green-light checklist (30 seconds)

# From .106 host
docker ps --filter name='^(geo_db|geo_mcp|valhalla)$' --format 'table {{.Names}}\t{{.Status}}'
# All three should be "Up <duration>"

# geo_db responds
docker exec geo_db psql -U geo -d geoglobal -c "SELECT COUNT(*) FROM geonames;"
# expected: 13434746 (or close)

# geo_mcp serves SSE
curl -sS -o /dev/null -w "%{http_code}\n" --max-time 5 -H "Accept: text/event-stream" http://localhost:5026/sse
# expected: 200, plus a curl timeout message (exit code 28). The SSE stream never closes on its own, so --max-time ends the request.

# valhalla answers /status
curl -sS http://localhost:5027/status | head -1
# expected: {"version":"3.5.1", ...}

# end-to-end Africa route
curl -sS -X POST http://localhost:5027/route -H 'Content-Type: application/json' \
  -d '{"locations":[{"lat":-33.9249,"lon":18.4241},{"lat":-26.2041,"lon":28.0473}],"costing":"auto","units":"km"}' \
  | python3 -c "import sys,json; d=json.load(sys.stdin); print('OK', d['trip']['summary']['length'], 'km')"
# expected: OK 1399.x km
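
The same checklist can be wrapped in a script that returns a single exit code, which is handy for on-call handovers. This is a minimal sketch: the function names run_check, sse_ok and geoglobal_checks are hypothetical, and the geo_db check uses SELECT 1 instead of the full row count to keep it cheap.

```shell
#!/usr/bin/env bash
# Run one named check quietly; print PASS/FAIL and return the check's status.
run_check() {
  local name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS $name"
  else
    echo "FAIL $name"
    return 1
  fi
}

# SSE never finishes, so cap the request and judge it by the status code only.
sse_ok() {
  local code
  code="$(curl -s -o /dev/null --max-time 3 -w '%{http_code}' \
    -H 'Accept: text/event-stream' "$1" || true)"
  [ "$code" = "200" ]
}

# Returns 0 only if every check passes.
geoglobal_checks() {
  local failed=0
  run_check "geo_db query" \
    docker exec geo_db psql -U geo -d geoglobal -c "SELECT 1;" || failed=1
  run_check "geo_mcp /sse" sse_ok http://localhost:5026/sse || failed=1
  run_check "valhalla /status" \
    curl -fsS --max-time 5 http://localhost:5027/status || failed=1
  return "$failed"
}
```

Run geoglobal_checks from the .106 host; a non-zero exit means at least one line reads FAIL.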

Continuous monitoring (Grafana)

The prometheus + grafana stack on .106 already collects:

  • Container up/down state via cadvisor
  • HTTP response codes via nginx-exporter
  • Postgres connection counts via postgres-exporter-warehouse / -markets
  • Disk usage via node-exporter

TODO: add a Grafana dashboard for geo_mcp and valhalla specifically — request rates, p50/p95/p99 latencies, error rates. Currently we just have raw container metrics.

Restart procedures

geo_db — never restart casually

geo_db holds 13.4 M + 3.5 M rows (geonames and interesting_locations). Recovery from an unclean shutdown takes ~5 s, but the container crash-loops if the disk is full (we hit this on 2026-05-28). Always check df -h /home first.

docker restart geo_db
# Wait ~10 s
docker logs --tail 5 geo_db   # look for "database system is ready to accept connections"
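
The disk check can be made a hard gate in front of the restart. A minimal sketch, assuming /home is the filesystem that matters; the 20 GB default threshold is an assumption, not a documented limit, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Free space, in whole GB, on the filesystem that holds the given path.
# df -P gives the stable POSIX layout; column 4 is "Available" in KB.
free_gb() {
  df -Pk "$1" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

# Restart geo_db only when free space is at or above the threshold (GB).
safe_restart_geo_db() {
  local min_gb="${1:-20}" free
  free="$(free_gb /home)"
  case "$free" in
    ''|*[!0-9]*) echo "could not read free space on /home" >&2; return 1 ;;
  esac
  if [ "$free" -lt "$min_gb" ]; then
    echo "refusing to restart geo_db: ${free} GB free, need ${min_gb} GB" >&2
    return 1
  fi
  docker restart geo_db
}
```

Usage: safe_restart_geo_db, or safe_restart_geo_db 40 to demand more headroom.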

geo_mcp — fast restart, safe anytime

docker restart geo_mcp
# ~5 s. Will reconnect to geo_db on first query.

To deploy a new server.py:

# 1. Upload new file to /home/geektrading/geo-mcp/server.py
# 2. Hot-swap into the container (don't need to rebuild the image)
docker cp /home/geektrading/geo-mcp/server.py geo_mcp:/app/server.py
docker restart geo_mcp
docker logs --tail 10 geo_mcp   # confirm "v0.X starting" line
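
Because the hot-swap skips the image build, nothing checks the file before it goes live. A minimal sketch of a guard, assuming python3 is available on the host (its version may differ from the one inside the container, so this only catches plain syntax errors); deploy_server_py is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hot-swap server.py into geo_mcp, but only if the file at least compiles.
deploy_server_py() {
  local src="${1:-/home/geektrading/geo-mcp/server.py}"
  if ! python3 -m py_compile "$src"; then
    echo "syntax check failed, not deploying: $src" >&2
    return 1
  fi
  # Same three steps as the manual procedure; stop at the first failure.
  docker cp "$src" geo_mcp:/app/server.py &&
    docker restart geo_mcp &&
    docker logs --tail 10 geo_mcp
}
```

Usage: deploy_server_py with no arguments deploys the default path; pass another path to deploy a different file.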

valhalla — careful, slow to load

The serve container takes ~30 s to mmap the 8.3 GB tile.tar on startup. During that window /status returns 503 or the connection is reset.

docker restart valhalla
# Wait 45 s, then:
curl -sS http://localhost:5027/status
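
Instead of a fixed wait, the restart can poll /status until it answers. A minimal sketch; wait_for_http is a hypothetical helper, and the 90 s default timeout and 3 s interval are assumptions chosen to sit comfortably above the ~30 s load time.

```shell
#!/usr/bin/env bash
# Poll a URL until it answers with an HTTP success code, or give up after
# roughly <timeout> seconds. Arguments: url [timeout_s] [interval_s]
wait_for_http() {
  local url="$1" timeout="${2:-90}" interval="${3:-3}" waited=0
  until curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "not ready after ${waited}s: $url" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "ready after ~${waited}s: $url"
}
```

Usage: docker restart valhalla && wait_for_http http://localhost:5027/status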

Don't restart valhalla from cron or any unattended automation — coordinate with humans.

Rebuilding Africa routing tiles

The full rebuild script is at /home/geektrading/build_africa_valhalla.sh on .106.

Phases (with timings from the 2026-05-28 build):

| Phase | Duration | Disk delta |
|---|---|---|
| 0. Stop serve container | <1 s | none |
| 1. Clear previous tile dir | ~5 s | 58 GB freed |
| 2. Download africa-latest.osm.pbf (8.4 GB) | ~15 min @ 10 MB/s | 8.4 GB used |
| 3. osmium tags-filter prefilter | ~1 min | unchanged |
| 4. valhalla_build_admins | ~27 min | 99 MB used (admin.sqlite) |
| 5. Parse ways → relations → nodes | ~1 h 15 min | grows then shrinks |
| 6. Build graph (33,162 tiles) | ~1 h 25 min | 40 GB used |
| 7. Enhance graph (2 threads) | ~1 h 40 min | unchanged |
| 8. Validate / clean / tar | ~5 min | tile dir 40 GB → 8.4 GB |
| 9. Launch serve container | ~30 s | none |
| 10. Smoke tests (6 city pairs) | ~30 s | none |

Total wall-clock: ~4 h 50 min from script start to live serve container.

Run it as geektrading user, not root. Output goes to /tmp/build_africa_valhalla.log.

Critical: delete the prefiltered PBF as soon as the parse phase completes. The script does this in Phase 5, but if you need to free disk during a build, running rm /home/geektrading/valhalla/tiles/africa-latest-filt.osm.pbf by hand is safe once "Parsing nodes..." has finished.
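
Before starting a rebuild it is worth checking the two preconditions named here: the right user and enough free disk. A minimal sketch; rebuild_preflight is a hypothetical helper, the 50 GB default comes from the peak-disk estimate for the Africa build, and how the build script is invoked afterwards is left to the operator.

```shell
#!/usr/bin/env bash
# Pre-flight for the tile rebuild. Arguments: [user] [free_gb] [need_gb]
# All three default to live values, but can be passed in to test the logic.
rebuild_preflight() {
  local user="${1:-$(id -un)}"
  local free_gb="${2:-$(df -Pk /home | awk 'NR==2 { print int($4 / 1048576) }')}"
  local need_gb="${3:-50}"
  if [ "$user" != "geektrading" ]; then
    echo "run as geektrading, not $user" >&2
    return 1
  fi
  case "$free_gb" in
    ''|*[!0-9]*) echo "could not read free space" >&2; return 1 ;;
  esac
  if [ "$free_gb" -lt "$need_gb" ]; then
    echo "only ${free_gb} GB free, build peaks around ${need_gb} GB" >&2
    return 1
  fi
  echo "preflight ok"
}
```

Usage: rebuild_preflight && echo "safe to start the build"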

Capacity & limits

Server hardware (.106):

  • 4 CPU cores, 23 GB RAM, 4 GB swap, 352 GB disk
  • Africa Valhalla build needs ~50 GB peak disk and fits in RAM comfortably
  • World Valhalla build was attempted 6 times and failed every time (RAM and disk both short). Don't try a world build on this box.

Current load (2026-05-28 idle): 38 GB / 352 GB disk used. 8 GB / 23 GB RAM used.

Burst capacity for routing: ~50 concurrent route calls before the 2-thread server saturates. For more, scale server_threads and add CPU.

Common problems and fixes

"geo_db crash loops with No space left on device"

The cause is always / filling up. Check usage with docker system df and du -sh /home/geektrading/*, then free space with docker image prune -a -f and by deleting stale tile builds in /home/geektrading/valhalla/tiles/*.bin. Then restart geo_db. Recovery takes a few seconds once disk is freed.

"Valhalla returns No suitable edges near location"

Two causes:

  1. Outside Africa. Expected. Document this in the calling app's UI.
  2. Coordinate is on a non-routable feature (lake, sea, private road). Snap to nearest road manually or call Valhalla's /locate endpoint to find the nearest routable edge.
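
To tell the two causes apart, /locate can be queried for the failing coordinate and the candidate edges counted. A minimal sketch: summarise_locate is a hypothetical helper, and the response shape (a JSON array with input_lat, input_lon and an edges list that may be null) is assumed from Valhalla's locate API, so verify it against the running version.

```shell
#!/usr/bin/env bash
# Summarise a /locate response read from stdin: candidate edges per input point.
summarise_locate() {
  python3 -c '
import json, sys
for entry in json.load(sys.stdin):
    edges = entry.get("edges") or []   # null or missing: nothing routable nearby
    print(entry.get("input_lat"), entry.get("input_lon"), "edges:", len(edges))
'
}

# Usage on .106:
#   curl -sS -X POST http://localhost:5027/locate -H 'Content-Type: application/json' \
#     -d '{"locations":[{"lat":-33.9249,"lon":18.4241}],"costing":"auto"}' \
#     | summarise_locate
```

A count of 0 for a point inside Africa suggests cause 2; each returned edge carries the snapped position to retry with.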

"geo_mcp returns {"detail":"Not Found"}"

You're hitting the wrong path or the wrong service. MCP traffic is SSE on /sse, not /mcp or /api. If you're sending REST requests to geo_mcp, you want maps_api instead: the REST proxy lives there.

"Docker registry credential errors when building"

Docker on .106 has a broken credential helper. Workarounds:

  • Don't pull new images on .106 — build locally and docker save/docker load
  • For hot-fixes use docker cp into a running container instead of rebuilding
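
The first workaround can be sketched as follows. The helper image_archive_name is hypothetical, as is the SSH host alias maps106; docker save and docker load are the standard commands, and docker load reads gzip-compressed archives directly.

```shell
#!/usr/bin/env bash
# Turn an image reference into a file-system-safe archive name.
image_archive_name() {
  printf '%s.tar.gz\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

# On the build machine:
#   img=geo-mcp:0.4
#   docker save "$img" | gzip > "$(image_archive_name "$img")"
#   scp "$(image_archive_name "$img")" geektrading@maps106:/tmp/
# On .106:
#   docker load -i "/tmp/$(image_archive_name geo-mcp:0.4)"
```

No registry is contacted at any point, so the broken credential helper never comes into play.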

"discover_quest returns POIs but routing_error is set"

This means POIs were found but Valhalla's /optimized_route call failed, usually because the chosen POIs extend outside Africa. Retry with a smaller within_km or a more selective theme.

Disaster recovery

If .106 dies completely:

  1. Spin up a new Ubuntu 24.04 box with at least 4 vCPU / 24 GB RAM / 100 GB disk (200 GB recommended for headroom)
  2. Install Docker + docker-compose
  3. Restore geo_db from the latest nightly dump at /home/geektrading/backups/geo_db/. The dump is ~2 GB compressed.
  4. Re-clone /home/geektrading/geo-mcp/ from this repo (the source is mirrored in code/Apps/Tools/geo-mcp/)
  5. Run build_africa_valhalla.sh to rebuild Valhalla tiles (~5 h)
  6. Re-attach DNS for maps.dataacuity.co.za

Total RTO if disk image survived: ~30 min. If we need to rebuild Valhalla from scratch: ~5 h.

  • Build script: /home/geektrading/build_africa_valhalla.sh (on .106)
  • MCP source: /home/geektrading/geo-mcp/server.py (on .106) — mirrored in this repo
  • Postgres data: /var/lib/docker/volumes/geo_db_data/ (on .106)
  • Tile.tar: /home/geektrading/valhalla/tiles/valhalla_tiles.tar (on .106)
  • Last build log: /tmp/build_africa_valhalla.log (on .106, ephemeral — copy elsewhere if you need to retain)

Future ops work

  1. Add Grafana dashboards for GeoGlobal-specific request rates, latencies, error rates
  2. Set up alerts on geo_db connection count > 80, valhalla 5xx rate > 1%, disk free < 20 GB
  3. Build the /api/v2/* proxy endpoints in maps_api (not yet implemented — see GeoGlobal_API_Reference.md for the spec)
  4. Wire geo_mcp into Butler as a registered MCP source (config snippet in GeoGlobal_Integration_Guide.md section 2.7)
  5. Schedule monthly OSM-Africa rebuild so routing stays current with road changes