Skip to content
DA DataAcuity by The Geek Network

GeoGlobal — Data Schema

The backing database is geo_db on .106 (PostgreSQL 16 + PostGIS 3.4). This document is for direct-DB consumers (BigBruh dashboards, analytics, dbt models, ad-hoc queries) and anyone who needs to know what's queryable beyond the API surface.

For most app integration you don't need this — use the REST/MCP API. Direct DB access bypasses caching and rate limits, and is intended for backend/analytical workloads only.

Connection

host:      geo_db    (in-cluster)  | 197.97.200.106:5432 (external — NOT YET EXPOSED, use SSH tunnel)
database:  geoglobal
user:      geo
password:  geoG10balInit2026
ssl:       optional (in-cluster traffic is unencrypted but isolated)

To get a connection string for Deployment/deployment-credentials.ps1 style usage, look at the env var DB_DSN injected into geo_mcp.

Tables overview

Schema Table Purpose Row count (as of 2026-05-28)
public geonames All populated places worldwide 13,434,746
public interesting_locations Points of interest (tourism, historic, natural, etc.) 3,507,164
public spatial_ref_sys PostGIS internal SRID lookup 8500+
tiger.* (US TIGER street network) NOT USED by GeoGlobal — shipped with PostGIS image irrelevant
topology.* (PostGIS topology) NOT USED irrelevant

The tiger.* and topology.* schemas come bundled with the postgis Docker image and aren't populated. Ignore them.


public.geonames

The backbone of geocoding. Every populated place on Earth (cities, towns, villages, hamlets) plus a few other GeoNames feature classes.

Columns

Column Type Notes
id bigint PRIMARY KEY GeoNames ID (the integer ID from geonames.org)
name text NOT NULL Primary name
asciiname text ASCII-folded variant (no diacritics)
alternatenames text Pipe-separated alternate names (e.g. "Cape Town|Kaapstad|iKapa")
country varchar(2) ISO 3166-1 alpha-2 ("ZA", "NG", "KE", ...)
fclass char(1) GeoNames feature class: P = populated place (we filter to P everywhere)
fcode varchar(10) GeoNames feature code (PPL, PPLA, PPLC, PPLX, PPLG, etc.)
population bigint May be 0 or NULL for very small places
lat double precision NOT NULL WGS-84
lng double precision NOT NULL WGS-84
geom geometry(Point, 4326) Generated ST_SetSRID(ST_MakePoint(lng,lat),4326)
timezone text IANA timezone (e.g. "Africa/Johannesburg")
admin1_code varchar(20) First-order subdivision code
admin2_code varchar(80) Second-order subdivision code

Indexes

-- Spatial (PostGIS GIST)
CREATE INDEX geonames_geom_gix ON geonames USING GIST (geom);

-- Lower-name lookup (exact geocoding)
CREATE INDEX geonames_lower_name_idx ON geonames (lower(name));

-- Trigram (fuzzy search via pg_trgm)
CREATE INDEX geonames_name_trgm_idx ON geonames USING GIN (lower(name) gin_trgm_ops);

-- Population for ranking
CREATE INDEX geonames_population_idx ON geonames (population DESC NULLS LAST);

-- Country + class filter (used by all the API queries)
CREATE INDEX geonames_country_fclass_idx ON geonames (country, fclass);

Feature code reference (subset of GeoNames P codes)

Code Meaning
PPLC Capital of a political entity
PPLA Seat of first-order admin division
PPLA2PPLA5 Seat of lower-order admin divisions
PPL Populated place (generic)
PPLG Seat of government
PPLF Farm village
PPLX Section of populated place
PPLS Populated places (cluster)
STLMT Israeli settlement (special case)

Example queries

-- Exact geocode by name (used by /api/v2/geocode)
SELECT name, country, fclass, fcode, population, lat::float, lng::float
FROM geonames
WHERE lower(name) = lower('Cape Town') AND fclass = 'P'
ORDER BY population DESC NULLS LAST
LIMIT 5;

-- Reverse geocode (KNN over PostGIS) — what's near a coord?
SELECT name, country, fclass, population, lat::float, lng::float,
       ST_Distance(geom::geography, ST_SetSRID(ST_MakePoint(18.4241, -33.9249), 4326)::geography)::int AS meters
FROM geonames
WHERE fclass = 'P'
ORDER BY geom <-> ST_SetSRID(ST_MakePoint(18.4241, -33.9249), 4326)
LIMIT 5;

-- Fuzzy search with trigram similarity
SELECT name, country, population, similarity(lower(name), lower('Johanesburg')) AS sim
FROM geonames
WHERE lower(name) % lower('Johanesburg') AND fclass = 'P'
ORDER BY sim DESC, population DESC NULLS LAST
LIMIT 5;

Data refresh

Source: https://download.geonames.org/export/dump/allCountries.zip

Run prefilter_pbf.sh and _load-geonames.sh on .106 to refresh. The full reload takes ~10 min once allCountries.zip is downloaded (~420 MB compressed).


public.interesting_locations

Points of interest. Sourced primarily from OSM tags (tourism=*, historic=*, natural=peak, amenity=place_of_worship, etc.) plus a smaller Wikipedia-geo-tagged set. Each POI has an interest_score between 0 and 1 derived from OSM popularity, Wikipedia presence, and external visitor data.

Columns

Column Type Notes
id bigserial PRIMARY KEY Internal ID
osm_id bigint Source OSM element ID (NULL for non-OSM rows)
osm_type char(1) n = node, w = way, r = relation
name text NOT NULL Primary name
name_local text Local-language variant
description text Free-form description (when known)
category text NOT NULL Top-level: tourism, historic, natural, amenity, religion, landmark, wikidata
subcategory text More specific: museum, viewpoint, peak, place_of_worship, attraction, hotel, ...
source text NOT NULL osm, wikipedia, manual, dataacuity
country varchar(2) ISO 3166-1 alpha-2
lat, lng double precision NOT NULL WGS-84
geom geometry(Point, 4326) Generated
interest_score real 0.0 – 1.0 (NULL allowed for un-scored rows)
tags text[] OSM/Wikipedia tags — {"tourism","museum","must-see"}
wikidata_id text Q-number if known
wikipedia_url text Full URL if known
image_url text Optional thumbnail (Wikimedia Commons or other)
last_updated timestamptz When this row was last refreshed

Indexes

CREATE INDEX poi_geom_gix ON interesting_locations USING GIST (geom);
CREATE INDEX poi_country_idx ON interesting_locations (country);
CREATE INDEX poi_score_idx ON interesting_locations (interest_score DESC NULLS LAST);
CREATE INDEX poi_tags_gin ON interesting_locations USING GIN (tags);
CREATE INDEX poi_category_idx ON interesting_locations (category, subcategory);

Tag distribution (top 20)

As of 2026-05-28:

Tag Rows
tourism 1,596,567
natural 731,797
peak 606,832
historic 585,228
information 543,537
amenity 508,727
place_of_worship 445,433
memorial 272,383
wikidata 269,633
hotel 225,995
artwork 169,726
attraction 143,943
guest_house 115,727
wikipedia 88,966
viewpoint 83,470
spring 72,594
camp_site 57,465
museum 49,384
apartment 45,274
archaeological_site 45,238

Example queries

-- Top-5 most interesting POIs within 20 km of Cape Town
SELECT name, category, subcategory, interest_score,
       ST_Distance(geom::geography,
                   ST_SetSRID(ST_MakePoint(18.4241,-33.9249), 4326)::geography)::int AS meters
FROM interesting_locations
WHERE ST_DWithin(geom::geography,
                 ST_SetSRID(ST_MakePoint(18.4241,-33.9249), 4326)::geography,
                 20000)
ORDER BY interest_score DESC NULLS LAST, geom <-> ST_SetSRID(ST_MakePoint(18.4241,-33.9249), 4326)
LIMIT 5;

-- All POIs tagged "history" within a bounding box (analytics)
SELECT country, COUNT(*)
FROM interesting_locations
WHERE 'history' = ANY(tags) AND ST_Within(geom, ST_MakeEnvelope(15, -35, 35, -22, 4326))
GROUP BY country
ORDER BY COUNT(*) DESC;

-- Top museums by interest score in Africa
SELECT name, country, interest_score, lat, lng
FROM interesting_locations
WHERE 'museum' = ANY(tags) AND country IN ('ZA','NG','KE','EG','MA','TN','GH','SN','TZ','UG')
ORDER BY interest_score DESC NULLS LAST
LIMIT 20;

Interest score derivation

The score is a weighted sum (recomputed nightly by the dbt model geo_interest_scores):

Factor Weight Source
OSM star-tag (tourism=attraction, wikidata=Q123) 0.30 OSM tags
Wikipedia article exists 0.25 Wikipedia geo-tag dump
Wikimedia Commons image 0.15 Wikimedia
Number of visitor reviews / mentions (heuristic) 0.20 external sources, optional
Manual curation override 0.10 manual_overrides table (not yet exposed)

Rows from source='wikipedia' typically score 0.7+. Pure OSM rows score 0.4–0.85 depending on tag richness.

Data refresh

POIs are seeded by:

  1. harvest_pois.sh — pulls OSM extracts per continent, runs osmium tags-filter, loads into interesting_locations via poi_to_tsv.pyCOPY
  2. Wikipedia geo-tagged article import — not yet implemented (planned next phase)

Refresh cadence: monthly is plenty for OSM. Wikipedia geo-tag dumps come out monthly too.


Valhalla tile metadata (read-only)

Valhalla doesn't expose its tile metadata through SQL. If you need stats:

  • Total tiles in build: 38,822
  • Total graph nodes: 36,474,800
  • Total directed edges: 97,307,658
  • Total ways from OSM: 20,919,662 (~363 M nodes considered)
  • Admin polygons: 996 (~52,500 ways)
  • Turn restrictions: 23,405
  • Ferry edges reclassified: 28,496 across 881 ferry endpoints

Inspect the tile manifest at /home/geektrading/valhalla/tiles/valhalla_tiles/tile_manifest.json on .106.


dbt models (analytics)

The dbt_transform container on .106 has a geo_marts/ package that's still being built out. When ready, it'll expose:

  • dim_geonames — denormalised places with admin1/admin2 names joined in
  • fct_poi_density — POI density per 10 km grid cell (for heatmaps)
  • dim_interest_scores_history — interest score over time (so the BigBruh dashboards can show drift)

Schema names will live in data_warehouse.public_geo_marts.*.

Backups

  • geo_db is dumped nightly by the dataacuity backup job to /home/geektrading/backups/geo_db/ (kept for 7 days, weekly snapshot kept for 4 weeks).
  • Valhalla tile.tar is NOT backed up — it's rebuildable from the OSM PBF in ~3 hours. The build script is at /home/geektrading/build_africa_valhalla.sh.
  • The OSM PBF source itself is freely downloadable from openstreetmap.fr.

Future schema work

  1. POI canonicalisation table — collapse duplicate POIs across osm and wikipedia sources by Wikidata Q-ID
  2. geo_places_aliases table — multi-language aliases for cities (currently in alternatenames text blob)
  3. geo_traffic_signals — derived from Valhalla tiles, used for "is this intersection signalled?" lookups
  4. Africa-only POI quality scores — currently the score is globally uniform; African POIs have less Wikipedia coverage so score lower unfairly
Something went wrong on this page. Reload