GeoGlobal — Data Schema
The backing database is geo_db on .106 (PostgreSQL 16 + PostGIS 3.4). This document is for direct-DB consumers (BigBruh dashboards, analytics, dbt models, ad-hoc queries) and anyone who needs to know what's queryable beyond the API surface.
For most app integration you don't need this — use the REST/MCP API. Direct DB access bypasses caching and rate limits, and is intended for backend/analytical workloads only.
Connection
host: geo_db (in-cluster) | 197.97.200.106:5432 (external — NOT YET EXPOSED, use SSH tunnel)
database: geoglobal
user: geo
password: geoG10balInit2026
ssl: optional (in-cluster traffic is unencrypted but isolated)
To get a connection string for Deployment/deployment-credentials.ps1 style usage, look at the env var DB_DSN injected into geo_mcp.
Tables overview
| Schema | Table | Purpose | Row count (as of 2026-05-28) |
|---|---|---|---|
public |
geonames |
All populated places worldwide | 13,434,746 |
public |
interesting_locations |
Points of interest (tourism, historic, natural, etc.) | 3,507,164 |
public |
spatial_ref_sys |
PostGIS internal SRID lookup | 8500+ |
tiger.* |
(US TIGER street network) | NOT USED by GeoGlobal — shipped with PostGIS image | irrelevant |
topology.* |
(PostGIS topology) | NOT USED | irrelevant |
The tiger.* and topology.* schemas come bundled with the postgis Docker image and aren't populated. Ignore them.
public.geonames
The backbone of geocoding. Every populated place on Earth (cities, towns, villages, hamlets) plus a few other GeoNames feature classes.
Columns
| Column | Type | Notes |
|---|---|---|
id |
bigint PRIMARY KEY |
GeoNames ID (the integer ID from geonames.org) |
name |
text NOT NULL |
Primary name |
asciiname |
text |
ASCII-folded variant (no diacritics) |
alternatenames |
text |
Pipe-separated alternate names (e.g. "Cape Town|Kaapstad|iKapa") |
country |
varchar(2) |
ISO 3166-1 alpha-2 ("ZA", "NG", "KE", ...) |
fclass |
char(1) |
GeoNames feature class: P = populated place (we filter to P everywhere) |
fcode |
varchar(10) |
GeoNames feature code (PPL, PPLA, PPLC, PPLX, PPLG, etc.) |
population |
bigint |
May be 0 or NULL for very small places |
lat |
double precision NOT NULL |
WGS-84 |
lng |
double precision NOT NULL |
WGS-84 |
geom |
geometry(Point, 4326) |
Generated ST_SetSRID(ST_MakePoint(lng,lat),4326) |
timezone |
text |
IANA timezone (e.g. "Africa/Johannesburg") |
admin1_code |
varchar(20) |
First-order subdivision code |
admin2_code |
varchar(80) |
Second-order subdivision code |
Indexes
-- Spatial (PostGIS GIST)
CREATE INDEX geonames_geom_gix ON geonames USING GIST (geom);
-- Lower-name lookup (exact geocoding)
CREATE INDEX geonames_lower_name_idx ON geonames (lower(name));
-- Trigram (fuzzy search via pg_trgm)
CREATE INDEX geonames_name_trgm_idx ON geonames USING GIN (lower(name) gin_trgm_ops);
-- Population for ranking
CREATE INDEX geonames_population_idx ON geonames (population DESC NULLS LAST);
-- Country + class filter (used by all the API queries)
CREATE INDEX geonames_country_fclass_idx ON geonames (country, fclass);
Feature code reference (subset of GeoNames P codes)
| Code | Meaning |
|---|---|
PPLC |
Capital of a political entity |
PPLA |
Seat of first-order admin division |
PPLA2 … PPLA5 |
Seat of lower-order admin divisions |
PPL |
Populated place (generic) |
PPLG |
Seat of government |
PPLF |
Farm village |
PPLX |
Section of populated place |
PPLS |
Populated places (cluster) |
STLMT |
Israeli settlement (special case) |
Example queries
-- Exact geocode by name (used by /api/v2/geocode)
SELECT name, country, fclass, fcode, population, lat::float, lng::float
FROM geonames
WHERE lower(name) = lower('Cape Town') AND fclass = 'P'
ORDER BY population DESC NULLS LAST
LIMIT 5;
-- Reverse geocode (KNN over PostGIS) — what's near a coord?
SELECT name, country, fclass, population, lat::float, lng::float,
ST_Distance(geom::geography, ST_SetSRID(ST_MakePoint(18.4241, -33.9249), 4326)::geography)::int AS meters
FROM geonames
WHERE fclass = 'P'
ORDER BY geom <-> ST_SetSRID(ST_MakePoint(18.4241, -33.9249), 4326)
LIMIT 5;
-- Fuzzy search with trigram similarity
SELECT name, country, population, similarity(lower(name), lower('Johanesburg')) AS sim
FROM geonames
WHERE lower(name) % lower('Johanesburg') AND fclass = 'P'
ORDER BY sim DESC, population DESC NULLS LAST
LIMIT 5;
Data refresh
Source: https://download.geonames.org/export/dump/allCountries.zip
Run prefilter_pbf.sh and _load-geonames.sh on .106 to refresh. The full reload takes ~10 min once allCountries.zip is downloaded (~420 MB compressed).
public.interesting_locations
Points of interest. Sourced primarily from OSM tags (tourism=*, historic=*, natural=peak, amenity=place_of_worship, etc.) plus a smaller Wikipedia-geo-tagged set. Each POI has an interest_score between 0 and 1 derived from OSM popularity, Wikipedia presence, and external visitor data.
Columns
| Column | Type | Notes |
|---|---|---|
id |
bigserial PRIMARY KEY |
Internal ID |
osm_id |
bigint |
Source OSM element ID (NULL for non-OSM rows) |
osm_type |
char(1) |
n = node, w = way, r = relation |
name |
text NOT NULL |
Primary name |
name_local |
text |
Local-language variant |
description |
text |
Free-form description (when known) |
category |
text NOT NULL |
Top-level: tourism, historic, natural, amenity, religion, landmark, wikidata |
subcategory |
text |
More specific: museum, viewpoint, peak, place_of_worship, attraction, hotel, ... |
source |
text NOT NULL |
osm, wikipedia, manual, dataacuity |
country |
varchar(2) |
ISO 3166-1 alpha-2 |
lat, lng |
double precision NOT NULL |
WGS-84 |
geom |
geometry(Point, 4326) |
Generated |
interest_score |
real |
0.0 – 1.0 (NULL allowed for un-scored rows) |
tags |
text[] |
OSM/Wikipedia tags — {"tourism","museum","must-see"} |
wikidata_id |
text |
Q-number if known |
wikipedia_url |
text |
Full URL if known |
image_url |
text |
Optional thumbnail (Wikimedia Commons or other) |
last_updated |
timestamptz |
When this row was last refreshed |
Indexes
CREATE INDEX poi_geom_gix ON interesting_locations USING GIST (geom);
CREATE INDEX poi_country_idx ON interesting_locations (country);
CREATE INDEX poi_score_idx ON interesting_locations (interest_score DESC NULLS LAST);
CREATE INDEX poi_tags_gin ON interesting_locations USING GIN (tags);
CREATE INDEX poi_category_idx ON interesting_locations (category, subcategory);
Tag distribution (top 20)
As of 2026-05-28:
| Tag | Rows |
|---|---|
tourism |
1,596,567 |
natural |
731,797 |
peak |
606,832 |
historic |
585,228 |
information |
543,537 |
amenity |
508,727 |
place_of_worship |
445,433 |
memorial |
272,383 |
wikidata |
269,633 |
hotel |
225,995 |
artwork |
169,726 |
attraction |
143,943 |
guest_house |
115,727 |
wikipedia |
88,966 |
viewpoint |
83,470 |
spring |
72,594 |
camp_site |
57,465 |
museum |
49,384 |
apartment |
45,274 |
archaeological_site |
45,238 |
Example queries
-- Top-5 most interesting POIs within 20 km of Cape Town
SELECT name, category, subcategory, interest_score,
ST_Distance(geom::geography,
ST_SetSRID(ST_MakePoint(18.4241,-33.9249), 4326)::geography)::int AS meters
FROM interesting_locations
WHERE ST_DWithin(geom::geography,
ST_SetSRID(ST_MakePoint(18.4241,-33.9249), 4326)::geography,
20000)
ORDER BY interest_score DESC NULLS LAST, geom <-> ST_SetSRID(ST_MakePoint(18.4241,-33.9249), 4326)
LIMIT 5;
-- All POIs tagged "history" within a bounding box (analytics)
SELECT country, COUNT(*)
FROM interesting_locations
WHERE 'history' = ANY(tags) AND ST_Within(geom, ST_MakeEnvelope(15, -35, 35, -22, 4326))
GROUP BY country
ORDER BY COUNT(*) DESC;
-- Top museums by interest score in Africa
SELECT name, country, interest_score, lat, lng
FROM interesting_locations
WHERE 'museum' = ANY(tags) AND country IN ('ZA','NG','KE','EG','MA','TN','GH','SN','TZ','UG')
ORDER BY interest_score DESC NULLS LAST
LIMIT 20;
Interest score derivation
The score is a weighted sum (recomputed nightly by the dbt model geo_interest_scores):
| Factor | Weight | Source |
|---|---|---|
OSM star-tag (tourism=attraction, wikidata=Q123) |
0.30 | OSM tags |
| Wikipedia article exists | 0.25 | Wikipedia geo-tag dump |
| Wikimedia Commons image | 0.15 | Wikimedia |
| Number of visitor reviews / mentions (heuristic) | 0.20 | external sources, optional |
| Manual curation override | 0.10 | manual_overrides table (not yet exposed) |
Rows from source='wikipedia' typically score 0.7+. Pure OSM rows score 0.4–0.85 depending on tag richness.
Data refresh
POIs are seeded by:
harvest_pois.sh— pulls OSM extracts per continent, runsosmium tags-filter, loads intointeresting_locationsviapoi_to_tsv.py→COPY- Wikipedia geo-tagged article import — not yet implemented (planned next phase)
Refresh cadence: monthly is plenty for OSM. Wikipedia geo-tag dumps come out monthly too.
Valhalla tile metadata (read-only)
Valhalla doesn't expose its tile metadata through SQL. If you need stats:
- Total tiles in build: 38,822
- Total graph nodes: 36,474,800
- Total directed edges: 97,307,658
- Total ways from OSM: 20,919,662 (~363 M nodes considered)
- Admin polygons: 996 (~52,500 ways)
- Turn restrictions: 23,405
- Ferry edges reclassified: 28,496 across 881 ferry endpoints
Inspect the tile manifest at /home/geektrading/valhalla/tiles/valhalla_tiles/tile_manifest.json on .106.
dbt models (analytics)
The dbt_transform container on .106 has a geo_marts/ package that's still being built out. When ready, it'll expose:
dim_geonames— denormalised places with admin1/admin2 names joined infct_poi_density— POI density per 10 km grid cell (for heatmaps)dim_interest_scores_history— interest score over time (so the BigBruh dashboards can show drift)
Schema names will live in data_warehouse.public_geo_marts.*.
Backups
geo_dbis dumped nightly by thedataacuitybackup job to/home/geektrading/backups/geo_db/(kept for 7 days, weekly snapshot kept for 4 weeks).- Valhalla
tile.taris NOT backed up — it's rebuildable from the OSM PBF in ~3 hours. The build script is at/home/geektrading/build_africa_valhalla.sh. - The OSM PBF source itself is freely downloadable from openstreetmap.fr.
Future schema work
- POI canonicalisation table — collapse duplicate POIs across
osmandwikipediasources by Wikidata Q-ID geo_places_aliasestable — multi-language aliases for cities (currently inalternatenamestext blob)geo_traffic_signals— derived from Valhalla tiles, used for "is this intersection signalled?" lookups- Africa-only POI quality scores — currently the score is globally uniform; African POIs have less Wikipedia coverage so score lower unfairly