Teaching Valhalla That Hiking Is Not Flat
A walking route across a city and a hiking route up a mountain should not share the same clock.
That sounds obvious until you ask a routing engine for an ETA. Most routing systems are good at distance, turn costs, access rules, and road classes. Hiking time is different. A two-kilometer trail can be easy, slow, dangerous, or all three depending on slope, descent, surface, visibility, and how the local trail network publishes reference times.
For Globus routing, this matters because we use Valhalla deeply. Valhalla already has the graph, the route shape, and elevation in the tiles. The question was whether we could use that information to make hiking ETAs less optimistic without breaking tile compatibility or inventing a regional formula that only works in one mountain range.
Over the last research pass, we built the corpus, tested the formulas, rejected several tempting models, used a deliberately expensive research hook to learn what worked, and then moved the result into production-shaped Valhalla code.
The problem
Flat walking speed is not a hiking model.
If the route is a paved path through a park, distance explains most of the time. If the route climbs 700 meters, distance alone becomes a bad predictor. If the route descends a steep alpine trail, even "downhill is faster" stops being true.
The failure mode is usually optimism. The route looks short on the map, so the ETA looks short in the app. Anyone who has followed a marked mountain route knows the posted time often tells a more honest story.
The goal was not to make a perfect hiking guide. The goal was narrower:
- use information Valhalla already has
- preserve old-tile and new-code compatibility
- avoid a formula that only works in Poland, Switzerland, or Italy
- keep the model explainable enough to debug
- measure against real reference hiking times, not intuition
The data we used
We started in Lesser Poland because many OSM hiking relations there include reference durations in both directions. Those are useful because the same trail can have different uphill and downhill times.
Then we added Switzerland. Swiss hiking time is especially interesting because the public signpost system has a long-running mathematical tradition. The exact coefficient table is not public, but the official guidance is clear: time depends on horizontal distance and slope, with special care for very steep sections.
Finally, we added an Italian Alps stress corpus. That gave us more short, steep, difficult routes and a separate check on whether a model tuned on Poland and Switzerland was learning hiking time or just learning local habits.
The final comparison used exact relation-shape elevation profiles where possible. Endpoint routing is useful for service testing, but for model validation it can add detours that are not part of the published hiking relation. Exact-shape validation is cleaner: compare the route geometry that owns the reference duration against the model prediction for that same geometry.
What did not survive
The first obvious idea was accumulated elevation gain and loss.
That gets you much further than distance alone. It knows that a route with 600 meters of ascent is not a normal walk. It also preserves direction: the uphill and downhill versions of the same trail can differ.
But gain/loss alone loses too much detail. It does not know whether the ascent is steady or broken into steep steps. It does not know that mild downhill can be fast while steep downhill can be slow. It also reacts badly when noisy elevation samples inflate tiny ups and downs.
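The noise problem is easy to reproduce. The following is a minimal sketch, not our pipeline code: the function name and the hysteresis threshold are assumptions made for illustration.

```python
def gain_loss(elevations, threshold=0.0):
    """Accumulate ascent and descent in meters.

    A change is ignored until the profile has moved at least `threshold`
    meters away from the last accepted elevation. The threshold is a
    hypothetical noise filter used only for this illustration.
    """
    gain = 0.0
    loss = 0.0
    anchor = elevations[0]
    for elevation in elevations[1:]:
        delta = elevation - anchor
        if abs(delta) < threshold:
            continue  # too small to trust, keep the old anchor
        if delta > 0:
            gain += delta
        else:
            loss -= delta
        anchor = elevation
    return gain, loss


# A gentle 1 m per sample climb with +/-2 m noise on alternating samples.
noisy = [i + (2 if i % 2 else -2) for i in range(11)]

raw = gain_loss(noisy)                      # every wiggle is counted
filtered = gain_loss(noisy, threshold=5.0)  # small wiggles are ignored
```

The raw call reports far more ascent and descent than the 10 meters the profile actually climbs between its first and last sample. Any formula driven by those totals inherits that inflation.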
We also tested difficulty tags such as sac_scale. The signal is real on some Polish high-alpine segments. It also overfits easily. Some Swiss routes with high difficulty tags were already predicted too slow by slope-based models, so adding a generic technical penalty made them worse.
We tested Tobler-style walking functions. The shape is useful: mild downhill is fastest, climbing is slower, and steep descent eventually becomes slow too. But the literal formula was too optimistic for our combined data. Tuned Tobler variants worked better, especially in Switzerland, but still did not become the best single answer across all corpora.
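For reference, the literal Tobler hiking function is usually written as speed = 6 * exp(-3.5 * |S + 0.05|) in km/h, where S is rise over run. The sketch below implements only that published form. The function names are ours, and the tuned variants mentioned above are not shown.

```python
import math


def tobler_speed_kmh(slope):
    """Literal Tobler hiking function; slope is rise over run (0.10 = 10%)."""
    return 6.0 * math.exp(-3.5 * abs(slope + 0.05))


def tobler_sec_per_km(slope):
    """Convert the Tobler speed into a pace in seconds per kilometer."""
    return 3600.0 / tobler_speed_kmh(slope)


for pct in (-20, -5, 0, 10, 20):
    print(f"{pct:+d}%: {tobler_sec_per_km(pct / 100.0):.0f} s/km")
```

The peak of 6 km/h sits at a 5% descent. On flat ground the function gives about 5 km/h, roughly 715 seconds per kilometer.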
We tested Weber-like slope polynomials inspired by the Swiss signpost method. These performed very well, and one blended polynomial was the best aggregate MAE in our clean check set. The problem was production shape: high-order polynomial coefficients are hard to reason about and easier to destabilize outside the fitted range.
The useful lesson was not "use this exact formula." It was: hiking ETA wants a slope-speed curve.
The model we kept
The model we kept is a bounded slope curve.
During research we first ran it at path level. Valhalla collected the elevation profile along the routed path, smoothed slope over a 150-meter window, then assigned seconds per kilometer from a small piecewise curve:
slope band:   -40%  -20%   -8%   -5%   +3%   +8%  +20%  +40%
seconds/km:   2520  1270  1130   880  1020  1110  2330  4120
The shape is intentionally boring:
- mild downhill is fastest
- steep downhill slows down
- climbing slows down as grade increases
- extreme slopes are clamped instead of extrapolated
- the same formula runs in every region
This is less mathematically elegant than a fitted polynomial. It is easier to inspect, easier to bound, and easier to explain when a route is wrong.
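A bounded curve like this fits in a few lines. The sketch below uses the table values above and assumes linear interpolation between the anchor slopes. That interpolation rule is our assumption for illustration; a step function per band would be an equally valid reading of the table. The names are hypothetical.

```python
from bisect import bisect_right

# Anchor slopes in percent and the pace assigned at each anchor.
SLOPE_PCT = [-40, -20, -8, -5, 3, 8, 20, 40]
SEC_PER_KM = [2520, 1270, 1130, 880, 1020, 1110, 2330, 4120]


def pace_sec_per_km(slope_pct):
    """Return seconds per kilometer for a smoothed slope in percent.

    Slopes beyond the first or last anchor are clamped, not extrapolated.
    Between anchors the pace is linearly interpolated (an assumption).
    """
    if slope_pct <= SLOPE_PCT[0]:
        return float(SEC_PER_KM[0])
    if slope_pct >= SLOPE_PCT[-1]:
        return float(SEC_PER_KM[-1])
    hi = bisect_right(SLOPE_PCT, slope_pct)
    lo = hi - 1
    frac = (slope_pct - SLOPE_PCT[lo]) / (SLOPE_PCT[hi] - SLOPE_PCT[lo])
    return SEC_PER_KM[lo] + frac * (SEC_PER_KM[hi] - SEC_PER_KM[lo])
```

Because the output can never leave the range of the table, a bad elevation sample can make one window slow, but it cannot produce an absurd pace.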
That path-level version was useful for research because it saw the whole route. It was not the right production shape. Valhalla routes are made of directed edges, and route costing needs to be fast. So after the model looked good, we checked whether the same curve still behaved well when computed edge by edge, using 150-meter chunks inside each graph edge and a shorter final chunk when needed.
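The edge-local chunking can be sketched as follows. This is an illustration, not the Valhalla implementation: `elevation_at` and `pace_sec_per_km` are hypothetical callables supplied by the caller, and taking each chunk's slope as endpoint rise over run is an assumption.

```python
def chunk_lengths(edge_length_m, chunk_m=150.0):
    """Split an edge into full chunks plus one shorter final chunk."""
    lengths = []
    remaining = edge_length_m
    while remaining > chunk_m:
        lengths.append(chunk_m)
        remaining -= chunk_m
    if remaining > 0:
        lengths.append(remaining)
    return lengths


def edge_hiking_seconds(edge_length_m, elevation_at, pace_sec_per_km,
                        chunk_m=150.0):
    """Sum the hiking time over the chunks of one directed edge.

    elevation_at(distance_m) returns meters above sea level along the edge.
    pace_sec_per_km(slope_pct) returns the pace for a slope in percent.
    """
    total = 0.0
    start = 0.0
    for length in chunk_lengths(edge_length_m, chunk_m):
        end = start + length
        rise = elevation_at(end) - elevation_at(start)
        slope_pct = 100.0 * rise / length
        total += pace_sec_per_km(slope_pct) * length / 1000.0
        start = end
    return total
```

A 400-meter edge becomes two 150-meter chunks and one 100-meter chunk. Each chunk depends only on data inside its own edge, so the total can be computed once at tile build time.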
The numbers
After quality filtering, the clean check set had 250 directional route samples across Poland, Switzerland, and Italy.
The best aggregate model was the Weber/global blend:
clean combined check MAE: 7.532 minutes
clean combined check MAPE: 13.312%
bias: -1.694 minutes
The selected bounded slope curve was almost tied:
clean combined check MAE: 7.578 minutes
clean combined check MAPE: 13.135%
bias: -2.123 minutes
So we gave up 0.046 minutes of average absolute error, roughly three seconds per route, for a model with a clearer physical shape and slightly better percent error.
That tradeoff is worth it. Routing models live longer than benchmark scripts. A tiny metric win is not enough if the formula is harder to debug, harder to port, or easier to break with new data.
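For readers who want to reproduce this kind of comparison on their own corpus, the three metrics are straightforward. The sketch below assumes bias means predicted minus reference, so a negative value is an optimistic model. That sign convention and the function name are assumptions for this example.

```python
def eta_metrics(predicted_min, reference_min):
    """Return (MAE in minutes, MAPE in percent, bias in minutes).

    Errors are predicted minus reference, so a negative bias means the
    model predicts shorter times than the published reference on average.
    """
    errors = [p - r for p, r in zip(predicted_min, reference_min)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / r for e, r in zip(errors, reference_min)) / n
    bias = sum(errors) / n
    return mae, mape, bias


# Two made-up routes: one predicted 10 minutes short, one 10 minutes long.
mae, mape, bias = eta_metrics([50.0, 110.0], [60.0, 100.0])
```

MAE and MAPE can rank models differently, as they did here, because MAPE weights errors on short routes more heavily than errors on long ones.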
The remaining errors were instructive too. Quality-flagged rows had roughly 30 minutes MAE no matter which model we used. Some were likely source-format issues. One Swiss relation had a bare numeric duration of 334; read as minutes that is 5:34, but it may have been intended as a compact 3:34. Other failures had an elevation mismatch between the route profile and the source ascent/descent values.
That is important: some outliers are not model problems. They are corpus problems, geometry problems, or missing machine-readable trail difficulty.
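The duration ambiguity can be made explicit at import time. The parser below is a hypothetical sketch, not our corpus tooling: it handles only "H:MM" and bare digits, and it returns every plausible reading so that ambiguous rows can be flagged before they reach the metrics.

```python
def duration_candidates(raw):
    """Return the plausible minute values for a duration string.

    "H:MM" has one reading. A bare number of three or more digits could
    be plain minutes or a compact HMM value with the colon dropped.
    """
    raw = raw.strip()
    if ":" in raw:
        hours, minutes = raw.split(":", 1)
        return [int(hours) * 60 + int(minutes)]
    candidates = [int(raw)]
    if len(raw) >= 3 and int(raw[-2:]) < 60:
        candidates.append(int(raw[:-2]) * 60 + int(raw[-2:]))
    return candidates


def is_ambiguous(raw):
    """True when the string has more than one plausible reading."""
    return len(duration_candidates(raw)) > 1
```

For the Swiss relation above, "334" yields both 334 and 214 minutes, which is exactly the 5:34 versus 3:34 question.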
The production-shaped edge-local check used Poland and Switzerland exact edge traces. On the clean check group, edge-local 150-meter segments produced:
clean check MAE: 8.361 minutes
clean check MAPE: 11.053%
bias: -2.590 minutes
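The path-level and edge-local versions were close enough to continue with the faster edge-local implementation. These figures come from the Poland and Switzerland traces only, so they should not be read side by side with the three-country numbers above. On the combined check set we used for that comparison, the edge-local version was slightly better, and, more importantly, it can be precomputed in tiles.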
Compatibility mattered
Compatibility mattered from the beginning.
The research hook did not require new tiles. It decoded elevation that was already present in existing Valhalla tiles and computed the path-level slope curve while building the route response. That let us iterate without rebuilding the planet.
The production implementation keeps the same compatibility idea, but moves the work to tile generation. Valhalla already has an extended directed-edge block that can carry extra per-edge attributes without changing the base DirectedEdge layout. We use that extension to store one precomputed hiking duration per directed edge.
The storage is intentionally small:
- 16 bits for hiking seconds
- one validity bit
- remaining bits reserved for future extension
Sixteen bits give us up to 65,535 seconds for a directed edge, a little over 18 hours. That is enough for real graph edges. If tile generation ever computes more than that, it logs a warning and clamps the stored value so the edge can be investigated.
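The clamp-and-flag behavior can be sketched as below. The bit positions are hypothetical and chosen only for illustration; they are not the actual layout of the Valhalla extension word.

```python
import logging

MAX_SECONDS = 0xFFFF   # largest value that fits in 16 bits (65,535)
VALID_BIT = 1 << 16    # hypothetical position of the validity bit


def pack_hiking(seconds):
    """Pack hiking seconds and a set validity bit into one integer."""
    value = max(round(seconds), 0)
    if value > MAX_SECONDS:
        logging.warning("hiking seconds %d exceed 16 bits, clamping", value)
        value = MAX_SECONDS
    return VALID_BIT | value


def unpack_hiking(word):
    """Return stored seconds, or None when the validity bit is unset."""
    if not word & VALID_BIT:
        return None
    return word & MAX_SECONDS
```

The validity bit is what separates "this edge takes zero seconds" from "this tile never stored a value". An all-zero word reads as missing.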
This gives us the compatibility property we wanted:
- old code can still read new tiles because the base edge layout is unchanged
- new code can still read old tiles because missing extension data falls back to the old pedestrian timing formula
- the new model is opt-in through pedestrian type=hiking
- normal foot, wheelchair, and blind pedestrian behavior stays unchanged
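The read-side decision can be sketched in a few lines. This is an illustration with hypothetical names, and the flat-speed calculation is only a stand-in for the existing pedestrian timing formula, which does more than this.

```python
def costing_seconds(length_m, pedestrian_type, hiking_seconds=None,
                    flat_speed_kmh=5.0):
    """Choose the traversal time for one directed edge.

    hiking_seconds is None when the tile carries no valid extension
    value, for example an old tile read by new code.
    flat_speed_kmh is a placeholder for the existing pedestrian formula.
    """
    if pedestrian_type == "hiking" and hiking_seconds is not None:
        return float(hiking_seconds)
    # Every other pedestrian type, and any edge without stored data,
    # keeps the old behavior.
    return length_m / 1000.0 / flat_speed_kmh * 3600.0
```

The stored value is used only when both conditions hold. Any other combination produces the same time as before.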
There is one subtle but important detail: edges are directed. The two directions of the same trail can have different hiking times, because uphill and downhill are not symmetric. The extension stores only one value per directed edge. Tile generation may compute forward and reverse values while processing a shared shape, but each directed edge receives only the value for its own orientation.
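The asymmetry falls out of flipping the slope sign. The sketch below computes both directions from one shared shape. The function name is hypothetical, and the `pace_sec_per_km` callable stands in for the slope curve.

```python
def directed_seconds(distances_m, elevations_m, pace_sec_per_km):
    """Return (forward, reverse) hiking seconds for one shared shape.

    distances_m are cumulative distances along the shape and
    elevations_m the matching elevations. The reverse direction walks
    the same segments with the opposite slope sign.
    """
    forward = 0.0
    reverse = 0.0
    for i in range(1, len(distances_m)):
        run = distances_m[i] - distances_m[i - 1]
        if run <= 0:
            continue  # skip duplicate shape points
        slope_pct = 100.0 * (elevations_m[i] - elevations_m[i - 1]) / run
        forward += pace_sec_per_km(slope_pct) * run / 1000.0
        reverse += pace_sec_per_km(-slope_pct) * run / 1000.0
    return forward, reverse


# Stand-in pace using the +20% and -20% values from the curve.
def toy_pace(slope_pct):
    return 2330.0 if slope_pct > 0 else 1270.0


fwd, rev = directed_seconds([0.0, 500.0, 1000.0], [0.0, 100.0, 200.0], toy_pace)
```

In this example the uphill edge would store `fwd` and its opposing edge would store `rev`. Neither edge needs to know about the other at routing time.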
What this means for routing
The important result is not one magic coefficient table. It is a safer direction for hiking ETA:
- distance-only walking is too optimistic for mountain routes
- raw accumulated gain/loss helps but is not enough
- generic difficulty penalties overfit quickly
- slope-window models transfer better across countries
- quality flags are necessary before trusting outliers
- the production formula should be bounded and inspectable
For product users, the expected behavior is simple. Flat walks should stay close to normal walking time. Moderate downhill can be faster. Steep climbs and steep descents should stop looking like short city walks.
For Valhalla deployments, the operational point is also simple. We can improve hiking ETA using existing elevation data, without a breaking tile-format migration. The production version computes the hiking seconds during elevation building and reads them during pedestrian costing when the user asks for type=hiking.
What comes next
The research path is finished. The production code is in place.
Hiking ETA will never be as clean as road speed. Trails carry local timing conventions, ambiguous geometry, incomplete difficulty tags, and human caution. But it can be much less naive than flat walking speed.
That is the kind of routing work we like at Globus: small enough to explain, measured against real routes, and compatible with the map data people already have.
If you are building routing, field service, travel planning, outdoor navigation, or offline maps, start with Globus. Register at user.globus.software, create an API key, and try the routing stack in your own product. If you want to evaluate Valhalla navigation tiles or discuss a custom routing deployment, write us at [email protected].
