Teaching Valhalla That Hiking Is Not Flat

June 19, 2026 · 10 min read

Evgen Bodunov

CEO @ Globus Software

A walking route across a city and a hiking route up a mountain should not share the same clock.

That sounds obvious until you ask a routing engine for an ETA. Most routing systems are good at distance, turn costs, access rules, and road classes. Hiking time is different. A two-kilometer trail can be easy, slow, dangerous, or all three depending on slope, descent, surface, visibility, and how the local trail network publishes reference times.

For Globus routing, this matters because we use Valhalla deeply. Valhalla already has the graph, the route shape, and elevation in the tiles. The question was whether we could use that information to make hiking ETAs less optimistic without breaking tile compatibility or inventing a regional formula that only works in one mountain range.

In the latest research pass, we built the corpus, tested the formulas, rejected several tempting models, used a deliberately expensive research hook to learn what worked, and then moved the result into production-shaped Valhalla code.

The problem

Flat walking speed is not a hiking model.

If the route is a paved path through a park, distance explains most of the time. If the route climbs 700 meters, distance alone becomes a bad predictor. If the route descends a steep alpine trail, even "downhill is faster" stops being true.

The failure mode is usually optimism. The route looks short on the map, so the ETA looks short in the app. Anyone who has followed a marked mountain route knows the posted time often tells a more honest story.

The goal was not to make a perfect hiking guide. The goal was narrower:

use information Valhalla already has
preserve old-tile and new-code compatibility
avoid a formula that only works in Poland, Switzerland, or Italy
keep the model explainable enough to debug
measure against real reference hiking times, not intuition

The data we used

We started in Lesser Poland because many OSM hiking relations there include reference durations in both directions. Those are useful because the same trail can have different uphill and downhill times.

Then we added Switzerland. Swiss hiking time is especially interesting because the public signpost system has a long-running mathematical tradition. The exact coefficient table is not public, but the official guidance is clear: time depends on horizontal distance and slope, with special care for very steep sections.

Finally, we added an Italian Alps stress corpus. That gave us more short, steep, difficult routes and a separate check on whether a model tuned on Poland and Switzerland was learning hiking time or just learning local habits.

The final comparison used exact relation-shape elevation profiles where possible. Endpoint routing is useful for service testing, but for model validation it can add detours that are not part of the published hiking relation. Exact-shape validation is cleaner: compare the route geometry that owns the reference duration against the model prediction for that same geometry.

What did not survive

The first obvious idea was accumulated elevation gain and loss.

That gets you much further than distance alone. It knows that a route with 600 meters of ascent is not a normal walk. It also preserves direction: the uphill and downhill versions of the same trail can differ.

But gain/loss alone loses too much detail. It does not know whether the ascent is steady or broken into steep steps. It does not know that mild downhill can be fast while steep downhill can be slow. It also reacts badly when noisy elevation samples inflate tiny ups and downs.
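To make the noise problem concrete, here is a minimal sketch of accumulated gain and loss. The function name and the sample profiles are illustrative assumptions, not code from the research pipeline.

```python
def gain_loss(elevations_m):
    """Sum the positive and negative elevation steps along a profile (meters)."""
    gain = 0.0
    loss = 0.0
    for prev, cur in zip(elevations_m, elevations_m[1:]):
        delta = cur - prev
        if delta > 0:
            gain += delta
        else:
            loss -= delta
    return gain, loss


# Hypothetical profiles: the same flat path, with and without 1 m of sample noise.
smooth = [500.0, 500.0, 500.0, 500.0, 500.0]
noisy = [500.0, 501.0, 500.0, 501.0, 500.0]

print(gain_loss(smooth))  # (0.0, 0.0)
print(gain_loss(noisy))   # (2.0, 2.0)
```

The noisy profile ends where it started, yet it reports two meters of ascent and two of descent. Over a long trail with dense samples, that kind of jitter adds up to phantom climbing.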

We also tested difficulty tags such as sac_scale. The signal is real on some Polish high-alpine segments. It also overfits easily. Some Swiss routes with high difficulty tags were already predicted too slow by slope-based models, so adding a generic technical penalty made them worse.

We tested Tobler-style walking functions. The shape is useful: mild downhill is fastest, climbing is slower, and steep descent eventually becomes slow too. But the literal formula was too optimistic for our combined data. Tuned Tobler variants worked better, especially in Switzerland, but still did not become the best single answer across all corpora.
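For reference, the commonly cited form of Tobler's hiking function gives walking speed as 6 * exp(-3.5 * |S + 0.05|) km/h, where S is rise over run. The sketch below implements that textbook form. It is an assumption that this matches the "literal formula" tested here, and the helper names are made up for illustration.

```python
import math


def tobler_kmh(slope):
    """Textbook Tobler hiking function; slope is rise over run (0.10 = 10%)."""
    return 6.0 * math.exp(-3.5 * abs(slope + 0.05))


def tobler_seconds_per_km(slope):
    """Convert the Tobler speed into pace, in seconds per kilometer."""
    return 3600.0 / tobler_kmh(slope)


# Print the pace at a few slopes to see the asymmetric shape.
for pct in (-20, -5, 0, 10, 20):
    print(f"{pct:+4d}%  {tobler_seconds_per_km(pct / 100):7.0f} s/km")
```

The function peaks at a 5% descent, where it gives 6 km/h, or 600 seconds per kilometer. The bounded curve listed later in this post assigns 880 seconds per kilometer at the same slope, which gives a sense of how optimistic the untuned formula is.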

We tested Weber-like slope polynomials inspired by the Swiss signpost method. These performed very well, and one blended polynomial had the best aggregate MAE in our clean check set. The problem was production shape: high-order polynomial coefficients are hard to reason about and easier to destabilize outside the fitted range.

The useful lesson was not "use this exact formula." It was: hiking ETA wants a slope-speed curve.

The model we kept

The model we kept is a bounded slope curve.

During research we first ran it at path level. Valhalla collected the elevation profile along the routed path, smoothed slope over a 150-meter window, then assigned seconds per kilometer from a small piecewise curve:

slope band:   -40%   -20%    -8%    -5%    +3%    +8%   +20%   +40%
seconds/km:   2520   1270   1130    880   1020   1110   2330   4120

The shape is intentionally boring:

mild downhill is fastest
steep downhill slows down
climbing slows down as grade increases
extreme slopes are clamped instead of extrapolated
the same formula runs in every region

This is less mathematically elegant than a fitted polynomial. It is easier to inspect, easier to bound, and easier to explain when a route is wrong.
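A minimal sketch of how such a bounded curve can be evaluated, using the breakpoints from the table above. Two things here are assumptions: the post does not say how values between breakpoints are obtained, so linear interpolation is used, and the function name is illustrative.

```python
from bisect import bisect_right

# Breakpoints copied from the table above.
SLOPE_PCT = [-40, -20, -8, -5, 3, 8, 20, 40]
SEC_PER_KM = [2520, 1270, 1130, 880, 1020, 1110, 2330, 4120]


def seconds_per_km(slope_pct):
    """Pace for a smoothed slope in percent; clamped at the outer breakpoints."""
    s = max(SLOPE_PCT[0], min(SLOPE_PCT[-1], slope_pct))
    i = bisect_right(SLOPE_PCT, s) - 1
    if i >= len(SLOPE_PCT) - 1:
        return float(SEC_PER_KM[-1])
    x0, x1 = SLOPE_PCT[i], SLOPE_PCT[i + 1]
    y0, y1 = SEC_PER_KM[i], SEC_PER_KM[i + 1]
    # Assumption: linear interpolation between neighboring breakpoints.
    return y0 + (y1 - y0) * (s - x0) / (x1 - x0)


print(seconds_per_km(-5))   # 880.0, the fastest point on the curve
print(seconds_per_km(0))    # 967.5 under the linear-interpolation assumption
print(seconds_per_km(75))   # 4120.0, clamped rather than extrapolated
```

The clamp is what keeps the model bounded: a bad elevation sample that implies a 75% grade costs the same as a 40% grade instead of producing an absurd pace.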

That path-level version was useful for research because it saw the whole route. It was not the right production shape. Valhalla routes are made of directed edges, and route costing needs to be fast. So after the model looked good, we checked whether the same curve still behaved well when computed edge by edge, using 150-meter chunks inside each graph edge and a shorter final chunk when needed.
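The edge-local idea can be sketched as follows: walk along one directed edge in 150-meter chunks, take the slope of each chunk, and sum the chunk times. This is a sketch under stated assumptions, not the Valhalla implementation. `edge_hiking_seconds` and `elevation_at` are hypothetical names, each chunk's slope is taken from its two endpoint elevations, and the curve is linearly interpolated.

```python
from bisect import bisect_right

SLOPE_PCT = [-40, -20, -8, -5, 3, 8, 20, 40]
SEC_PER_KM = [2520, 1270, 1130, 880, 1020, 1110, 2330, 4120]
CHUNK_M = 150.0


def seconds_per_km(slope_pct):
    """Bounded slope curve; linear interpolation is an assumption."""
    s = max(SLOPE_PCT[0], min(SLOPE_PCT[-1], slope_pct))
    i = bisect_right(SLOPE_PCT, s) - 1
    if i >= len(SLOPE_PCT) - 1:
        return float(SEC_PER_KM[-1])
    x0, x1 = SLOPE_PCT[i], SLOPE_PCT[i + 1]
    y0, y1 = SEC_PER_KM[i], SEC_PER_KM[i + 1]
    return y0 + (y1 - y0) * (s - x0) / (x1 - x0)


def edge_hiking_seconds(length_m, elevation_at):
    """Sum chunk times along one directed edge.

    elevation_at(d) is a hypothetical callback returning the elevation in
    meters at distance d from the start of the edge.
    """
    total = 0.0
    start = 0.0
    while start < length_m:
        end = min(start + CHUNK_M, length_m)  # the final chunk may be shorter
        run = end - start
        rise = elevation_at(end) - elevation_at(start)
        total += seconds_per_km(100.0 * rise / run) * run / 1000.0
        start = end
    return total


# Hypothetical 400 m edge climbing at a steady 10%.
def climb(d):
    return 1000.0 + 0.10 * d


forward = edge_hiking_seconds(400.0, climb)
reverse = edge_hiking_seconds(400.0, lambda d: climb(400.0 - d))
print(round(forward), round(reverse))
```

Running the same edge in both directions gives different times, which is the property a directed-edge model needs to preserve.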

The numbers

After quality filtering, the clean check set had 250 directional route samples across Poland, Switzerland, and Italy.

The best aggregate model by MAE was the Weber/global blend:

clean combined check MAE:   7.532 minutes
clean combined check MAPE: 13.312%
bias:                      -1.694 minutes

The selected bounded slope curve was almost tied:

clean combined check MAE:   7.578 minutes
clean combined check MAPE: 13.135%
bias:                      -2.123 minutes

So we gave up 0.046 minutes of average absolute error, roughly three seconds per route, for a model with a clearer physical shape and slightly better percent error.
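For readers who want to reproduce this kind of comparison, here is a minimal sketch of the three metrics. The sign convention for bias (predicted minus reference, so negative means the model is optimistic) is an assumption, and the sample values are invented.

```python
def eta_metrics(predicted_min, reference_min):
    """Return (MAE in minutes, MAPE in percent, bias in minutes)."""
    errors = [p - r for p, r in zip(predicted_min, reference_min)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / r for e, r in zip(errors, reference_min)) / n
    bias = sum(errors) / n  # assumption: negative bias means predictions run short
    return mae, mape, bias


# Invented example: one route predicted 10 minutes short, one 10 minutes long.
print(eta_metrics([50.0, 110.0], [60.0, 100.0]))

# The MAE gap between the two models above, converted to seconds.
gap_seconds = (7.578 - 7.532) * 60
print(f"{gap_seconds:.2f} s")  # 2.76 s
```

The invented example also shows why MAE and MAPE can rank models differently: the same 10-minute miss weighs more on a 60-minute route than on a 100-minute one.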

That tradeoff is worth it. Routing models live longer than benchmark scripts. A tiny metric win is not enough if the formula is harder to debug, harder to port, or easier to break with new data.

The remaining errors were instructive too. Quality-flagged rows had roughly 30 minutes MAE no matter which model we used. Some were likely source-format issues. One Swiss relation had a bare numeric duration of 334: read as minutes that is 5:34, but it may have been intended as a compact 3:34, which is 214 minutes. Other failures had an elevation mismatch between the route profile and the source ascent/descent values.
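The ambiguity of a bare number is easy to show in code. This sketch lists both readings so a quality filter can flag the row instead of guessing. The function names are hypothetical, and this is not the parser used for the corpus.

```python
def bare_duration_candidates(raw):
    """Return both plausible readings of a bare numeric duration, in minutes."""
    value = int(raw)
    hours, minutes = divmod(value, 100)
    # The compact H:MM reading only makes sense if the last two digits are < 60.
    compact = hours * 60 + minutes if minutes < 60 else None
    return {"as_minutes": value, "as_compact_hmm": compact}


def fmt(total_minutes):
    """Format minutes as H:MM."""
    h, m = divmod(total_minutes, 60)
    return f"{h}:{m:02d}"


readings = bare_duration_candidates("334")
print(fmt(readings["as_minutes"]))      # 5:34
print(fmt(readings["as_compact_hmm"]))  # 3:34
```

The two readings differ by two hours, which is far larger than any difference between the candidate models.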

That is important: some outliers are not model problems. They are corpus problems, geometry problems, or missing machine-readable trail difficulty.

The production-shaped edge-local check used Poland and Switzerland exact edge traces. On the clean check group, edge-local 150-meter segments produced:

clean check MAE:   8.361 minutes
clean check MAPE: 11.053%
bias:             -2.590 minutes

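These figures cover Poland and Switzerland only, so they are not directly comparable with the three-country, 250-sample numbers above. The path-level and edge-local versions were close enough to continue with the faster edge-local implementation. The edge-local version was slightly better on the combined check set we used for that comparison, and, more importantly, it can be precomputed in tiles.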

Compatibility mattered

Compatibility mattered from the beginning.

The research hook did not require new tiles. It decoded elevation that was already present in existing Valhalla tiles and computed the path-level slope curve while building the route response. That let us iterate without rebuilding the planet.

The production implementation keeps the same compatibility idea, but moves the work to tile generation. Valhalla already has an extended directed-edge block that can carry extra per-edge attributes without changing the base DirectedEdge layout. We use that extension to store one precomputed hiking duration per directed edge.

The storage is intentionally small:

16 bits for hiking seconds
one validity bit
remaining bits reserved for future extension

Two bytes give us up to 65,535 seconds for a directed edge, more than 18 hours. That is enough for real graph edges. If tile generation ever computes more than that, it logs a warning and clamps the stored value so the edge can be investigated.
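A minimal sketch of that packing, with clamp-and-warn behavior. The bit positions and the 32-bit word are assumptions made for illustration, since the post only states the field sizes. None of the names below come from Valhalla.

```python
import logging

MAX_HIKING_SECONDS = 0xFFFF  # 16 bits: 65,535 seconds
VALID_BIT = 1 << 16          # assumed position of the validity bit

log = logging.getLogger("hiking_ext")


def pack_hiking_seconds(seconds):
    """Pack a per-edge hiking duration into a word; clamp and warn on overflow."""
    value = max(int(round(seconds)), 0)
    if value > MAX_HIKING_SECONDS:
        log.warning("hiking seconds %d exceed 16 bits, clamping", value)
        value = MAX_HIKING_SECONDS
    return VALID_BIT | value


def unpack_hiking_seconds(word):
    """Return the stored seconds, or None when the validity bit is not set."""
    if not word & VALID_BIT:
        return None
    return word & MAX_HIKING_SECONDS


print(unpack_hiking_seconds(pack_hiking_seconds(525.3)))   # 525
print(unpack_hiking_seconds(pack_hiking_seconds(70000)))   # 65535, after a warning
print(unpack_hiking_seconds(0))                            # None
```

The validity bit is what lets a reader distinguish "this edge takes zero seconds" from "this tile was built before the extension existed".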

This gives us the compatibility property we wanted:

old code can still read new tiles because the base edge layout is unchanged
new code can still read old tiles because missing extension data falls back to the old pedestrian timing formula
the new model is opt-in through pedestrian type=hiking
normal foot, wheelchair, and blind pedestrian behavior stays unchanged

There is one subtle but important detail: edges are directed. The two directions of the same trail can have different hiking times, because uphill and downhill are not symmetric. The extension stores only one value per directed edge. Tile generation may compute forward and reverse values while processing a shared shape, but each directed edge receives only the value for its own orientation.
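The opt-in and fallback rules can be sketched as a single selection function. Everything here is hypothetical: `EdgeView` is not a Valhalla type, and the flat-speed function is only a stand-in for the existing pedestrian timing formula.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EdgeView:
    """Hypothetical view of one directed edge as seen by costing."""
    length_m: float
    hiking_seconds: Optional[int] = None  # None: no valid extension value in the tile


def flat_walk_seconds(edge, speed_kmh=5.1):
    """Stand-in for the existing pedestrian timing; the speed is a placeholder."""
    return edge.length_m / (speed_kmh / 3.6)


def pedestrian_seconds(edge, pedestrian_type):
    """Use the stored hiking time only when requested and present."""
    if pedestrian_type == "hiking" and edge.hiking_seconds is not None:
        return float(edge.hiking_seconds)
    return flat_walk_seconds(edge)


# The two directions of one trail segment carry their own values.
uphill = EdgeView(length_m=400.0, hiking_seconds=525)
downhill = EdgeView(length_m=400.0, hiking_seconds=461)
old_tile_edge = EdgeView(length_m=400.0)

print(pedestrian_seconds(uphill, "hiking"), pedestrian_seconds(downhill, "hiking"))
print(pedestrian_seconds(uphill, "foot") == pedestrian_seconds(old_tile_edge, "hiking"))
```

With this shape, an edge from an old tile and a non-hiking request on a new tile both land on the same fallback path, so neither case needs special handling elsewhere.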

What this means for routing

The important result is not one magic coefficient table. It is a safer direction for hiking ETA:

distance-only walking is too optimistic for mountain routes
raw accumulated gain/loss helps but is not enough
generic difficulty penalties overfit quickly
slope-window models transfer better across countries
quality flags are necessary before trusting outliers
the production formula should be bounded and inspectable

For product users, the expected behavior is simple. Flat walks should stay close to normal walking time. Moderate downhill can be faster. Steep climbs and steep descents should stop looking like short city walks.

For Valhalla deployments, the operational point is also simple. We can improve hiking ETA using existing elevation data, without a breaking tile-format migration. The production version computes the hiking seconds during elevation building and reads them during pedestrian costing when the user asks for type=hiking.
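A request for the new behavior might look like the sketch below. The overall shape follows Valhalla's route request format (locations, costing, costing_options), and the hiking value is the opt-in type described in this post. The coordinates are arbitrary placeholders, and the endpoint is left out because it depends on the deployment.

```python
import json

# Hypothetical start and end points; replace with real coordinates.
request = {
    "locations": [
        {"lat": 49.2320, "lon": 19.9810},
        {"lat": 49.2506, "lon": 19.9339},
    ],
    "costing": "pedestrian",
    # Opt in to the hiking ETA model; other pedestrian types keep the old timing.
    "costing_options": {"pedestrian": {"type": "hiking"}},
}

body = json.dumps(request)
print(body)
```

Leaving out the type option, or setting another pedestrian type, keeps the existing timing behavior.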

What comes next

The research path is finished. The production code is in place.

Hiking ETA will never be as clean as road speed. Trails carry local timing conventions, ambiguous geometry, incomplete difficulty tags, and human caution. But it can be much less naive than flat walking speed.

That is the kind of routing work we like at Globus: small enough to explain, measured against real routes, and compatible with the map data people already have.

If you are building routing, field service, travel planning, outdoor navigation, or offline maps, start with Globus. Register at user.globus.software, create an API key, and try the routing stack in your own product. If you want to evaluate Valhalla navigation tiles or discuss a custom routing deployment, write us at [email protected].
