know_my_way · Probability & Prediction — Interconnect WIM Reference

01 Prediction Identity 5 fields

probability_id

UUID required

Unique identifier for this prediction instance. Generated fresh on each model run — never reused. Downstream systems log this ID when acting on a prediction, creating an auditable chain from model output to operational decision without retaining any personal data.

Format: RFC 4122 UUID v4 — Single-use per prediction cycle

"probability_id": "g8h9i0j1-0000-4000-h000-000000000008"

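A minimal sketch of how a producer might mint a fresh identifier on each run and how a consumer might check the format, using Python's standard uuid module. The function names new_probability_id and is_uuid_v4 are illustrative assumptions, not part of the WIM specification.

```python
import uuid


def new_probability_id() -> str:
    """Return a fresh, random RFC 4122 version 4 UUID in canonical string form."""
    return str(uuid.uuid4())


def is_uuid_v4(value: str) -> bool:
    """Return True only for canonical, hyphenated RFC 4122 version 4 UUID strings."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # .version is None unless the variant bits mark an RFC 4122 UUID.
    return parsed.version == 4 and str(parsed) == value.lower()
```

A strict check like this rejects any string containing characters outside 0-9 and a-f, so placeholder values need to be replaced with real UUIDs before validation.
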
prediction_type

enum required

What the model is predicting. Determines which probability fields are relevant and which downstream actions are triggered. A single prediction run may produce multiple objects of different types for the same location — for example, one congestion prediction and one wait_time prediction.

congestion wait_time closure delay queue_growth route_change occupancy

"prediction_type": "congestion"

prediction_target

UUID required

UUID of the WIM record (place, route segment, or corridor) that this prediction applies to. Links the forecast into the spatial graph so routing engines know which nodes and edges should be treated as at-risk during the prediction window.

"prediction_target": "c3d4e5f6-0000-4000-c000-000000000003"

prediction_window

integer required

How far ahead the prediction looks, in minutes. Routing engines use this to determine whether the forecast is relevant to the user's current journey. A user 20 minutes from the predicted location should receive rerouting; a user already passing through should not.

Unit: minutes — Min: 5 — Max: 1440 (24 h) — Typical: 15–60

"prediction_window": 30

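The range limits and the relevance test described above could be sketched as follows. The relevance rule (the user has not yet reached the target and would arrive within the window) is an assumption chosen to match the two examples in the description; real routing engines may apply a different rule.

```python
MIN_WINDOW_MINUTES = 5
MAX_WINDOW_MINUTES = 1440  # 24 h


def validate_prediction_window(window: int) -> int:
    """Return the window unchanged if it is an integer within 5..1440 minutes."""
    if isinstance(window, bool) or not isinstance(window, int):
        raise TypeError("prediction_window must be an integer number of minutes")
    if not MIN_WINDOW_MINUTES <= window <= MAX_WINDOW_MINUTES:
        raise ValueError("prediction_window must be between 5 and 1440 minutes")
    return window


def forecast_is_relevant(minutes_to_target: float, window: int) -> bool:
    """Assumed rule: relevant if the user is still on the way and arrives within the window."""
    return 0 < minutes_to_target <= window
```

With a 30-minute window, a user 20 minutes away is considered in scope, while a user already at the target (0 minutes away) is not.
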
prediction_model

enum optional

Family of model architecture used to generate this prediction. Informs consumers about the prediction's characteristics — for example, time-series models produce smoother confidence curves, while rule-based models produce hard thresholds. Used for model governance and auditability.

time_series rule_based ml_regression ml_classification ensemble hybrid

"prediction_model": "ensemble"

02 Model Configuration 10 fields

prediction_model_name

string optional

Human-readable name of the specific model instance that produced this prediction. Enables operations teams to identify which trained model version is active and cross-reference with model monitoring dashboards. Should match the model registry entry for full auditability.

Max: 128 chars — Convention: MODEL-NAME-vX.Y.Z

"prediction_model_name": "KMW-Congestion-Ensemble-v3.1.2"

prediction_model_purpose

string optional

Plain-language description of what the model was trained to predict and why. Required for GDPR Article 22 transparency obligations when the prediction triggers automated routing decisions that materially affect a user's path through the environment. Must be suitable for display to end users on request.

Max: 512 chars — Plain language — Required for GDPR Art. 22 transparency

"prediction_model_purpose": "Estimates corridor congestion 30 minutes ahead using aggregate footfall and historical patterns to enable proactive rerouting."

know_my_way_model_enabled

boolean required

Whether the Know My Way predictive engine is active for this record's site. When false, this object must not be generated and no predictions should be served. Acts as the master kill switch for the prediction layer — used during model incidents, data quality failures, or operator-requested disablement.

"know_my_way_model_enabled": true

know_my_way_context

string optional

Contextual label describing the operational scenario the model is currently calibrated for — for example, a special event, a seasonal pattern, or a modified timetable. Allows the model to apply scenario-specific weights without retraining. Set by operators via the WIM management interface.

Max: 128 chars — Examples: "weekday-standard", "event-open-day", "seasonal-summer"

"know_my_way_context": "weekday-standard"

know_my_way_route_memory

boolean optional

Whether the model is drawing on aggregate route memory — patterns derived from anonymised, aggregated historical journey data for this site — to improve prediction accuracy. When true, aggregate_input_only must also be true. Individual journey histories are never retained for model training.

"know_my_way_route_memory": true

know_my_way_preference_mode

enum optional

How the model incorporates user preference signals when generating predictions. none means preferences are not factored in at all; declared means the model uses only the preferences the user has explicitly stated for this session; aggregate_only means the model uses site-wide preference distributions.

none declared aggregate_only

"know_my_way_preference_mode": "declared"

know_my_way_personalisation_level

enum optional

Degree to which this prediction has been personalised based on user context. none is a pure population-level forecast. session incorporates the current session's declared preferences. consented may incorporate richer context with explicit user consent. Drives GDPR processing classification.

none session consented

"know_my_way_personalisation_level": "session"

model_id

UUID optional

UUID of the trained model artifact in the operator's model registry. Used by model governance systems to trace every prediction back to the exact trained artifact that produced it, enabling full reproducibility audits and rapid rollback to a previous model version when needed.

"model_id": "h1i2j3k4-model-kmw-v3-1-2"

model_version

string optional

Semantic version of the model artifact. Combined with model_id, this uniquely identifies the exact model that produced the prediction — enabling operations teams to correlate prediction quality degradation with specific model versions in post-incident reviews.

Format: semver major.minor.patch

"model_version": "3.1.2"

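For post-incident reviews it can help to parse model_version into a comparable tuple. The sketch below accepts only the plain major.minor.patch form named above; the function name parse_model_version is an illustrative assumption, and pre-release or build suffixes are deliberately rejected.

```python
import re
from typing import Tuple

# Plain major.minor.patch with no leading zeros and no suffixes.
SEMVER_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_model_version(version: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.patch' into a tuple of ints, or raise ValueError."""
    match = SEMVER_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a major.minor.patch version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch
```

Tuples compare numerically, so 3.10.0 sorts after 3.9.9, which a plain string comparison would get wrong.
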
training_data_window

integer optional

Number of days of historical data used to train or calibrate the current model version. Provides consumers with a signal of the model's temporal coverage — a model trained on 7 days of data is more sensitive to recent changes but less stable than one trained on 180 days.

Unit: days — Min: 1 — Typical: 30–180

"training_data_window": 90

03 Prediction Output 13 fields

confidence_score

float required

Model's confidence in the prediction, expressed as a float between 0.0 and 1.0. Computed from signal freshness, input completeness, and model calibration metrics. Routing engines should not act on predictions with confidence below 0.5; channels should suppress display below 0.4 and show a fallback message.

Range: 0.0–1.0 — Act threshold: ≥ 0.5 — Display threshold: ≥ 0.4

"confidence_score": 0.82

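The two thresholds above can be read as a small decision table. The following sketch encodes them; the function name and the keys of the returned dictionary are illustrative assumptions.

```python
ACT_THRESHOLD = 0.5      # routing engines act at or above this value
DISPLAY_THRESHOLD = 0.4  # channels display at or above this value


def confidence_policy(confidence_score: float) -> dict:
    """Map a confidence score to what routing engines and channels may do."""
    if not 0.0 <= confidence_score <= 1.0:
        raise ValueError("confidence_score must be within 0.0-1.0")
    return {
        "routing_may_act": confidence_score >= ACT_THRESHOLD,
        "channel_may_display": confidence_score >= DISPLAY_THRESHOLD,
    }
```

A score of 0.45 falls in the band where a channel may still display the prediction but a routing engine should not act on it; below 0.4 the channel shows its fallback message instead.
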
likelihood

float optional

Probability that the predicted event will occur within the prediction_window, expressed as a float between 0.0 and 1.0. Distinct from confidence_score — confidence measures how certain the model is of its estimate; likelihood is the estimate itself.

Range: 0.0 (will not occur) – 1.0 (certain to occur)

"likelihood": 0.74

risk_level

enum optional

Operational risk classification derived from the prediction's likelihood and potential impact. Used by channel renderers to select appropriate alert styling and by routing engines to apply graduated rerouting logic — critical triggers immediate rerouting; low may simply update estimated times.

critical high medium low negligible

"risk_level": "medium"

estimated_wait_time_minutes

number optional

Predicted wait time at the target location within the prediction window, in minutes. Surfaced directly in wayfinding channels as "Expected wait: X min." Used by routing engines as a cost penalty when computing alternative routes. A value of 0 indicates no expected wait.

Unit: minutes — Min: 0 — Precision: 1 decimal place

"estimated_wait_time_minutes": 8.5

estimated_travel_time_minutes

number optional

Predicted total travel time to reach the target, accounting for expected congestion along the recommended route during the prediction window. Combines static route distance with live and predicted congestion penalties. Displayed in mobile and kiosk channels as the journey time estimate.

Unit: minutes — Includes: congestion penalties from live and predicted sensor data

"estimated_travel_time_minutes": 12.0

estimated_arrival_time

datetime optional

Absolute UTC timestamp of the predicted arrival time at the target destination, computed from the current time plus estimated_travel_time_minutes. Displayed in time-sensitive contexts — for example, "You will arrive at 14:43" on a kiosk near a clinic entrance.

Format: ISO 8601 — Timezone: always UTC (Z)

"estimated_arrival_time": "2026-04-22T14:43:00Z"

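The arrival time computation described above (current time plus estimated_travel_time_minutes, rendered as ISO 8601 in UTC) might look like this. The function name and the optional now parameter are assumptions added so the example can be run deterministically.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def estimated_arrival_time(travel_minutes: float, now: Optional[datetime] = None) -> str:
    """Return now + travel_minutes as an ISO 8601 UTC timestamp ending in Z."""
    if travel_minutes < 0:
        raise ValueError("travel_minutes must not be negative")
    if now is None:
        now = datetime.now(timezone.utc)
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    arrival = now.astimezone(timezone.utc) + timedelta(minutes=travel_minutes)
    return arrival.strftime("%Y-%m-%dT%H:%M:%SZ")
```

For a prediction evaluated at 14:31 UTC with a 12.0-minute travel estimate, this yields "2026-04-22T14:43:00Z", matching the example value above.
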
probability_of_delay

float optional

Predicted probability that the journey to the target will experience a delay of more than the site-configured threshold (typically 5 minutes) within the prediction window. Routing engines use this to proactively surface alternative routes when the value is 0.6 or higher.

Range: 0.0–1.0 — Reroute trigger: ≥ 0.6 (site-configurable)

"probability_of_delay": 0.31

probability_of_closure

float optional

Predicted probability that the target location or a critical route segment will be closed or inaccessible within the prediction window. When the value is 0.7 or higher, routing engines must treat the target as unavailable and compute alternatives. Physical signage systems receive an early warning to prepare updated content.

Range: 0.0–1.0 — Unavailable threshold: ≥ 0.7

"probability_of_closure": 0.04

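A minimal sketch of how a routing engine might apply the delay and closure thresholds. The function name, parameter names, and returned keys are illustrative assumptions; the comparison uses "at or above", following the constraint lines, and treats a missing optional field as "no signal".

```python
def routing_actions(
    prediction: dict,
    reroute_trigger: float = 0.6,    # site-configurable
    closure_threshold: float = 0.7,
) -> dict:
    """Decide which routing reactions the delay and closure probabilities call for."""
    p_delay = prediction.get("probability_of_delay")
    p_closure = prediction.get("probability_of_closure")
    return {
        # Surface alternative routes proactively.
        "surface_alternatives": p_delay is not None and p_delay >= reroute_trigger,
        # Treat the target as unavailable and compute alternatives.
        "treat_target_unavailable": p_closure is not None and p_closure >= closure_threshold,
    }
```

With the example values above (0.31 and 0.04), neither reaction is triggered.
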
probability_of_congestion

float optional

Predicted probability that the target zone's crowd density will exceed the site's congestion threshold (typically sensory.crowd_density ≥ 0.8) within the prediction window. The primary signal for proactive rerouting in high-footfall environments.

Range: 0.0–1.0 — Congestion defined by: site-level crowd_density threshold

"probability_of_congestion": 0.67

probability_of_queue_increase

float optional

Predicted probability that the queue at the target will grow longer within the prediction window. Used by channel systems to display early queue warnings and by routing engines to deprioritise destinations with growing queues when alternatives exist.

Range: 0.0–1.0

"probability_of_queue_increase": 0.55

probability_of_route_change

float optional

Predicted probability that the currently recommended route will become suboptimal or unavailable within the prediction window — for example, due to a planned maintenance window, a scheduled event closure, or predicted congestion on a key segment. Triggers proactive alternative route preparation in the routing engine.

Range: 0.0–1.0

"probability_of_route_change": 0.22

expected_occupancy

integer optional

Predicted absolute occupancy count at the target location at the peak of the prediction window. Used by building management systems to pre-position staff and by channel systems to contextualise predictions — "Expected 140 visitors in the next 30 minutes." Always derived from aggregate patterns, never individual tracking.

Unit: count — Min: 0 — Aggregate prediction, not individual tracking

"expected_occupancy": 140

expected_crowd_density

float optional

Predicted normalised crowd density at the target location at the peak of the prediction window, using the same 0.0–1.0 scale as sensory.crowd_density. Allows routing engines to apply the same threshold logic to predicted conditions as they do to live conditions.

Range: 0.0 (empty) – 1.0 (at or beyond capacity) — Same scale as sensory.crowd_density

"expected_crowd_density": 0.78

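Because the predicted and live densities share one scale, a single threshold check can serve both. The sketch below assumes the typical site threshold of 0.8 and an "at or above" comparison; the outlook labels are hypothetical and not defined by the WIM schema.

```python
def is_congested(density: float, threshold: float = 0.8) -> bool:
    """Apply the site congestion threshold to a normalised density value."""
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must be within 0.0-1.0")
    return density >= threshold


def congestion_outlook(live_density: float, expected_density: float,
                       threshold: float = 0.8) -> str:
    """Compare live and predicted conditions using the same threshold logic."""
    now_congested = is_congested(live_density, threshold)
    later_congested = is_congested(expected_density, threshold)
    if now_congested and later_congested:
        return "congested"
    if later_congested:
        return "building"
    if now_congested:
        return "easing"
    return "clear"
```
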
04 Input Signals 3 fields

input_signals

string[] optional

Array of named input signal types that contributed to this prediction. Documents the model's data diet for governance and debugging. Listing signal types (not values) is sufficient — no raw data is included here. If the list is empty, the prediction was generated without live signals and relied solely on historical patterns.

Examples: "occupancy_count", "people_flow", "noise_level_db", "historical_pattern", "event_schedule"

"input_signals": ["occupancy_count", "people_flow", "historical_pattern"]

anonymous_input_signals

boolean required

Whether all input signals fed to the model for this prediction were anonymised before processing. Must be true for all Know My Way predictions — the model is architecturally prohibited from ingesting non-anonymised data. If a signal cannot be anonymised, it must be excluded from the model input entirely.

"anonymous_input_signals": true

aggregate_input_only

boolean required

Whether this prediction was generated using only aggregate-level signals — no session-level or per-request data was used as model input. When true, the prediction is a pure population-level forecast and falls entirely outside GDPR's personal data processing regime. The most privacy-safe configuration and the default for all Know My Way deployments.

"aggregate_input_only": true

05 Privacy & GDPR 9 fields

personal_data_detected

boolean optional

Whether any input signal processed during prediction generation was determined to contain or derive from personal data. If true, the full GDPR processing regime applies, gdpr_status must be set to review_needed, and the prediction must not be served until reviewed.

"personal_data_detected": false

profiling_disabled

boolean required

Whether individual-level profiling is disabled for this prediction run, implementing GDPR Article 22 obligations. When true, the model produces only population-level forecasts and cannot be used to infer behaviour patterns about individuals. Must be true whenever aggregate_input_only is true.

Standard: GDPR Art. 22 — Right not to be subject to solely automated decision-making, including profiling

"profiling_disabled": true

privacy_mode

enum required

Privacy processing mode applied when generating this prediction. aggregate_only is the standard mode for Know My Way — no session data enters the prediction pipeline. anonymised allows session-level signals after anonymisation. consented requires an explicit consent grant.

aggregate_only anonymised consented

"privacy_mode": "aggregate_only"

anonymous_processing_verified

boolean optional

Whether it has been verified that the full prediction pipeline — from signal ingestion through model inference to output publication — processes only anonymised or aggregate data with no possibility of re-identification. When true, the operator's DPO has signed off on the architecture.

"anonymous_processing_verified": true

gdpr_status

enum optional

Overall GDPR compliance status of this prediction object. Computed automatically from the privacy field values. Predictions with non_compliant status must not be served to any downstream system.

compliant review_needed non_compliant not_applicable

"gdpr_status": "compliant"

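Several of the privacy fields constrain one another. The sketch below collects the cross-field rules stated in the field descriptions into one check and adds a serving gate. The function names are illustrative assumptions, and because the descriptions do not say how a completed review is signalled, this sketch simply holds any record where personal_data_detected is true.

```python
def privacy_violations(p: dict) -> list:
    """Return human-readable descriptions of broken cross-field rules."""
    problems = []
    if p.get("anonymous_input_signals") is not True:
        problems.append("anonymous_input_signals must be true")
    if p.get("know_my_way_route_memory") and not p.get("aggregate_input_only"):
        problems.append("know_my_way_route_memory requires aggregate_input_only")
    if p.get("aggregate_input_only") and not p.get("profiling_disabled"):
        problems.append("aggregate_input_only requires profiling_disabled")
    if p.get("personal_data_detected") and p.get("gdpr_status") != "review_needed":
        problems.append("personal_data_detected requires gdpr_status review_needed")
    return problems


def may_serve(p: dict) -> bool:
    """Serving gate: no rule violations, not non_compliant, no pending review."""
    if privacy_violations(p):
        return False
    if p.get("gdpr_status") == "non_compliant":
        return False
    if p.get("personal_data_detected"):
        return False  # assumption: held until a review outcome is recorded elsewhere
    return True
```
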
gdpr_reviewed

boolean optional

Whether the Know My Way pipeline configuration has been reviewed by the operator's Data Protection Officer. A configuration-level flag inherited per prediction run. Required for any deployment where the model ingests signals beyond simple aggregate counts.

"gdpr_reviewed": true

privacy_by_design_verified

boolean optional

Whether the prediction architecture has been verified to implement Privacy by Design per GDPR Article 25 — specifically: anonymisation before model input, no retention of prediction-linked session data, and aggregate-only output. Must be verified by the DPO before the Know My Way engine is activated for a site.

"privacy_by_design_verified": true

data_retention_policy

integer optional

Maximum duration in seconds that this prediction object and its associated metadata may be retained by any consuming system. After expiry, all associated data must be deleted. The retention window typically ends at or after prediction_expires_at: the prediction may expire as a routing input before the retention window closes.

Unit: seconds — Typical: 1800 (30 min) for aggregate predictions

"data_retention_policy": 1800

session_expiry

integer optional

Duration in seconds after which any session-scoped context used as model input is invalidated and discarded. Enforces the principle of storage limitation — even if the prediction itself is still valid, the session signals that shaped it cannot persist beyond this window.

Unit: seconds — Max: equal to data_retention_policy

"session_expiry": 600

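The relationship between the two durations can be checked mechanically. In this sketch the function names are illustrative, and the retention clock is assumed to start when the consuming system receives the object; the field descriptions do not state the starting point.

```python
from datetime import datetime, timezone


def check_retention_settings(data_retention_policy: int, session_expiry: int) -> None:
    """Raise ValueError if the durations are negative or session_expiry is too long."""
    if data_retention_policy < 0 or session_expiry < 0:
        raise ValueError("durations must not be negative")
    if session_expiry > data_retention_policy:
        raise ValueError("session_expiry must not exceed data_retention_policy")


def retention_elapsed(received_at: datetime, now: datetime,
                      data_retention_policy: int) -> bool:
    """True once the object has been held for the full retention period."""
    return (now - received_at).total_seconds() >= data_retention_policy
```
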
06 Consent 2 fields

consent_required

boolean optional

Whether explicit user consent is required before this prediction can be generated or served. For fully aggregate predictions (aggregate_input_only: true), this is typically false, because population-level forecasts do not require individual consent under GDPR. Consent is required only when session-level or consented-mode signals are used.

"consent_required": false

consent_status

enum optional

Current consent state for the data processing activities underlying this prediction. Aligns with the W3C Data Privacy Vocabulary. For aggregate-only predictions, this is typically not_required. Any prediction where this is refused or withdrawn must be discarded and all associated data deleted immediately.

not_required pending granted refused withdrawn

"consent_status": "not_required"

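One way a consumer might turn the two consent fields into an action is sketched below. The action labels and the "hold" outcome for a required but not yet granted consent are assumptions; only the discard rule for refused and withdrawn comes from the description above.

```python
VALID_STATES = {"not_required", "pending", "granted", "refused", "withdrawn"}
DISCARD_STATES = {"refused", "withdrawn"}


def consent_action(consent_required: bool, consent_status: str) -> str:
    """Return 'serve', 'hold', or 'discard_and_delete' for a prediction."""
    if consent_status not in VALID_STATES:
        raise ValueError(f"unknown consent_status: {consent_status!r}")
    if consent_status in DISCARD_STATES:
        return "discard_and_delete"  # discard and delete associated data immediately
    if consent_required and consent_status != "granted":
        return "hold"  # assumption: wait until consent is granted
    return "serve"
```
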
07 Lifecycle & Audit 4 fields

prediction_created_at

datetime required

UTC timestamp of when this prediction was generated by the model. Used to assess signal age — consumers must evaluate whether the prediction is still fresh relative to the rate of change of the predicted phenomenon. A congestion prediction older than its prediction_window must be treated as expired regardless of prediction_expires_at.

Format: ISO 8601 — Timezone: always UTC (Z)

"prediction_created_at": "2026-04-22T14:31:00Z"

prediction_expires_at

datetime required

UTC timestamp after which this prediction must not be used by routing engines or displayed by channels. Typically set to prediction_created_at plus the prediction_window in minutes. Consuming systems must check this field before every use and silently discard expired predictions.

Format: ISO 8601 — Typically: created_at + prediction_window minutes

"prediction_expires_at": "2026-04-22T15:01:00Z"

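A consumer-side expiry check combining prediction_expires_at with the "older than its prediction_window" rule might be sketched as follows. The function names are illustrative, and applying the window rule to every prediction type (the description states it for congestion predictions) is an assumption.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp of the form 2026-04-22T14:31:00Z."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(prediction: dict, now: datetime) -> bool:
    """True once now is past the earlier of expires_at and created_at + window."""
    created = parse_utc(prediction["prediction_created_at"])
    expires = parse_utc(prediction["prediction_expires_at"])
    window_end = created + timedelta(minutes=prediction["prediction_window"])
    return now > min(expires, window_end)
```

Callers would run this check before every use and drop the prediction silently when it returns True.
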
human_override

boolean optional

Whether an authorised operator has manually overridden the model's prediction — replacing or suppressing the automated forecast with an operator-authored assessment. When true, the prediction values reflect human judgement rather than model output and carry higher authority. Used during major incidents, planned closures, or when the model's training distribution does not match current conditions.

"human_override": false

audit_log_id

UUID optional

UUID referencing the immutable audit log entry for this prediction run in the WIM platform's audit service. Consumers with audit access can dereference this ID to retrieve the complete prediction log: input signal snapshot hashes, model version, inference timestamp, and all privacy field values at the moment of generation.

Format: RFC 4122 UUID v4 — Dereferenceable: via WIM audit API

"audit_log_id": "i2j3k4l5-0000-4000-i000-000000000009"