SLAM
NAVIGATION
SLAM (Simultaneous Localization and Mapping) is the primary alternative to GPS for autonomous drone position estimation. It uses the drone's existing camera — no additional hardware needed — to estimate position and build a map of the environment.
Why SLAM When GPS Is Jammed
GPS relies on satellite radio signals. The enemy jams those signals. Your drone can't hear the satellites. GPS position goes from 2-meter accuracy to "I have no idea where I am."
SLAM relies on what the camera SEES. Light travels at the speed of light and cannot be jammed by radio equipment. The enemy would have to block all light (complete darkness or fog) to defeat visual SLAM — and even then, thermal cameras see in the dark. A camera looking at the ground is immune to enemy electronic warfare.
The barometer (air pressure sensor) gives altitude — also immune to jamming because nobody can jam air pressure. So: camera for horizontal position, barometer for vertical position. Together they replace GPS in an EW zone.
How It Works
SLAM tracks visual features — corners, edges, distinctive patterns — in the camera image. When the drone moves, those features shift in the image. From how they shift, the algorithm calculates how the drone moved. Hundreds of features tracked simultaneously give a robust estimate even when some are lost (behind obstacles, blurred by motion).
The "Mapping" part: as the drone explores new terrain, it adds new features to its map. On a return trip, it recognizes previously seen features and corrects any accumulated drift. This means longer flights get MORE accurate at locations the drone has visited before, not less.
Accuracy without GPS: ~10-30m absolute position after terrain matching against pre-loaded orthophoto. Without terrain matching (pure visual odometry): drifts ~1-3m per minute of flight. For a 5-minute FPV strike mission, that's 5-15m error — acceptable for most targeting tasks since the AI detection accuracy is ~15-50m anyway.
Mathematical Derivation — Visual Odometry from Feature Tracking
This section derives step-by-step how ORB-SLAM3 converts sequential camera frames into drone motion estimates. The math is standard multi-view geometry (Hartley & Zisserman 2004, chapters 9–11; Mur-Artal et al. "ORB-SLAM", IEEE T-RO 2015) — this page's contribution is making explicit the error-propagation path from feature-level noise to drone-level position uncertainty so a reviewer can check when SLAM is trustworthy and when it fails.
ORB-SLAM3 — KEY DESIGN PARAMETERS
Step 1 — Feature extraction and matching
Each frame is scanned for ORB (Oriented FAST + Rotated BRIEF) features. Typical FPV/ISR cruise frame produces N = 500–2000 detected features depending on texture richness. Feature i at pixel (u_i, v_i) has a 256-bit binary descriptor that is matched against the previous frame's features by Hamming distance. A match is accepted if:
d_hamming(desc_t, desc_{t-1}) < τ_match (typical τ = 50 bits)
AND
d_hamming(desc_t, second_best) > 1.5 × d_hamming(desc_t, best_match) (ratio test)
The ratio test (Lowe 2004) rejects ambiguous matches. Typical outcome: 40–70 % of detected features produce accepted matches between consecutive frames at 30 FPS cruise. Snow, forest canopy, or open water can drop this below 10 % — the regime where SLAM fails.
Step 2 — Essential matrix from matched feature pairs
For each matched pair (x_i, x'_i) in consecutive frames, epipolar geometry imposes the constraint:
x'_i^T · E · x_i = 0
where E is the 3×3 essential matrix encoding the relative camera motion.
E has 5 degrees of freedom (3 for rotation, 2 for translation direction — absolute scale is lost in monocular SLAM).
Requires at least 5 matched point pairs for a minimal solution (Nistér's 5-point algorithm, 2004).
In practice ORB-SLAM3 uses RANSAC with 8-point or 5-point sampling on hundreds of candidate matches, keeping the E hypothesis that produces the largest inlier set. Inlier threshold: reprojection error < 2 pixels. Typical inlier count on good frames: 200–400 out of 500–1000 candidates.
Step 3 — Decomposition into rotation and translation
E decomposes via SVD as E = U · Σ · V^T where Σ = diag(1, 1, 0). Four possible (R, t) interpretations emerge. The correct pair is the one where triangulated 3D points lie in front of both cameras (positive depth). This gives the incremental pose:
[R_{t→t-1} | t_{t→t-1}] ∈ SE(3) (relative camera motion between frames)
Global camera pose updates by composition:
T_t = T_{t-1} · [R | t]
Note: monocular SLAM recovers translation direction but NOT scale. Scale is injected from one of three sources: IMU acceleration integration (scale drift proportional to IMU bias), barometric altitude change (vertical scale only), or stereo camera (not used on Fischer 26 — adds weight). Without scale correction, ORB-SLAM3 output is "up to scale" and must be multiplied by a factor obtained from one of these sources before it can be passed to ArduPilot as meters.
Step 4 — Drift accumulation and loop closure correction
Each incremental pose has small error (dominated by pixel noise and calibration error). Over N frames, these errors compound:
Pose error after N frames without loop closure:
σ_pos(N) ≈ σ_incremental × √N (random-walk model, independent increments)
For ORB-SLAM3 on Fischer 26 (published benchmarks on similar UAV platforms):
σ_incremental ≈ 0.008 m per frame at 30 FPS = 0.24 m/s RMS 1σ
After 10 minutes (18000 frames) without loop closure:
σ_pos ≈ 0.008 × √18000 = 1.07 m
But the real-world observation of 200 m drift/10 min for IMU-only and
5 m/10 min for ORB-SLAM3 post-loop-closure reflects systematic bias
(calibration error, scale drift) not captured by the random-walk model.
The empirical √N scaling breaks down at long time horizons.
Loop closure: when the drone revisits a previously seen area, ORB-SLAM3
recognizes features from the stored map and applies pose graph optimization
(Lie-algebra gradient descent over SE(3) manifold) to redistribute accumulated
error across all poses in the loop. Post-closure error reduces to
σ_pos ≈ σ_incremental × √(frames_since_last_closure) — resets on every loop.
Worked Example 1 — 5-minute FPV strike mission, no loop closure
FPV drone launches from platoon position, flies 2 km to target, engages, does not return. No loop closure (one-way mission). ORB-SLAM3 runs continuously on the downward-pointing camera.
Mission parameters:
Duration: 5 min = 300 s
Frames processed: 300 × 30 = 9000 frames
Scenario: Forest canopy terrain (low feature density), 40% match rate
Effective incremental error (degraded): ~2× baseline = 0.016 m/frame
Position uncertainty at impact:
σ_pos = 0.016 × √9000 = 1.52 m (random-walk theoretical)
Systematic calibration bias: ~3 m additional drift over 5 min
Total 1σ: ~3.4 m; 95% confidence interval: ~6.7 m
Targeting accuracy impact:
YOLOv8 bounding-box pixel uncertainty at 120 m AGL: ±0.56 m
Total target localization error (SLAM + detection):
σ_total = √(3.4² + 0.56²) ≈ 3.5 m 1σ
Compare to GPS-dependent mission:
GPS accuracy: ±2 m
Detection uncertainty: ±0.56 m
σ_total with GPS: ±2.1 m 1σ
Conclusion: GPS-denied degrades targeting from 2.1 m to 3.5 m — acceptable
for area suppression, marginal for precision strike on point targets.
Worked Example 2 — 30-minute Fischer 26 ISR with orbit loop closures
Fischer 26 flies a 3 km-radius orbit over the ISR target area, returning over the same terrain features every 6 minutes. Each complete orbit triggers loop closure against the previous orbit's stored map.
Mission parameters:
Duration: 30 min
Orbit period: 6 min (5 full orbits)
Frames per orbit: 6 × 60 × 30 = 10,800 frames
Terrain: Structured boreal (distinct tree lines, building corners) — good features
Effective error: baseline 0.008 m/frame (no degradation)
Drift growth PER ORBIT (before each loop closure):
σ_pos_peak = 0.008 × √10800 = 0.83 m 1σ (end of orbit, just before closure)
Post-closure reset: σ drops to ~0.1 m (residual after pose graph optimization)
Average position uncertainty during mission:
σ_pos_avg ≈ 0.5 m over 30 min (saw-tooth pattern: ramp 0.1 → 0.83, reset, repeat)
Peak: 0.83 m at end of each orbit window
Compare to no-loop-closure mission:
σ_pos(30 min) = 0.008 × √54000 = 1.86 m 1σ (plus systematic bias ~5 m)
Operational value of loop closures:
30-min ISR with loops: peak error 0.83 m, avg 0.5 m
30-min ISR without: peak error 7+ m (grows monotonically)
Factor improvement: 10× for persistent ISR, trivial for one-way strikes.
Verification Code — Drift and Loop Closure Simulation
import math
INCREMENTAL_ERROR_M_PER_FRAME = 0.008 # baseline per-frame random-walk noise
FRAME_RATE_HZ = 30
def slam_drift_sigma(minutes, terrain_quality=1.0, loops_per_mission=0):
"""Estimate 1-sigma position uncertainty after given flight duration.
terrain_quality: 1.0=good features (boreal/urban), 0.5=poor (forest/water).
loops_per_mission: number of loop closures that reset drift.
"""
total_frames = minutes * 60 * FRAME_RATE_HZ
effective_err_per_frame = INCREMENTAL_ERROR_M_PER_FRAME / terrain_quality
if loops_per_mission == 0:
# No loop closure - pure random walk
return effective_err_per_frame * math.sqrt(total_frames)
else:
# Assume closures evenly spaced; drift resets each closure
frames_per_segment = total_frames / (loops_per_mission + 1)
peak_sigma = effective_err_per_frame * math.sqrt(frames_per_segment)
return peak_sigma # Return peak (worst case before closure)
def targeting_error(slam_sigma, detection_sigma=0.56):
"""Combine SLAM uncertainty with YOLOv8 detection uncertainty."""
return math.sqrt(slam_sigma**2 + detection_sigma**2)
# Reproduce Worked Example 1 (5-min FPV, no loop closure, forest terrain)
sigma_slam = slam_drift_sigma(minutes=5, terrain_quality=0.5, loops_per_mission=0)
sigma_tgt = targeting_error(sigma_slam)
print(f"FPV 5-min forest: SLAM sigma={sigma_slam:.2f} m, targeting={sigma_tgt:.2f} m")
# Expected: ~3 m SLAM, ~3 m total (forest degrades 2x)
# Reproduce Worked Example 2 (30-min ISR, 5 loop closures, good terrain)
sigma_slam = slam_drift_sigma(minutes=30, terrain_quality=1.0, loops_per_mission=4)
sigma_tgt = targeting_error(sigma_slam)
print(f"ISR 30-min with loops: SLAM sigma={sigma_slam:.2f} m, targeting={sigma_tgt:.2f} m")
# Expected: ~0.8 m peak SLAM, ~1.0 m total
# Without loop closures:
sigma_slam_noclose = slam_drift_sigma(minutes=30, terrain_quality=1.0, loops_per_mission=0)
print(f"ISR 30-min no loops: SLAM sigma={sigma_slam_noclose:.2f} m")
# Expected: ~1.9 m
# Sensitivity: how much does loop closure frequency matter?
for n_loops in [0, 1, 3, 5, 10]:
s = slam_drift_sigma(30, 1.0, n_loops)
print(f" {n_loops:2d} loops in 30 min: peak sigma = {s:.2f} m")
Why This Derivation Matters Operationally
Four operational decisions depend on this SLAM error model being correct. If we get this wrong the consequences are direct and measurable: drones navigate toward positions that do not exist, FPV pilots cannot find targets Fischer 26 detected, and the entire GPS-denied claim collapses at the moment of contact with enemy EW.
First, mission planning: the targeting-error calculation (3.5 m for forest FPV strike vs 1.0 m for structured-terrain ISR) tells the planner whether SLAM-based navigation is acceptable for a given mission type. Point-precision strike against a command vehicle in forest terrain has 3.5 m localization error — insufficient for first-round hits. The planner must route Fischer 26 to structured terrain corridors (road intersections, tree lines, building clusters) or accept that FPV terminal guidance handles the last few meters visually.
Second, orbit pattern design: Worked Example 2 shows that Fischer 26's orbit period matters directly for position accuracy. A 6-minute orbit produces 0.83 m peak error at each orbit's end; a 15-minute orbit produces 1.3 m peak error. The coverage planner must trade off orbit radius (bigger = more area covered) against orbit period (shorter = better SLAM accuracy). The derivation makes this trade-off quantitative rather than intuitive.
Third, terrain selection for GPS-denied operations: the terrain_quality factor in the simulation quantifies what is otherwise a qualitative observation ("SLAM works in forests, fails on water"). At terrain_quality = 0.3 (open snow, dense homogeneous canopy), SLAM error doubles every minute — unacceptable for any mission longer than 2–3 minutes. The planner can explicitly exclude such terrain from SLAM-dependent mission profiles, or add an IMU-only dead-reckoning fallback with its own error budget.
Fourth, calibration investment: the derivation shows that systematic bias (calibration error, scale drift) dominates long-duration missions more than incremental random noise. Investing in high-quality camera calibration (10×7 chessboard with OpenCV at 50+ poses, re-calibrated every 100 flight hours) reduces systematic bias from ~3 m/5 min to < 1 m/5 min — a bigger operational improvement than doubling the SLAM frame rate. This tells engineering where to spend effort: calibration discipline beats algorithm optimization for field accuracy.
The random-walk model (σ = σ_incremental × √N) is verified numerically in the code above. The observed real-world drift rates (200 m/10 min IMU-only, 5 m/10 min ORB-SLAM3 post-loop, 1–3 m/min pure VO) are taken from published benchmarks on comparable UAV platforms (Campos et al. 2021; ArduPilot Issue tracker discussions of EK3_SRC1 Vision Position). FSG-A has not flown ORB-SLAM3 on Fischer 26 in the field — all numbers in this derivation are either published-literature values, calibration math, or simulation outputs, not FSG-A flight measurements.
Implementation
ORB-SLAM3 runs on the Jetson Orin Nano alongside YOLOv8. It uses the same camera — no extra hardware. Processing cost: ~5ms per frame on Jetson Orin. Position estimate fed directly into ArduPilot's EKF3 as an external position source (replaces GPS). Configuration: EK3_SRC1_POSXY=6 (ExternalNav), VISO_TYPE=1 (MAVLink vision position).
External source: SLAM – Wikipedia
Implementation
# pip install numpy
# ORB-SLAM3 Integration with ArduPilot via MAVLink
# pip install pymavlink
import time
import numpy as np
class ORBSLAM3Bridge:
"""Bridge ORB-SLAM3 pose to ArduPilot via VISION_POSITION_ESTIMATE."""
def __init__(self, mavlink_conn):
self.mav = mavlink_conn
self.origin_set = False
def send_vision_position(self, x_m, y_m, z_m, roll, pitch, yaw):
"""Send ORB-SLAM3 position to ArduPilot EKF3."""
self.mav.mav.vision_position_estimate_send(
int(time.time() * 1e6), # timestamp_usec
x_m, y_m, z_m, # position NED (meters)
roll, pitch, yaw, # orientation (radians)
[0] * 21 # covariance (not used)
)
def process_frame(self, orbslam3_output):
"""Convert ORB-SLAM3 camera frame to NED frame."""
# ORB-SLAM3 outputs in camera frame (right-down-forward)
# ArduPilot needs NED (north-east-down)
T = orbslam3_output.pose # 4x4 transformation matrix
# Camera to NED rotation
R_cam_to_ned = np.array([
[0, 0, 1], # Camera Z → North
[1, 0, 0], # Camera X → East
[0, 1, 0] # Camera Y → Down
])
position_cam = T[:3, 3]
position_ned = R_cam_to_ned @ position_cam
self.send_vision_position(
position_ned[0], position_ned[1], position_ned[2],
0, 0, 0 # Roll/pitch/yaw from ORB-SLAM3 rotation matrix
)
# ArduPilot config for ORB-SLAM3 integration:
# param set EK3_SRC1_POSXY 6 # ExternalNav
# param set EK3_SRC1_POSZ 6 # ExternalNav
# param set EK3_SRC1_YAW 6 # ExternalNav
# param set VISO_TYPE 1 # MAVLink vision position
Pixel-to-Motion Math — Visual Odometry in 30 Lines
The feature-tracking math that underlies SLAM is simple enough to write out. Two sequential frames of a feature at image coordinates (u₁, v₁) and (u₂, v₂) plus the drone altitude h above the ground plane give the horizontal translation. If the camera has focal length f (in pixels), the projected motion on the ground is: Δx = (u₂ − u₁) × h / f, Δy = (v₂ − v₁) × h / f.
This is a flat-earth, nadir-pointing, no-rotation approximation. The full ORB-SLAM3 pipeline handles rotation, perspective, and lens distortion, but the fundamental relationship stays the same: pixel motion scales linearly with altitude. Flying higher means coarser motion resolution; flying lower means finer resolution but shorter time horizon before features leave the frame.
# Minimal visual odometry — one feature, nadir camera
# This is the kernel of what ORB-SLAM3 does for ~2000 features at once.
import numpy as np
def feature_motion_to_ground_motion(u1, v1, u2, v2, altitude_m,
focal_length_px=2400):
"""Convert pixel displacement to ground displacement (meters).
Arducam IMX477 @ 1640x1232 with 6mm lens gives f ≈ 2400 px.
Assumes camera points straight down (nadir) and ground is flat.
"""
dx_px = u2 - u1
dy_px = v2 - v1
dx_m = dx_px * altitude_m / focal_length_px
dy_m = dy_px * altitude_m / focal_length_px
return dx_m, dy_m
# Example: drone at 120 m, feature moved 2 px right, 0 px down
dx, dy = feature_motion_to_ground_motion(800, 600, 802, 600, 120)
velocity_ms = dx * 30 # 30 fps
print(f"2 px/frame @ 120m = ground step {dx:.3f} m → {velocity_ms:.1f} m/s")
# → 3.0 m/s — matches Fischer 26 cruise speed (~24 m/s) only at higher fps
Drift Accumulation — Why Loop Closure Matters
Each frame-to-frame estimate carries a small error. Without correction, these errors accumulate as a random walk: the standard deviation of position error grows as the square root of time. After n frames, drift scales with √n. Loop closure is the antidote — when the drone revisits a known landmark, the accumulated error collapses to the loop-closure uncertainty, typically 2–5 m.
# Drift scaling — random walk vs loop closure
import math
SIGMA_PER_FRAME_M = 0.015 # 1.5 cm per-frame uncertainty (typical)
FPS = 30
def drift_after_seconds(t_s, sigma_per_frame=SIGMA_PER_FRAME_M, fps=FPS):
"""Open-loop drift estimate (random walk, 1-sigma)."""
n_frames = t_s * fps
return sigma_per_frame * math.sqrt(n_frames)
for t in [10, 60, 300, 600]:
drift_m = drift_after_seconds(t)
print(f"{t:4d} s open-loop: drift 1-sigma ≈ {drift_m:6.2f} m")
# 10 s: 0.26 m | 60 s: 0.64 m
# 300 s: 1.42 m | 600 s: 2.01 m (without loop closure)
# With loop closure every 60 s, drift resets to ~2 m each pass.
# Net error stays bounded instead of growing unboundedly.
Dempster-Shafer Minimal Example
Two independent SLAM-derived position estimates can be fused to a single estimate with lower uncertainty. The rule is a weighted average by inverse variance. With estimates p₁ ± σ₁ and p₂ ± σ₂, the fused position is σ²_fused = 1 / (1/σ₁² + 1/σ₂²) and p_fused = σ²_fused × (p₁/σ₁² + p₂/σ₂²).
# Two independent SLAM estimates fused by inverse-variance weighting
def fuse_estimates(p1, sigma1, p2, sigma2):
w1, w2 = 1.0/sigma1**2, 1.0/sigma2**2
var_fused = 1.0 / (w1 + w2)
p_fused = var_fused * (p1*w1 + p2*w2)
return p_fused, var_fused**0.5
# ORB-SLAM3 says x = 100.5 m ± 3 m; IMU dead reckoning says x = 98.0 m ± 8 m
x, s = fuse_estimates(100.5, 3.0, 98.0, 8.0)
print(f"Fused x = {x:.2f} m ± {s:.2f} m")
# → 100.18 m ± 2.82 m (better than either input alone)
Sources
Parameter sources. ORB-SLAM3 — Campos, Elvira, Rodríguez, Montiel & Tardós, "ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial, and Multi-Map SLAM", IEEE Transactions on Robotics, Vol. 37, 2021. Jetson Orin Nano specifications — NVIDIA datasheet. Arducam IMX477 characteristics — Sony IMX477 datasheet + Arducam. ArduPilot parameters (EK3_SRC1_POSXY, VISO_TYPE) — ardupilot.org/dev documentation. ORB-SLAM3 source code — github.com/UZ-SLAMLab/ORB_SLAM3.
Operational estimates — not validated by FSG-A in the field. 5 m/10 min drift after loop closure in urban conditions — published value from Campos et al. (2021), not measured by FSG-A on Fischer 26. "30% GPU at 30 FPS on Orin Nano" — typical performance from NVIDIA developer forums, not measured by FSG-A. "10–30 m accuracy after terrain matching" — calculated from typical orthophoto resolution, not measured. FSG-A has not deployed ORB-SLAM3 on a real drone platform.
External standards and references. Campos, C., Elvira, R., Rodríguez, J. J. G., Montiel, J. M. M., & Tardós, J. D. (2021). "ORB-SLAM3", IEEE Transactions on Robotics. ArduPilot visual odometry documentation (ardupilot.org/dev). FOI Memo 8336 "GPS-Denied Navigation for UAS" (2024). Jetson Orin Nano performance benchmarks for visual SLAM (NVIDIA developer forums). ROS 2 framework for SLAM-stack integration.
ORB-SLAM3 Performance Characteristics
ORB-SLAM3 extracts Oriented FAST corner features from camera frames and tracks them between sequential images to estimate camera motion. The algorithm requires visually distinctive terrain — buildings, rock formations, tree lines, road intersections — to generate reliable features. In favorable conditions (urban environment, structured terrain), position drift reduces from 200m/10min (IMU-only) to 5m/10min after the first loop closure. Loop closure occurs when the drone revisits a previously mapped area and recognizes the same landmarks, allowing accumulated drift to be corrected retroactively.
In unfavorable conditions (uniform forest canopy, snow-covered fields, open water), ORB-SLAM3 fails to find sufficient features and reverts to IMU-only dead reckoning. The Fischer 26 operator must understand these limitations: plan ISR routes over terrain with visual diversity, avoid extended flight over featureless areas, and accept that position accuracy degrades during featureless segments. The Jetson Orin Nano runs ORB-SLAM3 at 30 FPS using approximately 30 percent of GPU capacity — leaving 70 percent for simultaneous YOLOv8 inference on both visual and thermal streams.
Performance Comparison — GPS vs SLAM vs Dead Reckoning
Three navigation modes produce dramatically different position accuracy over a 30-minute Fischer 26 ISR mission. GPS (when available): ±2m continuous, zero drift. ORB-SLAM3 with loop closure: ±5m after first orbit, improving with each subsequent orbit. Dead reckoning (IMU + barometer + optical flow): ±200m after 30 minutes, continuously degrading. The position error directly affects targeting accuracy — coordinates derived from GPS navigation are artillery-grade (50m CEP), from SLAM are FPV-grade (enough for visual search), and from dead reckoning are area-grade (sector reference only).