Design a Proximity Service (Nearby Friends) | System Design Course

Who Asks This Question?

The proximity service question is popular at location-aware companies and social platforms. Based on interview reports, it's frequently asked at:

Meta — Their "Nearby Friends" feature is exactly this system; multiple Blind reports confirm it
Snapchat — Snap Map requires real-time proximity detection across millions of users
Google — Google Maps "Share Location" and "Find My Device" use similar architectures
Uber — Driver-rider matching is proximity search at massive scale
Bumble — Location-based matching for dating apps
Foursquare — Their entire business model was built on location proximity
Apple — Find My Friends and AirTag tracking use proximity services
Yelp — Restaurant and business discovery relies heavily on geospatial queries

This question tests whether you understand geospatial data at scale. Companies that ask it want to see that you know the difference between "calculate distance in a for-loop" (trivial) and "find nearby users among 100 million without scanning the entire database" (hard).

What the Interviewer Is Really Testing

Most candidates focus on the distance formula and miss the distributed systems challenges. Here's the actual scoring breakdown at most companies:

Evaluation Area	Weight	What They're Looking For
Requirements gathering	15%	Do you clarify active vs passive users, privacy, and scale?
Geospatial indexing	30%	Can you explain geohash, quadtree, or other spatial indexing?
Real-time updates	25%	WebSockets, location update frequency, mobile battery concerns
Privacy & security	15%	Location opt-in, data retention, precise location vs approximate
Scale considerations	15%	Database sharding, hot regions, read/write patterns

The #1 reason candidates fail this question: they spend 20 minutes on the haversine distance formula while the interviewer waits for them to mention geospatial indexing. The distance calculation is trivial — the spatial indexing is the actual challenge.

Step 1: Clarify Requirements

Questions You Must Ask

These questions fundamentally change your architecture:

"Is this active discovery (like Snapchat's Snap Map) or passive background (like Find My Friends)?" Active discovery means users actively search for nearby friends. Passive means the system pushes notifications when friends come nearby. Active has different read/write patterns and privacy implications.

"What's the proximity range — 1 mile, 10 miles, or configurable?" This affects indexing strategy. Smaller ranges (< 1 mile) can use finer-grained geohash precision. Larger ranges might need multiple index lookups or different algorithms.

"How precise should the location be — exact coordinates or approximate area?" For privacy, many apps show "nearby" without revealing exact location. Snapchat shows friends within a few hundred meters but not precise coordinates. This changes how you store and index location data.

"How many users and how often do they update location?" This is the make-or-break question for system scale. 1 million users updating every 30 seconds (about 33,000 writes per second) is manageable. 100 million users updating every 10 seconds (10 million writes per second) requires serious distributed design.

"Should this work globally or in specific regions?" Global deployment means handling geospatial data across multiple data centers, time zones, and potentially different privacy regulations (GDPR location data requirements).

Requirements You Should State

After asking questions, explicitly state what you're building:

Functional:

Users can opt-in to share their location with friends
Users can see which friends are within X miles of their current location
Location sharing can be toggled on/off and has configurable visibility (all friends, close friends, temporary)
System supports both "find nearby friends now" (active) and "notify when friend is nearby" (passive)

Non-functional:

Support 50 million active users
Handle 10 million location updates per minute
Sub-200ms response time for proximity queries
99.9% availability for location updates
Global deployment across multiple regions
Location data encrypted at rest and in transit
Battery-efficient mobile updates (not constant GPS polling)
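
As a quick sanity check on these targets, the stated volumes can be converted into per-second rates. The sketch below is only back-of-envelope arithmetic; the class and method names are illustrative and not part of the design above, and it assumes load is spread evenly with no peak factor.

```java
// Back-of-envelope load estimate for the stated non-functional targets.
final class CapacityEstimate {
    /** Average writes per second given a per-minute update volume. */
    static double updatesPerSecond(long updatesPerMinute) {
        return updatesPerMinute / 60.0;
    }

    /** Average seconds between updates for each user, if load is spread evenly. */
    static double secondsBetweenUpdatesPerUser(long activeUsers, long updatesPerMinute) {
        return activeUsers / updatesPerSecond(updatesPerMinute);
    }
}
```

With 50 million active users and 10 million updates per minute, this works out to roughly 167,000 writes per second, or one update per user about every 5 minutes on average. Real traffic would peak above that average.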

Step 2: High-Level Architecture

Core Components

[Mobile App] 
    |
    v
[Load Balancer]
    |
    v
[API Gateway] --> [Auth Service]
    |
    v
[Location Service] --> [User Service]
    |                      |
    v                      v
[Geospatial DB]     [Friend Graph DB]
    |                      |
    v                      v
[Notification Service] <---+
    |
    v
[WebSocket/Push Notifications]

Request flows:

Location Update:

Mobile app → Location Service (user coordinates + timestamp)
Location Service → Geospatial DB (store/update spatial index)
Location Service → Notification Service (check if any friends are nearby)

Find Nearby Friends:

Mobile app → Location Service (get nearby friends for user)
Location Service → Geospatial DB (spatial query within radius)
Location Service → Friend Graph DB (filter results to actual friends)
Return filtered friend list with approximate distances

Database Design

User Location Store (Primary):

CREATE TABLE user_locations (
    user_id BIGINT PRIMARY KEY,
    latitude DECIMAL(10,8) NOT NULL,
    longitude DECIMAL(11,8) NOT NULL,
    geohash VARCHAR(12) NOT NULL,    -- For spatial indexing
    last_updated TIMESTAMP NOT NULL,
    location_sharing_enabled BOOLEAN DEFAULT false,
    INDEX idx_geohash (geohash),
    INDEX idx_last_updated (last_updated)
);

Friend Relationships:

CREATE TABLE friendships (
    user_id BIGINT,
    friend_id BIGINT,
    status ENUM('pending', 'accepted', 'blocked'),
    location_sharing_level ENUM('none', 'approximate', 'precise'),
    created_at TIMESTAMP,
    PRIMARY KEY (user_id, friend_id),
    INDEX idx_user_friends (user_id, status)
);

Geospatial Indexing Strategy

This is the heart of the system. Raw distance calculations don't scale:

Naive approach (doesn't scale):

-- This scans every user record!
SELECT user_id, latitude, longitude 
FROM user_locations 
WHERE SQRT(POW(latitude - 37.7749, 2) + POW(longitude - (-122.4194), 2)) < 0.1;

Production approach: Geohash + Prefix Matching

Geohash encodes lat/lng into a string where nearby locations usually share a common prefix:

San Francisco: 9q8yywe (precision 7 ≈ 153m x 153m)
Nearby location: 9q8yywd (shares 6-char prefix)
Distant location: 9q9p5bc (shares only the 2-char prefix 9q)

Geohash implementation:

public class GeohashService {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";
    
    public String encode(double lat, double lon, int precision) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder geohash = new StringBuilder();
        boolean isEven = true;
        
        while (geohash.length() < precision) {
            int bit = 0;
            for (int i = 0; i < 5; i++) {
                if (isEven) { // longitude
                    double mid = (lonRange[0] + lonRange[1]) / 2;
                    if (lon >= mid) {
                        bit = (bit << 1) | 1;
                        lonRange[0] = mid;
                    } else {
                        bit = bit << 1;
                        lonRange[1] = mid;
                    }
                } else { // latitude
                    double mid = (latRange[0] + latRange[1]) / 2;
                    if (lat >= mid) {
                        bit = (bit << 1) | 1;
                        latRange[0] = mid;
                    } else {
                        bit = bit << 1;
                        latRange[1] = mid;
                    }
                }
                isEven = !isEven;
            }
            geohash.append(BASE32.charAt(bit));
        }
        return geohash.toString();
    }
}

Spatial query using geohash:

-- Find users in the same precision-6 geohash cell (roughly 1.2 km x 0.6 km)
SELECT u.user_id, u.latitude, u.longitude 
FROM user_locations u
WHERE u.geohash LIKE '9q8yyw%' 
  AND u.location_sharing_enabled = true
  AND u.last_updated > NOW() - INTERVAL 1 HOUR;

Pro tip: For edge cases near geohash boundaries, query all adjacent geohash cells too. A friend 100 meters away might have a different geohash prefix due to cell boundaries.
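
One way to get the adjacent cells is to shift the query point by one cell width or height in each direction and re-encode it. The sketch below takes that approach; it repeats the encoder so that it stands alone, and the class and method names (GeohashNeighbors, cellAndNeighbors) are hypothetical. Lookup-table neighbor algorithms exist as well and are not shown here.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch: compute the 3x3 block of geohash cells around a point.
final class GeohashNeighbors {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Standard geohash encoding: interleave bits, longitude first. */
    static String encode(double lat, double lon, int precision) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true; // even bits refine longitude, odd bits latitude
        while (hash.length() < precision) {
            int ch = 0;
            for (int i = 0; i < 5; i++) {
                double[] range = evenBit ? lonRange : latRange;
                double value = evenBit ? lon : lat;
                double mid = (range[0] + range[1]) / 2;
                ch <<= 1;
                if (value >= mid) {
                    ch |= 1;
                    range[0] = mid;
                } else {
                    range[1] = mid;
                }
                evenBit = !evenBit;
            }
            hash.append(BASE32.charAt(ch));
        }
        return hash.toString();
    }

    /** Cell size in degrees as {height, width} for a given precision. */
    static double[] cellSizeDegrees(int precision) {
        int bits = precision * 5;
        int lonBits = (bits + 1) / 2; // longitude gets the extra bit when odd
        int latBits = bits / 2;
        return new double[] {180.0 / Math.pow(2, latBits), 360.0 / Math.pow(2, lonBits)};
    }

    /** Returns the cell containing (lat, lon) plus its neighbors (up to 9 cells). */
    static Set<String> cellAndNeighbors(double lat, double lon, int precision) {
        double[] size = cellSizeDegrees(precision);
        Set<String> cells = new LinkedHashSet<>();
        for (int dLat = -1; dLat <= 1; dLat++) {
            for (int dLon = -1; dLon <= 1; dLon++) {
                double nLat = lat + dLat * size[0];
                if (nLat > 90.0 || nLat < -90.0) continue; // nothing beyond the poles
                double nLon = lon + dLon * size[1];
                if (nLon >= 180.0) nLon -= 360.0;  // wrap across the antimeridian
                if (nLon < -180.0) nLon += 360.0;
                cells.add(encode(nLat, nLon, precision));
            }
        }
        return cells;
    }
}
```

The returned prefixes can then be queried together, for example as a series of `geohash LIKE 'prefix%'` conditions joined with OR, before an exact distance check trims the candidates.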

Alternative: Quadtree for Dynamic Hot Regions

Geohash works well for uniform distribution, but some areas (Times Square, airports) have much higher user density.

Quadtree approach:

Recursively divide space into 4 quadrants
Each leaf node contains up to N users
When a leaf exceeds N users, split into 4 children
Dynamically adapts to user density

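import java.util.*;

public class QuadTree {
    private static final int MAX_USERS_PER_NODE = 100;
    record User(long id, double lat, double lon) {} // assumed minimal type (Java 16+)

    static class QuadNode {
        final double minLat, maxLat, minLon, maxLon;
        List<User> users = new ArrayList<>();
        QuadNode[] children = new QuadNode[4]; // NW, NE, SW, SE
        QuadNode(double minLat, double maxLat, double minLon, double maxLon) {
            this.minLat = minLat; this.maxLat = maxLat;
            this.minLon = minLon; this.maxLon = maxLon;
        }
        boolean isLeaf() { return children[0] == null; }

        void insert(User user) {
            if (isLeaf() && users.size() < MAX_USERS_PER_NODE) {
                users.add(user);
            } else {
                if (isLeaf()) split();
                getQuadrant(user).insert(user);
            }
        }
        // Create four children and move users down (cap depth in practice: identical coordinates would split forever)
        void split() {
            double midLat = (minLat + maxLat) / 2, midLon = (minLon + maxLon) / 2;
            children[0] = new QuadNode(midLat, maxLat, minLon, midLon); // NW
            children[1] = new QuadNode(midLat, maxLat, midLon, maxLon); // NE
            children[2] = new QuadNode(minLat, midLat, minLon, midLon); // SW
            children[3] = new QuadNode(minLat, midLat, midLon, maxLon); // SE
            for (User u : users) getQuadrant(u).insert(u);
            users = new ArrayList<>();
        }
        QuadNode getQuadrant(User u) {
            boolean north = u.lat() >= (minLat + maxLat) / 2, east = u.lon() >= (minLon + maxLon) / 2;
            return children[(north ? 0 : 2) + (east ? 1 : 0)];
        }
    }
}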

When to mention quadtree: If the interviewer asks about handling non-uniform distribution or mentions "hot spots" like downtown areas.

Step 3: Deep Dive — Real-Time Location Updates

Location Update Frequency Trade-offs

This is a classic mobile system design challenge: accuracy vs battery life vs server load.

Update strategies:

Strategy	Battery Impact	Accuracy	Server Load
Continuous GPS	Very High	Highest	Extreme
Time-based (every 30s)	High	Good	High
Distance-based (every 100m)	Medium	Good	Medium
Significance-based	Low	Variable	Low

Significance-based updates (recommended):

public class LocationUpdatePolicy {
    private static final double MIN_DISTANCE_METERS = 50;
    private static final long MIN_TIME_MS = 30_000;
    private static final double SPEED_THRESHOLD_MPS = 5; // ~11 mph
    
    public boolean shouldUpdate(Location current, Location last) {
        if (last == null) return true;
        
        double distance = haversineDistance(current, last);
        long timeDiff = current.timestamp - last.timestamp;
        
        // Always update if time threshold passed
        if (timeDiff > MIN_TIME_MS) return true;
        
        // Update if moved significant distance
        if (distance > MIN_DISTANCE_METERS) return true;
        
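        // Update more frequently if moving fast (driving/transit).
        // Skip the speed check when no time has elapsed, to avoid dividing by zero.
        if (timeDiff <= 0) return false;
        double speed = distance / (timeDiff / 1000.0);
        if (speed > SPEED_THRESHOLD_MPS && distance > 25) return true;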
        
        return false;
    }
}
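
The fixed thresholds above can also be paired with an update interval that depends on speed. The sketch below uses the intervals this guide suggests for the highway follow-up question (5 minutes stationary, 2 minutes walking, 30 seconds driving). The speed cutoffs and the class name are assumptions made for illustration only.

```java
import java.time.Duration;

// Sketch: pick a location update interval from the current speed.
final class AdaptiveUpdateInterval {
    // Assumed cutoffs; tune against real movement data.
    private static final double STATIONARY_MAX_MPS = 0.5;
    private static final double WALKING_MAX_MPS = 5.0;

    static Duration intervalFor(double speedMetersPerSecond) {
        if (speedMetersPerSecond < STATIONARY_MAX_MPS) {
            return Duration.ofMinutes(5);   // stationary
        }
        if (speedMetersPerSecond < WALKING_MAX_MPS) {
            return Duration.ofMinutes(2);   // walking pace
        }
        return Duration.ofSeconds(30);      // driving or transit
    }
}
```

A client could call `intervalFor` after each fix and schedule the next update accordingly, so a phone lying on a desk reports far less often than one in a moving car.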

WebSocket for Real-Time Proximity Notifications

When a friend comes nearby, push a notification immediately rather than waiting for the next app poll.

WebSocket connection management:

@Component
public class ProximityNotificationService {
    private final Map<Long, WebSocketSession> userSessions = new ConcurrentHashMap<>();
    
    @EventListener
    public void handleLocationUpdate(LocationUpdateEvent event) {
        Long userId = event.getUserId();
        Location newLocation = event.getLocation();
        
        // Find friends within notification radius
        List<Long> nearbyFriends = findNearbyFriends(userId, newLocation, NOTIFICATION_RADIUS_METERS);
        
        for (Long friendId : nearbyFriends) {
            // Check if this friend wasn't nearby before
            if (!wasPreviouslyNearby(userId, friendId)) {
                sendProximityNotification(friendId, userId, newLocation);
            }
        }
    }
    
    private void sendProximityNotification(Long recipientId, Long friendId, Location location) {
        WebSocketSession session = userSessions.get(recipientId);
        if (session != null && session.isOpen()) {
            ProximityNotification notification = ProximityNotification.builder()
                .friendId(friendId)
                .approximateDistance(Math.round(calculateDistance(recipientId, location)))
                .area(getApproximateArea(location)) // "Downtown SF", not exact coordinates
                .build();
                
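            try {
                session.sendMessage(new TextMessage(JsonUtils.toJson(notification)));
            } catch (IOException e) { // needs import java.io.IOException
                // sendMessage throws a checked IOException; drop the broken session
                userSessions.remove(recipientId);
            }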
        }
    }
}
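
The `wasPreviouslyNearby` check is what keeps the service from notifying on every location update while two friends stay close. A minimal sketch of that state, assuming a single process and an in-memory set, is shown below. The class name is hypothetical, and a multi-server deployment would need a shared store instead.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: remember which pairs are currently near each other so that a
// notification fires only on the far -> near transition.
final class ProximityStateTracker {
    private final Set<String> nearbyPairs = ConcurrentHashMap.newKeySet();

    /** Order-independent key so (a, b) and (b, a) map to the same entry. */
    private static String pairKey(long a, long b) {
        return Math.min(a, b) + ":" + Math.max(a, b);
    }

    /**
     * Records the latest observation for a pair.
     * Returns true only when the pair has just become nearby.
     */
    boolean update(long userId, long friendId, boolean nowNearby) {
        String key = pairKey(userId, friendId);
        if (nowNearby) {
            return nearbyPairs.add(key); // add() returns false if already present
        }
        nearbyPairs.remove(key);         // reset so a later approach notifies again
        return false;
    }
}
```

In practice a slightly larger exit radius than entry radius would help avoid repeated notifications for a pair hovering right at the boundary.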

Database Sharding Strategy

With 50M users updating location, a single database won't scale. Shard by geography:

Region-based sharding (coarse bounding boxes):

public class GeoShardingStrategy {
    private static final List<String> SHARD_REGIONS = Arrays.asList(
        "us-west", "us-east", "eu-west", "asia-pacific"
    );
    
    public String getShardForLocation(double lat, double lon) {
        if (lat >= 24.0 && lat <= 49.0 && lon >= -125.0 && lon <= -66.0) {
            return lon < -95.0 ? "us-west" : "us-east";
        }
        if (lat >= 35.0 && lat <= 71.0 && lon >= -10.0 && lon <= 40.0) {
            return "eu-west";
        }
        return "asia-pacific"; // catch-all: every location outside the boxes above lands here
    }
}

Cross-shard friend queries: When a user near shard boundaries searches for friends, you may need to query adjacent shards:

public List<NearbyFriend> findNearbyFriends(Long userId, Location location, int radiusMeters) {
    String primaryShard = getShardForLocation(location.latitude, location.longitude);
    List<String> shardsToQuery = new ArrayList<>();
    shardsToQuery.add(primaryShard);
    
    // If near shard boundary, query adjacent shards in addition to the primary
    if (isNearShardBoundary(location, radiusMeters)) {
        shardsToQuery.addAll(getAdjacentShards(primaryShard));
    }
    
    return shardsToQuery.stream()
        .flatMap(shard -> queryShardForNearbyFriends(shard, userId, location, radiusMeters).stream())
        .collect(Collectors.toList());
}
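
Alongside sharding, recent locations can be cached with an expiration, which this guide suggests doing in Redis for the 500-million-user follow-up. The sketch below is an in-memory stand-in for that idea rather than a Redis client; the class name is hypothetical, and the clock is passed in explicitly to keep the example easy to test.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for a cache whose entries expire after a fixed TTL.
final class ExpiringLocationCache {
    private static final class Entry {
        final double lat;
        final double lon;
        final long expiresAtMs;

        Entry(double lat, double lon, long expiresAtMs) {
            this.lat = lat;
            this.lon = lon;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final Map<Long, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMs;

    ExpiringLocationCache(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    /** Stores the latest position for a user, valid for ttlMs from nowMs. */
    void put(long userId, double lat, double lon, long nowMs) {
        entries.put(userId, new Entry(lat, lon, nowMs + ttlMs));
    }

    /** Returns {lat, lon} if present and not expired; expired entries are evicted lazily. */
    Optional<double[]> get(long userId, long nowMs) {
        Entry e = entries.get(userId);
        if (e == null) return Optional.empty();
        if (nowMs >= e.expiresAtMs) {
            entries.remove(userId, e);
            return Optional.empty();
        }
        return Optional.of(new double[] {e.lat, e.lon});
    }
}
```

A short TTL also doubles as a freshness guarantee: a user who stops sharing or goes offline simply ages out of the cache.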

Step 4: Deep Dive — Privacy and Security

Location data is extremely sensitive. Strong answers address privacy proactively.

Privacy Levels

Granular consent model:

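public enum LocationSharingLevel {
    NONE(0),           // Not visible to anyone
    APPROXIMATE(1),    // Visible within ~500m accuracy  
    PRECISE(2);        // Exact coordinates
    
    private final int rank; // higher rank = more precise sharing
    
    LocationSharingLevel(int rank) { this.rank = rank; }
    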
    public static class LocationPrivacySettings {
        LocationSharingLevel defaultLevel = NONE;
        Map<Long, LocationSharingLevel> friendOverrides = new HashMap<>();
        boolean shareWithAllFriends = false;
        boolean temporarySharing = false; // 24-hour expiry
        Set<String> blockedAreas = new HashSet<>(); // "home", "work"
    }
}

Location fuzzing for approximate sharing:

public class LocationPrivacy {
    private static final double APPROXIMATE_RADIUS_METERS = 500;
    
    public Location approximateLocation(Location exact) {
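        // Add a random offset within a 500m radius
        double metersPerDegree = 111320.0; // approx meters per degree of latitude
        double angle = Math.random() * 2 * Math.PI;
        // sqrt spreads points evenly over the disc instead of clustering them at the center
        double distanceMeters = Math.sqrt(Math.random()) * APPROXIMATE_RADIUS_METERS;
        // A degree of longitude shrinks by cos(latitude), so scale the east-west offset
        double cosLat = Math.max(Math.cos(Math.toRadians(exact.latitude)), 0.01);
        
        return Location.builder()
            .latitude(exact.latitude + distanceMeters * Math.cos(angle) / metersPerDegree)
            .longitude(exact.longitude + distanceMeters * Math.sin(angle) / (metersPerDegree * cosLat))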
            .accuracy(APPROXIMATE_RADIUS_METERS)
            .build();
    }
}
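
A random offset changes on every request, so someone who collects many fuzzed readings of a stationary user could average them and recover the true position. A deterministic alternative is to snap coordinates to a fixed grid, so the same place always maps to the same approximate point. The sketch below assumes a grid measured in meters; the class name and the 111,320 meters-per-degree constant are approximations chosen for illustration.

```java
// Sketch: deterministic coarsening by snapping coordinates to a fixed grid.
final class GridSnapper {
    private static final double METERS_PER_DEGREE_LAT = 111_320.0; // approximate

    /** Returns {lat, lon} of the center of the grid cell containing the point. */
    static double[] snap(double lat, double lon, double cellMeters) {
        double latStep = cellMeters / METERS_PER_DEGREE_LAT;
        double snappedLat = (Math.floor(lat / latStep) + 0.5) * latStep;
        // A degree of longitude shrinks by cos(latitude). Using the snapped
        // latitude gives every point in the same row the same longitude step.
        double cosLat = Math.max(Math.cos(Math.toRadians(snappedLat)), 0.01);
        double lonStep = cellMeters / (METERS_PER_DEGREE_LAT * cosLat);
        double snappedLon = (Math.floor(lon / lonStep) + 0.5) * lonStep;
        return new double[] {snappedLat, snappedLon};
    }
}
```

Calling `GridSnapper.snap(lat, lon, 500.0)` on any point inside a cell returns that cell's center, so repeated requests reveal nothing beyond the cell itself.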

Location data retention policy:

-- Automatically delete old location data
CREATE EVENT delete_old_locations
ON SCHEDULE EVERY 1 DAY
DO
DELETE FROM user_locations 
WHERE last_updated < DATE_SUB(NOW(), INTERVAL 30 DAY);

-- User deletion (GDPR Article 17)
-- The parameter is prefixed with p_ because MySQL resolves a bare user_id to the
-- parameter, so "WHERE user_id = user_id" would match every row
CREATE PROCEDURE delete_user_location_data(IN p_user_id BIGINT)
BEGIN
    DELETE FROM user_locations WHERE user_id = p_user_id;
    DELETE FROM location_sharing_history WHERE user_id = p_user_id OR shared_with_user_id = p_user_id;
    INSERT INTO user_deletion_log (user_id, deleted_at) VALUES (p_user_id, NOW());
END;

Common Mistakes

These patterns come from real interview feedback:

Mistake 1: Forgetting About Mobile Battery Life

Suggesting "update location every 5 seconds" shows you've never built a mobile app. Real apps use significant movement thresholds and adaptive update frequencies.

Mistake 2: Storing Everyone's Location in One Big Table

A single user_locations table with 50M rows and no spatial indexing won't scale. Mention geohash indexing or geographic sharding from the start.

Mistake 3: Exact Distance Calculations for Everyone

Using the haversine formula to calculate exact distance to every user in the database. You need spatial indexing to narrow down candidates first, then exact distance calculations only for the filtered set.
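
The second stage of that two-step approach can be sketched as follows: the spatial index returns a small candidate set, and only those candidates get an exact haversine check. The class name, method names, and the use of plain {lat, lon} arrays are assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: exact distance check applied only to index-filtered candidates.
final class DistanceFilter {
    private static final double EARTH_RADIUS_METERS = 6_371_000.0; // mean radius

    /** Great-circle distance in meters between two lat/lon points (haversine). */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Keeps only candidates given as {lat, lon} within radiusMeters of the origin. */
    static List<double[]> withinRadius(double lat, double lon,
                                       List<double[]> candidates, double radiusMeters) {
        List<double[]> result = new ArrayList<>();
        for (double[] c : candidates) {
            if (haversineMeters(lat, lon, c[0], c[1]) <= radiusMeters) {
                result.add(c);
            }
        }
        return result;
    }
}
```

Because the candidate list comes from a handful of geohash cells rather than the full table, the exact check runs over a small candidate set instead of tens of millions of rows.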

Mistake 4: Ignoring Privacy

Treating location data like any other user data. Location requires special handling: consent, retention policies, approximate vs exact sharing, and deletion guarantees.

Mistake 5: Confusing Nearby Friends with Nearby People

Conflating "nearby friends" (social network based) with "nearby people" (Tinder, Bumble style). The database design and privacy models are completely different.

Interviewer Follow-Up Questions

Prepare for these advanced scenarios:

"How would you handle a music festival with 50,000 people in a small area?" This is the "hot region" problem. Quadtree adapts better than geohash. You might also temporarily increase geohash precision for known event locations, or use event-specific sharding.

"What if someone is driving on a highway at 80 mph?" High-speed movement requires more frequent updates for accuracy. Use speed-based update intervals: stationary = 5 minutes, walking = 2 minutes, driving = 30 seconds. But also consider that highway users probably don't care about nearby friends.

"How do you prevent stalking or harassment?" Multiple protection layers: block/unblock functionality, sharing only with accepted friends, ability to see who viewed your location, and "ghost mode" to appear offline while still using the app.

"What about international borders?" Different countries have different privacy laws. Location data might need to stay within national boundaries (data sovereignty). Your sharding strategy should align with legal requirements, not just geography.

"How would you scale to 500 million users?" Multi-level sharding: geographic shards, then further partition by user ID within each region. Use read replicas for query load. Consider caching frequently-accessed location data in Redis with expiration.

Summary: Your 35-Minute Interview Plan

Time	What to Do
0-5 min	Clarify requirements: active vs passive, precision, scale, privacy expectations
5-12 min	High-level architecture: components, data flow, database design
12-22 min	Geospatial indexing: choose geohash vs quadtree, explain with examples
22-28 min	Real-time updates: WebSocket notifications, location update frequency, mobile battery
28-33 min	Privacy & scale: data retention, GDPR, geographic sharding
33-35 min	Wrap up: trade-offs, monitoring, what you'd improve

The proximity service interview tests your understanding of geospatial systems, mobile constraints, and privacy requirements all at once. The key insight: it's not about finding the perfect algorithm — it's about making the right trade-offs between accuracy, performance, battery life, and privacy at scale.