Meta Software Engineer Onsite Interview Question Breakdown: K-Closest Points

Meta Software Engineer Onsite Interview Question Breakdown: K-Closest Points

This article is the 2nd in my series breaking down Meta interview questions which I actually gave as an interviewer during my time there. There's a lot of misinformation around how Big Tech judges candidates, and this series' goal is to demystify all that and help you study data structure and algorithms (DSA) properly. If you want to read the 1st article, you can find it here: Meta Software Engineer Interview Question Breakdown: Finding Unique Integers

While the 1st article covered my warm-up phone screen question, this one takes us into the Big Leagues: The onsite. I'll cover the primary problem I gave when I was one of the DSA interviewers (code name "Ninja") on a candidate's final panel.


The Problem

Here's the problem: Given a list of n points on a cartesian plane and a number k, return a list of the k-closest points to the origin.

Let's clarify this with an example:

Input:

Points = [[3, 0], [2, 2], [7, 3], [1, 1], [0,0]]

k = 3

Output: [[0,0], [1, 1], [2, 2]]

As a reminder, you use the Pythagorean Theorem (c = sqrt(a^2 + b^2)) to find the distance of a point from the origin.


The Setup

If you look at the prompt, it's suspiciously oriented around the happy flow. The list is short and the points are very clean. I did this so the candidate would ask questions to clarify the problem like:

  1. What can be in a point?
    1. Can there be very large Integers?
    2. Can there be Floats?
    3. Can it have negative coordinates?
  2. What can the list be like?
    1. Can it be very small, empty, or null?
    2. Can it be extremely large?
  3. What to do in case of points that are the same distance and qualify as the k-closest?
    1. Return all of them even if we go above k?
    2. Have some sort of tiebreak algorithm?
    3. Just return whatever amount of them at random?

The only one that really mattered was #3: We should do c. We want to adhere to k but having an arbitrary tiebreak algorithm takes us away from the core of the problem.


The "Optimal" Solution

This problem boils down to knowing what a Heap is, which you can achieve with a PriorityQueue in Java (my primary programming language during my Meta days). The logic is as follows:

  1. Create a Point class with some utility logic somewhere to compute distance from origin.
  2. Create a Heap of k points to return.
  3. Iterate through every number in the list and add it to the Heap. If it's bigger than k, remove the point furthest away from the origin as we now know there's at least k pointers closer than it.
  4. After your loop is done, just extract everything from the Heap and return it.

Here's the solution in Java:

import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class Point {
    float x, y;
    
    public Point(float x, float y) {
        this.x = x;
        this.y = y;
    }
    
    public float distanceFromOrigin() {
        return x * x + y * y;
    }
}

public class KClosestPointsToOrigin {
    public List<Point> kClosestPoints(List<Point> points, int k) {
        PriorityQueue<Point> minHeap = new PriorityQueue<>((p1, p2) -> p2.distanceFromOrigin() - p1.distanceFromOrigin());
        
        for (Point point : points) {
            minHeap.offer(point);
            if (minHeap.size() > k) {
                minHeap.poll();
            }
        }
        
        List<Point> result = new ArrayList<>();
        while (!minHeap.isEmpty()) {
            result.add(minHeap.poll());
        }
        return result;
    }
}

Run-time: O(n log k) - You only need to go through each number once (n) and updating the HashMap is (log k) for each operation. Reading through the heap at the end to construct the return value is linear, so it doesn't factor into Big O.

Space: O(k) - The Heap is k + 1 space, which reduces down to O(k).


The Follow-Ups

When you're building production software at the scale of a FAANG company like Meta, you absolutely need to have an eye for detail and proactively handle edge cases. This problem is naturally laden with tons of them.

This is why I asked something like this after the candidate created their initial solution: "Let's say this code you wrote is going to production and thousands of engineers in your company will call it. What test cases can you run by it to really show your teammates that the code is truly rock-solid?"

From there, I expected some collection of these (some of which may have been mentioned in the setup step):

  1. The list is huge (i.e. several billion points or more)
  2. The list is tiny (0 or 1 items)
  3. The list is null
  4. k is larger than n
  5. k is the same as n
  6. k is tiny (0 or 1)
  7. k is negative
  8. Points have very large numbers (bigger than INT_MAX or smaller than INT_MIN)
  9. All the points are the same distance from the origin
  10. You have duplicate points
  11. Some points are just at the origin

Cases like #7 are particularly interesting - This is clearly a bogus scenario. However, you shouldn't just ignore it: You can never expect an API to only ever be passed clean inputs, especially at the scale of Big Tech. What do you return in this scenario? Or do you throw an exception instead? If you throw an exception, what kind? It's important that you have an intelligent view point on what to do in these scenarios.

After we went through edge cases, I asked for an alternative solution. Since coding takes time (especially when you have to both code and explain yourself simultaneously), I didn't ask candidates to code out the 2nd solution. Being able to explain the approach and having some very rough pseudo-code was enough.

So in the "optimal" solution, the run-time is optimized. However, it allocates O(k) amount of space. After candidates coded it up, I would ask them if we could come up with another solution that made trade-offs against the 1st one (hinting that they can sacrifice run-time for better space usage).

The alternative solution with O(1) space is as follows:

  1. Sort the list of points based on distance
  2. Return the k closest ones.

Here's the Java solution:

import java.util.Arrays;
import java.util.ArrayList;
import java.util.List;

class Point {
    float x, y;
    
    public Point(float x, float y) {
        this.x = x;
        this.y = y;
    }
    
    public float distanceFromOrigin() {
        return x * x + y * y;
    }
}

public class KClosestPointsToOrigin {
    public List<Point> kClosestPoints(Point[] points, int k) {
        Arrays.sort(points, (p1, p2) -> Float.compare(p1.distanceFromOrigin(), p2.distanceFromOrigin()));
        
        List<Point> closestPoints = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            closestPoints.add(points[i]);
        }
        
        return closestPoints;
    }
}

Run-time: O(n log n) - Sort is the dominant operation, and it is n log n. Everything else is linear.

Space: O(1) - The only space we allocate is for the return list which doesn't count.

After they figured out the solution, I asked them when they would use this solution over the other one. The answer is when k is extremely large (making the space usage non-trivial) and is close to n (making log n and log k comparable).


Grading

At Meta, there are 3 types of "Yes" votes:

  • Strong Yes - The candidate completely crushed it. I would be surprised if they did too poorly on other parts of the interview. Candidates in this bucket always have superb communication skills.
  • Yes - The candidate did quite well.
  • Weak Yes - The candidate barely passed, and I could easily be convinced by another interviewer in the panel to not extend this candidate an offer.

In order to get "Yes" on this round, the requirements are:

  • Have clean code, particularly with encapsulating points into their own class/struct
  • Get the optimal solution for run-time and have a good understanding of the alternative solution with lower space usage
  • Have proper run-time and space-analysis for both
  • Ask 2-3 clarifying questions before diving into the problem
  • Be able to come up with at least 5 valid edge-case scenarios to test
  • Explain their code throughout the entire process

In order to get "Strong Yes", the candidate needs to have a mix of:

  • Extremely thorough edge-case analysis and problem clarification
  • Thinking forward and having an idea on how to make the algorithm more generic and take in a target point as opposed to being hard-coded against the origin
  • Excellent communication
  • Stellar coding speed + quality (be able to code the 2nd solution, get everything right on first try, clean variables, concise code). Here are some important things to look out for code-wise with this problem:
    • You don't need to take the square root of (a^2 + b^2) - You can just compare this sum directly
    • Be careful squaring numbers. If you're taking in everything as Integer, a coordinate can easily overflow when multiplied against itself

75%+ of candidates failed this relatively straightforward problem. Here's where they messed up to drop into the "No" vote categories:

  • Not creating a separate class/struct for the Point. In general, if you don't encapsulate bundles of data, your code will be extremely messy and you will have a bad time
  • They weren't able to come up with an alternative solution
  • They didn't ask any clarifying questions about the problem
  • They weren't able to do run-time and space analysis properly. Many candidates weren't entirely sure what the heap meant from a run-time perspective
  • They weren't able to come up with meaningful edge cases to test for
  • Their code was messy and/or didn't work
  • Their communication was poor in general and it was hard to understand what they were thinking

How You Can Get Better

If you're looking at all this and thinking that there's no way you would be able to go so in-depth into a DSA problem live in an interview, no worries - We're here to help! Check out these resources to actually improve your interviewing skills instead of just treating LeetCode as a memorization-fest: