2 minute read

Top K Frequent Elements

Given an integer array nums, extract the top k most frequent elements.
This post covers the journey from a naive loop approach to achieving $O(N)$ time complexity.


First Approach: The Limits of Loops and Conditionals

My initial idea was to compute frequencies with Counter, then iterate and find values satisfying a specific condition (n > k).

Trial-and-Error Code (Logic Error)

def topKFrequent(self, nums: List[int], k: int) -> List[int]:
    answer = []
    count = Counter(nums)
    for i, n in count:  # Tuple unpacking error
        if n > k:  # Logic error: checking 'frequency > k' instead of 'top k'
            answer.append(count[i])
    return count

Key Takeaway

  • When iterating over a Counter object, use .items() to get (number, frequency) pairs.
  • To find the “top $k$”, simple comparison is not enough — Sorting or a Priority Queue (Heap) is required.

Second Approach: Sorting (O(N log N))

Sort all data in descending order by frequency, then slice the top $k$ elements.

Core Principle

  • Extract frequency data via Counter(nums).items().
  • Use the key parameter of sorted() to sort by frequency.
from collections import Counter

def topKFrequent(self, nums: List[int], k: int) -> List[int]:
    count = Counter(nums)
    # Sort in descending order by frequency (x[1])
    sorted_counts = sorted(count.items(), key=lambda x: x[1], reverse=True)
    
    # Extract only the 'numbers' of the top k using list comprehension
    return [x[0] for x in sorted_counts[:k]]

Third Approach: Heap Optimization (O(N log k))

Instead of sorting all elements, maintain a heap of size $k$ and pick only the top elements.
This is efficient when the dataset is large and $k$ is relatively small.

Pythonic Way (heapq)

Using Python’s heapq.nlargest, the heap algorithm is executed internally,
allowing a clean one-liner solution.

import heapq
from collections import Counter

def topKFrequent(self, nums: List[int], k: int) -> List[int]:
    count = Counter(nums)
    # Extract top k elements by frequency
    return heapq.nlargest(k, count.keys(), key=count.get)

Final Optimization: Bucket Sort (O(N))

Going beyond the $O(N \log N)$ limit of comparison-based sorting,
this approach uses frequency itself as an index to solve the problem in $O(N)$.

The Art of Classification

  • The index becomes the ‘frequency’, and the value at that index is ‘a list of numbers with that frequency’.
  • Scan in reverse from the largest index (maximum frequency) and collect $k$ elements.

Final Optimized Code

from collections import Counter

def topKFrequent(self, nums: List[int], k: int) -> List[int]:
    count = Counter(nums)
    # Create buckets using frequency as index (max frequency is len(nums))
    bucket = [[] for _ in range(len(nums) + 1)]
    
    for num, freq in count.items():
        bucket[freq].append(num)
        
    answer = []
    # Scan from highest frequency (back of the bucket) and fill answer
    for i in range(len(bucket) - 1, 0, -1):
        for n in bucket[i]:
            answer.append(n)
            if len(answer) == k:
                return answer

Summary & Reflection

Choosing the Right Data Structure

  • Sorting ($O(N \log N)$): The most intuitive implementation, but average performance.
  • Heap ($O(N \log k)$): Memory-efficient and optimized for maintaining the top $k$ elements.
  • Bucket Sort ($O(N)$): Theoretically the fastest when the frequency range is bounded (within array length).

Conclusion

What started as a simple loop evolved into the powerful Bucket Sort algorithm.
This was an insightful process that demonstrated how
“placement by index” instead of “comparison-based sorting”
can dramatically boost performance.

Leave a comment