What is the $lookup stage in MongoDB aggregation?
Why Interviewers Ask This
This question tests conceptual clarity. Interviewers want to hear a precise, confident definition before moving to more complex MongoDB topics. It also reveals how well you can explain technical ideas to non-experts.
Answer
$lookup performs a left outer join between two collections in the same database. It adds a new array field to each document containing the matching documents from the joined collection. Syntax: { $lookup: { from: "orders", localField: "_id", foreignField: "userId", as: "userOrders" } }. This adds a userOrders array to each user document containing all their orders. Pipeline-based $lookup (correlated subquery): more flexible, allows conditions beyond simple equality: { $lookup: { from: "orders", let: { userId: "$_id" }, pipeline: [ { $match: { $expr: { $eq: ["$userId", "$$userId"] } } }, { $match: { status: "completed" } }, { $sort: { createdAt: -1 } }, { $limit: 5 } ], as: "recentOrders" } }. Handling joined results: use $unwind after $lookup if you need to flatten the array (treat each matched document as a separate document in the pipeline). $lookup limitations: only works within the same database; can be expensive on large collections without proper indexes — ensure the foreignField is indexed. Sharded collections: $lookup on sharded collections requires the from collection to be on the same shard or unsharded. When to embed vs reference: use $lookup when the data is accessed separately more than together, the related data is large/frequently updated, or the relationship is many-to-many. Embed when data is always accessed together, the sub-document is small, and the parent-child relationship is one-to-few.
Pro Tip
Before answering, structure your response: one-line definition → real-world analogy → concrete example from a project. This makes even complex MongoDB answers easy to follow.