How would you design Instagram's system architecture?
Why Interviewers Ask This
Advanced questions like this reveal whether a candidate has internalized System Design deeply enough to make architectural decisions. Strong answers demonstrate both breadth and depth of experience.
Answer
Instagram's photo-sharing system has to handle massive media upload and storage, feed generation, and the social graph.
Scale (rough, publicly cited figures): 1B+ users, 500M daily active users, 100M photos uploaded per day, 4.2B likes per day.
Photo upload flow: the client compresses the image → HTTP POST to the upload service → the service generates a time-sortable 64-bit photo ID (Snowflake-style) → uploads the original to S3 → triggers an async transcoding pipeline that produces multiple resolutions (thumbnail, medium, display, full) → the generated versions are stored in S3 → metadata (photo_id, user_id, caption, location, timestamp, S3 keys) is stored in PostgreSQL/Cassandra → the CDN pre-warms popular photos.
Feed generation: hybrid push-pull. When a user posts, the post is fanned out into each follower's feed (push model). Posts from accounts with millions of followers (celebrities) are not fanned out, because writing one post into millions of feeds is too expensive; followers pull those posts at read time instead. Celery workers backed by Redis perform the fan-out asynchronously. Each user's feed is a Redis sorted set whose members are photo IDs scored by post timestamp. Reading the first page is ZREVRANGE feed:{user_id} 0 49, which returns the 50 newest entries (on Redis 6.2+, ZRANGE feed:{user_id} 0 49 REV is the non-deprecated equivalent).
Social graph (follows): stored twice in Cassandra, once keyed by follower as (follower_id, followee_id, created_at) and once keyed by followee as (followee_id, follower_id, created_at). This denormalization makes both "whom does X follow?" and "who follows X?" single-partition reads.
Photo metadata: sharded PostgreSQL handles the relational queries; frequently read rows are cached in Redis.
Like counts: a Redis counter per photo (INCR likes:{photo_id}), plus a Redis sorted set for the most-liked photos. Counts are written back asynchronously to persistent storage.
Discovery/Explore: machine learning models for ranking, and Elasticsearch for hashtag and location search.
CDN: photos are served via a CDN (Akamai/CloudFront) with URLs of the form cdnurl/photo_id/resolution.jpg.
Instagram's actual stack, as described in its early engineering posts: Django (Python), PostgreSQL (sharded in-house into thousands of logical shards), Cassandra, Redis, S3, and CloudFront.
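The upload flow above assigns each photo a Snowflake ID before anything else is written. The sketch below is a minimal Python version assuming the classic Twitter Snowflake layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit per-millisecond sequence) and a hypothetical custom epoch; Instagram's own published scheme used a different split (time, logical shard ID, sequence) and generated IDs inside PostgreSQL. Because the timestamp occupies the high bits, newer photos get larger IDs, so photos can be ordered by ID alone.

```python
import threading
import time

# Hypothetical custom epoch: 2020-01-01T00:00:00Z in Unix milliseconds.
CUSTOM_EPOCH_MS = 1_577_836_800_000

WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1   # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095


class SnowflakeGenerator:
    """64-bit IDs: 41-bit ms timestamp | 10-bit worker id | 12-bit sequence."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError(f"worker_id must be in [0, {MAX_WORKER_ID}]")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                # Issuing IDs now could repeat ones already handed out.
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - CUSTOM_EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )


def timestamp_ms(photo_id: int) -> int:
    """Recover the creation time (Unix ms) embedded in an ID."""
    return (photo_id >> (WORKER_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS
```

Each generator process needs its own worker ID (assigned, for example, from configuration or a coordination service); two processes sharing one worker ID can produce duplicate IDs.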
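The transcoding step can be planned as a pure function that, given the original dimensions, decides the size, storage key, and CDN URL of every variant before any resizing happens. The variant names come from the answer above; the pixel widths, the `photos/{photo_id}/{name}.jpg` key layout, and the `cdn.example.com` host are hypothetical placeholders. The actual resizing would be done by an image library inside the async worker and is left out of this sketch.

```python
from dataclasses import dataclass

# Hypothetical target widths (px) per variant; the answer names the variants but not sizes.
VARIANT_WIDTHS = {"thumbnail": 150, "medium": 320, "display": 640, "full": 1080}

# Hypothetical CDN hostname; a real deployment would use its CloudFront/Akamai domain.
CDN_HOST = "cdn.example.com"


@dataclass(frozen=True)
class Variant:
    name: str
    width: int
    height: int
    storage_key: str  # object key in the blob store (e.g. an S3 bucket)
    cdn_url: str      # public URL following the photo_id/resolution.jpg pattern


def plan_variants(photo_id: int, orig_width: int, orig_height: int) -> list[Variant]:
    """Compute the resized versions the async transcoding job should produce."""
    variants = []
    for name, target_width in VARIANT_WIDTHS.items():
        width = min(target_width, orig_width)  # never upscale a small original
        height = max(1, round(orig_height * width / orig_width))  # keep aspect ratio
        variants.append(
            Variant(
                name=name,
                width=width,
                height=height,
                storage_key=f"photos/{photo_id}/{name}.jpg",
                cdn_url=f"https://{CDN_HOST}/{photo_id}/{name}.jpg",
            )
        )
    return variants
```

Keeping the URL derivable from the photo ID and variant name means clients and the CDN never need a metadata lookup to locate a given resolution.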
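The hybrid push-pull feed can be illustrated with a minimal in-memory sketch in which Python dicts stand in for the Redis sorted sets, so it runs without a server. The `followers`/`following` maps play the role of the two Cassandra follow tables described above. The celebrity threshold, the per-feed cap of 500 entries, and all class and method names are hypothetical; in production the fan-out loop would run in Celery workers rather than inline.

```python
import heapq
from collections import defaultdict


class HybridFeedService:
    """In-memory sketch of hybrid push-pull feeds.

    Push: a post by an ordinary account is copied into every follower's feed
    (in Redis: ZADD feed:{follower_id} <ts> <photo_id>, run by async workers).
    Pull: accounts at or above `celebrity_threshold` followers are not fanned
    out; their recent posts are merged into the reader's feed at read time.
    """

    def __init__(self, celebrity_threshold: int = 1_000_000, feed_cap: int = 500) -> None:
        self.celebrity_threshold = celebrity_threshold  # hypothetical cutoff
        self.feed_cap = feed_cap  # keep only the newest N entries per feed
        self.followers: dict[int, set[int]] = defaultdict(set)  # followee -> followers
        self.following: dict[int, set[int]] = defaultdict(set)  # follower -> followees
        self.feeds: dict[int, dict[int, float]] = defaultdict(dict)  # user -> {photo_id: ts}
        self.posts: dict[int, list[tuple[float, int]]] = defaultdict(list)  # author -> [(ts, photo_id)]

    def follow(self, follower: int, followee: int) -> None:
        # Write both orientations, mirroring the two denormalized tables.
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def is_celebrity(self, user_id: int) -> bool:
        return len(self.followers.get(user_id, ())) >= self.celebrity_threshold

    def post(self, author: int, photo_id: int, ts: float) -> None:
        self.posts[author].append((ts, photo_id))  # assumes posts arrive in time order
        if self.is_celebrity(author):
            return  # pull model: nothing to fan out
        for follower in self.followers.get(author, ()):  # push model
            feed = self.feeds[follower]
            feed[photo_id] = ts
            if len(feed) > self.feed_cap:
                # Trim the oldest entry (ZREMRANGEBYRANK feed 0 -(cap+1) in Redis).
                del feed[min(feed, key=feed.get)]

    def read_feed(self, user_id: int, limit: int = 50) -> list[int]:
        # Start from the pushed entries (ZREVRANGE feed:{user_id} 0 limit-1 in Redis).
        candidates = dict(self.feeds.get(user_id, {}))
        # Merge recent posts pulled from followed celebrities; the dict dedupes
        # posts that were pushed before the author crossed the threshold.
        for followee in self.following.get(user_id, ()):
            if self.is_celebrity(followee):
                for ts, photo_id in self.posts[followee][-limit:]:
                    candidates[photo_id] = ts
        newest = heapq.nlargest(limit, candidates.items(), key=lambda item: item[1])
        return [photo_id for photo_id, _ in newest]
```

The dedupe in `read_feed` matters because an account's status can change: posts pushed while it was below the threshold remain in followers' feeds after it crosses it.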
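The like-count path trades durability for write throughput: increments hit an in-memory counter and are persisted in batches. The sketch below models that with `collections.Counter` in place of Redis (its `most_common` standing in for reading the top of a most-liked sorted set) and a plain dict as a hypothetical durable store. In this sketch, increments made since the last `flush()` are lost if the process dies, which is the accepted trade-off of async write-back.

```python
import threading
from collections import Counter


class LikeCounter:
    """Sketch of cached like counts with asynchronous write-back.

    `counts` stands in for the per-photo Redis counters (INCR likes:{photo_id})
    and the most-liked sorted set; `pending` holds increments not yet persisted;
    `flush()` is what a periodic background job would run.
    """

    def __init__(self, db: dict[int, int]) -> None:
        self.db = db                       # hypothetical durable store: photo_id -> like_count
        self.counts: Counter = Counter()   # cached totals
        self.pending: Counter = Counter()  # deltas awaiting write-back
        self.lock = threading.Lock()

    def like(self, photo_id: int) -> int:
        with self.lock:
            if photo_id not in self.counts:
                # Cache miss: seed the counter from the durable store.
                self.counts[photo_id] = self.db.get(photo_id, 0)
            self.counts[photo_id] += 1
            self.pending[photo_id] += 1
            return self.counts[photo_id]

    def most_liked(self, n: int) -> list[int]:
        # Similar in spirit to ZREVRANGE on a sorted set scored by like count;
        # only photos currently in the cache are considered.
        with self.lock:
            return [photo_id for photo_id, _ in self.counts.most_common(n)]

    def flush(self) -> int:
        """Persist pending deltas; returns how many photos were written."""
        with self.lock:
            deltas, self.pending = self.pending, Counter()
        for photo_id, delta in deltas.items():
            # Apply as an increment (like SQL `SET like_count = like_count + delta`)
            # rather than overwriting with the cached total.
            self.db[photo_id] = self.db.get(photo_id, 0) + delta
        return len(deltas)
```

Writing deltas instead of absolute totals keeps the durable count correct even if another writer updated the same row between flushes.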
Common Mistake
Candidates often give textbook answers here. Interviewers are more impressed when you relate the concept to a specific problem you solved in a real System Design project.
More System Design Questions
- Advanced How would you design a distributed file system like HDFS?
- Advanced How would you design a video streaming service like Netflix?
- Advanced What is the consistent hashing with virtual nodes in detail?
- Advanced How would you design a global distributed database like Google Spanner?
- Advanced What is the difference between optimistic and pessimistic locking in distributed systems?