🐍 Python Advanced

What is Pandas in Python?

Why Interviewers Ask This

This is a differentiating question used for senior and lead roles. Interviewers want to see whether you can explain not just what Pandas does, but why you would reach for it, and what the trade-offs are between different approaches.

Answer

Pandas is Python's primary data manipulation library, built on NumPy. It provides two main data structures: Series (a 1D labeled array) and DataFrame (a 2D labeled table, like a spreadsheet or SQL table). The core operations are:

Load CSV: df = pd.read_csv("data.csv")
Inspect: df.head(), df.info(), df.describe()
Select: df["col"] (Series), df[["col1", "col2"]] (DataFrame), df.loc[row, col] (label-based), df.iloc[0, 1] (position-based)
Filter: df[df["age"] > 25]
Group: df.groupby("city")["salary"].mean()
Merge: pd.merge(df1, df2, on="id") (an inner join by default)
Handle missing: df.dropna(), df.fillna(0) (both return a new object rather than modifying df)
Apply functions: df["name"].apply(str.upper), or the vectorized df["name"].str.upper(), which also tolerates missing values

Pandas is indispensable for data cleaning, exploration, and transformation in data science workflows.
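The operations named in the answer can be tried end to end with a small in-memory table. The sketch below is illustrative only: the CSV text, the column values, and the departments table are made-up sample data, and io.StringIO stands in for a real "data.csv" file so the example runs without touching disk.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for "data.csv" (not from a real dataset).
CSV_TEXT = """id,name,age,city,salary
1,alice,30,Paris,5000
2,bob,22,Paris,3000
3,carol,41,Berlin,7000
4,dave,28,Berlin,
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Selection: one column gives a Series, a list of columns gives a DataFrame.
ages = df["age"]
subset = df[["name", "city"]]

# Label-based vs position-based access to the same cell.
first_name_by_label = df.loc[0, "name"]  # row label 0, column "name"
first_name_by_pos = df.iloc[0, 1]        # first row, second column

# Boolean filtering keeps the rows where the condition is True.
over_25 = df[df["age"] > 25]

# Group and aggregate; mean() skips the missing salary by default.
mean_salary = df.groupby("city")["salary"].mean()

# Missing values: both calls return new objects and leave df unchanged.
complete_rows = df.dropna()
salary_filled = df["salary"].fillna(0)

# Merge with a second (hypothetical) table on the shared "id" column.
departments = pd.DataFrame({"id": [1, 2, 3], "dept": ["eng", "ops", "eng"]})
merged = pd.merge(df, departments, on="id")  # inner join: id 4 has no match

# Vectorized string method; same result as apply(str.upper) when no name is missing.
upper_names = df["name"].str.upper()
```

In an interview, a walk-through like this is a convenient place to mention the trade-offs: the inner join silently drops the row with id 4, and the group mean ignores the missing salary, so both results depend on how missing data is handled upstream.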

Pro Tip

Demonstrate both theoretical understanding and practical experience. Say what Pandas is, then give a concrete example of how you actually used it in a Python codebase, such as cleaning a messy CSV export or joining two data sources.