Data Analyst Interview Questions: 40+ Questions with Expert Answer Frameworks for 2026

TL;DR: Data analyst interviews in 2026 test a unique combination of technical SQL skills, statistical thinking, business acumen, and communication ability. This guide covers 40+ real interview questions across five categories — SQL, Python/R, statistics, business cases, and behavioral — with expert answer frameworks and preparation strategies. Whether you are targeting a role at a tech company, a consulting firm, or a startup, the core question patterns are consistent and can be prepared for systematically.
The Data Analyst Interview Structure
Data analyst interview processes vary more than software engineering interviews, but most follow a recognizable pattern. Understanding what to expect at each stage lets you prepare efficiently and reduces day-of anxiety.
Typical Interview Pipeline
- Recruiter screen (15-30 minutes): Basic qualification check, role overview, compensation discussion.
- Technical screen (45-60 minutes): SQL assessment (live or on a platform like HackerRank) and/or a short Python coding exercise. Some companies use a take-home assignment instead.
- Take-home assignment (3-5 hours, 48-72 hour window): A dataset with analytical questions that test SQL, data cleaning, analysis, and presentation. Not all companies include this, but it is increasingly common.
- On-site loop (3-4 interviews, 3-5 hours): A mix of SQL deep-dive, statistics/probability questions, business case discussions, and behavioral questions. Senior roles may include a presentation of past analytical work.
Before you even get to the interview stage, your resume needs to pass automated screening. Run yours through our ATS Resume Checker and reference our data analyst resume guide for section-by-section advice on highlighting your analytical skills effectively.
SQL Interview Questions (12 Questions)
SQL is the non-negotiable core skill for data analysts. Every company, regardless of industry, will test your SQL proficiency. Questions range from basic JOINs to complex window functions and CTEs. Below are 12 questions organized by difficulty, each with the key concept being tested.
Foundational SQL (Questions 1-4)
1. "Write a query to find the top 5 customers by total order value."
Key concept: JOIN between customers and orders tables, GROUP BY with SUM aggregate, ORDER BY DESC with LIMIT. This is a warm-up question that tests basic fluency.
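A minimal sketch of this warm-up pattern, run against an in-memory SQLite database. The `customers` and `orders` tables and their columns are hypothetical — adapt names to the schema you are given:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cam');
INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0), (4, 3, 300.0);
""")

# JOIN + GROUP BY + SUM, then ORDER BY ... DESC with LIMIT
top_customers = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total_order_value
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY total_order_value DESC
    LIMIT 5
""").fetchall()
```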
2. "Find all employees who have never placed an expense report."
Key concept: LEFT JOIN with a NULL check on the right table's key column. Some candidates reach for NOT IN with a subquery, but NOT IN silently returns no rows if the subquery contains any NULLs; using LEFT JOIN (or NOT EXISTS) avoids that trap and demonstrates stronger SQL instincts.
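The anti-join pattern, sketched against a hypothetical `employees`/`expense_reports` schema in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE expense_reports (id INTEGER PRIMARY KEY, employee_id INTEGER);
INSERT INTO employees VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cam');
INSERT INTO expense_reports VALUES (1, 1), (2, 1), (3, 3);
""")

# Rows with no match on the right side come back as NULL — those are
# exactly the employees who never filed a report.
never_filed = conn.execute("""
    SELECT e.name
    FROM employees e
    LEFT JOIN expense_reports r ON r.employee_id = e.id
    WHERE r.id IS NULL
""").fetchall()
```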
3. "Calculate the month-over-month revenue growth rate for 2025."
Key concept: Date functions (EXTRACT, DATE_TRUNC), GROUP BY month, and either a self-join or LAG window function to compare consecutive months. The growth rate formula is (current - previous) / previous * 100.
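A sketch of the LAG-based approach. SQLite (used here for a runnable example) lacks DATE_TRUNC, so `strftime('%Y-%m', ...)` stands in for month truncation; the table and values are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (order_date TEXT, amount REAL);
INSERT INTO sales VALUES
  ('2025-01-10', 100), ('2025-01-20', 100),
  ('2025-02-05', 300), ('2025-03-15', 150);
""")

growth = conn.execute("""
    WITH monthly AS (
        SELECT strftime('%Y-%m', order_date) AS month, SUM(amount) AS revenue
        FROM sales
        GROUP BY month
    )
    SELECT month, revenue,
           (revenue - LAG(revenue) OVER (ORDER BY month)) * 100.0
             / LAG(revenue) OVER (ORDER BY month) AS growth_pct
    FROM monthly
    ORDER BY month
""").fetchall()
# The first month has no previous month, so its growth_pct is NULL.
```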
4. "Write a query to find duplicate records in a table based on email address."
Key concept: GROUP BY with HAVING COUNT(*) > 1. A follow-up might ask you to delete duplicates while keeping the most recent record, which introduces ROW_NUMBER() and CTEs.
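Both the detection query and the ROW_NUMBER dedup follow-up, sketched on a hypothetical `users` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, created_at TEXT);
INSERT INTO users VALUES
  (1, 'a@x.com', '2025-01-01'), (2, 'a@x.com', '2025-02-01'),
  (3, 'b@x.com', '2025-01-15');
""")

# Detection: any email appearing more than once
dupes = conn.execute("""
    SELECT email, COUNT(*) AS n FROM users
    GROUP BY email HAVING COUNT(*) > 1
""").fetchall()

# Follow-up: delete duplicates, keeping the most recent row per email
conn.execute("""
    DELETE FROM users WHERE id NOT IN (
        SELECT id FROM (
            SELECT id, ROW_NUMBER() OVER (
                PARTITION BY email ORDER BY created_at DESC) AS rn
            FROM users
        ) WHERE rn = 1
    )
""")
remaining = [r[0] for r in conn.execute("SELECT id FROM users ORDER BY id")]
```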
Intermediate SQL (Questions 5-8)
5. "For each product category, find the product with the highest total sales revenue."
Key concept: Window function with RANK() or ROW_NUMBER() partitioned by category, ordered by revenue DESC. Use a CTE or subquery to filter for rank = 1. This tests your ability to combine aggregation with window functions.
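A sketch of the aggregate-plus-window combination on a hypothetical `sales` table (RANK, unlike ROW_NUMBER, would surface ties):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (category TEXT, product TEXT, revenue REAL);
INSERT INTO sales VALUES
  ('books', 'atlas', 50), ('books', 'novel', 80), ('books', 'atlas', 40),
  ('toys', 'kite', 30), ('toys', 'ball', 25);
""")

# Aggregate per product first, then rank within each category.
top_per_category = conn.execute("""
    WITH ranked AS (
        SELECT category, product, SUM(revenue) AS total,
               RANK() OVER (PARTITION BY category
                            ORDER BY SUM(revenue) DESC) AS rnk
        FROM sales
        GROUP BY category, product
    )
    SELECT category, product, total FROM ranked
    WHERE rnk = 1
    ORDER BY category
""").fetchall()
```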
6. "Write a query to calculate a 7-day rolling average of daily active users."
Key concept: Window function with AVG() and a ROWS BETWEEN 6 PRECEDING AND CURRENT ROW frame. Requires understanding of window frame specifications, which separates intermediate from beginner SQL skills.
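The window-frame specification in action, on a hypothetical `daily_users` table. Note that the first six rows average a partial window — worth mentioning to the interviewer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_users (day TEXT, dau INTEGER);
INSERT INTO daily_users VALUES
  ('2025-01-01', 1), ('2025-01-02', 2), ('2025-01-03', 3), ('2025-01-04', 4),
  ('2025-01-05', 5), ('2025-01-06', 6), ('2025-01-07', 7), ('2025-01-08', 8);
""")

rolling = conn.execute("""
    SELECT day,
           AVG(dau) OVER (
               ORDER BY day
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS avg_7d
    FROM daily_users
    ORDER BY day
""").fetchall()
# Day 7 is the first full 7-day window: AVG(1..7) = 4.0
```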
7. "Find users who made a purchase within 7 days of signing up."
Key concept: JOIN between users and purchases tables with a date comparison in the WHERE clause (purchase_date BETWEEN signup_date AND signup_date + INTERVAL '7 days'). Tests date arithmetic and join conditions.
8. "Write a query to calculate user retention by cohort — the percentage of users from each signup month who are still active 30, 60, and 90 days later."
Key concept: Cohort analysis is one of the most commonly tested patterns in analytics interviews. Requires DATE_TRUNC for cohorting, conditional aggregation with CASE WHEN and COUNT, and potentially a CTE to organize the logic cleanly.
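A simplified sketch of the cohort pattern, reduced to the 30-day milestone (60 and 90 days would be additional CASE columns). The `users`/`activity` schema is hypothetical, and SQLite's `strftime`/`julianday` stand in for DATE_TRUNC and date subtraction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, signup_date TEXT);
CREATE TABLE activity (user_id INTEGER, activity_date TEXT);
INSERT INTO users VALUES (1, '2025-01-05'), (2, '2025-01-20'), (3, '2025-02-03');
INSERT INTO activity VALUES (1, '2025-02-10'), (3, '2025-02-20');
""")

# Cohort by signup month; count users with any activity 30+ days after signup.
retention = conn.execute("""
    SELECT strftime('%Y-%m', u.signup_date) AS cohort,
           COUNT(DISTINCT u.id) AS signups,
           100.0 * COUNT(DISTINCT CASE
               WHEN julianday(a.activity_date) - julianday(u.signup_date) >= 30
               THEN u.id END) / COUNT(DISTINCT u.id) AS pct_active_30d
    FROM users u
    LEFT JOIN activity a ON a.user_id = u.id
    GROUP BY cohort
    ORDER BY cohort
""").fetchall()
```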
Advanced SQL (Questions 9-12)
9. "Write a query to find the median order value. Do not use built-in MEDIAN functions."
Key concept: Use ROW_NUMBER() over the sorted values together with the total row count to locate the middle row (odd count) or the two middle rows (even count), then average them. NTILE(2)-based approaches exist but are easy to get wrong for odd row counts. This tests your ability to solve problems creatively when built-in functions are unavailable.
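One way to implement this, sketched on a hypothetical `orders` table. The integer-division trick `(n+1)/2` and `(n+2)/2` selects one row for odd counts and two for even counts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, amount REAL);
INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30), (4, 100);
""")

median = conn.execute("""
    WITH ordered AS (
        SELECT amount,
               ROW_NUMBER() OVER (ORDER BY amount) AS rn,
               COUNT(*) OVER () AS n
        FROM orders
    )
    SELECT AVG(amount) FROM ordered
    WHERE rn IN ((n + 1) / 2, (n + 2) / 2)  -- middle row(s); integer division
""").fetchone()[0]
```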
10. "Given a table of user sessions with start and end timestamps, find the maximum number of concurrent sessions at any point in time."
Key concept: Event-based approach: create a union of +1 events (session starts) and -1 events (session ends), order by timestamp, calculate a running sum, and take the maximum. This is a classic advanced SQL problem that tests algorithmic thinking within SQL.
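The +1/−1 event technique, sketched on a hypothetical `sessions` table. Ordering ties by `delta` makes an end event sort before a start event at the same timestamp, so back-to-back sessions are not counted as concurrent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (start_ts TEXT, end_ts TEXT);
INSERT INTO sessions VALUES
  ('2025-01-01 09:00', '2025-01-01 10:00'),
  ('2025-01-01 09:30', '2025-01-01 11:00'),
  ('2025-01-01 09:45', '2025-01-01 09:50');
""")

max_concurrent = conn.execute("""
    WITH events AS (
        SELECT start_ts AS ts, +1 AS delta FROM sessions
        UNION ALL
        SELECT end_ts, -1 FROM sessions
    )
    SELECT MAX(running) FROM (
        SELECT SUM(delta) OVER (ORDER BY ts, delta) AS running
        FROM events
    )
""").fetchone()[0]
```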
11. "Write a recursive CTE to find all employees in a reporting chain under a given manager."
Key concept: Recursive CTEs with a base case (the manager) and a recursive step (join employees to the CTE on manager_id). Tests understanding of hierarchical data and recursive query patterns.
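The base-case-plus-recursive-step structure, sketched on a hypothetical `employees` table with a self-referencing `manager_id`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
  (1, 'Ada', NULL), (2, 'Ben', 1), (3, 'Cam', 1), (4, 'Dee', 2), (5, 'Eli', 4);
""")

# Everyone in the reporting chain under Ben (id = 2), including Ben.
chain = [r[0] for r in conn.execute("""
    WITH RECURSIVE reports AS (
        SELECT id, name FROM employees WHERE id = 2   -- base case: the manager
        UNION ALL
        SELECT e.id, e.name                            -- recursive step
        FROM employees e JOIN reports r ON e.manager_id = r.id
    )
    SELECT name FROM reports
""")]
```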
12. "Given a table of daily stock prices, write a query to find the maximum profit from a single buy-sell transaction for each stock."
Key concept: Window function with MIN() OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) to track the running minimum price, then calculate the maximum difference between current price and running minimum. This bridges SQL and algorithmic thinking.
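The running-minimum approach, sketched on a hypothetical `prices` table. A stock that only declines yields a maximum profit of zero (buy and sell on the same day):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prices (stock TEXT, date TEXT, price REAL);
INSERT INTO prices VALUES
  ('AAA', '2025-01-01', 10), ('AAA', '2025-01-02', 7),
  ('AAA', '2025-01-03', 12), ('AAA', '2025-01-04', 9),
  ('BBB', '2025-01-01', 20), ('BBB', '2025-01-02', 15);
""")

profits = conn.execute("""
    SELECT stock, MAX(price - running_min) AS max_profit FROM (
        SELECT stock, price,
               MIN(price) OVER (
                   PARTITION BY stock ORDER BY date
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS running_min
        FROM prices
    )
    GROUP BY stock
    ORDER BY stock
""").fetchall()
# AAA: buy at 7, sell at 12 → profit 5; BBB only declines → profit 0
```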
Python and R Questions (6 Questions)
Most data analyst roles expect proficiency in Python (or R). Interview questions focus on practical data manipulation rather than algorithms. You will be evaluated on pandas fluency, data cleaning techniques, and basic visualization.
1. "How would you handle missing values in a dataset? Walk me through your approach."
Framework: First, assess the pattern of missingness (MCAR, MAR, MNAR). Then choose a strategy: deletion (if missing data is minimal and random), imputation (mean/median for numerical, mode for categorical), forward/backward fill for time series, or model-based imputation for complex patterns. Always document what you did and why.
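A minimal pandas sketch of the imputation branch of this framework, on a hypothetical two-column DataFrame (median for numerical, mode for categorical):

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, None, 40, 35],
    "city": ["NYC", "LA", None, "LA"],
})

# Numerical column: median imputation (more robust to outliers than mean)
df["age"] = df["age"].fillna(df["age"].median())

# Categorical column: mode imputation
df["city"] = df["city"].fillna(df["city"].mode()[0])
```

In an interview, narrate the choice: deletion vs imputation depends on how much is missing and whether the missingness is random.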
2. "You have two DataFrames with customer data from different sources. They need to be merged but have inconsistent formatting. Walk me through the process."
Framework: Standardize column names, handle data type mismatches (str to datetime, float to int), normalize text fields (strip whitespace, consistent casing), deduplicate before merging, choose the appropriate merge type (inner, left, outer), and validate the merge by checking row counts and null values.
3. "How would you detect and handle outliers in a dataset?"
Framework: Detection methods include IQR method (values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR), Z-score (values beyond 3 standard deviations), and visual inspection (box plots, scatter plots). Handling depends on context: remove if they are data entry errors, cap/floor (winsorize) if they are extreme but valid, or keep them and use robust statistical methods.
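The IQR method can be sketched with the standard library alone — `statistics.quantiles` returns the three quartile cut points:

```python
from statistics import quantiles

def iqr_outliers(data):
    """Return values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = quantiles(data, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lo or x > hi]

flagged = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])  # → [100]
```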
4. "Write a Python function that reads a CSV file, performs basic exploratory data analysis, and returns a summary report."
Expected approach: Use pandas read_csv, then systematically check shape, dtypes, describe() for numerical columns, value_counts() for categorical columns, isnull().sum() for missing values, and correlations for numerical pairs. Demonstrate awareness of data quality checks beyond the basics.
5. "How would you pivot a long-format DataFrame to wide format, and vice versa?"
Expected approach: Use pandas pivot_table() or pivot() for long-to-wide, and melt() for wide-to-long. Be prepared to explain when each format is preferable: long format for analysis and modeling, wide format for presentation and cross-tabulation.
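Both directions in a few lines, on a hypothetical region/quarter sales table:

```python
import pandas as pd

long_df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "sales": [100, 120, 80, 90],
})

# Long to wide: one row per region, one column per quarter
wide = long_df.pivot(index="region", columns="quarter", values="sales")

# Wide back to long
back = wide.reset_index().melt(
    id_vars="region", var_name="quarter", value_name="sales")
```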
6. "You need to analyze time series data with seasonal patterns. What tools and techniques would you use?"
Framework: Resample data to consistent frequency, decompose into trend, seasonal, and residual components (statsmodels seasonal_decompose), check for stationarity (ADF test), visualize with line plots and autocorrelation plots, and identify appropriate modeling approaches (ARIMA, exponential smoothing) based on the decomposition.
Statistics and Probability Questions (8 Questions)
Statistics questions test whether you can think rigorously about data and make sound decisions under uncertainty. These are often the questions that separate strong candidates from average ones.
1. "Explain A/B testing. How would you design an experiment to test a new homepage layout?"
Framework: Define the hypothesis and primary metric (e.g., conversion rate). Calculate required sample size based on minimum detectable effect, baseline conversion rate, and desired statistical power (typically 80%) and significance level (typically 0.05). Randomize users into control and treatment groups. Run the test for a predetermined duration. Analyze using a two-proportion z-test or chi-square test. Address common pitfalls: peeking at results early, multiple comparisons, and novelty effects.
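The analysis step of this framework — a two-proportion z-test — can be sketched with the standard library (the function name and counts are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_test(x1, n1, x2, n2):
    """Two-sided z-test for conversions x1/n1 (control) vs x2/n2 (treatment)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)           # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# 5% control vs 6% treatment conversion, 10k users per arm
z, p_value = two_proportion_test(500, 10_000, 600, 10_000)
```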
2. "Your A/B test shows a p-value of 0.03. What does this mean, and what would you recommend?"
Key concept: The p-value is the probability of observing results at least as extreme as the data, assuming the null hypothesis is true. It does NOT mean there is a 97% chance the treatment works. Discuss practical significance vs statistical significance — a statistically significant result with a tiny effect size may not be worth shipping. Also consider confidence intervals, which provide more information than p-values alone.
3. "What is the difference between Type I and Type II errors? Give a business example of each."
Key concept: Type I (false positive): concluding the new feature improves conversion when it actually does not, leading to a wasted engineering investment. Type II (false negative): concluding the new feature has no effect when it actually does improve conversion, missing a valuable improvement. The trade-off between these errors is controlled by significance level (alpha) and power (1 - beta).
4. "A product manager says that daily active users dropped 15% this week. How would you investigate?"
Framework: Verify the data (check for logging errors, data pipeline issues, or definition changes). Segment the drop (by platform, geography, user cohort, feature, acquisition channel). Check for external factors (holidays, competitor launches, app store changes, outages). Identify the specific point of change (did it happen suddenly or gradually?). Correlate with recent deployments or experiments. Present hypotheses ranked by likelihood with supporting evidence.
5. "Explain the Central Limit Theorem and why it matters for data analysis."
Key concept: The CLT states that the sampling distribution of the sample mean approaches a normal distribution as sample size increases, regardless of the population distribution. This matters because it justifies using normal-distribution-based statistical tests (z-tests, t-tests) on sample means even when the underlying data is not normally distributed, as long as the sample size is sufficiently large (typically n > 30).
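A quick simulation makes this concrete: draw from a heavily skewed population, and the distribution of sample means still centers on the population mean with spread shrinking as roughly sigma/sqrt(n). The seed and sizes below are arbitrary:

```python
import random
from statistics import mean, stdev

random.seed(42)

# Skewed (exponential) population with true mean 1.0
population = [random.expovariate(1.0) for _ in range(100_000)]

# Sampling distribution of the mean for samples of size n = 50
sample_means = [mean(random.sample(population, 50)) for _ in range(2000)]
# mean(sample_means) ≈ 1.0; stdev(sample_means) ≈ 1/sqrt(50) ≈ 0.14
```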
6. "What is Simpson's Paradox? Give an example of how it could mislead a business decision."
Key concept: Simpson's Paradox occurs when a trend that appears in several groups reverses when the groups are combined. Classic example: Treatment A has a higher success rate than Treatment B in both mild and severe cases, but Treatment B appears better overall because it was disproportionately applied to mild cases. In business: a marketing channel might look worse overall because it reaches harder-to-convert segments, but it actually performs better within each segment.
7. "How would you determine the right sample size for an A/B test?"
Framework: You need four inputs: baseline metric (e.g., 5% conversion rate), minimum detectable effect (the smallest improvement worth detecting, e.g., 0.5 percentage points), significance level (alpha, typically 0.05), and statistical power (1 - beta, typically 0.80). Use a power analysis formula or calculator. Smaller effect sizes require larger samples. Discuss the business trade-off: running tests longer for smaller detectable effects vs shipping faster with less certainty.
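A sketch of the standard power-analysis formula for a two-proportion test; the function name is illustrative and the result is an approximation (dedicated calculators may differ slightly):

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_base, mde, alpha=0.05, power=0.80):
    """Approximate per-arm n for detecting an absolute lift of `mde`
    over baseline rate `p_base` with a two-sided two-proportion z-test."""
    p_treat = p_base + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ≈ 1.96
    z_beta = NormalDist().inv_cdf(power)            # ≈ 0.84
    var = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * var / mde ** 2)

# 5% baseline, detect a 0.5 percentage-point lift → roughly 31k users per arm
n = sample_size_per_arm(0.05, 0.005)
```

Note how halving the minimum detectable effect roughly quadruples the required sample — this is the business trade-off the framework mentions.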
8. "What are confidence intervals and how do they differ from point estimates?"
Key concept: A point estimate is a single value (e.g., the sample mean is 42). A confidence interval provides a range (e.g., we are 95% confident the true mean falls between 38 and 46). Confidence intervals communicate uncertainty, which is essential for decision-making. A narrow interval suggests high precision; a wide interval suggests more data is needed. Always present confidence intervals alongside point estimates in business recommendations.
Business Case Questions (8 Questions)
Business case questions evaluate whether you can translate vague business problems into structured analytical approaches. They test your metric selection, analytical thinking, and communication skills. There is no single right answer — interviewers evaluate your framework and reasoning.
1. "How would you measure the success of a new feature in our product?"
Framework: Define the feature's goal (engagement, conversion, retention). Select a primary metric (e.g., feature adoption rate) and secondary metrics (e.g., time spent in feature, downstream conversion). Establish guardrail metrics to ensure the feature does not harm overall experience (e.g., support ticket volume, page load time). Propose a measurement plan: A/B test if possible, pre/post analysis if not. Define the success threshold before launch.
2. "Our subscription cancellation rate increased from 5% to 8% over the past quarter. How would you investigate and what would you recommend?"
Framework: Segment the increase by plan type, tenure, acquisition source, geography, and feature usage. Identify when the increase started and correlate with product changes, pricing changes, or external factors. Analyze cancellation survey data if available. Build a cohort analysis to determine if newer cohorts are canceling faster or if long-tenured users are leaving. Propose both quick wins (targeted retention offers, cancellation flow improvements) and structural fixes (feature improvements targeting at-risk segments).
3. "What metrics would you use to evaluate a marketplace platform like Airbnb?"
Framework: Marketplace metrics operate on both supply and demand sides. Supply: listing growth rate, host activation rate, average listing quality score. Demand: search-to-book conversion, guest repeat rate, time-to-book. Marketplace health: fill rate, liquidity (searches that find at least one match), take rate, and supply-demand balance by geography. Growth: gross bookings, nights booked, average booking value.
4. "You notice that users who use Feature X have 40% higher retention than those who do not. What do you conclude?"
Key insight: Correlation does not imply causation. Possible explanations: Feature X genuinely drives retention (causal), users who are already highly engaged are more likely to discover Feature X (selection bias), or a confounding variable (e.g., power users who use multiple features) drives both Feature X usage and retention. To establish causality, you would need an A/B test that randomly exposes some users to Feature X. In the absence of an experiment, you can use quasi-experimental methods like propensity score matching or instrumental variables.
5. "Design a dashboard for the executive team that tracks the health of our SaaS business."
Framework: Organize around the SaaS funnel: acquisition (new signups, trial starts, CAC by channel), activation (time to first value, onboarding completion rate), revenue (MRR, ARPU, expansion revenue), retention (monthly and annual churn rate, net revenue retention), and referral (NPS, referral conversion). Limit to 8-12 metrics to avoid dashboard overload. Include trend lines with 13-month history for seasonality context and comparison targets. Highlight metrics that are off-track.
6. "How would you determine the optimal pricing for a new product tier?"
Framework: Start with competitive analysis (what do alternatives cost?), then conduct willingness-to-pay research (Van Westendorp price sensitivity meter or conjoint analysis). Analyze existing customer data: what features do current customers value most? Model demand elasticity using historical data from pricing changes or discounting. Consider value-based pricing (price relative to the value delivered, not cost to produce). Propose A/B testing different price points if feasible, with guardrails on total revenue impact.
7. "We want to reduce customer support ticket volume by 20%. How would you approach this analytically?"
Framework: First, categorize tickets by type, product area, and user segment. Identify the top 5-10 ticket categories by volume using Pareto analysis (typically 20% of issue types drive 80% of volume). For each high-volume category, analyze root cause: is it a product bug, unclear UX, missing documentation, or user error? Prioritize fixes by impact (ticket volume reduction) and effort (engineering cost). Build a model to predict ticket volume reduction from each intervention. Track weekly and measure against the 20% target.
8. "How would you build a model to predict customer churn?"
Framework: Define churn (no activity in 30 days? Subscription cancellation? Define clearly). Identify features: engagement metrics (login frequency, feature usage, session duration), account characteristics (plan type, tenure, company size), support interactions (ticket frequency, sentiment), and billing signals (failed payments, downgrade requests). Choose a model (logistic regression for interpretability, gradient boosting for accuracy). Split data temporally (train on older data, test on newer) to avoid data leakage. Evaluate with precision, recall, and AUC-ROC. Most importantly, build an actionable intervention plan: what do you do when the model flags a user as at-risk?
Behavioral Questions for Data Analysts (6 Questions)
Behavioral questions for data analysts focus on communication, stakeholder management, and analytical decision-making. The ability to translate complex analysis into actionable business recommendations is often the differentiator between good and great analysts.
- "Tell me about a time your analysis led to a business decision. What was the outcome?"
- "Describe a situation where a stakeholder disagreed with your findings. How did you handle it?"
- "Give me an example of when you had to present complex data to a non-technical audience."
- "Tell me about a time you found an error in a dataset that others had missed."
- "Describe a project where you had to work with messy, incomplete data. What was your approach?"
- "Give me an example of when you had to prioritize between multiple analytical requests."
For detailed behavioral interview preparation, including the STAR method, Story Bank technique, and fully worked answer examples, see our behavioral interview questions guide.
Take-Home Assignment Tips
Take-home assignments are where many data analyst candidates either shine or stumble. The technical analysis is important, but presentation and communication often carry equal weight.
Before You Start
- Read all questions before writing any code. Understand the full scope so you can plan your time.
- Set a time limit for yourself (3-5 hours). Companies value efficiency.
- Document your assumptions from the start.
During the Analysis
- Spend 30% of your time on data exploration and cleaning. Document what you found and how you handled it.
- Show your work. Include your code, but make it clean and commented.
- Do not overcomplicate. A clear linear regression with thoughtful feature selection often impresses more than a complex neural network with no explanation.
Presentation
- Lead with insights, not methodology. Your audience cares about "what should we do?" before "how did you calculate it?"
- Include visualizations that support your narrative, not decorative charts.
- End with clear, actionable recommendations.
- Mention limitations and what you would do with more time or data.
Key Takeaways
- SQL is the foundation of every data analyst interview. Master JOINs, window functions, CTEs, and cohort analysis patterns. Practice writing queries without IDE assistance.
- Statistics questions test your ability to think rigorously. Focus on A/B testing design, p-value interpretation, sample size calculation, and the distinction between correlation and causation.
- Business case questions have no single right answer. Interviewers evaluate your framework, metric selection, and communication clarity. Practice structuring answers with a clear hypothesis, segmentation approach, and actionable recommendation.
- Python/R proficiency should focus on pandas data manipulation, data cleaning, and basic visualization — not algorithms or machine learning (unless the role is data science-adjacent).
- Take-home assignments are evaluated on presentation and communication as much as technical correctness. Lead with insights, document assumptions, and keep your code clean.
- Prepare behavioral stories that demonstrate analytical impact: times your analysis drove a business decision, times you communicated complex findings simply, and times you navigated stakeholder disagreements with data.
Prepare for Your Data Analyst Interview with JobJourney
Analytical skills get you the interview. Communication skills get you the offer. JobJourney's AI Interview Coach helps you practice articulating your analytical thinking clearly and confidently, with instant feedback on structure and specificity.
Before you apply, make sure your resume highlights the right skills. Our data analyst resume guide walks through exactly how to present SQL proficiency, tools expertise, and analytical impact. Browse data analyst resume examples for formatting inspiration. Run your resume through our ATS Resume Checker to ensure it passes automated screening, and use our Cover Letter Generator to write a data-driven cover letter.
Data analyst candidates who practice case questions and SQL out loud perform significantly better than those who only prepare on paper. Start practicing today.