The date is set. After successfully crossing the initial stages, you are ready for the grand finale! Now all that remains is to brush up on a few FAQs before the d-day. Following are the top 10 common questions asked in any data science interview.

Data Science Interview

1. Define Data Science

When answering this question, do not stop with the generic definition. Explain further about how you use data science on a personal and professional level. And that will help to establish your ease of usage in the domain.

Also, it is always important to make sure you steer the interview and not the other way around. One method to maintain control is by ending answers with a keyword trigger to the next question, a question for which you know the answer.

For e.g., Make sure to talk about structured and unstructured data when explaining data science, as that will immediately trigger to our next question in the list.

2. What is the difference between structured and unstructured data?

Start with a layman’s definition and then proceed to highlight the technical differences. This will help to showcase your comprehension skills too.

Structured DataUnstructured Data
Highly organized and formattedNo predefined format or organization
It is managed mainly by relational or SQL (Structured Query Language) databasesIt is managed mainly be non-relational or NoSQL databases
Presents an overall view of the trendsPresents an in-depth view and understanding of the data trends
Examples: Names, Dates, Addresses, Credit Card details etc.Examples: Text, Videos, Audios, Social Media usage etc.
Structured Vs Unstructured Data

3. What is selection bias?

One cannot discuss data without understanding the effects of selection bias. Choosing data, groups or people without proper randomization is selection bias.

A famous example of selection bias explanation is the anecdote of Abraham Wald, the statistician who increased the survival rate of the US Aircraft in WW2. And the discussion ensuing that could be an interesting point to connect with your interviewer.

Finally, you could choose any data specific to the company you are interviewing for, and end with a technical demonstration to use statistics to approach and eliminate the selection bias. For e.g., the members of Buzzfeed used statistics to eliminate all selection bias before translating a few of their articles. And that helped them reduce a lot of unnecessary effort and resources.

Selection Bias - Statistics

4. What are Interpolation and Extrapolation?

Extrapolation is a method by which data beyond the known values are estimated.

Interpolation is a method by which data within a known set of values is estimated.

Explain the difference with a proper statistical example that will highlight your technical prowess in data science.

5. What is a Boltzmann Machine?

This is the point at which you understand that the interview is venturing deep into technical territory, and you best be fully prepared.

Boltzmann Machine is a famous Neural Network and is capable of learning representations and solve combination problems. The Boltzmann machine is a famous buzzword in the data science community.

Go into depth while explaining this core data science concept as it will showcase your strength and forte in the field.

Boltzmann Machine

6. What is the Law of Large Numbers?

This law states that when you repeat an experiment for a large number of times, the end average tends to be closer to the expected outcome (true population).

An easily understandable example is the dice probability, say you roll a dice 3 times and get the following outputs:
5, 5, 6, the average is 5.33. But, when you roll the same dice for a larger number of times, the output comes closer to the expected average, 3.5.

Expand your explanation with an example from the company itself, which will help to portray your application skills.

7. What are Eigenvectors and Eigenvalues?

These are concepts that we learnt in high school math, so it would be helpful to have a quick brush up of the same before the interview.

An Eigenvector is a vector which remains unchanged even after linear transformation is applied to it.

The value of the above vector’s change in length is called the Eigenvalue.

After this brief explanation, proceed to draw a proper matrix and explain the mathematical concepts behind Eigenvectors and Eigenvalues, this will reflect your strength in basic Math too.

Eigenvectors and Eigenvalues

8. What is Logistic Regression?

Logistic Regression is an algorithm used to predict a binary outcome when a set of independent variables are given.

It can be explained with a simple example of shooting baskets in the game of basketball. The output will be a simple binary of 0 for missing the shot and 1 for getting the shot. The following scatter plot will give an idea of the result of various shots taken.

Logistic Regression

A simple real life example like this would be enough to establish your understanding on the subject.

9. Explain Gradient Descent

Gradient Descent is an algorithm that minimizes a function by iteratively moving in the steepest negative descent direction till it reaches the bottom.

The three types of gradient descent are:

  • Batch gradient descent
  • Stochastic gradient descent
  • Mini-batch gradient descent

Follow this up with an explanation of the 3 types of descent and a simple example, that will help to clinch the deal.

10. What is A/B testing in data science?

In this method, 2 variations of the same webpage are shown to consumers, and the results are checked to see which variation produced the most conversions, i.e. customers taking action. This method is also called split testing.

Especially useful in the current times, where many websites tend to suffer from high bounce rates, i.e. customers who leave without taking action. This method helps to retain loyal customers by finding out and providing to them what they would like to view.

Complete with a real-time example and your answer is solid.

Split Testing

Armed with these tips, the knowledge from your data science or machine learning courses and confidence, the data science interviews will be a piece of cake! We hope you find the Best Opportunity in Data Science. All the best!

Spread the love

Henry Harvin® is a leading career and competency development organization with focus on value creation. We are into the business of training, skill development, assessment centres, content services and higher education. Our dream is to establish 'Henry Harvin®' in line with the vision of Mr.Henry Dunster 400 years ago which now resonates in the form of a prestigious educational institution respected worldwide.

Write A Comment