What is stratified cross-validation and when should we use it?
Why do ensembles typically have higher scores than individual models?
What is regularization? Can you give some examples of regularization techniques?
What is the curse of dimensionality? Can you list some ways to deal with it?
What is an imbalanced dataset? Can you list some ways to deal with it?
Can you explain the differences between supervised, unsupervised, and reinforcement learning?
What is data augmentation? Can you give some examples?
What are convolutional networks? Where can we use them?
What will you do if removing missing values from a dataset causes bias?
How can you reduce bias in a given dataset?
How will you impute missing information in a dataset?
Estimate the probability of a disease in a particular city given that the probability of the disease on a national level is low.
How will you inspect missing data, and when is it important for your analysis?
How will you decide whether a customer will buy a product today, given the customer's income, location, profession, and gender? Define a machine learning algorithm for this.
Given a long sorted list and a short sorted list of 4 elements, which algorithm will you use to search the long list for those 4 elements?