36 Relative importance of steps
Again, garbage in = garbage out. You need to know when your data is not sufficient to answer your question. More often than not, more data leads to better models. Input data is the most important part of model building: simple algorithms applied to large datasets can often beat more sophisticated algorithms applied to smaller ones.
36.1 Features Matter
Properties of good features:
- Lead to data compression
- Retain relevant information
- Are created based on expert application knowledge
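As a minimal sketch of the first two properties, the Python snippet below compresses a window of 100 raw readings into two summary features. The sensor scenario, the `summarize` helper, and the noise levels are all hypothetical and chosen only for illustration. The point is that the standard deviation alone still separates a "still" window from a "moving" one, so the relevant information survives the compression.

```python
import random
import statistics


def summarize(window):
    """Compress a raw window of readings into two summary features."""
    return (statistics.mean(window), statistics.stdev(window))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical raw data: 100 sensor readings per observation.
still = [rng.gauss(0.0, 0.1) for _ in range(100)]   # low-variance signal
moving = [rng.gauss(0.0, 1.0) for _ in range(100)]  # high-variance signal

still_features = summarize(still)
moving_features = summarize(moving)

print(len(still), "raw values ->", len(still_features), "features")
print("std (still): ", round(still_features[1], 3))
print("std (moving):", round(moving_features[1], 3))
```

Which summaries to keep is exactly where expert application knowledge comes in: here the spread matters and the mean does not, but that is an assumption about this made-up signal, not a general rule.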
Common Mistakes:
- Trying to automate feature selection
- Not paying attention to data-specific quirks
- Throwing away information unnecessarily
36.1.1 Issues to consider
The “best” machine learning method would be:
- Interpretable
- Accurate
- Simple
- Scalable
- Fast to predict and fast to train
36.1.2 Prediction is about accuracy trade-offs
- Interpretability vs accuracy
- Speed vs accuracy
- Simplicity vs accuracy
- Scalability vs accuracy
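To make these trade-offs concrete, here is a toy Python sketch that compares two classifiers on synthetic one-dimensional data: a nearest-centroid rule (simple, interpretable, cheap to predict) and a 1-nearest-neighbour rule (more flexible, but it scans the whole training set for every prediction). The data generator, the function names, and the sample sizes are all assumptions made for illustration. Which method comes out more accurate depends on the data, so the sketch only measures and prints both sides of the trade-off.

```python
import random


def make_data(rng, n):
    """Two hypothetical 1-D classes centred at -1 and +1, alternating labels."""
    data = []
    for i in range(n):
        label = i % 2
        center = -1.0 if label == 0 else 1.0
        data.append((rng.gauss(center, 1.0), label))
    return data


def centroid_fit(train):
    """Simple model: store one mean per class."""
    means = {}
    for label in (0, 1):
        xs = [x for x, y in train if y == label]
        means[label] = sum(xs) / len(xs)
    return means


def centroid_predict(means, x):
    """Predict the class whose mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))


def nn_predict(train, x):
    """Flexible model: return the label of the closest training point."""
    return min(train, key=lambda pair: abs(x - pair[0]))[1]


def accuracy(predict, test):
    """Fraction of test points whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)


rng = random.Random(1)  # fixed seed so the sketch is reproducible
train = make_data(rng, 500)
test = make_data(rng, 200)
means = centroid_fit(train)

acc_centroid = accuracy(lambda x: centroid_predict(means, x), test)
acc_nn = accuracy(lambda x: nn_predict(train, x), test)

# Cost of one prediction, counted in distance computations.
cost_centroid = len(means)
cost_nn = len(train)

print("nearest centroid: accuracy", acc_centroid, "cost", cost_centroid)
print("1-nearest neighbour: accuracy", acc_nn, "cost", cost_nn)
```

Counting distance computations rather than wall-clock time keeps the comparison deterministic: the centroid rule needs two comparisons per prediction regardless of training-set size, while the nearest-neighbour rule needs one per training point, which is also why it scales worse.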