Though there has been a significant amount of research and development in the field of machine learning, each new application still demands something that cannot simply be taken off the shelf. Developing successful machine learning applications requires a substantial amount of ‘black art’ that is difficult to find in any of the textbooks.
Machine learning has three major components:
Representation – a classifier must be represented in some formal language that the computer can handle.
Evaluation – an evaluation function is needed to distinguish good classifiers from bad ones.
Optimization – a method to search among the candidate classifiers for the highest-scoring one; the choice here is key to the efficiency of the learner.
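The three components can be made concrete with a small sketch, assuming scikit-learn is available; the dataset and hyperparameters here are purely illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Representation: a decision tree is the formal language the classifier is expressed in.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)

# Optimization: fit() greedily searches the space of possible trees for a good one.
clf.fit(X_train, y_train)

# Evaluation: accuracy is the function used to score candidate classifiers.
print(accuracy_score(y_test, clf.predict(X_test)))
```

Swapping any one component (a linear model for the tree, log-loss for accuracy, gradient descent for greedy search) gives a different learner built from the same template.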
Machine learning comes with several guidelines worth following. The list below presents them, each with a brief explanation.
It’s generalization that counts –
Doing well on the training set is easy. What counts is generalizing beyond the examples in the training set, and the only honest way to measure that is to evaluate on data the learner has never seen.
Data alone is not enough –
machine learning is not magic; it cannot get something from nothing. Every learner must combine knowledge or assumptions with the data in order to generalize beyond it — learners grow programs from knowledge plus data.
Over-fitting has many faces –
a classifier could be 100% accurate on the training data but only 50% accurate on the test data. When knowledge and data are not sufficient to determine the correct classifier, we run the risk of hallucinating one that merely encodes random quirks in the data.
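The extreme case can be reproduced directly, assuming scikit-learn and NumPy: give an unrestricted decision tree labels that are pure noise, and it will fit them perfectly while doing no better than chance on held-out data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)   # labels are pure coin flips
X_train, X_test, y_train, y_test = X[:100], X[100:], y[:100], y[100:]

# With no depth limit, the tree memorizes every training label.
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(clf.score(X_train, y_train))  # 1.0: perfect fit to the noise
print(clf.score(X_test, y_test))    # around 0.5: no better than guessing
```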
Intuition fails in high dimensions –
algorithms that work well in low dimensions become intractable when the input is high-dimensional; this is the curse of dimensionality. Our geometric intuitions, which come from a three-dimensional world, often simply do not hold in high-dimensional spaces — for example, approximating one shape by another in the way that works in low dimensions breaks down.
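One counter-intuitive effect can be shown in a few lines, assuming NumPy: as the dimension grows, distances between random points concentrate, so the nearest point is barely closer than the farthest, and "nearest neighbor" loses its meaning.

```python
import numpy as np

rng = np.random.default_rng(0)
ratios = {}
for d in (2, 1000):
    X = rng.uniform(size=(500, d))
    # Distances from the first point to all the others.
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    ratios[d] = dists.min() / dists.max()
print(ratios)  # the nearest/farthest ratio climbs toward 1 as d grows
```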
Theoretical guarantees are not what they seem –
We can have guarantees on the results of induction if we are willing to settle for probabilistic guarantees. The main role of theoretical guarantees in machine learning is not as a criterion for practical decisions, but as a source of understanding and a driving force for algorithm design. The close interplay of theory and practice is one of the main reasons machine learning has made so much progress over the years.
Feature engineering is the key –
learning is easy if we have many independent features that each correlate well with the class. Much of the effort in a machine learning project goes into the trial and error of feature design. Feature engineering is the more difficult part because it is domain-specific, while learners can be largely general-purpose; a natural goal is therefore to automate more and more of the feature engineering process.
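A hypothetical sketch of why engineered features matter, assuming scikit-learn and NumPy: here the raw inputs carry the signal only through their product, so a linear model fails on the raw features but succeeds once the product is added as a feature.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # an XOR-like target

# Raw features: no linear boundary separates the classes.
raw = LogisticRegression().fit(X[:300], y[:300]).score(X[300:], y[300:])

# Engineered feature x1*x2 makes the problem linearly separable.
X_eng = np.column_stack([X, X[:, 0] * X[:, 1]])
eng = LogisticRegression().fit(X_eng[:300], y[:300]).score(X_eng[300:], y[300:])
print(raw, eng)  # accuracy jumps once the right feature exists
```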
More data beats a cleverer algorithm –
Given the choice between designing a better learning algorithm and gathering more data, the latter often wins: a dumb algorithm with lots and lots of data beats a clever one with a modest amount of it. The catch is the time and memory required to process huge amounts of data. Learners can be divided into two types – those whose representation has a fixed size and those whose representation can grow with the data.
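The two learner types can be contrasted in a short sketch, assuming scikit-learn and NumPy: a logistic regression has a fixed-size representation (one weight per feature, no matter how much data it sees), while k-nearest-neighbors stores the training set, so its representation grows with the data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (X.sum(axis=1) > 0).astype(int)

fixed = LogisticRegression().fit(X, y)       # learns 3 coefficients, regardless of n
growing = KNeighborsClassifier().fit(X, y)   # keeps all 1000 training points

print(fixed.coef_.size)          # 3
print(growing.n_samples_fit_)    # 1000
```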
Learn many models, not just one –
Instead of selecting the best single variation found, combine many variations; the results are better, often much better, and at little extra effort for the user.
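A brief sketch of one such model-combination scheme, assuming scikit-learn; bagging is only one of several ensembling techniques (boosting and stacking are others). It trains many trees on bootstrap resamples of the data and lets them vote.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One model versus a vote over 50 models trained on bootstrap samples.
single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                           random_state=0).fit(X_tr, y_tr)
print(single.score(X_te, y_te), bagged.score(X_te, y_te))
```

On most datasets the combined model's test accuracy matches or beats the single tree's, at essentially no extra effort from the user.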
Simplicity does not imply accuracy –
a simpler hypothesis should be preferred because simplicity is a virtue in its own right, not because of a hypothetical connection with accuracy.
Representable does not imply learnable –
Just because a function can be represented, does not mean it can be learned.
Correlation does not imply causation –
Machine learning is usually applied to observational data, where the predictive variables are not under the control of the learner, as opposed to experimental data, where they are. Learning algorithms can potentially extract causal information from observational data, but their applicability is rather restricted; some argue that causality is only a convenient fiction anyway.
This post is a summary of the paper –
Pedro Domingos, “A Few Useful Things to Know about Machine Learning”, Communications of the ACM, Vol. 55, No. 10, Pages 78-87.