In this article, I want to answer a simple question: what is machine learning?
You may be interested in machine learning, or already know something about it. If you talk about machine learning with friends or colleagues one day, someone will eventually ask, "What is machine learning?" The goal of this post is to give you some reference definitions, plus a ready-made, memorable one-liner you can reach for when the question comes up. We will start with the standard definitions given in authoritative books in the field, and end with a programmer's definition of machine learning and that one-liner.

Authoritative definitions

We begin with the definitions given in four machine learning textbooks commonly used in university courses. These are our authoritative definitions, and they lay the foundation for thinking more deeply about the subject. I chose these four books to highlight some useful and diverse perspectives on the field. Experience tells us that the field really does contain a great many methods, so choosing a suitable angle is the key to making progress.

Machine learning defined by Mitchell

Tom Mitchell gives a definition in the preface of his book Machine Learning: "The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience." I like this short, clear definition, and it is the basis of the programmer's definition we arrive at at the end of the post. Note the mention of computer programs and the emphasis on automatic improvement: writing programs that improve themselves is a provocative idea!

In his introduction, Mitchell repeatedly returns to a short formalism: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." Don't let the definitional terms scare you off; this is a very useful formalism. We can use it as a template: put experience E, task T and performance measure P at the top of the columns of a table and list out complex problems with far less ambiguity. It works as a design tool that helps us think clearly about what data to collect (E), what decisions the software needs to make (T) and how to evaluate the results (P), as sketched below. That is why we treat it as the standard definition. Remember it.
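To make the template concrete, here is a minimal sketch in Python. The example problems and field names are hypothetical, chosen only to illustrate how E, T and P can be laid out before any learning code is written.

```python
# A minimal sketch of Mitchell's E/T/P formalism used as a design tool.
# The problems listed here are hypothetical examples, not prescriptions.
problems = [
    {
        "task_T": "classify incoming email as spam or not spam",
        "experience_E": "a corpus of emails labelled spam / not spam",
        "performance_P": "percentage of emails classified correctly",
    },
    {
        "task_T": "predict tomorrow's electricity demand",
        "experience_E": "historical demand with weather and calendar data",
        "performance_P": "mean absolute error against observed demand",
    },
]

# Writing the table out forces us to state what data to collect (E),
# what decision the software must make (T), and how to judge it (P).
for row in problems:
    print(f"T: {row['task_T']}")
    print(f"E: {row['experience_E']}")
    print(f"P: {row['performance_P']}")
    print()
```

Filling in a row like this before touching any algorithm is often enough to expose whether a problem is specified clearly enough to attempt with machine learning.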
" After reading these, you will have the impression that Bishop came to this field from the perspective of engineering, and later adopted the same method to study and use computer science. This is a mature method that we should follow. More broadly, no matter what field a method claims to be in, if it can make us get closer insights or results through learning data, so as to better meet our needs, then we call it machine learning. Algorithm perspective Marshland adopted Mitchell's definition of machine learning in Machine Learning: An Algorithm Perspective. In the preface, he provided a powerful explanation that prompted him to write this book: "One of the most interesting features of machine learning is that it lies between several different theoretical disciplines, mainly computer science, statistics, mathematics and engineering. Machine learning is usually studied as a part of artificial intelligence, which places it firmly in computer science. Understanding why these algorithms can work effectively requires a certain degree of statistical and mathematical mind, which is often lacking among undergraduate students majoring in computer science. " This is profound and beneficial. First of all, he emphasized the multidisciplinary nature of this field. Although we got this feeling from the above definition, he further emphasized this point for us. Machine learning comes from all kinds of information science. Secondly, he emphasized the danger of sticking to a given angle too much. In particular, algorithm engineer avoided the situation of the mathematical internal operation principle of the method. Undoubtedly, in the opposite case, statisticians are also restricted to avoid practical problems in implementation and deployment. Wayne illustrated 20 10 In September, Drew Conway created a beautiful venn diagram, and I found this picture very helpful. In his explanation, he commented that machine learning is the sum of hacker technology, mathematics and statistical knowledge. Data Science venn diagram. The signature of Drew Conway is a non-commercial intellectual signature. He also described the dangerous area as the sum of hacking skills and expertise. Here, he means that those who know enough are dangerous. They can access and build data, understand fields, run methods and give results, but they don't understand the meaning of the results. I think that's what Masland might imply. Programmer definition
Now let's break all of this down into concrete terms for programmers. First, we look at the complex problems that resist decomposition and a procedural solution; these are the motivation for machine learning. Then we work toward a definition that suits programmers, one we can use whenever another programmer asks us what machine learning is.
Complex problems

As a programmer, you will eventually run into classes of problems that stubbornly resist a logical, procedural solution. By that I mean problems for which it is neither feasible nor cost-effective to sit down and write out all the conditional statements needed to solve them. I can hear your programmer brain shouting "heresy!", but it is true.

Take the everyday problem of identifying spam, the example trotted out whenever machine learning is introduced. When an email arrives, how would you write a program that filters it, deciding whether to put it in the trash or the inbox? You would probably start by collecting some examples and studying them closely. You would look for patterns specific to spam and to non-spam, and you would think about abstracting those patterns so that your heuristics apply to new emails in the future. You would ignore odd messages that will never be seen again, you could easily tune the accuracy, and you would write special-purpose handling for edge cases. You would go through the emails again and again, abstracting new patterns to improve your decisions.

What you have there is, in effect, a machine learning procedure, except that all of the work is being done by the programmer rather than the computer. Such a manually derived, hard-coded system is only ever as good as the programmer's ability to extract rules from the data and implement them. It could be done, but it would consume far too many resources, and it would be a perpetual nightmare to maintain.

Machine learning

In the example above, I am sure the part of your programmer brain that is bent on automation can see the opportunity to automate and optimize this process of extracting patterns from examples. A machine learning method is exactly that automated process.

In the spam/non-spam example, the experience E is the emails we have collected, and the task T is a decision problem (called classification): label each email spam or not and put it in the correct folder. Our performance measure P would be something like accuracy, a percentage between 0% and 100% (the number of correct decisions divided by the total number of decisions made, multiplied by 100).

The process of preparing such a decision-making program is usually called training, and the collected examples are called the training set. The program is a model: a model of the problem of separating spam from non-spam. As programmers, we like this terminology. A model has specific state and needs to be persisted. Training is a process that is performed once and can be rerun as needed. Classification is the task to be carried out. It all makes sense to us.

We can also see that some of the terms used in the definitions above do not sit well with programmers. Technically, every program we write is automated, so saying that machine learning is automatic learning is not especially meaningful.
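To tie E, T and P together in code, here is a minimal sketch of training and evaluating a spam classifier. It assumes scikit-learn is installed, and the tiny inline dataset and naive Bayes model are stand-ins chosen purely for illustration; a real filter would be trained on a large labelled corpus.

```python
# A minimal sketch: the spam/not-spam example expressed as E, T and P.
# The inline dataset is a toy stand-in for a real labelled email corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Experience E: collected emails, each labelled spam (1) or not spam (0).
emails = [
    "win a free prize now", "cheap meds online", "claim your reward today",
    "meeting moved to 3pm", "lunch tomorrow?", "quarterly report attached",
]
labels = [1, 1, 1, 0, 0, 0]

train_emails, test_emails, train_labels, test_labels = train_test_split(
    emails, labels, test_size=0.33, random_state=42
)

# Training: the model acquires its state from the training examples.
vectorizer = CountVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(train_emails), train_labels)

# Task T: decide, for each unseen email, spam folder or inbox (classification).
predictions = model.predict(vectorizer.transform(test_emails))

# Performance measure P: accuracy, correct decisions / total decisions * 100.
print(f"Accuracy: {accuracy_score(test_labels, predictions) * 100:.0f}%")
```

The important shift is that the pattern extraction we did by hand in the hard-coded version is now carried out by the training procedure, and the model is the state it leaves behind.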
A ready-made one-liner

Let's see whether we can use these pieces to construct a programmer's definition of machine learning: machine learning is the training of a model from data that generalizes a decision against a performance measure. "Training a model" suggests training examples. A "model" suggests state acquired through experience. "Generalizes a decision" suggests the ability to make a decision based on an input, and to anticipate unseen inputs in the future for which a decision will be required. Finally, "against a performance measure" suggests a targeted need and a direction for the prepared model. I am not a poet.

Can you come up with a more accurate or more concise programmer's definition of machine learning? Leave it in the comments.