Seeing how lost the junior students who have just joined the lab are, and knowing that my scattered conversations with them were not systematic enough, I am writing down some suggestions on learning data mining based on my own experience. Treat them as a reference and adapt them to your own situation. I hope this builds on the last sharing session and goes deeper.
I. The foundations of graduate school and data mining
First, let me address some questions everyone cares about: what our group's research direction is, paper-related issues, big data and work-related issues, the Shanghai hukou, and so on.
1. What is our research direction?
Our group's broad research direction is data mining, and the thesis direction is recommendation algorithms. Keep in mind both the difference and the connection between the group's broad research direction and the direction of your own thesis work.
2. Paper-related issues
Every graduate student will at some point ask what the point of graduate school is. Personally, I think its greatest value is training systematic and rigorous analytical thinking. Once your advisor gives you a thesis direction, you have to work out the rest yourself: how to narrow it into a more specific research topic, how to search the literature, how to read English papers, how to come up with your own contributions, how to run experiments, how to write and revise papers, how to submit them, how to handle a paper returned for revision, how to give an oral presentation in English at an international conference, and how to communicate with peers.
3. Big data and work-related issues
Is data mining part of big data? Of course it is. Big data skills still give you good job prospects. Which subjects matter most? I have recommended many books to you before, but it backfired: there were too many to finish, and nobody knew what order to read them in. I only dabbled in some of them myself, and I did not finish the last one during graduate school.
(1) Bare-minimum books
No matter what you do in the future, it is necessary to master a programming language, a database, data structures and algorithms.
High Performance MySQL
Data Structures and Algorithm Analysis in Java
Algorithm: /subject/ 19952400/
(2) Python and machine learning
Programming Collective Intelligence
Mining the Social Web
Data Mining: Concepts and Techniques
Python official documentation
(3) Java
Java SE: /javase/8/docs/api/
Java EE: /javaee/6/api/
(4) Hadoop and Spark books
Big Data Logs: Architecture and Algorithms
Hadoop: The Definitive Guide
Big Data Spark: Enterprise-Level Practice
Programming in Scala
Hadoop official website: http://hadoop.apache.org/
Spark official website: http://spark.apache.org/
Scala official website: http://www.scala-lang.org/
Note: Pick a goal, be patient, and move forward step by step. Once you have read the books recommended above, you will have a basic introduction to data mining.
4. The Shanghai hukou question
The Shanghai hukou is awarded on a points system. If you want to earn extra points while still in school, the only way is to win a prize in the annual graduate mathematical modeling competition. The award rate is quite high. In practice, if you learn Python well, buy a book on mathematical modeling, read several award-winning papers from recent years, focus on one problem during the competition, and write a good paper, you will most likely win a prize.
II. Advanced data mining
Data mining touches many fields, but it is usually approached from three directions: mathematical statistics, databases and data warehouses, and machine learning. Whenever I want to learn a new field, what I want most is for someone to give me a reading list. So I will give you one here as well, to work through at your own pace.
1. Mathematical statistics
(1) Pure mathematics: complex analysis, real analysis, functional analysis, topology, integral transforms, differentiable manifolds, ordinary differential equations, partial differential equations, etc.
(2) Applied mathematics: discrete mathematics (sets, logic, combinatorics, algebra, graph theory, number theory), concrete mathematics, tensor analysis, numerical computation, matrix theory, approximation theory, operations research, convex optimization, wavelet transforms, time series analysis, etc.
(3) Probability: probability theory, measure theory, stochastic processes, etc.
(4) Statistics: statistics, multivariate statistics, Bayesian statistics, statistical simulation, nonparametric statistics, parametric statistics, etc.
2. Databases and data warehouses
Database System Concepts
Database System Implementation
Data Warehouse
Distributed Systems: Concepts and Design
3. Machine learning
Communication principles; data mining; machine learning; statistical learning; natural language processing; information retrieval; pattern recognition; artificial intelligence; graphics and image processing; machine vision; speech recognition; robotics; and so on. (You can read through the classic books in each of these fields; I will add specific titles later.)
4. Other books
(1) Linux
(2) Computer networks, compiler principles, and computer organization
(3) JVM
(4) Unified modeling language
(5) Software engineering
(6) Design patterns
(7) Cloud computing and Docker
(8) Parallel computing
(9) Requirements analysis
III. Learning and methods
As a software engineer, you need to master the following tools:
(1) Blog
Besides studying, you should also reflect and summarize: write down what you have learned while it is still fresh, and record it in a blog.
(2) Language
The common languages in big data are Java, Scala and Python. If I had to pick one language to master, I would choose Scala and study the JVM in depth.
(3) Development tools
I use IntelliJ IDEA for Java and Scala, and Eclipse for Python.
(4) GitHub
Keep coding every day and take an active part in open source projects.
(5) Linux
Ubuntu 12.04 LTS is often used in work.
Because time was limited, this summary is still rough. It is only a first version, and I will keep refining and extending it.