Abstract: On February 15, Professor Sun, Dean of the School of Software at Harbin University of Science and Technology, gave a talk entitled "Application of Big Data in Smart Campus of Colleges and Universities" in the Micro-lecture Hall column of the CIO Era app.
Keywords:
CIO Era app
micro-lecture
On February 15, Professor Sun, Dean of the School of Software at Harbin University of Science and Technology, gave a talk entitled "Application of Big Data in Smart Campus of Colleges and Universities" in the micro-lecture column of the CIO Era app. The talk was divided into two parts: the era of small data versus the era of big data, and big data application cases in university smart campuses.
First, the era of small data and the era of big data
"Data" comes from Latin, where it means "what is given", that is, the known; it can also be understood as "existence". So "data" is "existence", and "big data" is "big existence". Studying big data means studying big existence: that is, studying all matter, all behavior, all thought, and human beings themselves.
Data is flooding in and changing how people live and work. Datafication refers to the process of turning phenomena into quantified forms that can be tabulated and analyzed, including sorting out and understanding the world and forming experience that can be stored. Calculating and recording give rise to data, which is the basis of datafication. Digitization means converting analog data into binary code made of 0s and 1s, so that people can process data better with modern technology. Datafication is an idea, and digitization is a means; data has existed since ancient times, and datafication is on the rise.
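As a minimal illustration of digitization in this sense, the Python sketch below uniformly quantizes a hypothetical analog sensor reading into an n-bit binary code. The function name `digitize`, the 0 to 40 °C sensor range, and the 8-bit width are assumptions made for the example, not part of the talk.

```python
def digitize(value: float, lo: float, hi: float, bits: int = 8) -> str:
    """Map an analog reading in [lo, hi] to a fixed-width binary code (uniform quantization)."""
    if not lo <= value <= hi:
        raise ValueError("value outside the sensor range")
    levels = 2 ** bits
    # Scale to [0, levels) and clamp the top edge so that value == hi maps to the last level.
    level = min(levels - 1, int((value - lo) / (hi - lo) * levels))
    return format(level, f"0{bits}b")

# A hypothetical thermometer reading of 21.7 °C on an 8-bit, 0 to 40 °C scale.
print(digitize(21.7, 0.0, 40.0))
```

More bits mean finer levels and smaller rounding error, at the cost of more storage per reading.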
The era of small data relied on random sampling, whose principle is to obtain the most information from the least data. But this approach cannot capture microscopic details, which hampers the analysis of specific subgroups. "Unevenness is the essence of the world", and missing details hinder the exploration and study of natural and human activity as a whole. In addition, random sampling rests on the researchers' theoretical assumptions and can only answer the questions chosen in advance, making it difficult to consider other questions. In other words, the era of small data suffers from "bias" caused by extremely limited information.
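To make the subgroup problem concrete, a small calculation helps. If a subgroup of k people sits inside a population of N, a simple random sample of n people drawn without replacement misses the subgroup entirely with probability C(N-k, n) / C(N, n). The Python sketch below evaluates this; the sizes are hypothetical and chosen only for illustration.

```python
from math import comb

def prob_subgroup_missed(population: int, subgroup: int, sample: int) -> float:
    """Probability that a simple random sample (without replacement) contains
    no member of a subgroup: C(N - k, n) / C(N, n), the hypergeometric zero term."""
    if not (0 <= subgroup <= population and 0 <= sample <= population):
        raise ValueError("sizes must satisfy 0 <= k <= N and 0 <= n <= N")
    return comb(population - subgroup, sample) / comb(population, sample)

# Hypothetical: 10,000 students, a subgroup of 20, a sample of 100, versus "sample = all".
print(prob_subgroup_missed(10_000, 20, 100))
print(prob_subgroup_missed(10_000, 20, 10_000))
```

With these numbers a 100-person sample misses all 20 subgroup members roughly 82% of the time, while using the whole population (n = N) never misses them.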
The era of big data means the datafication of the world, which implies that the essence of the world is information. The world is regarded not only as a combination of events but also as a collection of information and data. This is a profound change in worldview: human beings now have an evidential basis for understanding and handling events, rather than blindly following experience. Human beings collect "data", yet clearly what they see, think, and obtain is itself "data". We live in a sea of data, and we are data ourselves.
In moving from the era of small data to the era of big data, the following changes and insights have emerged:
1. Recognizing that the "sample" equals the population. We observe, understand, and care for the world with a bigger and more comprehensive attitude.
2. The requirement for accuracy is relaxed. In the era of small data, because there was little data, each data point had to be highly accurate. When the volume of data is large, or large volumes are needed, we must accept that the data will be messy.
3. Recognizing that data errors are not an inherent feature of big data, but a practical problem that must be dealt with and may persist for a long time.
4. Messiness never equals error. Messiness is the normal state of big data and should be treated as its basic state and standard configuration.
5. Big data reveals details that traditional samples cannot. Big data is the basic route to "precise" processing.
6. In the era of big data, we are no longer preoccupied with causality; instead we explore the correlations between different things and, on that basis, find observable correlated indicators with which to predict. Prediction is the core of big data applications.
7. Once correlations have been identified, causal relationships can be analyzed. However, it must be noted that causality is only a special form of correlation. In the era of big data, causality is no longer the basis for explaining the world; correlations are ubiquitous, easier to find with big data, and can guide practice more efficiently. As big data develops, some previously assumed causal relationships may even be falsified or reinterpreted as mere correlations.
Point 1 is the transformation of epistemology brought about by big data; points 2-5 reflect the completely different data requirements of the big data era and the traditional era; points 6 and 7 overturn the priority given to different logical relationships between data. From a practical point of view, point 1 can serve as a premise, points 2-5 as criteria for data collection and processing, and points 6 and 7 as guidance for interpreting data.
Second, applications of big data in university smart campuses
In 2015, the state proposed and formulated the "Internet Plus" action plan and elevated "Internet Plus" to a national strategy. "Internet Plus" will add new meaning to, and inject new momentum into, the construction of smart campuses in universities. With the help of "Internet Plus", universities can accelerate the upgrade from digital campus to smart campus: making full use of new technologies, concepts, and models such as cloud computing, the Internet of Things, the mobile Internet, and big data to build an entirely new smart campus that strongly supports the school's future development strategy, promotes innovation in talent cultivation and evaluation methods, improves the level of school governance, and provides multi-level personalized services and intelligent management decisions. The core of smart campus construction in universities can be summarized as "comprehensive environmental awareness, seamless network interoperability, a flexible cloud ecosystem, massive data support, an open learning environment, personalized services for teachers and students, intelligent management decision-making, and efficient school governance".
In the course of university informatization, various structured and unstructured data are produced, including teaching management data, teaching resource data, and student information data. From the principles and strategies of running a university to students' daily spending, the data are diverse and complex in type. Using big data technology to collect and analyze these data and turn them into usable resources for university management and services will play a very important role in smart campus construction.
The following examples illustrate the application of big data technology in smart campus.
1. Displaying the overall situation of the school
For school administrators, a comprehensive analysis and display of the school's situation makes it possible to intuitively understand and compare students (undergraduate and postgraduate), courses, research output, scholarships, employment, teaching staff, faculty distribution, cadres, furniture, assets, housing, rankings, spending, and other aspects of the school. Combined with how the data have changed over the years, this provides a basis for decision support. Correlations between data from different systems may also give managers new ideas for decision-making.
The school overview display mainly covers the analysis and display of basic data and of behavioral data.
Basic data analysis: for example, analysis of enrollment, student, graduation, teacher, course, research achievement, employment, and university asset data.
Behavioral data analysis: for example, dining in the school canteens, campus card spending behavior, online behavior, book borrowing behavior, correlation analysis between library usage time, online time/traffic, and academic performance, and profiling and early warning for key groups of students.
For example:
(1) Employment statistics. Multi-dimensional statistical analysis of graduates' destinations, employers, employment regions, industries, and salaries presents a complete picture of the university's employment situation, helping the employment office discover patterns in student employment and provide students with targeted employment guidance.
(2) Statistical analysis of teaching information. For school leaders, this presents a ranking of popular courses, statistics on the courses offered by each department, statistical analysis of students' grades, and analysis of failure rates, giving a comprehensive picture of students' study and grade distribution during their time at school and supporting decisions on course offerings and on improving students' grades.
(3) Statistical analysis of campus card data. This shows the overall spending power and preferences of students and helps the logistics department understand students' dining and shopping preferences and improve its services in a targeted way.
(4) Student spending power. Statistics on students' spending make it possible to examine in detail the amount spent and the number of transactions by students within a given period.
(5) Analysis of campus network use and students' online behavior. By statistically analyzing the addresses students visit online, combined with their basic personal information, we can count how often different groups of people use a given category of website, broken down by dimensions such as gender, hometown, and department. If the logs are detailed enough, students' preferences or biases in online shopping can even be analyzed, which is also an important reference for the logistics or engineering departments.
Related technologies applied include: data association analysis, multi-source data integration, massive log data processing, benchmarking, indicator system construction, AgileBI, and full-text retrieval engines.
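Item (5) is essentially a join between network logs and student records followed by a grouped count. A minimal Python sketch, assuming hypothetical record layouts (the field names, category labels, and reserved example.com domains below stand in for whatever the school's systems actually hold):

```python
from collections import Counter

# Hypothetical basic information keyed by student ID.
students = {
    "s001": {"gender": "F", "department": "Software"},
    "s002": {"gender": "M", "department": "Software"},
    "s003": {"gender": "M", "department": "Automation"},
}
# Hypothetical mapping from visited host to website category.
site_category = {
    "mooc.example.edu": "learning",
    "video.example.com": "video",
    "shop.example.com": "shopping",
}
# Hypothetical network log entries: (student ID, visited host).
logs = [
    ("s001", "mooc.example.edu"),
    ("s002", "video.example.com"),
    ("s002", "video.example.com"),
    ("s003", "mooc.example.edu"),
    ("s003", "shop.example.com"),
    ("s999", "video.example.com"),  # no matching student record: dropped by the join
]

def visits_by(dimension, logs, students, site_category):
    """Join logs to student records and count visits per (dimension value, site category)."""
    counts = Counter()
    for student_id, host in logs:
        info = students.get(student_id)
        if info is None:
            continue
        counts[(info[dimension], site_category.get(host, "other"))] += 1
    return counts

print(visits_by("gender", logs, students, site_category))
print(visits_by("department", logs, students, site_category))
```

Swapping the `dimension` argument gives the gender, hometown, or department breakdowns mentioned above from the same joined data.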
2. Analysis of the use of public resources
For universities, public resources such as canteens, stadiums, classrooms, libraries, and the school hospital are limited, and teachers and students have no good way to know how much capacity these resources have available. This leads to frequent queuing and crowding, which gives teachers and students a poor experience in study and daily life. As school informatization advances, the management information systems of various departments have gradually been built and put into use; and with developments in technology, especially the emergence of the Internet of Things and intelligent sensing devices, it has become possible to provide intelligent services on a digital campus.
The data comes from campus card transactions, campus card access control, the wireless network, campus security video surveillance, and so on.
(a) Crowd density in canteens and bathhouses: the density of people expected to use canteens and public bathhouses at each time of day, and the dining preferences and habits of different groups (by grade, place of origin, professional title, etc.).
(b) Classroom usage: crowd density, classroom usage in each time period, the number of classrooms, etc.; and attendance based on the wireless network.
(c) Usage and crowd density of meeting and sports venues: publishing, for teachers and students, the availability of meeting venues, the use of sports venues (e.g., whether classes are being held), and crowd density.
(d) Library seat usage and crowd density: publishing information on vacant library seats and the number of people in the library.
(e) Distribution of crowd density across campus: using wireless network data and security video surveillance, a heat map of the distribution of people on campus is produced.
Related technologies applied include: data association analysis, data mining (cluster analysis), massive log data processing, multi-source data integration (integrating log data with structured data), in-memory databases, and distributed full-text retrieval engines.
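Library seat and venue information of this kind relies on turning raw access-control events into live headcounts. As a minimal sketch, assuming each gate swipe is recorded as a timestamp plus +1 for entry or -1 for exit (a hypothetical format), the current number of people, the peak, and the number of free places can be derived like this:

```python
from datetime import datetime

def headcount(events, capacity):
    """Return (people inside now, peak, free places) from gate events.

    events: (timestamp, delta) pairs, delta = +1 for entry, -1 for exit.
    """
    inside = peak = 0
    for _, delta in sorted(events):
        inside = max(0, inside + delta)  # guard against exits whose entry was not recorded
        peak = max(peak, inside)
    return inside, peak, max(0, capacity - inside)

# Hypothetical morning events at a library with 500 seats.
events = [
    (datetime(2024, 3, 4, 8, 0), +1),
    (datetime(2024, 3, 4, 8, 5), +1),
    (datetime(2024, 3, 4, 8, 10), +1),
    (datetime(2024, 3, 4, 8, 20), -1),
    (datetime(2024, 3, 4, 8, 30), +1),
]
print(headcount(events, capacity=500))
```

Gate counts only approximate seat occupancy (people may be in the stacks rather than seated), which is why the text combines several sources such as the wireless network and video surveillance.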
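Cluster analysis, named above, could for instance group students by their habitual dining time. The Python sketch below is a plain one-dimensional k-means with a deterministic initialization; the input (each student's average time of first canteen swipe, in hours) is hypothetical, and a real deployment would cluster on several features, typically with a library implementation.

```python
def kmeans_1d(points, k, max_iter=100):
    """Plain k-means on one-dimensional data; returns (centroids, clusters)."""
    pts = sorted(points)
    if not 1 <= k <= len(pts):
        raise ValueError("need 1 <= k <= number of points")
    # Deterministic initialization: spread the starting centroids across the sorted data.
    centroids = [pts[i * (len(pts) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        new = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

# Hypothetical average first-swipe times (hours) for six students.
centroids, clusters = kmeans_1d([7.0, 7.2, 7.1, 12.0, 12.3, 11.9], k=2)
print(centroids, clusters)
```

Here the two groups separate into early-breakfast students and students whose first meal is lunch, the kind of grouping that item (a) describes.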
3. Personal data report
Provide personalized data services for teachers and students, showing their study, spending, daily life, and health status on campus.
Through rigorous data analysis, reports on personal behavior and habits help students better understand themselves and how they differ from others, and let teachers and students feel the humanistic care and changes brought about by informatization.
The data comes from campus card transactions, library access control, the book lending system, the campus network system, sports venue access control, and so on.
(a) Campus card bill and spending habits analysis report;
(b) Analysis report on library visit frequency, visit duration, and borrowing habits;
(c) Analysis report on Internet usage bills and online habits;
(d) Semester report on physical exercise.
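A campus card bill report of the kind in item (a) might be assembled roughly as follows. This Python sketch assumes a hypothetical transaction layout (student ID, merchant, amount) and compares the student's total with the average across all students in the same data, as one simple way to show "differences with others".

```python
from collections import Counter
from statistics import mean

def card_report(student_id, transactions):
    """Summarize one student's campus card spending for a period.

    transactions: list of (student_id, merchant, amount) tuples (hypothetical layout).
    """
    totals = Counter()
    for sid, _, amount in transactions:
        totals[sid] += amount
    mine = [merchant for sid, merchant, _ in transactions if sid == student_id]
    top = Counter(mine).most_common(1)
    return {
        "total_spent": totals[student_id],
        "transactions": len(mine),
        "favourite_merchant": top[0][0] if top else None,
        # Average total over all students present in this data, for comparison.
        "cohort_average": mean(totals.values()) if totals else 0.0,
    }

transactions = [
    ("s001", "Canteen 1", 12.5),
    ("s001", "Canteen 1", 8.0),
    ("s001", "Supermarket", 15.0),
    ("s002", "Canteen 2", 10.0),
]
print(card_report("s001", transactions))
```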
The reports are delivered through the universities' official WeChat accounts and apps, so that in the mobile Internet era users can read, share, and forward them promptly.
Related technologies applied include: data association analysis, data mining (user profiling), massive log data processing, and multi-source data integration.
4. Analysis of the utilization efficiency of library electronic periodical resources
Every year, universities spend considerable sums purchasing well-known periodicals to provide convenient literature search and download services for teachers and students. How well the library's electronic periodical resources are used, and how preferences for different electronic periodical resources differ across disciplines, are things the library urgently needs to know. By analyzing the big data of university users' periodical search records, the periodical purchasing plan can be optimized, so that the library buys the resources teachers and students actually need (both traditional print and electronic resources) and improves purchasing efficiency.
The usual practice is for schools to purchase access statistics for electronic periodical resources from data providers (such as Wanfang and CNKI). However, these statistics are based on the school's overall access data rather than on detailed analysis of individual users' access. As a result, it is impossible to analyze periodical access by discipline, college, major, or teacher rank, or to compare usage across different databases. Mining the search keywords of teachers and students is another very important direction, but traditional methods cannot reveal specifics such as the search preferences and hot topics of the school's teachers and students.
The exported network logs record teachers' and students' visits to electronic journal databases. By using big data technology to process the exported URL logs and other data and extract key information, the library can comprehensively analyze both the usage of its electronic resources and the groups of people using them, which supports its procurement decisions.
The data comes from the library's list of purchased electronic periodical resources, the URL logs of teachers' and students' Internet access, and the identity authentication records for that access.
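Putting those three sources together, a first pass over the logs could look like the Python sketch below. It assumes hypothetical database hostnames (reserved example.com names, not real providers), an identity mapping from user ID to college, and that search terms travel in a query parameter named `q`; real databases use their own URL formats, which would have to be checked one by one.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

def analyze_eresource_logs(url_logs, purchased_hosts, user_college, keyword_param="q"):
    """Count requests per (college, database) and search keywords from URL logs.

    url_logs: iterable of (user_id, url) pairs, already linked to identities via
    the authentication records. purchased_hosts maps hostnames of purchased
    databases to display names. keyword_param is the query parameter assumed
    (hypothetically) to carry the search terms.
    """
    usage = Counter()
    keywords = Counter()
    for user_id, url in url_logs:
        parsed = urlparse(url)
        database = purchased_hosts.get(parsed.hostname)
        if database is None:
            continue  # not a purchased e-journal database
        college = user_college.get(user_id, "unknown")
        usage[(college, database)] += 1
        for term in parse_qs(parsed.query).get(keyword_param, []):
            keywords[term.strip().lower()] += 1
    return usage, keywords

purchased_hosts = {"db-a.example.com": "Database A", "db-b.example.com": "Database B"}
user_college = {"t01": "School of Software", "s07": "School of Automation"}
url_logs = [
    ("t01", "https://db-a.example.com/search?q=big+data"),
    ("t01", "https://db-a.example.com/article/123"),
    ("s07", "https://db-b.example.com/search?q=Big%20Data"),
    ("s07", "https://video.example.com/watch?v=1"),
]
usage, keywords = analyze_eresource_logs(url_logs, purchased_hosts, user_college)
print(usage.most_common())
print(keywords.most_common(5))
```

Because `parse_qs` decodes both `+` and `%20` as spaces, the two differently encoded searches above are counted as the same keyword.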
Related technologies applied include: data association analysis, massive log data processing, multi-source data integration (integration of log data and structured data), and distributed full-text retrieval engine.
5. Campus public opinion monitoring
In the wave of the mobile Internet, both positive and negative information spreads faster. A school's reputation has a great influence on enrollment, employment, and evaluation. With the popularity of the mobile Internet and social media, universities pay more and more attention to how society evaluates them. At present, some universities use Internet data to monitor their reputation, using big data methods to track in real time school-related news, trending topics, and user feedback on new Internet media, in order to understand the school's public opinion, reputation, and influence.
Related technologies applied include: text mining, semantic analysis (positive/negative sentiment judgment), semantic similarity calculation, an elastic crawler engine, and a distributed full-text retrieval engine.
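As a deliberately simple stand-in for the semantic analysis and similarity calculation listed above, the Python sketch below scores polarity by counting words from small positive and negative lexicons, and measures similarity between two posts with the Jaccard index of their word sets. The lexicons are hypothetical, English-only, and far smaller than any real one; Chinese text would additionally need word segmentation, and production systems typically use trained models instead.

```python
import re

# Tiny hypothetical lexicons; real systems use far larger, domain-tuned ones or trained models.
POSITIVE = {"award", "excellent", "great", "helpful", "proud"}
NEGATIVE = {"complaint", "poor", "protest", "scandal", "unsafe"}

def words(text):
    """Lowercase word tokens (ASCII letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def polarity(text):
    """Label a post positive/negative/neutral by lexicon hit counts."""
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def jaccard(a, b):
    """Word-set overlap in [0, 1], a crude proxy for semantic similarity."""
    sa, sb = set(words(a)), set(words(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

print(polarity("Great news: our team won an award"))
print(polarity("Students file complaint about poor canteen hygiene"))
print(jaccard("Canteen food is poor", "The canteen food is poor"))
```

A high Jaccard score between two posts suggests they are reposts of the same story, which helps when counting how widely a topic has spread.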
Other applications of big data in smart campuses that I am aware of include statistical analysis of teaching information. By analyzing samples of course knowledge structure, together with the teaching process and the distribution of students' academic results, we can verify the soundness of the course teaching process and the degree to which course objectives are achieved for engineering education accreditation, and comprehensively analyze whether the curriculum is reasonably designed.
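For the accreditation-oriented analysis just described, one formulation commonly used in engineering education accreditation (assumed here; the talk does not give a formula) computes the attainment of a course objective as a weighted sum, over the assessment components that support it, of the class average divided by the maximum score. A minimal Python sketch with hypothetical weights and scores:

```python
from statistics import mean

def objective_attainment(components):
    """Attainment of one course objective (assumed weighted-average formulation).

    components: (weight, max_score, scores) for each assessment supporting
    the objective; weights are assumed to sum to 1.
    """
    if abs(sum(w for w, _, _ in components) - 1) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * mean(scores) / max_score for w, max_score, scores in components)

# Hypothetical: homework (30%, out of 20) and final-exam questions (70%, out of 40).
components = [(0.3, 20, [16, 18, 14]), (0.7, 40, [30, 34, 26])]
print(round(objective_attainment(components), 3))
```

Here the homework contributes 0.3 × 16/20 = 0.24 and the exam questions 0.7 × 30/40 = 0.525, giving an attainment of about 0.765, which a program would then compare against its own threshold.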
Another example is the analysis of school asset management information. With the help of an asset management information platform, data on campus infrastructure, teaching and laboratory equipment, and campus communication network equipment are collected and analyzed, providing data support for planning infrastructure construction, maintaining teaching and laboratory equipment, and upgrading campus network equipment.
The "Intelligent Grid Student Management Platform" builds on the achievements of information technology and digital campus construction in universities. Using community grids, management grids, and education grids as carriers, it builds an overall framework for comprehensively managing student development and optimizing service processes. It actively guides students' daily life, study, and ideological development throughout their entire time at university, forming a new model of coordinated, sustainable intelligent management and guided development. Its functions include student profiling, early warning of student behavior (enrollment status, study, spending, physical and mental health), analysis of students' family economic situations, comprehensive student data retrieval, and student group analysis, and it can support student safety education, mental health counseling, and precise financial aid.
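The behavioral early warning function mentioned above could, in its simplest form, be a set of rules applied to each student's latest data. The Python sketch below is hypothetical throughout: the snapshot fields and the thresholds (3 idle days, 10 failed credits, 150 yuan of monthly spending) are placeholders, and a real platform would calibrate such rules and have staff review every flag.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class StudentSnapshot:
    student_id: str
    last_card_use: date
    failed_credits: float
    monthly_spend: float  # yuan

def early_warnings(s, today, *, idle_days=3, max_failed=10, min_spend=150.0):
    """Return the (hypothetical) warning flags raised for one student."""
    flags = []
    if (today - s.last_card_use).days >= idle_days:
        flags.append("no recent campus card activity")
    if s.failed_credits >= max_failed:
        flags.append("academic risk")
    if s.monthly_spend < min_spend:
        flags.append("possible financial difficulty")
    return flags

today = date(2024, 3, 5)
students = [
    StudentSnapshot("s001", date(2024, 3, 1), 12, 100.0),
    StudentSnapshot("s002", date(2024, 3, 5), 0, 800.0),
]
for s in students:
    print(s.student_id, early_warnings(s, today))
```

Rules like these are transparent and easy to audit; the "possible financial difficulty" flag is the kind of signal that could feed the family economic analysis and precise financial aid mentioned above.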
That is all I have time to share today. Thank you.