1 Introduction
Earth science, which includes geology, geography, atmospheric science, oceanography and many other disciplines, is the branch of natural science that deals most directly with the relationship between human beings and nature. It is not only a means of understanding the solid, liquid and gaseous spheres of the Earth and their relationship with human beings, but also directly serves the economy and society in resources, energy, the environment, and disaster prevention and mitigation, through practical work in mineral exploration, weather forecasting, hydrology, surveying and mapping, seismology and other fields.
Satellite communication, network and computer technologies have changed the traditional research mode of earth science. With the development of remote sensing, information technology and various real-time observation and analysis techniques, earth science has entered a new stage that covers the whole globe and spans the Earth's spheres: earth system science, which has advanced from describing local phenomena to exploring planetary mechanisms and acquiring global, systematic information.
In application, earth science plays a role almost everywhere, from extractive industries, manufacturing and agriculture to architectural planning, tourism and the military. Moreover, as society develops, the environment deteriorates and the consequences of natural disasters grow more severe, earth science, which was originally oriented mainly toward resources, has expanded toward the environment and disaster prevention and reduction, broadening the ways in which it serves society.
With modern detection methods and information technology, terabyte- and petabyte-scale geospatial data are being produced, and processing, interpreting, accessing and using them requires supercomputers capable of more than a trillion operations per second. At the same time, the development of digital information and communication environments has changed the means and methods of traditional basic research. Multidisciplinary research teams are an important guarantee for completing large-scale scientific research and engineering projects, and advanced supercomputers and grid computing technology provide a multidisciplinary platform for basic interdisciplinary research. Since 2002, the United States, Britain, Japan, Australia and the European Union have all launched "e-Research" or "e-Science" programmes with substantial investment. Their purpose is to use grid and middleware technology to connect supercomputers in universities and research laboratories across a country or region into a virtual platform for multidisciplinary resource collaboration. At the same time, developed countries are establishing multidisciplinary resource-sharing platforms with earth science at their core.
2 Supercomputers
As described by Moore's Law, computer speed is increasing rapidly (doubling roughly every 18 months) while manufacturing costs fall sharply, so supercomputing is becoming affordable. Most universities in China can now afford a supercomputer with teraflop-scale performance. According to the latest TOP500 statistics (as of December 2004), 358 of the 500 fastest computers in the world were installed in 2004 and 95 in 2003; together they account for more than 90% of the list, as shown in Table 1. Raising computer speed is no longer the main technical obstacle; the bottleneck is the development of software systems, which is our weak link.
Among the basic sciences, earth science is the field that uses supercomputers the most. According to the same TOP500 statistics (as of December 2004), shown in Table 2, geophysics accounts for 51 of the 500 fastest supercomputers, more than 10% of the total. If related applications such as weather and climate research and weather forecasting are added, the share of supercomputers used by earth science is even larger.
Table 1
Table 2
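The figures quoted in this section reduce to simple arithmetic. The sketch below assumes the popular 18-month doubling period stated above and a list size of 500; it reproduces the growth factor implied by that form of Moore's Law and the TOP500 shares (358 + 95 = 453 of 500 systems, and 51 of 500 for geophysics). The function names are illustrative only.

```python
def moore_factor(years: float, doubling_months: float = 18.0) -> float:
    """Speed-up after `years` if performance doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


def share(count: int, total: int = 500) -> float:
    """Percentage of the TOP500 list represented by `count` systems."""
    return 100.0 * count / total


# Counts quoted in the text.
new_2004, new_2003, geophysics = 358, 95, 51

print(f"x{moore_factor(6):.0f} in 6 years")                       # prints "x16 in 6 years"
print(f"2003-2004 installs: {share(new_2004 + new_2003):.1f}%")  # prints "2003-2004 installs: 90.6%"
print(f"geophysics: {share(geophysics):.1f}%")                   # prints "geophysics: 10.2%"
```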
Many universities and research institutions in China have also carried out research on supercomputing architectures, such as cluster computers based on the Linux operating system. Such clusters offer a feasible route to supercomputing when mainframes and supercomputers are too expensive or cannot meet the demands of larger-scale computing. Their main problems are a poor price/performance ratio, low reliability, difficult maintenance, poor scalability and weak security. Researchers spend too much effort building the system itself, and the cost is not necessarily low.
In 2003, Dr. Chen Shiqing (Steve Chen), an academician of the American Academy of Sciences and a Time magazine cover figure, returned to China and developed the super blade computer at Shenzhen Shell Ying Xing Company. Dr. Chen is also world-renowned as the leader of the development of the Cray X-MP and Y-MP supercomputers.
The design concept of the super blade computer resembles the turbine blades of a jet engine: the "blades" can be removed and replaced at any time, and together they generate great power. The super blade computer applies this concept in a simple design built on new technology. Upgrading computing capacity only requires adding "blades", with no rewiring or reconfiguration. The machine is like an engine full of blades, each blade being a computing unit; in principle it can be expanded without limit, and blades can be added or replaced at any time without stopping the system. With this new design concept and system architecture, the super blade computer can exceed 50 trillion floating-point operations per second (50 TFLOPS), reaching the supercomputer level of advanced countries such as the United States and Japan. It offers longevity, security, reliability, reasonable price/performance and a real-time collaborative mode of operation.
3 Supercomputing Problems in Earth Science Research
Supercomputing problems in earth science research include: seismic data processing and interpretation; remote sensing data processing and interpretation; large-scale geographic information systems; geospatial data processing and visualization; dynamic simulation of natural phenomena of the solid earth, atmosphere and ocean, such as earthquakes, floods and sandstorms; simulation of engineering geological structures; and molecular dynamics simulation of materials. Earth science research also involves many supercomputing problems that are multidisciplinary or interdisciplinary, and some of them require real-time, collaborative workflows.
4 Supercomputing Based on High-Performance Networks
With the development and application of computer and information technology, especially the construction and application of high-speed network and related equipment, the methods of scientific research have been deeply influenced, and the means of research have been changed, which has also led to the emergence of the concepts of e-Research and e-Science.
e-Science refers to very large-scale scientific research that depends on worldwide collaboration among scientists and on the Internet and related technologies. A typical feature of such collaborative research is that scientists need to access massive data sets, use unique research facilities, consume large amounts of computing resources, and carry out high-performance analysis, modelling and visualization. Another important aspect of such very large-scale research is that it provides an interdisciplinary platform for exchanging information and for new ideas to emerge among scientists and across disciplines.
e-Research is a broader generalization of e-Science that also covers activities outside the natural sciences; for example, it includes anthropological and sociological research. e-Research likewise uses distributed computing resources so that researchers can work together and share knowledge.
Grid technology has played an important role in the development of e-Research and e-Science. Just as households and businesses draw electricity from the power grid, a computing grid lets researchers and research institutions access data warehouses, special scientific instruments, knowledge services and powerful computing capacity distributed across the network. Individual researchers, research institutions and resources can be combined dynamically to share knowledge flexibly and securely and to solve research problems; such a dynamic combination is commonly called a virtual organization.
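A virtual organization can be pictured as a membership list plus a pool of resources contributed by different institutions. The sketch below is a deliberately simple, hypothetical model: the class `VirtualOrganization`, the member and resource names, and the "any member may use any contributed resource" policy are all assumptions. Real grid middleware adds certificate-based authentication and much finer-grained authorization.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualOrganization:
    """Hypothetical model of a grid virtual organization (VO)."""
    name: str
    members: set[str] = field(default_factory=set)
    resources: dict[str, str] = field(default_factory=dict)  # resource -> owning institution

    def add_member(self, researcher: str) -> None:
        self.members.add(researcher)

    def contribute(self, resource: str, owner: str) -> None:
        """An institution shares one of its resources with the whole VO."""
        self.resources[resource] = owner

    def can_access(self, researcher: str, resource: str) -> bool:
        # Simplest possible policy: any member may use any contributed resource.
        return researcher in self.members and resource in self.resources


vo = VirtualOrganization("basin-modelling")
vo.add_member("researcher-a")
vo.contribute("cluster-univ-x", "University X")
vo.contribute("well-log-archive", "Institute Y")
print(vo.can_access("researcher-a", "well-log-archive"))  # prints True
print(vo.can_access("researcher-b", "cluster-univ-x"))    # prints False
```

The point of the model is that membership, not ownership, decides access: researcher-a reaches an archive owned by a different institution simply by belonging to the VO.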
Cyberinfrastructure denotes a new virtual knowledge environment for science and engineering, formed by distributed computers and information and communication technologies. It provides an efficient platform for many forms of scientific research.
Scientists solve complex scientific and engineering problems by mining new knowledge, building interactive models, using simulation tools and collaborating with each other, and this is changing the infrastructure of basic research. Complex scientific and engineering problems require the new research infrastructure to be interdisciplinary, distributed and integrated. Astronomy, biology, earth science, public health and nanomaterials research usually need information integration, data analysis and secure knowledge sharing. They require secure, interoperable and continuous access to physical devices (such as computers, disk arrays and instruments), to data and information (large data sets, commercial and scientific databases, information and software libraries, video and image libraries), and to specific experts and scholars.
e-Research middleware is software with specific functions that provides standard, general-purpose tools and services across the whole computing infrastructure for application systems, computing resources, knowledge management, knowledge sharing and task collaboration between research institutions and individuals. It is an important part of the e-Research computing infrastructure.
The United States, Britain, the European Union, Japan and others have implemented large research programmes on e-Research computing infrastructure, hoping to strengthen long-term national economic prosperity and make full use of the knowledge-distribution capabilities the infrastructure provides. Many of these projects have developed important middleware; some are cooperative or exchange projects between countries, developing general-purpose middleware across continents.
With the support of the National Science Foundation (NSF), the United States is considering a substantial additional annual investment to build and develop advanced cyberinfrastructure, of which one third (about $395 million) would go to middleware research and related development. Table 3 lists some important e-Research infrastructure R&D programmes and part of the funds they invest in middleware R&D.
Table 3
Although China has invested research funds in network infrastructure, reports show that using it effectively to obtain research resources is inefficient, time-consuming and labour-intensive. Users are forced to rely on unreliable manual methods to find suitable resources; sometimes they must negotiate with the resource owner; sometimes they must use the resources through inefficient, time-consuming and expensive means; and sometimes they even need to fly across continents. Insufficient knowledge of how to access high-speed online resources, equipment, services and data leads to many lost opportunities. In addition, users introduce considerable uncertainty into system security, and unauthorized access to resources must be prevented. Because of insufficient standardization, weak system support and maintenance, and imperfect user interfaces, researchers must spend more time and effort supporting and maintaining software.
Earth science needs a trusted, collaborative and interactive resource environment built on high-speed networks, and middleware that provides software services can achieve this goal. Although ICT (information and communication technology) researchers in China have studied many key middleware technologies and services, most of this work is done by single-discipline research groups and companies, without central coordination or a specific driving application. More coordination mechanisms should therefore be established among Chinese middleware research projects and with international ones. At present, China's funding for middleware infrastructure research is limited and scattered, which leads to duplication and inefficiency in some projects.
China needs an open middleware programme that ensures the integration and overall coordination of these research activities, extends and transforms existing traditional middleware into an OMP (Open Middleware Program) architecture that meets international standards, and provides services for specific application fields. The programme should also identify and close the gaps between our middleware technology and the international state of the art, and upgrade the software of current research projects so that e-Research institutions can use it.
Current grid service middleware (identity management, access control, provisioning, reservation and notification services) is fragile and unreliable when running on the existing computing infrastructure. Grid service components need to be engineered to be more robust and reliable, so that users can access the equipment, computing and data resources shared through the grid completely transparently. We need to increase research and investment in grid service middleware to improve its standardization, robustness and usability.
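One common engineering technique for making calls to unreliable services more robust is to retry transient failures with capped exponential backoff. The text does not prescribe this technique; it is shown here only as an example of the kind of hardening meant. The sketch assumes a hypothetical `TransientServiceError` raised by a hypothetical flaky reservation service, and it takes the `sleep` function as a parameter so it can be exercised without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientServiceError(Exception):
    """Hypothetical error for a grid service call that may succeed if retried."""


def call_with_retry(call: Callable[[], T], attempts: int = 5,
                    base_delay: float = 0.1, max_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Invoke `call`, retrying transient failures with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TransientServiceError:
            if attempt == attempts - 1:
                raise  # give up: pass the last failure to the caller
            # Wait about 0.1 s, 0.2 s, 0.4 s, ... (capped), with random jitter
            # so that many clients do not retry in lock-step.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")


def make_flaky_reservation(failures: int) -> Callable[[], str]:
    """Hypothetical reservation service that fails `failures` times, then succeeds."""
    remaining = [failures]

    def reserve() -> str:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientServiceError("reservation service unavailable")
        return "reservation-confirmed"

    return reserve


print(call_with_retry(make_flaky_reservation(2), sleep=lambda s: None))
# prints reservation-confirmed
```

Only errors explicitly marked as transient are retried; any other exception propagates immediately, so permanent faults such as an authorization failure are not masked by repeated attempts.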
One important purpose of the Open Middleware Program is to define and improve the interfaces among OGSA grid services, Internet-based application-level middleware, digital libraries, information management services and knowledge management services. In the past few years, the GGF (Global Grid Forum) has developed grid infrastructure specifications such as the Open Grid Services Architecture (OGSA), implemented in tools such as the Globus Toolkit. The Globus Alliance, HP and IBM have jointly developed web services in the form of WSRF (WS-Resource Framework). This also allows the grid research community to influence the technologies and tools developed by W3C and OASIS, and it has attracted substantial industrial investment. WSRF and related specifications are not yet industry standards. One function of the OMP is to track these developments and ensure that they reflect and take into account the current state of e-Research and grid technology in China.
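The central idea of WSRF is that a web service gives access to stateful resources identified by a key, whose state is exposed as named resource properties and whose lifetime can be scheduled or ended explicitly. The Python sketch below models only that idea, in memory; it does not implement any WSRF message format or SOAP binding. The names `StatefulResource`, `ResourceHome`, `get_property`, `set_termination_time` and `destroy` are illustrative assumptions that loosely echo the WS-ResourceProperties and WS-ResourceLifetime concepts.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StatefulResource:
    """State addressed by a key, exposed as named properties, with a lifetime."""
    properties: dict[str, object]
    termination_time: float
    key: str = field(default_factory=lambda: uuid.uuid4().hex)


class ResourceHome:
    """Hypothetical container that creates, queries and expires stateful resources."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._resources: dict[str, StatefulResource] = {}

    def create(self, properties: dict[str, object], lifetime_s: float) -> str:
        res = StatefulResource(dict(properties), self._clock() + lifetime_s)
        self._resources[res.key] = res
        return res.key  # the client holds only this key, not the state itself

    def get_property(self, key: str, name: str) -> object:
        self._expire()
        return self._resources[key].properties[name]  # KeyError if expired or unknown

    def set_termination_time(self, key: str, when: float) -> None:
        self._resources[key].termination_time = when

    def destroy(self, key: str) -> None:
        self._resources.pop(key, None)

    def _expire(self) -> None:
        now = self._clock()
        expired = [k for k, r in self._resources.items() if r.termination_time <= now]
        for k in expired:
            del self._resources[k]


now = [0.0]                                   # fake clock for the example
home = ResourceHome(clock=lambda: now[0])
job = home.create({"status": "queued"}, lifetime_s=60.0)
print(home.get_property(job, "status"))       # prints queued
now[0] = 120.0                                # the scheduled lifetime has passed
try:
    home.get_property(job, "status")
except KeyError:
    print("resource expired")                 # prints resource expired
```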
We should re-examine existing middleware tools and services and make them more reliable and practical.
Existing middleware tools and services should become more interoperable and customizable, and able to integrate with larger frameworks and grid environments.
In addition, new middleware tools and services need to be developed. New middleware should be considered wherever the following functions are missing: grid security, grid management and assembly, quality-of-service adaptation, workflow engines, collaboration tools, multimedia semantic indexing, intelligent service discovery, decision support and hypothesis-testing software, data and knowledge validation and correction, automatic representation mechanisms, collaborative visualization, simulation, and high-end grid user interfaces designed for application scientists.
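Of the functions listed above, a workflow engine is the easiest to illustrate: it runs analysis steps in an order that respects their dependencies. A minimal sketch, assuming each step is a Python function that reads earlier results by name, can rely on the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The step names and the toy computation are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable

Task = Callable[[dict[str, object]], object]


def run_workflow(tasks: dict[str, Task],
                 depends_on: dict[str, set[str]]) -> dict[str, object]:
    """Run each task after its prerequisites; tasks read earlier results by name."""
    graph = {name: depends_on.get(name, set()) for name in tasks}
    results: dict[str, object] = {}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    for name in TopologicalSorter(graph).static_order():
        results[name] = tasks[name](results)
    return results


steps = {
    "fetch": lambda r: [3.0, 1.0, 2.0],          # stand-in for reading observations
    "clean": lambda r: sorted(r["fetch"]),
    "mean": lambda r: sum(r["clean"]) / len(r["clean"]),
    "report": lambda r: f"n={len(r['clean'])}, mean={r['mean']:.2f}",
}
deps = {"clean": {"fetch"}, "mean": {"clean"}, "report": {"clean", "mean"}}
print(run_workflow(steps, deps)["report"])  # prints n=3, mean=2.00
```

A production engine would also dispatch steps to grid resources, retry failures and record provenance; this sketch shows only the dependency-ordering core.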
Specialized scientific data warehouses hold large numbers of heterogeneous data sets, such as spatial data, temporal data, images, video, audio, 3D data, spectra, graphics and multimedia; these should be accessible, shareable and integrable with information resources in other fields, with digital libraries (published articles and papers) and with websites.
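Making heterogeneous data sets discoverable usually starts with a shared metadata catalog that can be searched independently of how each data set is stored. The sketch below assumes that each data set is described only by a name, a data type and a longitude/latitude bounding box; the class names and the registered records are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# (min_lon, min_lat, max_lon, max_lat) in degrees
BBox = tuple[float, float, float, float]


@dataclass(frozen=True)
class DatasetRecord:
    name: str
    kind: str  # e.g. "image", "time series", "3D model", "spectra"
    bbox: BBox


def overlaps(a: BBox, b: BBox) -> bool:
    """True if two lon/lat boxes intersect (boxes crossing the antimeridian are not handled)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Catalog:
    """Hypothetical metadata catalog spanning heterogeneous data sets."""

    def __init__(self) -> None:
        self._records: list[DatasetRecord] = []

    def register(self, record: DatasetRecord) -> None:
        self._records.append(record)

    def search(self, bbox: BBox, kind: Optional[str] = None) -> list[str]:
        """Names of data sets overlapping `bbox`, optionally filtered by type."""
        return [r.name for r in self._records
                if overlaps(r.bbox, bbox) and (kind is None or r.kind == kind)]


catalog = Catalog()
catalog.register(DatasetRecord("scene-pearl-river-delta", "image", (112.0, 21.5, 115.0, 24.0)))
catalog.register(DatasetRecord("tide-gauge-station-1", "time series", (114.1, 22.2, 114.3, 22.4)))
catalog.register(DatasetRecord("plateau-crust-model", "3D model", (78.0, 27.0, 104.0, 37.0)))

print(catalog.search((113.5, 22.0, 114.5, 23.0)))
# prints ['scene-pearl-river-delta', 'tide-gauge-station-1']
print(catalog.search((113.5, 22.0, 114.5, 23.0), kind="time series"))
# prints ['tide-gauge-station-1']
```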
A knowledge grid layer needs to be added on top of the existing computing and data grids; this involves defining the interfaces between knowledge management services and grid management, and integrating knowledge grid services with the grid environment.
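Knowledge management services such as automatic indexing and querying lie at the heart of a knowledge grid layer. As a minimal sketch, assuming data sets are described by short free-text descriptions, an inverted index maps each term to the data sets that mention it, and a query returns the data sets containing every query term. The identifiers and descriptions are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class KeywordIndex:
    """Minimal inverted index over free-text data set descriptions."""

    def __init__(self) -> None:
        self._postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, dataset_id: str, description: str) -> None:
        for term in tokenize(description):
            self._postings[term].add(dataset_id)

    def query(self, text: str) -> list[str]:
        """Return ids of data sets whose description contains every query term."""
        terms = tokenize(text)
        if not terms:
            return []
        hits = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self._postings.get(term, set())
        return sorted(hits)


index = KeywordIndex()
index.add("ds1", "Seismic reflection survey, South China Sea")
index.add("ds2", "Sea surface temperature time series")
index.add("ds3", "Seismic catalog of Sichuan earthquakes")
print(index.query("seismic sea"))  # prints ['ds1']
```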
Strengthening the coordination of research work and increasing capital investment can prevent duplication of work and narrow the gap with the international community.
5 Collaborative Computing Middleware
Conceptually, middleware sits between users, application systems and the resources used to solve complex scientific and engineering problems (see the figure below). It provides a set of common services and tools that let researchers and application systems treat computers, data warehouses and other distributed resources as if they were one very large virtual facility. Middleware places the core services required by application systems in a standard, ubiquitous container. These common services simplify application development, improve the robustness and interoperability of the system, avoid much repetitive work and improve efficiency in all respects.
Figure: Key components of the computing infrastructure
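The idea of placing common services in a standard container can be sketched as a registry from which every application obtains the same shared service instances instead of building its own. The `ServiceContainer` and `TransferLog` classes below are hypothetical illustrations, not the API of any real grid middleware.

```python
from typing import Any, Callable


class ServiceContainer:
    """Hypothetical container that holds shared middleware services by name."""

    def __init__(self) -> None:
        self._factories: dict[str, Callable[[], Any]] = {}
        self._instances: dict[str, Any] = {}

    def register(self, name: str, factory: Callable[[], Any]) -> None:
        self._factories[name] = factory

    def get(self, name: str) -> Any:
        # Create each service once, on first use, then hand the same
        # instance to every application that asks for it.
        if name not in self._instances:
            if name not in self._factories:
                raise LookupError(f"no service registered under {name!r}")
            self._instances[name] = self._factories[name]()
        return self._instances[name]


class TransferLog:
    """Stand-in for a shared service, e.g. data-transfer bookkeeping."""

    def __init__(self) -> None:
        self.entries: list[str] = []

    def record(self, entry: str) -> None:
        self.entries.append(entry)


container = ServiceContainer()
container.register("transfer-log", TransferLog)

# Two independent applications use the same service without building their own.
container.get("transfer-log").record("seismic app: fetched 2 files")
container.get("transfer-log").record("climate app: fetched 5 files")
print(len(container.get("transfer-log").entries))  # prints 2
```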
Although middleware is divided here into three types of services and tools, there are other conventional ways of partitioning the middleware space. Moreover, some components (such as security, semantics and provenance) actually span all three categories.
Grid service and resource management middleware: this includes the Open Grid Services Infrastructure (OGSI), which provides access, communication, security, authentication, accounting and coordination services between grid data and computing resources, and between those resources and the high-end application services that use them. Computing and data grids depend on grid service middleware, which is therefore also called resource management middleware.
Knowledge management middleware: this provides services and tools for indexing, archiving, querying, analysing, integrating, managing and presenting many kinds of large data warehouses and video archives. These tools can integrate and automatically index multidisciplinary data sets and support interactive analysis, modelling and visualization. They can also mine, acquire and publish new levels of knowledge and share new annotations.
Collaboration middleware: this provides services and tools supporting formal and informal, real-time and non-real-time collaborative activities, which may take place among remote scientists, research institutions or resources (dynamic, extensible virtual organizations). Table 4 lists the basic functions of this middleware, which usually need to be integrated and developed within this research project.
Table 4
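The distinction between real-time and non-real-time collaboration can be illustrated with a topic-based publish/subscribe channel: subscribers are notified immediately, while a stored history lets collaborators who were offline catch up later. The sketch below is a hypothetical in-process illustration; `CollaborationBus`, the topic name and the site names are assumptions, and a real system would deliver events over the network.

```python
from collections import defaultdict
from typing import Callable

Event = tuple[str, str]  # (sender, message)


class CollaborationBus:
    """Hypothetical topic-based publish/subscribe channel for a research team."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Callable[[Event], None]]] = defaultdict(list)
        self._history: defaultdict[str, list[Event]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Event], None]) -> None:
        """Real-time: `callback` runs as soon as an event is published."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, sender: str, message: str) -> None:
        event = (sender, message)
        self._history[topic].append(event)  # kept for later readers
        for callback in self._subscribers[topic]:
            callback(event)

    def replay(self, topic: str) -> list[Event]:
        """Non-real-time: a collaborator who was offline reads the backlog."""
        return list(self._history[topic])


bus = CollaborationBus()
live: list[Event] = []
bus.subscribe("flood-model", live.append)
bus.publish("flood-model", "Beijing", "new rainfall input uploaded")
bus.publish("flood-model", "Wuhan", "run 17 finished")
print(len(live))                      # prints 2
print(bus.replay("flood-model")[0])   # prints ('Beijing', 'new rainfall input uploaded')
```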
6 Conclusion
The development of earth system science plays an important role in the sustainable development of the economy and society.
Earth system science research requires large-scale scientific instruments and very large-scale computing facilities to process terabyte- and petabyte-scale geospatial data sets.
Modern earth system science research involves solving many multidisciplinary and interdisciplinary problems, so it needs a collaborative, multidisciplinary resource-sharing platform together with technical standards and norms for using it.
Earth system science research should not be carried out in isolation but together with the rest of the world, and the resource-sharing platform should allow participation in worldwide e-Research and Geo Grid construction.
Supercomputing facilities for basic earth system science research in China are inadequate, especially in universities; investment needs to be increased and our basic research resources integrated.
Establish a basic research platform for earth science supercomputing and geospatial data processing.
Build a multidisciplinary resource-sharing environment and an earth science grid computing environment for basic earth science research.
Develop middleware for massively parallel computing, distributed collaborative processing and multidisciplinary resource sharing, together with related applied basic research.
Lay a foundation for participating in larger national and even world-class research grid R&D.