CXY:
Operation and maintenance is a very broad term whose responsibilities and positioning differ from company to company and from stage to stage. Taking the words literally and assuming the job is just typing a few commands is a mistake. At a start-up, the work of an operation and maintenance engineer may begin with registering a domain name and buying or renting servers, and go on to racking the servers, configuring network equipment, deploying operating systems and runtime environments, deploying code, designing and deploying monitoring, and guarding against vulnerabilities and attacks. At large companies the demands on operation and maintenance keep rising, which has produced a finer division of labor: broadly, it can be divided into website operation and maintenance, system operation and maintenance, network operation and maintenance, database operation and maintenance, IT operation and maintenance, operation and maintenance development, operation and maintenance security, and other directions.
Many people outside the field equate operation and maintenance with one very small responsibility of IT operation and maintenance: installing systems. Some R&D engineers' view of operation and maintenance is limited to a few points: deployment, change, monitoring and response.
Whatever the kind of operation and maintenance, the most basic duty is to keep the business running stably, so operation and maintenance must own the stability of the business. People often think of operation and maintenance engineers as firefighters who respond to anomalies 24/7 and put out fires. But an engineer responsible for stability is closer to a doctor. Doctors, too, are divided among many departments and the emergency room, and they must first diagnose the patient's problem before prescribing the right treatment.
A business has many kinds of needs. Operation and maintenance engineers who meet those needs, or who actively look for the business's pain points and ways to improve them, deliver more value to the business.
When meeting business needs, we should prioritize, putting first the needs that matter most to a fast-growing business, such as stability, deployment and change efficiency, and capacity management. Stability goes without saying: if users cannot use your service reliably, no product feature is worth anything. A fast-growing Internet company like Baidu ships a large number of upgrades to users every day. Rolling those upgrades out across large clusters in different locations as fast as possible, without users noticing the upgrade at all, is what we pursue. When users open Baidu to test whether their network connection is working, that is praise for the quality of operation and maintenance.
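One common way to upgrade a cluster without users noticing is a rolling upgrade: upgrade a few servers at a time, check their health, and stop as soon as a batch looks bad so the rest keep serving the old version. The sketch below only illustrates that idea; the function name `rolling_upgrade`, the `upgrade` and `is_healthy` callbacks, and the server names are all hypothetical and do not describe Baidu's actual tooling.

```python
from typing import Callable, List


def rolling_upgrade(
    servers: List[str],
    batch_size: int,
    upgrade: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade servers batch by batch and return the servers that were upgraded.

    Raises RuntimeError at the first batch that contains an unhealthy server,
    leaving all later batches untouched.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    done: List[str] = []
    for start in range(0, len(servers), batch_size):
        batch = servers[start:start + batch_size]
        for server in batch:
            # Hypothetical callback; in practice: drain traffic, deploy, restart.
            upgrade(server)
        failed = [s for s in batch if not is_healthy(s)]
        if failed:
            # Halt so the remaining servers keep serving the old version.
            raise RuntimeError(f"upgrade halted, unhealthy servers: {failed}")
        done.extend(batch)
    return done


if __name__ == "__main__":
    log: List[str] = []
    rolling_upgrade(
        ["web-1", "web-2", "web-3", "web-4", "web-5"],
        batch_size=2,
        upgrade=log.append,          # stand-in for a real deploy step
        is_healthy=lambda s: True,   # stand-in for a real health check
    )
    print(log)
```

A small batch size limits how many users a bad release can reach, at the price of a slower rollout; choosing that trade-off is part of the "deployment and change efficiency" mentioned above.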
Secondly, we can look across the needs of different businesses. If we can abstract the requirements that various services share and turn work with general value into platforms (such as databases, CDN, monitoring, traffic access and scheduling, and big-data storage and computation), that is another direction to develop in. With traffic and a server fleet as large as Baidu's, you not only have enormous room and challenges, but also enough resources and support to develop and apply the most cutting-edge technologies in the industry.
After accumulating enough experience, we can work at both the macro and micro levels and consider the intelligent deployment and scheduling of services at the level of the whole company (which involves key points such as network, hardware, system and application development model), so as to further improve efficiency and save costs.
If you understand the business and its business model, and optimize and innovate in close combination with it, that is another way for an operation and maintenance engineer to show value. Many product innovations, patent applications, published papers and improvements in business metrics come directly from operation and maintenance engineers or from their collaboration with others.
YBX:
Compared with R&D personnel, operation and maintenance engineers, especially senior ones, can observe the computer systems they maintain as a whole, without module boundaries. This unique position brings a lot of value. They know exactly where the system's bottleneck is, and therefore its exact capacity, and they know how to add capacity quickly before the system hits that bottleneck. Knowing the system's risk points, they can coordinate the modules upstream and downstream of each risk point and formulate redundancy strategies, which is more reasonable than focusing on the stability of a single module. Long experience in this work builds up architectural design experience that can guide the design and review of new architectures. Looking across the company's different businesses, operation and maintenance can abstract out the modules they have in common, manage them in a unified way, and form an effective platform and automated management model. Likewise, looking across businesses, resources can be allocated in a unified way and saved.
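The link between bottleneck and capacity can be made concrete with a little arithmetic. As a rough sketch, assume every request passes through every module in turn, so the whole system can serve no more than its slowest module, and assume each instance should run below its limit to leave headroom. The module names, the QPS figures and the 25% default headroom below are made up for illustration.

```python
import math
from typing import Dict, Tuple


def system_capacity(module_qps: Dict[str, float]) -> Tuple[str, float]:
    """Return (bottleneck module, system capacity in QPS).

    Assumes requests pass through every module, so the module with the
    lowest measured capacity bounds the whole system.
    """
    if not module_qps:
        raise ValueError("module_qps must not be empty")
    name = min(module_qps, key=lambda m: module_qps[m])
    return name, module_qps[name]


def instances_needed(
    expected_peak_qps: float,
    qps_per_instance: float,
    headroom: float = 0.25,
) -> int:
    """Instances required to serve the expected peak while keeping spare headroom."""
    if qps_per_instance <= 0:
        raise ValueError("qps_per_instance must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = qps_per_instance * (1 - headroom)
    return math.ceil(expected_peak_qps / usable)


if __name__ == "__main__":
    # Hypothetical per-module capacities from load tests.
    measured = {"frontend": 12000.0, "index": 8000.0, "cache": 20000.0}
    print(system_capacity(measured))
    print(instances_needed(expected_peak_qps=9000, qps_per_instance=1000))
```

Real systems rarely form a simple chain, so a calculation like this is only a first estimate to be checked against load tests.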
KZ:
Design and implement software that improves the availability, scalability, latency and efficiency of the company's services. Handle day-to-day emergencies, repair or replace faulty components, and design ways to keep the same problems from recurring. Design and implement new architectures and standards for very large-scale distributed systems. Take part in service expansion planning, forecast service growth trends, and optimize software and system performance. Provide online consulting and on-site problem solving. Build an automated operation and maintenance platform to solve day-to-day problems. Build a knowledge base and predict possible problems.
XX:
Operation and maintenance is the whole process of maintaining the production environment and the resources and services related to it, including the technologies and methods used to keep the production environment running stably, efficiently and at low cost.
On the one hand, operation and maintenance is ultimately responsible for business functions, and its value lies in maximizing the value of the product. This is usually achieved by getting the best possible performance out of the product's functions. For example, operation and maintenance for a search engine should focus on giving users the best possible search experience: stable, fast, accurate, fresh and complete. Operation and maintenance for an online chat system should keep users' conversations real-time and smooth. On the other hand, operation and maintenance is ultimately responsible for the cost of the online business, and its value lies in reducing the cost of running the service.
How operation and maintenance work develops generally depends on the characteristics and requirements of the business being maintained, which give rise to a number of themes that need to be developed. Common themes include event management, configuration management, change management and capacity management.
The requirements for operation and maintenance engineers are particularly strict, because they need to constantly supplement and expand their knowledge and research scope for different problems.
Excellent operation and maintenance engineers show outstanding initiative and a strong sense of responsibility early on. Faced with an unfamiliar business, they actively learn and extend their understanding and knowledge until they can maintain the business independently.
As they develop, engineers who make a habit of summarizing and reflecting gradually grow into senior operation and maintenance engineers, who usually have a more systematic understanding of how the business is operated and maintained. Some engineers gradually become project managers because of their strong project planning and management skills.
With further development, senior operation and maintenance engineers come to understand the product thoroughly. They can then even become product managers or consultants to product R&D, playing a vital role in the design and development of product features.
SJY:
The technical knowledge an operation and maintenance engineer needs varies with his or her specialty, but a grasp of basic computer architecture, operating systems and network technology is the minimum requirement. For example, you may need to be proficient with the Linux operating system, use scripting tools to handle daily tasks, and know the TCP/IP protocol stack well enough to troubleshoot abnormal traffic in large network systems. Beyond that, you need to build up experience in software maintainability to guide later work.
In the initial stage, the goal of an operation and maintenance engineer is to master all the software and hardware knowledge and experience needed to maintain a system. In the advanced stage, the engineer needs to design and develop basic system software that keeps business systems running stably and reliably, that is, software that supports larger-scale business systems and raises the productivity of operation and maintenance. The highest stage is building and operating the software system itself, so that the system is operable by design from the moment it is created, which maximizes the system's productivity and minimizes its dependence on outside support.
ZM:
The operation and maintenance engineer should be a software engineer first, but with different responsibilities and emphases.
The operation and maintenance engineer is not a system administrator. The biggest difference from a system administrator is that the operation and maintenance engineer's job is not only to configure and manage the system, but also to extend the system's functions or analyze its data through software development.
The operation and maintenance engineer should be a combination of software engineer and system engineer, and should have a broader knowledge background than the general software engineer.
The responsibilities of operation and maintenance are: to ensure the stable operation of services; to consider the scalability of services; to put forward development requirements from the standpoint of system stability and operability; to locate system problems and even fix bugs directly; and to respond to and handle unexpected problems quickly.
The daily work of operation and maintenance is: to analyze the system's requirements and design, think about what can be strengthened to ensure stability, and communicate effectively with the system's R&D personnel; to use tools or write programs to analyze operational data; and to write programs that build tools or platforms to strengthen the stability of the system.
The most important thing for operation and maintenance engineers is to solve problems through programming and software. Their development path should not differ much from that of software engineers; only the focus and the field differ.
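As a small illustration of writing a program to analyze operational data, the sketch below computes two common stability indicators, availability and tail latency, from request records. The record format (HTTP status code, latency in milliseconds), the choice to count only 5xx responses as failures, and the sample values are assumptions made for this example.

```python
import math
from typing import List, Tuple

# Hypothetical record format: (HTTP status code, latency in milliseconds).
Record = Tuple[int, float]


def availability(records: List[Record]) -> float:
    """Fraction of requests that did not fail with a server-side (5xx) error."""
    if not records:
        raise ValueError("no records")
    ok = sum(1 for status, _ in records if status < 500)
    return ok / len(records)


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the samples are less than or equal to it."""
    if not values:
        raise ValueError("no values")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up sample data standing in for a parsed access log.
    sample: List[Record] = [(200, 10.0), (200, 20.0), (500, 30.0), (404, 40.0)]
    print(availability(sample))
    print(percentile([latency for _, latency in sample], 99))
```

Tracking numbers like these over time gives operation and maintenance engineers and R&D personnel a shared, measurable definition of "stable" to discuss.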