Cloud computing is a model for adding, using, and delivering Internet-based services, usually involving dynamically scalable and often virtualized resources provided over the Internet. "Cloud" is a metaphor for the network and the Internet: in the past it often stood for telecommunication networks, and later for the abstraction of the Internet and its underlying infrastructure. In the narrow sense, cloud computing refers to the delivery and use of IT infrastructure, obtaining the required resources over the network in an on-demand, scalable way; in the broad sense, it refers to the delivery and use of services generally, obtained over the network in the same on-demand, scalable way. Those services may be related to IT and software, to the Internet, or to other domains. In effect, computing power itself can circulate as a commodity over the Internet.
Big data, or massive data, refers to datasets so large that today's mainstream software tools cannot capture, manage, process, and organize them within a reasonable time to help enterprises make more proactive business decisions. Big data is commonly characterized by the four Vs: volume, velocity, variety, and veracity.
Technically, the relationship between big data and cloud computing is as inseparable as the two sides of a coin. Big data cannot be processed on a single computer; a distributed computing architecture must be adopted. Its strength lies in mining massive datasets, but that mining depends on cloud computing's distributed processing, distributed databases, cloud storage, and virtualization technology.
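As a loose illustration of the divide-and-process idea behind that distributed architecture, here is a minimal sketch in plain Python (no Hadoop involved): the input is split into chunks, each chunk is mapped independently, and the partial results are reduced into one answer. The chunking scheme and the word-count task are illustrative assumptions, not something the text above specifies.

```python
from collections import Counter
from multiprocessing import Pool

def map_chunk(lines):
    """Map step: count words in one chunk of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one result."""
    total = Counter()
    for c in partials:
        total.update(c)
    return total

if __name__ == "__main__":
    # Toy stand-in for a dataset too large for one machine.
    data = ["big data needs cloud computing",
            "cloud computing supports big data"] * 4
    # Split into chunks and process them in parallel workers,
    # mimicking how a cluster would fan the work out.
    chunks = [data[i:i + 2] for i in range(0, len(data), 2)]
    with Pool(4) as pool:
        partials = pool.map(map_chunk, chunks)
    print(reduce_counts(partials).most_common(3))
```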
Big data management relies on distributed file systems such as Hadoop's, with MapReduce handling the splitting of data and the execution of access to it. At the same time, with SQL support, typified by the Hive-on-Hadoop SQL interface (see the sketch after the list below), building next-generation data warehouses on big data technology with cloud computing has become a hot topic. From the standpoint of system requirements, big data architectures pose new challenges to system design:
1. Higher integration. A standard chassis should complete specific tasks as efficiently as possible.
2. More sensible configuration and higher speed. A balanced design across storage, controllers, I/O channels, memory, CPU, and network is the best design for data warehouse access, delivering performance more than an order of magnitude above traditional platforms of the same class.
3. Lower overall energy consumption. The same computing task should consume the least possible energy.
4. Greater stability and reliability. The system should eliminate every single point of failure and unify the quality and standards of its components.
5. Low management and maintenance costs. Routine data warehouse management should be fully integrated.
6. A planned and foreseeable roadmap for system expansion and upgrades.
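To make the Hive-on-Hadoop SQL pattern mentioned above concrete, here is a hedged sketch using the pyhive client. The host, port, table name, and column names are hypothetical, and it assumes a reachable HiveServer2 endpoint, which the text does not specify.

```python
# Sketch only: assumes HiveServer2 at localhost:10000 and a
# hypothetical `page_views` table; install with `pip install pyhive`.
from pyhive import hive

conn = hive.Connection(host="localhost", port=10000, database="default")
cursor = conn.cursor()

# HiveQL looks like ordinary SQL, but Hive compiles it into
# MapReduce (or Tez/Spark) jobs that run across the Hadoop cluster.
cursor.execute("""
    SELECT country, COUNT(*) AS views
    FROM page_views
    GROUP BY country
    ORDER BY views DESC
    LIMIT 10
""")
for country, views in cursor.fetchall():
    print(country, views)

cursor.close()
conn.close()
```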
The relationship between cloud computing and big data
To put it simply: cloud computing is the virtualization of hardware resources, while big data is the efficient processing of massive data. This explanation is not entirely precise, but it helps people unfamiliar with the two terms grasp the difference quickly. To put it more vividly, cloud computing is like our computers and operating systems: it virtualizes a large pool of hardware resources and then allocates them for use.
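As one concrete instance of that "virtualize, then allocate on demand" model, the sketch below requests a virtual machine from a cloud provider's API. AWS EC2 is only one example of such a provider, and every parameter value here is an illustrative assumption; the paragraph above names no particular platform.

```python
# Sketch only: assumes AWS credentials are already configured and
# that the AMI ID below exists; install with `pip install boto3`.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Ask the provider to carve a virtual machine out of its pooled,
# virtualized hardware: the on-demand allocation described above.
response = ec2.run_instances(
    ImageId="ami-12345678",   # hypothetical machine image
    InstanceType="t3.micro",  # a small slice of CPU and memory
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```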
It can be said that big data is the equivalent of a "database" for massive data. Looking at its development, big data has been evolving in a direction much like the one traditional databases once followed; in short, the experience of traditional databases has provided ample ground for big data to develop on.
The overall architecture of big data comprises three layers: data storage, data processing, and data analysis. Data must first be persisted by the storage layer; then, according to the data requirements and goals, the corresponding data models and analysis metrics are established so the data can be analyzed and turned into value.
Timeliness in the middle is delivered by the powerful parallel and distributed computing capabilities of the data processing layer. The three layers work together to let big data produce its ultimate value.
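To make the three layers and their division of labor concrete, here is a minimal end-to-end sketch in plain Python: a storage layer that persists raw records, a processing layer that reads and reshapes them, and an analysis layer that computes a simple metric. The file name, record layout, and the per-region average metric are illustrative assumptions.

```python
# Sketch only: a local CSV stands in for a distributed store.
import csv
import statistics
from pathlib import Path

def store(records, path):
    """Storage layer: persist raw records (here a local file,
    standing in for a distributed store such as HDFS)."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(records)

def process(path):
    """Processing layer: read the stored data and reshape it for
    analysis (a cluster would do this step in parallel)."""
    with open(path, newline="") as f:
        return [(region, float(amount)) for region, amount in csv.reader(f)]

def analyze(rows):
    """Analysis layer: compute a metric over the processed data."""
    by_region = {}
    for region, amount in rows:
        by_region.setdefault(region, []).append(amount)
    return {r: statistics.mean(v) for r, v in by_region.items()}

if __name__ == "__main__":
    path = Path("orders.csv")
    store([("north", 120.0), ("south", 80.0), ("north", 60.0)], path)
    print(analyze(process(path)))  # e.g. {'north': 90.0, 'south': 80.0}
```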
Whatever the current state of cloud computing, the future trend is clear: cloud computing serves as the underlying pool of computing resources that supports the big data processing above it, while big data develops toward real-time, interactive query efficiency and analytical capability. To borrow the words of a Google technical paper, the truly exciting prospect is that "PB-level data can be operated on with a click of the mouse."