Why is Python well suited to writing web crawlers?
1) The interface for fetching web pages
Compared with static programming languages such as Java, C#, and C++, Python's interface for fetching web documents is simpler. Compared with other scripting languages such as Perl and shell, Python's urllib2 package (urllib.request in Python 3) provides a fairly complete API for accessing web documents. (Of course, Ruby is also a good choice.)
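As a minimal sketch of how little code a fetch takes (assuming Python 3, where urllib2 became urllib.request; the demo uses a data: URL only so it runs without a network, and the real address shown in the comment is a placeholder):

```python
from urllib.request import urlopen

def fetch(url: str) -> str:
    """Fetch a URL and return the response body decoded as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")

# Self-contained demo via a data: URL, so no network access is needed;
# in practice you would pass a real address such as "https://example.com".
print(fetch("data:text/plain,hello"))
```

The same call works for http:// and https:// URLs; error handling (urllib.error.HTTPError, timeouts) is omitted to keep the sketch short.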
In addition, crawling a web page sometimes requires simulating the behavior of a browser, because many websites block crude crawler requests. You therefore need to construct requests that mimic a real user agent, for example by simulating login and handling the storage and sending of session cookies. Python has excellent third-party packages that help with this, such as Requests and mechanize.
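Using only the standard library, that pattern looks roughly like this (a sketch: the URL and User-Agent string are illustrative, and the Requests package mentioned above wraps the same idea in a friendlier API):

```python
import http.cookiejar
import urllib.request

# A cookie jar lets the opener store and resend session cookies,
# the way a browser does across a login and subsequent requests.
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar)
)

# Masquerade as a regular browser: many sites reject the default
# "Python-urllib" user agent. Both header value and URL are placeholders.
request = urllib.request.Request(
    "https://example.com/login",
    headers={"User-Agent": "Mozilla/5.0 (compatible; demo-crawler)"},
)

# opener.open(request) would perform the request and capture any cookies
# the server sets; it is not called here so the sketch stays offline.
print(request.get_header("User-agent"))
```

With Requests, the equivalent is a `requests.Session()` whose `headers` you update once; the session then persists cookies across calls automatically.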
2) Processing pages after crawling
Crawled web pages usually need further processing, such as filtering out HTML tags and extracting the text. Python's BeautifulSoup provides concise document-processing functions that handle most of this work in very few lines of code.
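A small sketch of both tasks, assuming the bs4 package is installed (`pip install beautifulsoup4`); the HTML snippet is made up to stand in for a crawled page:

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# A made-up HTML fragment standing in for a fetched page.
html = """
<html><body>
  <h1>Title</h1>
  <p>Some text with a <a href="https://example.com">link</a>.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Strip the tags and keep only the text content.
text = soup.get_text(separator=" ", strip=True)
# Collect every hyperlink target on the page.
links = [a["href"] for a in soup.find_all("a")]

print(text)
print(links)
```

Here `"html.parser"` is the standard-library backend; BeautifulSoup can also use faster parsers such as lxml if they are installed.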
In fact, many languages and tools can do all of the above, but Python does it fastest and most cleanly. Life is short, you need Python.