1. Beautiful Soup
Strictly speaking, Beautiful Soup is not a complete crawling toolkit. It does not download pages itself and has to be used together with urllib (or a similar HTTP library); what it provides is a set of tools for parsing, cleaning and extracting data from HTML/XML.
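A minimal sketch of that division of labour, assuming the `bs4` package is installed: urllib fetches the raw page, and Beautiful Soup parses it. The helper names `fetch_html` and `extract`, and the choice of fields to pull out (title, links, visible text), are illustrative, not part of either library.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with urllib; Beautiful Soup itself does no networking."""
    with urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server declares no charset (an assumption)
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract(html: str) -> dict:
    """Parse HTML with Beautiful Soup and pull out the title, links and readable text."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    # Only <a> tags that actually carry an href attribute
    links = [a["href"] for a in soup.find_all("a", href=True)]
    # Remove <script>/<style> so get_text() returns only human-readable text
    for tag in soup(["script", "style"]):
        tag.decompose()
    text = soup.get_text(" ", strip=True)
    return {"title": title, "links": links, "text": text}
```

For a live page you would call something like `extract(fetch_html("https://example.com/"))`, where the URL is only a placeholder. Keeping fetching and parsing in separate functions makes the parsing step easy to test on saved HTML.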
2. Scrapy
Scrapy is a fast, high-level screen scraping and web crawling framework for Python. Many readers have probably heard that many of the courses on Course Graph were crawled with Scrapy. There are plenty of introductory articles on it, including an early one by the expert pluskid, "Scrapy: Easily Customize Web Crawlers", which still holds up well today.
3. Python Goose
Goose was originally written in Java and later rewritten in Scala, making it a Scala project. Python-Goose is a rewrite in Python that depends on Beautiful Soup.
Given the URL of an article, it can easily extract the article's title and body text, which makes it very convenient to use.
That concludes this introduction to a toolset for writing web crawlers in Python. I hope it helps everyone who programs in Python. Of course, learning Python programming takes more than learning tools; it also requires a solid grasp of general programming knowledge. Keep at it!