How to write a web crawler in Python?

We often come across beautiful pictures while browsing the Internet and want to download and save them, for use as desktop wallpaper or as design material.

The most common method is to right-click the picture and choose Save As. But for some pictures the right-click menu has no Save As option. Another way is to capture them with a screenshot tool, but that reduces the clarity of the picture. There is a better option: right-click and view the page source code.

We can use Python to implement a simple crawler that grabs the content we want and saves it locally. Let's take a look at how to do this in Python.

Specific steps

1. Get the whole page data.

First, we get the full content of the page that contains the pictures to be downloaded.

getjpg.py

# coding=utf-8
import urllib

def getHtml(url):
    # Open the URL and read the whole page (Python 2 urllib)
    page = urllib.urlopen(url)
    html = page.read()
    return html

html = getHtml("blogs.com/fnng/archive/2013/05/20/3089816.html")
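The script above uses Python 2, where urlopen lives directly in the urllib module. In Python 3 the same function is found at urllib.request.urlopen and read() returns bytes rather than text. A minimal sketch of the same step for Python 3 follows; the name get_html and the default utf-8 encoding are assumptions made for illustration, not part of the original script.

```python
import urllib.request


def get_html(url, encoding="utf-8"):
    """Fetch url and return the response body as text.

    The encoding default is an assumption; pages served in another
    encoding (for example gbk) need it passed explicitly.
    """
    with urllib.request.urlopen(url) as page:
        raw = page.read()  # bytes in Python 3
    # Replace undecodable bytes instead of raising, so a stray byte
    # does not abort the crawl.
    return raw.decode(encoding, errors="replace")
```

Calling get_html with the address of a page returns its source as a string, which can then be searched for image addresses.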

2. Filter the desired data from the page.

Suppose we find some beautiful wallpapers on Baidu Tieba (Baidu Post Bar). By viewing the page source as described earlier, we can find the address of a picture, such as: src="/forum ... jpg" pic_ext="JPEG"

Modify the code as follows:

import re
import urllib

def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getImg(html):
    # Capture the .jpg address between src=" and " pic_ext
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    return imglist

html = getHtml("/p/2460150866")
print getImg(html)

We also created the getImg() function, which filters the required image links out of the whole page. The re module provides regular expression support:

re.compile() compiles a regular expression into a regular expression object.

re.findall() returns all the data in html that matches imgre (the regular expression).

Running this script prints the URL of every matching picture contained in the page.
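To see what re.compile() and re.findall() do with this pattern, it can be tried on a small string without any network access. The sketch below is written for Python 3; the sample_html fragment and its example.com addresses are made up for illustration and are not taken from the real page.

```python
import re

# Hypothetical fragment imitating the markup described above.
sample_html = (
    '<img src="http://example.com/a.jpg" pic_ext="jpeg">'
    '<img src="http://example.com/b.jpg" pic_ext="jpeg">'
)

# Same pattern as getImg(): the parentheses capture only the address.
img_re = re.compile(r'src="(.+?\.jpg)" pic_ext')

# findall returns the captured group of every match, in page order.
img_list = img_re.findall(sample_html)
print(img_list)
```

Because "." also matches quote characters, this pattern can run across tags when an image that is not a .jpg comes before one that is. Writing [^"]+? in place of .+? is one way to keep each match inside a single src attribute.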

3. Save the filtered data of the page locally.

Traverse the filtered image addresses with a for loop and save each one locally. The code is as follows:

# coding=utf-8
import urllib
import re

def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getImg(html):
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    x = 0
    for imgurl in imglist:
        # Save each picture as 0.jpg, 1.jpg, ... in the current directory
        urllib.urlretrieve(imgurl, '%s.jpg' % x)
        x += 1
    # Return the addresses so that the print below shows what was downloaded
    return imglist

html = getHtml("/p/2460150866")
print getImg(html)

The core here is the urllib.urlretrieve() method, which downloads remote data directly to a local file.

The for loop traverses the image links that were found. To make the file names uniform, each image is renamed: the name is the value of the x variable, which is increased by 1 after every download. Files are saved by default in the directory the program is run from (the current working directory).

After the program runs, you will see the downloaded files in the directory.
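For readers on Python 3, the download loop can be sketched as follows. urlretrieve is available there as urllib.request.urlretrieve; the function name save_images and the dest_dir parameter are assumptions added for illustration, and enumerate() stands in for the manually incremented x counter.

```python
import os
import urllib.request


def save_images(img_urls, dest_dir="."):
    """Download each URL into dest_dir as 0.jpg, 1.jpg, ... and return the paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    # enumerate() supplies the running index used for the file name.
    for index, img_url in enumerate(img_urls):
        path = os.path.join(dest_dir, "%s.jpg" % index)
        urllib.request.urlretrieve(img_url, path)
        saved.append(path)
    return saved
```

Passing the list of addresses produced by the filtering step to save_images writes one numbered file per picture. The addresses must be complete URLs, since urlretrieve cannot resolve an address that lacks a scheme and host.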
