Conclusion of a Python web crawler thesis
from bs4 import BeautifulSoup
from requests.exceptions import RequestException
import re
import requests
import os

def get_html_text(url):
    # Fetch a page and return its HTML, or None if the request fails.
    try:
        r = requests.get(url)
        r.raise_for_status()
        return r.text
    except RequestException:
        return None

def get_chapter_name(html):
    # Collect names of the form '<chapter heading>_<entry name>'.
    soup = BeautifulSoup(html, 'lxml')
    charpter = soup.select('.bg')
    charpter_names = []
    # The first '.bg' block is skipped, as in the source.
    for entry in charpter[1:]:
        charpter_name = re.findall('<h2>(.*?)</h2>', str(entry))
        file_name = re.findall('<a href.*?>(.*?)</a>', str(entry))
        if charpter_name and file_name:
            for name in file_name:
                # '?' is kept from the source; it may stand for a space lost in extraction.
                name = name.split('?')[0]
                charpter_names.append(charpter_name[0] + '_' + name)
        else:
            pass
    return set(charpter_names)

def get_each_url(html):
    # Yield one dict per link found in the page's list items.
    soup = BeautifulSoup(html, 'lxml')
    urls = soup.select('ul li a')
    for url in urls:
        link = url.get('href')
        # '?' is kept from the source; it may stand for a space lost in extraction.
        text = url.text.split('?')[0]
        full_name = url.text.replace('?', '')
        yield {'url': link, 'text': text, 'full_name': full_name}
        print(text)

def get_text(url):
    # Download one chapter page and return its body text as bytes.
    r = requests.get(url)
    r.encoding = r.apparent_encoding
    soup = BeautifulSoup(r.text, 'lxml')
    items = soup.select('div.content-body')
    # re.S lets '.' match newlines, so the captured text may span several lines.
    item = re.findall(';(.*?);', items[0].text, re.S)
    return item[0].encode()

def save_to_file(url, text, full_name):
    # Directory name as it appears in the source.
    base_dir = 'Daoist'
    # Backslash separators, as in the source (Windows-style paths).
    path = '{}\\{}\\{}'.format(os.getcwd(), base_dir, text)
    if not os.path.exists(path):
        try:
            os.makedirs(path)
        except OSError:
            pass
    try:
        with open(path + '\\' + full_name + '.txt', 'wb') as f:
            f.write(get_text(url))
    except Exception:
        # Skip this chapter if the download or the write fails.
        pass

def main():
    # The site address is truncated in the source; only '/' remains.
    url = '/'
    html = get_html_text(url)
    chapters = get_chapter_name(html)
    for chapter in chapters:
        for each in get_each_url(html):
            # Match each link to its chapter by the entry name after '_'.
            if each['text'] == chapter.split('_')[-1]:
                save_to_file(each['url'], chapter, each['full_name'])

if __name__ == '__main__':
    main()
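
The file-saving function above joins its path with hard-coded backslashes, which works only on Windows. The block below is a minimal sketch of a portable alternative built on the standard-library pathlib module. It is not part of the original listing: the name save_chapter and its parameters are hypothetical, and it takes the bytes to write as an argument instead of downloading them.

```python
import tempfile
from pathlib import Path


def save_chapter(root, chapter, full_name, data):
    """Write data (bytes) to root/chapter/full_name.txt and return the path.

    Hypothetical helper for illustration; not taken from the original listing.
    """
    target_dir = Path(root) / chapter
    # Create the chapter directory (and any parents) if it is missing.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / (full_name + '.txt')
    target.write_bytes(data)
    return target


if __name__ == '__main__':
    # Demonstrate in a throwaway directory so nothing is left on disk.
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_chapter(tmp, 'chapter_a', 'example', b'sample text')
        print(saved.read_bytes())
```

Passing exist_ok=True to mkdir removes the need for a separate existence check, and Path picks the correct separator for the current platform.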