
Python crawler project: "Love at First Sight" wallpapers

December 1, 2019, 14:58 | 萬仟网 IT Programming

Method 1

import re
import urllib
import urllib.request

def gethtml(url):
    # Fetch the page and return its raw bytes.
    page = urllib.request.urlopen(url)
    html = page.read()
    return html

def getimage(html, x):
    # Examples of the image URLs we want to match:
    # https://mmbiz.qpic.cn/mmbiz_jpg/ib55rg6wzuc3b16kiy3uu53nkcttdic8uea4wwbpahj8lpibvankps2fztyjrv7w7dbeenrhfvpuuyrenaxsldgja/640?wx_fmt=jpeg
    # https://mmbiz.qpic.cn/mmbiz_jpg/ib55rg6wzuc3b16kiy3uu53nkcttdic8uehqoci7r86nehl2neforaqvctiaeaiuwjtwpknxnnxipuuuqnujefkyw/640?wx_fmt=jpeg
    # The regular expression is the key part here: it captures whatever
    # sits between the quotes of each data-src attribute, non-greedily.
    reg = 'data-src="(.*?)"'
    image = re.compile(reg)
    imlist = image.findall(html.decode('utf-8'))

    print(imlist)
    for i in imlist:
        print(i)
        print(x)
        urllib.request.urlretrieve(i, '%s.jpg' % x)
        x += 1
    return x

x = 1
url = 'https://mp.weixin.qq.com/s/mvdcn0o3093olihmykqbia'
html = gethtml(url)
x = getimage(html, x)
print('Download complete')
# The downloaded files end up in the same directory as this .py file.
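To see what the regex actually captures, here is a minimal offline sketch run against a made-up HTML fragment (the URLs below are shortened placeholders, not the real article markup):

```python
import re

# Made-up fragment mimicking the <img data-src="..."> tags that
# WeChat article pages use for lazy-loaded images.
sample = (
    '<img data-src="https://mmbiz.qpic.cn/a/640?wx_fmt=jpeg" class="one">'
    '<img data-src="https://mmbiz.qpic.cn/b/640?wx_fmt=jpeg" class="two">'
)

# Same pattern as above: non-greedy capture between the quotes,
# so each match stops at the first closing quote.
pattern = re.compile(r'data-src="(.*?)"')
urls = pattern.findall(sample)
print(urls)
```

Note the `?` in `(.*?)`: without it, a greedy `(.*)` would swallow everything up to the last quote on the line and merge both URLs into one bogus match.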

Method 2: BeautifulSoup, to avoid writing a regular expression (because I can't)


import requests
import urllib.request
from bs4 import BeautifulSoup

url = "https://mp.weixin.qq.com/s/cm3bua0um1jbznr2de7twg"
r = requests.get(url)
demo = r.text
soup = BeautifulSoup(demo, "html.parser")

piclist = []

# Collect the data-src attribute of every <img> tag;
# get() returns None when the attribute is absent.
for link in soup.find_all('img'):
    link_list = link.get('data-src')
    if link_list is not None:
        piclist.append(link_list)

x = 0
for http in piclist:
    print(http)

    # f:\桌面\pa is the save folder; create it before running.
    filesavepath = r'f:\桌面\pa\%s.jpg' % x

    urllib.request.urlretrieve(http, filesavepath)
    x += 1
    print('Saving image {}'.format(x))

print('Download complete')
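To see what `find_all('img')` plus `get('data-src')` returns without hitting the network, here is a minimal sketch against a made-up two-image fragment (the URLs are placeholders):

```python
from bs4 import BeautifulSoup

# Made-up fragment: one lazy-loaded image with data-src,
# and one plain image that only has src.
demo = (
    '<img data-src="https://mmbiz.qpic.cn/pic1/640?wx_fmt=jpeg">'
    '<img src="https://example.com/logo.png">'
)
soup = BeautifulSoup(demo, "html.parser")

piclist = []
for link in soup.find_all('img'):
    link_list = link.get('data-src')  # None for the second tag
    if link_list is not None:
        piclist.append(link_list)
print(piclist)
```

The `is not None` check matters: filtering with a plain truthiness test would also drop an empty `data-src=""`, while the explicit check only skips tags that lack the attribute entirely.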
