Python scraping: a detailed guide to BeautifulSoup's select method

2020-12-13 00:14 | 古怪的一陣風 | Python


This article explains how to use BeautifulSoup's select method in a Python scraper. All examples use the following HTML document:

 
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Parse with Python's built-in parser ("lxml" also works if it is installed)
soup = BeautifulSoup(html, "html.parser")

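When writing CSS, tag names are written without any prefix, class names are prefixed with a dot, and id names are prefixed with #. We can filter elements the same way here with soup.select(), which takes a CSS selector and returns a list of the matching tags.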
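A minimal sketch of what "returns a list" means in practice, assuming bs4 4.7 or later (where select() is backed by the soupsieve package) and the built-in "html.parser" backend; the sample document is repeated so the snippet runs on its own, and the variable name `html` is our choice. The result behaves like a list, a selector that matches nothing gives an empty list rather than None, and select_one() is the single-result counterpart that returns None on no match.

```python
from bs4 import BeautifulSoup

# The article's sample document (scraper debris removed)
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, "html.parser")

links = soup.select("a")
print(isinstance(links, list))   # True: supports len(), indexing, iteration
print(len(links))                # 3

# A selector that matches nothing yields an empty list, not None
print(soup.select("table"))      # []

# select_one() returns only the first match, or None if there is none
print(soup.select_one("a")["id"])  # link1
print(soup.select_one("table"))    # None
```

Checking `if not result:` therefore works the same way for both calls when a page lacks the expected element.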

(1) Finding by tag name

 
print(soup.select('title'))
#[<title>The Dormouse's story</title>]

print(soup.select('a'))
#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

print(soup.select('b'))
#[<b>The Dormouse's story</b>]
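Because the result is a list, the matched tags can be looped over directly. This sketch uses the same sample document and the "html.parser" backend (our assumption); it counts the paragraphs and reads each link's id and href with ordinary tag["attr"] access.

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, "html.parser")

paragraphs = soup.select("p")
print(len(paragraphs))  # 3

# Read attributes from every <a> matched by tag name
for a in soup.select("a"):
    print(a["id"], a["href"])
```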

(2) Finding by class name

 
print(soup.select('.sister'))
#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
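The leading dot is what separates a class from a tag name. In the sample document the word "title" is both a tag (`<title>`) and a class (`<p class="title">`), so the two selectors below hit different elements. A sketch under the same assumptions as before (sample document repeated, "html.parser" backend):

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, "html.parser")

# "title" is a tag name: matches <title> inside <head>
print([t.name for t in soup.select("title")])   # ['title']
# ".title" is a class: matches <p class="title">
print([t.name for t in soup.select(".title")])  # ['p']
# Two <p> elements carry class="story"
print(len(soup.select(".story")))               # 2
```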

(3) Finding by id

 
print(soup.select('#link1'))
#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
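Since an id is normally unique within a page, select_one() is often the more convenient call for id lookups, although select() still works and simply returns a one-element list. A sketch with the same sample document ("html.parser" assumed; `#link4` is a deliberately missing id):

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns a one-element list for an id
print(soup.select("#link2")[0].get_text())  # Lacie
# select_one() returns the tag itself
print(soup.select_one("#link3")["href"])    # http://example.com/tillie
# An id that does not exist gives an empty list
print(soup.select("#link4"))                # []
```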

(4) Combined lookups

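A combined lookup follows the same rules as combining tag names, class names and id names in a CSS file. For example, to find the element with id link1 inside a p tag, separate the two parts with a space: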

 
print(soup.select('p #link1'))
#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
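The space is significant: "p #link1" means "an element with id link1 somewhere inside a `<p>`", while "p#link1" (no space) means "a `<p>` whose own id is link1". A sketch contrasting the two against the sample document (same assumptions as above):

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, "html.parser")

print(len(soup.select("p #link1")))        # 1: the <a> inside the story paragraph
print(len(soup.select("p#link1")))         # 0: no <p> has id="link1"
print(len(soup.select("a#link1")))         # 1: tag and id on the same <a>
print(len(soup.select(".story .sister")))  # 3: class inside class also works
```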

Direct child lookup

 
print(soup.select("head > title"))
#[<title>The Dormouse's story</title>]
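The ">" combinator restricts the match to direct children, whereas a space matches descendants at any depth. In the sample document the links sit inside a `<p>`, which sits inside `<body>`, so the results differ. A sketch under the same assumptions ("html.parser", sample document repeated):

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, "html.parser")

print(len(soup.select("body a")))    # 3: descendants at any depth
print(len(soup.select("body > a")))  # 0: the links are grandchildren of <body>
print(len(soup.select("p > a")))     # 3: each link is a direct child of a <p>
```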

(5) Attribute lookups

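Attributes can also be added to a selector, enclosed in square brackets. Note that the attribute and its tag belong to the same node, so there must be no space between them; otherwise nothing will match.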

 
print(soup.select('a[href="http://example.com/elsie"]'))
#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
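Beyond exact matches, the standard CSS attribute forms `[attr]` (attribute present), `^=` (starts with) and `$=` (ends with) are accepted by soupsieve, the selector engine behind select() in bs4 4.7+. This sketch (sample document repeated, "html.parser" assumed) also shows the note above in action: a space before the bracket turns the attribute into a separate descendant step, so nothing matches.

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, "html.parser")

print(len(soup.select("a[href]")))                         # 3: attribute present
print(len(soup.select('a[href^="http://example.com/"]')))  # 3: prefix match
print(soup.select('a[href$="tillie"]')[0]["id"])           # link3: suffix match
print(soup.select('p[name="dromouse"]')[0]["class"])       # ['title']
# With a space, [href=...] must be a descendant of <a>, and there is none
print(soup.select('a [href="http://example.com/elsie"]'))  # []
```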

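Likewise, attributes can be combined with the lookup methods above: parts that belong to different nodes are separated by a space, and parts on the same node are written without one.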

 
print(soup.select('p a[href="http://example.com/elsie"]'))
#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
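The same spacing rule applies when attributes are mixed with classes and ids: parts on the same element are written back to back, and a space separates an ancestor from a descendant. A sketch against the sample document; the two selectors are our own illustrations, and "html.parser" is assumed as before.

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Ancestor <p class="story">, then a descendant <a> whose href ends in "lacie"
print(soup.select('p.story a[href$="lacie"]')[0].get_text())  # Lacie
# Class and attribute on the same <a>: no space between them
print(soup.select('a.sister[id="link3"]')[0].get_text())      # Tillie
```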


Original article: http://www.cnblogs.com/yizhenfeng168/p/6979339.html
