A simple page-element extraction test using Python's urllib and BeautifulSoup libraries:

    import urllib.request

    from bs4 import BeautifulSoup

    # Fetch the page and decode the response body as UTF-8.
    # urlopen raises on failure, so no truthiness check on the response is needed.
    with urllib.request.urlopen('http://www.mmjpg.com/') as response:
        html_content = response.read().decode('utf-8')

    soup = BeautifulSoup(html_content, 'html.parser')
    print(soup.prettify())

    # Collect every link inside the <div class="subnav"> element.
    subnav = soup.find('div', attrs={"class": "subnav"})
    tag_elements = subnav.find_all('a') if subnav else []
    hrefs = [e.get('href') for e in tag_elements]
    tags = [e.string for e in tag_elements]
    print(hrefs)
    print(tags)
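The test prints the link targets and the link texts as two separate lists. As a possible next step, they could be paired into one mapping, with relative links resolved against the page URL. The sketch below uses only the standard library's `urllib.parse.urljoin`. The helper name `pair_links` and the sample values are hypothetical and not taken from the real page, and it assumes link texts are unique, since a repeated text would overwrite the earlier entry.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.mmjpg.com/'  # the page fetched in the test above


def pair_links(hrefs, tags, base_url=BASE_URL):
    """Map each link text to an absolute URL, skipping incomplete pairs."""
    links = {}
    for href, tag in zip(hrefs, tags):
        # e.get('href') and e.string can both return None, so skip those pairs.
        if href is None or tag is None:
            continue
        # urljoin leaves absolute URLs untouched and resolves relative ones.
        links[tag.strip()] = urljoin(base_url, href)
    return links


# Hypothetical sample values; the real lists come from the live page.
sample_hrefs = ['/tag/a', 'http://www.mmjpg.com/tag/b', None]
sample_tags = ['Tag A', ' Tag B ', 'Tag C']
print(pair_links(sample_hrefs, sample_tags))
```

With the real data, `pair_links(hrefs, tags)` would take the two lists built in the test directly.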