Web Scraping (Crawling) Basics and Practice 2

https://suminpark.tistory.com/35

Python Web Scraping (Crawling) Basics/beautifulsoup4


Printing the rank and rating as well

#old_content > table > tbody > tr:nth-child(2) > td:nth-child(1) > img

Everything up to #old_content > table > tbody > tr is already stored in the variable trs (one element per table row).
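The split works because a selector run on a row only searches inside that row. A minimal sketch, assuming a small hand-written table (hypothetical markup loosely shaped like the ranking page, not Naver's actual HTML): the full selector copied from the browser and the two-step version (rows into trs, then td:nth-child(1) > img inside one row) return the very same element.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the ranking table (not the real page markup)
html = """
<div id="old_content">
  <table>
    <tbody>
      <tr><td colspan="3" class="line"></td></tr>
      <tr>
        <td class="ac"><img src="r1.gif" alt="01"></td>
        <td class="title"><div class="tit5"><a href="#">Movie A</a></div></td>
        <td class="point">9.39</td>
      </tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Full selector, as copied from the browser's developer tools
full = soup.select_one('#old_content > table > tbody > tr:nth-child(2) > td:nth-child(1) > img')

# Same element in two steps: select the rows first, then search inside one row
trs = soup.select('#old_content > table > tbody > tr')
relative = trs[1].select_one('td:nth-child(1) > img')

print(full is relative)  # True: both refer to the same Tag in the parse tree
print(relative['alt'])
```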

Inside the loop, create a variable that holds td:nth-child(1) > img and read only the value of its alt attribute.

For the rating, do the same thing, but take the text value instead of an attribute.
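A short sketch of the difference between reading an attribute and reading text, on one hypothetical row (the markup is an assumption for illustration): tag['alt'] returns an attribute value, .text returns the text inside a tag, and .get() is an alternative that returns None instead of raising KeyError when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical single row shaped like one row of the ranking table
html = """
<table><tr>
  <td class="ac"><img src="r1.gif" alt="01"></td>
  <td class="title"><div class="tit5"><a href="#">Movie A</a></div></td>
  <td class="point">9.39</td>
</tr></table>
"""

tr = BeautifulSoup(html, 'html.parser').select_one('tr')

img = tr.select_one('td:nth-child(1) > img')
rank = img['alt']                                # attribute value: '01'
point = tr.select_one('td.point').text           # text inside the tag: '9.39'
name = tr.select_one('td.title > div > a').text  # link text: 'Movie A'

# An <img> has no text content, so the rank is only available through alt
print(repr(img.text))    # ''
# img['title'] would raise KeyError; .get() returns None for a missing attribute
print(img.get('title'))  # None
print(rank, name, point)
```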

# What could be improved in this code is described below
import requests
from bs4 import BeautifulSoup

# A browser-like User-Agent so the site returns the normal page
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://movie.naver.com/movie/sdb/rank/rmovie.naver?sel=pnt&date=20210829', headers=headers)

soup = BeautifulSoup(data.text, 'html.parser')

# All rows of the ranking table
trs = soup.select('#old_content > table > tbody > tr')
for tr in trs:
    a = tr.select_one('td.title > div > a')        # title link
    rank = tr.select_one('td:nth-child(1) > img')  # rank image; the rank is in its alt
    point = tr.select_one('td.point')              # rating cell
    if a is not None:  # skip divider rows that have no title link
        print(rank['alt'], a.text, point.text)

What could be improved

# Original code
for tr in trs:
    a = tr.select_one('td.title > div > a')
    rank = tr.select_one('td:nth-child(1) > img')
    point = tr.select_one('td.point')
    if a is not None:
        print(rank['alt'], a.text, point.text)

# Improved version
for tr in trs:
    a = tr.select_one('td.title > div > a')
    if a is not None:
        # Extract values only after confirming this row is a movie row
        name = a.text
        rank = tr.select_one('td:nth-child(1) > img')['alt']
        point = tr.select_one('td.point').text
        print(rank, name, point)

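In the original code I wanted to write the print call like the one in the improved version, but that raised an error, so I did the ['alt'] and .text lookups inside print instead.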

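Why the error occurred: rank and point were assigned above the if statement, so the rows where select_one() returns None were not filtered out first. On those rows, None['alt'] raises a TypeError and None.text raises an AttributeError.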

And even if I had put those assignments inside the if block, I probably would not have thought of storing a.text in a separate variable.
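The error explained above can be reproduced offline. This sketch assumes a hypothetical table with one divider row (no title link and no image) before a movie row: on the divider row select_one() returns None, so indexing it with ['alt'] raises TypeError, while checking a is not None first skips that row.

```python
from bs4 import BeautifulSoup

# Hypothetical table: a divider row (no link, no image) followed by a movie row
html = """
<table>
  <tr><td colspan="3" class="line"></td></tr>
  <tr>
    <td class="ac"><img src="r1.gif" alt="01"></td>
    <td class="title"><div class="tit5"><a href="#">Movie A</a></div></td>
    <td class="point">9.39</td>
  </tr>
</table>
"""

trs = BeautifulSoup(html, 'html.parser').select('tr')

# Extracting before the None check fails on the divider row
errors = []
for tr in trs:
    try:
        rank = tr.select_one('td:nth-child(1) > img')['alt']
    except TypeError as e:
        # select_one() returned None, and None cannot be indexed with ['alt']
        errors.append(str(e))

print(errors)

# Checking for None first skips the divider row safely
rows = []
for tr in trs:
    a = tr.select_one('td.title > div > a')
    if a is not None:
        rank = tr.select_one('td:nth-child(1) > img')['alt']
        point = tr.select_one('td.point').text
        rows.append((rank, a.text, point))

print(rows)  # [('01', 'Movie A', '9.39')]
```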

Full code

import requests
from bs4 import BeautifulSoup

# A browser-like User-Agent so the site returns the normal page
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://movie.naver.com/movie/sdb/rank/rmovie.naver?sel=pnt&date=20210829', headers=headers)

soup = BeautifulSoup(data.text, 'html.parser')

# All rows of the ranking table
trs = soup.select('#old_content > table > tbody > tr')
for tr in trs:
    a = tr.select_one('td.title > div > a')
    if a is not None:  # divider rows have no title link, so skip them
        name = a.text                                         # movie title
        rank = tr.select_one('td:nth-child(1) > img')['alt']  # rank from the image's alt
        point = tr.select_one('td.point').text                # rating
        print(rank, name, point)
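One possible refinement, offered as a sketch rather than the course's method: move the parsing into a function so it can be tried on saved or sample HTML without sending a request every time. parse_ranking and the sample markup are hypothetical, and the .strip() calls are an extra precaution in case the cell text has surrounding whitespace.

```python
from bs4 import BeautifulSoup


def parse_ranking(html):
    """Return (rank, title, rating) tuples from ranking-table HTML.

    parse_ranking is a hypothetical helper; the selectors are the ones used above.
    """
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for tr in soup.select('#old_content > table > tbody > tr'):
        a = tr.select_one('td.title > div > a')
        if a is None:
            continue  # divider rows have no title link
        rank = tr.select_one('td:nth-child(1) > img')['alt']
        point = tr.select_one('td.point').text
        results.append((rank, a.text.strip(), point.strip()))
    return results


# Hypothetical sample markup for trying the function without a network request
sample = """
<div id="old_content"><table><tbody>
  <tr><td colspan="3" class="line"></td></tr>
  <tr>
    <td class="ac"><img src="r1.gif" alt="01"></td>
    <td class="title"><div class="tit5"><a href="#">Movie A</a></div></td>
    <td class="point">9.39</td>
  </tr>
  <tr>
    <td class="ac"><img src="r2.gif" alt="02"></td>
    <td class="title"><div class="tit5"><a href="#">Movie B</a></div></td>
    <td class="point">9.38</td>
  </tr>
</tbody></table></div>
"""

for rank, name, point in parse_ranking(sample):
    print(rank, name, point)
```

With the real page, parse_ranking(data.text) could be called right after the requests.get(...) line and its tuples printed the same way.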

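Other posts in the '[스파르타 코딩클럽] > 비개발자를 위한, 웹개발 종합반' (Sparta Coding Club > Comprehensive Web Development Course for Non-Developers) category

Saving web scraping results to MongoDB with Python	2023.02.26
mongoDB basics/pymongo code for DB operations	2023.02.23
Python Web Scraping (Crawling) Basics/beautifulsoup4	2023.02.23
Python/terminal setup/basic syntax/how to install venv/Requests library	2023.02.23
Fetch practice 2	2023.02.20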

SuminStory
