Webscraping using BeautifulSoup


1 Beautiful Soup Install

2 Beautiful Soup Commands

2.1 Methods

soup.prettify()
soup.find(string, x)
soup.title
soup.title.text

soup.find, soup.title and soup.title.text all return the first instance of a tag, e.g. the first title. To reach other instances of a tag, you can find them based on one of their attributes:

Examples:

#+BEGIN_SRC python
soup.find('div')                    # finds the first division (section)
soup.find('div', class_="article")  # finds the first division (section) whose class is "article"
#+END_SRC

class_: the underscore is needed to distinguish it from the Python keyword 'class'.
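The commands above can be tried end to end on a small page. The sketch below is only an illustration: the HTML string and its contents are invented, and it assumes the bs4 package is installed. It also uses find_all, which is not covered in these notes, as one way to get every matching instance rather than just the first.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, invented for illustration.
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <div class="sidebar">Links</div>
    <div class="article">First article</div>
    <div class="article">Second article</div>
  </body>
</html>
"""

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

print(soup.prettify())        # the whole document, re-indented

title_tag = soup.title        # the first <title> tag, as a Tag object
title_text = soup.title.text  # only the text inside that tag

first_div = soup.find("div")                            # first <div> on the page
first_article = soup.find("div", class_="article")      # first <div> with class "article"
all_articles = soup.find_all("div", class_="article")   # every matching <div>, as a list

print(title_text)
print(first_div.text)
print(first_article.text)
print([d.text for d in all_articles])
```

Note that soup.find("div") returns the sidebar here, because it is the first div in the page; adding class_="article" is what skips past it.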

3 General HTML tools used with BeautifulSoup

3.1 Google Chrome

  1. On any webpage, right-click on any element and select Inspect.
  2. On any webpage, use the menu: View > Developer > Inspect Elements.

Both routes open the same tool: a split window with the right-side (rs) panel showing the HTML source. To view a specific element, click the top-left icon (arrow pointer) on the rs panel to "inspect an element", then click the element of interest on the left-side (ls) panel to see its HTML source code.
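The tag name and attributes shown in the inspector are what you would typically pass to soup.find. The following is a minimal sketch, assuming the inspector showed a link with class "more" inside a div with id "content"; that markup is hypothetical and stands in for a real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, standing in for what the inspector panel might show.
html = '<div id="content"><a class="more" href="/page/2">Next</a></div>'
soup = BeautifulSoup(html, "html.parser")

link = soup.find("a", class_="more")  # tag name and class as read off the inspector
container = soup.find(id="content")   # an id can be used as the search key as well

href = link["href"]                   # attribute values are read like dictionary keys
print(href)
print(link.text)
print(container.name)
```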
