GEOG 489
Advanced Python Programming for GIS

2.9.2 Lesson 2 Practice Exercise 2 Solution

import requests
from bs4 import BeautifulSoup

url = 'https://www.e-education.psu.edu/geog489/node/2269'

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

divElement = soup.select('article#node-book-2269 > div > div')[0] 

wordLengths = [len(word) for word in divElement.text.split()]
print(wordLengths)
 

After loading the HTML page and creating the BeautifulSoup structure for it, as in the examples you already saw in this lesson, the select(…) method is used in line 9 to get the <div> elements that are direct children of a <div> element that is itself a direct child of the <article> element with the id node-book-2269. The select(…) method always returns a list of matches. Since we know there will be only one such element, we can use the index [0] to get that element from the list and store it in the variable divElement.
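
To make the role of the ">" in the selector more concrete, here is a small sketch that runs the same selector against a made-up HTML snippet instead of the real course page. The snippet and the variable names matches and allDivs are assumptions for illustration only; the real page contains much more markup.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page standing in for the real course page.
html = """
<article id="node-book-2269">
  <div>
    <div>Hello web scraping world</div>
  </div>
</article>
"""

soup = BeautifulSoup(html, 'html.parser')

# '>' only matches direct children: article -> div -> div
matches = soup.select('article#node-book-2269 > div > div')
divElement = matches[0]

# Without '>', a space matches <div> elements at any depth below the article,
# so both the outer and the inner <div> are returned.
allDivs = soup.select('article#node-book-2269 div')

print(len(matches), len(allDivs))
print(divElement.text)
```

If the selector matched nothing, select(…) would return an empty list and the index [0] would raise an IndexError, so checking the length of the result first can be a sensible precaution when the page structure is not known for certain.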

With divElement.text.split() we create a list of all the words in the text by splitting it at whitespace. We then use this list inside the list comprehension in line 11, which converts the word list into a list of word lengths by applying the len(…) function to each word.
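
The same two steps can be tried on an ordinary string without any web request. The sample text below is made up for illustration; the intermediate variable words is not part of the solution and is only there to show the result of split().

```python
# Made-up sample text with irregular whitespace (several spaces, a newline).
text = "Advanced Python   programming\nfor GIS"

# split() without arguments splits at any run of whitespace.
words = text.split()
print(words)        # ['Advanced', 'Python', 'programming', 'for', 'GIS']

# Apply len() to each word to get the list of word lengths.
wordLengths = [len(word) for word in words]
print(wordLengths)  # [8, 6, 11, 3, 3]
```

Note that split() only separates at whitespace, so punctuation attached to a word (for example a trailing period) is counted as part of that word's length.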