
I am trying to extract the scores from the Corporate Governance section of this page: https://ca.finance.yahoo.com/quote/AMP/profile?p=AMP

I am new to Python and do not know why I am getting an error when web scraping.

import requests
from bs4 import BeautifulSoup

# URL of the webpage to be scraped
url = "https://ca.finance.yahoo.com/quote/AMP/profile?p=AMP"

# Send a GET request to the URL and store the response in a variable
response = requests.get(url)

# Parse the HTML content of the response using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')

# Find the section containing the governance scores
governance_section = soup.find('section', {'class': 'Mt(30px) corporate-governance-container'})

# Find the required elements using their text content and extract their values
scores = governance_section.find('span', text='The pillar scores are')
score_text = scores[0].text.strip() if scores else ''
audit_score = score_text[score_text.find('Audit: ') + 6 : score_text.find(';', score_text.find('Audit: '))].strip()
shareholder_score = score_text[score_text.find('Shareholder Rights: ') + 20 : score_text.find(';', score_text.find('Shareholder Rights: '))].strip()
compensation_score = score_text[score_text.find('Compensation: ') + 14 :].strip()

# Print the extracted information
print("Corporate Governance Score:", governance_section.find('span', {'class': 'Va(m) Fw(600) D(ib) Lh(23px)'}).text.strip())
print("Audit and Risk Oversight Score:", audit_score)
print("Shareholders' Rights Score:", shareholder_score)
print("Compensation Score:", compensation_score)
print("Board Structure Score:", board_score)

This is the error:

AttributeError                            Traceback (most recent call last)
<ipython-input-15-140e65f0615d> in <cell line: 17>()
     15 
     16 # Find the required elements using their text content and extract their values
---> 17 scores = governance_section.find('span', text='The pillar scores are')
     18 score_text = scores[0].text.strip() if scores else ''
     19 audit_score = score_text[score_text.find('Audit: ') + 6 : score_text.find(';', score_text.find('Audit: '))].strip()

AttributeError: 'NoneType' object has no attribute 'find'

And here is the relevant part of the HTML source:

Minneapolis, Minnesota.</p></section><section class="Mt(30px) corporate-governance-container"><h2 class="Fz(m) Lh(1) Fw(b) Mt(0) Mb(18px)"><span>Corporate Governance</span></h2><div><p class="Fz(s)"><span>Ameriprise Financial, Inc.’s ISS Governance QualityScore as of <span>April 1, 2023</span> is 5.</span>  <span>The pillar scores are Audit: 3; Board: 7; Shareholder Rights: 5; Compensation: 6.</span></p><div class="Mt(20px)"><span>Corporate governance scores courtesy of</span> <a href="https://issgovernance.com/quickscore" target="_blank" rel="noopener noreferrer" title="Institutional Shareholder Services (ISS)">Institutional Shareholder Services (ISS)</a>.  <span>Scores indicate decile rank relative to index or region. A decile score of 1 indicates lower governance risk, while a 10 indicates higher governance risk.</span></div></div></section></section></div></div><script>if (window.performance) {window.performance.mark && window.performance.mark('Col1-0-Profile');window.performance.measure && window.performance.measure('Col1-0-ProfileDone','PageStart','Col1-0-Profile');}</script></div><div>

Could you please help me out?

2 Answers

0

The error occurs because governance_section is None: soup.find() did not find the section, so calling .find on the result raises the AttributeError. The section is missing because the server did not return the page you expected. Your request actually gets a <Response [404]>, so the HTML does not contain the section you are trying to scrape.

To fix this, send a User-Agent header with the request:

requests.get(url, headers={'user-agent': 'custom'})

With this header the server returns a <Response [200]>, so you can parse the HTML and extract the information you need.
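To make a failed request easier to spot than a NoneType error, you could check the response before parsing it. This is only a minimal sketch: `governance_html` is a hypothetical helper name of my own, and the class name it looks for is taken from the HTML posted in the question.

```python
def governance_html(response):
    """Return the page HTML, or raise a readable error if the request failed.

    `response` is any object with `status_code` and `text` attributes,
    such as the one returned by requests.get(). Hypothetical helper.
    """
    if response.status_code != 200:
        raise RuntimeError(f"Request failed with status {response.status_code}")
    # The class name below comes from the HTML shown in the question.
    if "corporate-governance-container" not in response.text:
        raise RuntimeError("Page loaded, but the governance section is missing")
    return response.text


# Usage (needs network access and the requests package):
# import requests
# url = "https://ca.finance.yahoo.com/quote/AMP/profile?p=AMP"
# html = governance_html(requests.get(url, headers={"user-agent": "custom"}))
```

If you only care about the status code, requests also has response.raise_for_status(), which raises an exception for any 4xx or 5xx response.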

0

Yes, you need to send a User-Agent header to connect to it.

I get a 200 status code with this:

import requests
from bs4 import BeautifulSoup

# URL of the webpage to be scraped
url = "https://ca.finance.yahoo.com/quote/AMP/profile/?p=AMP"

# Agent
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"

# Try to request the site
res = requests.get(url, headers={'user-agent': user_agent})
print(res)
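Once the request returns 200, the scores still have to be pulled out of the text. The question's find('span', text='The pillar scores are') will most likely find nothing, because a plain string is compared against the whole text of the span rather than its beginning. Here is a rough sketch that uses a regular expression instead. It assumes the sentence keeps the format shown in the question's HTML; the function names and SAMPLE are hypothetical and only there for illustration.

```python
import re

# Trimmed copy of the HTML posted in the question, used as offline test input.
SAMPLE = (
    '<p class="Fz(s)"><span>Ameriprise Financial, Inc.’s ISS Governance '
    'QualityScore as of <span>April 1, 2023</span> is 5.</span>  '
    '<span>The pillar scores are Audit: 3; Board: 7; '
    'Shareholder Rights: 5; Compensation: 6.</span></p>'
)


def parse_overall_score(html):
    """Return the overall QualityScore as an int."""
    match = re.search(r"QualityScore as of .*? is (\d+)", html)
    if match is None:
        raise ValueError("overall score not found in the HTML")
    return int(match.group(1))


def parse_pillar_scores(html):
    """Return the pillar scores as a dict mapping pillar name to int."""
    # Capture everything after "The pillar scores are" up to the next tag.
    sentence = re.search(r"The pillar scores are ([^<]*)", html)
    if sentence is None:
        raise ValueError("pillar-score sentence not found in the HTML")
    # Each entry looks like "Name: number"; entries are separated by ";".
    pairs = re.findall(r"([A-Za-z ]+):\s*(\d+)", sentence.group(1))
    return {name.strip(): int(value) for name, value in pairs}


print("Corporate Governance Score:", parse_overall_score(SAMPLE))
for pillar, score in parse_pillar_scores(SAMPLE).items():
    print(f"{pillar} Score:", score)
```

With a live response you would pass res.text instead of SAMPLE. A regex on raw HTML is fragile if Yahoo changes the wording, so treat this as a starting point.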
