beautifulsoup: iterating over bs4.BeautifulSoup returns Tag, but Mypy thinks that bs4.element.PageElement type is returned #8369

Open
AIGeneratedUsername opened this issue Jul 22, 2022 · 3 comments
Labels
stubs: false positive Type checkers report false errors

Comments

AIGeneratedUsername commented Jul 22, 2022

The code below runs correctly, but Mypy reports an error for it:

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
from bs4 import BeautifulSoup, SoupStrainer

soup = BeautifulSoup(html_doc, 'lxml', parse_only=SoupStrainer("a"))
reveal_type(soup)
for el in soup:
    print(type(el))
    reveal_type(el)
    print(el.get("href"))

Mypy output

_local.py:17:13: note: Revealed type is "bs4.BeautifulSoup"
_local.py:19:17: note: Revealed type is "bs4.element.PageElement"
_local.py:20:11: error: "PageElement" has no attribute "get"  [attr-defined]
        print(el.get("href"))
              ^
Found 1 error in 1 file (checked 1 source file)

Python output if you run the code

<class 'bs4.element.Tag'>
http://example.com/elsie
<class 'bs4.element.Tag'>
http://example.com/lacie
<class 'bs4.element.Tag'>
http://example.com/tillie

Expected Mypy output

soup = BeautifulSoup(...
for item in soup:
  reveal_type(item)  #  <-- type must be `Tag`

Versions

types-beautifulsoup4 = "4.11.4"
beautifulsoup4 = "4.11.1"
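For readers who want to see the mechanism in isolation, here is a minimal sketch that runs without bs4 installed. The classes `PageElement` and `Tag` below are hypothetical stand-ins that only borrow the names from `bs4.element`; the `__iter__` annotation is an assumption based on the revealed type above, not a quote of the actual stubs.

```python
from typing import Dict, Iterator, List, Optional


class PageElement:
    """Stand-in for bs4.element.PageElement; deliberately has no get()."""


class Tag(PageElement):
    """Stand-in for bs4.element.Tag."""

    def __init__(self, attrs: Optional[Dict[str, str]] = None) -> None:
        self.attrs: Dict[str, str] = attrs or {}
        self.contents: List[PageElement] = []

    def get(self, key: str) -> Optional[str]:
        return self.attrs.get(key)

    def __iter__(self) -> Iterator[PageElement]:
        # Assumption: the element type is declared as the base class,
        # which is what the revealed type in the report points to.
        return iter(self.contents)


soup = Tag()
soup.contents.append(Tag({"href": "http://example.com/elsie"}))

# At runtime every item is a Tag, so get() works...
runtime_types = [type(el).__name__ for el in soup]
# ...but a checker only knows the declared PageElement, hence the ignore.
hrefs = [el.get("href") for el in soup]  # type: ignore[attr-defined]

print(runtime_types, hrefs)
```

The declared element type is broader than what this particular document yields. In general a tag's children can also be text nodes, so a blanket `Tag` annotation would not be accurate for every document either.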

@AIGeneratedUsername AIGeneratedUsername changed the title beautifulsoup: iterating over bs4.BeautifulSoup must return Tag, but "bs4.element.PageElement" type is returned Jul 22, 2022

AIGeneratedUsername commented Jul 22, 2022

I could not tell whether this was meant to be fixed in #8356 (comment), so apologies if this issue is a duplicate. Those pull requests have been merged, so this probably needs a separate fix.

JelleZijlstra (Member) commented

Yes, this seems like a separate bug. Thanks for the precise report!

@srittau srittau added the stubs: false positive Type checkers report false errors label Jul 22, 2022
@AIGeneratedUsername AIGeneratedUsername changed the title beautifulsoup: iterating over bs4.BeautifulSoup must return Tag, but bs4.element.PageElement type is returned Jul 23, 2022
TheQuinbox commented

Have there been any updates on this? I'm getting bitten by it now, too.
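Until the stubs change, one possible workaround is to narrow each item with `isinstance` before using `Tag`-only methods. The helper below is a sketch of that idea; `only_instances` is a hypothetical name, not part of bs4 or its stubs, and the demo uses built-in types so it runs without bs4 installed.

```python
from typing import Iterable, Iterator, Type, TypeVar

T = TypeVar("T")


def only_instances(items: Iterable[object], kind: Type[T]) -> Iterator[T]:
    """Yield only the items that are instances of `kind`, typed as `kind`."""
    for item in items:
        # isinstance() narrows `item` from object to T for the type checker.
        if isinstance(item, kind):
            yield item


# Demo with plain built-ins: keep the strings, drop everything else.
mixed = ["a", 1, "b", 2.5]
strings = list(only_instances(mixed, str))
print(strings)
```

Applied to the report, `for el in only_instances(soup, Tag): print(el.get("href"))` (with `Tag` imported from `bs4.element`) should let Mypy see `el` as `Tag`. A plain `if isinstance(el, Tag):` inside the original loop narrows the type in the same way.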
