I am trying to scrape the text of every tag with a particular class on a web page that updates dynamically. I am using Selenium with the Chrome WebDriver in Python.

In a normal browser, if I right click on the elements I want and go to 'Developer Tools>Inspect', I can see the tags I want, for example, as:

<span class="sCell valX poolX">2112</span>

The number 2112 is what I want. These spans are nested within dozens of other outer tags. Note that if I choose 'Page Source' instead of 'Inspect' in the browser, it shows:

<span class="sCell valX poolX" <% if(poolState !== "Y"){%> style="display: none"<%}%>><%=xPool%></span>

The problem is that I get an empty list when I use XPath to find this information.

Here is the relevant code in the simplest iteration of what I have tried:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.headless = True
driver = webdriver.Chrome(
    options=options, 
    executable_path=chrome_path
)

x_path = '//span[@class="sCell valX poolX"]'

wait = WebDriverWait(driver, 20)
driver.get(url)

wp = driver.find_elements(By.XPATH, x_path)

for n in wp: 
     print(wp.text)

I receive the error: AttributeError: 'list' object has no attribute 'text'

Please note that when I use:

from selenium.webdriver.support import expected_conditions as EC

wait.until(EC.visibility_of_element_located((By.XPATH, x_path)))

I get a TimeoutException.


I can't help but assume I'm missing something very simple here. I don't have much experience with this, but it seems like a straightforward scrape.

Note that if I print driver.page_source, I get the same tags as 'Developer Tools>Page Source':

<span class="sCell valX poolX" <% if(poolState !== "Y"){%> style="display: none"<%}%>><%=xPool%></span>

1 Answer

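First, if you are using "WebDriverWait" to wait for the page to load, you should do that after "driver.get". Second, your for loop is wrong: it reads "wp.text" from the list instead of "n.text" from the current element, which is what raises the AttributeError. Change "wp.text" to "n.text".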

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.headless = True
driver = webdriver.Chrome(
    options=options, 
    executable_path=chrome_path
)

x_path = '//span[@class="sCell valX poolX"]'

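driver.get(url)
# Create the wait only after the page has been requested.
wait = WebDriverWait(driver, 200)

wp = driver.find_elements(By.XPATH, x_path)

# wp is a list of elements; read .text from each element, not from the list.
for n in wp:
    print(n.text)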
  • Thanks for the help! A little embarrassing to have two sloppy mistakes forever immortalized on Stack Overflow, but that will motivate me to be more careful in the future :). It's now running - and is even finding the correct number of tags - but unfortunately the print(n.text) statement is producing a blank for each tag instead of the text that is visible on the site and the 'inspect' code. Also the 'wait.until' still times out for the same xpath. Any thoughts are appreciated.
    – zicari
    Commented Jul 8 at 16:23
  • In addition to the error corrections found by @Toka47, I finally figured out that I needed the "inner text" of the elements in question: for n in wp: print(n.get_attribute('innerText'))
    – zicari
    Commented Jul 8 at 19:20
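
One possible explanation for both the blank output and the timeout, offered as an assumption rather than something confirmed for this page: Selenium's "n.text" returns only text that is rendered as visible, and "visibility_of_element_located" waits for a visible element, so spans hidden by the "display: none" style seen in the page source would yield empty strings and a TimeoutException. The sketch below reads the text from the DOM instead and waits only for presence. The helper names "element_text" and "scrape_values" are hypothetical, and "textContent" is used as a fallback in place of the 'innerText' attribute from the comment above.

```python
def element_text(element):
    """Return the rendered text, falling back to the DOM text when it is empty."""
    rendered = element.text  # empty string when the element is not displayed
    if rendered:
        return rendered
    # textContent is read from the DOM regardless of CSS visibility.
    return (element.get_attribute("textContent") or "").strip()


def scrape_values(driver, url, x_path, timeout=20):
    """Load url, wait until the spans exist in the DOM, and return their text."""
    # Imported here so element_text can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    # presence_* only requires the elements to be in the DOM, not visible.
    elements = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.XPATH, x_path))
    )
    return [element_text(el) for el in elements]
```

With the driver, url, and x_path from the question, "print(scrape_values(driver, url, x_path))" would print one string per matching span. Whether the values are populated at that moment still depends on when the page's own scripts fill them in.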
