
I am currently trying to scrape a local web page generated by my EV charger. I access it through its IP address, which requires me to sign in. After signing in, I want to retrieve the data from the JS chart below. The data is shown in chunks (not even one complete day is visible), but it goes back a long way (1 year+). I want to use this data to compare my EV charging sessions with the available power in my house.

[screenshot: the power chart on the charger's dashboard]

However, so far I have struggled to extract the data shown in the chart and then make it iteratively go back in time by clicking the arrow below it.

driver_path = r"C:\\chromedriver-win64\chromedriver.exe"

ACCOUNT = "[email protected]"
PASSWORD = "pw"

driver = webdriver.Chrome(driver_path)
driver.maximize_window()
wait = WebDriverWait(driver, 10)
driver.get("http://192.168.1.245/#!/login")

    #login to my charger
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@type='text']"))).send_keys("[email protected]")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@type='password']"))).send_keys("pw")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@type='submit']"))).click()
#Using this code, I can extract the y-axis and x-axis, with the visible parameters. 
el = driver.find_element(By.XPATH, "//div[@id='powerManagementDashboard']//*[local-name()='svg']").text
print(el)

08:00
09:00
10:00
11:00
12:00
13:00
-10
0
10
20
30
40
50
kW
# go back in time by clicking the left arrow
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//i[@ng-click='ChartAddHour(-6)']"))).click()

But I can't work out how to scrape the most important section of the HTML: it contains an abundance of <g> tags, of which I need only one. The image below shows the data. I want to retrieve the data points that contain the time and the measured kW at that point. But how do I get to that specific <g>? Or should I scrape all of those <g> tags and clean the data later?

[screenshot: the chart's <g> elements in the browser's HTML inspector]
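One way to narrow this down is sketched below, under the assumption that the data-point <g> nodes carry an aria-label attribute holding the time and kW value (EDIT 1 below confirms they do on this page): filter on that attribute instead of walking every <g>. The local-name() test is needed because SVG elements live in their own XML namespace, so a plain //g would not match them.

# a minimal sketch, assuming the data-point <g> nodes expose an aria-label
points = driver.find_elements(
    By.XPATH,
    "//div[@id='powerManagementDashboard']//*[local-name()='svg']"
    "//*[local-name()='g'][@aria-label]",
)
for p in points:
    print(p.get_attribute("aria-label"))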

Wondering if anyone can help me out. Thanks in advance.

EDIT 1:

I've managed to access all the data inside the chart. However, when I iterate back in time, it raises StaleElementReferenceException: stale element not found in current frame.

This occurs right away on the second iteration of this code:

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//i[@ng-click='ChartAddHour(-6)']"))).click()

# find_elements() returns an empty list when nothing matches, so it never
# raises here and no try/except is needed
chartdata = driver.find_elements(By.XPATH, "//div[@id='powerManagementDashboard']//*[local-name()='svg']/*[name()='g']/*[name()='g']/*[name()='g']/*[name()='g']")

chdata = []
for i in chartdata:
    store = {
        # this call raises StaleElementReferenceException when the chart
        # redraws between find_elements() and get_attribute()
        'kW data': i.get_attribute("aria-label")
    }
    chdata.append(store)

print(chdata)

which gives me:

[{'kW data': 'Building power 20:40:00 3.711'}, {'kW data': 'Building power 20:45:00 5.235'}, {'kW data': 'Building power 20:50:00 5.241'}, {'kW data': 'Building power 20:55:00 5.346'}, {'kW data': 'Building power 21:00:00 5.375'}, {'kW data': 'Building power 21:05:00 5.28'}, 

etcetera.

Not sure if this is the most efficient way to do it, but now I need to loop back tens to hundreds of times (clicking the left arrow) to collect all the data.

Any comments on the method so far?
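For what it's worth, a minimal sketch of what that paging loop could look like, folding in the staleness fix discussed in the comments below: re-run find_elements() after every click (the redraw invalidates old references) and sleep briefly so the redraw can finish. The iteration count and the split(" ", 2) parsing of the aria-label are assumptions based on the output above.

import time

all_data = []
for _ in range(365 * 4):  # assumed: four 6-hour steps per day, one year back
    # step the chart 6 hours back
    WebDriverWait(driver, 20).until(
        EC.element_to_be_clickable((By.XPATH, "//i[@ng-click='ChartAddHour(-6)']"))
    ).click()
    time.sleep(1)  # let the chart redraw before querying again

    # re-run find_elements() after every click - the redraw invalidates the
    # previous references (the cause of StaleElementReferenceException)
    points = driver.find_elements(
        By.XPATH,
        "//div[@id='powerManagementDashboard']//*[local-name()='svg']"
        "//*[local-name()='g'][@aria-label]",
    )
    for p in points:
        label = p.get_attribute("aria-label")  # e.g. 'Building power 20:40:00 3.711'
        if label:
            all_data.append(label.split(" ", 2)[-1])  # keep '20:40:00 3.711'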

  • if you click something (e.g. the arrow to see older or newer data), then you have to run find_elements() again: Selenium gives references to objects in the browser's memory, and after a click those objects are replaced, so the old references can no longer find them - you have to call find_elements() again to get fresh references.
    – furas
    Commented Jul 5 at 12:13
  • I don't know why you use a dictionary to keep the data - it would be simpler to append directly: .append(i.get_attribute("aria-label")), or even i.get_attribute("aria-label").replace("Building power ", "") or i.get_attribute("aria-label").split(" ", 2)[-1].
    – furas
    Commented Jul 5 at 12:16
  • BTW: sometimes Selenium works faster than the JavaScript in the browser, and sometimes you have to sleep() before you call find_elements() again, because you may still get old references before JavaScript has replaced the objects in memory. Or you can catch the error and call find_elements() again to get correct data - but time.sleep(1) is simpler.
    – furas
    Commented Jul 5 at 12:20
  • maybe the page uses JavaScript to fetch the data from some URL, and using DevTools in Chrome/Firefox (tab: Network, filter: XHR) you could find this URL and use requests to get the data without clicking buttons on the page. It could run faster, and it may send the data as JSON, which would be simpler to parse (see the sketch after these comments).
    – furas
    Commented Jul 5 at 12:25
  • @furas thanks! Especially the last comment - I will look into that. I have managed to scrape data from the last 365 days using the code above (combined with sleep() when it was too quick). I will now look into your last suggestion; it would be great if I could skip all the extra hurdles.
    – DvdV
    Commented Jul 15 at 12:44
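And a sketch of the requests idea from the fourth comment above. Everything here is a placeholder: the real login and data endpoints, their parameter names, and the response shape must be read from the DevTools Network tab (filter: XHR) - nothing below is a documented API of the charger.

import requests

session = requests.Session()

# hypothetical endpoints - replace with the URLs DevTools actually shows
session.post(
    "http://192.168.1.245/api/login",  # placeholder path
    json={"email": ACCOUNT, "password": PASSWORD},  # constants defined above
)
resp = session.get(
    "http://192.168.1.245/api/chartdata",  # placeholder path
    params={"hours": 6},  # placeholder query parameters
)
data = resp.json()  # if the charger returns JSON, no SVG scraping is needed
print(data)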
