
I'm trying to automate the download from this link: https://worldwide.espacenet.com/patent/search?q=cpc%20%3D%20"Y02B30%2F12"

The file is under the menu entry Filter -> Download -> Filters.

[Screenshot: Download button]

The HTML doesn't contain any link, and I also can't find the download link in the Network tab of Chrome DevTools. (I've tried to replicate the requests in Postman, but it doesn't download anything.)

I also have some other restrictions on how I can download this file: I can't use Selenium or any other library/framework that relies on webdrivers (a limitation of the virtual machine), and I can't download any binaries, also because of VM limitations.

  • Just use a headless browser, like Playwright or the older Selenium? Don't write code to do something the browser can already do, just write the code to make the browser do what you would do. Commented Jun 5 at 8:34
  • Try to download more than the default 20 results; the Network tab will then show a request to the /3.2/rest-services/search endpoint that contains all the data you need. The website also caches the data, so repeatedly downloading the same data will not create a new request. – Jeyekomon, commented Jun 5 at 10:20
  • @Mike'Pomax'Kamermans As I said in the post, I'm not able to use webdrivers, and headless mode still uses them. I've tried Playwright, since it says it just uses API calls, but you actually have to download the webdriver, which I can't do on the virtual machine. Commented Jun 6 at 8:52
  • Of course you can, you first install Playwright, and then you tell it to install the browser(s) you want to use. E.g. after you've installed Playwright in your VM, you simply also run playwright install firefox or whatever browser you want to use. It runs that without needing any sort of superuser rights, because it's not a system application install, and it's not a webdriver like Selenium that needs to connect to your own Firefox etc, it's literally just a separate standalone browser that Playwright can use without permissions nonsense. Commented Jun 6 at 8:59
  • @Mike'Pomax'Kamermans The virtual machine is a Databricks web instance, and we're blocked from downloading binaries. Commented Jun 6 at 14:43
