Parse Links from Sites with Pagination embedded by Google's Custom Search script

gromex · March 26, 2018, 10:34am

Continuing the discussion from:

XpathOnUrl or CsQueryOnUrl for Infinite Scroll pages +Work-Around:

The answer to that post worked: it was a custom connector, and it was really great of the SeoTools team to build it at no additional charge. Maybe someone can help with a similar situation?

This time it's "infinite pagination", if that makes sense.

Here's the source URL I'm testing:
https://pastebin.com/search?q=excel

Any idea how to run a query on a page with infinite pagination?

The reason I say "infinite" pagination is that no matter which page you click on the front end, the URL stays the same. So my old method of modifying the URL structure will not work for this one, at least not that I know of.

Looking further into it, I just realized the results are based on Google Custom Search. So one option might be to use the Google Search connector and restrict the results to pastebin specifically?
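
For the site-restricted option, here is a rough sketch in Python of what the paged queries could look like. It assumes Google's public results URL with its `q` and `start` parameters and the `site:` operator. The function name `site_search_urls` and the page size of 10 are my own choices for illustration. It only builds the URLs and does not fetch anything.

```python
from urllib.parse import urlencode


def site_search_urls(term, site, pages=3, per_page=10):
    """Build one Google results URL per page, restricted to `site`.

    `start` is the zero-based offset of the first result on each page.
    """
    urls = []
    for page in range(pages):
        params = {"q": f"site:{site} {term}", "start": page * per_page}
        urls.append("https://www.google.com/search?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for url in site_search_urls("excel", "pastebin.com"):
        print(url)
```

Each URL differs only in `start`, so the old approach of modifying the URL structure would apply again here, just against Google instead of pastebin.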

But besides that option, is there a way to paginate a Google Custom Search on the site that hosts it, without going to Google directly?

I'm not looking for anything in particular, but if I had to, how could I paginate the URL above and return all the hrefs from the results?

Any info would be appreciated! Thanks!

diskborste · March 28, 2018, 8:47pm

Hi,

If I understand it correctly, scraping a page with infinite scroll requires a headless browser to run the JavaScript that triggers the scrolling. Perhaps a headless browser like PhantomJS, or an external service like Import.IO, can solve this?
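
Once something has rendered the page, collecting the hrefs is the simpler part. Below is a minimal sketch in Python using only the standard library. It assumes the rendered HTML is already available as a string. The names `HrefCollector` and `extract_hrefs`, the sample markup and the paste URLs in it are made up for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href and empty href values.
            if name == "href" and value:
                self.hrefs.append(value)


def extract_hrefs(html):
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    # Hypothetical snippet standing in for the rendered results page.
    rendered_html = (
        '<div><a href="https://pastebin.com/aaaa1111">First</a>'
        '<a name="no-link">Skip me</a>'
        '<a href="https://pastebin.com/bbbb2222">Second</a></div>'
    )
    print(extract_hrefs(rendered_html))
```

This collects every link on the page, so the result would still need filtering down to the search results themselves.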