Scraping an Instagram User's External URL

I got thousands of Instagram profile URLs from SeoTools.
My next step is to scrape the external URL for each user, which each user sets in their profile settings. This URL appears under the user's description, and it is an optional field.

Here is an example:
https://www.instagram.com/gucci/
I am trying to scrape "on.gucci.com/FW18".

So far, I know that Instagram pages are rendered with JavaScript.
Perhaps I have to use PhantomJsCloud.
However, I have no idea how to use it. I am not a programmer, and I don't know how to determine the XPath. Could someone please help me work out the exact formula to use?

Thanks

Hi,

Try this:
=RegexpFindOnUrl(A3;"""external_url"":""(.*?)""";1)
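In case it helps to see what the formula is doing: the doubled quotes are just Excel's way of escaping a quote inside a string, so the actual pattern is "external_url":"(.*?)" and the final 1 asks for the first capture group. Below is a rough Python sketch of the same match, only to illustrate the idea. The function name find_external_url and the sample string are made up for this example, and the real page source may look different.

```python
import re

# Same pattern as in the formula, without Excel's doubled quotes.
EXTERNAL_URL_RE = re.compile(r'"external_url":"(.*?)"')


def find_external_url(page_source: str) -> str:
    """Return the first captured external_url value, or "" if there is no match."""
    match = EXTERNAL_URL_RE.search(page_source)
    return match.group(1) if match else ""


# Made-up fragment for illustration only; not a real Instagram response.
sample = '{"biography":"...","external_url":"on.gucci.com/FW18","id":"1"}'
print(find_external_url(sample))

# A profile without an external URL gives no match, so the result is empty.
print(repr(find_external_url('{"external_url":null}')))
```

The non-greedy (.*?) stops at the first closing quote, so only the URL itself is captured.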


Thank you so much for your help.
It works, but some fields return empty values.
If I take a break for a few minutes and try again, some of them return values.

Do you know why?

This is because Instagram tries to prevent scraping. It happens to me after 800 or so posts scraped in a short timespan. Try using a proxy, or run the task in batches.

Would setting a random delay (in ms) solve this problem?
If so, what range do you recommend?

I'm not sure; I think it depends on the source. Try it and tell us about the results :)

I set a random delay between 1000 and 2000 ms. It works perfectly fine. Thank you so much for your help.
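For anyone doing the same thing outside Excel, here is a minimal Python sketch of the random-delay idea, assuming you already have some function that fetches one URL. The name fetch_with_random_delay and its parameters are hypothetical, and the 1000 to 2000 ms range is simply what worked in this thread; other sites may need something different.

```python
import random
import time
from typing import Callable, Dict, Iterable


def fetch_with_random_delay(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_ms: int = 1000,
    max_ms: int = 2000,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Call fetch(url) for each URL, pausing a random min_ms..max_ms between calls."""
    results: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(random.uniform(min_ms, max_ms) / 1000.0)
        results[url] = fetch(url)
    return results


# Demo with a stand-in fetch and a fake sleep that only records the pauses,
# so the example finishes instantly and touches no network.
pauses = []
demo = fetch_with_random_delay(
    ["https://example.com/a", "https://example.com/b"],
    fetch=lambda url: "page for " + url,
    sleep=pauses.append,
)
print(demo)
print(len(pauses))
```

Passing sleep as a parameter is only a convenience for trying the function out; in real use the default time.sleep does the waiting.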
