Resolve to Home Page

Is there a single, foolproof method for determining a site's home page when all you know is the domain?

I'm frequently faced with this challenge, and I keep running into exceptions. It sounds easy enough, but you don't always know whether the site uses http or https, defaults to the www subdomain, or even requires a full file name rather than just a slash. My usual approach is to assume the site has proper redirects in place to do the work for me, so this works most of the time:

UnshortUrl( UrlProperty($A1, "absolute") )
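For anyone testing this outside of SeoTools, the guesswork described above (http vs. https, www vs. bare domain) can be made explicit by listing the candidate URLs up front. This is only a minimal Python sketch; the function name `candidate_home_urls` and the "https and www first" ordering are my own assumptions, not anything SeoTools does internally.

```python
from itertools import product


def candidate_home_urls(domain):
    """Return the scheme/host combinations to try for a bare domain.

    Hypothetical helper: the ordering (https before http, www before
    bare) is an assumption about which variant is most likely to work.
    """
    host = domain.strip().lower().rstrip("/")
    # Drop a scheme if the caller passed a full URL instead of a domain.
    for prefix in ("https://", "http://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
    # Normalise to the bare domain so both host variants can be rebuilt.
    bare = host[4:] if host.startswith("www.") else host
    hosts = ["www." + bare, bare]
    return [
        "{}://{}/".format(scheme, h)
        for scheme, h in product(("https", "http"), hosts)
    ]
```

Each candidate would still have to be requested in turn to see which one answers without an error, so this does not replace relying on redirects; it just makes the possible variants explicit.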

Lately, I've come across some sites that appear to block requests from anything except legitimate desktop or mobile browsers. For example, here's what the above formula returns for these domains:

footlocker.com → Error:403 Forbidden http://www.footlocker.com/

dickssportinggoods.com → http://www.dickssportinggoods.com/UnsupportedBrowserErrorView?langId=-1&storeId=15108

networksolutions.com → timeout.html'

I've tried changing my User-Agent in the GlobalSettings config, as well as various combinations of CollectCookies, IntervalBetweenRequests, and Accept-Language. Perhaps there is a specific request header I can set to fool the server into believing I'm a real browser?
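To illustrate what "looking like a real browser" means at the HTTP level, here is a rough Python sketch using only the standard library. The header values are assumptions (a typical desktop Chrome string, not taken from this thread), and `browser_like_request` and `final_url` are hypothetical helper names. Sites with stricter bot detection may still refuse the request.

```python
import urllib.request

# Assumed header values: copy the real ones from your own browser's
# developer tools if these are rejected.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def browser_like_request(url):
    """Build a request object carrying the browser-style headers above."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def final_url(url, timeout=10):
    """Follow redirects and return the URL the server finally answers on.

    Raises urllib.error.HTTPError for responses such as 403 Forbidden.
    """
    with urllib.request.urlopen(browser_like_request(url), timeout=timeout) as resp:
        return resp.geturl()
```

Calling `final_url("http://www.footlocker.com/")` would then show whether the 403 goes away once the headers are present; I have not verified that it does for the domains listed above.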

I'd appreciate any suggestions from the SeoTools Community.

This issue is discussed below and you are spot on about the header settings:
https://stackoverflow.com/questions/13670692/403-forbidden-with-java-but-not-web-browser

It works for me without UnshortUrl:

=UrlProperty("footlocker.com";"absolute")

Perhaps you could wrap it in an IF statement that ignores the result when the request returns the 403 error? That might be easier than messing around with the HTTP settings to imitate a real browser.

Thanks for the reference link, Victor.

It sounds like I'm on the right track, but I don't know how to set a User-Agent string for SeoTools functions which don't support GlobalSettings config or HttpSettings() (e.g. HtmlTitle(), UnshortUrl(), HttpStatus(), DownloadFile(), etc.).

When resolving for home pages, I test for errors as follows:

=RegexpReplace( UnshortUrl( UrlProperty("champssports.com", "absolute") ), "^(?!http).*", UrlProperty("champssports.com", "absolute") )

This way, any returned result that does not start with http is replaced with the absolute URL instead. Unfortunately, leaving out UnshortUrl() would not catch home page exceptions such as those for adidas.com or fanatics.com.
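The same fallback logic can be sketched in Python for anyone who wants to test the pattern outside of Excel. The function name `resolved_or_fallback` is hypothetical, and Python's `re` module may not behave identically to the regex engine behind RegexpReplace().

```python
import re


def resolved_or_fallback(resolved, fallback):
    """Return `resolved` if it starts with http, otherwise `fallback`.

    Mirrors the formula's pattern ^(?!http).* : anything that does not
    begin with "http" is replaced wholesale by the fallback URL.
    """
    # A function is used as the replacement so that backslashes in
    # `fallback` are never interpreted as regex escapes. DOTALL makes
    # .* swallow multi-line error text as well.
    return re.sub(
        r"^(?!http).*",
        lambda _match: fallback,
        resolved,
        count=1,
        flags=re.DOTALL,
    )
```

Note that a result like the dickssportinggoods.com one above starts with http even though it points at an error page, so this test passes it through unchanged.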

I used only the base formula in my post to ask if anyone has a better solution, since other HTTP functions such as HttpStatus() will still render the undesired result.

Hopefully, this uservoice request can be added in the near future. In addition to this use case, that feature would also allow SEOs to properly detect issues such as content cloaking based upon user-agent, which is forbidden by Google.