Hello,
I am looking for a way to find if a domain has a XML-sitemap present and what the URL path is. I want to see the result in one single cell in Excel.
Hello,
I am looking for a way to find if a domain has a XML-sitemap present and what the URL path is. I want to see the result in one single cell in Excel.
Assuming the domain has a Sitemap: directive in their robots.txt
file, this regex will extract the URL of the XML sitemap:
=RegexpFindOnUrl("https://example.com/robots.txt", "(?i)Sitemap:\s*(?=.*)(.+)\s*", 1)
Enclose the above within Dump()
to list all sitemaps. For example, this should show 4 sitemap URLs for eBay.com:
=Dump( RegexpFindOnUrl("https://www.ebay.com/robots.txt", "(?i)Sitemap:\s*(?=.*)(.+)\s*", 1) )
Hope this helps!
-TimW
BTW, if the domain does not use a Sitemap: directive in their robots.txt
file (e.g. Newsweek.com), I usually start by looking for the most commonly used name of /sitemap.xml
. Since most sites have redirect rules in place to take you to the correct instance, this will usually identify the primary sitemap:
=UnshortUrl( "https://example.com/sitemap.xml" )
-TimW
Thnx,
Works like a charm!!