XML sitemap path

Hello,

I am looking for a way to find if a domain has a XML-sitemap present and what the URL path is. I want to see the result in one single cell in Excel.

Assuming the domain has a Sitemap: directive in their robots.txt file, this regex will extract the URL of the XML sitemap:

=RegexpFindOnUrl("https://example.com/robots.txt", "(?i)Sitemap:\s*(?=.*)(.+)\s*", 1)

Enclose the above within Dump() to list all sitemaps. For example, this should show 4 sitemap URLs for eBay.com:

=Dump( RegexpFindOnUrl("https://www.ebay.com/robots.txt", "(?i)Sitemap:\s*(?=.*)(.+)\s*", 1) )

Hope this helps!
-TimW

BTW, if the domain does not use a Sitemap: directive in their robots.txt file (e.g. Newsweek.com), I usually start by looking for the most commonly used name of /sitemap.xml. Since most sites have redirect rules in place to take you to the correct instance, this will usually identify the primary sitemap:

=UnshortUrl( "https://example.com/sitemap.xml" )

-TimW

Thnx,

Works like a charm!! :grinning:

1 Like