The other day I needed to get the URLs for all pages in my blog for some PowerShell scripting I wanted to do. Like most websites, this blog has a sitemap, and I wanted to use that as a source.
As I could not find any existing PowerShell scripts on the web that I could use, I just wrote one myself.
Now I'd like to share this script with you.
At the end of the article, you will find the link to the source code.
Basic usage
Just execute the script with a URL to an XML sitemap.
Which results in something like this:
Displaying only the first 3 entries
Note
Under the section "XML tag definitions", the sitemap protocol states that the lastmod, changefreq and priority tags are optional, so they can be missing from the sitemap.
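To illustrate what missing optional tags look like in practice, here is a minimal sketch that parses a small inline sitemap with PowerShell's built-in XML support. It is not the script from this article: the sample XML, the example.com URLs and the property names Loc, LastMod, ChangeFreq and Priority are all assumptions made for the example.

```powershell
# Hypothetical inline sitemap; the second <url> omits every optional tag.
[xml]$sitemap = @'
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2020-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>
'@

# Turn every <url> element into a plain object.
# Without strict mode, reading a tag that is absent simply yields $null.
$entries = foreach ($url in $sitemap.urlset.url) {
    [pscustomobject]@{
        Loc        = $url.loc
        LastMod    = $url.lastmod
        ChangeFreq = $url.changefreq
        Priority   = $url.priority
    }
}

$entries | Format-Table -AutoSize
```

Any code that consumes such entries should therefore be prepared for empty LastMod, ChangeFreq and Priority values.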
Now you can do basic PowerShell manipulation of the result set like sorting, selecting, filtering and formatting.
For example:
Outputs:
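As a rough, self-contained illustration of this kind of pipeline, the sketch below uses hypothetical sample objects in place of the script's real result set. The property names Loc, LastMod and Priority and all values are assumptions, so adjust them to whatever the script actually returns.

```powershell
# Hypothetical entries standing in for the script's output.
$entries = @(
    [pscustomobject]@{ Loc = 'https://example.com/a'; LastMod = [datetime]'2020-03-01'; Priority = 0.5 }
    [pscustomobject]@{ Loc = 'https://example.com/b'; LastMod = [datetime]'2020-01-10'; Priority = 0.8 }
    [pscustomobject]@{ Loc = 'https://example.com/c'; LastMod = [datetime]'2020-02-20'; Priority = 0.3 }
)

# Filter, sort and select with the standard cmdlets.
$recent = $entries |
    Where-Object { $_.LastMod -ge [datetime]'2020-02-01' } |
    Sort-Object -Property LastMod -Descending |
    Select-Object -Property Loc, LastMod

# Format only at the very end, for display.
$recent | Format-Table -AutoSize
```

Keeping Format-Table as the last step matters, because formatting cmdlets emit display objects that can no longer be sorted or filtered.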
Ignoring Sitemap Index entries
A sitemap can also be a Sitemap Index File. The file then contains links to other sitemap files.
By default, the script will follow these links.
If you don’t want this to happen, you can set the NoFollow switch.
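The following is a minimal sketch of how following a Sitemap Index File and skipping it with a switch could work. It is not the actual source of the script: the function name Get-SitemapUrl, the Fetch parameter and the example.com URLs are hypothetical, and it assumes that index entries are simply skipped when NoFollow is set.

```powershell
function Get-SitemapUrl {
    param(
        [Parameter(Mandatory)] [string] $Url,
        [switch] $NoFollow,
        # Hypothetical injectable downloader so the sketch can be tried offline.
        [scriptblock] $Fetch = { param($u) [xml](Invoke-RestMethod -Uri $u) }
    )

    $doc = [xml](& $Fetch $Url)

    # A Sitemap Index File has <sitemapindex> as its root element.
    if ($doc.DocumentElement.LocalName -eq 'sitemapindex') {
        if ($NoFollow) { return }   # assumption: index entries are skipped
        foreach ($child in $doc.sitemapindex.sitemap) {
            Get-SitemapUrl -Url $child.loc -Fetch $Fetch
        }
        return
    }

    # A regular sitemap has <urlset> as its root element.
    foreach ($entry in $doc.urlset.url) {
        [pscustomobject]@{ Loc = $entry.loc; LastMod = $entry.lastmod }
    }
}

# Offline usage with two hypothetical documents instead of real downloads.
$ns = 'http://www.sitemaps.org/schemas/sitemap/0.9'
$pages = @{
    'https://example.com/sitemap.xml' = "<sitemapindex xmlns='$ns'><sitemap><loc>https://example.com/posts.xml</loc></sitemap></sitemapindex>"
    'https://example.com/posts.xml'   = "<urlset xmlns='$ns'><url><loc>https://example.com/post-1</loc></url></urlset>"
}
$offline = { param($u) [xml]($pages[$u]) }.GetNewClosure()

$followed = @(Get-SitemapUrl -Url 'https://example.com/sitemap.xml' -Fetch $offline)
$skipped  = @(Get-SitemapUrl -Url 'https://example.com/sitemap.xml' -Fetch $offline -NoFollow)
```

In this sketch, $followed holds the entry from the linked sitemap, while $skipped stays empty because the index is not followed.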
Source code
The complete source code is available as a GitHub Gist.