Recently, I had to go through a bunch of legacy pages to update some links. The links were nested in rich text fields and there were thousands of them, so I needed a way to update them in bulk. I ended up using Sitecore PowerShell and wrote a RegEx pattern to pull out the href attribute value of every link tag, then updated each value based on logic specific to the website’s requirements.
I’m no stranger to RegEx, but I’m no expert either, so coming up with a workable pattern was a bit of a challenge. I ended up playing around with RegExr (https://regexr.com/), a tool for building and testing patterns, which made the process a lot easier. Once I had a pattern that did what I needed, I simply looped through all of the pages I needed to update, parsed out the links, and updated them.
All in all, the process was pretty fast and saved countless hours compared with doing this manually.
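If you want to see what the pattern captures before pointing it at real content, you can try it on a plain string in any PowerShell session. This is just a minimal sketch: the sample markup and the variable names $sampleBody, $pattern, and $hrefs are made up for illustration, and the pattern is the same one used in the script further down.

```powershell
# Sample rich text markup (made up for illustration).
$sampleBody = '<p>See <a href="/old/about" class="link">About</a> and <a href=''https://example.com/old/contact''>Contact</a>.</p>'

# Same pattern as the script below: capture group 1 holds the href value.
# Inside a single-quoted PowerShell string, '' is an escaped single quote.
$pattern = '<a.*?href=["`''](.*?)["`''][^>]*>.*?</a>'

# Collect the first capture group of every match.
$hrefs = [regex]::Matches($sampleBody, $pattern) | ForEach-Object { $_.Groups[1].Value }

# Print the extracted href values, one per line.
$hrefs
```

With this sample input, the output should be the two href values, /old/about and https://example.com/old/contact.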
Below is a sample script to help you get started. Note that this can be expanded or adjusted to meet the needs of your specific problem. I’ve left comments in the script explaining what’s going on and where adjustments can be made.
# The content path where the items to update are found.
$rootPath = "/sitecore/content/TENANT/SITE/path/to/folder"

# Loop through all of the descendants of $rootPath and grab any that meet the filter's requirements.
# In this case, I filtered based on template name, but you could change this to anything you want.
Get-ChildItem -Path $rootPath -Recurse | Where-Object { $_.TemplateName -eq "TEMPLATE NAME" } | ForEach-Object {
    # In order to avoid looping through items that don't need updates, I'm only checking items whose
    # rich text field (Body) contains an href attribute. You could filter further here if you only
    # want to update specific links (e.g., links with a particular domain or path).
    if ($_["Body"].Contains("href=")) {
        # The RegEx pattern. Note that the first capture group matches the href attribute value.
        $internalLinkRegexString = '<a.*?href=["`''](.*?)["`''][^>]*>.*?</a>'
        $internalLinks = [regex]::Matches($_["Body"], $internalLinkRegexString)

        # If the RegEx returned any results, we'll loop through them.
        if ($internalLinks.Count -gt 0) {
            foreach ($internalLink in $internalLinks) {
                $hrefAttribute = $internalLink.Groups[1].Value

                # Update the $newHrefAttribute variable with what you want the new link to be.
                # This can be expanded to include logic (e.g., if you want to map certain link
                # patterns to certain pages, do that here).
                $newHrefAttribute = ""

                # Finally, update the item in Sitecore.
                # Additionally, you can add a try/catch here with a CancelEdit call in case something goes wrong.
                # As I was updating these in a local environment for a site that had not yet launched, I didn't bother.
                $bodyUpdate = $_["Body"].Replace($hrefAttribute, $newHrefAttribute)
                $_.Editing.BeginEdit()
                $_["Body"] = $bodyUpdate
                $_.Editing.EndEdit()
            }
        }
    }
}
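If you do want the try/catch with CancelEdit mentioned in the comments, here is a rough sketch of how it could look. A few assumptions to flag: Convert-Href, Update-LinkMarkup, and Update-ItemLinks are names I made up for this example, and the /old/ to /new/ rule is purely illustrative, so swap in your own mapping. It also differs from the sample script in one way: String.Replace swaps every occurrence of the href text anywhere in the field, while this version rewrites only the captured href value inside each matched link and saves the item once.

```powershell
# Hypothetical mapping rule: move links under /old/ to /new/ and leave everything else alone.
function Convert-Href {
    param([string]$Href)
    if ($Href.StartsWith('/old/')) {
        return '/new/' + $Href.Substring(5)
    }
    return $Href
}

# Returns the markup with only the href values inside matched <a> tags rewritten.
function Update-LinkMarkup {
    param([string]$Body)
    $pattern = '<a.*?href=["`''](.*?)["`''][^>]*>.*?</a>'
    $evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
        param($match)
        $group = $match.Groups[1]
        # Position of the href value relative to the start of this match.
        $offset = $group.Index - $match.Index
        $match.Value.Remove($offset, $group.Length).Insert($offset, (Convert-Href $group.Value))
    }
    return [regex]::Replace($Body, $pattern, $evaluator)
}

# Saves the rewritten field once per item and cancels the edit if anything throws.
function Update-ItemLinks {
    param($Item, [string]$FieldName = 'Body')
    $original = $Item[$FieldName]
    $updated = Update-LinkMarkup -Body $original
    # -ceq is case-sensitive; skip the edit entirely when nothing changed.
    if ($updated -ceq $original) { return }
    $Item.Editing.BeginEdit()
    try {
        $Item[$FieldName] = $updated
        $Item.Editing.EndEdit() | Out-Null
    }
    catch {
        $Item.Editing.CancelEdit()
        Write-Warning "Update failed, edit cancelled: $_"
    }
}
```

Inside the ForEach-Object block of the script above, you would call it as Update-ItemLinks -Item $_ in place of the inner loop. It is also worth running Update-LinkMarkup against a few sample strings first to confirm your mapping does what you expect.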
