Using HtmlAgilityPack to replace src values

I'm using a CMS for a website. My content contributors have uploaded some very large images and then resized them in the CMS so they fit the page or article. When a visitor hits that page, they still download the full-size image, even though the contributor has resized it. I have found an image resizer plugin, and all I need to do is append the width and height parameters to the image name in src. From searching, it looks like I should use the HTML Agility Pack for this, but can someone help me finish off my code? I've figured out how to find the img tags within the content, but I don't know how to append the width and height to src.

Old Tag

<img src="/IMG_3556E__sq2.jpg?n=9418" id="/IMG_3556E__sq2.jpg?n=9418" width="83px" height="83px" />

New tag (notice that the src value has changed)

<img src="/IMG_3556E__sq2.jpg?width=83&amp;height=83" id="/IMG_3556E__sq2.jpg?n=9418" width="83px" height="83px" />

This is my code so far. All I need is help within the if statement to say if the img tag contains a width or a height, append those to the src attribute.

ContentManager contentManager = new ContentManager();
ContentData Content = contentManager.GetItem(id);

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(Content.Html);

foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//img/@src"))
{
    if (//img has a width or height, it means image has been resized) {
        //append that nodes src within the content.html with the ?width=x&height=x
    }
}
Unipod answered 9/7, 2012 at 20:53 Comment(0)

Try this:

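static void Main(string[] args)
{
    var htmlDoc = new HtmlDocument();
    htmlDoc.Load(@"E:\Libs\HtmlAgilityPack.1.4.0\htmldoc.html");

    // SelectNodes returns null when nothing matches, so guard before iterating.
    var nodes = htmlDoc.DocumentNode
                       .SelectNodes("//img[@src and (@width or @height)]");
    if (nodes == null) return;

    foreach (HtmlNode node in nodes)
    {
        // Drop any existing query string, e.g. "?n=9418".
        var src = node.Attributes["src"].Value.Split('?');

        // The XPath matches width OR height, so either attribute may be
        // missing; GetAttributeValue falls back to "" instead of throwing.
        var width = node.GetAttributeValue("width", "").Replace("px", "");
        var height = node.GetAttributeValue("height", "").Replace("px", "");

        node.SetAttributeValue("src",
                                src[0] +
                                string.Format("?width={0}&height={1}", width, height));
    }
}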
Airwaves answered 9/7, 2012 at 21:6 Comment(4)
Thanks Leniel, this reads as though it will work. One thing I can't figure out is how to make changes to Content.Html. I know you call SetAttributeValue, but presumably that only sets it in memory, because when I Response.Write(Content.Html) it still displays the original HTML. - Unipod
Thanks again for the pointer. I think Save() is for when you want to save a physical copy with the alterations. I'm not sure how to do this in memory, on the fly, for CMS packages. I'll do a bit of reading, but it should be a case of passing the result straight into another variable for the HTML and displaying that. - Unipod
Well, in the sample I'm running locally, if I did Response.Write(htmlDoc) it would write the in-memory modified version of the doc. Not sure what your case is. - Airwaves
Thank you - I managed to get it outputting by putting in Content.Html = Content.Html.Replace(src[0], src[0] + string.Format("?width={0}&height={1}", width, height)); - Unipod
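
On the in-memory question raised in these comments, one possible alternative to the string Replace is to read the modified markup back from the parsed document through DocumentNode.OuterHtml. The following is only a minimal sketch: the class name InMemoryRewrite and the method Rewrite are hypothetical, and it assumes both width and height are present on the tags you want to change.

```csharp
using HtmlAgilityPack;

public static class InMemoryRewrite
{
    // Hypothetical helper: takes the CMS HTML and returns the modified HTML
    // as a string, without saving anything to disk.
    public static string Rewrite(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: only rewrite images that carry both dimensions.
        var nodes = doc.DocumentNode
                       .SelectNodes("//img[@src and @width and @height]");

        if (nodes != null) // SelectNodes returns null when nothing matches
        {
            foreach (HtmlNode node in nodes)
            {
                // Keep the path, drop any existing query string.
                var path = node.GetAttributeValue("src", "").Split('?')[0];
                var width = node.GetAttributeValue("width", "").Replace("px", "");
                var height = node.GetAttributeValue("height", "").Replace("px", "");

                node.SetAttributeValue("src",
                    path + "?width=" + width + "&height=" + height);
            }
        }

        // Serialize the in-memory tree back to a string.
        return doc.DocumentNode.OuterHtml;
    }
}
```

With something like this in place, the call site could be Content.Html = InMemoryRewrite.Rewrite(Content.Html); which only touches the src attribute of the matched img tags, whereas a plain string Replace on src[0] would also change other occurrences of the same path, such as the id attribute in the question's example.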

If you use an XPath that selects only nodes with src and width or height, you can omit the if:

foreach (HtmlNode node in doc.DocumentNode
    .SelectNodes("//img[@src and (@width or @height)]"))
{
    node.SetAttributeValue("src",  ...);
}

But be careful: as far as I remember, HtmlAgilityPack's SelectNodes returns null if nothing matches.
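
One way to guard against that null, sketched here under the assumption that an empty sequence is an acceptable stand-in for "no matches", is a small extension method. The names HtmlNodeExtensions, SelectNodesOrEmpty, Demo and CountResizedImages are hypothetical and not part of HtmlAgilityPack.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlNodeExtensions
{
    // Hypothetical helper: wraps SelectNodes so callers can always iterate,
    // even when the XPath matches nothing and SelectNodes returns null.
    public static IEnumerable<HtmlNode> SelectNodesOrEmpty(
        this HtmlNode node, string xpath)
    {
        return (IEnumerable<HtmlNode>)node.SelectNodes(xpath)
               ?? Enumerable.Empty<HtmlNode>();
    }
}

public static class Demo
{
    // Counts the img tags that the XPath from this answer would select.
    public static int CountResizedImages(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
                  .SelectNodesOrEmpty("//img[@src and (@width or @height)]")
                  .Count();
    }
}
```

The foreach loop can then run over doc.DocumentNode.SelectNodesOrEmpty(...) directly, and content without any matching img tags simply results in zero iterations.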

Fink answered 9/7, 2012 at 21:12 Comment(0)
