Which is the best HTML tidy pack? Is there any option in HTML Agility Pack to make an HTML webpage tidy? [closed]

I am using HTML Agility Pack to parse tabular information from HTML pages. Some pages have missing end tags, and on those pages HTML Agility Pack does not parse the information properly. So I want to insert the missing end tags so that HTML Agility Pack can parse the information correctly. What should I do: write my own code for that, or use an HTML tidy pack?

If an HTML tidy pack, which one is best, and how do I use it (an example would help, if possible)? And if my own code, what could it look like?

Is there any option in HTML Agility Pack that lets us first make the HTML page tidy and then parse it?

Serialize answered 22/3, 2010 at 8:24

In HTML Agility Pack I could not find any option that makes an HTML page tidy. There is one option that inserts missing closing tags, but it only works for some HTML pages. That option is OptionFixNestedTags:

    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.OptionFixNestedTags = true; // set before loading; it affects how the page is parsed
    doc.LoadHtml(html);             // html holds the page source as a string
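A slightly fuller sketch of the same idea, assuming a current HtmlAgilityPack NuGet package is referenced; TableReader, ReadRows and the sample HTML are hypothetical names used only for illustration. It sets OptionFixNestedTags before loading, prints the parser's ParseErrors (which report things like tags that were never closed, so you can see which pages still need a real tidy step), and then reads the table cells row by row. It only shows what HTML Agility Pack itself can do; it does not make the parser a full tidy tool.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class TableReader
{
    // Returns the trimmed text of every <td>/<th>, grouped by <tr>.
    public static string[][] ReadRows(string html)
    {
        var doc = new HtmlDocument
        {
            // Must be set before LoadHtml because it changes how the tree is built:
            // e.g. a new <td> closes a previous unclosed <td> instead of nesting inside it.
            OptionFixNestedTags = true
        };
        doc.LoadHtml(html);

        // ParseErrors lists what the parser had to repair or ignore (e.g. TagNotClosed).
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.Error.WriteLine(
                $"line {error.Line}, col {error.LinePosition}: {error.Code} ({error.Reason})");
        }

        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null)
        {
            return Array.Empty<string[]>(); // SelectNodes returns null when nothing matches
        }

        return rows
            .Select(tr => tr.ChildNodes
                .Where(cell => cell.Name == "td" || cell.Name == "th")
                .Select(cell => HtmlEntity.DeEntitize(cell.InnerText).Trim())
                .ToArray())
            .ToArray();
    }

    public static void Main()
    {
        // Hypothetical sample with missing </td> and </tr> end tags.
        const string html = "<table><tr><td>a<td>b<tr><td>c</table>";
        foreach (string[] row in ReadRows(html))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

If ParseErrors is still non-empty for a page and the rows come out wrong, that page is a candidate for running through a separate tidy tool before parsing.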

I have also tried a regex for that, but it also only works for some HTML pages.

So the best HTML tidy pack I found is:

http://www.devx.com/dotnet/Article/20505/1763/page/2.

That article shows how to import the DLL and how to use the tidy pack, and it includes sample code. It works great: it inserts the missing closing tags and makes your HTML page tidy.

Thanks, everyone, for helping.

Serialize answered 24/3, 2010 at 12:43

I have found HTML Tidy (www.html-tidy.org) to do the best job of tidying and cleaning HTML.

Binaries for different platforms are available at http://binaries.html-tidy.org

There are also wrappers for HTML Tidy in many languages. For C# I use one called TidyHtml5ManagedRepack.

I need to clean poorly formed HTML and compare it with the same or similar HTML after it has been adjusted by JavaScript in different browsers. HTML Tidy lets me clean the HTML into a normalized state, so I can compare it with the versions adjusted by other browsers and be reasonably confident that they are the same.
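A minimal sketch of that clean-then-compare workflow, assuming .NET 6 or later and the tidy executable from binaries.html-tidy.org on the PATH. It calls the command-line tool rather than the TidyHtml5ManagedRepack wrapper, whose API is not shown here; TidyCompare, Normalize, SameAfterTidy and the sample inputs are hypothetical names. Both inputs go through tidy with identical options, and the outputs are compared as plain strings.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Threading.Tasks;

public static class TidyCompare
{
    // Pipes html through the tidy executable and returns the cleaned markup.
    // Assumes tidy is on the PATH; pass a full path as tidyPath otherwise.
    public static string Normalize(string html, string tidyPath = "tidy")
    {
        var utf8 = new UTF8Encoding(false); // no BOM on stdin
        var psi = new ProcessStartInfo
        {
            FileName = tidyPath,
            // -q: quiet; -i: indent, so equivalent markup is laid out the same way;
            // --tidy-mark no: omit the generator <meta> so outputs stay comparable.
            Arguments = "-q -i --char-encoding utf8 --show-warnings no --tidy-mark no",
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            StandardInputEncoding = utf8,
            StandardOutputEncoding = utf8,
            UseShellExecute = false
        };

        using var process = Process.Start(psi)
            ?? throw new InvalidOperationException("could not start tidy");

        // Read both pipes asynchronously so a full buffer cannot block the process.
        Task<string> stdout = process.StandardOutput.ReadToEndAsync();
        Task<string> stderr = process.StandardError.ReadToEndAsync();
        process.StandardInput.Write(html);
        process.StandardInput.Close(); // tidy reads its input until EOF
        process.WaitForExit();

        // tidy exit codes: 0 = no problems, 1 = warnings only, 2 = errors.
        if (process.ExitCode > 1)
        {
            throw new InvalidOperationException("tidy reported errors: " + stderr.Result);
        }
        return stdout.Result;
    }

    // True when both inputs normalize to exactly the same markup.
    public static bool SameAfterTidy(string a, string b) => Normalize(a) == Normalize(b);

    public static void Main()
    {
        // Hypothetical inputs: the same list, once with unclosed <li> tags and once reformatted.
        string fromServer = "<ul><li>one<li>two</ul>";
        string fromBrowser = "<ul>\n  <li>one</li>\n  <li>two</li>\n</ul>";
        Console.WriteLine(SameAfterTidy(fromServer, fromBrowser));
    }
}
```

Comparing the normalized strings is deliberately strict; if attribute order or whitespace inside text still differs between browsers, a DOM-level comparison may be needed on top of this.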

Wilmer answered 11/7, 2020 at 11:53

© 2022 - 2024 — McMap. All rights reserved.