Close all HTML tags (not only IMG)

I saw this question and answer regarding closing img tags.

But what if I also want to close other tags, such as link?

I tried to write

(<img|link[^>]+)(?<!/)>

But it doesn't work.

What is wrong?

Example:

<link href="myhref">
<img src="mysrc">

but not

<link href="myhref"/>
<img src="mysrc"/>
Drubbing answered 1/1, 2013 at 14:7 Comment(8)
You are using the wrong technology for working with HTML. - Frameup
Show us the code and example input. - Infant
@Frameup I wish it were that easy. I am working in Objective-C with the KissXML library, which won't parse unclosed HTML. - Drubbing
@Odelya - It is an XML library, not an HTML library. Use the right tool... Unclosed elements are illegal in XML, but some are legal in HTML. - Frameup
@Frameup I know, but I need a library that can both READ and WRITE HTML, and this is the only one I found. See my question #14085542. - Drubbing
Turning HTML into valid XML isn't as simple as just closing some tags. - Infant
Looks like the accepted answer has a fork that should work for you. If it doesn't, have a word with the author - he might be able to extend the fork for your use. - Frameup
@Daij-Djan - That's what I read. But from this question it doesn't appear to parse valid HTML with unclosed elements. Is that the case? - Frameup

You need to limit the scope of your alternation. As written, the | splits the entire group, so the pattern means either <img or link[^>]+: the < is only matched when the left alternative matches, and [^>]+ only when the right one does.

(<(?:img|link)[^>]+)(?<!/)>

should fix this problem. (?:...) is a non-capturing group, i.e., it is used only for grouping, not for capturing. The replace operation (with \1/>) remains the same.
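
To see the difference concretely, here is a minimal sketch in Python. Python is chosen only for illustration (the question is about Objective-C), and the names UNGROUPED, GROUPED and close_tags are made up for this example. Python's re module accepts the same lookbehind and \1 backreference, so both patterns are used unchanged.

```python
import re

# Pattern from the question: the alternation spans the whole group,
# so it reads as "<img" OR "link[^>]+".
UNGROUPED = re.compile(r'(<img|link[^>]+)(?<!/)>')

# Corrected pattern: the alternation is limited to the tag name.
GROUPED = re.compile(r'(<(?:img|link)[^>]+)(?<!/)>')


def close_tags(html: str) -> str:
    """Append "/" to <img ...> and <link ...> tags that are not already closed."""
    return GROUPED.sub(r'\1/>', html)


if __name__ == "__main__":
    sample = '<link href="myhref">\n<img src="mysrc">\n<img src="closed"/>'
    print(close_tags(sample))
    # The ungrouped pattern never matches an <img ...> tag that has attributes,
    # because "<img" would have to be followed immediately by ">".
    print(UNGROUPED.sub(r'\1/>', '<img src="mysrc">'))
```

This only covers simple cases: a > inside an attribute value, or a tag name that merely starts with img or link, will still trip the pattern up.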

Impressionist answered 1/1, 2013 at 14:33 Comment(1)
Sorry for my ignorance, but how can I use this in C#? I'm trying to do this with Regex.Replace() but I don't know how... text = Regex.Replace(text, @"<img[^>]*>", "(<(?:img|link)[^>]+)(?<!/)>"); - Synonym

You need to use an HTML parser or a libxml2-based parser. There is a libxml2 wrapper for Objective-C called hpple. hpple can parse messy HTML without any problem.
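
As a rough illustration of the parser-based approach in general (not of hpple or libxml2 themselves), here is a sketch using Python's standard html.parser module. VOID_TAGS, XhtmlWriter and to_xhtml are hypothetical names. The sketch drops comments and doctypes and does not balance unclosed non-void tags, so treat it as a starting point rather than a full HTML-to-XML converter.

```python
from html import escape
from html.parser import HTMLParser

# Void elements from the HTML specification: they never take an end tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def _format_attrs(attrs):
    parts = []
    for name, value in attrs:
        # Valueless attributes (e.g. "disabled") arrive with value None.
        text = escape(value if value is not None else name, quote=True)
        parts.append(f' {name}="{text}"')
    return "".join(parts)


class XhtmlWriter(HTMLParser):
    """Re-serialize parsed HTML, writing void elements as self-closing tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        slash = "/" if tag in VOID_TAGS else ""
        self.out.append(f"<{tag}{_format_attrs(attrs)}{slash}>")

    def handle_startendtag(self, tag, attrs):
        # Already written as <tag .../> in the input.
        self.out.append(f"<{tag}{_format_attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape them.
        self.out.append(escape(data, quote=False))


def to_xhtml(html: str) -> str:
    writer = XhtmlWriter()
    writer.feed(html)
    writer.close()
    return "".join(writer.out)


if __name__ == "__main__":
    print(to_xhtml('<link href="myhref"><img src="mysrc">'))
```

Because the parser tokenizes tags itself, already-closed tags such as <img src="mysrc"/> pass through unchanged.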

Bernice answered 1/1, 2013 at 14:14 Comment(4)
I know. My problem is that I also need to read and write, and hpple only supports reading. I am using KissXML instead. - Drubbing
KissXML should be able to ... options:NSXMLDocumentTidyHTML - Laparotomy
I tried. It doesn't help. I initialize the document like this: DDXMLDocument *theDocument = [[DDXMLDocument alloc] initWithXMLString:content options:1 error:&error]; - Drubbing
KissXML just wraps libxml2 as well, tries it in HTML mode, and also bundles CTidy. - Laparotomy

KissXML should be able to parse it:
it wraps libxml2 in XML mode BUT falls back to HTML mode!

  • when you pass options:NSXMLDocumentTidyHTML it calls CTidy too

it WORKS fine :D really (as I keep saying ;))

// Recursively log the name of every element node in the tree.
- (void)processNode:(DDXMLNode *)node {
    if (node.kind == DDXMLElementKind) {
        NSLog(@"%@", node.name);
        for (DDXMLNode *child in node.children) {
            [self processNode:child];
        }
    }
}

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    // Unclosed <link> and <img> tags, as in the question.
    NSString *sample = @"<link href=\"myhref\"><img src=\"mysrc\">";
    NSData *data = [sample dataUsingEncoding:NSUTF8StringEncoding];
    DDXMLDocument *doc = [[DDXMLDocument alloc] initWithData:data options:DDXMLDocumentTidyHTML error:nil];
    [self processNode:doc.rootElement];
    return YES;
}
Laparotomy answered 1/1, 2013 at 14:35 Comment(9)
Will this also help with entity references like &nbsp? - Drubbing
I tried. But this <img src="mysrc"> makes it fail. I initialize the document like this: DDXMLDocument *theDocument = [[DDXMLDocument alloc] initWithXMLString:content options:1 error:&error]; . Please note that I develop for iPhone. - Drubbing
I know, else the whole thing wouldn't be needed... it works fine with bad HTML. - Laparotomy
I just tried your code and get the following in error: Domain=DDXMLErrorDomain Code=1 "The operation couldn’t be completed. (DDXMLErrorDomain error 1.)" Please note that I put 1 in options and not DDXMLDocumentTidyHTML. - Drubbing
That code works in multiple projects I don't know about... seems wrong. - Laparotomy
What do you suggest I check? Is there a way to get a more specific error message? I get it from the error description. - Drubbing
Don't pass 1... if that doesn't help, maybe debug it and step through!? - Laparotomy
I am following your advice to debug it. However, this line in DDXMLDocument returns an error: xmlDocPtr doc = xmlParseMemory([data bytes], [data length]); I couldn't debug any deeper. What could be the problem? - Drubbing
doc is null. I see here: xmlDocPtr doc = xmlParseMemory([data bytes], [data length]); if (doc == NULL) { if (error) *error = [NSError errorWithDomain:@"DDXMLErrorDomain" code:1 userInfo:nil]; return nil; } - Drubbing