Python - remove excessive HTML tags

- August 15, 2010

So I have this text:

<i>this article written </i><a href="http://google.com"><i>test</i></a><i>.</i>

I think this is valid HTML; however, I want to clean it up by removing the excessive <i> tags and simplifying it to a single <i> tag:

<i>this article written <a href="http://google.com">test</a>.</i>

I tried to clean it up myself, but I'd need to look ahead in the text, and I haven't had any success with this. Is there a package I can use, or some other way to do this, or would I have to do it manually?

Thank you.

Using an HTML parser would be the more reliable solution. It would also be able to cope with tags split across many lines.

The following will solve your example, but not much more...

import re

def outeri(text):
    # Group 2 spans from the first <i> to the last </i>;
    # groups 1 and 3 keep whatever text comes before and after it.
    outer = re.search(r"(.*?)(<i>.*</i>)(.*)", text)
    if outer:
        # Strip every <i>/</i> inside the span, then wrap it in a single pair.
        inner = re.sub(r"</?[iI]>", "", outer.group(2))
        return "%s<i>%s</i>%s" % (outer.group(1), inner, outer.group(3))
    else:
        return text

print(outeri('<i>this article written </i><a href="http://google.com"><i>test</i></a><i>.</i>'))
print(outeri('text before <i>this article written </i><a href="http://google.com"><i>test</i></a><i>.</i> text after'))
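For comparison, here is a rough sketch of what the parser-based route could look like. It is not part of the original answer. It assumes the fragment is well-formed XML (every tag closed, no bare & characters or HTML-only entities), so that the standard library's xml.dom.minidom can parse it once it is wrapped in a throwaway root element. The names merge_italics, _simplify, _is_i and HOISTABLE are made up for this example, and the set of inline tags an <i> may be lifted out of is an assumption as well. The idea is to lift an <i> that is the only child of an inline element so that it wraps that element instead, and then to merge <i> elements that sit directly next to each other.

```python
from xml.dom import minidom

# Inline elements an <i> may be lifted out of (an assumption of this sketch).
HOISTABLE = {"a", "b", "em", "span", "strong", "u"}


def _is_i(node):
    """True if node is an <i> element."""
    return node.nodeType == node.ELEMENT_NODE and node.tagName.lower() == "i"


def _simplify(node):
    """Hoist and merge <i> elements below node, in place."""
    for child in list(node.childNodes):
        if child.nodeType != child.ELEMENT_NODE:
            continue
        _simplify(child)
        only = child.firstChild if len(child.childNodes) == 1 else None
        if child.tagName.lower() in HOISTABLE and only is not None and _is_i(only):
            # <a><i>x</i></a>  becomes  <i><a>x</a></i>
            child.removeChild(only)
            while only.firstChild:
                child.appendChild(only.firstChild)
            node.replaceChild(only, child)
            only.appendChild(child)

    # Merge each <i> into an <i> sibling that directly precedes it.
    prev = None
    for child in list(node.childNodes):
        if prev is not None and _is_i(prev) and _is_i(child):
            while child.firstChild:
                prev.appendChild(child.firstChild)
            node.removeChild(child)
        else:
            prev = child


def merge_italics(fragment):
    """Return fragment with neighbouring <i> runs collapsed into one."""
    # The <root> wrapper exists only so that minidom sees a single document element.
    doc = minidom.parseString("<root>%s</root>" % fragment)
    root = doc.documentElement
    _simplify(root)
    return "".join(child.toxml() for child in root.childNodes)


print(merge_italics('<i>this article written </i><a href="http://google.com"><i>test</i></a><i>.</i>'))
# prints: <i>this article written <a href="http://google.com">test</a>.</i>
```

Two <i> elements separated by non-italic text, even a single space, are left as they are, since merging them would change what is rendered in italics. Real-world HTML that is not well-formed XML would need a lenient HTML parser in place of minidom.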
