C# - Get a node's inner text and its first child node's text
I have multiple links in a page with this structure:
<a ....> <b>text need</b> text need </a>
and I want to extract the string, which for the example above is "text need text need". I can extract the second part, but I'm not sure how to select the text inside the b tags as well. I'm using this:
var link_list = doc.DocumentNode.SelectNodes(@"/a/text()");
foreach (var link in link_list)
{
    Console.WriteLine(link.InnerText);
}
Should I perhaps get the HTML of the link instead of its text, remove the tags with a regex, and extract the text then, or are there other ways?
Accessing the InnerText property of <a> should give you all the text nodes at once:
using System;
using HtmlAgilityPack;

var html = @"<a ....> <b>text need</b> text need </a>";
var doc = new HtmlDocument();
doc.LoadHtml(html);

var link_list = doc.DocumentNode.SelectNodes("/a");
foreach (var link in link_list)
{
    // InnerText concatenates every descendant text node of the link
    Console.WriteLine(link.InnerText);
}
Or, if you need only the direct child text nodes and the grandchild text nodes, try it this way:
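// Requires "using System.Linq;" for Select
var link_list = doc.DocumentNode.SelectNodes("/a");
foreach (var link in link_list)
{
    // text()   : text nodes that are direct children of <a>
    // */text() : text nodes one level down, e.g. inside <b>
    var texts = link.SelectNodes("text() | */text()");
    Console.WriteLine(string.Join("", texts.Select(o => o.InnerText)));
}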
Output:
text need text need
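One thing to watch for in real pages: in HtmlAgilityPack, SelectNodes returns null rather than an empty collection when nothing matches, so the loops above throw on a page without links or on a link without text. Below is a minimal sketch of a null-safe variant. The helper name ExtractLinkTexts, the "//a" XPath (links at any depth instead of only at the root), the trimming, and the entity decoding are my own assumptions for illustration, not part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinkTextExtractor
{
    // Hypothetical helper (not an HtmlAgilityPack API): returns the combined
    // child and grandchild text of every <a> element in the given HTML.
    public static List<string> ExtractLinkTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();

        // Assumption: links may sit anywhere in the page, hence "//a".
        var links = doc.DocumentNode.SelectNodes("//a");
        if (links == null)
        {
            return result; // no links on the page
        }

        foreach (var link in links)
        {
            var texts = link.SelectNodes("text() | */text()");
            if (texts == null)
            {
                continue; // link without any text nodes
            }

            var joined = string.Concat(texts.Select(t => t.InnerText));

            // Decode entities such as &amp; and drop surrounding whitespace.
            result.Add(HtmlEntity.DeEntitize(joined).Trim());
        }

        return result;
    }

    public static void Main()
    {
        var html = "<div><a href=\"#\"> <b>text need</b> text need </a><a href=\"#\"></a></div>";
        foreach (var text in ExtractLinkTexts(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

The empty second link in the sample is there only to exercise the null check. If you want every nested level rather than just two, plain InnerText on the link is probably the simpler choice.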