C# - Get a node's own text and the text of its first child nodes


I have multiple links in a page with this structure:

<a ....>     <b>text need</b>     text need </a> 

I want to extract a single string containing both pieces of text from the example above, but I can only extract the second part. I'm not sure how to select the text inside the <b> tags as well. I'm using this:

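    // "/a/text()" matches only the text nodes that are direct children of <a>,
    // so the text inside <b> is skipped.
    var link_list = doc.DocumentNode.SelectNodes(@"/a/text()");
    foreach (var link in link_list)
    {
        Console.WriteLine(link.InnerText);
    }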

Should I perhaps get the HTML of the link instead, remove the tags with a regex, and extract the text from that, or are there other ways?

Accessing the InnerText property of the <a> element should give you all of its text nodes at once:

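    var html = @"<a ....>     <b>text need</b>     text need </a>";
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // Select the <a> element itself rather than its text nodes.
    var link_list = doc.DocumentNode.SelectNodes("/a");
    foreach (var link in link_list)
    {
        // InnerText concatenates the text of all descendant nodes.
        Console.WriteLine(link.InnerText);
    }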

Or, if you need only the direct child text nodes and the grandchild text nodes, try it this way:

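    // Requires: using System.Linq;
    var link_list = doc.DocumentNode.SelectNodes("/a");
    foreach (var link in link_list)
    {
        // text()   : text nodes directly under <a>
        // */text() : text nodes directly under any child element of <a>
        var texts = link.SelectNodes("text() | */text()");
        Console.WriteLine(string.Join("", texts.Select(o => o.InnerText)));
    }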

Output:

text need text need 
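
The snippets above assume that at least one <a> matches. As far as I know, HtmlAgilityPack's SelectNodes returns null rather than an empty collection when nothing matches, and InnerText keeps the original whitespace and HTML entities. The following is only a minimal sketch of how that could be handled; the LinkText.Extract helper, the "//a" path (any <a> in the document instead of a root-level one) and the sample markup with href="#" are my own assumptions, not part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class LinkText
{
    // Hypothetical helper: returns the cleaned-up text of every <a> in the markup.
    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//a" matches <a> elements anywhere in the document.
        var links = doc.DocumentNode.SelectNodes("//a");

        // SelectNodes yields null when there is no match, so guard before iterating.
        if (links == null)
        {
            return new List<string>();
        }

        return links
            // Decode entities such as &amp; that InnerText leaves untouched.
            .Select(a => HtmlEntity.DeEntitize(a.InnerText))
            // Collapse runs of whitespace and trim both ends.
            .Select(t => Regex.Replace(t, @"\s+", " ").Trim())
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample markup for illustration only; the href value is made up.
        var html = "<div><a href=\"#\">     <b>text need</b>     text need </a></div>";

        foreach (var text in LinkText.Extract(html))
        {
            Console.WriteLine(text); // prints "text need text need"
        }
    }
}
```

This also avoids the regex-on-HTML route from the question: the regex here only tidies whitespace in text that the parser has already extracted.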
