Load HTML file and force UTF-8 with PHP
I am accessing an external URL to extract specific content with XPath.
I have tried several different ways to achieve this, and each of them ends up presenting some little problem. After a lot of research, this is the way I am doing it:
I create a stream context to open the file with the right headers (UTF-8):
$opts = array('http' => array('header' => 'Accept-Charset: UTF-8, *;q=0'));
$context = stream_context_create($opts);
$html = file_get_contents($url, false, $context);
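An Accept-Charset request header is only a hint, and a server is free to ignore it, so it can help to look at what the response actually declares. The sketch below is one way to do that, under stated assumptions: `charsetFromHeaders()` is a hypothetical helper that is not part of the original code, and the sample header lines are made up. In a real run the input could be a list of raw header lines such as the `$http_response_header` array that PHP fills after an HTTP `file_get_contents()` call.

```php
<?php
// Hypothetical helper (not from the original post): return the charset
// declared in the last Content-Type header, or null if none is declared.
function charsetFromHeaders(array $headers): ?string
{
    $charset = null;
    foreach ($headers as $line) {
        // Matches e.g. "Content-Type: text/html; charset=ISO-8859-1"
        if (preg_match('/^Content-Type:.*?charset\s*=\s*"?([A-Za-z0-9._:-]+)/i', $line, $m)) {
            // Keep the last match, since redirects can add several header sets.
            $charset = strtoupper($m[1]);
        }
    }
    return $charset;
}

// Made-up sample headers, for illustration only.
$sample = [
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=ISO-8859-1',
];
$declared = charsetFromHeaders($sample);
echo $declared ?? 'unknown', PHP_EOL;
```

If the declared charset is something other than UTF-8, the later conversion step has to be told about the real source encoding.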
Then, inside my class, I have created a DOMDocument object, and I load the fetched HTML string as follows:
$this->dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
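For reference, here is a minimal self-contained sketch of the same idea: turn every non-ASCII character into a numeric entity before handing the markup to DOMDocument, then read it back with XPath. It assumes the input really is valid UTF-8. `loadUtf8Html()` and the sample markup are hypothetical, and `mb_encode_numericentity()` is swapped in for the `'HTML-ENTITIES'` conversion because that pseudo-encoding is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper (not from the original post).
function loadUtf8Html(string $html): DOMDocument
{
    // Encode every code point from U+0080 upward as a numeric entity,
    // so libxml does not have to guess the byte encoding.
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($encoded, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom;
}

// Made-up sample markup with a single root element.
$dom = loadUtf8Html('<div><p class="lead">Gobierno marroquí para</p></div>');
$xpath = new DOMXPath($dom);
$text = $xpath->query('//p[@class="lead"]')->item(0)->textContent;
echo $text, PHP_EOL;
```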
It works fine in almost every case, but it strips away some complex characters, such as á, ó, ç, etc.
Example: "Gobierno marroquí para" turns into "Gobierno marroqu para"
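One possible explanation, which is an assumption and not something confirmed here, is that the fetched bytes are not valid UTF-8 in the first place (for example ISO-8859-1), so telling `mb_convert_encoding()` that the source is UTF-8 mangles the accented characters. A sketch of checking for that before loading the DOM follows. `ensureUtf8()` and the ISO-8859-1 fallback are hypothetical choices made for illustration.

```php
<?php
// Hypothetical helper (not from the original post): return the input as
// UTF-8, converting from a fallback encoding when it is not valid UTF-8.
function ensureUtf8(string $bytes, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8, leave untouched
    }
    // Assumption: anything that is not valid UTF-8 is in $fallback.
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

// "í" as the single ISO-8859-1 byte 0xED, which is invalid UTF-8 on its own.
$latin1 = "Gobierno marroqu\xED para";
$fixed = ensureUtf8($latin1);
echo bin2hex(substr($fixed, 16, 2)), PHP_EOL; // the two UTF-8 bytes of "í"
```

The fallback encoding is a guess. When the response declares a charset, using that value instead would be more reliable.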
I have tried loading the HTML as plain text with the prefix <?xml encoding... , and it works fine, but then I have issues with further HTMLPurifier operations.
Any kind of information will be appreciated. I am not looking for someone to do the task for me, but for the right, efficient way. I need to understand it so I can work with it.
Peace.