Converting an XPath-located element to HTML source with the lxml module

2019-12-02
Python modules

Requirement:

How to convert an element located with XPath back into its HTML source.

Method 1: use the tostring function from lxml.html (from lxml.html import tostring)

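from lxml import etree
from lxml.html import tostring

# resp_text is assumed to hold the page's HTML source (e.g. the text of an HTTP response)
html_get = etree.HTML(resp_text)
# xpath() returns a list of matches; take the first one
div_ok = html_get.xpath('//div[@id="mw-content-text"]')[0]
# tostring() returns bytes by default, so decode to get a str
div_content = tostring(div_ok).decode('utf-8')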

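Method 2 (recommended: in my own timing tests, when the element was located with xpath on a tree returned by etree, converting it back to HTML source with etree as well was faster):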

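from lxml import etree

# resp_text is assumed to hold the page's HTML source
html_get = etree.HTML(resp_text)
div_ok = html_get.xpath('//div[@id="mw-content-text"]')[0]
print(div_ok, type(div_ok))  # shows that div_ok is an lxml element object
# method='html' serializes as HTML rather than XML; the result is bytes, so decode to a str
div_content = etree.tostring(div_ok, pretty_print=True, method='html').decode('utf-8')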
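
The speed difference mentioned above is the author's own measurement and may vary with the lxml version and the page. A rough way to check it yourself is sketched below with timeit. The sample document, the helper names via_lxml_html and via_etree, and the repeat count are all assumptions made for illustration, and pretty_print is left off in both calls so that the two produce the same output.

```python
import timeit

from lxml import etree
from lxml.html import tostring as html_tostring

# Hypothetical sample document; real pages may give different timings.
resp_text = (
    '<html><body><div id="mw-content-text">'
    + '<p>row</p>' * 1000
    + '</div></body></html>'
)

tree = etree.HTML(resp_text)
div_ok = tree.xpath('//div[@id="mw-content-text"]')[0]


def via_lxml_html():
    # Method 1: lxml.html.tostring
    return html_tostring(div_ok).decode('utf-8')


def via_etree():
    # Method 2: etree.tostring with method='html'
    return etree.tostring(div_ok, method='html').decode('utf-8')


# Time 200 calls of each function; the count is arbitrary.
t_html = timeit.timeit(via_lxml_html, number=200)
t_etree = timeit.timeit(via_etree, number=200)
print(f"lxml.html.tostring: {t_html:.4f}s  etree.tostring: {t_etree:.4f}s")
```

Running the measurement several times and on a page of realistic size gives a fairer picture than a single run.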

Last updated: 2019-12-02 15:26:18

赵家富
