How to do named entity linking in spaCy

扬州沐宇科技
2024-05-11 17:53:51
spaCy

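In spaCy, you can use the set_extension method to add a custom link attribute to entities. Entities are Span objects, so the attribute is registered with Span.set_extension. For example, you can create a new attribute named linked_entity and then set it to the desired entity link. Here is an example: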

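import spacy
from spacy.language import Language
from spacy.tokens import Span

# Register the custom attribute on Span (entities are Span objects)
Span.set_extension("linked_entity", default=None)

# Load the model
nlp = spacy.load("en_core_web_sm")

# Pipeline component that sets the link attribute on every entity
# (spaCy v3 requires components to be registered under a name)
@Language.component("add_linked_entity")
def add_linked_entity(doc):
    for ent in doc.ents:
        ent._.linked_entity = "https://en.wikipedia.org/wiki/" + ent.text.replace(" ", "_")
    return doc

# Add the component to the end of the pipeline by its registered name
nlp.add_pipe("add_linked_entity", last=True)

# Process the text
text = "Barack Obama was the 44th President of the United States."
doc = nlp(text)

# Print each entity with its label and link
for ent in doc.ents:
    print(ent.text, ent.label_, ent._.linked_entity)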

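In the example above, we first load a spaCy model, then define a function, add_linked_entity, that sets the link attribute on each entity. Next, we add that function to the pipeline and process a text that contains entities. Finally, we print each entity together with its link attribute.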
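
The example builds each link by simple string concatenation, which only works cleanly for entity text made of plain ASCII words. As a minimal sketch, the URL-building step could be pulled into a small helper that also percent-encodes other characters. The helper name build_wiki_url and the constant WIKI_BASE are hypothetical, introduced here for illustration only. The resulting URL is still only a guess derived from the entity text, and nothing checks that the page exists.

```python
from urllib.parse import quote

# Same base URL as in the example above
WIKI_BASE = "https://en.wikipedia.org/wiki/"


def build_wiki_url(entity_text: str) -> str:
    """Hypothetical helper: turn entity text into a Wikipedia-style URL."""
    # Collapse runs of whitespace and join the words with underscores
    title = "_".join(entity_text.split())
    # Percent-encode characters that are not safe in a URL path
    return WIKI_BASE + quote(title, safe="_()")


print(build_wiki_url("Barack Obama"))
print(build_wiki_url("São Paulo"))
```

Inside the pipeline function, the assignment could then read ent._.linked_entity = build_wiki_url(ent.text).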
