扫描二维码下载沐宇APP

沐宇

微信扫码使用沐宇小程序

沐宇

bloom filter娴呮瀽(鍩烘湰姒傚康,姒傜巼鍒嗘瀽,婧愮爜鍒嗘瀽)

扬州沐宇科技
2023-09-19 06:04:48

Bloom filter鏄竴绉嶆鐜囧瀷鏁版嵁缁撴瀯锛岀敤浜庡垽鏂煇涓厓绱犳槸鍚﹀睘浜庝竴涓泦鍚堛€傚畠鍙互蹇€熷湴妫€绱㈠厓绱狅紝鑰屼笉闇€瑕佸瓨鍌ㄥ疄闄呯殑鍏冪礌鏈韩锛屽洜姝ゅ叿鏈夊緢灏忕殑瀛樺偍绌洪棿銆?/p>

鍩烘湰姒傚康锛?/p>

  1. 甯冮殕杩囨护鍣ㄤ娇鐢ㄤ竴涓綅鏁扮粍锛坆itmap锛夋潵琛ㄧず闆嗗悎锛屽垵濮嬫椂鎵€鏈変綅閮借缃负0銆?/p>

  2. 閫氳繃澶氫釜鍝堝笇鍑芥暟灏嗗厓绱犳槧灏勫埌浣嶆暟缁勭殑涓嶅悓浣嶇疆涓婏紝灏嗗搴斾綅缃殑浣嶈缃负1銆?/p>

  3. 褰撹鏌ヨ涓€涓厓绱犳椂锛屽皢璇ュ厓绱犻€氳繃鐩稿悓鐨勫搱甯屽嚱鏁版槧灏勫埌浣嶆暟缁勭殑鐩稿簲浣嶇疆锛屽鏋滄墍鏈夊搴斾綅缃殑浣嶉兘涓?锛屽垯琛ㄧず璇ュ厓绱犲彲鑳藉瓨鍦ㄤ簬闆嗗悎涓紝鍚﹀垯涓€瀹氫笉瀛樺湪銆?/p>

姒傜巼鍒嗘瀽锛?/p>

鐢变簬甯冮殕杩囨护鍣ㄤ娇鐢ㄥ涓搱甯屽嚱鏁拌繘琛屾槧灏勶紝鍥犳鍙兘浼氬瓨鍦ㄥ搱甯屽啿绐佺殑鎯呭喌銆傚亣璁惧竷闅嗚繃婊ゅ櫒涓湁n涓厓绱狅紝浣嶆暟缁勭殑闀垮害涓簃锛屼娇鐢╧涓搱甯屽嚱鏁帮紝鍒欐瘡涓厓绱犳槧灏勫埌浣嶆暟缁勭殑浣嶇疆鐨勬鐜囦负1/m锛屽洜姝や竴涓綅涓?鐨勪綅缃殑姒傜巼涓?1-1/m)^kn銆傚綋鏌ヨ涓€涓厓绱犳椂锛屽鏋滄墍鏈夊搴斾綅缃殑浣嶉兘涓?锛屽垯琛ㄧず璇ュ厓绱犲彲鑳藉瓨鍦ㄤ簬闆嗗悎涓€備絾鏄湁鍙兘瀛樺湪鍝堝笇鍐茬獊锛屽鑷村叾浠栧厓绱犱篃灏嗗搴斾綅缃殑浣嶇疆涓?锛屽洜姝ゅ瓨鍦ㄤ竴瀹氱殑璇垽鐜囥€?/p>

婧愮爜鍒嗘瀽锛?/p>

浠ヤ笅鏄竴涓畝鍗曠殑甯冮殕杩囨护鍣ㄧ殑绀轰緥浠g爜锛?/p>

from bitarray import bitarray
import mmh3
class BloomFilter:
def __init__(self, size, num_hashes):
self.size = size
self.num_hashes = num_hashes
self.bit_array = bitarray(size)
self.bit_array.setall(0)
def add(self, item):
for i in range(self.num_hashes):
index = mmh3.hash(item, i) % self.size
self.bit_array[index] = 1
def contains(self, item):
for i in range(self.num_hashes):
index = mmh3.hash(item, i) % self.size
if self.bit_array[index] == 0:
return False
return True
# 浣跨敤绀轰緥
bf = BloomFilter(1000000, 5)
bf.add("apple")
print(bf.contains("apple"))  # 杈撳嚭True
print(bf.contains("banana")) # 杈撳嚭False

杩欐浠g爜浣跨敤浜嗙涓夋柟搴揵itarray鍜宮mh3鏉ュ疄鐜板竷闅嗚繃婊ゅ櫒銆俠itarray鐢ㄤ簬琛ㄧず浣嶆暟缁勶紝mmh3鐢ㄤ簬璁$畻鍝堝笇鍊笺€傚湪鍒濆鍖栧竷闅嗚繃婊ゅ櫒鏃讹紝闇€瑕佹寚瀹氫綅鏁扮粍鐨勯暱搴﹀拰鍝堝笇鍑芥暟鐨勪釜鏁般€俛dd鏂规硶鐢ㄤ簬灏嗗厓绱犳坊鍔犲埌甯冮殕杩囨护鍣ㄤ腑锛宑ontains鏂规硶鐢ㄤ簬鍒ゆ柇涓€涓厓绱犳槸鍚﹀瓨鍦ㄤ簬甯冮殕杩囨护鍣ㄤ腑銆傚湪鏌ヨ鍏冪礌鏃讹紝闇€瑕佷娇鐢ㄧ浉鍚岀殑鍝堝笇鍑芥暟杩涜璁$畻锛屽苟妫€鏌ュ搴斾綅缃殑浣嶆槸鍚︿负1銆?/p>

扫码添加客服微信