
Apriori Algorithm Implementation in Python

Published: 2021-03-18 03:15:36

㈠ Does Python have a module for the Apriori algorithm?

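Apriori is the pioneering algorithm for frequent-pattern mining in data mining; it was proposed by Agrawal and Srikant in 1994. The idea is simple. First mine the frequent patterns of length 1. Then, starting from k=2, merge the frequent patterns of length k-1 into candidate patterns of length k and count how often each one occurs, requiring that every subset of length k-1 is itself frequent. To avoid generating duplicates, merge two patterns only when their first k-2 items (single characters in the code below) are identical and the (k-1)-th item of one is smaller than that of the other.
Below is a Python implementation of the algorithm: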

__author__ = 'linfuyuan'

# Each item must be a single character, because an itemset is stored as a
# string of its sorted items (for example 'abc').
min_frequency = int(input('please input min_frequency:'))
file_name = input('please input the transaction file:')
transactions = []


def has_infrequent_subset(candidate, Lk):
    # True if any (k-1)-subset of the candidate is missing from Lk.
    for i in range(len(candidate)):
        subset = candidate[:i] + candidate[i + 1:]
        if ''.join(sorted(subset)) not in Lk:
            return True
    return False


def countFrequency(candidate, transactions):
    # Number of transactions that contain every item of the candidate.
    count = 0
    for transaction in transactions:
        if transaction.issuperset(candidate):
            count += 1
    return count


# Read one comma-separated transaction per line, skipping blank lines.
with open(file_name) as f:
    for line in f:
        line = line.strip()
        if line:
            transactions.append(set(line.split(',')))

# Count every single item and keep the frequent ones as L1.
currentFrequencySet = {}
for transaction in transactions:
    for item in transaction:
        currentFrequencySet[item] = currentFrequencySet.get(item, 0) + 1
Lk = set()
for (itemset, count) in currentFrequencySet.items():
    if count >= min_frequency:
        Lk.add(itemset)
print(', '.join(sorted(Lk)))

while len(Lk) > 0:
    newLk = set()
    for itemset1 in Lk:
        for itemset2 in Lk:
            # Join only if the first k-2 items match and the last item of
            # itemset1 sorts before the last item of itemset2.
            if itemset1[:-1] == itemset2[:-1] and itemset1[-1] < itemset2[-1]:
                newitemset = list(itemset1) + [itemset2[-1]]
                if (not has_infrequent_subset(newitemset, Lk)
                        and countFrequency(newitemset, transactions) >= min_frequency):
                    newLk.add(''.join(newitemset))
    if newLk:
        print(', '.join(sorted(newLk)))
    Lk = newLk

㈡ Which Python package implements Apriori?

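As in the join example: the new itemset {I1, I2, I3, I4} does have the frequent subsets {I1, I2, I3} and {I1, I2, I4}, but its other subsets have to be frequent too, namely the remaining two, {I1, I3, I4} and {I2, I3, I4}. So check those as well; if either is not frequent, then by Theorem 1 the new itemset cannot be frequent either.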

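Pruning therefore means checking, for every itemset in Ck, that all of its (k-1)-subsets are frequent (it is enough to see whether they are in L(k-1)). This takes many checks, but it filters out a lot of candidates without scanning the whole database.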
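The pruning check can be sketched in a few lines of Python. This is only an illustration: the function name `all_subsets_frequent` and the contents of `L3` are made up for the {I1, I2, I3, I4} example, not taken from any particular package.

```python
from itertools import combinations


def all_subsets_frequent(candidate, prev_frequent):
    """True if every (k-1)-subset of the candidate is in L(k-1)."""
    k = len(candidate)
    return all(frozenset(sub) in prev_frequent
               for sub in combinations(candidate, k - 1))


# Hypothetical L3 in which {I1, I3, I4} and {I2, I3, I4} are NOT frequent.
L3 = {frozenset(["I1", "I2", "I3"]), frozenset(["I1", "I2", "I4"])}
candidate = frozenset(["I1", "I2", "I3", "I4"])
keep = all_subsets_frequent(candidate, L3)  # the candidate gets pruned

# If the two missing subsets were frequent as well, the candidate survives.
L3_full = L3 | {frozenset(["I1", "I3", "I4"]), frozenset(["I2", "I3", "I4"])}
keep_full = all_subsets_frequent(candidate, L3_full)
```

Only set lookups against L(k-1) are involved, so no transaction is read during pruning.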

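This is the classic Apriori algorithm. The two steps (join and prune) work very well when k is large (3 or more), because the chance that items occur together drops sharply as k grows.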

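At k=2, however, the two steps are close to useless, because 1-itemsets are usually frequent: the database still has to be traversed C(n,2) times, where n is the number of frequent 1-itemsets. (k=1 does not use Apriori at all; it requires one full pass over the database and acts as the step that starts the chain.)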
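One common way to avoid a separate traversal per pair, sketched here as an assumption rather than something this answer prescribes, is to count all pairs in a single pass by enumerating the pairs inside each transaction. The function name `frequent_pairs` and the sample data are illustrative.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, frequent_items, min_count):
    """Count every pair of frequent items in one pass over the transactions."""
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)  # drop infrequent items first
        pair_counts.update(combinations(kept, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
pairs = frequent_pairs(transactions, {"a", "b", "c"}, 2)
```

The cost now depends on the length of each transaction rather than on C(n,2), which helps when transactions are short compared with the number of frequent items.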

㈢ What program is used to implement the Apriori algorithm?

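You probably mean which language. Even so, the question is not quite right: since it is an algorithm, it can be implemented in any language.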

㈣ Can any expert do association-rule mining with the Apriori algorithm in Python?

#include
#include
using namespace std;
class Vector {
private:
    int size, length;
    int * data;
public:
    Vector(int input_size) {
        size = input_size;
        length = 0;
        data = new int[size];
    }
};
int main() {
    int n;
    cin >> n;
    Vector arr(n)...

㈤ Which libraries does Python use for Apriori, and what are the key functions?

First, Apriori is just a mining algorithm; it is not tied to Python or to any particular language.

Logical flow of the Apriori algorithm

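First, find all frequent itemsets, that is, the itemsets whose frequency is at least the predefined minimum support. Then generate strong association rules from the frequent itemsets; these rules must satisfy both the minimum support and the minimum confidence. The desired rules are generated from the frequent itemsets found in the first step: generate every rule that contains only the items of an itemset and has a single item on its right-hand side (this is the rule definition adopted here). Once the rules have been generated, only those whose confidence is greater than the user-specified minimum confidence are kept. To generate all frequent itemsets, a recursive method is used.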

L1 = find_frequent_1-itemsets(D);
for (k = 2; L(k-1) ≠ ∅; k++) {
    Ck = apriori_gen(L(k-1), min_sup);
    for each transaction t ∈ D {    // scan D for counts
        Ct = subset(Ck, t);         // get the subsets of t that are candidates
        for each candidate c ∈ Ct
            c.count++;
    }
    Lk = {c ∈ Ck | c.count ≥ min_sup}
}
return L = ∪k Lk;
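The counting loop in the middle of this pseudocode might look like the following in Python. It is a minimal sketch: the name `count_candidates`, the use of frozensets, and the sample data are assumptions made for illustration.

```python
def count_candidates(candidates, transactions, min_sup):
    """One pass over D: count each candidate, keep those reaching min_sup."""
    counts = {c: 0 for c in candidates}
    for t in transactions:        # scan D for counts
        for c in candidates:      # Ct = the candidates contained in t
            if c <= t:            # subset test on frozensets
                counts[c] += 1
    return {c: n for c, n in counts.items() if n >= min_sup}


D = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bd")]
C2 = {frozenset("ab"), frozenset("ac"), frozenset("bc")}
L2 = count_candidates(C2, D, 2)
```

All candidates of one level are counted in the same pass, so the database is read once per level rather than once per candidate.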

Apriori has two main drawbacks: it can generate a large number of candidate sets, and it may need to scan the database repeatedly.
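The rule-generation step described earlier in this answer (one item on the right-hand side, confidence above a user-given threshold) can be sketched as below. The function name `single_consequent_rules` and the support counts are hypothetical.

```python
def single_consequent_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) with one item on the right.

    `frequent` maps frozenset itemsets to support counts and must contain
    every subset of each itemset, which Apriori guarantees.
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            confidence = count / frequent[antecedent]
            if confidence > min_confidence:
                rules.append((antecedent, item, confidence))
    return rules


# Hypothetical support counts, for illustration only.
frequent = {
    frozenset(["bread"]): 3,
    frozenset(["milk"]): 4,
    frozenset(["bread", "milk"]): 3,
}
rules = single_consequent_rules(frequent, 0.8)
```

Here the confidence of bread -> milk is 3/3 and that of milk -> bread is 3/4, so only the first rule passes the 0.8 threshold.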

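Logically, the libraries you would use are:

string-processing libraries
database-access libraries
set-operation libraries
probability and expectation libraries (e.g. numpy)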

㈥ How do you use the Python apriori package?

The Apriori class itself is listed in full under ㈨ below. It is used as follows:

Apriori(filename, min_support, item_start, item_end)

Parameters

filename: (path and) file name
min_support: minimum support
item_start: start position of the items
item_end: end position of the items

import apriori
c = apriori.Apriori('basket.txt', 11, 3, 13)

㈦ How do you implement the Apriori algorithm?

import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * <B>Association rule mining: the Apriori algorithm</B>
 *
 * <P>Implemented following the basic idea of the Apriori algorithm
 *
 * @author king
 * @since 2013/06/27
 *
 */
public class Apriori {
    private Map<Integer, Set<String>> txDatabase; // transaction database
    private Float minSup; // minimum support
    private Float minConf; // minimum confidence
    private Integer txDatabaseCount; // number of transactions in the database

    private Map<Integer, Set<Set<String>>> freqItemSet; // frequent itemsets
    private Map<Set<String>, Set<Set<String>>> assiciationRules; // frequent association rules

    public Apriori(
            Map<Integer, Set<String>> txDatabase,
            Float minSup,
            Float minConf) {
        this.txDatabase = txDatabase;
        this.minSup = minSup;
        this.minConf = minConf;
        this.txDatabaseCount = this.txDatabase.size();
        freqItemSet = new TreeMap<Integer, Set<Set<String>>>();
        assiciationRules = new HashMap<Set<String>, Set<Set<String>>>();
    }
	
    /**
     * Scan the transaction database and compute the frequent 1-itemsets
     * @return
     */
    public Map<Set<String>, Float> getFreq1ItemSet() {
        Map<Set<String>, Float> freq1ItemSetMap = new HashMap<Set<String>, Float>();
        Map<Set<String>, Integer> candFreq1ItemSet = this.getCandFreq1ItemSet();
        Iterator<Map.Entry<Set<String>, Integer>> it = candFreq1ItemSet.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Set<String>, Integer> entry = it.next();
            // compute the support
            Float supported = new Float(entry.getValue().toString()) / new Float(txDatabaseCount);
            if (supported >= minSup) {
                freq1ItemSetMap.put(entry.getKey(), supported);
            }
        }
        return freq1ItemSetMap;
    }

    /**
     * Compute the candidate frequent 1-itemsets
     * @return
     */
    public Map<Set<String>, Integer> getCandFreq1ItemSet() {
        Map<Set<String>, Integer> candFreq1ItemSetMap = new HashMap<Set<String>, Integer>();
        Iterator<Map.Entry<Integer, Set<String>>> it = txDatabase.entrySet().iterator();
        // count support and build the candidate frequent 1-itemsets
        while (it.hasNext()) {
            Map.Entry<Integer, Set<String>> entry = it.next();
            Set<String> itemSet = entry.getValue();
            for (String item : itemSet) {
                Set<String> key = new HashSet<String>();
                key.add(item.trim());
                if (!candFreq1ItemSetMap.containsKey(key)) {
                    Integer value = 1;
                    candFreq1ItemSetMap.put(key, value);
                }
                else {
                    Integer value = 1 + candFreq1ItemSetMap.get(key);
                    candFreq1ItemSetMap.put(key, value);
                }
            }
        }
        return candFreq1ItemSetMap;
    }
	
    /**
     * Compute the candidate frequent k-itemsets from the frequent (k-1)-itemsets
     *
     * @param m where m = k-1
     * @param freqMItemSet the frequent (k-1)-itemsets
     * @return
     */
    public Set<Set<String>> aprioriGen(int m, Set<Set<String>> freqMItemSet) {
        Set<Set<String>> candFreqKItemSet = new HashSet<Set<String>>();
        Iterator<Set<String>> it = freqMItemSet.iterator();
        Set<String> originalItemSet = null;
        while (it.hasNext()) {
            originalItemSet = it.next();
            Iterator<Set<String>> itr = this.getIterator(originalItemSet, freqMItemSet);
            while (itr.hasNext()) {
                Set<String> identicalSet = new HashSet<String>(); // elements common to both itemsets (set intersection)
                identicalSet.addAll(originalItemSet);
                Set<String> set = itr.next();
                identicalSet.retainAll(set); // what remains in identicalSet is the elements it shares with set
                if (identicalSet.size() == m - 1) { // k-2 items of the two (k-1)-itemsets are identical
                    Set<String> differentSet = new HashSet<String>(); // elements that differ between the two itemsets (set difference)
                    differentSet.addAll(originalItemSet);
                    differentSet.removeAll(set); // k-2 items are shared, so exactly one element remains: differentSet has size 1
                    differentSet.addAll(set); // build one candidate k-itemset (set has size k-1, differentSet now has size k)
                    if (!this.has_infrequent_subset(differentSet, freqMItemSet))
                        candFreqKItemSet.add(differentSet); // add to the candidate k-itemsets
                }
            }
        }
        return candFreqKItemSet;
    }

    /**
     * Prune using prior knowledge: if a candidate k-itemset has a (k-1)-subset
     * that is not a frequent (k-1)-itemset, drop the candidate
     * @param candKItemSet
     * @param freqMItemSet
     * @return
     */
    private boolean has_infrequent_subset(Set<String> candKItemSet, Set<Set<String>> freqMItemSet) {
        Set<String> tempSet = new HashSet<String>();
        tempSet.addAll(candKItemSet);
        Iterator<String> itItem = candKItemSet.iterator();
        while (itItem.hasNext()) {
            String item = itItem.next();
            tempSet.remove(item); // removing one item leaves a (k-1)-itemset
            if (!freqMItemSet.contains(tempSet)) // is this (k-1)-itemset frequent?
                return true;
            tempSet.add(item); // restore
        }
        return false;
    }
	
    /**
     * Given one element (a set) of the frequent k-itemsets, return an iterator
     * over the frequent k-itemsets positioned just after that element
     * @param itemSet
     * @param freqKItemSet the frequent k-itemsets
     * @return
     */
    private Iterator<Set<String>> getIterator(Set<String> itemSet, Set<Set<String>> freqKItemSet) {
        Iterator<Set<String>> it = freqKItemSet.iterator();
        while (it.hasNext()) {
            if (itemSet.equals(it.next())) {
                break;
            }
        }
        return it;
    }

    /**
     * Call aprioriGen on the frequent (k-1)-itemsets to compute the frequent k-itemsets
     *
     * @param k
     * @param freqMItemSet the frequent (k-1)-itemsets
     * @return
     */
    public Map<Set<String>, Float> getFreqKItemSet(int k, Set<Set<String>> freqMItemSet) {
        Map<Set<String>, Integer> candFreqKItemSetMap = new HashMap<Set<String>, Integer>();
        // call aprioriGen to get the candidate frequent k-itemsets
        Set<Set<String>> candFreqKItemSet = this.aprioriGen(k - 1, freqMItemSet);

        // scan the transaction database
        Iterator<Map.Entry<Integer, Set<String>>> it = txDatabase.entrySet().iterator();
        // count support
        while (it.hasNext()) {
            Map.Entry<Integer, Set<String>> entry = it.next();
            Iterator<Set<String>> kit = candFreqKItemSet.iterator();
            while (kit.hasNext()) {
                Set<String> kSet = kit.next();
                Set<String> set = new HashSet<String>();
                set.addAll(kSet);
                set.removeAll(entry.getValue()); // set difference between the candidate k-itemset and the transaction
                if (set.isEmpty()) { // the copy is empty, so the transaction contains the candidate: add 1 to its support count
                    if (candFreqKItemSetMap.get(kSet) == null) {
                        Integer value = 1;
                        candFreqKItemSetMap.put(kSet, value);
                    }
                    else {
                        Integer value = 1 + candFreqKItemSetMap.get(kSet);
                        candFreqKItemSetMap.put(kSet, value);
                    }
                }
            }
        }

㈧ What are the concrete steps of the Apriori algorithm in data mining?

Algorithm: Apriori
Input: D - transaction database; min_sup - minimum support count threshold
Output: L - the frequent itemsets in D
Method:
L1 = find_frequent_1-itemsets(D); // find all frequent 1-itemsets
for (k = 2; L(k-1) != null; k++) {
    Ck = apriori_gen(L(k-1)); // generate candidates and prune
    for each transaction t in D { // scan D to count the candidates
        Ct = subset(Ck, t); // get the subsets of t that are candidates
        for each candidate c in Ct
            c.count++;
    }
    Lk = {c in Ck | c.count >= min_sup}
}
return L = all frequent itemsets;

Procedure apriori_gen(L(k-1): frequent (k-1)-itemsets)
for each itemset l1 in L(k-1)
    for each itemset l2 in L(k-1)
        if ((l1[1] = l2[1]) && (l1[2] = l2[2]) && ...
                && (l1[k-2] = l2[k-2]) && (l1[k-1] < l2[k-1])) then {
            c = l1 join l2 // join step: generate a candidate
            if has_infrequent_subset(c, L(k-1)) then
                delete c; // prune step: remove the infrequent candidate
            else add c to Ck;
        }
return Ck;

Procedure has_infrequent_subset(c: candidate k-itemset; L(k-1): frequent (k-1)-itemsets)
for each (k-1)-subset s of c
    if s not in L(k-1) then
        return true;
return false;
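For readers who prefer code to pseudocode, the two procedures translate fairly directly into Python. This is a sketch under one assumption that the pseudocode leaves implicit: every itemset is kept as a sorted tuple, so that "the first k-2 items" is well defined. The sample `L2` is made up.

```python
from itertools import combinations


def has_infrequent_subset(c, prev):
    """True if some (k-1)-subset of candidate c is not in L(k-1)."""
    return any(s not in prev for s in combinations(c, len(c) - 1))


def apriori_gen(prev):
    """Build C(k) from L(k-1); itemsets are sorted tuples."""
    candidates = set()
    for l1 in prev:
        for l2 in prev:
            # Join step: equal on the first k-2 items, l1's last < l2's last.
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                c = l1 + (l2[-1],)
                # Prune step: drop c if any (k-1)-subset is infrequent.
                if not has_infrequent_subset(c, prev):
                    candidates.add(c)
    return candidates


L2 = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")}
C3 = apriori_gen(L2)
```

With this `L2`, the join produces (a, b, c) and (b, c, d); the second is pruned because (c, d) is not in `L2`.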

㈨ How is the Apriori algorithm implemented in Python code?

class Apriori(object):

    def __init__(self, filename, min_support, item_start, item_end):
        self.filename = filename
        self.min_support = min_support  # minimum support (percent)
        self.min_confidence = 50
        self.line_num = 0  # number of item rows
        self.item_start = item_start  # where the items start
        self.item_end = item_end
        self.location = [[i] for i in range(self.item_end - self.item_start + 1)]
        self.support = self.sut(self.location)
        self.num = list(sorted(set([j for i in self.location for j in i])))  # item indices
        self.pre_support = []  # previous support, location, num
        self.pre_location = []
        self.pre_num = []
        self.item_name = []  # item names
        self.find_item_name()
        self.loop()
        self.confidence_sup()

    def deal_line(self, line):
        "Extract the items that are needed"
        return [i.strip() for i in line.split(' ') if i][self.item_start - 1:self.item_end]

    def find_item_name(self):
        "Take item_name from the first line"
        with open(self.filename, 'r') as F:
            for index, line in enumerate(F.readlines()):
                if index == 0:
                    self.item_name = self.deal_line(line)
                    break

    def sut(self, location):
        """
        Input: [[1,2,3],[2,3,4],[1,3,5]...]
        Output: the support of each location set [123,435,234...]
        """
        with open(self.filename, 'r') as F:
            support = [0] * len(location)
            for index, line in enumerate(F.readlines()):
                if index == 0: continue
                # extract the items of this row
                item_line = self.deal_line(line)
                for index_num, i in enumerate(location):
                    flag = 0
                    for j in i:
                        if item_line[j] != 'T':
                            flag = 1
                            break
                    if not flag:
                        support[index_num] += 1
            self.line_num = index  # total number of rows, excluding the first row (item_name)
        return support

    def select(self, c):
        "Return the locations"
        stack = []
        for i in self.location:
            for j in self.num:
                if j in i:
                    if len(i) == c:
                        stack.append(i)
                else:
                    stack.append([j] + i)
        # remove duplicates from the nested list
        import itertools
        s = sorted([sorted(i) for i in stack])
        location = list(s for s, _ in itertools.groupby(s))
        return location

    def del_location(self, support, location):
        "Remove candidates that do not meet the conditions"
        # drop those below the minimum support
        for index, i in enumerate(support):
            if i < self.line_num * self.min_support / 100:
                support[index] = 0
        # second Apriori rule: prune
        for index, j in enumerate(location):
            sub_location = [j[:index_loc] + j[index_loc + 1:] for index_loc in range(len(j))]
            flag = 0
            for k in sub_location:
                if k not in self.location:
                    flag = 1
                    break
            if flag:
                support[index] = 0
        # delete the locations that are no longer needed
        location = [i for i, j in zip(location, support) if j != 0]
        support = [i for i in support if i != 0]
        return support, location

    def loop(self):
        "Iterate over the level-s frequent itemsets"
        s = 2
        while True:
            print('-' * 80)
            print('The', s - 1, 'loop')
            print('location', self.location)
            print('support', self.support)
            print('num', self.num)
            print('-' * 80)
            # generate the next level of candidates
            location = self.select(s)
            support = self.sut(location)
            support, location = self.del_location(support, location)
            num = list(sorted(set([j for i in location for j in i])))
            s += 1
            if location and support and num:
                self.pre_num = self.num
                self.pre_location = self.location
                self.pre_support = self.support
                self.num = num
                self.location = location
                self.support = support
            else:
                break

    def confidence_sup(self):
        "Compute the confidence"
        if sum(self.pre_support) == 0:
            print('min_support error')  # failed on the very first iteration
        else:
            for index_location, each_location in enumerate(self.location):
                del_num = [each_location[:index] + each_location[index + 1:] for index in range(len(each_location))]  # build the previous-level frequent itemsets
                del_num = [i for i in del_num if i in self.pre_location]  # drop subsets absent from the previous level
                del_support = [self.pre_support[self.pre_location.index(i)] for i in del_num if i in self.pre_location]  # look up the previous-level support
                # print(del_num)
                # print(self.support[index_location])
                # print(del_support)
                for index, i in enumerate(del_num):  # support and confidence of each association rule
                    index_support = 0
                    if len(self.support) != 1:
                        index_support = index
                    support = float(self.support[index_location]) / self.line_num * 100  # support
                    s = [j for index_item, j in enumerate(self.item_name) if index_item in i]
                    if del_support[index]:
                        confidence = float(self.support[index_location]) / del_support[index] * 100
                        if confidence > self.min_confidence:
                            print(','.join(s), '->>', self.item_name[each_location[index]], 'min_support:', str(support) + '%', 'min_confidence:', str(confidence) + '%')


def main():
    c = Apriori('basket.txt', 14, 3, 13)
    d = Apriori('simple.txt', 50, 2, 6)


if __name__ == '__main__':
    main()

Apriori(filename, min_support, item_start, item_end)

Parameters

filename: (path and) file name
min_support: minimum support (as a percentage)
item_start: start position of the items
item_end: end position of the items

import apriori
c = apriori.Apriori('basket.txt', 11, 3, 13)
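The answer does not show what basket.txt looks like. Judging only from how the class reads its input (a first row of names, then rows where 'T' marks a present item, with item columns counted from 1), the layout is probably close to the sketch below. The file contents and column names here are invented for illustration, and the snippet counts per-item support on its own rather than calling the class.

```python
import os
import tempfile

# Hypothetical input in the layout the class appears to expect: a header row
# of names, then one row per transaction with 'T' marking a present item.
sample = """id milk bread beer
1 T T F
2 T F T
3 T T T
"""

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write(sample)

item_start, item_end = 2, 4  # item columns, counted from 1
with open(path) as f:
    rows = [line.split()[item_start - 1:item_end] for line in f if line.strip()]

names, data = rows[0], rows[1:]
# Support count of each single item: number of rows with 'T' in its column.
support = {name: sum(row[i] == "T" for row in data)
           for i, name in enumerate(names)}
```

With a file like this, the matching call would be Apriori(path, min_support, 2, 4), where min_support is the percentage threshold.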
