
Apriori algorithm: Python implementation

Published: 2021-03-18 03:15:36

㈠ Is there a Python module for the Apriori algorithm?

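Apriori is the forerunner of frequent-pattern mining in data mining. It was proposed by Agrawal and Srikant in 1994, and its idea is simple.

First, mine the frequent patterns of length 1. Then, for k = 2, 3, ..., merge frequent patterns of length k-1 into candidates of length k and count how often each candidate occurs. A candidate is kept only if all of its (k-1)-length subsets are also frequent.

To avoid generating the same candidate twice, two patterns are merged only when their first k-2 items are identical and the (k-1)-th item of one is smaller than that of the other.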
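The merge ("join") rule just described can be sketched in a few lines. This is a minimal illustration, assuming each frequent (k-1)-itemset is stored as a sorted tuple; the function name `join_step` is hypothetical and not taken from the post.

```python
def join_step(prev_frequent):
    """Join step: merge two sorted (k-1)-tuples that share their first k-2
    items, where the last item of the first is smaller than that of the second."""
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            # same prefix, and a[-1] < b[-1] so each pair is merged only once
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidates.add(a + (b[-1],))
    return candidates


L2 = {("a", "b"), ("a", "c"), ("b", "c")}
print(sorted(join_step(L2)))  # [('a', 'b', 'c')]
```

Because the tuples are sorted, the strict `<` test both rules out merging an itemset with itself and keeps the mirrored pair (b, a) from producing a duplicate.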
Below is a Python implementation of the algorithm:

```python
__author__ = 'linfuyuan'
# Python 3. Itemsets are stored as sorted tuples, so item names may be
# longer than one character.
min_frequency = int(input('please input min_frequency:'))
file_name = input('please input the transaction file:')
transactions = []


def all_subsets_frequent(candidate, Lk):
    """Return True if every (k-1)-subset of the sorted tuple candidate is in Lk."""
    for i in range(len(candidate)):
        subset = candidate[:i] + candidate[i + 1:]  # drop the i-th item
        if subset not in Lk:
            return False
    return True


def count_frequency(candidate, transactions):
    """Number of transactions that contain every item of candidate."""
    return sum(1 for transaction in transactions if transaction.issuperset(candidate))


# One comma-separated transaction per line; blank lines are skipped
with open(file_name) as f:
    for line in f:
        tokens = [token.strip() for token in line.split(',') if token.strip()]
        if tokens:
            transactions.append(set(tokens))

# Frequent 1-itemsets, stored as 1-tuples
currentFrequencySet = {}
for transaction in transactions:
    for item in transaction:
        currentFrequencySet[item] = currentFrequencySet.get(item, 0) + 1
Lk = {(item,) for item, count in currentFrequencySet.items() if count >= min_frequency}
print(', '.join(' '.join(itemset) for itemset in sorted(Lk)))

while Lk:
    newLk = set()
    for itemset1 in Lk:
        for itemset2 in Lk:
            # Join only if the first k-2 items match and the last item of
            # itemset1 is smaller, so each candidate is produced once
            if itemset1[:-1] == itemset2[:-1] and itemset1[-1] < itemset2[-1]:
                newitemset = itemset1 + (itemset2[-1],)
                # Prune on the (k-1)-subsets first, then count support
                if all_subsets_frequent(newitemset, Lk) and \
                        count_frequency(newitemset, transactions) >= min_frequency:
                    newLk.add(newitemset)
    if newLk:
        print(', '.join(' '.join(itemset) for itemset in sorted(newLk)))
    Lk = newLk
```

㈡ Which Python package implements Apriori?

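Take the linked example. The new itemset {I1, I2, I3, I4} has two frequent subsets, {I1, I2, I3} and {I1, I2, I4}, but that is not enough: its other subsets must be frequent too. Here that means the remaining two, {I1, I3, I4} and {I2, I3, I4}.

So verify them as well. If either is not frequent, then by Theorem 1 the new itemset cannot be frequent either. Theorem 1 states that every subset of a frequent itemset is itself frequent.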

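Pruning is therefore the check that every (k-1)-subset of every itemset in Ck is frequent. It is enough to look each subset up in L(k-1). This takes many lookups, but it filters out many candidates without scanning the whole database.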
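The pruning check can be sketched with the {I1, I2, I3, I4} example above. This is a minimal sketch, assuming itemsets are sorted tuples. The name `has_infrequent_subset` follows the pseudocode used elsewhere on this page, and the `L3` below is a hypothetical level in which {I2, I3, I4} is missing.

```python
from itertools import combinations


def has_infrequent_subset(candidate, prev_frequent):
    """Return True if some (k-1)-subset of candidate is not in prev_frequent."""
    k = len(candidate)
    # combinations() of a sorted tuple yields sorted (k-1)-tuples
    return any(sub not in prev_frequent for sub in combinations(candidate, k - 1))


L3 = {("I1", "I2", "I3"), ("I1", "I2", "I4"), ("I1", "I3", "I4")}
candidate = ("I1", "I2", "I3", "I4")
print(has_infrequent_subset(candidate, L3))  # True: ("I2", "I3", "I4") is missing
```

Only set lookups in L(k-1) are involved; the transaction database is never touched.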

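The method above is the classic Apriori algorithm. Its two steps work very well when k is larger (3 or more), because the chance that k items appear together drops sharply as k grows.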

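When k = 2, however, the two steps are of little use. (k = 1 does not involve Apriori at all: one full pass over the database is unavoidable, much like the initiation step of a chain reaction.)

Every candidate pair is joined from two frequent 1-itemsets, so both of its 1-subsets are frequent by construction and pruning removes nothing. All C(n, 2) candidate pairs must still be counted against the database, where n is the number of frequent 1-itemsets.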
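The k = 2 case can be checked directly. In the sketch below, with a hypothetical L1 of five items, the join produces all C(5, 2) pairs and the prune step removes none of them.

```python
from itertools import combinations
from math import comb

L1 = [("I1",), ("I2",), ("I3",), ("I4",), ("I5",)]  # hypothetical frequent 1-itemsets
items = sorted(i[0] for i in L1)

# Join step for k = 2: every pair of frequent items becomes a candidate
C2 = list(combinations(items, 2))

# Prune step: a pair survives unless one of its 1-subsets is missing from L1
pruned = [c for c in C2 if any((x,) not in L1 for x in c)]

print(len(C2), comb(len(items), 2), len(pruned))  # 10 10 0
```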

㈢ What program should be used to implement Apriori?

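You probably mean which programming language. Even so, the question is not quite right: Apriori is an algorithm, so it can be implemented in any language.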

㈣ Does anyone know how to do Apriori association-rule mining in Python?

```cpp
#include
#include
using namespace std;

class Vector {
private:
    int size, length;
    int *data;
public:
    Vector(int input_size) {
        size = input_size;
        length = 0;
        data = new int[size];
    }
};

int main() {
    int n;
    cin >> n;
    Vector arr(n)...
```

㈤ Which libraries does a Python Apriori implementation use, and what are its key functions?

First, Apriori is simply a mining algorithm; it is not tied to Python or to any other particular language.

Logical flow of the Apriori algorithm

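First, find all frequent itemsets: itemsets that occur at least as often as the predefined minimum support.

Then generate strong association rules from the frequent itemsets; these rules must satisfy both the minimum support and the minimum confidence. To do so, each frequent itemset found in the first step is used to produce every rule that contains only items of that itemset, with exactly one item on the right-hand side. Once these rules are generated, only those whose confidence is above the user-specified minimum confidence are kept.

To generate all frequent itemsets, an iterative, level-wise method is used: each level of itemsets is built from the previous one.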
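The rule-generation phase just described (one item on the right-hand side, filtered by minimum confidence) could look like the following. This is a minimal sketch, assuming the support counts of all frequent itemsets are already in a dict keyed by `frozenset`. The function name and the counts below are hypothetical. Confidence is computed as support(X ∪ {y}) / support(X), and rules with confidence ≥ `min_conf` are kept, the common convention.

```python
def single_consequent_rules(support_counts, min_conf):
    """Generate rules X -> y with a single item y on the right-hand side.

    support_counts maps frozenset itemsets to support counts and must also
    contain every subset of each itemset (true for Apriori output).
    """
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue
        for y in itemset:
            antecedent = itemset - {y}
            confidence = count / support_counts[antecedent]
            if confidence >= min_conf:
                rules.append((antecedent, y, confidence))
    return rules


counts = {
    frozenset({"bread"}): 4,
    frozenset({"milk"}): 5,
    frozenset({"bread", "milk"}): 3,
}
for lhs, rhs, conf in single_consequent_rules(counts, 0.6):
    print(sorted(lhs), "->", rhs, round(conf, 2))
```

Both rules pass at 0.6: {milk} -> bread has confidence 3/5 = 0.6, and {bread} -> milk has 3/4 = 0.75. The print order may vary, because it follows set iteration order.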

```
L1 = find_frequent_1-itemsets(D);
for (k = 2; Lk-1 ≠ ∅; k++) {
    Ck = apriori_gen(Lk-1, min_sup);
    for each transaction t ∈ D {      // scan D for counts
        Ct = subset(Ck, t);           // get the subsets of t that are candidates
        for each candidate c ∈ Ct
            c.count++;
    }
    Lk = {c ∈ Ck | c.count ≥ min_sup}
}
return L = ∪k Lk;
```
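The pseudocode above translates fairly directly into Python. This is a minimal sketch, not the author's code. It assumes transactions are collections of orderable items (e.g. strings) and that `min_sup` is an absolute support count. The `apriori_gen` step is inlined, and `Ct = subset(Ck, t)` is realized by testing which candidates each transaction contains. The nine transactions are a small hypothetical example.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets."""
    transactions = [set(t) for t in transactions]
    # L1: one scan of D to count every single item
    item_counts = Counter(item for t in transactions for item in t)
    Lk = {(item,): n for item, n in item_counts.items() if n >= min_sup}
    frequent = dict(Lk)
    k = 2
    while Lk:
        prev = set(Lk)
        # apriori_gen: join itemsets sharing their first k-2 items, then prune
        Ck = set()
        for l1 in prev:
            for l2 in prev:
                if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                    c = l1 + (l2[-1],)
                    if all(s in prev for s in combinations(c, k - 1)):
                        Ck.add(c)
        # One scan of D: Ct = the candidates contained in transaction t
        counts = Counter()
        for t in transactions:
            for c in Ck:
                if t.issuperset(c):
                    counts[c] += 1
        Lk = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(Lk)
        k += 1
    return frequent


D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
result = apriori(D, min_sup=2)
print(sorted(c for c in result if len(c) == 3))
# [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```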

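Apriori has two main drawbacks. It may generate a very large number of candidate itemsets, and it may need to scan the database repeatedly, once per level k.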

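Judging from the algorithm's logic, the libraries involved would be:

string handling
database access
set operations
probability/expectation computation (e.g. numpy)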
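For plain counting, Python's standard library already covers these needs. This is a minimal sketch, assuming a comma-separated file with one transaction per line (like the one read by the first implementation on this page). `io.StringIO` stands in for a real file, and its contents are hypothetical.

```python
import csv
import io
from collections import Counter

# Stand-in for open("transactions.txt"); the data is hypothetical
data = io.StringIO("bread,milk\nbread,beer,eggs\nmilk,beer\nbread,milk,beer\n")

# String handling: csv splits each line; frozenset gives set semantics
transactions = [frozenset(f.strip() for f in row if f.strip()) for row in csv.reader(data)]

# Counting: support count of every single item
item_counts = Counter(item for t in transactions for item in t)

# Set operations: does a transaction contain an itemset?
pair = frozenset({"bread", "milk"})
pair_support = sum(1 for t in transactions if pair <= t)

print(item_counts["bread"], pair_support)  # 3 2
```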

㈥ How do I use the Python apriori package?

This answer posts the same `Apriori` class that is listed in full under ㈨. It is used by saving the class as `apriori.py` and calling the constructor `Apriori(filename, min_support, item_start, item_end)`, which runs the whole mining process and prints the results. For example, after `import apriori`, run `c = apriori.Apriori('basket.txt', 11, 3, 13)`. The parameters are explained under ㈨.
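The expected input layout can be inferred from `deal_line` and `sut` in the class under ㈨; the author does not document it. The file is whitespace-separated. Its first line holds column names, and each following line is one transaction with `T` in the columns of the items it contains (any other value counts as absent).

The hypothetical helper below writes such a file from ordinary transaction lists. It puts an ID in column 1, so the items sit in columns 2 to n+1 (`item_start=2`, `item_end=n+1`). Item names must not contain spaces.

```python
def to_tf_table(item_names, transactions):
    """Build a T/F text table (hypothetical helper) for the Apriori class.

    Column 1 is a transaction id; columns 2..len(item_names)+1 hold 'T' or 'F'.
    """
    lines = ["id " + " ".join(item_names)]
    for tid, items in enumerate(transactions, start=1):
        flags = ["T" if name in items else "F" for name in item_names]
        lines.append(str(tid) + " " + " ".join(flags))
    return "\n".join(lines) + "\n"


names = ["bread", "milk", "beer"]
text = to_tf_table(names, [{"bread", "milk"}, {"beer"}, {"bread", "beer"}])
print(text)
# Save text as simple.txt, then e.g. Apriori('simple.txt', 50, 2, len(names) + 1)
```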

㈦ How to implement the Apriori algorithm

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * <B>Association rule mining: the Apriori algorithm</B>
 *
 * <P>Implemented following the basic idea of the Apriori algorithm.
 *
 * @author king
 * @since 2013/06/27
 */
public class Apriori {
    private Map<Integer, Set<String>> txDatabase; // transaction database
    private Float minSup; // minimum support
    private Float minConf; // minimum confidence
    private Integer txDatabaseCount; // number of transactions in the database

    private Map<Integer, Set<Set<String>>> freqItemSet; // frequent itemsets
    private Map<Set<String>, Set<Set<String>>> associationRules; // frequent association rules

    public Apriori(
            Map<Integer, Set<String>> txDatabase,
            Float minSup,
            Float minConf) {
        this.txDatabase = txDatabase;
        this.minSup = minSup;
        this.minConf = minConf;
        this.txDatabaseCount = this.txDatabase.size();
        freqItemSet = new TreeMap<Integer, Set<Set<String>>>();
        associationRules = new HashMap<Set<String>, Set<Set<String>>>();
    }

    /**
     * Scan the transaction database and compute the frequent 1-itemsets.
     * @return map from each frequent 1-itemset to its support
     */
    public Map<Set<String>, Float> getFreq1ItemSet() {
        Map<Set<String>, Float> freq1ItemSetMap = new HashMap<Set<String>, Float>();
        Map<Set<String>, Integer> candFreq1ItemSet = this.getCandFreq1ItemSet();
        Iterator<Map.Entry<Set<String>, Integer>> it = candFreq1ItemSet.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Set<String>, Integer> entry = it.next();
            // compute the support
            float supported = entry.getValue().floatValue() / txDatabaseCount;
            if (supported >= minSup) {
                freq1ItemSetMap.put(entry.getKey(), supported);
            }
        }
        return freq1ItemSetMap;
    }

    /**
     * Compute the candidate 1-itemsets.
     * @return map from each candidate 1-itemset to its support count
     */
    public Map<Set<String>, Integer> getCandFreq1ItemSet() {
        Map<Set<String>, Integer> candFreq1ItemSetMap = new HashMap<Set<String>, Integer>();
        Iterator<Map.Entry<Integer, Set<String>>> it = txDatabase.entrySet().iterator();
        // count support and build the candidate 1-itemsets
        while (it.hasNext()) {
            Map.Entry<Integer, Set<String>> entry = it.next();
            Set<String> itemSet = entry.getValue();
            for (String item : itemSet) {
                Set<String> key = new HashSet<String>();
                key.add(item.trim());
                if (!candFreq1ItemSetMap.containsKey(key)) {
                    Integer value = 1;
                    candFreq1ItemSetMap.put(key, value);
                } else {
                    Integer value = 1 + candFreq1ItemSetMap.get(key);
                    candFreq1ItemSetMap.put(key, value);
                }
            }
        }
        return candFreq1ItemSetMap;
    }

    /**
     * Compute the candidate k-itemsets from the frequent (k-1)-itemsets.
     *
     * @param m m = k-1
     * @param freqMItemSet frequent (k-1)-itemsets
     * @return candidate k-itemsets
     */
    public Set<Set<String>> aprioriGen(int m, Set<Set<String>> freqMItemSet) {
        Set<Set<String>> candFreqKItemSet = new HashSet<Set<String>>();
        Iterator<Set<String>> it = freqMItemSet.iterator();
        Set<String> originalItemSet = null;
        while (it.hasNext()) {
            originalItemSet = it.next();
            Iterator<Set<String>> itr = this.getIterator(originalItemSet, freqMItemSet);
            while (itr.hasNext()) {
                Set<String> identicalSet = new HashSet<String>(); // items shared by the two itemsets (set intersection)
                identicalSet.addAll(originalItemSet);
                Set<String> set = itr.next();
                identicalSet.retainAll(set); // identicalSet now holds only the items common to both itemsets
                if (identicalSet.size() == m - 1) { // the two (k-1)-itemsets share k-2 items
                    Set<String> differentSet = new HashSet<String>(); // items of originalItemSet not in set (set difference)
                    differentSet.addAll(originalItemSet);
                    differentSet.removeAll(set); // k-2 items are shared, so exactly one item remains
                    differentSet.addAll(set); // build a candidate k-itemset (set has size k-1, differentSet now has size k)
                    if (!this.has_infrequent_subset(differentSet, freqMItemSet))
                        candFreqKItemSet.add(differentSet); // add to the candidate k-itemsets
                }
            }
        }
        return candFreqKItemSet;
    }

    /**
     * Prune using prior knowledge: if some (k-1)-subset of a candidate k-itemset
     * is not a frequent (k-1)-itemset, the candidate is removed.
     * @param candKItemSet candidate k-itemset
     * @param freqMItemSet frequent (k-1)-itemsets
     * @return true if the candidate has an infrequent (k-1)-subset
     */
    private boolean has_infrequent_subset(Set<String> candKItemSet, Set<Set<String>> freqMItemSet) {
        Set<String> tempSet = new HashSet<String>();
        tempSet.addAll(candKItemSet);
        Iterator<String> itItem = candKItemSet.iterator();
        while (itItem.hasNext()) {
            String item = itItem.next();
            tempSet.remove(item); // removing one item leaves a (k-1)-itemset
            if (!freqMItemSet.contains(tempSet)) // is that (k-1)-itemset frequent?
                return true;
            tempSet.add(item); // restore
        }
        return false;
    }

    /**
     * Given one itemset in the frequent k-itemsets, return an iterator over the
     * frequent k-itemsets positioned just after that itemset.
     * @param itemSet an element of freqKItemSet
     * @param freqKItemSet frequent k-itemsets
     * @return the positioned iterator
     */
    private Iterator<Set<String>> getIterator(Set<String> itemSet, Set<Set<String>> freqKItemSet) {
        Iterator<Set<String>> it = freqKItemSet.iterator();
        while (it.hasNext()) {
            if (itemSet.equals(it.next())) {
                break;
            }
        }
        return it;
    }

    /**
     * Compute the frequent k-itemsets from the frequent (k-1)-itemsets by calling aprioriGen.
     *
     * @param k k
     * @param freqMItemSet frequent (k-1)-itemsets
     * @return map from each frequent k-itemset to its support
     */
    public Map<Set<String>, Float> getFreqKItemSet(int k, Set<Set<String>> freqMItemSet) {
        Map<Set<String>, Integer> candFreqKItemSetMap = new HashMap<Set<String>, Integer>();
        // call aprioriGen to get the candidate k-itemsets
        Set<Set<String>> candFreqKItemSet = this.aprioriGen(k - 1, freqMItemSet);

        // scan the transaction database
        Iterator<Map.Entry<Integer, Set<String>>> it = txDatabase.entrySet().iterator();
        // count support
        while (it.hasNext()) {
            Map.Entry<Integer, Set<String>> entry = it.next();
            Iterator<Set<String>> kit = candFreqKItemSet.iterator();
            while (kit.hasNext()) {
                Set<String> kSet = kit.next();
                Set<String> set = new HashSet<String>();
                set.addAll(kSet);
                set.removeAll(entry.getValue()); // candidate minus transaction (set difference)
                if (set.isEmpty()) { // empty difference: the transaction contains the candidate, so count it
                    if (candFreqKItemSetMap.get(kSet) == null) {
                        Integer value = 1;
                        candFreqKItemSetMap.put(kSet, value);
                    } else {
                        Integer value = 1 + candFreqKItemSetMap.get(kSet);
                        candFreqKItemSetMap.put(kSet, value);
                    }
                }
            }
        }

        // compute support and keep the candidates that reach minSup (as in getFreq1ItemSet)
        Map<Set<String>, Float> freqKItemSetMap = new HashMap<Set<String>, Float>();
        for (Map.Entry<Set<String>, Integer> entry : candFreqKItemSetMap.entrySet()) {
            float supported = entry.getValue().floatValue() / txDatabaseCount;
            if (supported >= minSup) {
                freqKItemSetMap.put(entry.getKey(), supported);
            }
        }
        return freqKItemSetMap;
    }
}
```

㈧ What are the concrete steps of the Apriori algorithm in data mining?

```
Algorithm: Apriori
Input:  D, a transaction database; min_sup, the minimum support count threshold
Output: L, the frequent itemsets in D
Method:
L1 = find_frequent_1-itemsets(D);        // find all frequent 1-itemsets
for (k = 2; Lk-1 != ∅; k++) {
    Ck = apriori_gen(Lk-1);              // generate candidates and prune
    for each transaction t in D {        // scan D to count candidates
        Ct = subset(Ck, t);              // the candidates contained in t
        for each candidate c in Ct
            c.count++;
    }
    Lk = {c in Ck | c.count >= min_sup}
}
return L = all frequent itemsets (the union of every Lk);

procedure apriori_gen(Lk-1: frequent (k-1)-itemsets)
for each itemset l1 in Lk-1
    for each itemset l2 in Lk-1
        if (l1[1] = l2[1]) && (l1[2] = l2[2]) && ...
           && (l1[k-2] = l2[k-2]) && (l1[k-1] < l2[k-1]) then {
            c = l1 join l2;              // join step: generate a candidate
            if has_infrequent_subset(c, Lk-1) then
                delete c;                // prune step: remove an infrequent candidate
            else add c to Ck;
        }
return Ck;

procedure has_infrequent_subset(c: candidate k-itemset; Lk-1: frequent (k-1)-itemsets)
for each (k-1)-subset s of c
    if s is not in Lk-1 then
        return true;
return false;
```
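The two procedures above map almost line for line onto Python. This is a minimal sketch, assuming each itemset is a sorted tuple so that `l1[k-1] < l2[k-1]` becomes a plain comparison of last elements. The `L2` below is a hypothetical set of frequent 2-itemsets.

```python
from itertools import combinations


def has_infrequent_subset(c, prev_frequent):
    """True if some (k-1)-subset of candidate c is not a frequent (k-1)-itemset."""
    return any(s not in prev_frequent for s in combinations(c, len(c) - 1))


def apriori_gen(prev_frequent):
    """Generate the candidate k-itemsets Ck from the frequent (k-1)-itemsets."""
    Ck = set()
    for l1 in prev_frequent:
        for l2 in prev_frequent:
            # Join step: first k-2 items equal, last item of l1 < last item of l2
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                c = l1 + (l2[-1],)
                # Prune step: drop c if any (k-1)-subset is infrequent
                if not has_infrequent_subset(c, prev_frequent):
                    Ck.add(c)
    return Ck


L2 = {("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")}
print(sorted(apriori_gen(L2)))  # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```

Six candidates come out of the join step, but four of them, such as ('I1', 'I3', 'I5'), contain a pair missing from `L2`, so the prune step leaves two.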

㈨ How do I implement the Apriori algorithm in Python?

```python
import itertools


class Apriori(object):
    def __init__(self, filename, min_support, item_start, item_end):
        self.filename = filename
        self.min_support = min_support  # minimum support, in percent
        self.min_confidence = 50        # minimum confidence, in percent
        self.line_num = 0               # number of data lines (transactions)
        self.item_start = item_start    # column of the first item
        self.item_end = item_end        # column of the last item
        self.location = [[i] for i in range(self.item_end - self.item_start + 1)]
        self.support = self.sut(self.location)
        self.num = list(sorted(set([j for i in self.location for j in i])))  # item indices in use
        self.pre_support = []  # support, location and num of the previous level
        self.pre_location = []
        self.pre_num = []
        self.item_name = []    # item names
        self.find_item_name()
        self.loop()
        self.confidence_sup()

    def deal_line(self, line):
        "Extract the item columns from one line"
        return [i.strip() for i in line.split() if i][self.item_start - 1:self.item_end]

    def find_item_name(self):
        "Take the item names from the first line"
        with open(self.filename, 'r') as F:
            for index, line in enumerate(F.readlines()):
                if index == 0:
                    self.item_name = self.deal_line(line)
                    break

    def sut(self, location):
        """
        Input:  [[1,2,3],[2,3,4],[1,3,5]...]
        Output: the support count of each index set, e.g. [123,435,234...]
        """
        with open(self.filename, 'r') as F:
            support = [0] * len(location)
            for index, line in enumerate(F.readlines()):
                if index == 0:
                    continue
                # extract the item columns of this line
                item_line = self.deal_line(line)
                for index_num, i in enumerate(location):
                    flag = 0
                    for j in i:
                        if item_line[j] != 'T':
                            flag = 1
                            break
                    if not flag:
                        support[index_num] += 1
            self.line_num = index  # number of lines, excluding the item_name header
        return support

    def select(self, c):
        "Return the candidate index sets of size c"
        stack = []
        for i in self.location:
            for j in self.num:
                if j in i:
                    if len(i) == c:
                        stack.append(i)
                else:
                    stack.append([j] + i)
        # remove duplicates from the list of lists
        s = sorted([sorted(i) for i in stack])
        location = list(key for key, _ in itertools.groupby(s))
        return location

    def del_location(self, support, location):
        "Remove candidates that fail the conditions"
        # drop candidates below the minimum support
        for index, i in enumerate(support):
            if i < self.line_num * self.min_support / 100:
                support[index] = 0
        # second Apriori rule: drop candidates with an infrequent (k-1)-subset
        for index, j in enumerate(location):
            sub_location = [j[:index_loc] + j[index_loc + 1:] for index_loc in range(len(j))]
            flag = 0
            for k in sub_location:
                if k not in self.location:
                    flag = 1
                    break
            if flag:
                support[index] = 0
        # delete the positions that are no longer needed
        location = [i for i, j in zip(location, support) if j != 0]
        support = [i for i in support if i != 0]
        return support, location

    def loop(self):
        "Iterate over the levels of frequent itemsets"
        s = 2
        while True:
            print('-' * 80)
            print('The', s - 1, 'loop')
            print('location', self.location)
            print('support', self.support)
            print('num', self.num)
            print('-' * 80)
            # generate the next level of candidates
            location = self.select(s)
            support = self.sut(location)
            support, location = self.del_location(support, location)
            num = list(sorted(set([j for i in location for j in i])))
            s += 1
            if location and support and num:
                self.pre_num = self.num
                self.pre_location = self.location
                self.pre_support = self.support
                self.num = num
                self.location = location
                self.support = support
            else:
                break

    def confidence_sup(self):
        "Compute confidence"
        if sum(self.pre_support) == 0:
            print('min_support error')  # the very first iteration already failed
        else:
            for index_location, each_location in enumerate(self.location):
                # the (k-1)-subsets, i.e. itemsets of the previous level
                del_num = [each_location[:index] + each_location[index + 1:]
                           for index in range(len(each_location))]
                # keep only subsets that are frequent itemsets of the previous level
                del_num = [i for i in del_num if i in self.pre_location]
                # look up their support in the previous level
                del_support = [self.pre_support[self.pre_location.index(i)]
                               for i in del_num if i in self.pre_location]
                for index, i in enumerate(del_num):  # support and confidence of each rule
                    support = float(self.support[index_location]) / self.line_num * 100
                    s = [j for index_item, j in enumerate(self.item_name) if index_item in i]
                    if del_support[index]:
                        confidence = float(self.support[index_location]) / del_support[index] * 100
                        if confidence > self.min_confidence:
                            print(','.join(s), '->>', self.item_name[each_location[index]],
                                  'min_support:', str(support) + '%',
                                  'min_confidence:', str(confidence) + '%')


def main():
    c = Apriori('basket.txt', 14, 3, 13)
    d = Apriori('simple.txt', 50, 2, 6)


if __name__ == '__main__':
    main()
```

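Apriori(filename, min_support, item_start, item_end)

Parameters

filename: path of the data file
min_support: minimum support, in percent
item_start: column position of the first item
item_end: column position of the last item

```python
import apriori
c = apriori.Apriori('basket.txt', 11, 3, 13)
```

Output: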
