
logisticpython

Published: 2021-03-24 20:44:21

A. Hello, how do I set the reference (control) category when running a binary logistic regression in Python?

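In logistic regression, a categorical feature such as sex has to be converted into indicator features: for example, male becomes 01 and female becomes 10. What you are describing is dummy variables, commonly called one-hot encoding; sklearn provides OneHotEncoder for this, so look up how to use it. Once the sex attribute has been one-hot encoded, logistic regression can estimate a weight for each resulting feature. To make one category the reference, drop its indicator column, so that the remaining weight is read relative to that category.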
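As a rough illustration of how a reference category arises from dummy coding, the sketch below encodes a categorical column in plain Python and leaves out the indicator for the chosen reference level. The helper name `dummy_encode` and the sample values are made up for this example; sklearn's OneHotEncoder has a `drop` parameter that serves the same purpose.

```python
def dummy_encode(values, reference):
    """Encode a categorical column as 0/1 indicator columns, omitting the reference level.

    Hypothetical helper, for illustration only.
    """
    # Keep every level except the reference, sorted for a stable column order
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


sex = ["male", "female", "female", "male"]  # made-up sample column
levels, encoded = dummy_encode(sex, reference="female")
print(levels)   # columns that remain after dropping the reference level
print(encoded)  # one row per sample
```

With female as the reference, the single remaining weight describes the effect of male relative to female.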

B. Which package do I call for logistic regression in Python?

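You can use a machine learning library such as scikit-learn (sklearn), which is very convenient: the whole procedure has already been implemented for you, so, as with a formula or a template, you plug in your own data and get the result.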

from sklearn.linear_model import LogisticRegression
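A minimal usage sketch, assuming scikit-learn is installed; the toy data below is invented purely for illustration.

```python
from sklearn.linear_model import LogisticRegression

# Toy data (made up): one feature, the label switches from 0 to 1 as x grows
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X, y)

pred = model.predict([[0], [5]])         # predicted class labels
proba = model.predict_proba([[0], [5]])  # columns follow the order of model.classes_
print(pred, proba)
print(model.intercept_, model.coef_)     # fitted intercept and feature weight
```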

C. Logistic regression models in R

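The logistic regression model is:

$$P = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)}$$

Transforming the model above gives the linear form of the logistic regression model:

$$\ln\frac{P}{1 - P} = \beta_0 + \beta_1 X$$

Within the binomial family, logistic regression is an important model. In some regression problems the response variable is categorical, often simply success or failure.

When building the data frame in R, enter one column with the number of successes (responses) and one column with the number of failures (non-responses), for example: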


norell<-data.frame(
x=0:5,n=rep(70,6),success=c(0,9,21,47,60,63)
)
norell$Ymat<-cbind(norell$success,norell$n-norell$success)
glm.sol<-glm(Ymat~x,family=binomial,data=norell)
summary(glm.sol)
# Predict and plot the fitted regression curve
d<-seq(0, 5, len=100)
pre<-predict(glm.sol, data.frame(x = d))
p<-exp(pre)/(1+exp(pre))
norell$y<-norell$success/norell$n
plot(norell$x, norell$y); lines(d, p)

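The right-hand side of the (transformed) regression equation is: -3.3010 + 1.2459X
So the regression equation is:

$$P = \frac{\exp(-3.3010 + 1.2459X)}{1 + \exp(-3.3010 + 1.2459X)}$$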
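To see what the fitted equation predicts, the following Python sketch evaluates it at x = 0, ..., 5 and prints the result next to the observed success proportions from the data frame. The coefficients are the ones quoted in the text; the function name `fitted_probability` is only illustrative.

```python
import math

B0, B1 = -3.3010, 1.2459  # coefficients quoted in the text


def fitted_probability(x):
    """Inverse logit of the linear predictor B0 + B1 * x."""
    eta = B0 + B1 * x
    return math.exp(eta) / (1.0 + math.exp(eta))


xs = list(range(6))
n = 70
success = [0, 9, 21, 47, 60, 63]  # same counts as in the R data frame
observed = [s / n for s in success]
fitted = [fitted_probability(x) for x in xs]

for x, obs, fit in zip(xs, observed, fitted):
    print(x, round(obs, 3), round(fit, 3))

# Value of x at which the fitted probability is 0.5 (linear predictor equals zero)
x_half = -B0 / B1
print(x_half)
```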

D. How do I do logistic regression in Python?

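Logistic regression falls into three main types. The first has a binary dependent variable and is called binomial logistic regression. The second has an unordered multi-category dependent variable, for example which product someone tends to choose, and is called multinomial logistic regression. The third is logistic regression with an ordered multi-category dependent variable...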

E. Who knows how to do multinomial logistic regression analysis?

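I don't know which software you want to use for multinomial logistic regression analysis, so here are two options:

1. Python

Use the MNLogit class in the statsmodels package

2. Minitab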

F. Python 3.4 machine learning: how to fix the problem with stocGradAscent1(dataMatrix, classLabels, numIter=150) in the Logistic regression algorithm

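Change the del statement to del(list(dataIndex)[randIndex]). In Python 3, range(m) returns a range object that does not support item deletion, which is why the original line fails. Note that this change only deletes from a temporary copy, so dataIndex itself is not shortened; to actually remove the chosen index, build the list once with dataIndex = list(range(m)) and keep del(dataIndex[randIndex]).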
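The behaviour can be checked in isolation. The snippet below is a small stand-alone sketch, not the full stocGradAscent1 function; the value of `m` is arbitrary, and `random.randrange` stands in for whatever random index selection the original code uses.

```python
import random

m = 5

# 1) Deleting from a range object fails in Python 3
dataIndex = range(m)
try:
    del dataIndex[0]
    error_message = ""
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# 2) Deleting from list(dataIndex) runs, but only changes a temporary copy
del (list(dataIndex)[0])
unchanged_length = len(dataIndex)
print(unchanged_length)

# 3) Converting once keeps a real list that shrinks as indices are used up
dataIndex = list(range(m))
randIndex = random.randrange(len(dataIndex))
del dataIndex[randIndex]
print(len(dataIndex))
```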

G. How to read Python code for logistic regression

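Change "greater than 0" to "greater than or equal to 0" and try again. Your logic is also more complicated than it needs to be; think it through and simplify it.

If you know how to draw a state diagram, draw one for yourself: a lot of the logic is duplicated.

For example, a statement like if H3MRRFlag == 1: is not needed and can simply be deleted. From Python's point of view it may even raise a runtime error, because the variable is never initialized.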

H. The difference between the GDA and Logistic methods, with the corresponding Python code

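The main difference between the GDA method and the Logistic method lies in their modeling assumptions. GDA assumes that p(x|y) follows a multivariate Gaussian distribution and that the input features are continuous. Logistic makes no such strong assumptions: it requires neither that p(x|y) be multivariate Gaussian nor that the input features be continuous. Logistic is therefore applicable more widely than GDA. For example, if the input features follow a Poisson distribution, Logistic gives more accurate results than GDA. If the input features do satisfy GDA's requirements, either method can be used, but in that case GDA gives somewhat more accurate results than Logistic. A brief description of the two methods follows, and then the corresponding Python code.
GDA is a generative learning method: it uses Bayes' rule to obtain the posterior distribution and then classifies the input by maximizing the posterior. Put simply, given some feature, a data point with that feature is assigned to whichever class it most probably belongs to. Advantage of GDA: because it carries the prior information of the Gaussian distribution, a small number of samples is enough to obtain a good model when that assumption really fits the data.
Logistic is a discriminative learning method: it learns p(y|x) by building a mapping between the input data and the output, which differs from generative learning, where p(x|y) and p(y) must be determined first. Logistic mainly uses the sigmoid function to decide how the input data is classified. Advantages of Logistic: it is more robust and is insensitive to the distribution of the data (unlike GDA, it does not need the features to follow a Gaussian distribution).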
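For reference, the two sets of assumptions can be written out as follows. This is the usual textbook formulation with a covariance matrix shared by both classes, not something taken from the original answer; the symbols $\phi$, $\mu_0$, $\mu_1$, $\Sigma$ and $\theta$ are introduced here.

```latex
% GDA (generative): model p(y) and p(x|y), then classify with Bayes' rule
y \sim \mathrm{Bernoulli}(\phi), \qquad
x \mid y = k \sim \mathcal{N}(\mu_k, \Sigma), \quad k \in \{0, 1\}

\hat{y} = \arg\max_{y} \, p(y \mid x) = \arg\max_{y} \, p(x \mid y)\, p(y)

% Logistic (discriminative): model p(y|x) directly with the sigmoid function
p(y = 1 \mid x) = \frac{1}{1 + e^{-\theta^{T} x}}
```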
The Python code is as follows:
1. Python code for the GDA model:

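import math
from numpy import array, exp, linalg, mat, ones, shape, zeros

def GDA(dataIn, classLabel):
    # Estimate the class prior q, both class means and the shared covariance matrix
    m = len(classLabel)
    sum_1 = sum(classLabel)
    q = sum_1 / float(m)
    notLabel = ones((m,), dtype=int) - array(classLabel)
    row, col = shape(dataIn)
    # Two separate accumulators: "y0x = y1x = ..." would make both names share one matrix
    y0x = mat(zeros(col))
    y1x = mat(zeros(col))
    for i in range(m):
        y0x += mat(dataIn[i]) * notLabel[i]
        y1x += mat(dataIn[i]) * classLabel[i]
    mean_0 = y0x / (m - sum_1)
    mean_1 = y1x / sum_1
    correlation = mat(zeros((col, col)))  # despite the name, this is the covariance matrix
    for i in range(m):
        d0 = mat(dataIn[i]) - mean_0
        d1 = mat(dataIn[i]) - mean_1
        correlation += d0.T * d0 * notLabel[i] + d1.T * d1 * classLabel[i]
    correlation = correlation / m
    return q, mean_0, mean_1, correlation

def calculate_pxy0(x, mean_0, correlation, n=2):
    # Gaussian density p(x|y=0); x and mean_0 are row vectors, n is the number of features
    d = mat(x) - mean_0
    return ((2 * math.pi) ** (-n / 2.0)) * (linalg.det(correlation) ** (-0.5)) * exp(-0.5 * (d * correlation.I * d.T)[0, 0])

def calculate_pxy1(x, mean_1, correlation, n=2):
    # Gaussian density p(x|y=1)
    d = mat(x) - mean_1
    return ((2 * math.pi) ** (-n / 2.0)) * (linalg.det(correlation) ** (-0.5)) * exp(-0.5 * (d * correlation.I * d.T)[0, 0])

def GDAClass(testPoint, dataIn, classLabel):
    x = testPoint
    q, mean_0, mean_1, correlation = GDA(dataIn, classLabel)
    n = shape(dataIn)[1]  # dimension of the feature vector, not the number of samples
    py0 = 1 - q
    py1 = q
    pxy0 = calculate_pxy0(x, mean_0, correlation, n)
    pxy1 = calculate_pxy1(x, mean_1, correlation, n)
    # Pick the class with the larger p(x|y) * p(y)
    if pxy0 * py0 > pxy1 * py1:
        return 0
    return 1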
2. Python code for the Logistic model:

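from numpy import exp, mat, ones, shape

def sigmoid(w, x):
    # x is m x col and w is col x 1, so the product has to be x*w
    return 1 / (1 + exp(-x * w))

def logisticRegression(xMat, yMat, maxCycles=500):
    '''
    Batch gradient ascent on the log-likelihood.
    xMat: m x col feature matrix; yMat: m x 1 column of 0/1 labels.
    ones((m, n)): creates an m x n array in which every value is 1
    '''
    xMat = mat(xMat)
    yMat = mat(yMat)
    col = shape(xMat)[1]
    weight = ones((col, 1))
    alpha = 0.001
    for j in range(maxCycles):
        h = sigmoid(weight, xMat)
        err = yMat - h
        weight = weight + alpha * xMat.transpose() * err
    return weight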

I. How to interpret the output of logistic regression in Python

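Below is the Python code. Because the training set is small, batch gradient descent is used here rather than incremental (stochastic) gradient descent.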

## author: lijiayan
## date: 2016/10/27
## name: logReg.py
from numpy import *
import matplotlib.pyplot as plt

def loadData(filename):
    data = loadtxt(filename)
    m,n = data.shape
    print('the number of examples:', m)
    print('the number of features:', n-1)
    x = data[:,0:n-1]
    y = data[:,n-1:n]
    return x,y
# the sigmoid function
def sigmoid(z):
    return 1.0 / (1 + exp(-z))
# the cost function (log-likelihood, to be maximized)
def costfunction(y,h):
    y = array(y)
    h = array(h)
    J = sum(y*log(h))+sum((1-y)*log(1-h))
    return J
# the batch gradient descent algorithm
def gradescent(x,y):
    m,n = shape(x)          # m: number of training examples; n: number of features
    x = c_[ones(m),x]       # add x0
    x = mat(x)              # to matrix
    y = mat(y)
    a = 0.0000025           # learning rate
    maxcycle = 4000
    theta = zeros((n+1,1))  # initial theta
    J = []
    for i in range(maxcycle):
        h = sigmoid(x*theta)
        theta = theta + a * (x.T)*(y-h)
        cost = costfunction(y,h)
        J.append(cost)
    plt.plot(J)
    plt.show()
    return theta,cost
# the stochastic gradient descent (m should be large if you want a good result)
def stocGraddescent(x,y):
    m,n = shape(x)          # m: number of training examples; n: number of features
    x = c_[ones(m),x]       # add x0
    x = mat(x)              # to matrix
    y = mat(y)
    a = 0.01                # learning rate
    theta = ones((n+1,1))   # initial theta
    J = []
    for i in range(m):
        h = sigmoid(x[i]*theta)
        theta = theta + a * x[i].transpose()*(y[i]-h)
        cost = costfunction(y,h)
        J.append(cost)
    plt.plot(J)
    plt.show()
    return theta,cost
# plot the decision boundary
def plotbestfit(x,y,theta):
    plt.plot(x[:,0:1][where(y==1)],x[:,1:2][where(y==1)],'ro')
    plt.plot(x[:,0:1][where(y!=1)],x[:,1:2][where(y!=1)],'bx')
    x1 = arange(-4,4,0.1)
    x2 = (-theta[0,0]-theta[1,0]*x1) / theta[2,0]
    plt.plot(x1,x2)
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.show()
def classifyVector(inX,theta):
    prob = sigmoid((inX*theta).sum(1))
    return where(prob >= 0.5, 1, 0)
def accuracy(x, y, theta):
    m = shape(y)[0]
    x = c_[ones(m),x]
    y_p = classifyVector(x,theta)
    accuracy = sum(y_p==y)/float(m)
    return accuracy

Calling the code above:

from logReg import *
x,y = loadData("horseColicTraining.txt")
theta,cost = gradescent(x,y)
print('J:', cost)

ac_train = accuracy(x, y, theta)
print('accuracy of the training examples:', ac_train)

x_test,y_test = loadData('horseColicTest.txt')
ac_test = accuracy(x_test, y_test, theta)
print('accuracy of the test examples:', ac_test)

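Results with learning rate = 0.0000025 and 4000 iterations:

Trend of the likelihood function (J = sum(y*log(h))+sum((1-y)*log(1-h))). The likelihood is being maximized, and the fit is generally considered good only once the value has stabilized.

This example shows how important it is to normalize the features.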
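One common way to normalize features before gradient-based training is z-score standardization. The sketch below is an assumption about what such a step could look like, not the original author's code; the function name `standardize` and the sample matrices are made up.

```python
import numpy as np


def standardize(x_train, x_test):
    """Scale each column to zero mean and unit variance, using training statistics only."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled to avoid division by zero
    return (x_train - mean) / std, (x_test - mean) / std


# Made-up features on very different scales
x_train = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]])
x_test = np.array([[2.0, 2000.0]])
x_train_s, x_test_s = standardize(x_train, x_test)
print(x_train_s)
print(x_test_s)
```

Once all features are on a comparable scale, the learning rate usually no longer has to be as tiny as 0.0000025 for the updates to stay stable.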

J. In machine learning, when using logistic regression (Python) for binary classification, what do recall, f1_score and support mean?

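Suppose the prediction targets are 0 and 1.
Let a be the number of 1s in the data, b the number of times 1 is predicted, and c the number of times a predicted 1 is correct.
Precision: precision = c / b
Recall: recall = c / a
f1_score = 2 * precision * recall / (precision + recall)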
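The definitions above can be checked on a small example in plain Python. The labels and predictions below are made up. The support values are included because the question asks about them: in sklearn's classification_report, support is the number of true samples of each class.

```python
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # made-up true labels
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # made-up predictions

a = sum(1 for t in y_true if t == 1)  # number of 1s in the data
b = sum(1 for p in y_pred if p == 1)  # number of times 1 is predicted
c = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # correct predictions of 1

precision = c / b
recall = c / a
f1_score = 2 * precision * recall / (precision + recall)

support_1 = a                # support of class 1: its number of true samples
support_0 = len(y_true) - a  # support of class 0

print(precision, recall, f1_score, support_0, support_1)
```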
