Tuesday 24 July 2018

LOGISTIC REGRESSION FROM SCRATCH, PYTHON

This is a classification algorithm.
Example:
        Suppose you are given features of a person such as length of hair, type of dress, age, weight, height, etc., and you have to predict whether the
person is male or female, so you need to classify the person as male or female.

LOGISTIC MODEL:
- To fit this kind of data, linear regression is not exactly useless, but a straight line cannot fit this
  distribution.
- We take the intuition from linear regression to build logistic regression.
- In linear regression you actually predict a continuous value; in logistic regression you classify (male 1 or female 0), i.e. binary classification.
- A linear model cannot be used directly because it can predict values above 1 and below 0.
- The sigmoid function restricts the value between 0 and 1; values of 0.5 and above from the sigmoid
  function are classified as 1 and values below 0.5 as 0 (a small sketch follows below).
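To get a quick feel for this, here is a minimal sketch of the sigmoid and the 0.5 threshold (the scores below are made up purely for illustration; this mirrors what logfn and pred do in the full program further down):

import numpy as np

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1/(1 + np.exp(-z))

scores = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])   # example linear scores
probs = sigmoid(scores)                          # approx. [0.05, 0.38, 0.50, 0.62, 0.95]
labels = np.where(probs >= 0.5, 1, 0)            # threshold at 0.5 -> [0, 0, 1, 1, 1]
print(probs, labels)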
                                       
PYTHON PROGRAM:
import csv
import numpy as np
import matplotlib.pyplot as plt
def loadCSV(fn):
    # read the CSV file and convert every entry to float
    with open(fn,'r') as csf:
        lines=csv.reader(csf)
        data=list(lines)
        for i in range(len(data)):
            data[i]=[float(x) for x in data[i]]
    return np.array(data)
def norm(X):
    # min-max normalisation: scale each feature column to the range [0, 1]
    mi=np.min(X,axis=0)
    mx=np.max(X,axis=0)
    rng=mx-mi
    norm_X=1-((mx-X)/rng)
    return norm_X
def logfn(theta,X):
    # sigmoid of the linear score X.theta
    return 1/(1+np.exp(-np.dot(X,theta.T)))
def log_grad(theta,X,y):
    # gradient of the cost with respect to theta: (h - y)^T X
    fc=logfn(theta,X)-y.reshape(X.shape[0],-1)
    fl=np.dot(fc.T,X)
    return fl
def cost(theta,X,y):
    # negative log-likelihood (cross-entropy) cost
    log_fn=logfn(theta,X)
    y=np.squeeze(y)
    s1=y*np.log(log_fn)
    s2=(1-y)*np.log(1-log_fn)
    fi=-(s1+s2)
    return np.mean(fi)
def grad_desc(X,y,theta,lr=0.05,conv_change=0.001):
    # batch gradient descent: stop when the cost change is small or after 500 iterations
    cos=cost(theta,X,y)
    chgcos=1
    noi=1
    while chgcos>conv_change and noi<500:
        oldcos=cos
        theta=theta-(lr*log_grad(theta,X,y))/len(y)
        cos=cost(theta,X,y)
        chgcos=oldcos-cos
        noi+=1
    return theta,noi
def pred(theta,X):
    # classify as 1 when the sigmoid probability is at least 0.5, else 0
    prob=logfn(theta,X)
    value=np.where(prob>=0.5,1,0)
    return np.squeeze(value)
data=loadCSV('../input/logistic.csv')
X=norm(data[:,:-1])                                 # normalise the feature columns
X=np.hstack((np.matrix(np.ones(X.shape[0])).T,X))   # add a bias (intercept) column of ones
y=data[:,-1]                                        # last column holds the 0/1 labels
theta=np.matrix(np.zeros(X.shape[1]))               # initialise coefficients to zero
theta,noi=grad_desc(X,y,theta)
print("estimated regression coefficients:",theta)
print("no of iterations:",noi)
ypred=pred(theta,X)
print("correctly predicted labels",np.sum(y==ypred))
estimated regression coefficients: [[ 0.2297094   1.20762038 -1.8150657 ]]
no of iterations: 500
correctly predicted labels 100

LINEAR REGRESSION FROM SCRATCH, PYTHON

This algorithm is a basic one to dive into machine learning.
Linear models for regression:
- y = mX + c is the linear regression line that is fitted to the data points.
- Basically, it reduces the distance between the line and all the data points; the best
  m and c values are found using an optimisation algorithm.
- The optimisation algorithm used here is gradient descent, which iteratively reduces the cost.
- y is the output (the value to be predicted) and X is the features (predictors).
- y is the dependent variable; it depends on the variables in X.
The optimisation algorithm used is GRADIENT DESCENT (a small sketch of the update step follows below).
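Before the full program, here is a minimal sketch of gradient descent for y = mX + c on a few made-up points (the data, learning rate and iteration count here are illustrative only, not the dataset used below):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])     # toy predictor values (made up for illustration)
y = np.array([3.1, 5.0, 7.2, 8.9])     # toy targets, roughly 2*X + 1
m, c, lr = 0.0, 0.0, 0.01              # start from zero with a small learning rate

for _ in range(5000):
    pred = m*X + c
    # gradients of the mean squared error with respect to m and c
    dm = -2*np.mean(X*(y - pred))
    dc = -2*np.mean(y - pred)
    m, c = m - lr*dm, c - lr*dc

print(m, c)   # converges to about m=1.96, c=1.15, the least-squares fit for these toy points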
PYTHON PROGRAM:

import numpy as np
ptsp=[]       # cleaned data points, filled in by run()
theta1=0      # fitted slope m
theta2=0      # fitted intercept b
thetal=[]     # history of [b, m, error] per iteration
def predict(x):
    # predict y from x using the fitted slope (theta1) and intercept (theta2) set by run()
    return theta1*x + theta2
def error(b,m,pts):
    # mean squared error of the line y = m*x + b over all points
    tr=0
    for i in range(0,len(pts)):
        x=pts[i,0]
        y=pts[i,1]
        tr+=(y-(m*x+b))**2
    return tr/float(len(pts))
def run():
    global theta1
    global theta2
    global ptsp
    pts=np.genfromtxt("../input/data.csv",delimiter=",")
    ptsp=pts
    xx=ptsp[:,0]
    yy=ptsp[:,1]
    # drop outliers: keep only points with 35 < x < 65
    mask=[]
    for i in range(0,len(xx)):
        if xx[i]<=35 or xx[i]>=65:
            mask.append(i)
    xx=np.delete(xx,mask)
    yy=np.delete(yy,mask)
    # and with 45 < y < 100
    mask=[]
    for i in range(0,len(yy)):
        if yy[i]<=45 or yy[i]>=100:
            mask.append(i)
    xx=np.delete(xx,mask)
    yy=np.delete(yy,mask)
    ptsp=np.array([[k,v] for k,v in zip(xx,yy)])
    pts=ptsp
    lr=0.0001
    ib=0         # initial intercept b
    im=0         # initial slope m
    noi=100000
    print("starting gradient descent at b={0},m={1},error={2}".format(ib,im,error(ib,im,pts)))
    print("running....")
    b,m=gradesc_runner(pts,ib,im,lr,noi)
    print("after {0} iterations ,b={1},m={2},error={3}".format(noi,b,m,error(b,m,pts)))
    theta1=m
    theta2=b
def gradesc_runner(pts,ib,im,lr,noi):
    # run gradient descent for noi iterations, recording the parameter history
    b=ib
    m=im
    global thetal
    for i in range(noi):
        b,m=gradesc(b,m,np.array(pts),lr)
        tp=[b,m,error(b,m,pts)]
        thetal.append(tp)
    return [b,m]
def gradesc(b,m,pts,lr):
    # one gradient descent step for the mean squared error cost
    bg=0
    mg=0
    N=float(len(pts))
    for i in range(0,len(pts)):
        x=pts[i,0]
        y=pts[i,1]
        bg+=-(2/N)*(y-((m*x)+b))        # d(MSE)/db
        mg+=-(2/N)*x*(y-((m*x)+b))      # d(MSE)/dm
    b=b-(lr*bg)
    m=m-(lr*mg)
    return b,m
run()
starting gradient descent at b=0,m=0,error=5565.947156039996
running....
after 100000 iterations ,b=6.232803360817703,m=1.3444468870190596,error=94.6469921800091
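As a rough sanity check (a sketch that assumes the cleaned points are still available in ptsp), NumPy's closed-form fit np.polyfit should give a similar slope and intercept:

import numpy as np
m_ls, b_ls = np.polyfit(ptsp[:,0], ptsp[:,1], 1)   # degree-1 fit returns [slope, intercept]
print(m_ls, b_ls)                                  # should be close to the m and b printed above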
%matplotlib inline
import matplotlib.pyplot as plt
plt.scatter(ptsp[:,0],ptsp[:,1],c='r')
plt.plot(ptsp[:,0],[predict(k) for k in ptsp[:,0]])
[Scatter plot of the cleaned data points (red) with the fitted regression line.]

CODING FPGROWTH IN PYTHON FROM SCRATCH

dats=[['google','amazon',],['amazon','google','python','cse'],['cse','google&#...