Linear Regression with different Python libs: numpy, scipy

Linear regression is one of the fundamental techniques for analyzing data. In Python there are several options for doing it, including numpy, scipy and sklearn.

A free eBook worth recommending is An Introduction to Statistical Learning, available at http://www-bcf.usc.edu/~gareth/ISL/

numpy.linalg.lstsq

numpy.linalg.lstsq(a, b, rcond=-1) returns the least-squares solution to a linear matrix equation of the form ax = b.

Input: a, b are arrays; np.array can be used to convert lists of numbers into arrays.

Input: rcond is an optional parameter that gives the cut-off ratio for small singular values of a.

Return: a tuple of (solution, residuals, rank of a, singular values of a). The fitted coefficients are the first element, an array.

example:

import numpy as np
from matplotlib import pyplot as plt
import read_a_cell_in_excel2013

# load data file
workbook_name="D:\\cloud\\Dropbox\\9_data\\20150706_Fbench\\log_Fbench_values_used.xlsx"
    
sheet_name='HOMO'

exp_value=[]

B3LYP_value=[]

# read data and make them into an array

col_list = ['B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','AA','AB','AC','AD']

row_list = ['7']

for row_name in row_list:
    for col_name in col_list:
        exp_value.append(read_a_cell_in_excel2013.read_a_cell_in_excel(workbook_name,sheet_name,col_name,row_name))
print (exp_value)
    
col_list = ['B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','AA','AB','AC','AD']

row_list = ['9']

for row_name in row_list:
    for col_name in col_list:
        B3LYP_value.append(read_a_cell_in_excel2013.read_a_cell_in_excel(workbook_name,sheet_name,col_name,row_name))
print (B3LYP_value)

# prepare the input to np.linalg.lstsq

n=len(B3LYP_value)
print (n)

A=np.array(([[B3LYP_value[j], 1] for j in range(n)]))
B=np.array(exp_value)

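# solve the equations; the fitted coefficients are the first element of the returned tuple
X=np.linalg.lstsq(A,B)[0]
k=X[0]; b=X[1]
# color each point by its angle; arctan2 needs two arrays of the same shape (y first, then x)
T=np.arctan2(B,A[:,0])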


#plot the points figure
plt.figure(figsize=(9,6))
plt.scatter(B3LYP_value,exp_value,c=T,s=25,alpha=0.4,marker='o')

#plot the fitting curve
start=[min(B3LYP_value),max(B3LYP_value)]
end=[min(B3LYP_value)*k+b,max(B3LYP_value)*k+b]

plt.plot(start,end,'b',linewidth=2.0)

plt.xlim(-6.5,-4.5)
plt.ylim(-6.5,-4.5)
plt.show()
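
The example above depends on a local workbook and on the helper module read_a_cell_in_excel2013, so it cannot be run as-is elsewhere. As a minimal, self-contained sketch of the same numpy.linalg.lstsq call, the snippet below uses made-up x and y values (an assumption for illustration, not the HOMO data) and unpacks all four return values. rcond=None is passed here only to pick NumPy's machine-precision cut-off explicitly.

```python
import numpy as np

# Made-up data lying roughly on y = 2x + 1 (illustrative values only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.9])

# Design matrix: one column for x and one column of ones for the intercept,
# so that A @ [k, b] approximates y.
A = np.column_stack([x, np.ones_like(x)])

# lstsq returns a tuple: (solution, residuals, rank, singular values).
solution, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
k, b = solution

print("slope:", k)
print("intercept:", b)
print("sum of squared residuals:", residuals)
print("rank of A:", rank)
```

The column of ones is what gives the fit an intercept; leaving it out would force the line through the origin.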

scipy.stats.linregress

scipy.stats.linregress(x, y=None) computes a least-squares linear regression for two sets of measurements, fitting y = slope*x + intercept.

Input: x, y are arrays of the same length; np.array can be used to convert lists of numbers into arrays.

Return:
slope: slope of the fitted line, float
intercept: intercept of the fitted line, float
r-value: correlation coefficient, float
p-value: two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero, float
stderr: standard error of the estimated slope, float

example:

import numpy as np
from matplotlib import pyplot as plt
import read_a_cell_in_excel2013
from scipy import stats

# load data file
workbook_name="D:\\cloud\\Dropbox\\9_data\\20150706_Fbench\\log_Fbench_values_used.xlsx"
    
sheet_name='HOMO'

exp_value=[]

B3LYP_value=[]

# read data and make them into an array

col_list = ['B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','AA','AB','AC','AD']

row_list = ['7']

for row_name in row_list:
    for col_name in col_list:
        exp_value.append(read_a_cell_in_excel2013.read_a_cell_in_excel(workbook_name,sheet_name,col_name,row_name))
print (exp_value)
    
col_list = ['B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','AA','AB','AC','AD']

row_list = ['9']

for row_name in row_list:
    for col_name in col_list:
        B3LYP_value.append(read_a_cell_in_excel2013.read_a_cell_in_excel(workbook_name,sheet_name,col_name,row_name))
print (B3LYP_value)

# prepare the input to stats.linregress

n=len(B3LYP_value)
print (n)

B=np.array(exp_value)
A=np.array(B3LYP_value)


# solve the equations
slope, intercept, r_value, p_value, std_err=stats.linregress(A,B)

k=slope
b=intercept

#plot the points figure
plt.figure(figsize=(9,6))
plt.scatter(B3LYP_value,exp_value,s=25,alpha=0.4,marker='o')

#plot the fitting curve
start=[min(B3LYP_value),max(B3LYP_value)]
end=[min(B3LYP_value)*k+b,max(B3LYP_value)*k+b]

plt.plot(start,end,'b',linewidth=2.0)

plt.xlim(-6.5,-4.5)
plt.ylim(-6.5,-4.5)
plt.show()
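
For readers without the workbook or the helper module, here is a small self-contained sketch of the same scipy.stats.linregress call. The x and y values are made up for illustration (they are not the HOMO data), and the result is read through its named attributes rather than by position.

```python
import numpy as np
from scipy import stats

# Made-up data lying roughly on y = 2x + 1 (illustrative values only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.9])

# linregress takes the two 1-D arrays directly; no design matrix is needed.
result = stats.linregress(x, y)

print("slope:", result.slope)
print("intercept:", result.intercept)
print("r-value:", result.rvalue)
print("p-value:", result.pvalue)
print("standard error of the slope:", result.stderr)
```

Compared with numpy.linalg.lstsq, this handles only the one-variable case but reports the correlation coefficient and p-value along with the fit.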

sklearn.linear_model.LinearRegression

sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)

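This is a wrapper around the ordinary least-squares solver in scipy (scipy.linalg.lstsq), packaged as a predictor object with fit and predict methods, so it is more focused on prediction.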
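
As a rough sketch of how the estimator is typically used, the snippet below fits a line to made-up data (illustrative values, not the HOMO data) and then predicts a new point. It relies only on the default constructor arguments, so it does not depend on the exact signature shown above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying roughly on y = 2x + 1 (illustrative values only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.9])

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
X = x.reshape(-1, 1)

model = LinearRegression()  # fit_intercept=True by default
model.fit(X, y)

k = model.coef_[0]      # slope
b = model.intercept_    # intercept

# Predict y for a new x value; the input must also be 2-D.
y_new = model.predict(np.array([[5.0]]))

print("slope:", k)
print("intercept:", b)
print("prediction at x=5:", y_new[0])
```

The intercept is handled by the estimator itself, so no column of ones is added to X.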
