Package Bio :: Package Statistics :: Module lowess
[hide private]
[frames] | no frames]

Source Code for Module Bio.Statistics.lowess

 1  """ 
 2  This module implements the Lowess function for nonparametric regression. 
 3   
 4  Functions: 
 5  lowess        Fit a smooth nonparametric regression curve to a scatterplot. 
 6   
 7  For more information, see 
 8   
 9  William S. Cleveland: "Robust locally weighted regression and smoothing 
10  scatterplots", Journal of the American Statistical Association, December 1979, 
11  volume 74, number 368, pp. 829-836. 
12   
13  William S. Cleveland and Susan J. Devlin: "Locally weighted regression: An 
14  approach to regression analysis by local fitting", Journal of the American 
15  Statistical Association, September 1988, volume 83, number 403, pp. 596-610. 
16  """ 
17   
18  try: 
19      from Numeric import * 
20      from LinearAlgebra import solve_linear_equations 
21  except ImportError, x: 
22      raise ImportError, "This module requires Numeric (precursor to NumPy) with the LinearAlgebra and MLab libraries" 
23   
24  try: 
25      from Bio.Cluster import median 
26      # The function median in Bio.Cluster is faster than the function median 
27      # in Numeric's MLab, as it does not require a full sort. 
28  except ImportError, x: 
29      # Use the median function in Numeric's MLab if Bio.Cluster is not available 
30      try: 
31          from MLab import median 
32      except ImportError, x: 
33          raise ImportError, "This module requires Numeric (precursor to NumPy) with the LinearAlgebra and MLab libraries" 
34   
35 -def lowess(x, y, f=2./3., iter=3):
36 """lowess(x, y, f=2./3., iter=3) -> yest 37 38 Lowess smoother: Robust locally weighted regression. 39 The lowess function fits a nonparametric regression curve to a scatterplot. 40 The arrays x and y contain an equal number of elements; each pair 41 (x[i], y[i]) defines a data point in the scatterplot. The function returns 42 the estimated (smooth) values of y. 43 44 The smoothing span is given by f. A larger value for f will result in a 45 smoother curve. The number of robustifying iterations is given by iter. The 46 function will run faster with a smaller number of iterations.""" 47 n = len(x) 48 r = int(ceil(f*n)) 49 h = [sort(abs(x-x[i]))[r] for i in range(n)] 50 w = clip(abs(([x]-transpose([x]))/h),0.0,1.0) 51 w = 1-w*w*w 52 w = w*w*w 53 yest = zeros(n,'d') 54 delta = ones(n,'d') 55 for iteration in range(iter): 56 for i in range(n): 57 weights = delta * w[:,i] 58 b = array([sum(weights*y), sum(weights*y*x)]) 59 A = array([[sum(weights), sum(weights*x)], 60 [sum(weights*x), sum(weights*x*x)]]) 61 beta = solve_linear_equations(A,b) 62 yest[i] = beta[0] + beta[1]*x[i] 63 residuals = y-yest 64 s = median(abs(residuals)) 65 delta = clip(residuals/(6*s),-1,1) 66 delta = 1-delta*delta 67 delta = delta*delta 68 return yest
69