newLISP Fan Club

Forum => Anything else we might add? => Topic started by: CaveGuy on May 30, 2007, 03:16:00 PM

Title: Standard Deviation ?
Post by: CaveGuy on May 30, 2007, 03:16:00 PM
I am at the limits of my understanding of math here, and am looking for an example or library function that can perform a standard deviation on a list of values and give me back the magic number, so I can move on with my life :)

Lutz: I do like how the language has evolved. It's been a few years, but I have begun a new project, and I hope to abuse newLISP's IP capabilities this time around.

Later

Bob
Title:
Post by: Lutz on May 30, 2007, 03:45:51 PM
There is a statistics module, "stat.lsp", in the modules directory of the source distribution. It is documented here: http://newlisp.org/code/modules/stat.lsp.html


(load "stat.lsp")

(set 'lst '(4 5 2 3 7 6 8 9 4 5 6 9 2))

(stat:sdev lst) => 2.39925202
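As a hand check of that value (my own arithmetic, not output from the module): for the 13 values in lst, the sum of the values is 70 and the sum of their squares is 446. The sample standard deviation therefore works out as follows:

$$s = \sqrt{\frac{\sum x_i^2 - \left(\sum x_i\right)^2 / N}{N-1}} = \sqrt{\frac{446 - 4900/13}{12}} = \sqrt{\frac{898/13}{12}} \approx \sqrt{5.7564} \approx 2.39925$$

This matches the stat:sdev result, which suggests the module divides by N - 1.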


Lutz
Title:
Post by: CaveGuy on May 30, 2007, 05:50:49 PM
Thanks, guy: I was sure I had seen it somewhere :)



As it was the only function I needed, I packed it into a one-liner:



; sample standard deviation (divides by N - 1)
(define (sdev X) (sqrt (div (sub (apply add (map mul X X)) (div (mul (apply add X) (apply add X)) (length X))) (sub (length X) 1))))
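The one-liner never computes the mean explicitly. Instead it relies on the standard identity for the sum of squared deviations, where $\bar{x} = \frac{1}{N}\sum x_i$:

$$\sum_{i=1}^{N}(x_i-\bar{x})^2 = \sum x_i^2 - 2\bar{x}\sum x_i + N\bar{x}^2 = \sum x_i^2 - \frac{2\left(\sum x_i\right)^2}{N} + \frac{\left(\sum x_i\right)^2}{N} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}$$

Dividing by $N-1$ and taking the square root gives the expression in the code. Here `(apply add (map mul X X))` is $\sum x_i^2$, and `(div (mul (apply add X) (apply add X)) (length X))` is $(\sum x_i)^2/N$.

One caveat: this form subtracts two large, nearly equal numbers. It can therefore lose floating-point precision when the values are large compared with their spread. The two-pass form, which subtracts the mean first and then squares, is more robust in that case.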



I was interested in how this would affect performance, so I ran some 10,000-iteration timing loops on both 10- and 100-element lists.



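(load "stat.lsp")   ; provides stat:sdev

; (time expr 10000) returns the elapsed milliseconds for 10000 evaluations
(define (testsdev)
   (setq lst (random 0 10000000000 10))    ; 10 random floats between 0 and 1e10
   (println "stat:sdev 10  = " (time (stat:sdev lst) 10000))
   (println "MAIN:sdev 10 = " (time (MAIN:sdev lst) 10000))
   (setq lst (random 0 10000000000 100))   ; 100 random floats
   (println "stat:sdev 100  = " (time (stat:sdev lst) 10000))
   (println "MAIN:sdev 100 = " (time (MAIN:sdev lst) 10000))
)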



The results were about what I expected: the one-liner took roughly 60% to 80% of the time of stat:sdev (milliseconds per 10,000 calls).



stat:sdev 10  = 125
MAIN:sdev 10 = 78
stat:sdev 100  = 719
MAIN:sdev 100 = 579
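A timing comparison only means something if both functions return the same number. Below is a minimal, self-contained cross-check. It is a sketch: sdev-2pass is a hypothetical name for illustration and is not part of stat.lsp. The check compares the one-liner with a straightforward two-pass version that subtracts the mean before squaring.

```newlisp
; One-pass sample standard deviation (the one-liner from this thread)
(define (sdev X)
  (sqrt (div (sub (apply add (map mul X X))
                  (div (mul (apply add X) (apply add X)) (length X)))
             (sub (length X) 1))))

; Hypothetical two-pass reference: compute the mean m first,
; then sum the squared deviations (x - m)^2 and divide by N - 1
(define (sdev-2pass X)
  (let ((m (div (apply add X) (length X))))
    (sqrt (div (apply add (map (lambda (x) (mul (sub x m) (sub x m))) X))
               (sub (length X) 1)))))

; Lutz's example list: both should give about 2.39925202
(set 'lst '(4 5 2 3 7 6 8 9 4 5 6 9 2))
(println "one-pass: " (sdev lst) "  two-pass: " (sdev-2pass lst))

; A random 100-element list like the one used in the timing test
(set 'rnd (random 0 10000000000 100))
(println "relative difference: "
         (div (abs (sub (sdev rnd) (sdev-2pass rnd))) (sdev-2pass rnd)))
```

If the relative difference stays tiny, the timings compare like with like.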



Keep up the good work!



Bob
Title:
Post by: rickyboy on May 30, 2007, 06:10:49 PM
I'm no stat expert, but don't you have to know if your list is either:

1) sample data from a population, or

2) the population data, itself?



According to Wikipedia's Standard Deviation article, the STDEV formulas are different for 1 versus 2.
Title: Sure Hope I got the right one :)
Post by: CaveGuy on May 30, 2007, 06:42:56 PM
In my case I am dealing with:



2) the population data, itself?



These are finite samples to be averaged and compared over time.



Sure hope I got the right one!



Heck, it will make a nice looking chart either way :)
Title:
Post by: rickyboy on May 30, 2007, 07:35:46 PM
... the use of the word "samples" is tricky here. Basically, Wikipedia says that if "every member of a population is sampled", you use sigma for the standard deviation:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$



But if you only have a proper sample of the population (i.e. not the whole population), use s, the estimator of the population's standard deviation:



$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$$



The only difference between the two expressions is the N versus N - 1, and that $\mu$ is the population mean in the sigma expression, whereas $\bar{x}$ is the sample mean in the s expression (no pun intended).



Lutz's sdev function is based on the latter expression, s; i.e., it computes the sample standard deviation.



The short answer is: if your data is really the entire population, you need to divide by N, not N - 1.
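The two expressions differ only in the divisor. When both are computed from the same N values (so the mean is the same number in both), one converts into the other. This is my own rearrangement of the two formulas, not something from the article:

$$\sigma^2 = \frac{N-1}{N}\,s^2 \quad\Longrightarrow\quad \sigma = s\sqrt{\frac{N-1}{N}}$$

For Lutz's 13-value list this gives $\sigma \approx 2.39925 \times \sqrt{12/13} \approx 2.3051$. The difference is about 4% at N = 13 and shrinks as N grows.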
Title:
Post by: CaveGuy on May 31, 2007, 12:58:51 PM
Thanks a lot. As it turns out, I will be needing both: in one case I have the entire data set, and in the other I only have a representative sample. So here, for the archives, are short, sweet, minimum-overhead flavors of each.



; sample standard deviation s (divides by N - 1)
(define (sdev X) (sqrt (div (sub (apply add (map mul X X)) (div (mul (apply add X) (apply add X)) (length X))) (sub (length X) 1))))



; population standard deviation sigma (divides by N)
(define (stdev X) (sqrt (div (sub (apply add (map mul X X)) (div (mul (apply add X) (apply add X)) (length X))) (length X))))
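Here is a minimal usage sketch that applies both one-liners to the list from Lutz's example. The definitions are reformatted across lines, but the logic is the same. The last line checks the relationship sigma = s * sqrt((N - 1) / N) between the two.

One assumption to watch: sdev hits a division by zero for a one-element list, and both hit one for an empty list. A guard may be worth adding if your data can be that short.

```newlisp
; Sample standard deviation s: divides by N - 1
(define (sdev X)
  (sqrt (div (sub (apply add (map mul X X))
                  (div (mul (apply add X) (apply add X)) (length X)))
             (sub (length X) 1))))

; Population standard deviation sigma: divides by N
(define (stdev X)
  (sqrt (div (sub (apply add (map mul X X))
                  (div (mul (apply add X) (apply add X)) (length X)))
             (length X))))

(set 'lst '(4 5 2 3 7 6 8 9 4 5 6 9 2))
(set 'n (length lst))

(println "sample s         = " (sdev lst))   ; about 2.39925202, as stat:sdev
(println "population sigma = " (stdev lst))  ; about 2.3051
; sigma recovered from s via sigma = s * sqrt((N - 1) / N)
(println "s * sqrt((N-1)/N) = " (mul (sdev lst) (sqrt (div (sub n 1) n))))
```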



Thanks for your helpful pointers.



Back under my rock again, well at least for a while :)

Bob