
Questions tagged [statistics]

Statistics is the mathematical study of using probability to infer characteristics of a population from a limited number of samples or observations. Non-programming statistics questions are off-topic here, and they should be posted at https://stats.stackexchange.com instead.

503 votes
36 answers
409k views

How to find the statistical mode?

In R, mean() and median() are standard functions which do what you'd expect. mode() tells you the internal storage mode of the object, not the value that occurs the most in its argument. But is there ...
Nick
  • 21.9k
805 votes
12 answers
1.7m views

Get statistics for each group (such as count, mean, etc) using pandas GroupBy?

I have a dataframe df and I use several columns from it to groupby: df[['col1','col2','col3','col4']].groupby(['col1','col2']).mean() In the above way, I almost get the table (dataframe) that I need. ...
Roman
  • 129k
271 votes
50 answers
405k views

Simple way to calculate median with MySQL

What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time finding a simple way of calculating the ...
davr
  • 19.1k
196 votes
13 answers
219k views

Fitting empirical distribution to theoretical ones with Scipy (Python)?

INTRODUCTION: I have a list of more than 30,000 integer values ranging from 0 to 47, inclusive, e.g. [0,0,0,0,..,1,1,1,1,...,2,2,2,2,...,47,47,47,...] sampled from some continuous distribution. The ...
s_sherly
  • 2,347
105 votes
2 answers
89k views

How to highlight specific x-value ranges

I'm making a visualization of historical stock data for a project, and I'd like to highlight regions of drops. For instance, when the stock is experiencing significant drawdown, I would like to ...
alexgolec
  • 27.9k
80 votes
14 answers
41k views

Select k random elements from a list whose elements have weights

Selecting without any weights (equal probabilities) is beautifully described here. I was wondering if there is a way to convert this approach to a weighted one. I am also interested in other ...
nimcap
  • 10.3k
189 votes
14 answers
45k views

Workflow for statistical analysis and report writing

Does anyone have any wisdom on workflows for data analysis related to custom report writing? The use-case is basically this: Client commissions a report that uses data analysis, e.g. a population ...
forkandwait
  • 5,117
108 votes
6 answers
67k views

Browser statistics on JavaScript disabled [closed]

I am having a hard time collecting publicly available statistics on the percentage of web users that browse with JavaScript disabled. Yahoo has published data from 2010 and R. Reid published data ...
Jesper Rønn-Jensen
37 votes
5 answers
47k views

PHP algorithm to generate all combinations of a specific size from a single set

I am trying to devise an algorithm that generates all possible combinations of a specific size, something like a function that accepts an array of chars and a size as its parameters and returns an array ...
asim-ishaq
  • 2,220
318 votes
16 answers
1.1m views

How to normalize a numpy array to a unit vector

I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function: def normalize(v): norm = np.linalg.norm(v) if ...
Donbeo
  • 17.5k
188 votes
6 answers
370k views

Compute a confidence interval from sample data

I have sample data which I would like to compute a confidence interval for, assuming a normal distribution. I have found and installed the numpy and scipy packages and have gotten numpy to return a ...
Bmayer0122
  • 2,208
101 votes
17 answers
167k views

How to efficiently calculate a running standard deviation

I have an array of lists of numbers, e.g.: [0] (0.01, 0.01, 0.02, 0.04, 0.03) [1] (0.00, 0.02, 0.02, 0.03, 0.02) [2] (0.01, 0.02, 0.02, 0.03, 0.02) ... [n] (0.01, 0.00, 0.01, 0.05, 0.03) I would ...
Alex Reynolds
251 votes
11 answers
417k views

Find p-value (significance) in scikit-learn LinearRegression

How can I find the p-value (significance) of each coefficient? lm = sklearn.linear_model.LinearRegression() lm.fit(x,y)
elplatt
  • 3,337
188 votes
3 answers
163k views

How to make execution pause, sleep, wait for X seconds in R?

How do you pause an R script for a specified number of seconds or milliseconds? In many languages, there is a sleep function, but ?sleep references a data set. And ?pause and ?wait don't exist. The ...
Dan Goldstein
157 votes
15 answers
248k views

Multiple linear regression in Python

I can't seem to find any Python libraries that do multiple regression. The only ones I find do simple regression. I need to regress my dependent variable (y) against several independent ...
Zach
  • 4,694
