Questions tagged [statistics]
Statistics is the mathematical study of using probability to infer characteristics of a population from a limited number of samples or observations. Non-programming statistics questions are off-topic here, and they should be posted at https://stats.stackexchange.com instead.
1,413
questions
503
votes
36
answers
409k
views
How to find the statistical mode?
In R, mean() and median() are standard functions which do what you'd expect. mode() tells you the internal storage mode of the object, not the value that occurs the most in its argument. But is there ...
805
votes
12
answers
1.7m
views
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
I have a dataframe df and I use several columns from it to group by:
df[['col1','col2','col3','col4']].groupby(['col1','col2']).mean()
In the above way, I almost get the table (dataframe) that I need. ...
271
votes
50
answers
405k
views
Simple way to calculate median with MySQL
What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time finding a simple way of calculating the ...
196
votes
13
answers
219k
views
Fitting empirical distribution to theoretical ones with Scipy (Python)?
INTRODUCTION: I have a list of more than 30,000 integer values ranging from 0 to 47, inclusive, e.g. [0,0,0,0,..,1,1,1,1,...,2,2,2,2,...,47,47,47,...], sampled from some continuous distribution. The ...
105
votes
2
answers
89k
views
How to highlight specific x-value ranges
I'm making a visualization of historical stock data for a project, and I'd like to highlight regions of drops. For instance, when the stock is experiencing significant drawdown, I would like to ...
80
votes
14
answers
41k
views
Select k random elements from a list whose elements have weights
Selecting without any weights (equal probabilities) is beautifully described here.
I was wondering if there is a way to convert this approach to a weighted one.
I am also interested in other ...
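As a rough illustration of one weighted approach (not necessarily the one the asker or the answers settle on), the sketch below uses the Efraimidis-Spirakis key method: each item gets the key u ** (1 / weight) for a uniform random u, and the k items with the largest keys are kept. The function name, its parameters, and the requirement that every weight be positive are assumptions made for this example.

```python
import heapq
import random


def weighted_sample_without_replacement(items, weights, k, rng=None):
    """Return k distinct items; larger weights are more likely to be picked.

    Hypothetical helper for illustration only. Assumes every weight is > 0.
    """
    rng = rng or random.Random()
    items = list(items)
    weights = list(weights)
    if len(items) != len(weights):
        raise ValueError("items and weights must have the same length")
    if k > len(items):
        raise ValueError("k cannot exceed the number of items")
    if any(w <= 0 for w in weights):
        raise ValueError("all weights must be positive")

    # Efraimidis-Spirakis: key = u ** (1 / w) with u uniform in [0, 1).
    keyed = [(rng.random() ** (1.0 / w), item) for item, w in zip(items, weights)]

    # Keep the k largest keys; compare on the key only so items need not be orderable.
    top = heapq.nlargest(k, keyed, key=lambda pair: pair[0])
    return [item for _, item in top]
```

For example, `weighted_sample_without_replacement(["a", "b", "c", "d"], [1, 2, 3, 4], 2)` returns two distinct letters, with "d" the most likely to appear.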
189
votes
14
answers
45k
views
Workflow for statistical analysis and report writing
Does anyone have any wisdom on workflows for data analysis related to custom report writing? The use-case is basically this:
Client commissions a report that uses data analysis, e.g. a population ...
108
votes
6
answers
67k
views
Browser statistics on JavaScript disabled [closed]
I am having a hard time collecting publicly available statistics on the percentage of web users that browse with JavaScript disabled.
Yahoo has published data from 2010 and R. Reid published data ...
37
votes
5
answers
47k
views
PHP algorithm to generate all combinations of a specific size from a single set
I am trying to devise an algorithm which generates all possible combinations of a specific size, something like a function which accepts an array of chars and a size as its parameters and returns an array ...
318
votes
16
answers
1.1m
views
How to normalize a numpy array to a unit vector
I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function:
def normalize(v):
    norm = np.linalg.norm(v)
    if ...
188
votes
6
answers
370k
views
Compute a confidence interval from sample data
I have sample data which I would like to compute a confidence interval for, assuming a normal distribution.
I have found and installed the numpy and scipy packages and have gotten numpy to return a ...
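A minimal sketch of such an interval, assuming the normal approximation is acceptable: it uses only the Python standard library (`statistics.NormalDist`) rather than numpy or scipy, and the function name and `confidence` parameter are made up for this example. Because it uses a normal quantile instead of a Student's t quantile, the interval will be somewhat too narrow for small samples.

```python
import math
from statistics import NormalDist, mean, stdev


def normal_confidence_interval(data, confidence=0.95):
    """Return (low, high) for the mean of data using a normal quantile.

    Hypothetical helper for illustration; a t-based interval is the usual
    choice when the sample is small and the variance is estimated.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    center = mean(data)
    # Standard error of the mean, from the sample standard deviation.
    standard_error = stdev(data) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    margin = z * standard_error
    return center - margin, center + margin
```

Calling `normal_confidence_interval(sample, confidence=0.99)` widens the interval relative to the default 95% level.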
101
votes
17
answers
167k
views
How to efficiently calculate a running standard deviation
I have an array of lists of numbers, e.g.:
[0] (0.01, 0.01, 0.02, 0.04, 0.03)
[1] (0.00, 0.02, 0.02, 0.03, 0.02)
[2] (0.01, 0.02, 0.02, 0.03, 0.02)
...
[n] (0.01, 0.00, 0.01, 0.05, 0.03)
I would ...
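One common way to keep a running standard deviation without storing all values is Welford's online algorithm, sketched below in plain Python. This is a general illustration, not necessarily the layout the asker goes on to describe; the class name `RunningStats` and its methods are hypothetical.

```python
import math


class RunningStats:
    """Accumulate mean and standard deviation one value at a time (Welford)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x):
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the updated mean, which keeps the update numerically stable.
        self._m2 += delta * (x - self.mean)

    def std(self, ddof=1):
        """Standard deviation; ddof=1 is the sample form, ddof=0 the population form."""
        if self.count <= ddof:
            return float("nan")
        return math.sqrt(self._m2 / (self.count - ddof))
```

To track one statistic per position across many lists, one option is to keep a separate `RunningStats` instance for each index and push the corresponding element of every incoming list.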
251
votes
11
answers
417k
views
Find p-value (significance) in scikit-learn LinearRegression
How can I find the p-value (significance) of each coefficient?
lm = sklearn.linear_model.LinearRegression()
lm.fit(x,y)
188
votes
3
answers
163k
views
How to make execution pause, sleep, wait for X seconds in R?
How do you pause an R script for a specified number of seconds or milliseconds? In many languages, there is a sleep function, but ?sleep references a data set. And ?pause and ?wait don't exist.
The ...
157
votes
15
answers
248k
views
Multiple linear regression in Python
I can't seem to find any python libraries that do multiple regression. The only things I find only do simple regression. I need to regress my dependent variable (y) against several independent ...