
Basically, I have code that filters the data coming from an API. The data looks reasonable; however, the activeUsers values are in string format. Once I convert them to integers, my max value and all the other values change! The same thing happens in Excel, and it is not realistic to have over 1 million active users, as appears after the conversion. Why could this be happening? I have checked a lot of things and found nothing.

This is the max value before the conversion:


activeUsers    994

This is the output of describe() after the conversion:


count    6.110000e+02
mean     3.721185e+03
std      4.682527e+04
min      3.000000e+00
25%      3.550000e+01
50%      1.740000e+02
75%      7.140000e+02
max      1.133340e+06
Name: activeUsers, dtype: float64
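One likely explanation, sketched under an assumption: if activeUsers is still an object (string) column when the max is taken, pandas compares the values as text, character by character, so "994" beats "1133340" because '9' > '1'. The small example below uses hypothetical string values (1133340 is borrowed from the describe() output above) to show the string max and the numeric max disagreeing without any data actually changing.

```python
import pandas as pd

# Hypothetical sample values stored as strings, as the API in the question returns them
df = pd.DataFrame({"activeUsers": ["994", "1133340", "35", "3"]})

# With object (string) dtype, max() compares lexicographically: "994" > "1133340" since '9' > '1'
string_max = df["activeUsers"].max()

# After conversion to a numeric dtype, max() compares the actual numbers
df["activeUsers"] = pd.to_numeric(df["activeUsers"])
numeric_max = df["activeUsers"].max()

print(string_max)   # 994
print(numeric_max)  # 1133340
```

If this is what is happening, the conversion is not altering the data; it is the "before" max of 994 that is misleading.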
  • Consider converting the data in the column to integers: df['activeUsers'] = df['activeUsers'].astype(int) Commented Jun 5 at 17:12
  • Already did it. I also used pd.to_numeric. That is my problem: once I convert to integers, the data appears to change (at least its max value, which was a number stored as a string before). Maybe the problem is how the strings are compared when finding the max, or maybe the underlying problem is that the data itself is inconsistent because of problems with the API. Who knows!
    – Teko JR
    Commented Jun 7 at 12:48
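To tell a string-comparison effect apart from genuinely bad API data, one option is to convert with errors="coerce" and then inspect both the rows that failed to parse and the largest numeric values. This is a minimal sketch on hypothetical data (the "n/a" entry is invented for illustration); a float64 dtype after conversion, as in the describe() output, is consistent with some values having become NaN, though it does not prove it.

```python
import pandas as pd

# Hypothetical string data standing in for the API response; "n/a" is an invented bad value
df = pd.DataFrame({"activeUsers": ["994", "1133340", "35", "n/a", "174"]})

# Unparseable strings become NaN instead of raising; NaN also forces a float64 dtype
numeric = pd.to_numeric(df["activeUsers"], errors="coerce")

# Original strings that could not be converted to numbers
unparsed = df.loc[numeric.isna(), "activeUsers"]

# Largest values in numeric order, to judge whether they are plausible user counts
top = numeric.nlargest(2)

print(unparsed.tolist())  # ['n/a']
print(top.tolist())       # [1133340.0, 994.0]
```

If the large values show up here as clean numeric strings in the raw response, the outliers come from the API rather than from the conversion.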
