
K-means clustering is an unsupervised machine learning algorithm. This project uses it to partition a large 2018 dataset of over 1 million survey responses, each answering 50 questions on a 1-to-5 scale about the Big Five personality traits. The predicted raw scores are then normalized to show percentiles of relative importance.


xxl4tomxu98/Big5-kmeans-clustering


Cluster-personality-kmeans (Kaggle dataset: IPIP-FFM-data-8Nov2018)

The dataset has no labels, so we cluster it with k-means into 10 personality categories (the number 10 is an arbitrary choice).
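As a rough illustration of the clustering step, here is a dependency-free sketch of k-means (plain Lloyd's iteration) run on synthetic answers. It is not the notebook's code, which most likely relies on a library implementation. The function name `kmeans`, the synthetic data, and the sample size of 500 are assumptions made for this example only.

```python
import random

def kmeans(rows, k, iterations=20, seed=0):
    """Plain Lloyd's iteration: assign rows to the nearest centroid, then re-average."""
    rng = random.Random(seed)
    # Start from k distinct rows picked at random.
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(row, centroids[c])))
            for row in rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Synthetic stand-in for the survey: 500 respondents, 50 answers each on a 1-5 scale.
rng = random.Random(42)
answers = [[rng.randint(1, 5) for _ in range(50)] for _ in range(500)]

labels, centroids = kmeans(answers, k=10)
print(len(labels), len(centroids))
```

Each respondent ends up with a label from 0 to 9, and each centroid is a 50-value profile of average answers for that cluster. On the full dataset of over 1 million rows, a vectorized library implementation would be far faster than this loop.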

The resulting cluster category numbers are normalized and plotted as bar charts in a Jupyter notebook.
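The README does not say exactly how the normalization is done. One simple reading, assumed here, is to rescale a set of raw scores so that they sum to 100 and then chart the shares. The helper `to_percent_shares` and the raw numbers below are hypothetical, and the text bars stand in for the notebook's charts.

```python
def to_percent_shares(raw_scores):
    """Rescale non-negative raw scores so that they sum to 100."""
    total = sum(raw_scores.values())
    if total == 0:
        raise ValueError("raw scores sum to zero; nothing to normalize")
    return {name: 100.0 * value / total for name, value in raw_scores.items()}

# Hypothetical raw trait sums for one cluster (illustrative numbers, not from the dataset).
raw = {
    "extroversion": 30,
    "neuroticism": 20,
    "agreeableness": 40,
    "conscientiousness": 35,
    "openness": 25,
}

shares = to_percent_shares(raw)
for name, pct in shares.items():
    # Text bar: roughly one '#' per two percentage points.
    print(f"{name:<18} {pct:5.1f}% {'#' * round(pct / 2)}")
```

Other normalizations, such as min-max scaling or true percentile ranks against the whole sample, would fit the same description; the choice changes the scale of the bars but not their ordering within a cluster.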

The data is based on the updated 2018 version, which includes metadata that can be used for further ML study.

Introduction

The Big Five personality traits, also known as the five-factor model (FFM) and the OCEAN model, are a taxonomy, or grouping, of personality traits. When factor analysis (a statistical technique) is applied to personality survey data, some words used to describe aspects of personality turn out to be applied to the same person more often than others. For example, someone described as conscientious is more likely to be described as "always prepared" than as "messy". The theory is therefore based on associations between words, not on neuropsychological experiments. It uses descriptors from common language and suggests five broad dimensions commonly used to describe the human personality and psyche.

The Dataset

This dataset contains 1,015,342 questionnaire answers collected online by Open Psychometrics.

Source:

"Possible Questionnaire Format for Administering the 50-Item Set of IPIP Big-Five Factor Markers". International Personality Item Pool.

References:

Goldberg, Lewis R. "The development of markers for the Big-Five factor structure." Psychological assessment 4.1 (1992): 26.
