
A tool for predicting and interpreting smell data obtained from Smell Pittsburgh. The code for the system can be found here. The design, evaluation, and data analysis are documented in the following paper. If you find this useful, we would greatly appreciate it if you could cite the paper below.

Yen-Chia Hsu, Jennifer Cross, Paul Dille, Michael Tasota, Beatrice Dias, Randy Sargent, Ting-Hao (Kenneth) Huang, and Illah Nourbakhsh. 2020. Smell Pittsburgh: Engaging Community Citizen Science for Air Quality. ACM Transactions on Interactive Intelligent Systems. 10, 4, Article 32. DOI: Preprint:


Install conda. These steps assume Ubuntu. Detailed documentation is here. First visit here to obtain the download path. The following script installs conda for all users:

sudo sh -b -p /opt/miniconda3

sudo vim /etc/bash.bashrc
# Add the following lines to this file
export PATH="/opt/miniconda3/bin:$PATH"
. /opt/miniconda3/etc/profile.d/

source /etc/bash.bashrc

For macOS, I recommend installing conda with Homebrew.

brew install --cask miniconda
echo 'export PATH="/usr/local/Caskroom/miniconda/base/bin:$PATH"' >> ~/.bash_profile
echo '. /usr/local/Caskroom/miniconda/base/etc/profile.d/' >> ~/.bash_profile
source ~/.bash_profile

Clone this repository.

git clone
sudo chown -R $USER smell-pittsburgh-prediction

Create a conda environment and install packages. It is important to install Python 3.8 and pip inside the newly created conda environment before installing anything else.

conda create -n smell-pittsburgh-prediction
conda activate smell-pittsburgh-prediction
conda install python=3.8
conda install pip
which pip # make sure this is the pip inside the smell-pittsburgh-prediction environment
sh smell-pittsburgh-prediction/
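
As a quick sanity check after activating the environment, the Python sketch below reports whether the running interpreter looks like the one the steps above set up. The helper name check_environment and its parameters are illustrative and not part of this repository. It relies only on sys.prefix, sys.version_info, and the CONDA_DEFAULT_ENV variable that conda sets on activation, and it assumes the environment lives in a folder named after it.

```python
import os
import sys


def check_environment(prefix, version, active_env,
                      env_name="smell-pittsburgh-prediction",
                      required=(3, 8)):
    """Return a list of human-readable problems (empty if all looks fine).

    Hypothetical helper for illustration; not part of the repository.
    """
    problems = []
    if tuple(version[:2]) != required:
        problems.append("Python %d.%d found, expected %d.%d"
                        % (version[0], version[1], required[0], required[1]))
    if active_env != env_name:
        problems.append("active conda environment is %r, expected %r"
                        % (active_env, env_name))
    # A named conda environment normally lives in a folder named after it.
    if os.path.basename(os.path.normpath(prefix)) != env_name:
        problems.append("interpreter prefix %r is not the %s environment"
                        % (prefix, env_name))
    return problems


if __name__ == "__main__":
    issues = check_environment(sys.prefix, sys.version_info,
                               os.environ.get("CONDA_DEFAULT_ENV"))
    print("\n".join(issues) if issues else "environment looks fine")
```

Running the script inside the activated environment should print "environment looks fine"; any other output lists what differs from the setup described above.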

If the environment already exists and you want to remove it before installing packages, use the following:

conda env remove -n smell-pittsburgh-prediction

Get data, preprocess data, extract features, train the classifier, perform cross-validation, analyze data, and interpret the model. This will create a directory (py/prediction/data_main/) to store all downloaded and processed data. Note that if you change the is_regr parameter in the "" file, you will need to run "python feature" again to create a new set of features.

cd smell-pittsburgh-prediction/py/prediction/

# Run the entire pipeline
python pipeline

# For each step in the pipeline
python data # get data (you do not need to run this if you have the dataset ready, as mentioned below in the Dataset section)
python preprocess # preprocess data
python feature # extract features
python validation # perform cross validation
python analyze # analyze data and interpret model
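
The individual steps can also be chained from Python, for example to skip the download when the dataset is already in place. This is only a sketch: it assumes each step is invoked as "python <script> <step>", and the script path is a required argument because its name is not assumed here. The names STEPS, build_commands, and run_steps are hypothetical.

```python
import subprocess
import sys

# Step names as listed above, in the order they are meant to run.
STEPS = ["data", "preprocess", "feature", "validation", "analyze"]


def build_commands(entry_script, skip_download=False, python=sys.executable):
    """Return one argv list per pipeline step.

    entry_script is the path to the project's command-line script; it is a
    required argument because this sketch does not assume its name.
    """
    steps = [s for s in STEPS if not (skip_download and s == "data")]
    return [[python, entry_script, step] for step in steps]


def run_steps(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        print("running:", " ".join(argv))
        subprocess.run(argv, check=True)  # raises CalledProcessError on non-zero exit
```

Calling run_steps(build_commands(script_path, skip_download=True)) from py/prediction/ would run everything except the download step, stopping as soon as one step exits with an error.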

To deploy the model and generate push notifications when smell events are predicted, run the following:

# Train the classifier
python train

# Perform prediction
python predict

If you want to disable the crowd-based smell event notifications, go to the "" file and comment out the following line:

if y_pred in (2, 3): pushType2(end_dt, logger)
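
Instead of commenting the line out, the same effect could be had with a switch. The sketch below is an assumption about how that might look, not code from the repository: ENABLE_CROWD_NOTIFICATIONS, maybe_notify, and push_type2_stub are hypothetical names, and the stub only logs where the project's pushType2 would send a notification.

```python
import logging
from datetime import datetime

# Hypothetical switch; the repository itself does not define this name.
ENABLE_CROWD_NOTIFICATIONS = False


def push_type2_stub(end_dt, logger):
    """Stand-in for the project's pushType2; it only logs instead of sending a push."""
    logger.info("would send crowd-based notification for %s", end_dt)
    return True


def maybe_notify(y_pred, end_dt, logger, enabled=ENABLE_CROWD_NOTIFICATIONS,
                 push=push_type2_stub):
    """Call push only when enabled and the prediction is 2 or 3, as in the line above."""
    if enabled and y_pred in (2, 3):
        return push(end_dt, logger)
    return False
```

With the switch left at False, predictions of 2 or 3 no longer trigger the crowd-based notification, and nothing else in the prediction flow changes.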

Use crontab to call the above two commands periodically. The following example retrains the model every Sunday at 0:00. The prediction task runs every 15 minutes (at minutes 0, 15, 30, and 45 of each hour) from 5:00 through 13:45 every day.

sudo crontab -e

# Add the following lines in the crontab file
0 0 * * 0 export PATH="/opt/miniconda3/bin:$PATH"; . "/opt/miniconda3/etc/profile.d/"; conda activate smell-pittsburgh-prediction; cd /var/www/smell-pittsburgh-prediction/py/prediction; run-one python train
0 5-13 * * * export PATH="/opt/miniconda3/bin:$PATH"; . "/opt/miniconda3/etc/profile.d/"; conda activate smell-pittsburgh-prediction; cd /var/www/smell-pittsburgh-prediction/py/prediction; run-one python predict
15 5-13 * * * export PATH="/opt/miniconda3/bin:$PATH"; . "/opt/miniconda3/etc/profile.d/"; conda activate smell-pittsburgh-prediction; cd /var/www/smell-pittsburgh-prediction/py/prediction; run-one python predict
30 5-13 * * * export PATH="/opt/miniconda3/bin:$PATH"; . "/opt/miniconda3/etc/profile.d/"; conda activate smell-pittsburgh-prediction; cd /var/www/smell-pittsburgh-prediction/py/prediction; run-one python predict
45 5-13 * * * export PATH="/opt/miniconda3/bin:$PATH"; . "/opt/miniconda3/etc/profile.d/"; conda activate smell-pittsburgh-prediction; cd /var/www/smell-pittsburgh-prediction/py/prediction; run-one python predict

IMPORTANT: the above crontab commands only work in bash, not in sh. Make sure that you add the following as the first line of the crontab:


The crontab can be simplified as shown below. The */15 in the minute field runs the prediction command every 15 minutes within the given hours. Check this website for setting up the crontab.

sudo crontab -e

# Add the following lines in the crontab file
0 0 * * 0 export PATH="/opt/miniconda3/bin:$PATH"; . "/opt/miniconda3/etc/profile.d/"; conda activate smell-pittsburgh-prediction; cd /var/www/smell-pittsburgh-prediction/py/prediction; run-one python train
*/15 5-13 * * * export PATH="/opt/miniconda3/bin:$PATH"; . "/opt/miniconda3/etc/profile.d/"; conda activate smell-pittsburgh-prediction; cd /var/www/smell-pittsburgh-prediction/py/prediction; run-one python predict
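
To see that the single */15 entry covers the same times as the four separate prediction entries in the first crontab, the minute and hour fields can be expanded by hand. The sketch below does this in Python. It handles only the field syntax used in this README ('*', ranges, comma lists, and '/step'), and the function names are illustrative.

```python
def expand_field(field, low, high):
    """Expand one cron field made of '*', 'a-b', 'a,b', and '/step' parts.

    Covers only the syntax used in the crontab above, not every cron feature.
    """
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return sorted(values)


def daily_run_times(minute_field, hour_field):
    """Return the 'HH:MM' times at which a job with these fields fires each day."""
    minutes = expand_field(minute_field, 0, 59)
    hours = expand_field(hour_field, 0, 23)
    return ["%02d:%02d" % (h, m) for h in hours for m in minutes]


simplified = daily_run_times("*/15", "5-13")
explicit = daily_run_times("0,15,30,45", "5-13")
print(len(simplified), simplified[0], simplified[-1])  # prints: 36 05:00 13:45
```

Both schedules give the same 36 runs per day. Note that the hour range 5-13 includes the whole 13:00 hour, so the last prediction of the day starts at 13:45.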


The web/GeoHeatmap.html file visualizes the distribution of smell reports by zip code. You can open it in a browser, such as Google Chrome.
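
The page can also be opened from Python with the standard webbrowser module. This is a small sketch that assumes web/ sits at the root of the cloned repository; heatmap_uri and open_heatmap are hypothetical helper names.

```python
import webbrowser
from pathlib import Path


def heatmap_uri(repo_root):
    """Return a file:// URI for web/GeoHeatmap.html inside the cloned repository."""
    return (Path(repo_root) / "web" / "GeoHeatmap.html").resolve().as_uri()


def open_heatmap(repo_root):
    """Open the page in the default browser; returns False if no browser could be launched."""
    return webbrowser.open(heatmap_uri(repo_root))
```

For example, open_heatmap("smell-pittsburgh-prediction") run from the directory that contains the clone would open the heatmap in the default browser.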


There are two datasets in this repository. Version one is the dataset that we used in the paper. Version two is an updated dataset that covers a wider geographic area and a longer time range. To copy the version two data into the working directory, use the following commands:

cd smell-pittsburgh-prediction/py/prediction
mkdir -p data_main
cd data_main
cp -R ../../../dataset/v2/esdr_raw .
cp ../../../dataset/v2/smell_raw.csv .
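
The same copy can be done portably from Python, which avoids differences between cp implementations. The sketch below uses the paths from the commands above. The function name copy_dataset is hypothetical, and passing any version other than "v2" assumes a matching directory exists under dataset/.

```python
import shutil
from pathlib import Path


def copy_dataset(repo_root, version="v2"):
    """Copy dataset/<version>/ into py/prediction/data_main/ and return that directory."""
    repo_root = Path(repo_root)
    src = repo_root / "dataset" / version
    dst = repo_root / "py" / "prediction" / "data_main"
    dst.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok (Python 3.8+) lets the copy be re-run over an existing folder.
    shutil.copytree(src / "esdr_raw", dst / "esdr_raw", dirs_exist_ok=True)
    shutil.copy2(src / "smell_raw.csv", dst / "smell_raw.csv")
    return dst
```

Calling copy_dataset("smell-pittsburgh-prediction") leaves esdr_raw/ and smell_raw.csv under py/prediction/data_main/, the location the pipeline steps read from.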

To get recent data, change the end_dt (ending date time) variable in the file and then run the following:

python data

This will download smell data (py/prediction/data_main/smell_raw.csv) and sensor data (py/prediction/data_main/esdr_raw/). The smell data is obtained from Smell Pittsburgh. The sensor data is obtained from ESDR.
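
Before running the later pipeline steps, it can help to confirm that both pieces of data are in place. The following is a minimal sketch based only on the two paths named above. The function dataset_ready is hypothetical, and it checks for presence only, not for the content or format of the files.

```python
from pathlib import Path


def dataset_ready(data_dir):
    """Return True if smell_raw.csv is non-empty and esdr_raw/ holds at least one file."""
    data_dir = Path(data_dir)
    smell = data_dir / "smell_raw.csv"
    sensors = data_dir / "esdr_raw"
    has_smell = smell.is_file() and smell.stat().st_size > 0
    has_sensors = sensors.is_dir() and any(p.is_file() for p in sensors.rglob("*"))
    return has_smell and has_sensors
```

If dataset_ready("py/prediction/data_main") returns False when run from the repository root, either copy the dataset as described above or run the data step to download it.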