3. Process
In this section, the data is processed. R is used because it handles
the amount of data well and makes it easy to create visualizations to
share with stakeholders.
3.1. Packages
The following packages were installed and loaded:
- janitor
- lubridate
- Rcmdr
- scales
- tidyverse
# Load packages
library("janitor")
library("lubridate")
library("Rcmdr")
library("scales")
library("tidyverse")
3.2. Import datasets
According to the central limit theorem, given a sufficiently large
sample from a population with finite variance, the distribution of the
sample mean is approximately normal and centred on the mean of the whole
population. What counts as a sufficient sample size varies by industry
and business, but sample sizes of 30 or more are often considered
sufficient for the central limit theorem to hold. This analysis only
used datasets meeting that sample size, which means heart rate data (14
users), sleep data (24 users), and weight data (8 users) were not used.
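As a rough illustration of why sample size matters, the sketch below
simulates repeated samples from a skewed population and compares the
spread of the sample means with the theoretical value sigma / sqrt(n).
The exponential population, the seed, and the number of repetitions are
assumptions chosen for the illustration; none of them come from the
Fitbit data.

```r
# Simulated illustration; the population and sample sizes are invented.
set.seed(42)

# Draw `reps` samples of size `n` and return the mean of each sample.
sample_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))  # exponential: mean 1, sd 1
}

means_n8  <- sample_means(8)
means_n30 <- sample_means(30)

# The spread of the sample means should be close to sigma / sqrt(n).
clt_check <- data.frame(
  n           = c(8, 30),
  observed_sd = c(sd(means_n8), sd(means_n30)),
  expected_sd = 1 / sqrt(c(8, 30))
)
print(clt_check)
```

With the smaller sample size the means scatter more widely, which is the
reason the smaller datasets were set aside.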
For this analysis, minute-level data would not provide more insight
than hourly or daily data. Hence, datasets containing minute-level data
were not used.
The dataset called dailyActivity_merged contains daily
calories, intensities, and steps, which made the datasets dedicated
specifically to those data redundant for this analysis.
To summarise, this analysis used the following datasets:
- dailyActivity_merged
- hourlySteps_merged
# Import datasets
daily_activity <- read.csv("data/dailyActivity_merged.csv")
hourly_steps <- read.csv("data/hourlySteps_merged.csv")
3.3. Preview datasets
A preview allows for familiarization with the data.
# Preview datasets
glimpse(daily_activity)
## Rows: 940
## Columns: 15
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ ActivityDate <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/…
## $ TotalSteps <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019…
## $ TotalDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ TrackerDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5…
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3…
## $ LightActiveDistance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0…
## $ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4…
## $ FairlyActiveMinutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21…
## $ LightlyActiveMinutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, …
## $ SedentaryMinutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818…
## $ Calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203…
glimpse(hourly_steps)
## Rows: 22,099
## Columns: 3
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150396036…
## $ ActivityHour <chr> "4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM", "4/12/20…
## $ StepTotal <int> 373, 160, 151, 0, 0, 0, 0, 0, 250, 1864, 676, 360, 253, 2…
3.4. Clean and format datasets
Now that the data structures are known, it is time to look for errors
and inconsistencies.
3.4.1. Verify number of users
The number of distinct users was double-checked.
# Check number of distinct users
n_distinct(daily_activity$Id)
## [1] 33
n_distinct(hourly_steps$Id)
## [1] 33
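The same check can be applied to several candidate datasets at once to
decide which ones meet the 30-user threshold. The sketch below is only an
illustration: the ID vectors are placeholders sized to match the user
counts mentioned earlier (33, 24, 14, and 8), and the names
ids_by_dataset and min_users are hypothetical.

```r
# Hypothetical user IDs per dataset; real IDs would come from each file's Id column.
ids_by_dataset <- list(
  daily_activity = 1:33,
  sleep          = 1:24,
  heart_rate     = 1:14,
  weight         = 1:8
)

min_users <- 30  # threshold used in this analysis

# Count distinct users per dataset and flag the ones that meet the threshold.
n_users <- sapply(ids_by_dataset, function(ids) length(unique(ids)))
keep    <- names(n_users)[n_users >= min_users]

print(n_users)
print(keep)
```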
3.4.2. Identify and remove potential duplicates
The datasets were checked for duplicate rows.
# Find potential duplicates
sum(duplicated(daily_activity)) # Returns the number of duplicate rows
## [1] 0
sum(duplicated(hourly_steps))
## [1] 0
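Neither dataset contained duplicates, so nothing had to be removed. For
reference, a minimal sketch of how counting and removing duplicate rows
could look on invented data (the toy data frame is an assumption, not
part of the Fitbit data):

```r
# Invented rows; the second and third rows are identical.
toy <- data.frame(
  id    = c(1, 2, 2, 3),
  steps = c(100, 250, 250, 80)
)

n_dupes <- sum(duplicated(toy))  # duplicated() marks repeats after the first occurrence
deduped <- unique(toy)           # unique() keeps the first occurrence of each row

print(n_dupes)
print(nrow(deduped))
```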
3.4.3. Clean and rename columns
To ensure the column names followed a consistent naming convention,
they were converted to snake case and renamed.
# Format to snake_case
daily_activity <- clean_names(daily_activity)
hourly_steps <- clean_names(hourly_steps)
# Rename activity_date to date
daily_activity <-
daily_activity %>%
rename(date = activity_date)
# Rename activity_hour to date_time
hourly_steps <-
hourly_steps %>%
rename(date_time = activity_hour)
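To make the effect of the snake case conversion concrete, the sketch
below approximates it in base R for the column names seen in the preview.
It is a simplified stand-in and not the algorithm janitor uses; the
helper name to_snake is hypothetical.

```r
# Rough base-R approximation of snake_case conversion (not janitor's algorithm).
to_snake <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)  # insert "_" at lower-to-upper boundaries
  tolower(x)
}

original  <- c("Id", "ActivityDate", "TotalSteps", "VeryActiveMinutes")
converted <- to_snake(original)
print(converted)
```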
3.4.4. Date and time
For the dataset daily_activity, the dates were stored as characters in
the American format, MM/DD/YYYY. To transform them to the ISO 8601
format, YYYY-MM-DD, the following code was used:
# Transform into date
daily_activity$date <- mdy(daily_activity$date)
Check that the dataset looks as desired.
glimpse(daily_activity)
## Rows: 940
## Columns: 15
## $ id <dbl> 1503960366, 1503960366, 1503960366, 1503960…
## $ date <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-0…
## $ total_steps <int> 13162, 10735, 10460, 9762, 12669, 9705, 130…
## $ total_distance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9…
## $ tracker_distance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9…
## $ logged_activities_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_distance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3…
## $ moderately_active_distance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1…
## $ light_active_distance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5…
## $ sedentary_active_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_minutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66,…
## $ fairly_active_minutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, …
## $ lightly_active_minutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205…
## $ sedentary_minutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 8…
## $ calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2…
For the dataset hourly_steps, both the date and the time were stored
as characters in the American format. To transform them into date-time
format, the following code was used.
# Transform into date-time
hourly_steps$date_time <- mdy_hms(hourly_steps$date_time)
glimpse(hourly_steps)
## Rows: 22,099
## Columns: 3
## $ id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,…
## $ date_time <dttm> 2016-04-12 00:00:00, 2016-04-12 01:00:00, 2016-04-12 02:00…
## $ step_total <int> 373, 160, 151, 0, 0, 0, 0, 0, 250, 1864, 676, 360, 253, 221…
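For readers without lubridate, the same two conversions can be sketched
in base R with explicit format strings. This is an equivalent
illustration, not the code used in the analysis; it assumes an English
session locale for the AM/PM marker and UTC as the time zone.

```r
# Base-R equivalents of the parsing step, shown on values from the preview.
date_chr     <- "4/12/2016"
datetime_chr <- "4/12/2016 1:00:00 AM"

# %m/%d/%Y reads month/day/year; %I and %p read the 12-hour clock and AM/PM marker.
# Note: %p matching depends on the session locale (assumed English here).
parsed_date     <- as.Date(date_chr, format = "%m/%d/%Y")
parsed_datetime <- as.POSIXct(datetime_chr, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

print(format(parsed_date))                           # ISO 8601 date
print(format(parsed_datetime, "%Y-%m-%d %H:%M:%S"))  # 24-hour date-time
```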
4. Analyze & Share
In this section, the Fitbit datasets are analyzed to discover trends
that can be used to inform Bellabeat’s marketing strategy.
4.1. Summary statistics
To start the analysis, summary statistics were calculated.
# Numerical Summaries: daily_activity
numSummary(daily_activity[,c("total_steps", "total_distance", "tracker_distance", "logged_activities_distance", "very_active_distance", "moderately_active_distance", "light_active_distance", "sedentary_active_distance", "very_active_minutes", "fairly_active_minutes", "lightly_active_minutes", "sedentary_minutes", "calories"), drop=FALSE],
statistics=c("mean", "sd", "quantiles"), quantiles=c(0,.5,1))
## mean sd 0% 50% 100%
## total_steps 7.637911e+03 5.087151e+03 0 7405.500 36019.000000
## total_distance 5.489702e+00 3.924606e+00 0 5.245 28.030001
## tracker_distance 5.475351e+00 3.907276e+00 0 5.245 28.030001
## logged_activities_distance 1.081709e-01 6.198965e-01 0 0.000 4.942142
## very_active_distance 1.502681e+00 2.658941e+00 0 0.210 21.920000
## moderately_active_distance 5.675426e-01 8.835803e-01 0 0.240 6.480000
## light_active_distance 3.340819e+00 2.040655e+00 0 3.365 10.710000
## sedentary_active_distance 1.606383e-03 7.346176e-03 0 0.000 0.110000
## very_active_minutes 2.116489e+01 3.284480e+01 0 4.000 210.000000
## fairly_active_minutes 1.356489e+01 1.998740e+01 0 6.000 143.000000
## lightly_active_minutes 1.928128e+02 1.091747e+02 0 199.000 518.000000
## sedentary_minutes 9.912106e+02 3.012674e+02 0 1057.500 1440.000000
## calories 2.303610e+03 7.181669e+02 0 2134.000 4900.000000
## n
## total_steps 940
## total_distance 940
## tracker_distance 940
## logged_activities_distance 940
## very_active_distance 940
## moderately_active_distance 940
## light_active_distance 940
## sedentary_active_distance 940
## very_active_minutes 940
## fairly_active_minutes 940
## lightly_active_minutes 940
## sedentary_minutes 940
## calories 940
Initial thoughts regarding the dataset:
- Mean total steps per day was approximately 7638 (SD = 5087), and the
median was slightly lower (Mdn = 7406). The minimum recorded steps was 0
and the maximum was as high as 36019. It seems some very active users
raised the average, while some sedentary users offset them. Needs
further investigation.
- Mean very active minutes was 21 min (SD = 33), and mean fairly active
minutes was 14 min (SD = 20). On average, users therefore reach the
recommended active minutes for significant health benefits. However, the
standard deviations are large. Needs further investigation.
- The activity minutes did not add up to 1440 min (60 min per hour, 24
hours in a day) on every row, which indicates that users did not wear
their tracking devices for the whole day. Needs further investigation.
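The last observation can be checked row by row by summing the four
intensity columns. The sketch below does this on three invented rows; the
column names follow the cleaned dataset, while the values and the name
toy_days are assumptions made for the example.

```r
# Invented rows standing in for daily_activity; column names follow the cleaned dataset.
toy_days <- data.frame(
  very_active_minutes    = c(25, 0, 40),
  fairly_active_minutes  = c(13, 0, 20),
  lightly_active_minutes = c(328, 100, 200),
  sedentary_minutes      = c(1074, 500, 1180)
)

minutes_per_day <- 24 * 60  # 1440

# Sum the four intensity columns and compare with a full day.
toy_days$tracked_minutes <- rowSums(toy_days)
toy_days$full_day        <- toy_days$tracked_minutes == minutes_per_day

print(toy_days[, c("tracked_minutes", "full_day")])
```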
4.2. Lifestyle type based on daily steps
As there was no available demographic data, users were categorized
into lifestyle types based on their daily steps. The levels are defined
in the table below.
Lifestyle type  | Steps per day
Sedentary       | < 5000
Low active      | 5000 - 7499
Somewhat active | 7500 - 9999
Active          | 10000 - 12499
Highly active   | ≥ 12500
The categorization was based on the article “How many steps/day are
enough? Preliminary pedometer indices for public health” by Catrine
Tudor-Locke and David R. Bassett Jr.
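The step bands in the table can also be expressed compactly with base
R's cut(). The sketch below is an alternative illustration on invented
step counts, not the code used in this analysis; note that right = FALSE
is needed so that each band includes its lower bound.

```r
# Invented mean daily step counts for five hypothetical users.
mean_steps <- c(3200, 5000, 7499.5, 10400, 12500)

# right = FALSE makes each band include its lower bound: [0, 5000), [5000, 7500), ...
lifestyle <- cut(
  mean_steps,
  breaks = c(0, 5000, 7500, 10000, 12500, Inf),
  labels = c("Sedentary", "Low active", "Somewhat active", "Active", "Highly active"),
  right  = FALSE
)

print(data.frame(mean_steps, lifestyle))
```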
Thereafter, the users were categorized.
# Calculate average steps per day for each user
mean_daily_activity <-
daily_activity %>%
select(id, date, total_steps, very_active_minutes, fairly_active_minutes) %>%
group_by(id) %>%
summarise(mean_steps = mean(total_steps),
mean_very_active_minutes = mean(very_active_minutes),
mean_fairly_active_minutes = mean(fairly_active_minutes))
# Categorize users lifestyle based on number of daily steps
mean_daily_activity <-
mean_daily_activity %>%
mutate(lifestyle_type = case_when (mean_steps < 5000 ~ "Sedentary: 0 - 4999 steps",
mean_steps < 7500 ~ "Low active: 5000 - 7499 steps",
mean_steps < 10000 ~ "Somewhat active: 7500 - 9999 steps",
mean_steps < 12500 ~ "Active: 10000 - 12499 steps",
mean_steps >= 12500 ~ "Highly active: > 12500 steps"))
# Calculate number of users in each group
lifestyle_pct <-
mean_daily_activity %>%
select(id, lifestyle_type) %>%
group_by(lifestyle_type) %>%
summarise(n = n())
# Calculate percentage
lifestyle_pct <-
lifestyle_pct %>%
mutate(dbl = (n/sum(n)),
round = round(dbl, digits = 2),
error = 1 - dbl/round)
lifestyle_pct
## # A tibble: 5 × 5
## lifestyle_type n dbl round error
## <chr> <int> <dbl> <dbl> <dbl>
## 1 Active: 10000 - 12499 steps 5 0.152 0.15 -0.0101
## 2 Highly active: > 12500 steps 2 0.0606 0.06 -0.0101
## 3 Low active: 5000 - 7499 steps 9 0.273 0.27 -0.0101
## 4 Sedentary: 0 - 4999 steps 8 0.242 0.24 -0.0101
## 5 Somewhat active: 7500 - 9999 steps 9 0.273 0.27 -0.0101
The rounded shares did not add up to 1 (100%). As rounding induced the
same relative error for all lifestyle types, one of them was arbitrarily
chosen to be rounded up. The lifestyle type that was rounded up was
sedentary.
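# Round up sedentary lifestyle (matched by label rather than by row position)
sedentary <- lifestyle_pct$lifestyle_type == "Sedentary: 0 - 4999 steps"
lifestyle_pct$round[sedentary] <- lifestyle_pct$round[sedentary] + 0.01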
# Transform into percentage
lifestyle_pct <-
lifestyle_pct %>%
mutate(pct = scales::percent(round)) %>%
select(-c(dbl, error))
# Reorder lifestyle types after activity level by adding factor levels
lifestyle_pct$lifestyle_type <- factor(lifestyle_pct$lifestyle_type, levels=c("Sedentary: 0 - 4999 steps", "Low active: 5000 - 7499 steps", "Somewhat active: 7500 - 9999 steps", "Active: 10000 - 12499 steps", "Highly active: > 12500 steps"))
# Pie chart
ggplot(lifestyle_pct, aes(x=" ", y=round, fill=lifestyle_type)) +
geom_bar(width=1, stat="identity") +
coord_polar("y", start=0) +
scale_fill_manual(values = c("#61E3FA", "#6376DB", "#B96CF6", "#DB65A2", "#FB9778")) +
labs(title = "User Distribution based on Daily Steps",
fill = "Lifestyle") +
theme_bw() +
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.ticks = element_blank(),
axis.text.x=element_blank(),
panel.border = element_blank(),
panel.grid=element_blank(),
plot.title=element_text(size=14, face="bold", hjust = 0.5)
) +
geom_text(aes(label = pct),
position = position_stack(vjust = 0.5))
As the pie chart shows, all lifestyle types were represented in the
tracker data. However, most users had a somewhat active (27%), low
active (27%), or sedentary lifestyle (25%). Only 21% of the users were
considered to have an active (15%) or highly active (6%) lifestyle based
on their total daily steps.
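The manual adjustment of the rounded shares could alternatively be
replaced by a largest-remainder rule, which always yields percentages
that sum to 100. The sketch below is such an alternative and not the
method used above; with the user counts from the table, the extra
percentage point goes to one of the two tied 27% groups rather than to
sedentary. The function name round_to_100 is hypothetical.

```r
# User counts per lifestyle type, taken from the table above.
n <- c(Active = 5, "Highly active" = 2, "Low active" = 9,
       Sedentary = 8, "Somewhat active" = 9)

# Largest-remainder rounding: floor every share, then hand out the missing
# percentage points to the entries with the largest fractional parts.
round_to_100 <- function(counts) {
  exact   <- 100 * counts / sum(counts)
  floored <- floor(exact)
  missing <- 100 - sum(floored)
  bump    <- order(exact - floored, decreasing = TRUE)[seq_len(missing)]
  floored[bump] <- floored[bump] + 1
  floored
}

pct <- round_to_100(n)
print(pct)
print(sum(pct))
```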
4.3. Active minutes
The number of steps is not the only measure of activity level.
According to Verywell Fit, the number of active minutes matters even
more than steps. The recommended minimum activity level is 150 min of
moderate-intensity exercise or 75 min of vigorous-intensity exercise per
week to reduce health risks such as heart disease and type 2 diabetes.
For potentially greater health benefits, 300 min of moderate-intensity
exercise or 150 min of vigorous-intensity exercise per week is
recommended. Hence, it was investigated how many of the users reached
these levels of activity.
It was assumed that moderate-intensity exercise corresponds to fairly
active minutes in the dataset, and that vigorous-intensity exercise
corresponds to very active minutes. Based on the above recommendations,
three levels of activity were created based on the health benefits they
entail.
Enough active minutes | Fairly active minutes | Very active minutes | Health benefits
Not enough            | < 22                  | AND < 11            | Low health benefits
Enough                | 22 - 42               | OR 11 - 21          | Significant health benefits
More than enough      | ≥ 43                  | OR ≥ 22             | High health benefits
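The daily thresholds in the table appear to correspond to the weekly
recommendations divided by seven and rounded up to whole minutes. That
derivation is an assumption, since it is not stated explicitly, but the
arithmetic can be sketched as follows (the vector names are
hypothetical):

```r
# Weekly recommendations cited above, in minutes.
weekly <- c(moderate_enough = 150, vigorous_enough = 75,
            moderate_more   = 300, vigorous_more   = 150)

# Spread each weekly target over seven days and round up to whole minutes.
daily <- ceiling(weekly / 7)
print(daily)
```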
Thereafter, the users were categorized into an activity level based
on whether they had enough active minutes.
# Categorize users by mean daily active minutes
mean_daily_activity <-
mean_daily_activity %>%
mutate(activity_level = case_when (mean_fairly_active_minutes < 22 &
mean_very_active_minutes < 11 ~
"Not enough",
mean_fairly_active_minutes >= 43 |
mean_very_active_minutes >= 22 ~
"More than enough",
# Everything left lies between the two thresholds
TRUE ~ "Enough"))
# Reorder lifestyle types after activity level by adding factor levels
mean_daily_activity$lifestyle_type <- factor(mean_daily_activity$lifestyle_type, levels=c("Sedentary: 0 - 4999 steps", "Low active: 5000 - 7499 steps", "Somewhat active: 7500 - 9999 steps", "Active: 10000 - 12499 steps", "Highly active: > 12500 steps"))
# Reorder activity levels by adding factor levels
mean_daily_activity$activity_level <- factor(mean_daily_activity$activity_level, levels=c("Not enough", "Enough", "More than enough"))
# Bar chart
ggplot(mean_daily_activity, aes(x=activity_level, fill=lifestyle_type)) +
geom_bar() +
scale_fill_manual(values = c("#61E3FA", "#6376DB", "#B96CF6", "#DB65A2", "#FB9778")) +
labs(title = "Users Reaching Enough Activity Minutes",
x = " ", y = "Count",
fill = "Lifestyle") +
theme_bw() +
theme(plot.title=element_text(size=14, face="bold", hjust = 0.5))
Reaching the recommended active minutes per day was achievable
regardless of the user's lifestyle in terms of steps. However, users
with a somewhat active, active, or highly active lifestyle in terms of
daily steps seem more likely to reach the active minutes associated with
high health benefits. Hence, encouraging sedentary and low active users
to be more active during the day seems like a good idea. Emphasize
spending more time being fairly active and very active rather than just
increasing the number of steps; for example, a brisk walk rather than a
slow one.
4.4. Timing of steps
Next, the timing of the users’ steps was analysed, first by weekday
and then by time of day.
# Add column with weekday name
daily_activity <-
daily_activity %>%
mutate(weekday = wday(date, label=TRUE, abbr=FALSE, locale="en_US"))
# Calculate average steps per weekday
weekday_mean_steps <-
daily_activity %>%
group_by(weekday) %>%
summarise(mean_steps = mean(total_steps))
# Reorder weekdays by adding factor levels
weekday_mean_steps$weekday <- factor(weekday_mean_steps$weekday, levels=c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
# Barplot
ggplot(weekday_mean_steps, aes(x=weekday, y=mean_steps)) +
geom_bar(stat="identity", fill="#583475") +
labs(title = "User Mean Steps per Weekday",
x = " ", y = "Number of steps") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45,vjust = 0.5, hjust = 1),
plot.title=element_text(size=14, face="bold", hjust = 0.5))
On average, users took slightly more steps on Tuesdays and Saturdays,
and slightly fewer steps on Sundays.
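Weekday names depend on the session locale, which is why the locale is
set explicitly in the code above. A locale-independent alternative,
sketched here in base R as an illustration only, is to derive the ISO
weekday number and map it to a fixed, Monday-first set of labels. The
three dates are examples from the tracking period.

```r
# Example dates from the tracking period.
dates <- as.Date(c("2016-04-12", "2016-04-16", "2016-04-17"))

# %u returns the ISO weekday number as text: "1" = Monday ... "7" = Sunday.
iso_weekday <- as.integer(format(dates, "%u"))

day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")
weekday <- factor(day_names[iso_weekday], levels = day_names)

print(data.frame(dates, weekday))
```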
To find out the daily step distribution per lifestyle type, the
following code was used.
# Select relevant columns from daily_activity
weekday_steps <-
daily_activity %>%
select(id, date, total_steps, weekday)
# Select relevant columns from mean_daily_activity
df_join <-
mean_daily_activity %>%
select(id, lifestyle_type)
# Join dataframes
weekday_mean_steps_grouped <- left_join(weekday_steps, df_join, by="id")
# Calculate mean steps grouped by lifestyle and weekday
weekday_mean_steps_grouped <-
weekday_mean_steps_grouped %>%
group_by(lifestyle_type, weekday) %>%
summarise(mean_steps = mean(total_steps))
# Reorder weekdays by adding factor levels
weekday_mean_steps_grouped$weekday <- factor(weekday_mean_steps_grouped$weekday, levels=c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
# Barplot
ggplot(weekday_mean_steps_grouped, aes(x=weekday, y=mean_steps, fill=lifestyle_type)) +
geom_bar(stat="identity", position = position_dodge()) +
scale_fill_manual(values = c("#61E3FA", "#6376DB", "#B96CF6", "#DB65A2", "#FB9778")) +
labs(title = "User Mean Steps per Weekday",
x = " ", y = "Number of steps",
fill = "Lifestyle") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45,vjust = 0.5, hjust = 1),
plot.title=element_text(size=14, face="bold", hjust = 0.5))
However, looking at the average steps per weekday grouped by
lifestyle shows how the steps of the active and highly active users
masked the low number of steps taken by the sedentary and low active
users. Both low active and sedentary users seem to take the most steps
on Saturdays. Somewhat active users seem to take the most steps at the
beginning of the week (Monday and Tuesday) and fewer steps on Sundays.
The Bellabeat app could perhaps customize activity reminders based on
user lifestyle.
To find out at what time during the day users are most active, the
users’ hourly steps were analysed.
# Split date_time into date and time columns. An explicit time format keeps
# midnight as 00:00:00 instead of turning it into NA
hourly_steps <-
hourly_steps %>%
mutate(date = as_date(date_time),
time = format(date_time, "%H:%M:%S")) %>%
select(id, date, time, step_total)
# Calculate mean steps per hour
hourly_mean_steps <-
hourly_steps %>%
group_by(time) %>%
summarise(mean_steps = mean(step_total))
# Barplot
ggplot(hourly_mean_steps, aes(x=time, y=mean_steps)) +
geom_bar(stat="identity", fill="#583475") +
labs(title = "User Mean Steps per Hour",
x = " ", y = "Number of steps") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45,vjust = 0.5, hjust = 1),
plot.title=element_text(size=14, face="bold", hjust = 0.5))
On average, users took the most steps around lunchtime (noon to 2 pm)
and between 5 pm and 7 pm. This might indicate that most of the users
have office jobs.
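When a date-time is split into date and time parts, midnight values
deserve a check, because a date-time at exactly 00:00:00 can be printed
without its time component and then end up as a missing hour. A minimal
base-R sketch on two invented timestamps shows that an explicit format
string keeps the midnight hour:

```r
# Two hourly timestamps, the first one exactly at midnight.
stamps <- as.POSIXct(c("2016-04-12 00:00:00", "2016-04-12 13:00:00"), tz = "UTC")

# An explicit format string always emits the time part, including 00:00:00.
hourly <- data.frame(
  date = as.Date(stamps),
  time = format(stamps, "%H:%M:%S")
)

print(hourly)
print(sum(is.na(hourly$time)))
```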
# Join hourly_steps and df_join
hourly_mean_steps_grouped <- left_join(hourly_steps, df_join, by="id")
# Calculate average hourly steps per lifestyle type
hourly_mean_steps_grouped <-
hourly_mean_steps_grouped %>%
group_by(lifestyle_type, time) %>%
summarise(mean_steps = mean(step_total))
# Barplots showing mean steps throughout the day for each lifestyle type
ggplot(hourly_mean_steps_grouped, aes(x=time, y=mean_steps, fill=lifestyle_type)) +
geom_bar(stat = "identity") +
facet_wrap(vars(lifestyle_type)) +
scale_fill_manual(values = c("#61E3FA", "#6376DB", "#B96CF6", "#DB65A2", "#FB9778")) +
labs(title = "User Mean Step Distribution Throughout the Day",
x = " ", y = "Number of steps") +
theme_bw() +
theme(axis.text.x = element_blank(),
legend.position = "none",
plot.title=element_text(size=14, face="bold", hjust = 0.5))
The more active the user, the more the distribution deviated from a
uniform distribution. It seems most somewhat active, active, and highly
active users have periods where they take a large number of steps
followed by periods where they take fewer steps. Both active and highly
active users have a peak at 2 pm; perhaps they go for a midday walk or
workout. Both groups also have a peak at 6 - 7 pm; perhaps they are
walking home from work or going for a walk or workout. Active users also
have a peak at 9 am; maybe they walk to work.
In general, it seems like a good idea to promote more activity
throughout the whole day, for example with a small reminder in the app
to take an activity break and move a bit. Encourage users to commute on
foot or by bike if they are able, to take a lunchtime walk or workout,
and to keep moving after work; that is, to integrate movement into their
everyday life.
4.5. Tracker usage
While observing the summary statistics, it was noticed that users do
not keep their tracking devices on all the time. First, it was
investigated how many days, out of the 31 days, the users utilized their
fitness tracker.
The users were categorized into four different groups based on their
daily usage. The groups are shown in the table below.
User type | Days of usage
Sporadic  | 0 - 9
Moderate  | 10 - 19
Frequent  | 20 - 30
Everyday  | 31
To calculate the daily usage, the following code was used.
# Categorize based on usage
tracker_usage_days <-
daily_activity %>%
group_by(id) %>%
summarise(n_days = n()) %>%
mutate (usage_days = case_when(n_days < 10 ~ "Sporadic user: 0 - 9 days",
n_days < 20 ~ "Moderate user: 10 - 19 days",
n_days < 31 ~ "Frequent user: 20 - 30 days",
n_days == 31 ~ "Everyday user: 31 days"))
# Calculate percentages for graph
tracker_usage_days_pct <-
tracker_usage_days %>%
group_by(usage_days) %>%
summarise(n_users = n()) %>%
mutate(dbl = n_users / sum(n_users),
dbl = round(dbl, digits = 2),
pct = percent(dbl))
tracker_usage_days_pct
## # A tibble: 4 × 4
## usage_days n_users dbl pct
## <chr> <int> <dbl> <chr>
## 1 Everyday user: 31 days 21 0.64 64%
## 2 Frequent user: 20 - 30 days 9 0.27 27%
## 3 Moderate user: 10 - 19 days 2 0.06 6%
## 4 Sporadic user: 0 - 9 days 1 0.03 3%
# Reorder groups by adding factor levels
tracker_usage_days_pct$usage_days <- factor(tracker_usage_days_pct$usage_days, levels=c("Sporadic user: 0 - 9 days", "Moderate user: 10 - 19 days", "Frequent user: 20 - 30 days", "Everyday user: 31 days"))
# Pie chart
ggplot(tracker_usage_days_pct, aes(x=" ", y=dbl, fill=usage_days)) +
geom_bar(width=1, stat="identity") +
coord_polar("y", start=0) +
scale_fill_manual(values = c("#6376DB", "#B96CF6", "#DB65A2", "#FB9778")) +
labs(title = "Tracker Usage based on Days",
fill = "User type") +
theme_bw() +
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.ticks = element_blank(),
axis.text.x=element_blank(),
panel.border = element_blank(),
panel.grid=element_blank(),
legend.position = "left",
plot.title = element_text(face = "bold", size = 14, hjust = 0.5)) +
geom_text(aes(label = pct),
position = position_stack(vjust = 0.5))
The users utilized their trackers often! A majority of users (64%)
used their trackers every day, and more than a fourth (27%) of the users
used their trackers on 20 to 30 days of the month.
Thereafter, it was investigated how many minutes a day the users used
their fitness tracker. The usage was divided into four groups. The
grouping is based on a full day and night being 24 hours (1440 min).
People are recommended to sleep 8 hours (480 min) per night, so in this
analysis the day is defined as the 16 hours (960 min) that remain. The
usage groups are shown in the table below.
Usage group               | Minutes of usage
Less than half of the day | 0 - 479
More than half of the day | 480 - 959
Most of the day and night | 960 - 1439
All day and night         | 1440
To calculate the usage in minutes, the following code was used.
# Categorize based on usage
tracker_usage_minutes <-
daily_activity %>%
mutate(tracked_minutes = very_active_minutes + fairly_active_minutes + lightly_active_minutes + sedentary_minutes) %>%
select(id, tracked_minutes) %>%
mutate(usage_minutes = case_when(tracked_minutes < 480 ~ "Less than half of the day: 0 - 479 min",
tracked_minutes < 960 ~ "More than half of the day: 480 - 959 min",
tracked_minutes < 1440 ~ "Most of the day and night: 960 - 1439 min",
tracked_minutes == 1440 ~ "All day and night: 1440 min"))
# Calculate percentages for graph
tracker_usage_minutes_pct <-
tracker_usage_minutes %>%
group_by(usage_minutes) %>%
summarise(n_users = n()) %>%
mutate(dbl = n_users / sum(n_users),
dbl = round(dbl, digits = 2),
pct = percent(dbl))
tracker_usage_minutes_pct
## # A tibble: 4 × 4
## usage_minutes n_users dbl pct
## <chr> <int> <dbl> <chr>
## 1 All day and night: 1440 min 478 0.51 51%
## 2 Less than half of the day: 0 - 479 min 13 0.01 1%
## 3 More than half of the day: 480 - 959 min 167 0.18 18%
## 4 Most of the day and night: 960 - 1439 min 282 0.3 30%
# Reorder by adding factor levels
tracker_usage_minutes_pct$usage_minutes <- factor(tracker_usage_minutes_pct$usage_minutes, levels=c("All day and night: 1440 min", "Most of the day and night: 960 - 1439 min", "More than half of the day: 480 - 959 min", "Less than half of the day: 0 - 479 min"))
# Visualize results in pie chart
ggplot(tracker_usage_minutes_pct, aes(x=" ", y=dbl, fill=usage_minutes)) +
geom_bar(width=1, stat="identity") +
coord_polar("y", start=0) +
scale_fill_manual(values = c("#FB9778", "#DB65A2", "#B96CF6", "#6376DB")) +
labs(title = "Tracker Usage based on Minutes",
fill = "Minutes of usage") +
theme_bw() +
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.ticks = element_blank(),
axis.text.x=element_blank(),
panel.border = element_blank(),
panel.grid=element_blank(),
plot.title = element_text(face = "bold", size = 14, hjust = 0.5)) +
geom_text(aes(label = pct),
position = position_stack(vjust = 0.5))
The fitness trackers were also used a lot in terms of minutes per day.
Note that each row is one user-day, so the shares refer to daily records
rather than to users. On a majority of the recorded days (51%), the
device was used every single minute of the day and night. On 30% of the
days, the tracker was used most of the day and night. On almost a fifth
(18%) of the days, the tracker was used between 8 and 16 hours.
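Because the input has one row per user and day, a share computed over
rows describes days, not users. The sketch below contrasts the two on
invented records; the data frame and the definition of a "full-day user"
(every recorded day fully tracked) are assumptions made for the
illustration.

```r
# Invented records: one row per user-day, as in daily_activity.
toy <- data.frame(
  id              = c(1, 1, 1, 2, 2, 3),
  tracked_minutes = c(1440, 1440, 900, 1440, 1000, 700)
)

# Share of user-days with a full 1440 tracked minutes.
share_of_days <- mean(toy$tracked_minutes == 1440)

# Share of users whose every recorded day is a full day.
all_full_by_user <- tapply(toy$tracked_minutes == 1440, toy$id, all)
share_of_users   <- mean(all_full_by_user)

print(share_of_days)
print(share_of_users)
```

In this toy example half of the days are fully tracked, yet no single
user tracked every day in full, so the two shares should not be used
interchangeably.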
Lastly, the usage in minutes per user type was analysed.
# Join tracker_usage_days and tracker_usage_minutes
tracker_usage <- left_join(tracker_usage_days, tracker_usage_minutes, by = "id")
# Group by daily use and minutely use
tracker_usage <-
tracker_usage %>%
group_by(usage_days, usage_minutes) %>%
summarise(n = n())
# Reorder groups by adding factor levels
tracker_usage$usage_days <- factor(tracker_usage$usage_days, levels=c("Sporadic user: 0 - 9 days", "Moderate user: 10 - 19 days", "Frequent user: 20 - 30 days", "Everyday user: 31 days"))
# Reorder by adding factor levels
tracker_usage$usage_minutes <- factor(tracker_usage$usage_minutes, levels=c("Less than half of the day: 0 - 479 min", "More than half of the day: 480 - 959 min", "Most of the day and night: 960 - 1439 min","All day and night: 1440 min"))
# Bar plots based on user types
ggplot(tracker_usage, aes(x=" ", y=n, fill=usage_minutes)) +
geom_bar(stat = "identity", position = position_dodge()) +
facet_grid(. ~ usage_days, labeller = label_wrap_gen(width=16)) +
scale_fill_manual(values = c("#6376DB", "#B96CF6", "#DB65A2", "#FB9778")) +
labs(title = "Fitness Tracker Utilization",
x=" ", y="Number of entries",
fill = "Minutes of usage") +
theme_bw() +
theme (plot.title = element_text(face = "bold", size = 14, hjust = 0.5))
As seen in the figure, the users use their fitness trackers a lot in
terms of both days and minutes. Users who use their trackers daily are
also likely to use them all day and night or most of the day and night.
This indicates that user engagement is high. However, to increase the
usage of the fitness tracker even further, it might be a good idea to
add a reminder in the app to wear the fitness tracker after longer
periods of inactivity.
4.6. Recommendations
This section presents a summary of the high-level recommendations
to be used in Bellabeat’s marketing strategy.
- Encourage sedentary and low active users to be more active during
the day. Emphasize spending more time being fairly active and very
active rather than just increasing the number of steps; for example, a
brisk walk rather than a slow one.
- Customize activity reminders based on user lifestyle.
- Promote more activity throughout the whole day, for example with a
small reminder in the app to take an activity break and move a bit.
Encourage users to commute on foot or by bike if they are able, to take
a lunchtime walk or workout, and to keep moving after work; that is, to
integrate movement into their everyday life.
- Add a small reminder in the app to wear the fitness tracker after
long periods of inactivity.
- Repeat the analysis with Bellabeat’s own data. As the demographics of
the Fitbit users were unknown, it is uncertain whether the results of
this analysis apply to Bellabeat. The sample is also very small. Hence,
it is recommended to use Bellabeat’s own data to perform the same or a
similar analysis.