The metrics most commonly monitored by smartwatch users, runners in particular, are heart rate, distance, and pace. It may be easy to remember our daily or weekly performance, but we rarely see our long-term progress. I was therefore curious to analyze running performance in Python so that it can be reviewed from time to time.
Someone kindly gave me his Garmin data so that I could analyze his running performance. Hopefully, the analysis will give him insight into his current performance and into how he should plan and improve his training program, so that he can run efficiently and without injury in his upcoming running events.
Data Collection
Garmin offers several ways to export a user's exercise data. In this case, I obtained the data as a CSV file.
Data Cleaning
Before we jump into analysis and visualization, we have to prepare our data frame so that it contains only the information we need, in an organized and standardized format. Missing and unnecessary data should be removed so that the observations are meaningful.
Data Preview and Description
First, we import the data using the pandas library in our IDE, Visual Studio Code. After that, we check the information about our data so that we know which parts need to be cleaned and fixed.
import pandas as pd

# Read the CSV file exported from Garmin
All_Activities = pd.read_csv('Garmin/Activities.csv')

# Print the information about the data
All_Activities.info()
Based on the data frame information, we have 41 columns, and we do not need all of them for our study.
Data Filtering
Previewing the CSV data shows that every type of activity was exported (walking, treadmill running, etc.). However, we only want to examine the progress of running. Thus, we drop all other activities, keep the rows whose activity type is 'Running', and save them in a new data frame named Running_Activities.
# Keep only the rows whose activity type is 'Running'
# and store them in a new data frame, 'Running_Activities'
Running_Activities = All_Activities[All_Activities['Activity Type'] == 'Running']
The other data we need are 1) date, 2) distance, 3) calories, 4) average heart rate, 5) average run cadence, 6) average pace, 7) best pace, 8) average stride length, and 9) elapsed time.
Thus, we update the Running_Activities data frame to keep only the needed columns.
Running_Activities = Running_Activities[[
    'Date', 'Distance', 'Calories', 'Avg HR', 'Avg Run Cadence',
    'Avg Pace', 'Best Pace', 'Avg Stride Length', 'Elapsed Time',
]]
Then, we make a new data frame, Running_Activities_copy, that holds every column in the format required for analysis: Date is converted into datetime format, while Avg Pace, Best Pace, and Elapsed Time are converted into datetime and then into a number of minutes.
# Convert the 'Date', 'Avg Pace', 'Best Pace', and 'Elapsed Time' objects to datetime format
Running_Activities_copy = Running_Activities.copy()
Running_Activities_copy['Date'] = pd.to_datetime(Running_Activities_copy['Date'])
Running_Activities_copy['Avg Pace'] = pd.to_datetime(Running_Activities_copy['Avg Pace'], format='%M:%S')
Running_Activities_copy['Best Pace'] = pd.to_datetime(Running_Activities_copy['Best Pace'], format='%M:%S')
Running_Activities_copy['Elapsed Time'] = pd.to_datetime(Running_Activities_copy['Elapsed Time'])

# Convert 'Avg Pace', 'Best Pace', and 'Elapsed Time' to a number of minutes
for column in ['Avg Pace', 'Best Pace', 'Elapsed Time']:
    parsed = Running_Activities_copy[column]
    Running_Activities_copy[column] = (
        parsed.dt.hour * 60 + parsed.dt.minute + parsed.dt.second / 60
    )
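As an aside, the pace conversion can also be sketched without going through datetime objects. The helper name `pace_to_minutes` and the sample values below are hypothetical, and the sketch assumes that paces are exported as `M:SS` text and that a missing pace may appear as a placeholder such as `--`; neither assumption has been checked against the actual export.

```python
import pandas as pd

def pace_to_minutes(series: pd.Series) -> pd.Series:
    """Convert 'M:SS' pace strings to minutes as floats (hypothetical helper)."""
    # Capture the minutes and seconds; rows that do not match become NaN.
    parts = series.astype(str).str.extract(r"^\s*(\d+):(\d{1,2})\s*$")
    minutes = pd.to_numeric(parts[0], errors="coerce")
    seconds = pd.to_numeric(parts[1], errors="coerce")
    return minutes + seconds / 60

# Made-up sample values, including one placeholder for a missing pace
sample_paces = pd.Series(["5:30", "6:15", "--"])
converted = pace_to_minutes(sample_paces)
print(converted)
```

Rows that do not look like a pace end up as NaN instead of raising an error, which makes them easy to drop or inspect later.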
Then, we change the data type of the remaining object columns to float.
# Change the remaining column(s) with object data type to float
obj_column = Running_Activities_copy.select_dtypes(include='object').columns
Running_Activities_copy[obj_column] = Running_Activities_copy[obj_column].astype('float')

# Check the data types
Running_Activities_copy.info()
As we can see in the output table below, we now have the proper data types for data exploration:
Data Analysis
To visualize and analyze our data, we have to import the following libraries first:
import seaborn as sns
import matplotlib.pyplot as plt
Heart rate analysis
One of the most common beginner mistakes among runners is pushing too hard too early, impatiently obsessed with becoming a pro runner. Beginners tend to run too often and too fast, at a level they have not actually reached yet. Fortunately, sports science has advanced and training strategies have become more accessible, so recreational and elite runners alike can develop their fitness safely and without injury.
A recent study of 14,000 runners and 1.6 million exercise sessions shows that frequent, regular training at low average intensity helps runners become much faster. The main objective of these low-intensity sessions is to build a strong fitness baseline. Being able to speak in complete sentences while running is a sign that you are running at low intensity correctly.
To visualize our heart rate data, we can type this code in our IDE:
plt.figure(figsize=(14, 6))
sns.histplot(data=Running_Activities_copy, x='Avg HR', color='orange').set(title='Average Heart Rate Distribution')

# Set the ticks and the label of the x-axis
plt.xticks([145, 150, 155, 160, 165, 170])
plt.xlabel('Average Heart Rate (bpm)')
The output of the code is a histogram:
Based on our data, the histogram above shows that most training runs were performed at a high heart rate, which does not follow the suggested training plan. In the future, runs at a much slower speed should be performed more often than faster ones.
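A rough way to put a number on this observation is to count the share of runs whose average heart rate stays at or below an easy-intensity ceiling. In the sketch below, the function name `easy_run_share`, the sample heart rates, and the 150 bpm ceiling are all placeholders rather than values from this analysis; a real ceiling depends on the runner's own heart rate zones.

```python
import pandas as pd

def easy_run_share(avg_hr: pd.Series, easy_ceiling_bpm: float) -> float:
    """Return the fraction of runs with average HR at or below the ceiling."""
    valid = avg_hr.dropna()  # ignore runs without a heart rate reading
    if valid.empty:
        return float("nan")
    return float((valid <= easy_ceiling_bpm).mean())

# Made-up average heart rates (bpm); None marks a missing reading
sample_hr = pd.Series([148, 152, 160, 165, 158, 149, None, 171])
share = easy_run_share(sample_hr, easy_ceiling_bpm=150)  # 150 is a placeholder
print(f"Share of easy runs: {share:.0%}")
```

With the data frame from this article, the same function could be called on `Running_Activities_copy['Avg HR']` once a suitable ceiling has been chosen.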
Pace-cadence correlation
Runners monitor a lot of things to make sure they are running efficiently, not only to get faster but also to lessen the chance of getting injured. Most running injuries are due to poor running form, overstriding, or low cadence.
Cadence is the number of steps taken in a given period of time, usually measured in steps per minute (spm). The ideal cadence is hotly debated among runners. Generally, a cadence between 150 and 170 spm is recommended for recreational runners. Having a low cadence is not bad in itself, but the likelihood of running inefficiently is high. A cadence below 150 spm most likely means a long stride and a wobbly run, and greater vulnerability to injury.
Pace is another way for runners to describe speed, in minutes per kilometer or minutes per mile. A lower pace value means a faster run. Elite distance runners race at around 3 minutes per kilometer. No matter how fast you run, safety always comes first. It is unwise, and discouraging, to run fast today, end up hurting yourself, and then lose your running routine.
To study the link between our pace and cadence:
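Because pace is the inverse of speed, converting between the two is simple arithmetic. The sketch below uses hypothetical function names and made-up paces to show the conversion from minutes per kilometer to kilometers per hour and to minutes per mile.

```python
KM_PER_MILE = 1.609344  # length of one mile in kilometers

def pace_to_speed_kmh(pace_min_per_km: float) -> float:
    """Convert a pace in min/km to a speed in km/h."""
    return 60.0 / pace_min_per_km

def pace_km_to_pace_mile(pace_min_per_km: float) -> float:
    """Convert a pace in min/km to a pace in min/mile."""
    return pace_min_per_km * KM_PER_MILE

# A 6:00 min/km pace corresponds to 10 km/h
print(pace_to_speed_kmh(6.0))
# A 5:00 min/km pace expressed in minutes per mile
print(round(pace_km_to_pace_mile(5.0), 2))
```

Halving the pace value doubles the speed, which is why a lower pace means a faster run.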
sns.regplot(x='Avg Pace', y='Avg Run Cadence', data=Running_Activities_copy.dropna(), color='orange')
print("The correlation coefficient between cadence and pace:",
      Running_Activities_copy['Avg Pace'].corr(Running_Activities_copy['Avg Run Cadence']))
plt.title('Pace-Cadence Correlation')
plt.xlabel('Avg Pace (minutes per kilometer)')
plt.ylabel('Avg Run Cadence (steps per minute)')
Output:
The plot shows a negative correlation between pace and cadence: as the run gets faster (lower pace values represent faster running speeds), the cadence goes up. We can see that the cadence changes naturally from the slower to the faster paces, which is a good sign. The cadence also stays within the recommended safe range, which is above 150 spm.
It would be an issue if the cadence were static no matter how slow or fast the run is, because that would imply that the runner is overstriding to compensate. This does not happen in our data, so the issue does not exist here.
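For readers wondering what the correlation coefficient measures, here is a minimal sketch that computes the Pearson coefficient by hand and compares it with pandas' `Series.corr`, which uses the Pearson method by default. The function name `pearson_r` and the sample numbers are made up for illustration and are not taken from the Garmin data.

```python
import numpy as np
import pandas as pd

def pearson_r(x: pd.Series, y: pd.Series) -> float:
    """Pearson correlation computed by hand, skipping rows with missing values."""
    pair = pd.concat([x, y], axis=1).dropna()
    a = pair.iloc[:, 0].to_numpy(dtype=float)
    b = pair.iloc[:, 1].to_numpy(dtype=float)
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    # Covariance term divided by the product of the two spreads
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return float((a_centered * b_centered).sum() / denominator)

# Made-up paces (min/km) and cadences (spm); None marks a missing pace
pace = pd.Series([6.5, 6.0, 5.5, 5.0, None])
cadence = pd.Series([158, 162, 166, 171, 175])

manual = pearson_r(pace, cadence)
builtin = pace.corr(cadence)
print(manual, builtin)
```

A value close to -1 means that cadence rises almost linearly as the pace value drops, while a value near 0 would point to a cadence that barely reacts to pace.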
Pace-stride length correlation
Stride length is the distance covered between two successive landings of the same foot. Stride is produced by strength and supported by forward lean and gravity. Strength and power from the glutes, hamstrings, and quadriceps are critical for a proper and safe stride.
A faster and better run can be achieved with a quick stride rate that results from a short stride length. In addition, a short stride length helps us increase our cadence and improves our posture at the point where the foot lands, which makes us less injury-prone. Still, there is no specific recommended stride length, because it also depends heavily on the runner's height (source: link).
To generate the scatter plot of pace and stride length:
sns.regplot(x='Avg Pace', y='Avg Stride Length', data=Running_Activities_copy.dropna(), color='orange')
print("The correlation coefficient between pace and stride length:",
      Running_Activities_copy['Avg Pace'].corr(Running_Activities_copy['Avg Stride Length']))
plt.title('Pace-Stride Length Correlation')
plt.xlabel('Avg Pace (minutes per kilometer)')
plt.ylabel('Avg Stride Length (meter)')
Output:
Similar to cadence, stride length also changes in step with pace, as our data above shows.
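The reason pace, cadence, and stride length move together is arithmetic: the distance covered per minute equals the number of steps per minute times the distance per step. The sketch below illustrates this with a hypothetical function and made-up numbers. It assumes that the length is measured per step, matching a cadence counted in steps; whether the exported Avg Stride Length follows that convention is an assumption that has not been verified here.

```python
def pace_from_cadence_and_step(cadence_spm: float, step_length_m: float) -> float:
    """Estimate pace (min/km) from cadence (steps/min) and step length (m)."""
    meters_per_minute = cadence_spm * step_length_m
    return 1000.0 / meters_per_minute

# 160 steps per minute at 1.25 m per step covers 200 m each minute
print(pace_from_cadence_and_step(160, 1.25))
# A higher cadence at the same step length gives a lower (faster) pace value
print(round(pace_from_cadence_and_step(170, 1.25), 2))
```

A runner can therefore get faster by raising cadence, by lengthening the step, or by a mix of both, which is why the two plots are best read together.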
Recommendation
Our personal Garmin data shows that the user should run at lower intensity more often. The easiest way to gauge this while running is to check that he can still hold a conversation in full sentences. Hopefully, this will improve his baseline fitness. Along with maintaining good running posture, this should help the user run faster and stay free of injury.