MATH 4910/5010 - R Lab 4
In this lab you will learn how to use functions from the in development tdavesre package to perform topological data analysis in R.
Objectives:- Learn how to plot Vietoris-Rips complexes in R
- Learn how to compute persistence homology classes of data using ripserr
- Learn how to plot persistence diagrams and barcodes using ggtda
You can find the R markdown template and the datasets used for this lab on Canvas in the course files under the "R Lab 4" folder. Follow the instructions below. Instructions in green indicate tasks that should be completed in the R markdown file for this lab.
Packages and data
We are now ready to preform topological data analysis on our datasets using tools made for R. There is a package currently in development called tdaverse which will contain a suite of TDA tools. Because this package has not been completed yet, we will install the subpackages we plan to use individually.
Install ripserr package.
Some the packages we need have not been submitted to the R archive network yet so we'll have to install it from Github. Install the remotes package and load it in the R console. Install the ggtda package by running the following code in the R console.
> remotes::install_github("rrrlw/ggtda", vignettes = TRUE)
In code block 1, write code to load the tidyverse, ripserr, and ggtda packages into R.
You should have found two .csv
files on canvas.
One first file, 3dpoints.csv
contains coordinates for points in \(\mathbb{R}^3\).
The second file, rome_airbnb.csv
, contains data from Airbnb listings in Rome.
In code block 2, write code to load data from
3dpoints.csv
androme_airbnb.csv
into a variables respectively calledpnts
andairbnb
.
Plotting complexes using ggtda
The package ggtda adds extra functions which can used to plot TDA information with ggplot
.
Let's start by drawing a Vietoris-Rips complex.
In code block 3, use the
data.frame()
function to create a data frame with the following information and store it in a variable calledP
.x y 5 -4 4 1 3 -2 1 3 0 -3 -1 4 -1 3 -1 -2 -2 0
Use the
ggplot()
function to plot these points in the R console.
We can use the stat_disk()
function to add balls around our data points.
We can specify the radius by passing the radius
argument.
Let's draw a cover of the points in
P
with balls.> ggplot(P,aes(x=x,y=y))+theme_classic()+geom_point()+stat_disk(radius=1.5)
We can also add
geom_vietoris0()
,geom_vietoris1()
, andgeom_vietoris2()
to draw the Vietoris-Rips complex. Notice that these functions use diameter not radius.> ggplot(P,aes(x=x,y=y)) +
theme_classic() +
geom_point() +
stat_vietoris0() +
stat_vietoris1(diameter = 3, alpha = .25) +
stat_vietoris2(diameter = 3, fill = "darkred")
Add code to your plot in code block 3 that adds a cover of balls of radius 2. Then add code plot the Vietoris-Rips complex \(\mathbb{VR}^2(P)\).
Compute the Betti numbers \(\beta_0(\mathbb{VR}^2(P))\) and \(\beta_1(\mathbb{VR}^2(P))\). Type your answer in the space marked [Answer Here 1].
Persistence homology and diagrams with ripserr and ggtda
Now let's compute some persistence homology!
This can done using the vietoris_rips()
function from the ripserr package.
In addition to the dataset, we can also pass the arguments dim
and threshold
to specify the maximum dimension of homology to compute and the maximal value of \(r\) to use.
The vietoris_rips()
function then creates a Rips filtration from the data and computes the persistence homology classes of this filtration along with their birth and death times.
Let's use
vietoris_rips()
to compute the persistence homology classes for the Vietoris-Rips filtration ofP
.> P_vr <- vietoris_rips(P)
> P_vr
## dimension birth death
## 1 0 0.000000 1.000000
## 2 0 0.000000 1.414214
## 3 0 0.000000 2.000000
## 4 0 0.000000 2.236068
## 5 0 0.000000 2.828427
## 6 0 0.000000 3.162278
## 7 0 0.000000 3.162278
## 8 0 0.000000 3.162278
## 9 1 3.605551 5.385165
Important: Thevietoris_rips()
function uses the diameter to compute the filtration not the radius For example, the first 0-homology class dies at 1 since the distance between the points \((-1,4)\) and \((-1,3)\) is 1. However, in Dey and Wang's convention, the edge between these two points appear in \(\mathbb{VR}^{0.5}(P)\).
Based on the information in
P_vr
, what is \(\beta_0(\mathbb{VR}^{1.5}(P))\)? Type your answer in the space marked [Answer Here 2]. Explain in general how you can read a Betti number \(\beta_p(\mathbb{VR}^{r}(P))\) fromP_vr
.Our data set
P
has nine points butP_vr
only list eight 0-homology classes, why does this happen? Explain your answer in the space marked [Answer Here 3].
Now let's draw the persistence diagram.
The ggtda package adds several uses functions for drawing persistence diagrams and barcodes.
The stat_persistence()
function allows us to use ggplot()
to draw persistence diagrams.
To use stat_persistence()
we need to specify the start
and end
arguments of the aesthetic.
Let's use
stat_persistence()
to draw the persistence diagram of the Vietoris-Rips filtration ofP
.> ggplot(P_vr,aes(start=birth,end=death)) + stat_persistence()
Let's add the diagonal. The
geom_abline()
function draws the line \(y=ax+b\).> ggplot(P_vr,aes(start=birth,end=death)) +
stat_persistence() +
geom_abline(slope=1,intercept=0)
We need to fix the scaling. Naturally, we want our scale to start at 0, and we'll use the largest
birth
anddeath
values to determine when our scale ends.> maxb <- max(P_vr$birth)
> maxd <- max(P_vr$death)
> ggplot(P_vr,aes(start=birth,end=death)) +
stat_persistence() +
geom_abline(slope=1,intercept=0) +
lims(x=c(0,maxb), y=c(0,maxd))
Our diagram is plots the homology classes of every dimension. Let's distinguish the dimension of the homology classes by color and shape. Note that R tends to assume the numerical columns are continuous variables. We can use the
factor()
function to transform continuous variables into discrete variables.> ggplot(P_vr,aes(start=birth,end=death)) +
stat_persistence(mapping=aes(colour=factor(dimension),shape=factor(dimension))) +
geom_abline(slope=1,intercept=0) +
lims(x=c(0,maxb), y=c(0,maxd))
Finally, let's decorate our diagram with proper labels and a theme.
> ggplot(P_vr,aes(start=birth,end=death)) +
theme_persist() +
theme(
legend.key = element_rect(fill='white'),
legend.box.background = element_rect(),
legend.box.margin = margin(6,6,6,6)
) +
stat_persistence(mapping=aes(colour=factor(dimension),shape=factor(dimension))) +
geom_abline(slope=1,intercept=0) +
lims(x=c(0,maxb), y=c(0,maxd)) +
labs(
title="Persistence Diagram of P",
x="Birth",
y="Death",
colour="Dim",
shape="Dim"
)
Copy the following code into code block 4.
dgm_theme <- theme(
legend.key = element_rect(fill='white'),
legend.box.background = element_rect(),
legend.box.margin = margin(6,6,6,6)
)
layer_persist <- stat_persistence(mapping = aes(colour=factor(dimension), shape = factor(dimension)))
layer_diagonal <- geom_abline(slope=1,intercept=0)
dgm_labels <- labs(
title="Persistence Diagram of P",
x="Birth",
y="Death",
colour="Dim",
shape="Dim"
)
Now we can plot persistence diagrams a little easier
> ggplot(P_vr,aes(start=birth,end=death)) +
theme_persist() + dgm_theme +
layer_persist + layer_diagonal +
lims(x=c(0,maxb),y=c(0,maxd)) +
dgm_labels
We can also draw barcodes using the geom_barcode()
function.
Let's plot a barcode diagram for
P
.> ggplot(P_vr,aes(start=birth,end=death)) +
theme_barcode() + dgm_theme +
geom_barcode(size=1, mapping = aes(colour=factor(dimension))) +
labs(x = "Diameter", y = "Homology Classes", colour="Dim")
In code block 5, write code that computes the dimension 0, 1, and 2 persistence homology classes of the Vietoris-Rips filtration of the points in
pnts
. Store this complex in a variable. Add code that draws the persistence diagram and barcode of the Vietoris-Rips complex of the points inpnts
.Add code to code block 6 that plots the points in
pnts
in 3D.Compare the scatter plot of
pnts
and its persistence diagrams. Write your response in the space marked [Answer Here 4].
Datasets passed to the vietoris_rips()
function must only contain numerical values.
In code block 7, write code that creates a two column dataset containing the longitude and latitude data from
airbnb
. Then, add code that computes the 0 and 1 dimensional persistence homology classes of the Vietoris-Rips filtration of the location data of the airbnb listings. Finally, add code that draws the persistence diagram of this filtration.
Congratulations! You've mastered performing topological data analysis in R.