Jonathan C. Johnson

E-mail: jonathan.johnson10@okstate.edu
Office: MSCS 524
Department of Mathematics
Oklahoma State University
Return to MATH 4910/5010 Homepage

MATH 4910/5010 - R Lab 4

In this lab you will learn how to use functions from the in development tdavesre package to perform topological data analysis in R.

Objectives:
  1. Learn how to plot Vietoris-Rips complexes in R
  2. Learn how to compute persistence homology classes of data using ripserr
  3. Learn how to plot persistence diagrams and barcodes using ggtda

You can find the R markdown template and the datasets used for this lab on Canvas in the course files under the "R Lab 4" folder. Follow the instructions below. Instructions in green indicate tasks that should be completed in the R markdown file for this lab.

Packages and data

We are now ready to preform topological data analysis on our datasets using tools made for R. There is a package currently in development called tdaverse which will contain a suite of TDA tools. Because this package has not been completed yet, we will install the subpackages we plan to use individually.

  1. Install ripserr package.

  2. Some the packages we need have not been submitted to the R archive network yet so we'll have to install it from Github. Install the remotes package and load it in the R console. Install the ggtda package by running the following code in the R console.

    > remotes::install_github("rrrlw/ggtda", vignettes = TRUE)

  1. In code block 1, write code to load the tidyverse, ripserr, and ggtda packages into R.

You should have found two .csv files on canvas. One first file, 3dpoints.csv contains coordinates for points in \(\mathbb{R}^3\). The second file, rome_airbnb.csv, contains data from Airbnb listings in Rome.

  1. In code block 2, write code to load data from 3dpoints.csv and rome_airbnb.csv into a variables respectively called pnts and airbnb.

Plotting complexes using ggtda

The package ggtda adds extra functions which can used to plot TDA information with ggplot. Let's start by drawing a Vietoris-Rips complex.

  1. In code block 3, use the data.frame() function to create a data frame with the following information and store it in a variable called P.

    x y
    5 -4
    4 1
    3 -2
    1 3
    0 -3
    -1 4
    -1 3
    -1 -2
    -2 0

    \(P\)

  1. Use the ggplot() function to plot these points in the R console.

We can use the stat_disk() function to add balls around our data points. We can specify the radius by passing the radius argument.

  1. Let's draw a cover of the points in P with balls.

    > ggplot(P,aes(x=x,y=y))+theme_classic()+geom_point()+stat_disk(radius=1.5)

  2. We can also add geom_vietoris0(), geom_vietoris1(), and geom_vietoris2() to draw the Vietoris-Rips complex. Notice that these functions use diameter not radius.

    > ggplot(P,aes(x=x,y=y)) +
          theme_classic() +
          geom_point() +
          stat_vietoris0() +
          stat_vietoris1(diameter = 3, alpha = .25) +
          stat_vietoris2(diameter = 3, fill = "darkred")

  1. Add code to your plot in code block 3 that adds a cover of balls of radius 2. Then add code plot the Vietoris-Rips complex \(\mathbb{VR}^2(P)\).

  2. Compute the Betti numbers \(\beta_0(\mathbb{VR}^2(P))\) and \(\beta_1(\mathbb{VR}^2(P))\). Type your answer in the space marked [Answer Here 1].

Persistence homology and diagrams with ripserr and ggtda

Now let's compute some persistence homology! This can done using the vietoris_rips() function from the ripserr package. In addition to the dataset, we can also pass the arguments dim and threshold to specify the maximum dimension of homology to compute and the maximal value of \(r\) to use. The vietoris_rips() function then creates a Rips filtration from the data and computes the persistence homology classes of this filtration along with their birth and death times.

  1. Let's use vietoris_rips() to compute the persistence homology classes for the Vietoris-Rips filtration of P.

    > P_vr <- vietoris_rips(P)
    > P_vr
    ##   dimension    birth    death
    ## 1         0 0.000000 1.000000
    ## 2         0 0.000000 1.414214
    ## 3         0 0.000000 2.000000
    ## 4         0 0.000000 2.236068
    ## 5         0 0.000000 2.828427
    ## 6         0 0.000000 3.162278
    ## 7         0 0.000000 3.162278
    ## 8         0 0.000000 3.162278
    ## 9         1 3.605551 5.385165
    Important: The vietoris_rips() function uses the diameter to compute the filtration not the radius For example, the first 0-homology class dies at 1 since the distance between the points \((-1,4)\) and \((-1,3)\) is 1. However, in Dey and Wang's convention, the edge between these two points appear in \(\mathbb{VR}^{0.5}(P)\).

  1. Based on the information in P_vr, what is \(\beta_0(\mathbb{VR}^{1.5}(P))\)? Type your answer in the space marked [Answer Here 2]. Explain in general how you can read a Betti number \(\beta_p(\mathbb{VR}^{r}(P))\) from P_vr.

  2. Our data set P has nine points but P_vr only list eight 0-homology classes, why does this happen? Explain your answer in the space marked [Answer Here 3].

Now let's draw the persistence diagram. The ggtda package adds several uses functions for drawing persistence diagrams and barcodes. The stat_persistence() function allows us to use ggplot() to draw persistence diagrams. To use stat_persistence() we need to specify the start and end arguments of the aesthetic.

  1. Let's use stat_persistence() to draw the persistence diagram of the Vietoris-Rips filtration of P.

    > ggplot(P_vr,aes(start=birth,end=death)) + stat_persistence()

  2. Let's add the diagonal. The geom_abline() function draws the line \(y=ax+b\).

    > ggplot(P_vr,aes(start=birth,end=death)) +
          stat_persistence() +
          geom_abline(slope=1,intercept=0)

  3. We need to fix the scaling. Naturally, we want our scale to start at 0, and we'll use the largest birth and death values to determine when our scale ends.

    > maxb <- max(P_vr$birth)
    > maxd <- max(P_vr$death)
    > ggplot(P_vr,aes(start=birth,end=death)) +
          stat_persistence() +
          geom_abline(slope=1,intercept=0) +
          lims(x=c(0,maxb), y=c(0,maxd))

  4. Our diagram is plots the homology classes of every dimension. Let's distinguish the dimension of the homology classes by color and shape. Note that R tends to assume the numerical columns are continuous variables. We can use the factor() function to transform continuous variables into discrete variables.

    > ggplot(P_vr,aes(start=birth,end=death)) +
          stat_persistence(mapping=aes(colour=factor(dimension),shape=factor(dimension))) +
          geom_abline(slope=1,intercept=0) +
          lims(x=c(0,maxb), y=c(0,maxd))

  5. Finally, let's decorate our diagram with proper labels and a theme.

    > ggplot(P_vr,aes(start=birth,end=death)) +
          theme_persist() +
          theme(
              legend.key = element_rect(fill='white'),
              legend.box.background = element_rect(),
              legend.box.margin = margin(6,6,6,6)
          ) +
          stat_persistence(mapping=aes(colour=factor(dimension),shape=factor(dimension))) +
          geom_abline(slope=1,intercept=0) +
          lims(x=c(0,maxb), y=c(0,maxd)) +
          labs(
              title="Persistence Diagram of P",
              x="Birth",
              y="Death",
              colour="Dim",
              shape="Dim"
          )
    This looks nice, but it will be annoying to type this out every time. Next, we'll find a way to streamline these plots.

  1. Copy the following code into code block 4.

    dgm_theme <- theme(
        legend.key = element_rect(fill='white'),
        legend.box.background = element_rect(),
        legend.box.margin = margin(6,6,6,6)
    )

    layer_persist <- stat_persistence(mapping = aes(colour=factor(dimension), shape = factor(dimension)))

    layer_diagonal <- geom_abline(slope=1,intercept=0)

    dgm_labels <- labs(
        title="Persistence Diagram of P",
        x="Birth",
        y="Death",
        colour="Dim",
        shape="Dim"
    )

  1. Now we can plot persistence diagrams a little easier

    > ggplot(P_vr,aes(start=birth,end=death)) +
          theme_persist() + dgm_theme +
          layer_persist + layer_diagonal +
          lims(x=c(0,maxb),y=c(0,maxd)) +
          dgm_labels
    That's much better.

We can also draw barcodes using the geom_barcode() function.

  1. Let's plot a barcode diagram for P.

    > ggplot(P_vr,aes(start=birth,end=death)) +
          theme_barcode() + dgm_theme +
          geom_barcode(size=1, mapping = aes(colour=factor(dimension))) +
          labs(x = "Diameter", y = "Homology Classes", colour="Dim")

  1. In code block 5, write code that computes the dimension 0, 1, and 2 persistence homology classes of the Vietoris-Rips filtration of the points in pnts. Store this complex in a variable. Add code that draws the persistence diagram and barcode of the Vietoris-Rips complex of the points in pnts.

  2. Add code to code block 6 that plots the points in pnts in 3D.

  3. Compare the scatter plot of pnts and its persistence diagrams. Write your response in the space marked [Answer Here 4].

Datasets passed to the vietoris_rips() function must only contain numerical values.

  1. In code block 7, write code that creates a two column dataset containing the longitude and latitude data from airbnb. Then, add code that computes the 0 and 1 dimensional persistence homology classes of the Vietoris-Rips filtration of the location data of the airbnb listings. Finally, add code that draws the persistence diagram of this filtration.

Congratulations! You've mastered performing topological data analysis in R.