[R Charts] Drawing a ridgeline plot with ggplot

2022-12-13
R Tutorials, Data Visualization

Last Updated on 2023-10-31

Want to draw a ridgeline plot? Let's build one with ggplot2!

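Want to draw a ridgeline plot but not sure which functions to use in R? How do the ridges overlap one another? What reference lines can you add? And what if you want to show the raw data points?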

In this article I will show how to use the R package ggplot2, together with working code, to draw a ridgeline plot step by step.

What is a ridgeline plot?

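A ridgeline plot is similar to a density plot: both show the distribution of a variable. The difference is that a ridgeline plot shows the distribution of one numeric variable for every level of a categorical variable at the same time. In this article, for example, we want to show the age distribution (numeric variable) of candidates from different political parties (categorical variable).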

Example: a density plot showing the age distribution of council candidates from different parties

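When drawing a ridgeline plot, the variable whose distribution we want to inspect is still mapped to the x-axis (mapping means assigning a variable to a visual property of the chart), and R estimates the density for us. In a density plot, that estimated density is mapped to the y-axis. In a ridgeline plot, the categorical variable is mapped to the y-axis instead, and the estimated density is drawn on top of each category's baseline. This may sound a bit abstract, but it will become clear once you see the code.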
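To make the mapping concrete, here is a minimal base-R sketch of the idea. It uses the built-in iris data as a stand-in for the election data (an assumption for illustration only), and the names groups, dens and ridges are hypothetical. It simply estimates one density per category and places each curve on its own baseline.

```r
# Built-in iris data as a stand-in: Sepal.Length is the numeric variable,
# Species is the categorical variable.
groups <- split(iris$Sepal.Length, iris$Species)

# Estimate one kernel density per category on a shared x range.
x_range <- range(iris$Sepal.Length)
dens <- lapply(groups, function(v) {
  density(v, from = x_range[1], to = x_range[2], n = 256)
})

# Category i gets the baseline y = i; its curve is drawn on top of that baseline.
ridges <- Map(function(d, baseline) {
  data.frame(x = d$x, ymin = baseline, ymax = baseline + d$y)
}, dens, seq_along(dens))

# Peek at the first rows for one category.
head(ridges[["setosa"]])
```

Plotting ymin and ymax against x for every category would give a bare-bones ridgeline plot; the ggridges functions used in this article take care of this bookkeeping for you.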

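In R, the ggplot2 package, together with the ggridges extension, makes it quick and easy to draw a ridgeline plot. The steps below walk through the implementation.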

Hands-on

What the data looks like

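Let's import the data first. As the table below shows, this dataset contains basic information about the council candidates in Taiwan's 2022 nine-in-one local elections, with columns for electoral district, name, party, gender, date of birth, age, and age group. To draw the ridgeline plot, we also load the ggridges package with library(ggridges).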

library(tidyverse)
library(ggridges)
df_council_candidate_age <- read_csv("../data/ggplot-density-plot.csv")
df_council_candidate_age
#> # A tibble: 1,018 x 7
#>    areaName        name   party      gender birth        age age_group
#>    <chr>           <chr>  <chr>      <chr>  <date>     <dbl>     <dbl>
#>  1 連江縣第1選舉區 曹丞君 中國國民黨 女     1976-07-23  46.3        40
#>  2 連江縣第1選舉區 劉浩晨 民主進步黨 男     1991-08-21  31.3        30
#>  3 連江縣第1選舉區 陳書建 中國國民黨 男     1957-12-20  64.9        60
#>  4 連江縣第1選舉區 林明揚 中國國民黨 男     1963-10-31  59.1        50
#>  5 連江縣第1選舉區 楊清宇 中國國民黨 男     1974-06-16  48.4        40
#>  6 連江縣第2選舉區 周瑞國 中國國民黨 男     1969-04-05  53.6        50
#>  7 連江縣第2選舉區 陳如嵐 中國國民黨 男     1970-12-01  52.0        50
#>  8 連江縣第2選舉區 陳玉發 中國國民黨 男     1960-11-05  62.1        60
#>  9 連江縣第3選舉區 陳貽斌 中國國民黨 男     1969-11-16  53.0        50
#> 10 連江縣第4選舉區 張永江 中國國民黨 男     1960-06-23  62.4        60
#> # … with 1,008 more rows

Starting from a density plot

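Following the previous article on density plots, let's first revisit the code from that post. The numeric variable of interest is mapped to the x-axis, and the estimated density is shown on the y-axis.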

df_council_candidate_age %>% 
  ggplot(aes(x = age)) + 
  geom_density(fill = "red", color = "red", alpha = 0.3) +
  ggthemes::theme_clean() +
  theme(axis.text = element_text(family = "Noto Sans TC Regular", size = 20),
        axis.title = element_text(family = "Noto Sans TC Regular", size = 20),
        legend.position = "none")

Back to the ridgeline plot

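Back to the ridgeline plot: this time we draw with geom_density_ridges(). The chart shows the age distribution of council candidates from Taiwan's different political parties, with age mapped to the x-axis and party mapped to the y-axis. The estimated density is also drawn along the y direction, rising from each party's baseline, so the result looks a bit like what the facet_*() functions in ggplot2 (facet_wrap(), facet_grid()) produce. Inside ggplot(), in the aes() call that defines the aesthetics, we put age on the x-axis and party on the y-axis. Because we also want the area under each curve to be colored, fill is set to party as well.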

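Also, unlike before, we do not use ggthemes::theme_clean() here. We use theme_ridges() instead and pass grid = TRUE so that grid lines are shown, which makes it easier for readers to compare the groups.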

df_council_candidate_age %>%
  left_join(df_council_candidate_age %>% count(party)) %>%
  mutate(party = fct_reorder(as_factor(party), n)) %>%
  ggplot(aes(x = age, y = party, fill = party)) + 
  geom_density_ridges(scale = 2) +
  theme_ridges(font_size = 14, grid = TRUE) +
  theme(axis.text = element_text(family = "Noto Sans TC Regular", size = 20),
        axis.title = element_text(family = "Noto Sans TC Regular", size = 20),
        legend.position = "none")

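Inside geom_density_ridges() we set the scale argument, which controls how tall the ridges are relative to the spacing between their baselines. When it is 1, the highest point of the tallest ridge just touches the baseline of the ridge above it. The plot below shows the scale = 1 case.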

df_council_candidate_age %>%
  left_join(df_council_candidate_age %>% count(party)) %>%
  mutate(party = fct_reorder(as_factor(party), n)) %>%
  ggplot(aes(x = age, y = party, fill = party)) + 
  geom_density_ridges(scale = 1) +
  theme_ridges(font_size = 14, grid = TRUE) +
  theme(axis.text = element_text(family = "Noto Sans TC Regular", size = 20),
        axis.title = element_text(family = "Noto Sans TC Regular", size = 20),
        legend.position = "none")

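The next plot uses scale = 4. The ridges now overlap heavily, so it is no wonder that some people call the ridgeline plot a 疊嶂圖 ("layered-peaks plot"), a name that evokes mountains stacked one behind another and has a charm of its own.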

df_council_candidate_age %>%
  left_join(df_council_candidate_age %>% count(party)) %>%
  mutate(party = fct_reorder(as_factor(party), n)) %>%
  ggplot(aes(x = age, y = party, fill = party)) + 
  geom_density_ridges(scale = 4) +
  theme_ridges(font_size = 14, grid = TRUE) +
  theme(axis.text = element_text(family = "Noto Sans TC Regular", size = 20),
        axis.title = element_text(family = "Noto Sans TC Regular", size = 20),
        legend.position = "none")
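The effect of scale can be sketched in base R. This is a simplified model of the rescaling, not the actual ggridges code: it assumes baselines are one unit apart and uses the built-in iris data as a stand-in. The helper ridge_peaks() is hypothetical.

```r
# Per-category kernel densities from the built-in iris data (stand-in data).
groups <- split(iris$Sepal.Length, iris$Species)
dens <- lapply(groups, density)

# Height of the tallest curve across all categories.
max_height <- max(sapply(dens, function(d) max(d$y)))

# Rescale so the tallest curve rises `scale` baseline spacings above its own
# baseline (baselines are assumed to be one unit apart).
ridge_peaks <- function(scale) {
  sapply(dens, function(d) max(d$y) / max_height * scale)
}

ridge_peaks(1)  # tallest peak is exactly 1: it just reaches the next baseline
ridge_peaks(4)  # tallest peak is 4 baseline spacings high, so ridges overlap
```

Values of scale below 1 leave a gap between ridges, while larger values trade readability of individual curves for a more compact, overlapping look.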

Adjusting other parameters

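Besides scale, geom_density_ridges() can also draw specific lines on each ridge. Setting quantile_lines = TRUE draws lines at the quantiles you specify. We start with quantiles = 4, which splits each distribution into four equal parts and draws lines at the quartiles (the 25th, 50th, and 75th percentiles).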

df_council_candidate_age %>%
  left_join(df_council_candidate_age %>% count(party)) %>%
  mutate(party = fct_reorder(as_factor(party), n)) %>%
  ggplot(aes(x = age, y = party, fill = party)) + 
  geom_density_ridges(scale = 2, quantile_lines = TRUE, quantiles = 4) +
  theme_ridges(font_size = 14, grid = TRUE) +
  theme(axis.text = element_text(family = "Noto Sans TC Regular", size = 20),
        axis.title = element_text(family = "Noto Sans TC Regular", size = 20),
        legend.position = "none")
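To see which positions quantiles = 4 corresponds to, the cut points can be computed by hand. The sketch below uses base R's quantile() on the built-in iris data as a stand-in; as far as I know ggridges also relies on quantile() by default, but treat that as an assumption. The names n_parts, probs and cuts are hypothetical.

```r
# quantiles = 4 means: cut each distribution into 4 equal-probability parts,
# which needs 3 cut points.
n_parts <- 4
probs <- seq_len(n_parts - 1) / n_parts   # 0.25, 0.50, 0.75

# One set of cut points per category (built-in iris data as a stand-in).
groups <- split(iris$Sepal.Length, iris$Species)
cuts <- sapply(groups, quantile, probs = probs)

# Rows are the 25%, 50% and 75% quantiles; columns are the categories.
cuts
```

The middle row is the median of each category, which is why the center line in each ridge marks the point that splits the candidates into two equal halves.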

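Next we try quantiles = c(0.025, 0.975). Anyone who has taken a statistics course will recognize these numbers: in hypothesis testing, a two-sided test with alpha = 0.05 puts its cutoffs at exactly these two quantiles.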

df_council_candidate_age %>%
  left_join(df_council_candidate_age %>% count(party)) %>%
  mutate(party = fct_reorder(as_factor(party), n)) %>%
  ggplot(aes(x = age, y = party, fill = party)) + 
  geom_density_ridges(scale = 2, quantile_lines = TRUE, quantiles =  c(0.025, 0.975)) +
  theme_ridges(font_size = 14, grid = TRUE) +
  theme(axis.text = element_text(family = "Noto Sans TC Regular", size = 20),
        axis.title = element_text(family = "Noto Sans TC Regular", size = 20),
        legend.position = "none")
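The link between the significance level and these two probabilities is simple arithmetic, sketched below in base R. The data (built-in iris) and the variable names are stand-ins chosen for illustration.

```r
# A two-sided test splits alpha evenly between the two tails.
alpha <- 0.05
probs <- c(alpha / 2, 1 - alpha / 2)
probs

# The two quantile lines therefore bracket roughly the central 95% of the data.
x <- iris$Sepal.Length
bounds <- quantile(x, probs = probs)
inside <- mean(x >= bounds[1] & x <= bounds[2])
inside
```

Any other tail probability works the same way: pass the two resulting values to quantiles to mark the central region you care about.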

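You can also show the raw data just as it is. The code below comes from the official ggridges documentation, which is worth a look if you are interested. Here we draw the observations as barcode-like tick marks, but plain points are also an option.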

df_council_candidate_age %>%
  left_join(df_council_candidate_age %>% count(party)) %>%
  mutate(party = fct_reorder(as_factor(party), n)) %>%
  ggplot(aes(x = age, y = party, fill = party)) + 
  geom_density_ridges(
    jittered_points = TRUE,
    position = position_points_jitter(width = 0.05, height = 0),
    point_shape = '|', point_size = 3, point_alpha = 1, alpha = 0.7
  ) +
  theme_ridges(font_size = 14, grid = TRUE) +
  theme(axis.text = element_text(family = "Noto Sans TC Regular", size = 20),
        axis.title = element_text(family = "Noto Sans TC Regular", size = 20),
        legend.position = "none")
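For the plain-points variant, a minimal sketch could look like the following. It assumes ggplot2 and ggridges are installed and uses the built-in iris data instead of the election data; the object name p_points and the specific point sizes are arbitrary choices for illustration.

```r
library(ggplot2)
library(ggridges)

# Built-in iris data as a stand-in for the election data.
p_points <- ggplot(iris, aes(x = Sepal.Length, y = Species, fill = Species)) +
  geom_density_ridges(
    jittered_points = TRUE,   # draw the raw observations as points
    point_size = 1,
    point_alpha = 0.6,
    alpha = 0.7
  ) +
  theme_ridges(grid = TRUE) +
  theme(legend.position = "none")

p_points
```

Leaving out position and point_shape keeps the default behavior, which scatters ordinary round points inside each ridge instead of lining them up as tick marks along the baseline.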

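By changing the stat argument, you can also get a histogram-like display. Adjusting the value of bins is usually enough to get a good-looking result.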

df_council_candidate_age %>%
  left_join(df_council_candidate_age %>% count(party)) %>%
  mutate(party = fct_reorder(as_factor(party), n)) %>%
  ggplot(aes(x = age, y = party, fill = party)) + 
  geom_density_ridges(stat = "binline", bins = 10, scale = 2, draw_baseline = TRUE) +
  theme_ridges(font_size = 14, grid = TRUE) +
  theme(axis.text = element_text(family = "Noto Sans TC Regular", size = 20),
        axis.title = element_text(family = "Noto Sans TC Regular", size = 20),
        legend.position = "none")
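What bins = 10 means can be illustrated with base R's hist(). This is only a sketch of the idea: the exact bin boundaries that ggridges chooses may differ, and the iris data and variable names are stand-ins.

```r
# Split the range of the variable into 10 equal-width bins and count
# how many observations fall into each one.
x <- iris$Sepal.Length
n_bins <- 10
breaks <- seq(min(x), max(x), length.out = n_bins + 1)

h <- hist(x, breaks = breaks, plot = FALSE)
h$counts
```

Fewer bins give a smoother but coarser outline, while more bins show more detail at the cost of a noisier shape, so it is worth trying a few values.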

Adding the finishing touches

Finally we return to the earlier plot, use scale_x_continuous() to tidy up the x-axis and scale_fill_manual() to set the party colors, and we are done!

color_party <- c("中國國民黨" = "#000095", "民主進步黨" = "#1B9431",
                 "台灣民眾黨" = "white", "時代力量" = "#F9BE01",
                 "台灣基進" = "#A73f24", "小民參政歐巴桑聯盟" = "#BB8E19",
                 "新黨" = "#FFDB00")

df_council_candidate_age %>%
  left_join(df_council_candidate_age %>% count(party)) %>%
  mutate(party = fct_reorder(as_factor(party), n)) %>%
  ggplot(aes(x = age, y = party, fill = party)) + 
  geom_density_ridges(scale = 2, quantile_lines = TRUE, quantiles = 4) +
  scale_x_continuous(breaks = seq(0,90,10),
                     expand = c(0, 0), limits = c(0,90)) +
  scale_fill_manual(values=color_party) +
  theme_ridges(font_size = 14, grid = TRUE) +
  theme(axis.text = element_text(family = "Noto Sans TC Regular", size = 20),
        axis.title = element_text(family = "Noto Sans TC Regular", size = 20),
        legend.position = "none")
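When a named color vector is passed to scale_fill_manual(), colors are matched to categories by name, and a category without a matching name ends up with the fallback fill. A quick base-R check like the one below can catch that before plotting. The party labels and the object names here are hypothetical placeholders, not values from the dataset.

```r
# Hypothetical named color vector and hypothetical category labels.
example_colors <- c("Party A" = "#000095", "Party B" = "#1B9431")
parties_in_data <- c("Party A", "Party B", "Party C")

# Categories that appear in the data but have no color assigned.
missing_colors <- setdiff(parties_in_data, names(example_colors))
missing_colors

if (length(missing_colors) > 0) {
  message("No color defined for: ", paste(missing_colors, collapse = ", "))
}
```

With the real data, parties_in_data would be the unique values of the party column, and an empty result means every ridge gets the color you intended.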

Summary

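In this article we drew a ridgeline plot. Starting from the density plot of the previous post, we went through the main options of a ridgeline plot, including ridge height, reference lines, showing the raw data, and switching to a histogram-style display, and finally added colors and adjusted the axis. I hope you enjoyed this article, learned something, and got to know ggplot2 a little better.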

Dennis Tseng
