最具影响力的数字化技术在线社区

168主编 发表于 2017-6-18 14:24:59

一文详解如何用 R 语言绘制热图

作为目前最常见的一种可视化手段,热图因其丰富的色彩变化和生动饱满的信息表达被广泛应用于各种大数据分析场景。同时,专用于大数据统计分析、绘图和可视化等场景的 R 语言,在可视化方面也提供了一系列功能强大、覆盖全面的函数库和工具包。因此,对相关从业者而言,用 R 语言绘制热图就成了一项最通用的必备技能。本文将以 R 语言为基础,详细介绍热图绘制中遇到的各种问题和注意事项。原文作者 taoyan,原载于作者个人博客。简介本文将绘制静态与交互式热图,需要使用到以下R包和函数:● heatmap():用于绘制简单热图的函数● heatmap.2():绘制增强热图的函数● d3heatmap:用于绘制交互式热图的R包● ComplexHeatmap:用于绘制、注释和排列复杂热图的R&bioconductor包(非常适用于基因组数据分析)数据准备使用R内置数据集 mtcarsdf <- as.matrix((scale(mtcars))) #归一化、矩阵化使用基本函数绘制简单简单热图主要是函数 heatmap(x, scale="row")● x: 数据矩阵● scale:表示不同方向,可选值有:row, columa, none● Default plotheatmap(df, scale = "none")http://p0.ifengimg.com/pmop/2017/0615/3B46C34AED5034A11406FCA53A0B180D6553F6C1_size73_w740_h529.jpegUse custom colorscol <- colorRampPalette(c("red", "white", "blue"))(256)heatmap(df, scale = "none", col=col)http://p0.ifengimg.com/pmop/2017/0615/C2075CB1D8AAC1F33BBBD40297AE8E65E4F48D38_size75_w740_h529.jpeg#Use RColorBrewer color palette nameslibrary(RColorBrewer)col <- colorRampPalette(brewer.pal(10, "RdYlBu"))(256)#自设置调色板dim(df)#查看行列数## 32 11heatmap(df, scale = "none", col=col, RowSideColors = rep(c("blue", "pink"), each=16),ColSideColors = c(rep("purple", 5), rep("orange", 6)))#参数RowSideColors和ColSideColors用于分别注释行和列颜色等,可help(heatmap)详情http://p0.ifengimg.com/pmop/2017/0615/2E4DDECF84EB7C41369F75330ECA797BB74FA948_size71_w740_h529.jpeg增强热图函数 heatmap.2()在热图绘制方面提供许多扩展,此函数包装在 gplots 包里。library(gplots)heatmap.2(df, scale = "none", col=bluered(100),trace = "none", density.info = "none")#还有其他参数可参考help(heatmap.2())http://p0.ifengimg.com/pmop/2017/0615/09407E97E77EB105A93AA6C9537BD188E6E45A9A_size76_w740_h529.jpeg交互式热图绘制d3heatmap 包可用于生成交互式热图绘制,可通过以下代码生成:if (!require("devtools"))install.packages("devtools")devtools::install_github("rstudio/d3heatmap")函数 d3heatmap() 用于创建交互式热图,有以下功能:● 将鼠标放在感兴趣热图单元格上以查看行列名称及相应值● 可选择区域进行缩放library(d3heatmap)d3heatmap(df, colors = "RdBu", k_row = 4, k_col = 2)k_row、k_col分别指定用于对行列中树形图分支进行着色所需组数。进一步信息可help(d3heatmap())获取。使用 dendextend 包增强热图软件包 dendextend 可以用于增强其他软件包的功能library(dendextend)# order for rowsRowv <- mtcars %>% scale %>% dist %>%hclust %>% as.dendrogram %>%set("branches_k_color", k = 3) %>%set("branches_lwd", 1.2) %>% ladderize# Order for columns#We must transpose the dataColv <- mtcars %>% scale %>% t %>% dist %>%hclust %>% as.dendrogram %>%set("branches_k_color", k = 2, value = c("orange", "blue")) %>% set("branches_lwd", 1.2) %>% ladderize#增强heatmap()函数heatmap(df, Rowv = Rowv, Colv = Colv, scale = "none")http://p0.ifengimg.com/pmop/2017/0615/FEB652F831ABAE3EF80E6F3BFA05C5B036113DA8_size78_w740_h529.jpeg#增强heatmap.2()函数heatmap.2(df, scale = "none", col = bluered(100), Rowv = Rowv, Colv = Colv, trace = "none", density.info = "none")http://p0.ifengimg.com/pmop/2017/0615/9739E2B334EAD829F93E7966DEF5EAB5249DDA11_size84_w740_h529.jpeg#增强交互式绘图函数d2heatmap()d3heatmap(scale(mtcars), colors = "RdBu", Rowv = Rowv, Colv = Colv)绘制复杂热图ComplexHeatmap 包是 bioconductor 包,用于绘制复杂热图,它提供了一个灵活的解决方案来安排和注释多个热图。它还允许可视化来自不同来源的不同数据之间的关联热图。可通过以下代码安装:if (!require("devtools")) install.packages("devtools")devtools::install_github("jokergoo/ComplexHeatmap")ComplexHeatmap 包的主要功能函数是 Heatmap(),格式为:Heatmap(matrix, col, name)● matrix:矩阵● col:颜色向量(离散色彩映射)或颜色映射函数(如果矩阵是连续数)● name:热图名称library(ComplexHeatmap)Heatmap(df, name = "mtcars")#自设置颜色library(circlize)Heatmap(df, name = "mtcars", col = colorRamp2(c(-2, 0, 2), c("green", "white", "red")))使用调色板Heatmap(df, name = "mtcars",col = colorRamp2(c(-2, 0, 2), brewer.pal(n=3, name="RdBu")))http://p0.ifengimg.com/pmop/2017/0615/C1FB860BB06B9735FD5BF191CB36106511B6CD34_size93_w740_h529.jpeg#自定义颜色mycol <- colorRamp2(c(-2, 0, 2), c("blue", "white", "red"))热图及行列标题设置Heatmap(df, name = "mtcars", col = mycol, column_title = "Column title", row_title ="Row title")http://p0.ifengimg.com/pmop/2017/0615/D8590E39990398505FBEF3458DD3A9304DD18E55_size100_w740_h529.jpeg注意,行标题的默认位置是“left”,列标题的默认是“top”。可以使用以下选项更改:● row_title_side:允许的值为“左”或“右”(例如:row_title_side =“right”)● column_title_side:允许的值为“top”或“bottom”(例如:column_title_side =“bottom”) 也可以使用以下选项修改字体和大小:● row_title_gp:用于绘制行文本的图形参数● column_title_gp:用于绘制列文本的图形参数Heatmap(df, name = "mtcars", col = mycol, column_title = "Column title",column_title_gp = gpar(fontsize = 14, fontface = "bold"),row_title = "Row title", row_title_gp = gpar(fontsize = 14, fontface = "bold"))http://p0.ifengimg.com/pmop/2017/0615/D8204587544E520FD534BCF5523B1D60D20567CD_size100_w740_h529.jpeg在上面的R代码中,fontface的可能值可以是整数或字符串:1 = plain,2 = bold,3 =斜体,4 =粗体斜体。如果是字符串,则有效值为:“plain”,“bold”,“italic”,“oblique”和“bold.italic”。显示行/列名称:● show_row_names:是否显示行名称。默认值为TRUE● show_column_names:是否显示列名称。默认值为TRUEHeatmap(df, name = "mtcars", show_row_names = FALSE)http://p0.ifengimg.com/pmop/2017/0615/8217641B2E4A7BBFA6189D3383599083A2F0FB61_size59_w740_h529.jpeg更改聚类外观默认情况下,行和列是包含在聚类里的。可以使用参数修改:● cluster_rows = FALSE。如果为TRUE,则在行上创建集群● cluster_columns = FALSE。如果为TRUE,则将列置于簇上# Inactivate cluster on rowsHeatmap(df, name = "mtcars", col = mycol, cluster_rows = FALSE)http://p0.ifengimg.com/pmop/2017/0615/5C3EE3B881E76CC4BBF938DE66807944DCBD00FF_size104_w740_h529.jpeg如果要更改列集群的高度或宽度,可以使用选项column_dend_height 和 row_dend_width:Heatmap(df, name = "mtcars", col = mycol, column_dend_height = unit(2, "cm"),row_dend_width = unit(2, "cm") )http://p0.ifengimg.com/pmop/2017/0615/D72CCE810EB00E98276B9B22ECFC2896E33D0038_size98_w740_h529.jpeg我们还可以利用 color_branches() 自定义树状图外观library(dendextend)row_dend = hclust(dist(df)) # row clusteringcol_dend = hclust(dist(t(df))) # column clusteringHeatmap(df, name = "mtcars", col = mycol, cluster_rows =color_branches(row_dend, k = 4), cluster_columns = color_branches(col_dend, k = 2))http://p0.ifengimg.com/pmop/2017/0615/278B1EEC03163F067F165B7D7907D52A43553A0C_size103_w740_h529.jpeg不同的聚类距离计算方式参数 clustering_distance_rows 和 clustering_distance_columns用于分别指定行和列聚类的度量标准,允许的值有“euclidean”, “maximum”, “manhattan”, “canberra”, “binary”, “minkowski”, “pearson”, “spearman”, “kendall”。Heatmap(df, name = "mtcars", clustering_distance_rows = "pearson",clustering_distance_columns = "pearson")http://p0.ifengimg.com/pmop/2017/0615/2460B808A41BD3AAC1FB5E587C377299B5C501A6_size100_w740_h529.jpeg#也可以自定义距离计算方式Heatmap(df, name = "mtcars", clustering_distance_rows = function(m) dist(m))Heatmap(df, name = "mtcars", clustering_distance_rows = function(x, y) 1 - cor(x, y))http://p0.ifengimg.com/pmop/2017/0615/19ED5E57F0C3EF871DCD1CB7BAC80BDE827BDFCD_size100_w740_h529.jpeg请注意,在上面的R代码中,通常为指定行聚类的度量的参数 clustering_distance_rows显示示例。建议对参数clustering_distance_columns(列聚类的度量标准)使用相同的度量标准。# Clustering metric functionrobust_dist = function(x, y) {qx = quantile(x, c(0.1, 0.9)) qy = quantile(y, c(0.1, 0.9)) l = x > qx & x < qx & y> qy & y < qy x = x y = y sqrt(sum((x - y)^2))}# HeatmapHeatmap(df, name = "mtcars", clustering_distance_rows = robust_dist,clustering_distance_columns = robust_dist,col = colorRamp2(c(-2, 0, 2), c("purple", "white", "orange")))http://p0.ifengimg.com/pmop/2017/0615/8378B7CA02FAF4B3D16E14DC9B8A5EFA5463EC1D_size99_w740_h529.jpeg聚类方法参数clustering_method_rows和clustering_method_columns可用于指定进行层次聚类的方法。允许的值是hclust()函数支持的值,包括“ward.D”,“ward.D2”,“single”,“complete”,“average”。Heatmap(df, name = "mtcars", clustering_method_rows = "ward.D",clustering_method_columns = "ward.D")http://p0.ifengimg.com/pmop/2017/0615/3AA7EA20FB986CECC6FE0748954D9A71DDA11268_size99_w740_h529.jpeg热图拆分有很多方法来拆分热图。一个解决方案是应用k-means使用参数km。在执行k-means时使用set.seed()函数很重要,这样可以在稍后精确地再现结果set.seed(1122)# split into 2 groupsHeatmap(df, name = "mtcars", col = mycol, k = 2)http://p0.ifengimg.com/pmop/2017/0615/5C5B74BDE21EE4668A307FE3F8C3D7537CFEC44A_size102_w740_h529.jpeg# split by a vector specifying row classes, 有点类似于ggplot2里的分面Heatmap(df, name = "mtcars", col = mycol, split = mtcars$cyl )http://p0.ifengimg.com/pmop/2017/0615/21E8E4FFE0CBA91E1F1061918CC23D899B0E62EA_size101_w740_h529.jpeg#split也可以是一个数据框,其中不同级别的组合拆分热图的行。# Split by combining multiple variablesHeatmap(df, name ="mtcars", col = mycol, split = data.frame(cyl = mtcars$cyl, am = mtcars$am))http://p0.ifengimg.com/pmop/2017/0615/8FEFDCE85FB3F0362B7BC933D5167F6B14E85A7A_size106_w740_h529.jpeg# Combine km and splitHeatmap(df, name ="mtcars", col = mycol, km = 2, split = mtcars$cyl)http://p0.ifengimg.com/pmop/2017/0615/820DA851A1F20CBF169B6F8DCCCD92FEA68A79C6_size106_w740_h529.jpeg#也可以自定义分割library("cluster")set.seed(1122)pa = pam(df, k = 3)Heatmap(df, name = "mtcars", col = mycol, split = paste0("pam",pa$clustering))http://p0.ifengimg.com/pmop/2017/0615/D792FE7E994950F87AD161E481F116D92751D664_size105_w740_h529.jpeg还可以将用户定义的树形图和分割相结合。在这种情况下,split可以指定为单个数字:row_dend = hclust(dist(df)) # row clusteringrow_dend = color_branches(row_dend, k = 4)Heatmap(df, name = "mtcars", col = mycol, cluster_rows = row_dend, split = 2)http://p0.ifengimg.com/pmop/2017/0615/A537C44DED3C1CF23FD8FC6D690BDDA01F44985E_size101_w740_h529.jpeg热图注释利用HeatmapAnnotation()对行或列注释。格式为: HeatmapAnnotation(df, name, col, show_legend)● df:带有列名的data.frame● name:热图标注的名称● col:映射到df中列的颜色列表# Transposedf <- t(df)# Heatmap of the transposed dataHeatmap(df, name ="mtcars", col = mycol)http://p0.ifengimg.com/pmop/2017/0615/14E43704E430A08B2B2DF9798535CD13BA53AE04_size110_w740_h529.jpeg# Annotation data frameannot_df <- data.frame(cyl = mtcars$cyl, am = mtcars$am, mpg = mtcars$mpg)# Define colors for each levels of qualitative variables# Define gradient color for continuous variable (mpg)col = list(cyl = c("4" = "green", "6" = "gray", "8" = "darkred"), am = c("0" = "yellow","1" = "orange"), mpg = colorRamp2(c(17, 25), c("lightblue", "purple")) )# Create the heatmap annotationha <- HeatmapAnnotation(annot_df, col = col)# Combine the heatmap and the annotationHeatmap(df, name = "mtcars", col = mycol, top_annotation = ha)http://p0.ifengimg.com/pmop/2017/0615/E493EE3E11013D2C283D327EB62A0E5898C43489_size118_w740_h529.jpeg#可以使用参数show_legend = FALSE来隐藏注释图例ha <- HeatmapAnnotation(annot_df, col = col, show_legend = FALSE)Heatmap(df, name = "mtcars", col = mycol, top_annotation = ha)http://p0.ifengimg.com/pmop/2017/0615/CAE9A52FDD442AB8E491E2247513165F33B5E908_size116_w740_h529.jpeg#注释名称可以使用下面的R代码添加
library("GetoptLong")# Combine Heatmap and annotationha <- HeatmapAnnotation(annot_df, col = col, show_legend = FALSE)Heatmap(df, name = "mtcars", col = mycol, top_annotation = ha)# Add annotation names on the rightfor(an in colnames(annot_df)) {seekViewport(qq("annotation_@{an}"))grid.text(an, unit(1, "npc") + unit(2, "mm"), 0.5, default.units = "npc", just = "left")}#要在左侧添加注释名称,请使用以下代码# Annotation names on the leftfor(an in colnames(annot_df)) { seekViewport(qq("annotation_@{an}")) grid.text(an,unit(1, "npc") - unit(2, "mm"), 0.5, default.units = "npc", just = "left")}http://p0.ifengimg.com/pmop/2017/0615/1543757E14EE81297A9DEDE8FF61BB6F7D4CE4B4_size120_w740_h529.jpeg复杂注释将热图与一些基本图形结合起来进行注释,利用anno_point(),anno_barplot(),anno_boxplot(),anno_density() 和 anno_histogram()。# Define some graphics to display the distribution of columns.hist = anno_histogram(df, gp = gpar(fill = "lightblue")).density = anno_density(df, type = "line", gp = gpar(col = "blue"))ha_mix_top = HeatmapAnnotation(hist = .hist, density = .density)# Define some graphics to display the distribution of rows.violin = anno_density(df, type = "violin", gp = gpar(fill = "lightblue"), which = "row").boxplot = anno_boxplot(df, which = "row")ha_mix_right = HeatmapAnnotation(violin = .violin, bxplt = .boxplot, which = "row",width = unit(4, "cm"))# Combine annotation with heatmapHeatmap(df, name = "mtcars", col = mycol, column_names_gp = gpar(fontsize = 8),top_annotation = ha_mix_top, top_annotation_height = unit(4, "cm")) + ha_mix_righthttp://p0.ifengimg.com/pmop/2017/0615/F731FA565915444820FC7DACC958417AD74187DA_size119_w740_h529.jpeg热图组合# Heatmap 1ht1 = Heatmap(df, name = "ht1", col = mycol, km = 2, column_names_gp = gpar(fontsize = 9))# Heatmap 2ht2 = Heatmap(df, name = "ht2", col = colorRamp2(c(-2, 0, 2), c("green", "white", "red")), column_names_gp = gpar(fontsize = 9))# Combine the two heatmapsht1 + ht2http://p0.ifengimg.com/pmop/2017/0615/F6DFB2E7E37F0C310882F56A98BD6ED3FE288B25_size127_w740_h529.jpeg可以使用选项width = unit(3,“cm”))来控制热图大小。注意,当组合多个热图时,第一个热图被视为主热图。剩余热图的一些设置根据主热图的设置自动调整。这些设置包括:删除行集群和标题,以及添加拆分等。draw(ht1 + ht2,# Titlesrow_title = "Two heatmaps, row title",row_title_gp = gpar(col = "red"),column_title = "Two heatmaps, column title",column_title_side = "bottom",# Gap between heatmapsgap = unit(0.5, "cm"))http://p0.ifengimg.com/pmop/2017/0615/8961AAC0FF3C55E994172F9C781425197749899B_size127_w740_h529.jpeg可以使用参数show_heatmap_legend = FALSE,show_annotation_legend = FALSE删除图例。基因表达矩阵在基因表达数据中,行代表基因,列是样品值。关于基因的更多信息可以在表达热图之后附加,例如基因长度和基因类型。expr = readRDS(paste0(system.file(package = "ComplexHeatmap"), "/extdata/gene_expression.rds"))mat = as.matrix(expr[, grep("cell", colnames(expr))])type = gsub("s\\d+_", "", colnames(mat))ha = HeatmapAnnotation(df = data.frame(type = type))Heatmap(mat, name = "expression", km = 5, top_annotation = ha, top_annotation_height = unit(4, "mm"),show_row_names = FALSE, show_column_names = FALSE) +Heatmap(expr$length, name = "length", width = unit(5, "mm"), col = colorRamp2(c(0, 100000), c("white", "orange"))) +Heatmap(expr$type, name = "type", width = unit(5, "mm")) +Heatmap(expr$chr, name = "chr", width = unit(5, "mm"), col = rand_color(length(unique(expr$chr))))http://p0.ifengimg.com/pmop/2017/0615/24E54F242772045AF49AFFF5CE23049A78C7BF69_size205_w740_h987.jpeg也可以可视化基因组变化和整合不同的分子水平(基因表达,DNA甲基化,…)可视化矩阵中列的分布使用函数densityHeatmap()。densityHeatmap(df)http://p0.ifengimg.com/pmop/2017/0615/C7B44AE5EAF71FB62D9CCB98ED6EE3FBA0352646_size126_w740_h529.jpeg8 InfossessionInfo()## R version 3.3.3 (2017-03-06)## Platform: x86_64-w64-mingw32/x64 (64-bit)## Running under: Windows 8.1 x64 (build 9600)#### locale:## LC_COLLATE=Chinese (Simplified)_China.936## LC_CTYPE=Chinese (Simplified)_China.936## LC_MONETARY=Chinese (Simplified)_China.936## LC_NUMERIC=C## LC_TIME=Chinese (Simplified)_China.936 #### attached base packages:## grid stats graphics grDevices utils datasets methods## base#### other attached packages:## GetoptLong_0.1.6 cluster_2.0.5 circlize_0.3.10## ComplexHeatmap_1.12.0 dendextend_1.4.0 d3heatmap_0.6.1.1## gplots_3.0.1 RColorBrewer_1.1-2#### loaded via a namespace (and not attached):## Rcpp_0.12.9 DEoptimR_1.0-8 plyr_1.8.4## viridis_0.3.4 class_7.3-14 prabclus_2.2-6## bitops_1.0-6 base64enc_0.1-3 tools_3.3.3## digest_0.6.12 mclust_5.2.2 jsonlite_1.3## evaluate_0.10 tibble_1.2 gtable_0.2.0## lattice_0.20-34 png_0.1-7 yaml_2.1.14## mvtnorm_1.0-6 gridExtra_2.2.1 trimcluster_0.1-2## stringr_1.2.0 knitr_1.15.1 GlobalOptions_0.0.11## htmlwidgets_0.8 gtools_3.5.0 caTools_1.17.1## fpc_2.1-10 diptest_0.75-7 nnet_7.3-12## stats4_3.3.3 rprojroot_1.2 robustbase_0.92-7## flexmix_2.3-13 rmarkdown_1.3.9002 gdata_2.17.0## kernlab_0.9-25 ggplot2_2.2.1 magrittr_1.5## whisker_0.3-2 backports_1.0.5 scales_0.4.1## htmltools_0.3.5 modeltools_0.2-21 MASS_7.3-45## assertthat_0.1 shape_1.4.2 colorspace_1.3-2## KernSmooth_2.23-15 stringi_1.1.2 lazyeval_0.2.0## munsell_0.4.3 rjson_0.2.15

页: [1]
查看完整版本: 一文详解如何用 R 语言绘制热图