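# Writing a Simple R Package: Developing DTII v0.1.0

A while ago I took part in the Bio-OS open-source competition (task challenge track) run by ByteDance and Intel. After the preliminary round I wrote a small R package, DTII, that queries drug-target-indication interactions from the Open Targets Platform GraphQL API. This post uses it as an example to walk through the R package development workflow.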

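# Setting Up the Environment

R 4.4.0
RStudio
devtools: simplifies the package development workflow.
install.packages("devtools")
roxygen2: generates documentation.
install.packages("roxygen2")

After installation, check the setup with devtools::has_devel()

If the build toolchain is complete (on Windows this requires Rtools), it prints "Your system is ready to build packages!"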
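If you set up several machines, a small check can report which of the required packages are still missing before you install anything. `missing_pkgs()` below is a hypothetical helper for illustration, not part of devtools; it only relies on base R's `requireNamespace()`.

```r
# Hypothetical helper: return the packages in `pkgs` that cannot be loaded yet
missing_pkgs <- function(pkgs) {
  # requireNamespace() returns FALSE instead of failing when a package is absent
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_pkgs(c("devtools", "roxygen2"))
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```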

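# Choosing a Name

When naming an R package, CRAN and the R community generally recommend following a few basic rules:

# Character Restrictions

Keep the length moderate: a package name should not be too long; 2 to 20 characters is typical, and at least 2 characters are required.
Letters, digits and periods only: the name may contain ASCII letters (A-Z, a-z), digits (0-9) and periods (.), must start with a letter, and must not end with a period.
No other special characters: underscores ( _ ), hyphens ( - ) and other symbols are not allowed.
Case-insensitive: CRAN does not distinguish package names by case, so mypackage and MyPackage are treated as the same name.
Prefer lowercase: not mandatory, but all-lowercase names are easier to remember and type, e.g. dplyr and ggplot2.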

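# Avoid Reserved Words and Common Names

Avoid R's reserved words, such as if, else and function, to prevent conflicts.
Avoid clashing with well-known package or function names, such as stats or data, which causes confusion.

# Be Meaningful and Descriptive

Describe what the package does: the name should convey its core function or topic, e.g. stringr for string processing and forecast for time-series forecasting.
Don't be too generic: avoid vague words such as data or tools; aim for something specific.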

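# Avoid Trademarks and Brand Names

Avoid trademarks, brand names or other protected terms, so the package name does not infringe any copyright or trademark.

# For Multiple Words, Concatenate or Add a Number

Concatenation: join the words directly, as in devtools, or with a period, as in data.table.
Number suffix: if the package is a newer version of, or an improvement on, an existing idea, append a number, as in ggplot2.

# Follow CRAN Submission Requirements

CRAN imposes technical requirements on package names; make sure the name meets them when you submit, and that it is unique (it must not duplicate an existing package).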
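The character rules above are easy to check mechanically. The sketch below encodes the basic rules from "Writing R Extensions" (ASCII letters, digits and periods, at least two characters, starts with a letter, does not end with a period) plus a case-insensitive uniqueness check. Both `is_valid_pkg_name()` and `is_name_taken()` are hypothetical helpers, not functions from any package.

```r
# Hypothetical helper: basic naming rules from "Writing R Extensions":
# ASCII letters, digits and '.', at least two characters,
# starts with a letter and does not end with '.'
is_valid_pkg_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name, perl = TRUE)
}

# Hypothetical helper: case-insensitive comparison against existing names
is_name_taken <- function(name, existing) {
  tolower(name) %in% tolower(existing)
}

is_valid_pkg_name(c("DTII", "data.table", "ggplot2", "my_pkg", "my-pkg", "2fast", "pkg."))
is_name_taken("Dplyr", c("dplyr", "ggplot2"))
```

With network access, `existing` could be `rownames(available.packages())`, which lists the packages in the currently configured repositories.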

# Initialization

Set the working directory and initialize the R package:

	library(devtools)
	setwd("D:/Study/Project/GDG/BioOS/2024")
	create_package("DTII")

After running the code above, RStudio automatically opens the new package as an R Project.
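To make the layout concrete, the sketch below builds the bare minimum an R source package needs (a DESCRIPTION file, a NAMESPACE file and an R/ directory) in a temporary directory using base R only. `make_min_skeleton()` is a hypothetical helper for illustration; create_package() itself writes additional files, such as an .Rproj file and .Rbuildignore, and uses its own placeholder text.

```r
# Hypothetical sketch: the minimum layout of an R source package
make_min_skeleton <- function(path) {
  dir.create(file.path(path, "R"), recursive = TRUE, showWarnings = FALSE)
  # Placeholder DESCRIPTION fields, to be filled in later
  writeLines(c(
    paste0("Package: ", basename(path)),
    "Title: What the Package Does",
    "Version: 0.0.0.9000",
    'Authors@R: person("First", "Last", email = "first.last@example.com", role = c("aut", "cre"))',
    "Description: What the package does (one paragraph).",
    "License: What license it uses",
    "Encoding: UTF-8"
  ), file.path(path, "DESCRIPTION"))
  # roxygen2 regenerates this file from @export/@import tags
  writeLines("# Generated by roxygen2: do not edit by hand",
             file.path(path, "NAMESPACE"))
  invisible(path)
}

pkg <- make_min_skeleton(file.path(tempdir(), "DTII"))
list.files(pkg)
```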

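# Writing Functions

use_r() creates a script for a new function in the R/ directory. For example, to create the file for a search() function (R/search.R):

use_r("search")

This function searches for entities through the Open Targets GraphQL API using the supplied keywords.

Because of an API limit, it can retrieve at most the first 10,000 results.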

	search <- function(keywords = NULL, size = 10000) {

	  # Stop if no keyword was supplied
	  if (is.null(keywords)) {
	    stop("Please provide a keyword to query.")
	  }

	  # If the requested size exceeds 10000, cap it at 10000
	  if (size > 10000) {
	    size <- 10000
	    message("Size exceeds the limit of the Open Targets API and has been automatically set to 10000.")
	  }

	  # GraphQL query string
	  query_string <- '
	    query searchEntities($keywords: String!, $size: Int!) {
	      search(queryString: $keywords, page: {index: 0, size: $size}) {
	        aggregations {
	          total
	          entities {
	            name
	            total
	            categories {
	              name
	              total
	            }
	          }
	        }
	        hits {
	          id
	          entity
	          name
	        }
	        total
	      }
	    }
	  '

	  # Query variables
	  variables <- list(keywords = keywords, size = size)

	  # API endpoint
	  base_url <- "https://api.platform.opentargets.org/api/v4/graphql"

	  tryCatch({
	    # Send the request; encode = "json" serializes the body as JSON
	    response <- httr::POST(
	      url = base_url,
	      body = list(query = query_string, variables = variables),
	      encode = "json"
	    )

	    # Check the HTTP status
	    if (httr::http_status(response)$category != "Success") {
	      stop("Request failed: ", httr::http_status(response)$message)
	    }

	    # Parse the JSON response (with an explicit encoding)
	    api_response <- jsonlite::fromJSON(httr::content(response, as = "text", encoding = "UTF-8"))
	    search_data <- api_response[["data"]][["search"]]

	    # Total number of matching entities
	    total_hits <- search_data[["total"]]

	    # If there are more than 10,000 results, tell the user
	    if (total_hits > 10000) {
	      message(sprintf("%d results found. Due to API limitations, only the first 10,000 are shown.", total_hits))
	    }

	    # Counts per entity type (target, disease, drug, ...)
	    entities_df <- search_data[["aggregations"]][["entities"]]

	    # Store each entity type's categories data.frame in a list, named after the type
	    results_list <- list()
	    for (i in seq_along(entities_df$name)) {
	      category_name <- entities_df$name[i]                           # value of the name column
	      results_list[[category_name]] <- entities_df$categories[[i]]  # its categories data.frame
	    }

	    # Hit list: ID, entity type and name of each match
	    results_list$hits <- search_data[["hits"]]

	    # Return the parsed data as a list
	    results_list

	  }, error = function(e) {
	    # Report request or parsing errors and return them instead of failing
	    msg <- paste("Request failed for keywords", keywords, ":", e$message)
	    message(msg)
	    list(error = e$message)

	  }, warning = function(w) {
	    # A warning (e.g. while parsing) also aborts and is returned as an error
	    msg <- paste("Warning occurred:", w$message)
	    message(msg)
	    list(error = w$message)
	  })
	}

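The comments in this function were originally written in Chinese, but while writing this tutorial I found that load_all() in the next step fails when the comments are in Chinese (this did not happen during development).

# Testing the Function

After writing search(), run load_all() to load the package so the function can be tested: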

load_all()

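Note that load_all() only sources the files in R/; it does not call any of your functions. The trouble is the name: base R already has a search() function (it lists the attached environments), and once DTII is attached, our search() masks it. Anything that later calls search() without arguments, most likely RStudio refreshing its Environment pane, ends up in our function, which by design stops and asks for a keyword when keywords is empty. So load_all() prints:

	> load_all()
	ℹ Loading DTII
	Error in search() : Please provide a keyword to query.
	Error in search() : Please provide a keyword to query.

So this error ~~is a correct error~~ is not a real bug: the function does exactly what it was written to do. It can be ignored while testing; giving the function a name that base R does not already use would make it go away.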

Once load_all() has finished, you can test the function with your own arguments:

result <- search("lung cancer")

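If everything works, result is a list of 4 data.frames:

target: categories and counts of targets related to lung cancer
disease: categories and counts of diseases related to lung cancer
drug: categories and counts of drugs related to lung cancer
hits: ID, entity type and name of every entity in the database that matches lung cancer

If this runs without problems, the function essentially works.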
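Because search() needs a network connection, the reshaping step can also be checked offline against a hand-made stand-in for the parsed response. The sketch assumes, as the loop in search() does, that jsonlite::fromJSON() turns `entities` into a data.frame whose `categories` column is a list of data.frames. `reshape_search_result()` and every value in the mock are hypothetical.

```r
# Hypothetical stand-in for api_response[["data"]][["search"]];
# all names and counts are made up for illustration
entities_df <- data.frame(name = c("target", "disease", "drug"), total = c(2L, 1L, 1L))
# categories is a list column: one data.frame per entity type
entities_df$categories <- list(
  data.frame(name = c("category_a", "category_b"), total = c(1L, 1L)),
  data.frame(name = "category_c", total = 1L),
  data.frame(name = "category_d", total = 1L)
)
mock_search <- list(
  aggregations = list(total = 4L, entities = entities_df),
  hits = data.frame(id = c("ID_1", "ID_2", "ID_3", "ID_4"),
                    entity = c("target", "target", "disease", "drug"),
                    name = c("name 1", "name 2", "name 3", "name 4")),
  total = 4L
)

# Hypothetical helper: the reshaping loop from search(), pulled out on its own
reshape_search_result <- function(search_data) {
  entities_df <- search_data[["aggregations"]][["entities"]]
  results_list <- list()
  for (i in seq_along(entities_df$name)) {
    # One list element per entity type, named after it
    results_list[[entities_df$name[i]]] <- entities_df$categories[[i]]
  }
  results_list$hits <- search_data[["hits"]]
  results_list
}

result <- reshape_search_result(mock_search)
names(result)   # "target" "disease" "drug" "hits"
```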

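# Documenting Functions

Place the cursor inside the function, then choose Code -> Insert Roxygen Skeleton from the RStudio menu bar to insert a comment template.

The default template looks like this: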

	#' Title
	#'
	#' @param x
	#' @param y
	#' @return
	#' @export
	#'
	#' @examples

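These comments follow roxygen2 syntax. roxygen2 turns the comments into .Rd documentation files, which are included in the package and displayed by R's help system.

The basic rules and key parts of R function documentation are:

# Roxygen2 Comment Format

Lines start with #'.
The comments go directly before each function.
Every comment line must start with #'; otherwise it is not recognized as roxygen2.

# Key Parts of the Comments

# Title

Use the @title tag to give the function a concise title.
It should describe the function's main purpose briefly and clearly.

	#' @title Summarize Data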

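# Description

Use the @description tag to give a detailed description of the function.
It can span multiple lines; state precisely what the function does and what it returns.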

	#' @description This function summarizes the given dataset by calculating
	#' mean, median, and standard deviation for each numeric column.

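PS: you can also omit @title and @description and write the title and description directly: roxygen2 treats the first paragraph as the title and the next paragraph as the description.

# Parameters

Describe each parameter with an @param tag.
Give the parameter's name, type and usage, usually in a single line.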

	#' @param data A data frame containing the data to be summarized.
	#' @param na.rm Logical, whether to remove missing values.

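# Return Value

Use the @return tag to describe the return value.
Briefly describe the type and structure of the result.

	#' @return A data frame summarizing the numeric columns.

# Examples

Use the @examples tag to provide code examples.
Examples should run as-is and show the function's typical usage.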

	#' @examples
	#' # Summarize the mtcars dataset
	#' summarize_data(mtcars)
	#' # Remove missing values
	#' summarize_data(mtcars, na.rm = TRUE)

For example code that should not be run automatically, such as calls to a web API, wrap it in \dontrun{}:

	#' @examples
	#' \dontrun{
	#' # Example usage:
	#' result <- search("cancer")
	#'
	#' # Query with a specific size:
	#' result <- search("diabetes", size = 5000)
	#' }
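The snippets in this section all describe a summarize_data() function. Put together, a complete and runnable version might look like the sketch below; the function body is an assumed minimal implementation for illustration, not part of DTII. Once the file is in R/, devtools::document() turns the header into man/summarize_data.Rd and adds export(summarize_data) to NAMESPACE.

```r
#' @title Summarize Data
#'
#' @description This function summarizes the given dataset by calculating
#' mean, median, and standard deviation for each numeric column.
#'
#' @param data A data frame containing the data to be summarized.
#' @param na.rm Logical, whether to remove missing values.
#'
#' @return A data frame summarizing the numeric columns.
#'
#' @examples
#' # Summarize the mtcars dataset
#' summarize_data(mtcars)
#' # Remove missing values
#' summarize_data(mtcars, na.rm = TRUE)
#'
#' @importFrom stats median sd
#' @export
summarize_data <- function(data, na.rm = FALSE) {
  # Keep only the numeric columns
  num <- data[vapply(data, is.numeric, logical(1))]
  # One row per numeric column (in a package, stats must also be listed under Imports)
  data.frame(
    column = names(num),
    mean   = vapply(num, mean, numeric(1), na.rm = na.rm),
    median = vapply(num, median, numeric(1), na.rm = na.rm),
    sd     = vapply(num, sd, numeric(1), na.rm = na.rm),
    row.names = NULL
  )
}

summarize_data(mtcars)
```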

# Other Common Tags

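@export: marks the function as exported so users can call it.
@seealso: adds links to related functions.

Sometimes a parameter does not need a detailed description because it is already thoroughly documented elsewhere; in that case you can simply point to that documentation. For example, the following links to the mean() function in base (to reuse another function's @param text outright, roxygen2 also offers @inheritParams):

	#' @seealso \code{\link[base]{mean}}

@import and @importFrom: declare the other packages or functions the code depends on.
@details: provides additional details, usually things not covered in @description.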

# Example

Here is a complete function documentation example:

	#' Search for entities using the Open Targets GraphQL API
	#'
	#' This function queries the Open Targets Platform using GraphQL based on the provided
	#' keyword and returns entities related to the search term. The function supports
	#' limiting the number of returned results due to API limitations.
	#'
	#' @param keywords A string representing the search term to query the Open Targets Platform.
	#' @param size An integer representing the number of results to return (default is 10000).
	#' Maximum value allowed by the Open Targets API is 10000.
	#'
	#' @return A list containing the search results, including entity names and their associated categories.
	#' The list also contains search hits with detailed information about each hit. If the query fails,
	#' an error message is returned.
	#'
	#' @details
	#' The Open Targets API has a limit of 10000 results per query. If the total number of hits exceeds 10000,
	#' the function will only return the first 10000 results, and a message will be printed to inform the user.
	#' The results include entity names and categories, which are stored in a list, and a separate hits data frame.
	#'
	#' If the size parameter exceeds 10000, the function will automatically adjust it to 10000 and provide
	#' a message to notify the user of this adjustment.
	#'
	#' @examples
	#' \dontrun{
	#' # Example usage:
	#' result <- search("cancer")
	#'
	#' # Query with a specific size:
	#' result <- search("diabetes", size = 5000)
	#' }
	#'
	#' @import httr jsonlite
	#' @export

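# Generating Documentation

After adding comments before all functions, generate the help files with:

devtools::document()

This creates or updates the .Rd files in the man/ folder (and regenerates NAMESPACE from tags such as @export and @import), so the documentation takes effect in the package.

# Adding a License

There are many common licenses to choose from; the usethis package provides functions that generate the license files automatically.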

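# Common License Functions (`usethis` Package)

GNU General Public License v2.0 or v3.0
- use_gpl_license(version = 3): generates the GNU General Public License version 3 (GPL-3) files; use_gpl3_license() is the older name for this.
- use_gpl_license(version = 2): generates the GNU General Public License version 2 (GPL-2) files.
Apache License
- use_apache_license(): generates the Apache License 2.0 files.
MIT License
- use_mit_license(): generates the MIT license files; MIT is one of the most widely used permissive open-source licenses.
Artistic License
- usethis has no dedicated helper for the Artistic License 2.0; declare it manually with License: Artistic-2.0 in DESCRIPTION.
CC BY License
- use_ccby_license(): generates the Creative Commons Attribution (CC BY) license file.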

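# Example

Use the MIT license and write the license information into the DESCRIPTION file:

	# Use the MIT license
	usethis::use_mit_license("Your Name")

# Note

The choice of license has a major impact on how the code can be used and redistributed. As a general guide:

If you want the code to be freely used, modified and redistributed, choose MIT or Apache.
If you want the code to stay open source and require redistributions to keep the same license, choose GPL-3.

# The DESCRIPTION File

DESCRIPTION is the package's metadata file. It records information such as the package name, version, authors and dependencies, and it matters both for CRAN review and for users who want a quick overview of the package.

Its basic format is Field: Value, with each line defining one field; a value can continue onto following lines that start with whitespace. The key fields and what to put in them are described below.

# Common Fields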

# `Package`

The package name: a unique, easy-to-remember identifier.
It must follow CRAN's naming rules (letters, digits and periods only, no spaces).
Package: DTII

# `Title`

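A short title describing what the package does, usually no more than 65 characters.

It must start with a capital letter and must not end with a period.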

Title: Drug-Target-Indication Interaction Query Package

# `Version`

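The version number identifies different releases of the package.
It usually follows the three-part Major.Minor.Patch format (for example, 1.0.0).
Version: 1.0.0
A package that has not had a formal release can use a 0.x version, such as 0.1.0.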
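The three-part scheme works because R compares versions component by component rather than as text. Base R's package_version() shows the difference. The .9000 development suffix below is a common community convention (usethis uses it), not something R requires.

```r
# Versions are compared component by component
package_version("0.10.0") > package_version("0.9.0")      # TRUE
"0.10.0" > "0.9.0"                                        # FALSE: plain text comparison

# Development versions often get a fourth component, e.g. 0.1.0.9000
package_version("0.1.0.9000") > package_version("0.1.0")  # TRUE

# Sorting also follows version order rather than alphabetical order
sort(package_version(c("1.0.0", "0.10.0", "0.9.0", "0.1.0")))
```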

# `Authors@R`

Author and maintainer information.

Authors@R defines the authors and their roles as R code; this is the recommended format.

Authors@R: c(person(given = "First", family = "Last", email = "first.last@example.com", role = c("aut", "cre")))

Common role codes:
- "aut": author
- "cre": maintainer (exactly one maintainer must be specified)
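Because the Authors@R value is R code, it can be tried out in a console before it goes into DESCRIPTION. The string below is the Authors@R example above; evaluating it calls utils::person() and yields a "person" object, which makes typos in role codes or quoting easy to spot.

```r
# The Authors@R value from above, as it would appear in DESCRIPTION
authors_field <- 'c(person(given = "First", family = "Last", email = "first.last@example.com", role = c("aut", "cre")))'

# The field is ordinary R code; evaluating it gives a "person" object
authors <- eval(parse(text = authors_field))

class(authors)    # "person"
authors$role      # "aut" "cre"
format(authors)   # name, e-mail and roles as one string
```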

# `Description`

A detailed description of what the package does. It may span several lines, but every continuation line must start with whitespace, and the field must not contain blank lines:

	Description: The DTII package provides functions to query interactions between drugs, targets, and diseases using the Open Targets Platform GraphQL API.
	    It supports searching for drug-target-indication interactions and retrieving detailed information on known drugs and their related diseases and targets.
	    Functions include 'search' for querying the Open Targets database by keyword, and 'get_interactions' for querying specific drugs, targets, or diseases by their unique identifiers.

# `License`

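The license, which states how the package may be used and distributed.
If you added the MIT license in the previous step, this field has already been filled in: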
License: MIT + file LICENSE

# `Depends`

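Dependencies: declares the minimum R version the package needs and any other R packages it depends on.
It is usually used only for the R version; dependencies on other packages generally go in Imports.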
Depends: R (>= 3.5.0)

# `Imports`

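Imported dependencies: lists the other packages the package needs at run time.
Packages in Imports are loaded with your package but not attached to the user's search path. Inside the package, call their functions with :: (e.g. dplyr::filter) or import them into the NAMESPACE with @import / @importFrom.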
Imports: httr, jsonlite

# `Suggests`

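Suggested dependencies: lists optional dependencies.
These packages are needed only for certain features or for example code, such as tools for testing and building documentation.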
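Because a package listed only in Suggests may not be installed on the user's machine, code that uses it should check first and fall back gracefully. A minimal sketch, assuming a hypothetical DESCRIPTION entry `Suggests: knitr` and a hypothetical helper `table_as_text()`:

```r
# Hypothetical function: use knitr (listed under Suggests) when it is installed,
# otherwise fall back to base R printing
table_as_text <- function(df) {
  if (requireNamespace("knitr", quietly = TRUE)) {
    # Suggested packages are not guaranteed to be installed, so check, then call with ::
    as.character(knitr::kable(df))
  } else {
    utils::capture.output(print(df))
  }
}

table_as_text(data.frame(x = 1:2, y = c("a", "b")))
```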

# `LazyData`

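Lazy loading of data: usually set to true, so that datasets shipped with the package are loaded only when they are used. It only matters when the package has a data/ directory; the DTII DESCRIPTION below does not include it.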
LazyData: true

# Complete Example

Here is this package's DESCRIPTION file:

	Package: DTII
	Title: Drug-Target-Indication Interaction Query Package
	Version: 0.1.0
	Authors@R:
	    person(given = "Min", family = "Li", email = "mli.bio@outlook.com", role = c("aut", "cre"),
	           comment = c(ORCID = "0009-0003-9757-6822"))
	Description: The DTII package provides functions to query interactions between drugs, targets, and diseases using the Open Targets Platform GraphQL API.
	    It supports searching for drug-target-indication interactions and retrieving detailed information on known drugs and their related diseases and targets.
	    Functions include 'search' for querying the Open Targets database by keyword, and 'get_interactions' for querying specific drugs, targets, or diseases by their unique identifiers.
	Depends: R (>= 3.5.0)
	Imports:
	    httr,
	    jsonlite
	License: MIT + file LICENSE
	Roxygen: list(markdown = TRUE)
	RoxygenNote: 7.3.2
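Some of the rules above can be checked with base R before building, since DESCRIPTION is a DCF file that read.dcf() can parse. This is a rough sketch covering only a few fields of the file above; it is not a replacement for R CMD check (devtools::check()), which performs the authoritative checks.

```r
# Part of the DESCRIPTION above (continuation lines must start with whitespace)
desc_lines <- c(
  "Package: DTII",
  "Title: Drug-Target-Indication Interaction Query Package",
  "Version: 0.1.0",
  "Imports:",
  "    httr,",
  "    jsonlite",
  "License: MIT + file LICENSE"
)

con <- textConnection(desc_lines)
desc <- read.dcf(con)   # one-row character matrix, one column per field
close(con)

title <- desc[1, "Title"]
checks <- c(
  title_max_65_chars = nchar(title) <= 65,
  title_starts_upper = grepl("^[A-Z]", title),
  title_no_period    = !grepl("\\.$", title),
  version_is_valid   = !is.na(package_version(desc[1, "Version"], strict = FALSE))
)
checks

# Multi-line fields come back as one string; split Imports into package names
imports <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])
imports
```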

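# Building the Package

Use devtools::build() to bundle the package into a source tarball (DTII_0.1.0.tar.gz, written next to the package directory by default).

# Uploading to GitHub

You may want others to be able to use your package, or simply keep a copy in the cloud.

Publishing an R package on CRAN is quite a hassle, though, so GitHub is a convenient alternative. This assumes git is already installed and configured.

Create a new repository on GitHub, for example named DTII.
Initialize a git repository by running the following in the package's root directory:
git init
Commit to the local git repository:
git add .
git commit -m "DTII v0.1.0"

Upload to GitHub:
In the new DTII repository, click the green Code button to get the SSH link.

	# Link the local repository to the GitHub repository
	git remote add origin git@github.com:WhyLIM/DTII.git
	# Make sure the local branch is named main (git may default to master)
	git branch -M main
	# Push to GitHub; -u is only needed for the first push
	git push -u origin main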
