---
title: "Suggestions for optimizing package dependencies"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Suggestions for optimizing package dependencies}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo = FALSE, message = FALSE}
library(pkgndep)
```


## For the strong parents

Heaviness analysis provides hints for reducing the complexity of package
dependencies, but how to optimize depends on how the parent packages are used
in the package. First, we need to be aware that there are scenarios where
heavy parents cannot be reduced:

1. When a package extends the functionality of a heavy parent, the parent must be its strong parent.
2. A heavy parent provides core functionality to the package.
3. S4 methods or S4 classes are imported from a parent.

Now let's assume the dependencies from heavy parents can be reduced. We give
the following suggestions for reducing them:

1. Re-implement the functions that are imported from heavy parents. Take
   package **mapStats** (version 2.4) as an example. It has an extremely heavy
   parent, **Hmisc**, with a heaviness of 49, which is almost 60% of
   the total number of required packages of **mapStats**. If **Hmisc** is moved
   to “Suggests” of **mapStats**, the number of strong dependencies is
   reduced from 83 to 34. [The dependency heaviness
   analysis](https://pkgndep.github.io/prefix_m/mapStats_dependency_report.html)
   shows that only one function is imported from **Hmisc**. A closer inspection
   of the source code of **mapStats** reveals that this function is
   `capitalize()`, a simple function that only capitalizes the first letter of
   a string. The 49 additional dependencies brought in by **Hmisc** can be
   avoided simply by having the developer reimplement `capitalize()`.

2. The previous scenario is actually not very common. More commonly, heavy
   parents only provide analyses that are not frequently used in
   the package, which we call "secondary analysis". We suggest moving
   heavy parent packages that are less used to "Suggests", so that they are only
   loaded when the corresponding functions are called. Dependency packages
   listed in "Suggests" are not installed by default, so
   there will be errors when the functions are called but these dependency
   packages are not installed yet. We can write simple code to test whether
   such secondary packages are installed and, if they are not, print a message
   telling users to install them.

3. Another way to reduce package dependencies is to copy code directly
   from heavy parents. This approach is of course NOT recommended from a
   software engineering perspective, [but it is actually widely used in CRAN
   packages](https://doi.org/10.1109/IWSC.2015.7069885).

4. Try to separate a large package into several smaller packages that focus on more specific tasks.
   For example, the package **remotes**, which was split out of the package **devtools**, focuses only on
   installing packages from remote repositories. [**remotes** only has 4 strong dependencies](https://pkgndep.github.io/prefix_r/remotes_dependency_report.html),
   but [**devtools** has 76](https://pkgndep.github.io/prefix_d/devtools_dependency_report.html).
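
Suggestions 1 and 2 above can be sketched in a few lines of base R. This is only a minimal illustration, not the code used by **mapStats** or **Hmisc**: the function names `capitalize_first()`, `require_suggested()` and `secondary_analysis()`, as well as the package name `heavyPkg` and its function `some_function()`, are hypothetical placeholders.

```r
# Suggestion 1: a base-R stand-in for a single helper imported from a heavy
# parent. `capitalize_first()` is a hypothetical name, not Hmisc's code.
capitalize_first <- function(x) {
    x <- as.character(x)
    out <- paste0(toupper(substr(x, 1, 1)), substring(x, 2))
    out[is.na(x)] <- NA_character_  # keep missing values missing
    out
}

# Suggestion 2: stop with an informative message when a package listed in
# "Suggests" is not installed. `require_suggested()` is a hypothetical helper.
require_suggested <- function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        stop("Package '", pkg, "' is needed for this function. ",
             "Please install it first, e.g. with install.packages(\"",
             pkg, "\").", call. = FALSE)
    }
    invisible(TRUE)
}

# A rarely used function that relies on the suggested package.
# `heavyPkg` and `some_function()` are placeholders for illustration.
secondary_analysis <- function(x) {
    require_suggested("heavyPkg")
    heavyPkg::some_function(x)
}
```

With this pattern the heavy package is referenced only through `::` inside the function body, so it is loaded only when `secondary_analysis()` is actually called.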

## For the weak parents

Weak parent packages are those listed in the "Suggests/Enhances" fields of the
DESCRIPTION file. The heaviness mainly measures the effects of strong parents
(those in "Depends/Imports"), and we suggest moving heavy strong parents to
"Suggests/Enhances" if they are not frequently used by users. As suggested
packages are not necessarily required by users, optimizing the dependency
complexity of suggested packages is less necessary.

On the developer's side, however, it is still important to keep the
dependency complexity of "all parent packages" to a reasonable size. CRAN and
Bioconductor perform full checks on packages, which means they require all the
packages in "Depends", "Imports" and "Suggests" to be installed. A large set of
dependencies (normally from suggested packages) is more likely to produce
failures, which prevents acceptance or updates on CRAN or Bioconductor. In
general, we have the following suggestions for handling dependency complexity
from suggested packages.

1. If removing a heavy suggested package does not affect the functionality of
   the package, or in other words, this dependency package only "enhances" the
   package, then it can be put in the "Enhances" field of the DESCRIPTION
   file. Packages listed in "Enhances" will never be checked by `R CMD check`.
2. If a heavy suggested package is only used in examples and vignettes,
   developers may consider whether they really need the corresponding code to run
   while the package is built and checked. More complex
   vignettes can be hosted separately instead of being shipped with the package.
   A nice example is [the Seurat package](https://satijalab.org/seurat/).
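
When code that uses a heavy suggested package stays in examples or vignettes, one option is to run it only if the package is available. The sketch below shows one way to do that, under the assumption that skipping the optional code is acceptable. The helper `all_available()` and the package name `heavyPkg` are hypothetical.

```r
# TRUE only when every package in `pkgs` can be loaded.
# `all_available()` is a hypothetical helper name.
all_available <- function(pkgs) {
    all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))
}

# In an example: run the optional part only when the suggested package
# ("heavyPkg" is a placeholder) is installed.
if (all_available("heavyPkg")) {
    message("heavyPkg is available; running the optional example.")
    # code that uses heavyPkg goes here
}

# In a vignette setup chunk, the same test can switch chunk evaluation off:
# knitr::opts_chunk$set(eval = all_available("heavyPkg"))
```

This keeps the examples and vignettes from failing when the suggested package is missing, at the cost of not exercising that code on such systems.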







