```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# sigminer.prediction
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
Mutational signatures represent mutational processes occured in cancer evolution, thus are stable and genetic resources for subtyping. This tool provides functions for training neutral network models to predict the subtype a sample belongs to based on 'keras' and ['sigminer'](https://github.com/ShixiangWang/sigminer) packages.
> This is part of **sigminer** project.
## Installation
You can install the **sigminer.prediction** from **GitHub** with::
``` r
# install.packages("remotes")
remotes::install_github("ShixiangWang/sigminer.prediction")
```
Keras package and library are required.
```r
install.packages("keras")
keras::install_keras()
```
## Usage
```{r}
library(sigminer.prediction)
```
Load data from our group study.
```{r}
load(system.file("extdata", "wang2020-input.RData",
package = "sigminer.prediction", mustWork = TRUE
))
```
Prepare data.
```{r}
dat_list See `?modeling_and_fitting` for more.
Plot modeling history.
```{r}
res$history[[1]] %>% plot()
```
Load the model and use it to predict.
```{r}
model % predict_classes(dat_list$x_train[1, , drop = FALSE])
model %>% predict_proba(dat_list$x_train[1, , drop = FALSE])
```
If you input wrong data shape, it will return error and remind you the correct shape.
```{r, error=TRUE}
# Use a 9 numbers input
model %>% predict_classes(dat_list$x_train[1, 1:9, drop = FALSE])
```
For constructing a batch of models, see `?batch_modeling_and_fitting`.
## Trained models for prostate cancer
In our prostate cancer study, we trained 3 models for different datasets for different clinical applification. Each model is selected as the best model by hand from parameter combination matrix (576 models) according to comprehensive consideration of accuracy in test dataset, average accuracy in all datasets and number of parameters used:
```{r}
mat We randomly selected 80% of total samples for training and 20% of total samples for testing the performance. We trained 50 epochs with batch size 16. At each epoch, 20% of trained samples were randomly selected as the validation dataset.
```{r, echo=FALSE, fig.align="center", fig.cap="Performance of 3 selected Keras models at the last (generated from 20200409)", out.width='50%'}
knitr::include_graphics("man/figures/pc_model_pf.png")
```
### Usage of trained model
List information for available models.
```{r}
list_trained_models()
```
Get the corresponding model by passing a subset data to `load_trained_model()`:
```{r}
md_all %
head(1) %>%
load_trained_model()
md_all
```
> When the input have multiple rows, it will return a `list` of models.
```{r}
md_all %>% predict_classes(dat_list$x_train[1, , drop = FALSE])
```
## Citation
-----
***Copy number signature analyses in prostate cancer reveal distinct
etiologies and clinical outcomes, under submission***
-----