Step 1: for each cell, compute a multiscale score measure from its k-nearest neighborhoods, using multiple values of k. Step 2: train a logistic regression classifier on the multiscale score measure and retain cells that may reside in differentially abundant (DA) regions.
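Step 1 can be illustrated with a small base R sketch. It is not the package's implementation: the toy data, the label names "ctrl" and "stim", and the score itself (fraction of condition 2 minus fraction of condition 1 among the k nearest neighbors) are assumptions made for illustration, since the exact score formula is not given here.

```r
# Minimal sketch of a multiscale kNN score in base R (assumed score definition).
set.seed(1)
n <- 60
X <- matrix(rnorm(n * 5), nrow = n)                  # toy N-by-p matrix after dimension reduction
cell.labels <- rep(c("ctrl", "stim"), each = n / 2)  # hypothetical labels
is.cond2 <- cell.labels == "stim"
X[is.cond2, 1] <- X[is.cond2, 1] + 2                 # shift condition 2 along one axis
k.vector <- c(5, 10, 20)

# Pairwise Euclidean distances; row i of nn.order lists cells from nearest to farthest
D <- as.matrix(dist(X))
nn.order <- t(apply(D, 1, order))                    # column 1 is the cell itself

# One column per k: fraction of condition 2 minus fraction of condition 1
score <- sapply(k.vector, function(k) {
  nn <- nn.order[, 2:(k + 1), drop = FALSE]          # drop self, keep the k nearest neighbors
  frac2 <- rowMeans(matrix(is.cond2[nn], nrow = n))  # share of condition-2 neighbors
  2 * frac2 - 1                                      # equals frac2 - frac1 with two conditions
})
colnames(score) <- paste0("k", k.vector)
head(score)
```

Each row of `score` is one cell's score vector across scales; values near 1 or -1 mark neighborhoods dominated by one condition.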

getDAcells(X, cell.labels, labels.1, labels.2, k.vector = NULL,
  save.knn = F, alpha = 0, k.folds = 10, n.runs = 5, n.rand = 2,
  pred.thres = NULL, do.plot = T, plot.embedding = NULL,
  size = 0.5)
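
A call might look like the sketch below. The toy data and sample names are hypothetical, and the example assumes the function is provided by the DAseq package; the call is skipped when that package is not installed.

```r
# Hypothetical toy inputs; only the argument names come from the usage above.
set.seed(1)
n <- 400
X <- matrix(rnorm(n * 10), nrow = n)                  # N-by-p matrix
cell.labels <- rep(c("sampleA", "sampleB", "sampleC", "sampleD"), each = n / 4)
cond2 <- cell.labels %in% c("sampleC", "sampleD")
X[cond2, 1:2] <- X[cond2, 1:2] + 1.5                  # make condition 2 partly distinct

# Assumption: getDAcells() is exported by the DAseq package
if (requireNamespace("DAseq", quietly = TRUE)) {
  da <- DAseq::getDAcells(
    X = X,
    cell.labels = cell.labels,
    labels.1 = c("sampleA", "sampleB"),
    labels.2 = c("sampleC", "sampleD"),
    k.vector = seq(20, 100, 20),
    do.plot = FALSE                                   # no embedding supplied, so skip plots
  )
  str(da$da.pred)
}
```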

Arguments

X

size N-by-p matrix, input merged dataset of interest after dimension reduction.

cell.labels

character vector of length N, label for each input cell

labels.1

character vector, label name(s) that represent condition 1

labels.2

character vector, label name(s) that represent condition 2

k.vector

vector, k values to create the score vector

save.knn

a logical value indicating whether to save the computed kNN result, default FALSE

alpha

numeric, elasticnet mixing parameter passed to glmnet(), default 0 (Ridge)

k.folds

integer, number of data splits used when training the logistic regression classifier, default 10

n.runs

integer, number of times to run the logistic regression classifier to get the predictions, default 5

n.rand

integer, number of random permutations to run, default 2

pred.thres

length-2 vector, top and bottom thresholds on the DA measure; default NULL, in which case significant DA cells are selected based on permutation

do.plot

a logical value indicating whether to return ggplot objects showing the results, default TRUE

plot.embedding

size N-by-2 matrix, 2D embedding for the cells

size

cell size to use in the plot, default 0.5
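
The roles of k.folds and n.runs can be sketched in base R as repeated k-fold cross-validation with the held-out predictions averaged over runs. This is an assumed reading of those arguments, not the package's code: unpenalized `glm()` stands in for glmnet(), and the toy score matrix is made up.

```r
# Sketch: repeated k-fold logistic regression with averaged held-out predictions.
set.seed(1)
n <- 200
y <- rep(c(0, 1), each = n / 2)                     # 0 = labels.1, 1 = labels.2
score <- matrix(rnorm(n * 3, mean = y), nrow = n)   # toy multiscale score, 3 values of k
dat <- data.frame(y = y, score)                     # predictors are named X1, X2, X3

k.folds <- 10
n.runs <- 5

pred.runs <- sapply(seq_len(n.runs), function(run) {
  fold <- sample(rep(seq_len(k.folds), length.out = n))  # random fold assignment
  pred <- numeric(n)
  for (f in seq_len(k.folds)) {
    held.out <- fold == f
    # Plain glm() replaces the elastic-net fit from glmnet() in this sketch
    fit <- glm(y ~ ., data = dat[!held.out, ], family = binomial())
    pred[held.out] <- predict(fit, newdata = dat[held.out, ], type = "response")
  }
  pred
})

da.pred <- rowMeans(pred.runs)                      # mean prediction per cell over runs
summary(da.pred)
```

Averaging over runs reduces the dependence of each cell's prediction on one particular fold assignment.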

Value

a list of results

da.ratio

score vector for each cell

da.pred

(mean) prediction from the logistic regression

da.up

index for DA cells more abundant in condition of labels.2

da.down

index for DA cells more abundant in condition of labels.1

pred.plot

ggplot object showing the predictions of logistic regression on plot.embedding

da.cells.plot

ggplot object highlighting the cells of da.up and da.down on plot.embedding
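
How da.up and da.down relate to da.pred and pred.thres can be sketched as follows. The toy predictions, the threshold values, and the strict inequalities are assumptions; the sketch takes the larger element of pred.thres as the top threshold, whatever its position in the vector.

```r
# Sketch: turning per-cell predictions and two thresholds into index vectors.
set.seed(1)
da.pred <- runif(100, min = -1, max = 1)   # toy per-cell DA measure
pred.thres <- c(-0.8, 0.8)                 # hypothetical bottom and top thresholds

top <- max(pred.thres)
bottom <- min(pred.thres)

da.up <- which(da.pred > top)              # cells more abundant in the labels.2 condition
da.down <- which(da.pred < bottom)         # cells more abundant in the labels.1 condition

# The indices refer to rows of X, so they can subset any per-cell object
length(da.up)
length(da.down)
```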