UniPath

Overview

UniPath provides robust statistical methods to represent every single cell using pathway and gene-set enrichment scores. It works with both single-cell RNA-seq and single-cell ATAC-seq profiles and scales to atlas-scale datasets. UniPath also offers features such as pseudo-temporal ordering using pathway scores and an unconventional way of enumerating differences between two cell populations.

FAQs

1. Can I use raw read count data?
Ans. UniPath works with FPKM, TPM, RPKM and UMI-count expression data. For raw read counts, the package provides a function that converts count data into FPKM.
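
To show what such a conversion computes, the sketch below applies the standard FPKM formula: FPKM = count * 10^9 / (gene length in bp * total counts in the cell). It is not UniPath's own conversion function. `counts_to_fpkm` is a hypothetical name, and the gene lengths are assumed to come from your own annotation.

```python
def counts_to_fpkm(counts, gene_lengths_bp):
    """Convert a genes x cells count matrix (list of rows) to FPKM.

    Hypothetical helper, not part of UniPath:
    FPKM = count * 1e9 / (gene_length_bp * total_counts_in_cell)
    """
    n_cells = len(counts[0])
    # Total counts per cell (column sums), used as the library size.
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    fpkm = []
    for row, length in zip(counts, gene_lengths_bp):
        fpkm.append([
            row[j] * 1e9 / (length * totals[j]) if totals[j] > 0 else 0.0
            for j in range(n_cells)
        ])
    return fpkm


# Two genes (1 kb and 2 kb) measured in two cells.
example = counts_to_fpkm([[100, 0], [300, 50]], [1000, 2000])
```

Here the first cell has 400 counts in total, so the 1 kb gene with 100 counts gets 100 * 1e9 / (1000 * 400) = 250000 FPKM.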

2. How do I process large datasets?
Ans. Split the dataset column-wise (by cells) and use each chunk as a separate input to UniPath. After processing the chunks in parallel, combine the results if required for downstream analysis.
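
A minimal sketch of this split, process and combine pattern, written in Python rather than R. `score_chunk` is a hypothetical placeholder for the UniPath call you would run on each chunk; it simply doubles the values so the round trip can be checked. Note that chunking by columns only reproduces a whole-dataset run if each cell is scored independently of the other cells.

```python
from concurrent.futures import ThreadPoolExecutor


def split_columns(matrix, chunk_size):
    """Split a genes x cells matrix (list of rows) into chunks of at most chunk_size cells."""
    n_cells = len(matrix[0])
    return [
        [row[start:start + chunk_size] for row in matrix]
        for start in range(0, n_cells, chunk_size)
    ]


def combine_columns(chunks):
    """Concatenate column chunks back into one matrix, keeping the chunk order."""
    n_rows = len(chunks[0])
    return [[v for chunk in chunks for v in chunk[i]] for i in range(n_rows)]


def score_chunk(chunk):
    # Hypothetical stand-in for running UniPath on one chunk.
    return [[2 * v for v in row] for row in chunk]


def process_in_chunks(matrix, chunk_size, workers=4):
    chunks = split_columns(matrix, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so cells stay aligned.
        results = list(pool.map(score_chunk, chunks))
    return combine_columns(results)
```

For CPU-bound work in Python, a `ProcessPoolExecutor` would usually replace the thread pool.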

3. Should I use raw p-values or log transformed p-values?
Ans. Whether to use raw or log-transformed p-values for downstream analysis is the user's choice. However, for differential co-enrichment and differential analysis, use the raw p-value matrix, because the data undergo log transformation automatically in those steps.
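
To see why the raw matrix matters, consider a -log10 transformation. This is a common choice, but the exact transformation UniPath applies internally is not stated here, so treat it as an assumption. Passing values that were already transformed would transform them twice. The `floor` parameter is an illustrative guard against log10(0).

```python
import math


def neg_log10(pvalues, floor=1e-300):
    """Return -log10(p) for a matrix of p-values; values below floor are clamped to avoid log10(0)."""
    return [[-math.log10(max(p, floor)) for p in row] for row in pvalues]


raw = [[1.0, 0.01], [0.001, 0.0]]
once = neg_log10(raw)    # approximately [[0, 2], [3, 300]]
twice = neg_log10(once)  # meaningless: these inputs are no longer p-values
```

The doubly transformed matrix even contains negative values, which a single -log10 of p-values in [0, 1] can never produce.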

4. What file format is used for the input single-cell gene expression matrix?
Ans. For the tutorial, built-in data is provided in .RData format in the data folder of UniPath. Users can also read their own files with genes in rows and samples/cells in columns, in .txt or .csv format.
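
A minimal Python sketch of reading such a table, assuming the first row holds cell IDs (after a label or empty field in the top-left corner) and the first column holds gene names. `read_expression_matrix` is a hypothetical helper; for a tab-delimited .txt file, pass delimiter="\t".

```python
import csv
import io


def read_expression_matrix(handle, delimiter=","):
    """Read a genes x cells table into (gene names, cell IDs, values).

    Assumes the header row has one leading label field before the cell IDs.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    cells = header[1:]
    genes, values = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        genes.append(row[0])
        values.append([float(x) for x in row[1:]])
    return genes, cells, values


example = io.StringIO("gene,cell1,cell2\ngeneA,120.5,0\ngeneB,3,88\n")
genes, cells, values = read_expression_matrix(example)
```

A header whose top-left field is empty, as R's write.csv produces, is handled the same way.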

5. What is the input file format of the genomic coordinates (peak list) used to compute the nearest genes for scATAC-seq genomic sites?
Ans. To compute the nearest gene for every genomic site in a scATAC-seq profile, provide a tab-separated peak list without a header.
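
The sketch below illustrates the idea and is not UniPath's implementation: it reads a header-less, tab-separated peak list (chromosome, start, end) and assigns each peak the gene whose transcription start site (TSS) is closest to the peak midpoint. The TSS table format and the midpoint rule are assumptions; UniPath may measure distance differently.

```python
import csv
import io


def read_peaks(handle):
    """Parse a header-less, tab-separated peak list: chrom, start, end (extra columns ignored)."""
    return [(r[0], int(r[1]), int(r[2]))
            for r in csv.reader(handle, delimiter="\t") if r]


def nearest_genes(peaks, gene_tss):
    """Return (gene, distance) for each peak, or None if its chromosome has no genes.

    gene_tss: hypothetical dict mapping chrom -> list of (tss_position, gene_name).
    """
    result = []
    for chrom, start, end in peaks:
        mid = (start + end) // 2
        candidates = gene_tss.get(chrom, [])
        if not candidates:
            result.append(None)
            continue
        # Linear scan; a sorted TSS list with bisect would scale better.
        tss, name = min(candidates, key=lambda g: abs(g[0] - mid))
        result.append((name, abs(tss - mid)))
    return result


peaks = read_peaks(io.StringIO("chr1\t100\t200\nchr1\t5000\t5200\nchr2\t10\t20\n"))
tss = {"chr1": [(150, "geneA"), (6000, "geneB")]}
hits = nearest_genes(peaks, tss)
```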

6. What file format is used for single-cell ATAC-seq profiles?
Ans. Use a .csv file with genomic coordinates in rows and samples/cells in columns. For convenience, an example dataset is provided with the package.

7. How does differential pathway co-enrichment analysis help?
Ans. Differential pathway co-enrichment analysis is useful for studying the regulatory mechanisms that control the behaviour of cells. It therefore helps users study differences between two cell populations.

8. How is co-enrichment/co-occurrence analysis between gene sets useful?
Ans. Co-enrichment/co-occurrence between gene sets is useful for linking the propensity of a cell to a disease. We have included a function that takes as input pathway scores transformed from a null model and adjusted raw pathway scores for the cell type in which pathway co-occurrences need to be calculated. It returns the significance (p-values) and correlation for each pair of pathways. For example, we calculated pathway co-occurrence in pancreatic beta cells and found a high correlation between type 1 diabetes mellitus and the insulin synthesis and processing pathway.

It therefore gives users a way to find pathways associated with diseases.
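
As a rough illustration of pairwise co-occurrence scoring, the sketch below computes a Pearson correlation for every pair of pathways across cells, with a permutation p-value. This is an assumed method, not necessarily the statistic UniPath uses, and it takes a single table of already-adjusted per-cell scores rather than the separate null-model and raw inputs described above. `pathway_cooccurrence` and the pathway names are hypothetical.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pathway_cooccurrence(scores, n_perm=1000, seed=0):
    """Correlation and permutation p-value for every pair of pathways.

    scores: dict pathway name -> per-cell scores (same cell order in every list).
    Returns dict (pathway_a, pathway_b) -> (r, p_value).
    """
    rng = random.Random(seed)
    out = {}
    for a, b in itertools.combinations(sorted(scores), 2):
        x, y = scores[a], scores[b]
        r = pearson(x, y)
        shuffled = list(y)
        hits = 0
        for _ in range(n_perm):
            rng.shuffle(shuffled)  # break the cell pairing to build a null distribution
            if abs(pearson(x, shuffled)) >= abs(r):
                hits += 1
        # Add-one correction keeps the p-value strictly positive.
        out[(a, b)] = (r, (hits + 1) / (n_perm + 1))
    return out


example = pathway_cooccurrence({
    "pathway_A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "pathway_B": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
})
```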

9. Should imputed or non-imputed data be used for scATAC-seq?
Ans. It is the user's choice, but imputation can improve the quality of scATAC-seq data. McImpute, scImpute and DrImpute in R can be used for imputation. In UniPath we have used DrImpute, but it is third-party software provided as an additional facility in the UniPath package. Imputation with DrImpute is time-consuming and is not the responsibility of UniPath.

10. What if the pseudo-temporally ordered tree changes each time it is plotted?
Ans. To obtain the same pseudo-temporally ordered tree, i.e. the same placement of cells every time, call set.seed() with a fixed value before building or plotting the tree.
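
set.seed() is R's mechanism; the same principle in Python is to draw every random number from a generator created with a fixed seed. `random_layout` is a hypothetical stand-in for a randomized tree layout, not UniPath's plotting code.

```python
import random


def random_layout(cells, seed=None):
    """Place each cell at a random 2-D position; a fixed seed makes the layout repeatable."""
    rng = random.Random(seed)  # private generator, unaffected by other random calls
    return {cell: (rng.random(), rng.random()) for cell in cells}


first = random_layout(["c1", "c2", "c3"], seed=42)
second = random_layout(["c1", "c2", "c3"], seed=42)
```

In R, the equivalent is calling set.seed() with the same value immediately before each call that builds or plots the tree.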

11. How do I choose the number of clusters/classes and the number of nearest neighbours (K) for pseudo-temporal ordering?
Ans. The number of classes is set according to the cell types present in the data. For K nearest neighbours, the default is 5, but users can change it as needed.
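
To make the K parameter concrete, the sketch below builds a brute-force K-nearest-neighbour list over per-cell pathway-score vectors using Euclidean distance, with K defaulting to 5 as in UniPath. The distance metric and the helper name `knn_graph` are assumptions; UniPath's ordering code may construct its graph differently.

```python
import math


def knn_graph(points, k=5):
    """Return, for each point, the indices of its k nearest neighbours (Euclidean).

    points: list of equal-length coordinate tuples, e.g. per-cell pathway scores.
    Brute force O(n^2); ties are broken by the lower index.
    """
    k = min(k, len(points) - 1)  # a cell cannot have more neighbours than other cells
    graph = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph.append([j for _, j in dists[:k]])
    return graph


cells = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
neighbours = knn_graph(cells, k=2)
```

A larger K connects each cell to more distant cells and smooths the ordering; a smaller K keeps it local but can split sparse populations.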