N6-methyladenosine (m6A) is one of the most abundant modifications in RNA molecules and functionally modulates mRNA metabolism and diverse cellular processes. Recent studies have revealed that m6A variants are closely associated with the dysregulation of cellular processes, leading to serious diseases such as cancer. Functional variants, especially cancer mutations, can significantly alter m6A status, leading to the gain or loss of N6-methyladenosine. m6AVar is specifically designed to collect such functional variants and aims to help reveal the functional roles of m6A variants. So far, 352,014 germline mutations from dbSNP and 62,227 somatic mutations from TCGA have been included in m6AVar. Experimental evidence for RBP-binding regions and miRNA-RNA interactions, as well as splicing sites, is also included. In addition, to uncover the underlying relationships between the m6A machinery and diseases, m6AVar establishes an integrated resource that combines disease-associated data from GWAS and the ClinVar database. Furthermore, multiple statistical diagrams and a genome browser are embedded in the web server for visualizing the analysis results. Currently, users can query or browse the following information in m6AVar:
m6A-associated genetic mutations (dbSNP)
m6A-associated cancer somatic mutations (TCGA)
RNA-binding protein (RBP) binding affected by m6A-associated variants
miRNA targeting and processing affected by m6A-associated variants
Splicing sites affected by m6A-associated variants
Disease-related m6A-associated variants (GWAS and ClinVar)
We manually collected 7 miCLIP samples (Linder et al., 2015 and Moore et al., 2014) and 244 MeRIP-Seq samples from the GEO database. All raw sequencing data were downloaded and mapped to the human (hg19) or mouse (mm10) genome. For the MeRIP-Seq data, we called peaks with three methods: MACS2, MeTPeak and Meyer's method. To ensure high data reliability, we used MSPC to combine the results of these three methods and construct consensus peaks.
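The consensus step can be illustrated with a simplified sketch in the spirit of MSPC, which combines the p-values of overlapping peaks using Fisher's method. The sketch rests on several assumptions. Each caller's peaks are 0-based half-open intervals, each with a p-value. A merged region is kept only if at least two of the three callers support it and its combined p-value passes an illustrative threshold. `Peak`, `consensus_peaks`, `min_callers` and `alpha` are hypothetical names, and the sketch reproduces neither MSPC's implementation nor its default settings.

```python
import math
from dataclasses import dataclass


@dataclass
class Peak:
    chrom: str
    start: int      # 0-based, half-open
    end: int
    pvalue: float
    caller: str     # e.g. "MACS2", "MeTPeak", "Meyer"


def fisher_combined_pvalue(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) follows chi-square with 2k d.o.f."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    # Closed-form chi-square survival function for even d.o.f.:
    # P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def consensus_peaks(peaks, min_callers=2, alpha=1e-8):
    """Merge overlapping peaks from several callers into consensus regions (hypothetical helper)."""
    by_chrom = {}
    for p in sorted(peaks, key=lambda p: (p.chrom, p.start)):
        by_chrom.setdefault(p.chrom, []).append(p)

    consensus = []

    def flush(chrom, cluster, cluster_end):
        # Keep the most significant peak per caller within the merged region.
        best = {}
        for q in cluster:
            if q.caller not in best or q.pvalue < best[q.caller].pvalue:
                best[q.caller] = q
        if len(best) < min_callers:
            return
        combined = fisher_combined_pvalue([q.pvalue for q in best.values()])
        if combined <= alpha:
            start = min(q.start for q in cluster)
            consensus.append((chrom, start, cluster_end, combined))

    for chrom, plist in by_chrom.items():
        cluster, cluster_end = [plist[0]], plist[0].end
        for p in plist[1:]:
            if p.start < cluster_end:          # overlaps the current merged region
                cluster.append(p)
                cluster_end = max(cluster_end, p.end)
            else:
                flush(chrom, cluster, cluster_end)
                cluster, cluster_end = [p], p.end
        flush(chrom, cluster, cluster_end)
    return consensus
```

Requiring support from several callers discards peaks reported by only one method. The combined p-value then rewards regions that all methods find significant.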
We downloaded germline mutations for the human and mouse genomes from dbSNP and somatic mutations across 34 cancer types from TCGA. The functional impact of genetic variants on m6A modification was evaluated based on whether they disrupt the canonical DRACH motif and on sequence features. To identify m6A-loss mutations, we extracted m6A sites from the miCLIP samples, intersected them with all mutations, and retained the functional variants that destroy the DRACH motif. For mutations located within consensus peaks from the MeRIP-Seq samples, we predicted the m6A-loss variants that potentially alter m6A motifs. In addition, we performed a genome-wide prediction of potential functional variants using a Random Forest algorithm.
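A minimal sketch of the motif-disruption test follows. It assumes single-nucleotide variants and sequences already on the transcript strand, written in the DNA alphabet (so U appears as T). It also uses the standard DRACH definition: D = A/G/U, R = A/G, H = A/C/U. The helper names are hypothetical. The database's sequence-feature and Random Forest steps are not represented here.

```python
import re

# DRACH written in the DNA alphabet: D=[AGT], R=[AG], A (methylated), C, H=[ACT]
DRACH = re.compile(r"[AGT][AG]AC[ACT]")


def is_drach(kmer: str) -> bool:
    return DRACH.fullmatch(kmer.upper()) is not None


def destroys_drach(seq: str, site: int, var_pos: int, alt: str) -> bool:
    """Return True if an SNV at var_pos turns the DRACH motif around `site` into a non-DRACH 5-mer.

    seq: transcript-strand sequence; site: 0-based index of the methylated A;
    var_pos: 0-based index of the variant; alt: alternative base.
    """
    start, end = site - 2, site + 3          # DRACH spans positions -2..+2 around the A
    if start < 0 or end > len(seq):
        raise ValueError("m6A site too close to the sequence edge")
    if not is_drach(seq[start:end]) or not (start <= var_pos < end):
        return False                         # no motif to destroy, or variant outside it
    mutated = seq[:var_pos] + alt + seq[var_pos + 1:]
    return not is_drach(mutated[start:end])
```

For example, in `"TTGGACTTT"` the A at index 4 sits in the motif `GGACT`. A C>T change at index 5 yields `GGATT` and destroys the motif. A G>A change at index 2 yields `AGACT`, which is still DRACH.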
Mutations may destroy or create miRNA binding sites on RNA. We therefore downloaded all miRNA-RNA interaction regions from starBase2 and intersected them with m6A-associated variants to reveal the potential impact of functional variants on miRNA-mRNA interactions.
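Intersecting variants with interaction regions is a standard interval-overlap query. The sketch below sorts regions per chromosome and uses binary search. It assumes 0-based half-open region coordinates and single-base variant positions. The input tuples and labels are hypothetical. On BED files, tools such as BEDTools `intersect` perform the same operation.

```python
from bisect import bisect_right
from collections import defaultdict


def intersect_variants(variants, regions):
    """Return (variant_id, region_label) pairs for every variant that falls inside a region.

    variants: iterable of (chrom, pos, variant_id) with 0-based positions.
    regions:  iterable of (chrom, start, end, label) with 0-based, half-open coordinates.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, label in regions:
        by_chrom[chrom].append((start, end, label))

    index = {}
    for chrom, items in by_chrom.items():
        items.sort()
        starts = [s for s, _, _ in items]
        max_len = max(e - s for s, e, _ in items)   # bounds how far back to scan
        index[chrom] = (starts, items, max_len)

    hits = []
    for chrom, pos, vid in variants:
        if chrom not in index:
            continue
        starts, items, max_len = index[chrom]
        i = bisect_right(starts, pos) - 1           # last region starting at or before pos
        # A containing region must start after pos - max_len, so stop scanning there.
        while i >= 0 and starts[i] > pos - max_len:
            s, e, label = items[i]
            if s <= pos < e:
                hits.append((vid, label))
            i -= 1
    return hits
```

Bounding the backward scan by the longest region keeps overlapping regions correct without building a full interval tree.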
m6A modification has been reported to regulate alternative splicing. At all canonical (GT-AG) splice sites, we extracted 100 base pairs (bp) upstream from the 5' splice site and 100 bp downstream from the 3' splice site, excluding splice sites of introns shorter than 100 bp as well as pseudogenes. All m6A-associated variants were then intersected with these regions to identify splicing events affected by m6A-associated variants.
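Window extraction can be sketched as follows. The sketch reads "upstream" and "downstream" literally in transcript coordinates and handles the minus strand by reverse-complementing the intron. Both the literal reading and the input format are assumptions: a dict of chromosome sequences plus intron tuples with a gene-type field. `splice_windows` is a hypothetical helper. The resulting windows could then be intersected with variants like any other region set.

```python
def splice_windows(genome, introns, flank=100):
    """Return flanking windows around canonical GT-AG splice sites.

    genome: dict chrom -> forward-strand sequence (hypothetical in-memory input).
    introns: iterable of (chrom, start, end, strand, gene_type) with 0-based,
             half-open genomic intron coordinates.
    """
    complement = str.maketrans("ACGT", "TGCA")
    windows = []
    for chrom, start, end, strand, gene_type in introns:
        if gene_type == "pseudogene" or end - start < flank:
            continue                                     # exclusion rules from the text
        chrom_len = len(genome[chrom])
        intron = genome[chrom][start:end].upper()
        if strand == "-":
            intron = intron.translate(complement)[::-1]  # reverse complement
        if not (intron.startswith("GT") and intron.endswith("AG")):
            continue                                     # keep canonical GT-AG introns only
        left = (max(0, start - flank), start)            # genomic window left of the intron
        right = (end, min(chrom_len, end + flank))       # genomic window right of the intron
        if strand == "+":
            # 5' ss is at the intron start; "upstream" lies to its left in genomic terms
            windows.append((chrom, *left, "5ss_upstream"))
            windows.append((chrom, *right, "3ss_downstream"))
        else:
            # On the minus strand, transcript direction runs right-to-left in the genome
            windows.append((chrom, *right, "5ss_upstream"))
            windows.append((chrom, *left, "3ss_downstream"))
    return windows
```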
GWAS tagSNPs were downloaded from the GWAS Catalog website, Johnson and O'Donnell, dbGaP and the GAD database. We used Haploview to perform linkage disequilibrium (LD) analysis: for each tagSNP, Haploview was used to obtain the mutations in LD with it in different populations (CHB, CEU, JPT and TSI). Moreover, we collected ClinVar data to provide further information about the relationship between m6A-associated alterations and disease.
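Haploview reports pairwise LD statistics such as D' and r². The sketch below computes r² from phased, biallelic haplotypes coded 0/1 and keeps the SNPs whose r² with a tagSNP reaches a threshold. This is not Haploview's implementation. The phased input format, the function names and the 0.8 cutoff are all illustrative assumptions, not values stated in the source.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic SNPs from phased haplotypes (alleles coded 0/1)."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("need two non-empty, equal-length haplotype vectors")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                                  # frequency of allele 1 at SNP B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n   # frequency of the "1-1" haplotype
    d = p_ab - p_a * p_b                                  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom           # monomorphic SNPs give 0


def snps_in_ld(tag, haplotypes, r2_threshold=0.8):
    """Return SNP ids whose r^2 with the tag SNP is at least r2_threshold (illustrative cutoff).

    haplotypes: dict snp_id -> list of 0/1 alleles, same haplotype order for every SNP.
    """
    tag_hap = haplotypes[tag]
    return sorted(
        snp for snp, hap in haplotypes.items()
        if snp != tag and ld_r2(tag_hap, hap) >= r2_threshold
    )
```

Because r² squares D, perfectly anticorrelated SNPs also reach r² = 1. In practice the analysis would be run separately on each population's haplotypes (CHB, CEU, JPT, TSI).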
To predict m6A-associated variants, we used m6AFinder to predict and annotate functional m6A sites. Built on recently published miCLIP data, m6AFinder provides two accurate Random Forest prediction models, one for human and one for mouse. Applying the prediction model to all variants, we separately predicted the m6A status of the reference sequence and the mutant sequence. We defined an m6A-gain mutation as one for which m6A modification is predicted in the mutated RNA sequence but not in the reference sequence, and an m6A-loss mutation as the opposite case.
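The gain/loss rule can be expressed independently of the model. m6AFinder's programming interface is not described here, so the sketch takes any scoring function `predict(seq) -> probability` as a parameter. `toy_predict` is a hypothetical stand-in that only checks whether the central 5-mer is DRACH. It is not a Random Forest, and the 0.5 decision threshold is likewise an assumption.

```python
from typing import Callable


def classify_m6a_change(ref_seq: str, mut_seq: str,
                        predict: Callable[[str], float],
                        threshold: float = 0.5) -> str:
    """Label a variant by comparing predicted m6A status of reference vs. mutant sequence."""
    ref_m6a = predict(ref_seq) >= threshold
    mut_m6a = predict(mut_seq) >= threshold
    if mut_m6a and not ref_m6a:
        return "m6A-gain"      # modified only in the mutant sequence
    if ref_m6a and not mut_m6a:
        return "m6A-loss"      # modified only in the reference sequence
    return "unchanged"


def toy_predict(seq: str) -> float:
    """Hypothetical stand-in for a trained model: 1.0 if the central 5-mer is DRACH, else 0.0."""
    mid = len(seq) // 2
    kmer = seq[mid - 2: mid + 3].upper()
    ok = (len(kmer) == 5 and kmer[0] in "AGT" and kmer[1] in "AG"
          and kmer[2] == "A" and kmer[3] == "C" and kmer[4] in "ACT")
    return 1.0 if ok else 0.0
```

Passing the predictor in as a parameter keeps the gain/loss definition separate from the model. A species-specific model could therefore be swapped in without changing the classification logic.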
For convenience, we provide a user-friendly web interface for the m6AVar database. Users can browse or search the data at different levels.
1) Browse by species in dbSNP
2) Browse by tumor type in TCGA
1) Quick search by SNP rsID
2) Quick search by gene
3) Quick search by region of interest
4) Quick search by disease of interest