At present, most published databases classify cancer-associated genes by cancer type, and fewer still collect tumor metastasis mechanism-associated genes based on the hallmarks of cancer/metastasis. Despite intensive research, few published works have set up a web-based resource that characterizes metastasis genes within a metastasis mechanism framework. Because tumor metastasis is a dynamic process involving many cellular and molecular processes, databases that classify metastasis genes by either cancer type or hallmarks of cancer cannot capture the sequential relations among metastasis stages or the cellular and molecular mechanisms operating at each stage. Consequently, such databases cannot provide a clear picture of the onset and progression of tumor metastasis. TMMGdb addresses this gap by giving the biomedical community a systematic approach to gaining an in-depth understanding of the dynamic process, molecular definitions and cellular processes involved in tumor metastasis. Compared with existing state-of-the-art resources, TMMGdb is a novel, systematic and comprehensive resource.
2. Collection of tumor metastasis mechanism keywords
We used three methods to select tumor metastasis mechanism-associated “Level 1” and “Level 2” keywords: (a) keywords used in the literature to build mathematical models of the cancer metastasis process, (b) research and review papers addressing the metastasis process, and (c) hallmark gene set names (hallmarks of metastasis terms) extracted from The Molecular Signatures Database (MSigDB) (Liberzon et al., 2015). The “Level 1” keywords are relatively broad, whereas the “Level 2” keywords are more specific.
“Level 1” keywords (four in total): "primary detachment" OR "in situ metastasis"; intravasation; motility OR migration OR extravasation; and "distal metastasis" OR colonization OR proliferation.
The “Level 2” set comprises 44 keywords, obtained from the MSigDB database, a literature review and advice from two oncologists. The first 10 keywords (in alphabetical order) are listed below:
“Level 2” keywords (the first 10): androgen, angiogenesis, apoptosis, autophagy, axonogenesis, cell adhesion, cell migration, chemokines, cholesterol, and “chromosomal instability” or “genomic instability”.
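To make the two-level keyword scheme concrete, the sketch below combines one “Level 1” keyword group (its OR-ed synonyms) with one “Level 2” keyword into a single Boolean literature query. The AND-combination rule, the quoting of multi-word phrases, and the names `LEVEL1_GROUPS`, `LEVEL2_FIRST10` and `build_query` are assumptions for illustration only, not the documented TMMGdb pipeline.

```python
# Hypothetical sketch: pair a Level 1 keyword group with a Level 2 keyword.
# The AND rule and phrase quoting are assumptions, not TMMGdb's published method.

# The four Level 1 groups; synonyms inside a group are OR-ed together.
LEVEL1_GROUPS = {
    "primary detachment": ['"primary detachment"', '"in situ metastasis"'],
    "intravasation": ["intravasation"],
    "motility": ["motility", "migration", "extravasation"],
    "distal metastasis": ['"distal metastasis"', "colonization", "proliferation"],
}

# First 10 of the 44 Level 2 keywords (alphabetical); multi-word terms quoted.
LEVEL2_FIRST10 = [
    "androgen", "angiogenesis", "apoptosis", "autophagy", "axonogenesis",
    '"cell adhesion"', '"cell migration"', "chemokines", "cholesterol",
    '("chromosomal instability" OR "genomic instability")',
]


def or_group(terms):
    """Join synonyms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(level1_key, level2_term):
    """Assumed rule: (Level 1 synonyms) AND (Level 2 keyword)."""
    return or_group(LEVEL1_GROUPS[level1_key]) + " AND " + level2_term


if __name__ == "__main__":
    # The Level 1 / Level 2 pair used in the Browse example (Figure 2).
    print(build_query("motility", "angiogenesis"))
    # → (motility OR migration OR extravasation) AND angiogenesis
```

Keeping synonyms in a dictionary keyed by the broad Level 1 term means the same group can be reused with each of the 44 Level 2 keywords.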
3. Input datasets
The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) is one of the most comprehensive cancer research resources in the world. Its data are distributed through the Genomic Data Commons (GDC), a program supported by the National Cancer Institute (NCI). Data for studying the mechanisms of tumor metastasis can be obtained from the Broad Institute TCGA GDAC Firebrowse (http://firebrowse.org/), which provides more than 80K samples from more than 11,000 cancer patients across 38 cancer types.
4. Search interfaces
Firstly, TMMGdb provides four interfaces, ‘Browse’, ‘Search’, ‘DEG search’ and ‘Download’, for users to retrieve information about TMMGs. TMMGdb contains a wealth of annotations, including PubMed IDs, genetic information, cancer types, cancer cell lines, miRNAs, pathways, tissue expression, mutations, drug resistance, protein-protein interaction networks (PPINs) and JCR journal ranking information.
Secondly, every TMMG can be traced back to the PubMed IDs of its source papers. The papers (PubMed IDs) displayed on the webpage are sorted according to journal rankings in the JCR 2019 edition, so users can conveniently choose which papers to read based on their ranking.
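The ranking step can be sketched as an ordinary sort. The text does not state the exact sort key, so the sketch assumes it is the journal's JCR 2019 impact factor (highest first), with papers from journals lacking a JCR record placed last; the `Paper` record, its fields and the placeholder PMIDs are hypothetical.

```python
# Hypothetical sketch of ranking a gene's supporting papers by journal.
# Assumed key: JCR 2019 impact factor, descending; unindexed journals go last.
from typing import NamedTuple, Optional


class Paper(NamedTuple):
    pmid: str                      # placeholder IDs, not real PubMed records
    journal: str
    jcr2019_if: Optional[float]    # None if the journal has no JCR 2019 record


def sort_by_jcr(papers):
    """Indexed journals first by descending impact factor; ties broken by PMID."""
    return sorted(
        papers,
        key=lambda p: (p.jcr2019_if is None, -(p.jcr2019_if or 0.0), p.pmid),
    )


if __name__ == "__main__":
    demo = [
        Paper("PMID_A", "Journal X", 4.2),
        Paper("PMID_B", "Journal Y", None),
        Paper("PMID_C", "Journal Z", 11.5),
    ]
    print([p.pmid for p in sort_by_jcr(demo)])  # → ['PMID_C', 'PMID_A', 'PMID_B']
```

Using a tuple key keeps missing rankings from raising comparison errors and makes the order deterministic when two journals share a value.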
Figure 1. PMIDs of the original references for the BCL2 gene are listed under the ‘Literature’ column.
Thirdly, TMMGdb provides comprehensive annotations, including regulatory pathways, miRNA regulators, tissue expression, cancer types and cancer cell lines.
Figure 2. The ‘Browse’ result page reports KEGG, Reactome and GO pathway information for the metastasis genes. In this example, the Level 1 and Level 2 keywords are motility and angiogenesis, respectively, and a total of 471 genes were returned.
Figure 3. TMMGdb provides biological pathway, upstream miRNA regulator and tissue expression annotations for the TMMG BCL2. BCL2 is expressed in many tissues and is regulated by a set of miRNAs.
Figure 4. Cancer type information for the metastasis gene MMP2 as recorded by TMMGdb, HCMDB and CMgene.
Figure 5. Cancer cell line information can be accessed through the following links: (i) COSMIC Cell Line Project (https://cancer.sanger.ac.uk/cell_lines), (ii) Cancer Cell Line Encyclopedia DepMap portal (https://depmap.org/) and (iii) Genomics of Drug Sensitivity in Cancer (https://www.cancerrxgene.org/celllines).
5. Genetic mutation and drug resistance annotations
Fourthly, TMMGdb provides genetic mutation and drug resistance annotations for the TMMGs.
Figure 6. Mutation and drug resistance information for the metastasis mechanism-associated gene BCL2, which is recorded by COSMIC as a cancer driver gene.
6. Transcription factor-mediated PPINs
Fifthly, TMMGdb provides transcription factor (TF)-mediated PPINs for all TMMGs. When a TMMG or one of its direct interacting partners is a TF, the regulatory mode (activation or suppression) is taken from TRRUST.
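A TF-mediated PPIN can be sketched as ordinary PPI edges that are additionally labelled with a TF to target regulatory mode whenever one partner is a TF. In the sketch below, the edge list, the TRRUST-style lookup table and the gene names (`TMMG1`, `TF_A`, and so on) are hypothetical placeholders, and the loading of the real TRRUST file is deliberately left out because its exact layout is not given here.

```python
# Hypothetical sketch: label PPI edges with TF -> target regulatory modes.
# Gene names and mode labels are placeholders, not real TRRUST records.


def annotate_edges(ppi_edges, trrust):
    """
    ppi_edges: iterable of (gene_a, gene_b) undirected interactions.
    trrust: dict mapping (tf, target) -> mode label (e.g. activation/repression).
    Returns (tf, target, mode) for every edge with a known TF regulation,
    checking both directions because either partner may be the TF.
    """
    annotated = []
    for a, b in ppi_edges:
        for tf, target in ((a, b), (b, a)):
            mode = trrust.get((tf, target))
            if mode is not None:
                annotated.append((tf, target, mode))
    return annotated


if __name__ == "__main__":
    edges = [("TMMG1", "TF_A"), ("TMMG1", "GENE_B"), ("TF_C", "TMMG1")]
    trrust = {("TF_A", "TMMG1"): "Activation", ("TF_C", "TMMG1"): "Repression"}
    for tf, target, mode in annotate_edges(edges, trrust):
        print(f"{tf} -> {target}: {mode}")
```

Edges without a matching TF record (here `TMMG1`/`GENE_B`) stay plain protein-protein interactions and are simply not returned by the annotation step.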
Figure 7. Protein-protein interactions between the TMMG BCL2 and TFs. (A) A partial list of the PPI partners of BCL2, and (B) visualization of the PPI partners of BCL2 using the grid layout (five different layout types are available for the user to select).
7. Differentially expressed gene analysis and survival analysis (HR)
Sixthly, TMMGdb provides differentially expressed gene (DEG) results for 12 cancer cohorts. A gene is defined as a DEG if its Benjamini-Hochberg adjusted p-value is less than 0.05 and the absolute value of its log2 fold change (FC) is greater than or equal to one (|Log2FC| >= 1). DEGs meeting these criteria are considered potential biomarkers of tumor metastasis.
Figure 8. Results of the differential expression analysis (log2FC and adjusted p-value) and survival analysis (hazard ratio, HR) of the BCL2 gene for 12 cancer cohorts.
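The DEG criterion can be written out explicitly. The sketch below implements the standard Benjamini-Hochberg adjustment and then applies the two cut-offs stated above (adjusted p < 0.05 and |log2FC| >= 1). The per-gene statistics are hypothetical inputs, and the upstream test that produced the raw p-values is not specified in the text, so it is left out.

```python
# Sketch of the DEG rule: BH-adjusted p < 0.05 and |log2FC| >= 1.
# Input statistics are hypothetical; the raw-p-value test is not modelled here.
from typing import Dict, List, Tuple


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(stats: Dict[str, Tuple[float, float]],
              alpha: float = 0.05, min_abs_lfc: float = 1.0) -> List[str]:
    """stats maps gene -> (log2FC, raw p-value); returns genes passing both cut-offs."""
    genes = list(stats)
    adj = bh_adjust([stats[g][1] for g in genes])
    return [g for g, q in zip(genes, adj)
            if q < alpha and abs(stats[g][0]) >= min_abs_lfc]


if __name__ == "__main__":
    stats = {
        "G1": (2.3, 0.001),   # large change, significant
        "G2": (0.4, 0.0005),  # significant but |log2FC| < 1
        "G3": (-1.5, 0.02),   # down-regulated, significant after adjustment
        "G4": (-2.0, 0.5),    # large change, not significant
    }
    print(call_degs(stats))  # → ['G1', 'G3']
```

The example shows why both conditions are needed: G2 is highly significant but its fold change is too small, while G4 has a large fold change that is not statistically supported.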