Expression of evolutionarily novel genes in tumors


Evolutionarily novel genes are those novel genes which originate in the germ cells
of multicellular organisms and thus can participate in evolution. Genes that originate
in somatic cells (e.g. in tumor cells) and cannot be passed to the progeny organisms
are not considered as evolutionarily novel.

Novel genes can originate from pre-existing genes or de novo. The theory of the origin of novel genes is well developed, and the mechanisms of
the origin of evolutionarily novel genes are well understood and described 8], 45], 58], 70], 76], 77], 110], 131], 132], 189], 194], 217]. It remains an open question, however, in which cells of evolving multicellular organisms the genes
determining evolutionary innovations and morphological novelties are expressed.

There is a general correlation between the increase in the gene number in the genomes
of evolving organisms, on the one hand, and the increase in the number of cell types,
the origin of other innovations and the overall complexity, on the other 34], 91], 215]. The question is how such a correlation was realized at the multicellular
level. A corresponding increase in cell number that simply accompanied the origin
of novel genes is hard to imagine. More likely, some autonomous cellular proliferative
processes were recruited to provide the space for the expression of new genes.

In my previous publications 88]–90] and in my recently published book “Evolution by Tumor Neofunctionalization” 91] I suggested that heritable tumors – benign tumors or tumors at the early stages of
progression – may provide extra cell masses for the expression of evolutionarily novel genes
and for the emergence of evolutionary innovations and morphological novelties. The non-trivial
prediction of this hypothesis is that we may find the expression of evolutionarily
novel genes in tumors.

Experiments in this direction performed in my lab since the early 2000s have indeed demonstrated
the specific or predominant expression of many evolutionarily young or novel genes
in tumors. These data will be discussed in the first part of this review.

I also found in the literature descriptions of many genes with similar dual specificity
– tumor-specifically expressed and evolutionarily novel. Such genes with dual specificity
were not purposefully searched for by the authors, and the connection between tumors and
evolution was not emphasized. Rather, the data on evolutionary novelty and specificity
of expression of certain genes were the result of descriptive experiments and often
can be found among other described features of the studied genes. Similar information
may be found in the results of genome-wide studies. Tumor specificity of expression
of genes that originated by gene duplication, from retrotransposons and endogenous retroviruses,
by exon shuffling or de novo will be discussed in the second part of this review.

The purposeful experimental search for evolutionarily novel genes with tumor-specific
expression

To test experimentally the prediction concerning the expression of evolutionarily
young or novel genes in tumors we used two complementary approaches. One was to study
the evolutionary novelty of genes/sequences with proven tumor specificity of expression.
The other was to study tumor specificity of expression of genes/sequences with proven
evolutionary novelty. Both approaches identified genes/sequences with dual specificity,
i.e. tumor-specifically or tumor-predominantly expressed and evolutionarily young
or novel.

The evolutionary novelty of tumor-specifically expressed sequences

To find the sequences which are expressed in tumors but not in normal tissues, a
global comparison of cDNA sequences from all available tumor-derived libraries with
cDNA sequences from all available normal tissue-derived libraries was performed. The
normal EST set was subtracted in silico from the tumorous EST set 11].
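
The in silico subtraction described above can be illustrated with a minimal Python sketch. It assumes that each EST has already been assigned to a cluster identifier and that each library has been labeled as tumor-derived or normal. The function name and the cluster identifiers are hypothetical placeholders and are not taken from the original analysis 11].

```python
def subtract_normal(tumor_ests, normal_ests):
    """Return cluster IDs seen in tumor-derived libraries but in no normal library."""
    return set(tumor_ests) - set(normal_ests)


# Toy input: cluster identifiers are placeholders, not real UniGene clusters.
tumor = ["cluster_A", "cluster_B", "cluster_C", "cluster_B"]
normal = ["cluster_B", "cluster_D"]

tumor_specific = subtract_normal(tumor, normal)
print(sorted(tumor_specific))
```

In this toy case only the clusters absent from every normal library remain. A real analysis would also have to deal with library size, sequencing depth and misannotated libraries, none of which this sketch models.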

The results showed that, in accordance with my prediction, tumors indeed express hundreds
of sequences that are not expressed in normal tissues. About half of the discovered tumor-specific
sequences lack long open reading frames (i.e., may be classified as non-coding RNAs) and
have no defined function 11], 51]. Among the non-coding RNAs, a long non-coding RNA 94] and a candidate microRNA (see the section “ELFN1-AS1, a novel primate gene expressed predominantly
in tumors”) have been described.

The relative evolutionary novelty of the sequences retrieved from
paper 51] was then analyzed. The protein-coding sequences were studied with the ProteinHistorian tool
28]. The nucleotide BLAST algorithm and an original Python script 3] were used to analyze the novelty of the non-coding sequences. Orthologs of the tumor-specifically
expressed sequences described by Baranova and co-authors were searched for in 26 completely
sequenced eukaryotic and prokaryotic genomes, and curves of the phylogenetic distribution
of orthologs of these sequences were generated. The data suggest that both sets
of tumor-specifically expressed sequences are relatively evolutionarily novel. The non-coding
tumor-specifically expressed sequences are younger than the protein-coding tumor-specifically
expressed sequences. During the last 39 million years of evolution, these sequences represented
the youngest gene class in the genomes of human ancestors 115], 116].
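
A curve of the phylogenetic distribution of orthologs can be sketched as follows. This is only an illustration under simple assumptions: each sequence is dated to the most ancient lineage node at which an ortholog is detected, and the curve is the cumulative fraction of sequences that had originated by each node. The node list and the toy sequences are hypothetical; this is not the script used in the cited work 3].

```python
from collections import Counter

# Lineage nodes ordered from most ancient to most recent (illustrative subset).
LINEAGE = ["Eukaryota", "Metazoa", "Vertebrata", "Mammalia", "Primates", "Homo sapiens"]


def origin_node(nodes_with_orthologs):
    """Most ancient lineage node with a detected ortholog; human-only if none."""
    for node in LINEAGE:
        if node in nodes_with_orthologs:
            return node
    return LINEAGE[-1]


def cumulative_distribution(sequences):
    """Fraction of sequences that had already originated at each lineage node."""
    counts = Counter(origin_node(hits) for hits in sequences.values())
    total = len(sequences)
    curve, running = [], 0
    for node in LINEAGE:
        running += counts[node]
        curve.append((node, running / total))
    return curve


# Toy data: sets of nodes at which an ortholog of each sequence was found.
toy = {
    "seq1": {"Metazoa", "Vertebrata", "Mammalia", "Primates"},
    "seq2": {"Primates"},
    "seq3": set(),
    "seq4": {"Mammalia", "Primates"},
}
for node, fraction in cumulative_distribution(toy):
    print(f"{node}: {fraction:.2f}")
```

A set of young sequences gives a curve that stays low at ancient nodes and rises steeply near the tips of the lineage, which is the pattern reported above for the tumor-specifically expressed sequences.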

In vitro experiments intended to confirm that the sequences found in silico are indeed specifically expressed in tumors were also carried out. cDNA panels from
normal and tumor tissues were used for PCR with specific primers. In total, 56 sequences
described in 11] have been studied in this way. Among them, nine were confirmed to be highly tumor-specific
94], 95], 138]. The sequences that have been confirmed to be tumor-specific are expressed in a wide
variety of tumors. For example, the sequence Hs.202247 is expressed in 46 tumor samples
out of 56 examined and in none of 27 normal tissues. One of the protein products of
the sequences that proved to be tumor-specific appeared to be a promising immunogen
for antitumor vaccine development 138], 170]. However, most of the experimentally confirmed tumor-specific sequences appear to be
non-coding RNAs.
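
To give a sense of how strong a result such as “46 of 56 tumor samples versus 0 of 27 normal tissues” is, the counts can be put into a one-sided Fisher exact test. The sketch below is an illustration only: the original studies do not report this test, and the function name is hypothetical. It uses only the Python standard library.

```python
from math import comb


def fisher_one_sided(pos_tumor, n_tumor, pos_normal, n_normal):
    """One-sided Fisher exact p-value: probability of at least `pos_tumor`
    positive tumor samples if expression were independent of sample type."""
    total = n_tumor + n_normal
    positives = pos_tumor + pos_normal
    denom = comb(total, positives)
    upper = min(n_tumor, positives)
    p = 0.0
    for k in range(pos_tumor, upper + 1):
        # Hypergeometric probability of exactly k positives among tumor samples.
        p += comb(n_tumor, k) * comb(n_normal, positives - k) / denom
    return p


# Counts for Hs.202247 as given in the text above.
p_value = fisher_one_sided(46, 56, 0, 27)
print(f"{p_value:.3g}")
```

The test treats every sample as independent, which is an assumption; samples from the same commercial cDNA panel or the same tissue type may not be.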

The nine experimentally confirmed tumor-specific sequences were studied for their
evolutionary novelty using molecular-biological techniques, comparative genomics analysis,
the search for orthologous sequences and sequence conservation analysis 92], 163], 164]. Eight of the nine tumor-specifically expressed sequences are either evolutionarily
new (primates or humans) or relatively young (mammals) (Table 1) and evolve neutrally 92], 93], 162]–164]. I suggest calling such sequences Tumor-Specifically Expressed, Evolutionarily New
Sequences, or TSEEN sequences.

Table 1. Evolutionarily novel and young genes with tumor specific or predominant expression
studied at the Biomedical Center

The sequence Hs.285026 (HHLA1) contains an ORF, although the corresponding protein has not been demonstrated experimentally. This
sequence is similar to human de novo protein-coding genes 86]. Since the corresponding protein has not been shown, this sequence may represent
an earlier stage of novel gene origin compared to those described by D.G. Knowles
and A. McLysaght. This and other sequences described in our studies (besides protein-coding
sequences with established functions) may represent proto-genes (gene precursors which
have not yet acquired functions and evolve neutrally 29]) at different stages of their evolution towards novel genes with protein or RNA related
functions. The sequence Hs.633957 represents this transition.

ELFN1-AS1, a novel primate gene expressed predominantly in tumors

The human transcribed locus resides on chromosome 7 and corresponds to the UniGene
EST cluster Hs.633957. It was found by our group to be expressed in a tumor-specific
manner by in silico analysis 11]. Later these data were supported experimentally: specific transcripts of the locus
were detected in tumors of various histological origins, but not in most of the healthy
tissues 94], 149], 150].

Experimental and in silico evidence was obtained that the locus is a stand-alone gene with its own promoter and the capacity
for alternative splicing. However, only one splicing isoform is predominant.
The gene was assigned the gene symbol ELFN1-AS1, ELFN1 antisense RNA 1 (non-protein coding), a gene name approved by the HUGO Gene Nomenclature
Committee. Our data point to a miRNA function of ELFN1-AS1, with DPYS mRNA as its primary target 151], 152].

This gene originated de novo from an intronic region of the conserved gene ELFN1 (NCBI Ref. Seq. NM_001128636.2) in the primate lineage. Homologous sequences of this
gene were identified by us in all primates, but the DNA sequence from the representative
of the suborder Strepsirrhini, Otolemur garnettii, differs by more than 50 % from its human counterpart and forms an outgroup on
the phylogenetic tree. Thus ELFN1-AS1 could have become transcriptionally active after the divergence of Strepsirrhini and Haplorhini
primates. It is noteworthy that all the Haplorhini primates have a region with 5 or
more E-boxes downstream of the DS site. This suggests that the ELFN1-AS1 gene could have been c-Myc-responsive since its origin.

Taken together, the data indicate that the human transcribed locus contains a gene for
a non-coding RNA, likely a microRNA. This gene combines the features of predominant
expression in tumors and evolutionary novelty 151], 152].

PBOV1, de novo originated human gene with tumor-specific expression

In the study of the PBOV1 gene the other approach was used, i.e. the evolutionary novelty of the gene was studied
first.

PBOV1 (UROC28, UC28) is a human protein-coding gene with a 2501 bp single-exon mRNA and a 135-aa ORF. The
gene was originally characterized by An and co-workers 4]. This gene was mentioned among 12 human genes without orthologs in the mouse and
dog genomes in the paper of Clamp and co-authors 38]. We studied the evolutionary novelty of this gene more carefully and found that the
coding sequence of PBOV1 is poorly conserved in mammalian evolution and originated de novo in primate evolution through a series of frame-shift and stop codon mutations. Consequently,
80 % of the protein sequence is unique to humans. The Ka/Ks ratio, both in pairwise alignments
and in a multiple alignment of all primate sequences syntenic to the human coding sequence,
did not show any significant difference from 1.0, indicating that the amino acid sequence
evolved neutrally. The PBOV1 protein lacks any annotated or predicted domains, and over
60 % of its sequence is predicted to be disordered. These findings strongly suggest
that human PBOV1 is a protein of very recent de novo evolutionary origin 165].
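
The reasoning behind the Ka/Ks criterion can be illustrated with a short sketch. Ka is the rate of nonsynonymous substitutions and Ks the rate of synonymous substitutions; a ratio close to 1 is what neutral evolution predicts. The function names, the tolerance and the input values below are hypothetical and serve only as an illustration. The cited study 165] assessed the difference from 1.0 statistically, which this sketch does not do.

```python
def ka_ks_ratio(ka, ks):
    """Ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    return ka / ks


def selection_regime(ka, ks, tolerance=0.2):
    """Crude label for the ratio. `tolerance` is an arbitrary illustrative
    cutoff, not a substitute for a statistical test against a ratio of 1."""
    omega = ka_ks_ratio(ka, ks)
    if omega < 1 - tolerance:
        return "purifying selection"
    if omega > 1 + tolerance:
        return "positive selection"
    return "consistent with neutral evolution"


# Made-up rates for illustration; not measurements for PBOV1.
print(selection_regime(0.048, 0.050))
```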

After establishing the evolutionary novelty of the PBOV1 gene, the specificity of its expression in tumors and normal tissues was studied.
PBOV1 has been previously reported to be overexpressed in prostate, breast, and bladder
cancers 4]. We studied the expression of PBOV1 using PCR on panels of cDNA from various normal and tumor tissues. The gene had a
highly tumor-specific expression profile. It was expressed in 20 out of 34 tumors
of various origins but was not expressed in any of the normal adult or fetal human
tissues that we tested (Figs. 1 and 2). The interesting feature of this result is that the tumor specificity of PBOV1 expression was predicted by us from its evolutionary novelty 96], 165].

Fig. 1. PBOV1 expression measured by PCR in cDNA panels from human tumors. a Tumor cDNA Panel (BioChain Institute, USA): 1 – Brain medulloblastoma, with glioma,
2 – Lung squamous cell carcinoma, 3 – Kidney granular cell carcinoma, 4 – Kidney clear
cell carcinoma, 5 – Liver cholangiocellular carcinoma, 6 – Hepatocellular carcinoma,
7 – Gallbladder adenocarcinoma, 8 – Esophagus squamous cell carcinoma, 9 – Stomach
signet ring cell carcinoma, 10 – Small Intestine adenocarcinoma, 11 – Colon papillary
adenocarcinoma, 12 – Rectum adenocarcinoma, 13 – Breast fibroadenoma, 14 – Ovary serous
cystoadenocarcinoma, 15 – Fallopian tube medullary carcinoma, 16 – Uterus adenocarcinoma,
17 – Ureter papillary transitional cell carcinoma, 18 – Bladder transitional cell
carcinoma, 19 – Testis seminoma, 20 – Prostate adenocarcinoma, 21 – Malignant melanoma,
22 – Skeletal Muscle malignancy fibrous histocytoma, 23 – Adrenal pheochromocytoma,
24 – Non-Hodgkin’s lymphoma, 25 – Thyroid papillary adenocarcinoma, 26 – Parotid mixed
tumor, 27 – Pancreas adenocarcinoma, 28 – Thymus seminoma, 29 – Spleen serous adenocarcinoma,
30 – Hodgkin’s lymphoma, 31 – T cell Hodgkin’s lymphoma, 32 – Malignant lymphoma.
NC – PCR with no template, PC – PCR with human DNA. b PBOV1 expression in clinical tumor samples. PBOV1 is expressed in breast cancer (9–250),
ovary cancer (1, 6), cervical cancer (2, 13), endometrial cancer (156, 270), lung
cancer (12, 14, 17), seminoma (7), meningioma (63), non-Hodgkin lymphomas (67, 82,
92, 102, 113). From open access paper 165]. Copyright of authors

Fig. 2. Expression of PBOV1 and GAPDH (positive control) measured by PCR in cDNA panels from
human normal tissues. a Human MTC Panel I (1–8), Human MTC Panel II (9–16): 1 – brain, 2 – heart, 3 – kidney,
4 – liver, 5 – lung, 6 – pancreas, 7 – placenta, 8 – skeletal muscle, 9 – colon, 10
– ovary, 11 – peripheral blood leukocyte, 12 – prostate, 13 – small intestine, 14
– spleen, 15 – testis, 16 – thymus. b Human Digestive System MTC Panel: 1 – cecum, 2 – colon, ascending, 3 – colon, descending,
4 – colon, transverse, 5 – duodenum, 6 – esophagus, 7 – ileocecum, 8 – ileum, 9 – jejunum,
10 – liver, 11 – rectum, 12 – stomach. c Human Immune System MTC Panel (1–7), Human Fetal MTC Panel (8–15): 1 – bone marrow,
2 – fetal liver, 3 – lymph node, 4 – peripheral blood leukocyte, 5 – spleen, 6 – thymus,
7 – tonsil, 8 – fetal brain, 9 – fetal heart, 10 – fetal kidney, 11 – fetal liver,
12 – fetal lung, 13 – fetal skeletal muscle, 14 – fetal spleen, 15 – fetal thymus;
A-C: NC – PCR with no template, PC – PCR with human DNA. From open access paper 165]. Copyright of authors

Unlike cancer/testis antigen genes, PBOV1 is expressed from a GC-poor TATA-containing promoter which is not influenced by DNA
methylation and is not active in testis. PBOV1 activation in tumors may depend on sex hormone receptors, C/EBP transcription factors
and the Hedgehog signaling pathway. Although the PBOV1 protein has recently originated
de novo and thus has no identifiable structural or functional signatures, a missense SNP
(single nucleotide polymorphism) in it has been previously associated with an increased
risk of breast cancer. Using publicly available data we found that higher levels of
PBOV1 expression in breast cancer and glioma samples were significantly associated with
a positive disease outcome. PBOV1 is also highly expressed in primary but not recurrent high-grade gliomas, suggesting
that immunoediting against PBOV1-expressing cancer cells might occur over the course of the disease. We propose that PBOV1 is a novel tumor suppressor gene which might act by provoking a cytotoxic immune
response against cancer cells that express it. We speculate that this property might
be a source of phenotypic feedback that facilitated PBOV1 gene fixation in human evolution 165].

The evolutionary novelty of human cancer/testis antigen genes

Cancer/testis antigen genes (CTA or CT genes) code for a subgroup of tumor antigens
expressed predominantly in testis and in different tumors. CT antigens may also be expressed
in placenta, in female germ cells, and in the brain 33], 64], 175], 209], 210] (see the discussion of CT gene expression in the brain in 91]). At the time of the study, CTDatabase (http://www.cta.lncc.br) included 265 CT genes and 149 CT gene families.

The hypothesis of the expression of evolutionarily novel genes in tumors explains
the otherwise strange cancer-testis association paradox: since the origin of
evolutionarily novel genes is connected with their expression in germ cells, cancer/testis
genes are novel genes which are expressed in tumors.

I therefore suggested that cancer/testis antigen genes should be evolutionarily new or young
genes. To test this prediction, the presence of genes orthologous to human
cancer/testis genes in the human evolutionary lineage was studied 44]. This analysis was performed separately for genes located on the X chromosome and
for autosomal cancer/testis genes, because extensive traffic of novel genes has been
described for the mammalian X chromosome 16], 46], 103].

Orthologs of each of the CT genes were searched for among annotated genes in several completely
sequenced eukaryotic genomes using the HomoloGene tool of NCBI 168], and distributions of orthologs of all CT-X genes, all autosomal CT genes, all human
CT genes and all annotated protein-coding genes from the human genome in 11 taxa of the human
evolutionary lineage were built. It was shown that 31.4 % of CT-X genes are exclusive
to humans and 39.1 % of CT-X genes have orthologs that originated in Catarrhini and Homininae.
Thereby the majority of human CT-X genes (70.5 %) are novel or young for humans.
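
The 70.5 % figure is simply the sum of the two shares given above, as the following small check shows. The dictionary keys and the function name are illustrative labels; the percentages are those stated in the text.

```python
# Shares of human CT-X genes by taxon of origin, as reported in the text (percent).
ctx_share_by_origin = {
    "exclusive to humans": 31.4,
    "Catarrhini and Homininae": 39.1,
}


def combined_share(shares):
    """Sum percentage shares, rounded to one decimal place."""
    return round(sum(shares.values()), 1)


novel_or_young = combined_share(ctx_share_by_origin)
print(f"{novel_or_young} % of CT-X genes are novel or young for humans")
```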

Altogether 36.7 % of all human CT genes originated in Catarrhini, Homininae and humans.
It was also found that 30 % of all human CT genes originated in Eutheria. These CT
genes acquired functions in Eutheria. This indicates the importance of processes in
which tumors and CT antigens were involved during the evolution of Eutheria. CT genes
that originated in Eutheria are located mainly on autosomes. CT genes that originated in Catarrhini,
Homininae and humans are located predominantly on the X chromosome. This difference is
probably related to important events in the evolution of the mammalian X chromosome since
the origin of Eutheria 99], especially to the acquisition of a special role in the origin of novel genes 77].

Thus the majority of CT-X genes are either novel or young for humans, and the majority
of all human CT genes (70 %) originated during or after the origin of Eutheria. These
results suggest that the whole class of human CT genes is relatively evolutionarily
new 44].

Our data are in good correspondence with evidence obtained by other groups on particular
families of CT genes. I found evidence in the literature that at least 7 families
(of the 149 families known at that time) of CT genes (MAGE-1, PRAME, SPANX-A/D, GAGE, XAGE, CT45 and CT47) and many CT genes located on the X chromosome (CT-X genes) were either new or young
(reviewed in 91]). Later it was found that one more CT gene family, CTAGE (cutaneous T-cell-lymphoma-associated antigen), shows a rapid and primate-specific
expansion, especially in humans, which starts with an ancestral retroposition in the
Haplorhini ancestor followed by DNA-based duplications 214]. But our study 44] was the first systematic study of the evolutionary novelty of the whole class of
CT genes, and it showed that this class is relatively evolutionarily novel. Thus our prediction
of the evolutionary novelty of the whole class of CT genes turned out to be correct.

The relative evolutionary novelty of the whole class of CT genes confirms the prediction
about the expression of evolutionarily young and novel genes in tumors. The expression
of cancer/testis genes in tumors thus appears as a natural phenomenon, not an aberrant
process as interpreted by most authors (e.g. 1], 27], 32], 36], 175], 214]). More discussion of the evolutionary novelty of CT genes may be found in my recent book
91].

The list of single genes and gene classes studied by our group at the Biomedical Center
is presented in Table 1.

The data obtained by our group, both on individual genes and on large groups of genes,
suggest that tumor-specifically expressed, evolutionarily novel (TSEEN) genes could represent a new biological phenomenon, the phenomenon of TSEEN genes 91]. That is why I searched the literature for evidence of similar genes,
i.e. evolutionarily novel and tumor-specifically expressed.

Analysis of the literature data related to TSEEN genes

It turned out that many examples of genes with dual specificity – evolutionarily novel and
tumor-specifically expressed – could be found in the literature, but serious attention
was never paid to this association. Below I will discuss the tumor specificity of
expression of genes that originated by different mechanisms – by gene duplication, from
retrotransposons and endogenous retroviruses, by exon shuffling or de novo. Since positive Darwinian selection is a feature of many evolutionarily novel
genes, human tumor-related genes positively selected in the primate lineage will also be
discussed.

Expression of pseudogenes in tumors

Gene duplication is a major way of genome evolution. The original hypothesis 131] suggested that pre-existing genes are under the control of natural selection, and their
evolution is constrained within their existing function. An extra copy of an existing
gene escapes the control of natural selection, so that the accumulation of mutations
in this extra copy may lead to the origin of a novel gene with a related or even new
function. Gene duplication is considered as providing the “raw material” for the origin
of new genes. This concept also suggests that the majority of duplicates become inactive
pseudogenes due to degenerative mutations, and only rarely would beneficial mutations
lead to the emergence of a new gene with a novel function 131]. The term “pseudogene” itself was first introduced by C. Jacq and co-authors in 1977
72].

The DNA-mediated mechanisms of gene duplication include unequal crossing over and tandem,
segmental, chromosomal or genome duplications. The resulting gene duplicates may be
organized in a tandem, interspersed or polyploid manner. Segmental duplications are
large interspersed segments of DNA with high sequence identity (90 %), usually separated
by 1 Mb of unique sequences 120].

RNA-based gene duplication, or retroposition, creates duplicate genes by reverse transcription
of RNAs from parental genes. RNAs from all categories generate retrosequences that
may be exapted as novel genes or regulatory elements 21]. Retrogenes are most abundant in mammals where long interspersed nuclear elements
(LINEs) that provide the enzyme reverse transcriptase for retroposition are widespread.
The majority of retrogenes is produced by genes with high levels of germline expression.
They often originate from the X chromosome 16], 76]. A new retrogene is intronless, contains a poly(A) tract, and may be flanked by short
duplicate sequences 15], 104].

DNA-mediated gene duplication is a more frequent event in genome evolution, while RNA-based
gene duplication is more capable of generating genes with novel functions. Retroposition
is less likely to provide expressed daughter retrocopies than segmental DNA duplication
because retrocopies do not contain regulatory elements. So, new promoters and enhancers
must somehow be recruited for the origin of new genes, and several mechanisms of
such recruitment have been described 76], 77]. Retrogenes are usually located on chromosomes different from those of their parental genes.
The mammalian X chromosome demonstrates extensive retrogene traffic 46]. Because of their different location and new promoter recruitment, transcribed
retrogenes are more capable of evolving new expression patterns and novel functional
roles than gene duplicates arising by DNA segmental duplication 76], 77]. Retrogenes, like duplicates that originated through DNA-mediated mechanisms, might provide
the raw material for the origin of evolutionarily novel genes and functionally important
evolutionary innovations 76], 119], 197]. At least one functional retrogene per million years originated in the primate lineage
that led to humans 119].

In accordance with the two major ways of gene duplication – DNA-based and RNA-based mechanisms
– two types of pseudogenes are distinguished: duplicated and processed pseudogenes,
respectively 105], 148]. One more group of pseudogenes includes so-called “unitary” pseudogenes that arise
through spontaneous mutations of single coding genes 216]. Other pseudogene biotypes may include polymorphic pseudogenes (loci known to be
coding in some individuals), IG pseudogenes (immunoglobulin segments with disabling
mutations) and TR pseudogenes (T-cell receptor gene segments with disabling mutations)
147].

Hundreds to thousands of pseudogenes have been identified in different species. In
humans, 11,216 pseudogenes have been recently annotated, including ~8,000 processed
pseudogenes 61], 147]. Extrapolation estimates suggest that the number of pseudogenes in the human genome
may be ~14,000 147]. This is smaller than earlier estimates 190], 217]. Processed pseudogenes are the most abundant type of pseudogenes in the human genome,
which is connected with the burst of retroposition activity in ancestral primates
135], 217]. Pseudogenes have long been considered as non-functional or “junk” DNA. But during
the last decade, this attitude has changed substantially. Evidence is accumulating
that many pseudogenes are transcribed and functional in development and diseases (reviewed
in 105], 148], 154], 173]). Laura Poliseno distinguishes the following types of pseudogene functions: those related to
the parental gene and those independent of the parental gene; those mediated by the pseudogene
DNA, by pseudogene RNA transcribed in sense, by pseudogene RNA transcribed in antisense,
or by pseudogene-encoded proteins 154]. Pseudogenes transcribed as noncoding RNAs may regulate their parental genes as antisense
RNAs, short interfering RNAs (siRNAs) or as microRNA decoys 173]. Pseudogenes participate in the regulation of a variety of biological processes including
cancer 105], 148], 154]. One of the earliest indications of the functional role of pseudogenes was the demonstration
that in mouse oocytes pseudogene-derived small interfering RNAs regulate gene expression
188], 204]. Besides fully functionally active pseudogenes, partially active pseudogenes in the
process of either losing or gaining function have been described 147].

Authors who study pseudogenes have come to the conclusion that pseudogenes serve as a source
of novel functions for evolving organisms 10], 22], 105]. A special term, “potogenes”, was coined to designate pseudogenes as DNA sequences
with a potentiality for becoming new genes 10], 22]. This is in accordance with the major postulate of the original hypothesis of evolution
by gene duplication 131], and we may consider pseudogenes with novel or evolving functions as evolutionarily
novel or evolving genes.

Transcription of pseudogenes is an important indication of their functionality.
Evidence of pseudogene transcription has been accumulating in recent years 10], 219]. The ENCODE and GENCODE projects provided information about the transcription of 876
pseudogenes, including 531 processed and 345 duplicated pseudogenes 147]. Another group of authors studied RNA-Seq transcriptome data from 248 cancer and
45 benign samples of 13 different tissue types and described the expression of 2,082
distinct pseudogenes 78]. What is important for our consideration of the expression of evolutionarily novel genes
in tumors is that they observed 218 pseudogenes expressed only in cancer samples, of which
178 were observed in multiple cancers 78].

One of the first demonstrations that pseudogenes are activated in tumors was the description
of a new tumor antigen (NA88-A), coded for by a processed pseudogene, that generates an HLA class I-restricted CTL response
against melanoma 126]. At the same time, the expression of the parental gene HPX42B did not lead to a similar CTL response. The transcription of the NA88-A pseudogene was limited, with significant expression found only in some metastatic
melanomas 126].

Among other earlier works was the detection of ψPTEN expression in central nervous system high-grade astrocytic tumors 211]. The ψPTEN expression was complementary to PTEN mutation because the majority of glioblastomas showed either PTEN mutation or ψPTEN expression. In a later study 153] the functional relationship between the mRNAs produced by the PTEN tumor suppressor gene and its pseudogene PTENP1 (another name for ψPTEN) was demonstrated. PTENP1 was able to regulate cellular levels of PTEN and exerted a growth-suppressive role
acting as a microRNA decoy 153].

In a comprehensive paper devoted to human processed pseudogenes Zhaolei Zhang and
co-authors 217] described several pseudogene families with implication to tumors (see Table 5 in
the above mentioned paper).

Other examples of pseudogenes expressed in tumors but not in normal tissues are presented
in Table 2.

Table 2. Pseudogenes expressed in tumors

As we can see from the data presented in this part of the paper, the expression of
pseudogenes in tumors is widespread. Thus the evolution of a pseudogene towards a functional
novel gene may involve its expression in tumors as a part of the whole process (see
91] for more discussion of the role of gene expression in the origin of novel genes).

Endoretroviral sequences and other retrotransposons are expressed in tumors

Transposable elements are classified in two groups, Class I and Class II. Class I
mobile elements use an RNA intermediate and reverse transcriptase activity for transposition,
while Class II elements use a DNA intermediate and a ‘cut and paste’ mechanism. Class
I elements include long terminal repeat (LTR) retrotransposons, also called ‘endogenous
retroviruses’ (ERVs), and non-long terminal repeat (non-LTR) retrotransposons (LINEs
and SINEs) 155]. Human transposable elements comprise about 40 % of the human genome: HERVs, 4.64 %;
MaLR, 3.65 %; LINEs, 20.42 %; and SINEs, 13.14 % 100]. That is why mobile elements were called the “drivers of genome evolution” 83]. The role of transposons in gene origin was recently reviewed in 91].

Endogenous retroviruses (ERVs) have been shown to have originated as the result of
repeated germ cell retroviral infection of their ancestral hosts 13], 19], 63], 118], 205]. The genes of ERVs were evolutionarily new for their ancestral hosts. Together with
other retrotransposons, ERVs participated in the origin of genes with functions novel
to their hosts (reviewed in 91]). There are 203,000 copies of human ERVs (HERVs) in the human genome 100]. Different authors define different numbers of HERV families, from 26 53] to about 50 114], 121] or even 350 families 136].

Human endogenous retrovirus sequences are expressed in tumors 5], 111], 167]. Expression of different HERVs was described in different human tumors: HERV-K family
– in teratocarcinoma 20], seminomas 167], in breast cancer 200], in urothelial and renal cell carcinomas 49], in melanoma, germ cell tumors, gonadoblastoma, ovarian clear cell carcinoma, ovarian
epithelial tumors, prostate cancer, lymphoma, hematological neoplasms, sarcoma, bladder
and colon cancer 30], 65], 82]; HERV-E – in prostate carcinoma 201]; HERV-H – in leukemia cell lines 107] and in cancers of small intestine, bone marrow, bladder, cervix, stomach, colon and
prostate 178].

Recent reviews confirm the upregulation of HERVs in tumors 80], 113], 127], 158], 161], which is connected with the general trend of HERV demethylation in tumors 127], 158], and similar data continue to accumulate 26], 181], 208]. ERVs of mice also demonstrate hypomethylation and transcriptional upregulation in
mouse tumors 66], 112], 158].

Endogenous retroviruses may serve as targets for antitumor immunity. For example,
HERV-K-MEL, a HERV-K pseudogene expressed in most melanomas and in many other types of tumors, encodes
an antigenic peptide that is targeted by CTLs in melanoma patients 30], 169]. HERV-E was found to be selectively expressed in clear cell kidney cancer but not
in normal tissues. This tumor-specific expression is connected with inactivation of
the von Hippel-Lindau tumor suppressor and hypomethylation. Antigens encoded by HERV-E
are immunogenic and stimulate cytotoxic T-cells that kill cancer cells. HERV proteins
that act as tumor-associated antigens have also been detected in other types of tumors
37].

Especially interesting for my consideration is the HERV-K family because it contains the
most recently active members, which entered the ancestral human genome after the divergence
of humans and chimps and may be considered evolutionarily novel for humans 12], 13], 185]. Many HERV-K proviruses are unique to humans 12]. HERV-K continued to replicate in the human lineage until at least 250,000 years ago
114], 117], and might still be expanding 113]. HERV-K is also the most widely expressed in different tumors (see above). In HERV-K
and in other younger families such as HERV-H and HERV-W the most pronounced DNA demethylation
was reported 49], 158]. Not only mRNA, but also HERV-K antibodies are already elevated in the blood at the
early stage of breast cancer 202], 203].

RNA transcripts from various HERV LTRs have been described in various types of human
tumors and cell lines. For example, elevated HERV-K 5′LTR mRNA was detected in prostate
cancer tissues (reviewed in 207]).

Other primate-specific retrotransposons such as SVA, LINE-1P, AluY, and MaLR families
are also known for the loss of DNA methylation in tumors. The younger retroelements
are highly methylated in healthy tissues, while in many tumors these young elements
suffer the most dramatic loss of methylation 49], 130], 186]. L1 and Alu sequences are silenced in normal human cells and activated in tumors
14], 155], 171]. Full length L1 RNA in cancer cell lines and expression of ORF1p in tumors have been
shown (reviewed in 130]). The majority of the retrotransposition events seem to be harmless “passenger” mutations
191].

There are in silico data supporting the increased transcription of retrotransposons in transformed human
cells 41]. Although originally it was thought that HERVs are transcriptionally silent in most
normal tissues, in silico 57], 84], 166], 178] and PCR and microarray 6], 50], 140], 174], 179] data suggest that HERV-derived RNAs are more widely expressed in normal tissues than
originally anticipated. HERV-K is transcribed during normal human embryogenesis 56]. Syncytin, the envelope gene of human defective endogenous retrovirus HERV-W, is
expressed in multinucleated placental syncytiotrophoblasts and may mediate placental
cytotrophoblast fusion 18], 123], 198].

Genes originated by exon shuffling are expressed in tumors and may lead to oncogenic
transformation

The principle of gene origin by exon shuffling is the following: new genes are created
by recombining previously existing exons, which leads to the origin of mosaic genes
and proteins 54], 75], 110], 141]–143]. Exon shuffling is an important mode of the origin of new genes: at least 19% of
the exons in the database were involved in exon shuffling 109]. The correlation between exon-intron organization of the gene and the domain organization
of the corresponding protein is most evident in the case of young vertebrate genes,
e.g. genes coding for proteases of blood coagulation, fibrinolytic and complement
cascades, etc. That is why the first evidence for exon shuffling came from studies
on proteases of blood coagulation and fibrinolysis 143].

The mechanisms of exon shuffling include illegitimate recombination 192], 193], retroposition 125], segmental duplication 45] and L1 retrotransposon-mediated 3′ transduction 125].

Modular domain rearrangements can lead to cancer. The fusion of the self-oligomerizing
SAM domain from the gene TEL to the catalytic domain of the nonreceptor tyrosine kinase Abl in some human leukemias
results in a constitutively clustered chimeric protein, persistent activation of the tyrosine
kinase and oncogenic transformation. Tyrosine kinases other than Abl are also activated
in fusion proteins by oligomerization of the SAM domain of TEL 106]. Activation of Abl tyrosine kinase seen in patients with chronic myelogenous leukemia
is caused by translocation of the tip of chromosome 9 encoding Abl to chromosome 22
encoding BCR and formation of a fusion protein. Oligomerization of coiled-coil domains
from BCR leads to constitutive activation of Abl 106].

The Tre2(USP6) oncogene is a hominoid-specific gene. It originated by the fusion of two genes, USP32 (NY-REN-60) and TBC1D3. USP32 is an ancient and highly conserved gene. TBC1D3 is young and originated by recent segmental duplication in primates. Tre2 is evolutionarily young, since it originated 21–33 million years ago after TBC1D3 segmental duplication in primates 144].

Atypical splicing in combination with retrotransposition may also lead to exon shuffling.
Moreover, atypical splicing of existing genes may be the most prevalent mechanism of
novel protein creation. Atypical splicing includes alternative splicing within the
single-gene transcripts and intergenic splicing of transcripts from tandemly located
genes. Transcription-induced chimeras may evolve into gene fusions, and alternative
splicing may evolve to gene fission (reviewed in 8]). For instance, the chimeric PIPSL gene was formed by L1-mediated retrotransposition of a readthrough, intergenically
spliced transcript in hominoids 9]. This phenomenon was called transcription-mediated gene fusion. Many examples of
intergenic splicing have been described in the human genome. The authors suggest that
it is a novel mechanism of gene origin, where transcription-induced chimerism followed
by retroposition may result in a new gene 2]. At least 4–5% of the tandem gene pairs in the human genome can be transcribed
into a single RNA coding for a chimeric protein 139].

Alternative splicing often participates in the exonization process. When the new exon
is alternatively spliced and expressed at low levels, splice variants with and without
the new exon are represented, and the pre-existing function is not destroyed. This opens
the way to the origin of a new gene with a new function and/or a new functional module
due to the novel exon 54], 128], 177], 199]. The comparison of human, mouse and rat genomes indicates that alternative splicing
is associated with an increased frequency of exon creation and/or loss 124].

Transposed element exonization may be a source of new constitutively spliced exons.
Alu-containing exons are alternatively spliced. Comparative analysis of transposed
element insertion within human and mouse genomes reveals Alu’s unique role in shaping
the human transcriptome 172], 176].

Alternative splicing is widespread in cancer. The splice changes in cancer are
global. Up to half of all alternative splicing events may be changed in tumors. Some
splice isoforms are upregulated in all studied cancers, while others are characteristic
of certain types of tumors. Affected proteins include transcription factors, cell
signal transducers, transmembrane proteins, secreted extracellular proteins, proteins
involved in metabolism, angiogenesis, apoptosis, cell motility and invasion, oncoproteins
and tumor suppressor proteins. Genes with alternative transcripts associated with
various cancers include CD44, p53, p73, PTEN, APC, BCL-X, VEGF4, mdm2, BRCA1, TACC1, TERT, KLF6, SURVIVIN,
ASIP, NF1, Caspase 8, CDH17, Ron, BARD1, AR, FGFR2, RUNX1, HOXA9, WT1, BIM, TF,
HERV-K env (p9), HNRPK and many others. Many of these genes have multiple splicing patterns, e.g. the mdm2 gene locus produces over 72 mdm2 variants. Alternative splicing in cancer-related genes may have an impact on all major
aspects of tumor cell biology. All hallmarks of cancer have alternatively spliced
regulators. There are also many cancer-associated splice variants with unknown functions
7], 35], 42], 52], 59], 85], 101], 102], 133], 156], 160], 182], 195], 196].

Atypical splicing events do not alter the number of genes in DNA, but produce altered
proteins which influence all aspects of tumor biology. In evolutionary perspective,
atypical splicing combined with retrotransposition may lead to the origin of novel
genes. A promising direction of research would be to study what proportion of splicing
events involved in cancer have already generated (through retroposition) novel genes
in the germ plasm.

Genes originated de novo are specifically expressed in tumors

“Senseless” DNA sequences may acquire new functions in the organism and become new
genes. New functions may be connected not only with protein-coding genes, but also
with various functional non-coding RNAs. This mechanism of novel gene origin is called
de novo origin.

New promoter elements such as GC-islands, TATA-boxes, LINE1 promoters or retroviral
LTRs may arise as a result of mutational process, gene rearrangements, retrotransposition
or viral infection. Such events can lead to expression of “senseless” DNA sequences
that subsequently may accumulate mutations that alter their protein-coding capacity.
The senseless DNA sequences acquire new functions. Noncoding RNAs may eventually acquire
ORFs and become protein-coding mRNAs. These could be mechanisms of de novo gene origin. Exonization by alternative splicing may be the mechanism of de novo exon origin (see discussion above in “Genes originated by exon shuffling are expressed
in tumors and may lead to oncogenic transformation”).

Three novel human protein-coding genes have been shown to originate from noncoding
DNA since the divergence from the chimp. These genes have no protein-coding homologs in
any other genome. A few human-specific mutations altered protein-coding capacity by
destroying “disablers” in the ancestral sequences. The existence of these protein-coding
genes is supported by expression and proteomic data 86]. One of those genes – CLLU1 – has been shown earlier to be specifically expressed in chronic lymphocytic leukemia
(CLL) 23]. The CLL expression specificity of CLLU1 was later confirmed in several studies 24], 74], 134], 159]. It was also shown that CLLU1 is expressed in other tumors (tumors of lung, stomach, prostate and spleen), but
in no normal tissue [97], in press]. We may conclude that CLLU1 belongs to TSEEN genes.

PBOV1, a gene of recent de novo origin specific to humans, has a highly tumor-specific expression profile 165] (see discussion above in “PBOV1, de novo originated human gene with tumor-specific
expression”).

PBOV1 expression levels positively correlate with relapse-free survival in breast cancer
patients and with overall length of survival in glioma patients 165]. In contrast, CLLU1 is highly expressed in patients with poor prognosis 23], 24], 74], 134], 159].

Positive selection of human tumor-related genes in primate lineage

Positive Darwinian selection participates in the evolution of the novel genes. Comparison
of the rate of amino acid replacement substitution with the rate of synonymous substitution,
population genetic analyses of polymorphisms and the findings of convergent evolution
support the adaptive evolution of the novel genes. There are many examples of rapidly
evolving novel genes and gene families supported by positive selection. In humans,
strong positive selection and accelerated evolution were documented for the lactase gene
and for many other genes with different molecular functions, e.g. transcription factors,
genes involved in nuclear transport, DNA metabolism/cell cycle, protein metabolism,
pigmentation pathways, dystrophin protein complex, heat shock proteins; various types
of genes related to sensory perception, immune response, reproduction, morphology,
host-pathogen interactions, and neuronal functions. Examples of positively selected
gene families are also numerous, including those in African great apes and hominids.
Several gene families have expanded or contracted rapidly in primates, including brain-related
families in humans. Many of such families show evidence for positive selection. The
proportion of positively selected genes is significantly higher in younger genes in
humans, i.e. positive selection may play a role in faster evolution of younger genes.
The many examples of rapid evolution and positive selection of new genes described in
the literature indicate that this phenomenon is widespread. It supports involvement
of novel genes and gene families in adaptation and speciation and in evolution and
enhancement of new functions (reviewed in 91]).

For our consideration, it is important that positive selection in the primate lineage
was described for many human tumor-related genes 39], 40], 43], 129], 145], 180].

SPANX, GAGE, PRAME and CTAGE families of cancer/testis antigen genes, whose functions are as yet unknown, undergo positive
selection in primate evolution 43], 55], 87], 108], 214]. Comparison of human/chimp orthologues of CT-X genes has shown that they diverge
faster and undergo stronger positive selection than those on the autosomes 180].

Adaptive evolution of the tumor suppressor BRCA1 in humans and chimps was demonstrated
68]. Most of the internal BRCA1 sequence is variable between primates and evolved under
positive selection 145].

Angiogenin (ANG) is a tumor-growth promoter due to its ability to stimulate the
formation of new blood vessels. Its expression is elevated in a variety of tumors. A
study among several primate species showed that the ANG gene has a significantly higher rate of nucleotide substitution at nonsynonymous
sites than at synonymous sites, an indication of positive selection 212].
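
The comparison of nonsynonymous and synonymous substitution rates mentioned above is commonly summarized as the ratio ω = dN/dS, with ω > 1 taken as a signal of positive selection. The following is only a minimal sketch of that arithmetic, assuming that the numbers of differences and of sites in each class have already been counted for a pair of aligned coding sequences, and assuming a simple Jukes-Cantor correction for multiple substitutions; it is not the procedure of any study cited here. The function names and the input counts are hypothetical and serve only as illustration.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple hits (Jukes-Cantor)."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (dN, dS, omega) from counts of differences and sites."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("dS is zero, so omega is undefined")
    return dn, ds, dn / ds


# Hypothetical counts, for illustration only (not taken from any cited study).
dn, ds, omega = dn_ds(nonsyn_diffs=30, nonsyn_sites=300,
                      syn_diffs=5, syn_sites=100)
print(f"dN={dn:.4f} dS={ds:.4f} omega={omega:.2f}")
```

In practice such ratios are estimated with codon-based models and tested for statistical significance; a single pairwise ratio above 1 is suggestive rather than conclusive.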

Comparison of 7645 chimp gene sequences with their human and mouse orthologs showed
accelerated evolution in functions related to oncogenesis 39]. A search for positively selected genes in the genomes of humans and chimps showed
evidence for positive selection in many genes involved in tumor suppression, apoptosis
and cell cycle control 129].

More examples of positively selected tumor-related genes are reviewed in 40].

Positive selection of many human tumor-related genes in the evolution of primates
confirms the prediction of the evolution by tumor neofunctionalization hypothesis concerning
expression of evolutionarily new genes in tumors and selection for their new organismal
functions. If an evolutionarily new gene is expressed in tumors, or a sequence that
is expressed in tumors acquires a function beneficial to the organism and becomes
an evolutionarily new gene, selection of organisms for the enhancement of the new
function should take place, as predicted by the hypothesis. This is exactly what was
found in papers discussed above: the positive selection of genes and proteins in different
primate groups, not the somatic evolution of tumor cells. More discussion of positive
selection in relation to the possible evolutionary role of tumors may be found in
91].

The paradox of the positive selection of many tumor-associated genes is difficult
to explain other than by postulating that tumors play a positive evolutionary
role. Another attempt to explain positive selection of tumor-related genes is based
on the concept of genomic conflict and antagonistic coevolution 40], 129].

Some evolutionarily novel genes are cellular oncogenes. The Tre2(USP6) oncogene is a hominoid-specific gene 144] (see discussion above in part 2.3). Evolutionarily novel genes CT45A1, TBC1D3 and NCYM may act like oncogenes (reviewed in 215]). Y. Zhang and M. Long suggest that these genes may also assume other biological
functions, and invoke the selection, pleiotropy and compensation hypothesis of M.
Pavlicev and G.P. Wagner 146] to explain the paradox related to their oncogene role.