Unexpected links reflect the noise in networks

Mathematical formalism

Encouraged by these results, and to better understand the properties of this new metric (PUC), we established a rigorous mathematical framework.

Our hypothesis that unexpected correlations are erroneous can be rigorously proven for systems that transition between two stable states and have two types of relationships between parameters: stimulation and inhibition. Here, as an example, we prove our hypothesis for a simple Bayesian network [9] with two equilibrium states and linear dependences between nodes. The general case is considered in Section II.2 of Additional file 1.

In order to formulate our results, we begin by stating the following mathematical notions and definitions. A regulatory network is represented as a directed acyclic graph (DAG) \( G = (V, E) \). Any edge \( e \in E \) is an ordered pair of vertices (nodes) \( e = (v, w) \in V^2 \). The order of vertices in an edge represents the direction of causality in a regulatory network (that is, in the edge \( (v, w) \), v regulates w). For any node v we define the set of its parents as \( \mathrm{pa}(v) := \{u \in V : (u, v) \in E\} \). We define the set of root-nodes \( \mathrm{gf}(G) \) of the graph G as the set of all nodes without parents: \( \mathrm{gf}(G) := \{v \in V : \mathrm{pa}(v) = \emptyset\} \). For simplicity, we consider a regulatory network with only one root-node, \( |\mathrm{gf}(G)| = 1 \), denoted by the vertex o. The case with more than one root-node is covered by the general model considered in Additional file 1 (see Section I). Let the graph G be weighted, meaning that every edge \( e = (v, w) \in E \) has an associated label (weight) \( c_{vw} \in \mathbb{R} \). With every node \( v \in V \) we associate a random variable \( M_v \). The variables \( M_v \), \( v \in V \), are connected by the following structural linear equations:

\[ M_v = \sum_{u \in \mathrm{pa}(v)} c_{uv} M_u + \varepsilon_v, \qquad v \in V \setminus \{o\}. \tag{1} \]

Here, the random variables \( \varepsilon_v \), \( v \in V \), representing the noise in the system, are mutually independent with mean 0 and variance \( \sigma_v^2 \). We assume heteroscedasticity with uniformly bounded variances: there exists \( \sigma^2 \) such that \( \sigma_v^2 \leq \sigma^2 \) for all \( v \in V \). By defining the distribution of the root-node variable \( M_o \), we obtain a unique joint distribution of the random variables \( M_v \), \( v \in V \). This joint distribution will be referred to as an equilibrium state.
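A minimal Python sketch may help fix these definitions. It computes \( \mathrm{pa}(v) \) and \( \mathrm{gf}(G) \) for a hypothetical four-node network and draws one sample from an equilibrium state by evaluating equations (1) in topological order. The network, its weights, the Gaussian choice for \( M_o \) and \( \varepsilon_v \), and all function names are our own assumptions for illustration, not part of the model as stated.

```python
import random

# Hypothetical toy network (not from the paper): a "diamond" o -> a, o -> b,
# a -> w, b -> w. Keys are edges (v, w) meaning "v regulates w"; values are c_vw.
EDGES = {("o", "a"): 1.0, ("o", "b"): 1.0, ("a", "w"): 1.0, ("b", "w"): -0.5}


def parents(edges):
    """pa(v) := {u in V : (u, v) in E} for every node v in V."""
    pa = {x: set() for edge in edges for x in edge}
    for u, v in edges:
        pa[v].add(u)
    return pa


def root_nodes(edges):
    """gf(G) := {v in V : pa(v) is empty}."""
    return {v for v, ps in parents(edges).items() if not ps}


def topological_order(edges):
    """Kahn's algorithm: every node is listed after all of its parents."""
    remaining = parents(edges)  # fresh sets, safe to mutate
    order = []
    ready = sorted(v for v, ps in remaining.items() if not ps)
    while ready:
        v = ready.pop(0)
        order.append(v)
        for w, ps in remaining.items():
            if v in ps:
                ps.discard(v)
                if not ps:  # all parents of w already placed
                    ready.append(w)
    if len(order) != len(remaining):
        raise ValueError("graph is not a DAG")
    return order


def sample_state(edges, root_mean, root_sd, noise_sd, rng):
    """Draw one joint sample of (M_v), v in V, from the structural equations (1).

    Assumption for illustration: M_o ~ N(root_mean, root_sd^2) and
    eps_v ~ N(0, noise_sd[v]^2); the text only requires independent noise
    with mean 0 and bounded variances.
    """
    pa = parents(edges)
    m = {}
    for v in topological_order(edges):
        if not pa[v]:  # the root node o: its distribution is given directly
            m[v] = rng.gauss(root_mean, root_sd)
        else:
            m[v] = sum(edges[(u, v)] * m[u] for u in pa[v]) + rng.gauss(0.0, noise_sd[v])
    return m


if __name__ == "__main__":
    rng = random.Random(42)
    sample = sample_state(EDGES, 2.0, 1.0, {"a": 0.1, "b": 0.1, "w": 0.1}, rng)
    print(sorted(sample))  # ['a', 'b', 'o', 'w']
```

With all noise standard deviations set to 0, every \( M_v \) becomes a fixed multiple of \( M_o \); in this toy graph \( M_w = 0.5\,M_o \).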

In the previously discussed biological framework, the graph G represents the entire gene expression network. A node v represents a gene with the corresponding expression level \( M_v \). An edge \( e = (v, w) \) represents a causal link between two genes v and w in which the expression of w is regulated by v, and \( c_{vw} \) is the interaction weight. The sign of \( c_{vw} \) reflects the direction of regulation: a negative (positive) sign corresponds to inhibition (stimulation). The parents of v are simply all genes that regulate v, and the root-node of G is the primary regulator of the entire network.

In order to define two distinct equilibrium states, say P and Q (e.g. case and control, disease and health, etc.), for a system defined by the causal graph G and structural equations (1), we need only define two independent root-node variables, \( M_o^{(P)} \) and \( M_o^{(Q)} \), together with mutually independent noise variables \( \varepsilon_v^{(P)}, \varepsilon_v^{(Q)} \), \( v \in V \). Let \( M_v^{(P)} \) and \( M_v^{(Q)} \) denote the expression levels of the gene at node v in the two equilibrium states P and Q. For any v we denote the change in mean expression between the states as \( \varDelta_v = \mathbb{E}\left(M_v^{(P)}\right) - \mathbb{E}\left(M_v^{(Q)}\right) \), where \( \mathbb{E} \) denotes the expectation (mean) of the corresponding variable.
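Because the noise terms have mean 0 in both states, taking expectations of (1) yields a simple recursion for the changes \( \varDelta_v \). The short derivation below is our own illustration (assuming all means are finite); it shows that, with a single root-node, every change is a fixed multiple \( k_v \) of the change at the root, where \( k_v \) sums the products of edge weights over all directed paths from o to v:

```latex
\mathbb{E}\bigl(M_v^{(S)}\bigr) = \sum_{u \in \mathrm{pa}(v)} c_{uv}\,\mathbb{E}\bigl(M_u^{(S)}\bigr),
\quad v \neq o,\; S \in \{P, Q\}
\qquad\Longrightarrow\qquad
\varDelta_v = \sum_{u \in \mathrm{pa}(v)} c_{uv}\,\varDelta_u = k_v\,\varDelta_o,
\\[4pt]
k_o = 1, \qquad
k_v = \sum_{u \in \mathrm{pa}(v)} c_{uv}\,k_u
    = \sum_{o = u_0 \to u_1 \to \dots \to u_m = v} \; \prod_{i=1}^{m} c_{u_{i-1} u_i}.
```

For example, in a hypothetical diamond network with edges o → a, o → b, a → w, b → w and weights \( c_{oa} = c_{ob} = c_{aw} = 1 \), \( c_{bw} = -0.5 \), this gives \( k_w = 1 \cdot 1 + 1 \cdot (-0.5) = 0.5 \), so \( \varDelta_w = 0.5\,\varDelta_o \).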

The notions of expected and unexpected links, introduced informally in the Introduction, are formalized in the following definition.

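Definition. An edge \( e = (v, w) \in E \) is called an expected link between nodes \( v, w \in V \) if and only if

\[ \varDelta_v \varDelta_w \operatorname{cov}\left(M_v^{(P)}, M_w^{(P)}\right) \geq 0 \quad \text{and} \quad \varDelta_v \varDelta_w \operatorname{cov}\left(M_v^{(Q)}, M_w^{(Q)}\right) \geq 0. \]

An edge which is not an expected link is said to be an unexpected link.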
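The definition reduces to a sign check. The sketch below is ours, not the authors' code: `sample_cov`, `is_expected_link` and `classify_edge` are hypothetical helpers, and \( \varDelta_v \) is estimated from data as the difference of sample means between the two states.

```python
def sample_cov(x, y):
    """Unbiased sample covariance of two equal-length sequences of measurements."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def is_expected_link(delta_v, delta_w, cov_p, cov_q):
    """Expected iff Delta_v * Delta_w * cov(M_v, M_w) >= 0 within both states P and Q."""
    s = delta_v * delta_w
    return s * cov_p >= 0 and s * cov_q >= 0


def classify_edge(v_p, w_p, v_q, w_q):
    """Classify edge (v, w) from per-state samples of M_v and M_w (hypothetical helper)."""
    delta_v = sum(v_p) / len(v_p) - sum(v_q) / len(v_q)
    delta_w = sum(w_p) / len(w_p) - sum(w_q) / len(w_q)
    ok = is_expected_link(delta_v, delta_w, sample_cov(v_p, w_p), sample_cov(v_q, w_q))
    return "expected" if ok else "unexpected"


# Toy inputs invented for illustration: both genes are higher in P than in Q,
# but in the first call they are anticorrelated within state Q.
print(classify_edge([3, 4, 5], [6, 8, 10], [0, 1, 2], [2, 1, 0]))  # unexpected
print(classify_edge([3, 4, 5], [6, 8, 10], [0, 1, 2], [0, 1, 2]))  # expected
```

Note that when \( \varDelta_v \varDelta_w = 0 \) the inequalities hold trivially, so such an edge is always classified as expected.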

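This definition states that the directions of regulation (up or down) of two genes between the two states should agree with the sign of the correlation between them within each state. For instance, two genes that are both up-regulated in P relative to Q should not be negatively correlated within either state.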

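Note that the covariances in the definition can be replaced by the Pearson correlation coefficients: for variables with nonzero variance, dividing a covariance by the positive product of the standard deviations does not change its sign.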

In the main lemma stated below, we show that in this setting all unexpected links are produced by the noise in the system: if \( \sigma^2 \) is small enough, the system has no unexpected links.

Lemma 1. For any finite DAG with linear structural equations (1) and two equilibrium states, there exists some \( \sigma_0 > 0 \) such that if \( \sigma_v^2 \leq \sigma_0^2 \) for all \( v \in V \), then there are no unexpected links in the system.
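The lemma can be illustrated (not proven) on a small example. The sketch below, our own and using a hypothetical diamond network with assumed weights, computes the exact within-state covariances implied by (1) by propagating them in topological order, assuming \( M_o \) is independent of the noise, and then applies the Definition to every edge. The noise variances are taken to be the same in P and Q for simplicity.

```python
# Hypothetical toy network (weights are assumptions, not from the paper):
# o -> a, o -> b, a -> w, b -> w, with c_oa = c_ob = c_aw = 1 and c_bw = -0.5.
EDGES = {("o", "a"): 1.0, ("o", "b"): 1.0, ("a", "w"): 1.0, ("b", "w"): -0.5}
ORDER = ["o", "a", "b", "w"]  # a topological order; the single root o comes first


def parents_of(edges, order):
    return {v: [u for (u, x) in edges if x == v] for v in order}


def mean_shifts(edges, order, delta_root):
    """Delta_v = sum_{u in pa(v)} c_uv * Delta_u (noise has mean 0 in both states)."""
    pa = parents_of(edges, order)
    delta = {order[0]: delta_root}
    for w in order[1:]:
        delta[w] = sum(edges[(u, w)] * delta[u] for u in pa[w])
    return delta


def covariances(edges, order, root_var, sigma2):
    """Exact within-state covariances cov(M_x, M_y) implied by equations (1).

    For a non-root w and any node x preceding it in topological order:
        cov(M_x, M_w) = sum_{u in pa(w)} c_uw * cov(M_x, M_u)
        var(M_w)      = sum_{u in pa(w)} c_uw * cov(M_u, M_w) + sigma_w^2
    """
    pa = parents_of(edges, order)
    cov = {(order[0], order[0]): root_var}
    for i, w in enumerate(order):
        if i == 0:
            continue  # the root's variance is given directly
        for x in order[:i]:
            cov[(x, w)] = cov[(w, x)] = sum(edges[(u, w)] * cov[(x, u)] for u in pa[w])
        cov[(w, w)] = sum(edges[(u, w)] * cov[(u, w)] for u in pa[w]) + sigma2[w]
    return cov


def unexpected_links(edges, order, delta_root, root_var_p, root_var_q, sigma2):
    """Edges violating the Definition in state P or in state Q."""
    delta = mean_shifts(edges, order, delta_root)
    cov_p = covariances(edges, order, root_var_p, sigma2)
    cov_q = covariances(edges, order, root_var_q, sigma2)
    return [
        (v, w)
        for (v, w) in edges
        if delta[v] * delta[w] * cov_p[(v, w)] < 0
        or delta[v] * delta[w] * cov_q[(v, w)] < 0
    ]


small_noise = {"a": 0.01, "b": 0.01, "w": 0.01}
large_noise_b = {"a": 0.01, "b": 2.25, "w": 0.01}
print(unexpected_links(EDGES, ORDER, 1.0, 1.0, 1.5, small_noise))    # []
print(unexpected_links(EDGES, ORDER, 1.0, 1.0, 1.5, large_noise_b))  # [('b', 'w')]
```

In this toy graph \( \operatorname{cov}(M_b, M_w) = \tfrac{1}{2}\left(\operatorname{Var}(M_o) - \sigma_b^2\right) \) within each state, while \( \varDelta_b \varDelta_w = \tfrac{1}{2}\varDelta_o^2 \geq 0 \). The edge (b, w) therefore stays expected as long as \( \sigma_b^2 \) does not exceed the root-node variance in either state (here \( \sigma_b^2 \leq 1 \)) and becomes unexpected once the noise at b is large enough, consistent with Lemma 1.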

A formal proof of this statement (under certain conditions) is given in Section III.3 of Additional file 1, together with an explanation of why it makes intuitive sense. The basic idea is that false edges are, in principle, as likely to show expected correlations as unexpected ones.