
# Dynamically analyzing cell interactions in biological environments using multiagent social learning framework

Chengwei Zhang¹, Xiaohong Li¹ (corresponding author), Shuxin Li¹ and Zhiyong Feng²

*Journal of Biomedical Semantics* 2017, **8 (Suppl 1)**:31

https://doi.org/10.1186/s13326-017-0142-0

© The Author(s) 2017

**Published:** 20 September 2017

## Abstract

### Background

Biological environments are uncertain, and their dynamics resemble those of multiagent environments. Research results from the multiagent systems field can therefore provide valuable insights into biology and are of great significance for biological research. Learning in a multiagent environment is highly dynamic: the environment is no longer stationary, because each agent's behavior changes adaptively in response to the other coexisting learners, and vice versa. The dynamics become even less predictable when we move from fixed-agent interaction environments to a multiagent social learning framework. An analytical understanding of the underlying dynamics is important but challenging.

### Results

In this work, we present a social learning framework with homogeneous learners (e.g., Policy Hill Climbing (*PHC*) learners) and model the behavior of players in the framework as a hybrid dynamical system. By analyzing this dynamical system, we derive sufficient conditions for convergence and for non-convergence. We experimentally verify the predictive power of our model using a number of representative games. The experimental results confirm the theoretical analysis.

### Conclusion

Under a multiagent social learning framework, we modeled the behavior of agents in biological environments and theoretically analyzed the dynamics of the model. We present several sufficient conditions for convergence and for non-convergence and prove them theoretically. These conditions can be used to predict whether the system converges.

## Keywords

- Multiagent learning
- Cell interaction
- Nonlinear dynamic

## Background

All living systems live in uncertain and dynamically changing environments. Remarkably, these systems nevertheless survive and achieve their goals by exhibiting intelligent features such as adaptation and robustness. Biological system behaviors [1] and human diseases [2] are often the outcome of complex interactions among a very large number of cells and their environments [3, 4].

Similarly, in a multiagent system [5–9], an important ability of an agent is to adjust its behavior adaptively so as to coordinate efficiently with other agents in unknown and dynamic environments. If we regard the cells in a biological system as the agents of a multiagent system, we can analyze the cells' behavior using multiagent systems theory. Understanding the collective decisions made by such intelligent multiagent systems is therefore an interesting research topic not only for artificial intelligence but also for biology. The conclusions of the theoretical analysis can be applied to biological research; for example, convergence results can be used to explain the group behavior of cells.

Computational methods are now widely used to solve biological problems [10, 11]. Many researchers have investigated biological systems composed of cells and their environments through modeling and simulation [1, 12]. There are two principal approaches: population-based modeling and discrete agent-based modeling. Population-based modeling approximates the cells within each grid box by a set of variables associated with that grid box [13, 14]. Discrete agent-based modeling maps each cell to a discrete simulation entity [13, 15, 16].

We use multiagent learning, an important technique for achieving efficient coordination in the multiagent systems area [9, 17–19], to model the behavior of each cell agent. A significant amount of effort has been devoted to developing effective learning techniques for different multiagent interaction environments [20–23]. In the environments considered here, each agent interacts in each round with an agent randomly selected from its neighborhood and updates its strategy based on the feedback from that round. To describe the behavior of an agent, one common line of research extends existing single-agent reinforcement learning techniques to multiagent interaction environments. However, because the other agents keep adapting, the environment faced by each agent is non-stationary and the Markov property is violated, so the existing theoretical guarantees no longer hold. Modeling multiagent environments and analyzing their learning dynamics is therefore important and challenging.

This paper presents a social learning framework to simulate the dynamics of a multiagent system in a biological environment, together with a theoretical analysis of the learning dynamics of this model. The analysis sheds light on how and when consistent knowledge, in terms of an equilibrium, can or cannot evolve among the population of agents. In the social learning framework, all agents use the *PHC* strategy [24] for decision making and a weighted graph model for neighbor selection. For the theoretical analysis, we present a model of the learning dynamics of the framework. The purpose of analyzing the learning dynamics is to determine whether the learning algorithm adopted by the agents converges, since convergence to an equilibrium has been the most commonly accepted goal in the multiagent learning literature. First, we model the overall dynamics among agents as a system of differential equations. Then, we prove several sufficient conditions for convergence and for non-convergence, which can be used to predict whether the system converges. Finally, we evaluate these predictions through simulation experiments. The experimental results confirm the predictions of our theoretical analysis.

The remainder of the paper is organized as follows. The “Method” section first reviews normal-form games, the basic gradient ascent (GA) approach and a GA-based algorithm named *PHC*, and then introduces the multiagent learning framework in which all agents are *PHC* learners. In the “Result and discussion” section, we present the theoretical model of the agents' learning dynamics and prove convergence and non-convergence conditions by analyzing the geometric behavior of the hybrid dynamical system with the help of nonlinear dynamics theory. In the “Experimental simulation” section, we evaluate the predictive ability of our theoretical model by comparing it with simulation results. Lastly, we conclude the paper and point out future directions in the “Conclusion” section.

## Method

### Notation and definition

#### Normal-form games

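In a two-player, two-action normal-form game, the payoff of each player \(i\in\{k,l\}\) can be specified by a matrix as follows,

$$R_{i}=\left[\begin{array}{cc} r_{i}^{11} & r_{i}^{12} \\ r_{i}^{21} & r_{i}^{22} \end{array}\right],$$

where \(r_{i}^{mn}\) is the payoff of player *i* when it selects action *m* and its opponent selects action *n*.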

Each player *i* simultaneously selects an action from its action set \(A_{i}=\{1,2\}\), and the payoff of each player is determined by their joint action. For example, if player *k* selects the pure strategy of action 1 while player *l* selects the pure strategy of action 2, then player *k* receives a payoff of \(r_{k}^{12}\) and player *l* receives a payoff of \(r_{l}^{21}\).

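Let \(p_{k}\in[0,1]\) and \(p_{l}\in[0,1]\) denote the probabilities that player *k* and player *l*, respectively, choose action 1. Given a joint mixed strategy profile \((p_{k},p_{l})\), the expected payoffs of player *k* and player *l* can be computed as follows,

$$V_{k}(p_{k},p_{l})=r_{k}^{11}p_{k}p_{l}+r_{k}^{12}p_{k}(1-p_{l})+r_{k}^{21}(1-p_{k})p_{l}+r_{k}^{22}(1-p_{k})(1-p_{l}),$$

$$V_{l}(p_{k},p_{l})=r_{l}^{11}p_{l}p_{k}+r_{l}^{12}p_{l}(1-p_{k})+r_{l}^{21}(1-p_{l})p_{k}+r_{l}^{22}(1-p_{l})(1-p_{k}).$$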

A strategy profile is a Nash equilibrium (NE) if no player can obtain a better expected payoff by unilaterally changing its current strategy. Formally, \(\left(p_{k}^{*},p_{l}^{*}\right)\in[0,1]^{2}\) is an NE iff \(V_{k}\left(p_{k}^{*},p_{l}^{*}\right)\geq V_{k}\left(p_{k},p_{l}^{*}\right)\) and \(V_{l}\left(p_{k}^{*},p_{l}^{*}\right)\geq V_{l}\left(p_{k}^{*},p_{l}\right)\) for any \((p_{k},p_{l})\in[0,1]^{2}\).
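The definitions above translate directly into code. The following Python sketch (the helper names `expected_payoff` and `is_nash` are hypothetical, not from the paper) stores a payoff matrix as `R[m][n]`, the owner's payoff \(r^{(m+1)(n+1)}\) for its own action *m*+1 against the opponent's action *n*+1. Because \(V_k\) is linear in \(p_k\), checking deviations to the two pure strategies is enough to test the NE condition.

```python
from typing import Sequence

# R[m][n] = owner's payoff r^{(m+1)(n+1)}: own action m+1 vs. opponent action n+1
Matrix2x2 = Sequence[Sequence[float]]


def expected_payoff(R: Matrix2x2, p_self: float, p_other: float) -> float:
    """Expected payoff when the players choose action 1 w.p. p_self and p_other."""
    return (R[0][0] * p_self * p_other
            + R[0][1] * p_self * (1 - p_other)
            + R[1][0] * (1 - p_self) * p_other
            + R[1][1] * (1 - p_self) * (1 - p_other))


def is_nash(Rk: Matrix2x2, Rl: Matrix2x2, pk: float, pl: float,
            tol: float = 1e-9) -> bool:
    """Check the NE condition; pure deviations suffice since V is linear in own p."""
    vk = expected_payoff(Rk, pk, pl)
    vl = expected_payoff(Rl, pl, pk)
    k_ok = all(vk >= expected_payoff(Rk, d, pl) - tol for d in (0.0, 1.0))
    l_ok = all(vl >= expected_payoff(Rl, d, pk) - tol for d in (0.0, 1.0))
    return k_ok and l_ok


# Matching pennies: k wants to match, l wants to mismatch.
Rk = [[1, -1], [-1, 1]]
Rl = [[-1, 1], [1, -1]]
print(is_nash(Rk, Rl, 0.5, 0.5))  # True
print(is_nash(Rk, Rl, 1.0, 1.0))  # False: l gains by switching to action 2
```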

#### Gradient ascent (GA) and *PHC* algorithm

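Each player *i* that employs a GA-based algorithm updates its policy in the direction of its expected payoff gradient, as shown in the following equations,

$$p_{k}(t+1)=\Pi_{[0,1]}\left(p_{k}(t)+\eta\frac{\partial V_{k}(p_{k}(t),p_{l}(t))}{\partial p_{k}}\right),$$

$$p_{l}(t+1)=\Pi_{[0,1]}\left(p_{l}(t)+\eta\frac{\partial V_{l}(p_{k}(t),p_{l}(t))}{\partial p_{l}}\right),$$

where *η* is the gradient step size and \(\Pi_{[0,1]}\) is the projection function that maps its input to the valid probability range [0,1]; it prevents the gradient step from moving the strategy out of the valid probability space. Formally, we have

$$\Pi_{[0,1]}(x)=\begin{cases}0, & x<0,\\ x, & 0\le x\le 1,\\ 1, & x>1.\end{cases}$$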

For an infinitesimal gradient step (*η*→0), the learning dynamics of the agents can be modeled as a system of differential equations and analyzed using dynamical systems theory [25]. For two-player, two-action games, it has been proved that either the strategies of the agents converge to a Nash equilibrium or, if the strategies do not converge, the agents' average payoffs converge to the payoffs of a Nash equilibrium [26]. The policy hill-climbing algorithm (*PHC*) combines gradient ascent with Q-learning: each agent *i* adjusts its policy *p* to follow the gradient of the expected payoff (or of the value function *Q*). It is shown in Algorithm 1.

Here, *α*∈(0,1] and *δ*∈(0,1] are learning rates, and the *Q* values are maintained just as in ordinary *Q*-learning. The policy is improved by increasing the probability of selecting the highest-valued action, with the step size governed by the learning rate *δ*.
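To make the update rules concrete, here is a minimal Python sketch of (a) one projected GA step for a 2×2 game and (b) a stateless two-action *PHC* learner. The PHC part follows the description of Bowling and Veloso [24] under our own simplifying assumptions: a single-state repeated game, so the Q update has no successor-state term, and no exploration policy. The names `project`, `ga_step` and `StatelessPHC` are hypothetical.

```python
import random


def project(x: float) -> float:
    """Pi_[0,1]: clip a value into the valid probability range."""
    return min(1.0, max(0.0, x))


def ga_step(Rk, Rl, pk, pl, eta=0.01):
    """One simultaneous projected gradient-ascent step for a 2x2 game.

    R[m][n] is the owner's payoff for own action m+1 vs. opponent action n+1,
    and dV/dp_self = (r11 + r22 - r12 - r21) * p_other + (r12 - r22).
    """
    def grad(R, p_other):
        u = R[0][0] + R[1][1] - R[0][1] - R[1][0]
        c = R[0][1] - R[1][1]
        return u * p_other + c
    return project(pk + eta * grad(Rk, pl)), project(pl + eta * grad(Rl, pk))


class StatelessPHC:
    """Hypothetical two-action PHC learner for a single-state repeated game."""

    def __init__(self, alpha=0.1, delta=0.01, seed=None):
        self.alpha = alpha          # Q-value learning rate
        self.delta = delta          # policy learning rate
        self.q = [0.0, 0.0]         # Q values of action 1 (index 0) and action 2
        self.pi = [0.5, 0.5]        # mixed strategy; pi[0] plays the role of p_i
        self.rng = random.Random(seed)

    def act(self) -> int:
        return 0 if self.rng.random() < self.pi[0] else 1

    def update(self, action: int, reward: float) -> None:
        # Q-learning update with no successor state (repeated one-shot game).
        self.q[action] += self.alpha * (reward - self.q[action])
        best = 0 if self.q[0] >= self.q[1] else 1
        other = 1 - best
        # Move at most delta probability mass toward the greedy action,
        # never pushing the other action's probability below zero.
        step = min(self.pi[other], self.delta)
        self.pi[best] += step
        self.pi[other] -= step


# Usage: against an environment where action 1 always pays 1 and action 2 pays 0,
# the learner's probability of action 1 climbs to 1 in steps of delta.
learner = StatelessPHC(alpha=0.1, delta=0.01, seed=0)
for _ in range(200):
    a = learner.act()
    learner.update(a, 1.0 if a == 0 else 0.0)
print(learner.pi)
```

With two actions, the cap \(\delta/(|A|-1)\) used in [24] reduces to *δ*, which is why the step is simply `min(pi[other], delta)`.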

### Modeling multiagent learning

Under the multiagent social learning framework with *N* agents, in each round every agent interacts with one neighbor selected randomly from its neighborhood, which is determined by the underlying network topology. The interaction between each pair of agents is modeled as a two-player normal-form game. During each interaction, each agent selects its action following a specified learning strategy, which is updated repeatedly based on the feedback from the environment at the end of each interaction. The framework is presented in Algorithm 2.

We use a graph *G*=(*V*,*E*) to model the underlying neighborhood network, which is composed of *N*=|*V*| agents. The edges \(E=\{e_{ij}\}\), *i*,*j*∈*V*, represent social contacts among agents, where \(e_{ij}\) denotes the probability that agent *i* chooses agent *j* to interact with. We have \(\sum_{j\in V}e_{ij}=1\wedge e_{ii}=0\). Here, we propose an adaptive strategy for agents to make their decisions in the social learning framework with the PHC learning strategy, which is shown in Algorithm 3.
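The interaction loop of the framework can be sketched as follows. This is an illustration under stated assumptions, not the authors' Algorithms 2 and 3: in each round every agent *i* draws a partner *j* with probability \(e_{ij}\), both act, and only the initiating agent *i* learns from that interaction. The learner `PHC` is a compact, hypothetical stateless two-action PHC learner, and `social_learning` and `check_interaction_matrix` are likewise hypothetical names.

```python
import random


class PHC:
    """Compact stateless two-action PHC learner (hypothetical helper)."""

    def __init__(self, alpha=0.1, delta=0.01):
        self.alpha, self.delta = alpha, delta
        self.q = [0.0, 0.0]
        self.pi = [0.5, 0.5]          # pi[0] = probability of action 1

    def act(self, rng):
        return 0 if rng.random() < self.pi[0] else 1

    def update(self, a, r):
        self.q[a] += self.alpha * (r - self.q[a])
        best = 0 if self.q[0] >= self.q[1] else 1
        step = min(self.pi[1 - best], self.delta)
        self.pi[best] += step
        self.pi[1 - best] -= step


def check_interaction_matrix(E, tol=1e-9):
    """Enforce sum_j e_ij = 1 and e_ii = 0 for every row i."""
    for i, row in enumerate(E):
        if abs(sum(row) - 1.0) > tol or row[i] != 0:
            raise ValueError(f"row {i} must sum to 1 with e_ii = 0")


def social_learning(agents, E, R, rounds, seed=0):
    """R[i][m][n]: payoff of agent i for own action m vs. neighbour action n."""
    check_interaction_matrix(E)
    rng = random.Random(seed)
    n = len(agents)
    for _ in range(rounds):
        for i in range(n):
            j = rng.choices(range(n), weights=E[i])[0]   # neighbour drawn w.p. e_ij
            a_i, a_j = agents[i].act(rng), agents[j].act(rng)
            agents[i].update(a_i, R[i][a_i][a_j])        # assumption: initiator learns
    return [ag.pi[0] for ag in agents]


E = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]
R = [[[1, 1], [0, 0]]] * 3      # action 1 always pays 1, action 2 pays 0
print(social_learning([PHC() for _ in range(3)], E, R, rounds=100))  # each close to 1.0
```

Whether the partner *j* also updates after being selected is a design choice that this sketch leaves out for simplicity.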

## Result and discussion

### Analysis of the multiagent Learning Dynamics

In this section, we present a theoretical model to estimate and analyze the learning dynamics of the multiagent learning framework in Algorithm 3. We extend the notation of the “Notation and definition” section to the multiagent environment. Without loss of generality, we consider only the two-action case.

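The payoff of each agent \(i\in V\) can be defined as a fixed matrix \(R_{i}\),

$$R_{i}=\left[\begin{array}{cc} r_{i}^{11} & r_{i}^{12} \\ r_{i}^{21} & r_{i}^{22} \end{array}\right],$$

where \(r_{i}^{mn}\) is the payoff of agent *i* when *i* selects action *m* and its neighbor selects action *n*. Here, we use \(p_{i}\) to denote the probability that player *i* selects action 1. Then the mixed strategy \((p_{1},p_{2},\ldots,p_{N})\) in the multiagent framework can be considered as a point in \(\mathbb{R}^{N}\) constrained to the unit hypercube \([0,1]^{N}\). The expected payoff \(V_{i}(p_{1},p_{2},\ldots,p_{N})\) of player *i* can be computed as follows,

$$V_{i}(p_{1},\ldots,p_{N})=\sum_{j\in V}e_{ij}V_{i,j}(p_{i},p_{j})=u_{i}p_{i}\sum_{j\in V}e_{ij}p_{j}+c_{i}p_{i}+\left(r_{i}^{21}-r_{i}^{22}\right)\sum_{j\in V}e_{ij}p_{j}+r_{i}^{22},$$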

where \(u_{i}=r_{i}^{11}+r_{i}^{22}-r_{i}^{12}-r_{i}^{21}\), \(c_{i}=r_{i}^{12}-r_{i}^{22}\), \(V_{i,j}\left(p_{i},p_{j}\right)=r_{i}^{11}p_{i}p_{j}+r_{i}^{12}p_{i}\left(1-p_{j}\right)+r_{i}^{21}\left(1-p_{i}\right)p_{j}+r_{i}^{22}\left(1-p_{i}\right)\left(1-p_{j}\right)\), and \(e_{ij}\) is the probability that agent *i* selects agent *j* to interact with.
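A quick numerical sanity check of this expression: the Python sketch below (hypothetical helper names, illustrative values) evaluates \(V_i\) as the \(e_{ij}\)-weighted sum of pairwise payoffs and compares a finite-difference derivative with the closed form \(\partial V_i/\partial p_i=u_i\sum_{j}e_{ij}p_j+c_i\), which follows because \(\sum_j e_{ij}=1\) and \(e_{ii}=0\).

```python
def pairwise_payoff(R, p_i, p_j):
    """V_{i,j}(p_i, p_j) with R[m][n] = r_i^{(m+1)(n+1)}."""
    return (R[0][0] * p_i * p_j + R[0][1] * p_i * (1 - p_j)
            + R[1][0] * (1 - p_i) * p_j + R[1][1] * (1 - p_i) * (1 - p_j))


def expected_payoff_i(i, R_i, E, P):
    """V_i(P) = sum_j e_ij * V_{i,j}(p_i, p_j)."""
    return sum(e * pairwise_payoff(R_i, P[i], P[j]) for j, e in enumerate(E[i]))


def gradient_i(i, R_i, E, P):
    """Closed form dV_i/dp_i = u_i * sum_j e_ij p_j + c_i."""
    u = R_i[0][0] + R_i[1][1] - R_i[0][1] - R_i[1][0]
    c = R_i[0][1] - R_i[1][1]
    return u * sum(e * p for e, p in zip(E[i], P)) + c


# Three agents; the numbers are chosen only for illustration.
R0 = [[3, 0], [5, 1]]
E = [[0, 0.25, 0.75], [0.5, 0, 0.5], [1.0, 0, 0]]
P = [0.3, 0.6, 0.2]
h = 1e-6
P_plus, P_minus = list(P), list(P)
P_plus[0] += h
P_minus[0] -= h
finite_diff = (expected_payoff_i(0, R0, E, P_plus)
               - expected_payoff_i(0, R0, E, P_minus)) / (2 * h)
print(finite_diff, gradient_i(0, R0, E, P))  # both approximately -1.3
```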

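Each agent *i* updates its strategy in order to maximize the value of \(V_{i}\). Recalling Eqs. 4 and 5, we obtain

$$p_{i}(t+1)=\Pi_{[0,1]}\left(p_{i}(t)+\eta\frac{\partial V_{i}}{\partial p_{i}}\right)=\Pi_{[0,1]}\left(p_{i}(t)+\eta\left(u_{i}\sum_{j\in V}e_{ij}p_{j}(t)+c_{i}\right)\right),$$

where the parameter *η* is the gradient step size.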

As *η*→0, Eq. 11 becomes a differential equation. Treating the step size as infinitesimal, the unconstrained dynamics of all players' strategies can be modeled by the following differential equations,

$$\dot{P}=UEP+C,$$

where \(P=(p_{1},p_{2},\ldots,p_{N})^{T}\), \(\dot{P}=(\dot{p}_{1},\dot{p}_{2},\ldots,\dot{p}_{N})^{T}\), \(C=(c_{1},c_{2},\ldots,c_{N})^{T}\), and *E* is the *N*×*N* matrix \([e_{ij}]\) of interaction probabilities. The matrix \(U=\mathrm{diag}(u_{1},u_{2},\ldots,u_{N})\) is the diagonal matrix generated by \((u_{1},u_{2},\ldots,u_{N})\).

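When the projection onto [0,1] is taken into account, the dynamics of each player's strategy become

$$\dot{p}_{i}=\begin{cases}0, & \text{if } (p_{i}=0\wedge G_{i}<0)\vee(p_{i}=1\wedge G_{i}>0),\\ G_{i}, & \text{otherwise},\end{cases}$$

where \(G_{i}=u_{i}\sum_{j\in V}e_{ij}p_{j}+c_{i}\).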

Notice that Eq. 14 is a hybrid system composed of two parts: a series of continuous linear differential systems, each defined on its own region of the domain, and a switching mechanism between these systems that is triggered when the trajectory touches the boundary. In general, it is hard to draw complete conclusions about the dynamics of a hybrid system, even when the differential systems are linear. Nevertheless, for the specific system in Eq. 14 we can still derive some convergence and non-convergence conditions.
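To experiment with the hybrid system numerically, one can discretize it. The sketch below is our own forward-Euler discretization with clipping to [0,1] (the step size `dt`, the step count and the helper names `G`, `hybrid_field` and `simulate` are arbitrary, hypothetical choices). Clipping reproduces the switching rule, because a coordinate sitting on a face with an outward-pointing \(G_i\) is simply held on that face.

```python
from typing import List, Sequence


def G(u: Sequence[float], c: Sequence[float],
      E: Sequence[Sequence[float]], P: Sequence[float]) -> List[float]:
    """G_i = u_i * sum_j e_ij p_j + c_i for every agent i."""
    return [u[i] * sum(e * p for e, p in zip(E[i], P)) + c[i] for i in range(len(P))]


def hybrid_field(u, c, E, P) -> List[float]:
    """Right-hand side of Eq. 14: freeze p_i on a face where G_i points outward."""
    g = G(u, c, E, P)
    return [0.0 if (P[i] <= 0.0 and g[i] < 0) or (P[i] >= 1.0 and g[i] > 0) else g[i]
            for i in range(len(P))]


def simulate(u, c, E, P0, dt: float = 1e-3, steps: int = 10_000) -> List[float]:
    """Forward-Euler integration, clipping each coordinate back into [0, 1]."""
    P = list(P0)
    for _ in range(steps):
        f = hybrid_field(u, c, E, P)
        P = [min(1.0, max(0.0, p + dt * fi)) for p, fi in zip(P, f)]
    return P


# Two agents that only see each other; u = 0, so G_i = c_i is constant.
print(simulate([0, 0], [1, -1], [[0, 1], [1, 0]], [0.5, 0.5]))  # [1.0, 0.0]
```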

### Non-convergence condition of the multiagent learning framework

According to the above definition, we have the following general result under which non-convergence is guaranteed.

**Theorem 1**

In an *N*-agent, two-action, integrated general-sum game, suppose every player follows the constrained strategy dynamics defined in Eq. 14. If the following two conditions are met,

1. there exists a point \(P^{*}=\left(p_{1}^{*},p_{2}^{*},\ldots,p_{N}^{*}\right)\in(0,1)^{N}\) such that \(UEP^{*}+C=0\), and
2. the matrix *UE* has a pair of purely imaginary eigenvalues,

then there exists a set \(\mathbb{P}\subset[0,1]^{N}\) such that the solution of the initial value problem of Eq. 14 with \(P(0)\in\mathbb{P}\) does not converge.

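*Proof*

Replacing *P* with \(P=X+P^{*}\), where \(UEP^{*}+C=0\), we get

$$\dot{X}=UEX.$$

Since *UE* is an *N*×*N* real matrix, there is an invertible matrix \(T=(v_{1},\ldots,v_{N})\) that transforms *UE* into *J*,

$$T^{-1}UET=J=\mathrm{diag}(J_{1},\ldots,J_{m}),$$

where each \(J_{i}\) is a square matrix of one of the following two forms,

$$(1)\ \ J_{i}=\left[\begin{array}{cccc}\lambda_{i} & 1 & & \\ & \lambda_{i} & \ddots & \\ & & \ddots & 1\\ & & & \lambda_{i}\end{array}\right],\qquad (2)\ \ J_{i}=\left[\begin{array}{cccc}D & I_{2} & & \\ & D & \ddots & \\ & & \ddots & I_{2}\\ & & & D\end{array}\right],\quad D=\left[\begin{array}{cc}\alpha & \beta\\ -\beta & \alpha\end{array}\right],$$

with \(\lambda_{i}=\alpha+\beta i\) and *β*≠0 in form (2). Here, *J* is the (real) Jordan normal form of *UE*, and \(J_{i}\) is the Jordan block corresponding to \(\lambda_{i}\), an eigenvalue of *UE* with multiplicity \(n_{i}\). If \(\lambda_{i}\) is real, \(J_{i}\) takes form (1); otherwise it takes form (2). Suppose that \(\lambda_{1},\ldots,\lambda_{k}\) are the real eigenvalues of *UE* and \(\lambda_{k+1},\ldots,\lambda_{m}\) are its complex eigenvalues (one from each conjugate pair); then \(n_{1}+\ldots+n_{k}+2(n_{k+1}+\ldots+n_{m})=N\).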

The solution with initial value *X*(0) is \(X(t)=e^{UEt}X(0)\). Letting \(Y(t)=T^{-1}X(t)\), we have

$$\dot{Y}(t)=JY(t).$$

Suppose that \(\lambda_{k}=\beta i\) is a purely imaginary eigenvalue of *UE* with multiplicity \(n_{k}\); then \(\bar{\lambda}_{k}=-\beta i\) is also an eigenvalue of *UE* with multiplicity \(n_{k}\). Then *J* has a block \(J_{k}\),

$$J_{k}=\left[\begin{array}{cccc} D_{2} & I_{2} & \cdots & \\ & D_{2} & I_{2} & \\ \vdots & & \ddots & \vdots \\ & & \cdots & D_{2} \end{array}\right],\qquad D_{2}=\left[\begin{array}{cc} 0 & \beta \\ -\beta & 0 \end{array}\right].$$

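Choose the initial value *Y*(0) so that all of its components are zero except \(y_{i}(0)\) and \(y_{i+1}(0)\), the two components associated with the first \(D_{2}\) block of \(J_{k}\). Then the remaining components stay zero, and we can write *Y*(*t*) as follows,

$$y_{i}(t)=y_{i}(0)\cos\beta t+y_{i+1}(0)\sin\beta t,\qquad y_{i+1}(t)=-y_{i}(0)\sin\beta t+y_{i+1}(0)\cos\beta t.$$

If \(y_{i}(0)\neq 0\vee y_{i+1}(0)\neq 0\), this solution is periodic. Let \(v_{i}\) and \(v_{i+1}\) denote the columns of \(T=(v_{1},\ldots,v_{N})\) corresponding to \(\lambda_{k}\) and \(\bar{\lambda}_{k}\), respectively. Since \(X(t)=TY(t)\), the solution of Eq. 13 with an initial value \(P(0)\in S\) is cyclical, where

$$S=\left\{P^{*}+k_{1}v_{i}+k_{2}v_{i+1}\,\middle|\,k_{1},k_{2}\in\mathbb{R},\ k_{1}^{2}+k_{2}^{2}\neq 0\right\}.$$

Since \(P^{*}\in(0,1)^{N}\), there exists an *ε*>0 such that the deleted neighborhood \(\mathbb{B}(P^{*};\varepsilon)\) of \(P^{*}\) satisfies \(\mathbb{B}(P^{*};\varepsilon)\subset(0,1)^{N}\). Let \(\mathbb{P}\) denote \(S\cap\mathbb{B}(P^{*};\varepsilon)\); then the solution of Eq. 14 with any initial value in \(\mathbb{P}\) is cyclical, which means that the algorithm corresponding to Eq. 14 cannot converge. □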
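The two conditions of Theorem 1 can be checked numerically for a concrete game. The following sketch uses NumPy (`numpy.linalg.solve` and `numpy.linalg.eigvals`); the function name `theorem1_conditions` and the tolerance are our own assumptions, and floating-point eigenvalues can only approximate "purely imaginary". The example takes \(u_i\) and \(c_i\) from the first experimental game and assumes, for illustration only, a directed ring for *E*.

```python
import numpy as np


def theorem1_conditions(u, E, c, tol=1e-9):
    """Return (P_star, has_imaginary_pair) for the linear part UE P + C.

    P_star is None if UE is singular (this sketch then does not decide
    condition 1) or if the solution of UE P* + C = 0 is not in (0, 1)^N.
    """
    UE = np.diag(np.asarray(u, dtype=float)) @ np.asarray(E, dtype=float)
    C = np.asarray(c, dtype=float)
    try:
        P_star = np.linalg.solve(UE, -C)
    except np.linalg.LinAlgError:
        P_star = None
    if P_star is not None and not np.all((P_star > 0) & (P_star < 1)):
        P_star = None
    eig = np.linalg.eigvals(UE)
    has_pair = bool(np.any((np.abs(eig.real) < tol) & (np.abs(eig.imag) > tol)))
    return P_star, has_pair


u, c = [2, -2, 2, -2], [-1, 1, -1, 1]
E_ring = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]  # assumed E
P_star, has_pair = theorem1_conditions(u, E_ring, c)
print(P_star, has_pair)  # approximately [0.5 0.5 0.5 0.5] and True
```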

Theorem 1 shows that there exist situations in which the agents fail to converge under the multiagent social learning framework. Before describing these situations in detail, we first introduce some notation.

According to Theorem 1, *T* is the transformation matrix satisfying \(T^{-1}UET=J\), with \(T=(v_{1},v_{2},\ldots,v_{N})\). Let \(v_{j1},v_{j2},\ldots,v_{jn_{j}}\) denote the (generalized) eigenvectors associated with eigenvalue \(\lambda_{j}\), *j*=1,2,…,*m*. By the properties of matrix transformations [27], \(v_{j1},v_{j2},\ldots,v_{jn_{j}}\) are linearly independent. We classify the column vectors of the transformation matrix *T* into three parts according to the real part of the corresponding eigenvalue: \(V_{1}=\{v_{i}\mid\mathrm{Re}(\lambda_{i})<0\}\), \(V_{2}=\{v_{i}\mid\mathrm{Re}(\lambda_{i})=0\}\) and \(V_{3}=\{v_{i}\mid\mathrm{Re}(\lambda_{i})>0\}\). Now we are ready to give a precise description of the subspace in which the agents fail to converge, which is summarized in the following theorem.

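**Theorem 2**

Let \(P^{*}\) be as in Theorem 1, and suppose \(\lambda_{k}=\beta i\) and \(\bar{\lambda}_{k}=-\beta i\) are a pair of purely imaginary eigenvalues of *UE*. Then there exist a pair of vectors \(v_{k},v_{k'}\in V_{2}\), an *ε*>0, and a set \(\mathbb{P}=\mathbb{S}\cap\mathbb{B}(P^{*};\varepsilon)\), where

$$\mathbb{S}=\left\{P^{*}+k_{1}v_{k}+k_{2}v_{k'}+\sum_{v_{j}\in V_{1}}a_{j}v_{j}\,\middle|\,k_{1}^{2}+k_{2}^{2}\neq 0,\ a_{j}\in\mathbb{R}\right\},$$

such that the solution of the initial value problem of Eq. 14 with \(P(0)\in\mathbb{P}\) cannot converge.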

*Proof*

For each eigenvalue \(\lambda_{i}\) associated with a vector \(v_{i}\in V_{1}\), we have \(\mathrm{Re}(\lambda_{i})<0\). According to results of bifurcation theory [25], the subspace span(\(V_{1}\)) is a stable submanifold of the unconstrained dynamics (13), which means that every trajectory starting from \(S'\) eventually converges to \(P^{*}\), where

$$S'=\left\{P^{*}+\sum_{v_{j}\in V_{1}}a_{j}v_{j}\,\middle|\,a_{j}\in\mathbb{R}\right\}.$$

Hence trajectories starting from \(\mathbb{S}\) eventually converge to *S*, the cyclical set constructed in the proof of Theorem 1, and we obtain the final conclusion that the solution of the initial value problem of Eq. 14 with \(P(0)\in\mathbb{P}\) cannot converge. □

Note that Theorems 1 and 2 provide only sufficient conditions for non-convergence.

### Convergence condition of the multiagent learning framework

In most cases, conditions that guarantee the convergence of an algorithm are more valuable.

**Theorem 3**

In an *N*-agent, two-action, integrated general-sum game, suppose every player follows the constrained strategy dynamics defined in Eq. 14. If the following two conditions are met,

1. there exists a point \(P^{*}=\left(p_{1}^{*},p_{2}^{*},\ldots,p_{N}^{*}\right)\in(0,1)^{N}\) such that \(UEP^{*}+C=0\), and
2. all eigenvalues of the matrix *UE* have negative real parts,

then all solutions of the initial value problem of Eq. 14 with \(P(0)\in[0,1]^{N}\) eventually converge.

*Proof*

The conclusion follows from the stability theory of linear systems. If all eigenvalues of *UE* have negative real parts, then \(P^{*}\) is an asymptotically stable equilibrium point of the linear system \(\dot{P}=UEP+C\). This means that all solutions of the initial value problem of Eq. 14 with \(P(0)\in[0,1]^{N}\) converge to \(P^{*}\). □

Theorem 3 provides a sufficient condition for the convergence of the dynamics in Eq. 14. However, computing the eigenvalues of a high-dimensional matrix can be difficult. Here, we propose a more practical convergence condition that is suited to the multiagent learning framework shown in Algorithm 3.

**Theorem 4**

In an *N*-agent, two-action, integrated general-sum game, suppose every player follows the constrained strategy dynamics defined in Eq. 14. If the matrix *UE* is symmetric, then all solutions of the initial value problem of Eq. 14 with \(P(0)\in[0,1]^{N}\) eventually converge.

*Proof*

Since *UE* is symmetric, all eigenvalues of *UE* are real. We consider two cases:

1. There exists a point \(P^{*}=\left(p_{1}^{*},p_{2}^{*},\ldots,p_{N}^{*}\right)\in(0,1)^{N}\) such that \(UEP^{*}+C=0\).
2. There is no such point.

For case 1), if all eigenvalues of *UE* are negative, then \(P^{*}\) is a stable equilibrium point; otherwise, the solutions of the initial value problem of the hybrid system with \(P(0)\in[0,1]^{N}\) move away from \(P^{*}\) toward the boundary of the hybrid system's domain [25]. Because the domain of the hybrid system in Eq. 14 is bounded (i.e., \(P(t)\in[0,1]^{N}\)), there must exist a point \(P'=(p'_{1},\ldots,p'_{N})^{T}\) on the boundary of the domain with \((p'_{i}=0\wedge G_{i}\le 0)\vee(p'_{i}=1\wedge G_{i}\ge 0)\) for all \(i\in V\), and the dynamics *P*(*t*) eventually converge to \(P'\).

Similarly, in case 2) we can find a point \(P'=(p'_{1},\ldots,p'_{N})^{T}\) on the boundary of the hybrid system's domain such that the dynamics *P*(*t*) eventually converge to \(P'\). Hence the theorem holds. □
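Checking the hypothesis of Theorem 4 needs only \(O(N^2)\) comparisons rather than an eigenvalue computation. A minimal sketch with hypothetical helper names (`ue_matrix`, `is_symmetric`) and an assumed complete-graph *E*: since \((UE)_{ij}=u_ie_{ij}\), when all \(u_i\) are equal (as in the second experimental game) the test reduces to whether *E* itself is symmetric.

```python
from typing import List, Sequence


def ue_matrix(u: Sequence[float], E: Sequence[Sequence[float]]) -> List[List[float]]:
    """(UE)_ij = u_i * e_ij: row i of E scaled by u_i."""
    return [[u_i * e for e in row] for u_i, row in zip(u, E)]


def is_symmetric(M: Sequence[Sequence[float]], tol: float = 1e-12) -> bool:
    """Compare each upper-triangle entry with its mirror image."""
    n = len(M)
    return all(abs(M[i][j] - M[j][i]) <= tol for i in range(n) for j in range(i + 1, n))


third = 1 / 3
E_complete = [[0 if i == j else third for j in range(4)] for i in range(4)]
E_ring = [[1 if j == (i + 1) % 4 else 0 for j in range(4)] for i in range(4)]
print(is_symmetric(ue_matrix([2] * 4, E_complete)))  # True
print(is_symmetric(ue_matrix([2] * 4, E_ring)))      # False
```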

Based on the conclusions of the subsections “Non-convergence condition of the multiagent learning framework” and “Convergence condition of the multiagent learning framework”, we can determine the learning dynamics of the cases defined by Eqs. 13 and 14 that satisfy these conditions. However, the computational cost may become prohibitive when the model grows large. In the next subsection, we consider a special case with an interesting network structure that can be analyzed at relatively low computational cost for any network size.

### The simplest case whose underlying topology is a ring

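When the underlying topology is a ring in which each agent *i* always interacts with agent *i*+1 (and agent *N* with agent 1), the matrix *E* is

$$E=\left[\begin{array}{ccccc}0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\\ 1 & 0 & 0 & \cdots & 0\end{array}\right],$$

and the constrained dynamics of Eq. 14 reduce to

$$\dot{p}_{i}=\begin{cases}0, & \text{if } (p_{i}=0\wedge G_{i}<0)\vee(p_{i}=1\wedge G_{i}>0),\\ G_{i}, & \text{otherwise},\end{cases}$$

where \(G_{i}=u_{i}p_{i+1}+c_{i}\), \(i\in\{1,2,\ldots,N-1\}\), and \(G_{N}=u_{N}p_{1}+c_{N}\). By analyzing the dynamics of this model, we obtain the following conclusion.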

**Theorem 5**

In an *N*-player, two-action, integrated general-sum game, suppose every agent follows the constrained dynamics of the model in Eq. 15. If one of the agents converges to a strategy, then every agent eventually converges.

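*Proof*

Suppose agent *k* converges at some time; by definition, its strategy \(p_{k}\) is then constant. In Eq. 15, \(G_{k-1}=u_{k-1}p_{k}+c_{k-1}\) is therefore also constant (indices modulo *N*), so \(p_{k-1}\) either stays fixed (if \(G_{k-1}=0\)) or changes monotonically in one direction until it reaches the boundary of [0,1], where it stops. Hence the convergence of player *k* implies the convergence of player *k*−1. Applying this argument repeatedly around the ring, every agent eventually converges. □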

According to the above theorem, we can easily obtain the following proposition.

**Proposition 1**

In Eq. 15, if some player has a dominant strategy, then the strategies of all players asymptotically converge to a Nash equilibrium.

Based on the above results, we finally present the following non-convergence result.

**Theorem 6**

In an *N*-agent, two-action, integrated general-sum game, suppose every player follows the constrained strategy dynamics defined in Eq. 15. If no player has a dominant strategy and one of the following conditions is met,

1. \(N=4k\), \(k\in\mathbb{N}\), and \(\prod_{i=1}^{N}u_{i}>0\);
2. \(N=4k+2\), \(k\in\mathbb{N}\), and \(\prod_{i=1}^{N}u_{i}<0\),

then there exists a set \(\mathbb{P}\subset[0,1]^{N}\) such that the solution of the initial value problem of Eq. 15 with \(P(0)\in\mathbb{P}\) cannot converge.

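*Proof*

Recall that the payoff matrix of agent *i* is \(R_{i}\). Since no agent has a dominant strategy, we have \(\left(r_{i}^{11}-r_{i}^{21}\right)\left(r_{i}^{12}-r_{i}^{22}\right)<0\). Because \(c_{i}=r_{i}^{12}-r_{i}^{22}\) and \(u_{i}=\left(r_{i}^{11}-r_{i}^{21}\right)-c_{i}\), it follows that \(u_{i}c_{i}<0\) and \(|c_{i}|<|u_{i}|\). Therefore the point \(P^{*}\) with \(p_{i+1}^{*}=-c_{i}/u_{i}\) (indices modulo *N*) satisfies \(P^{*}\in(0,1)^{N}\) and \(UEP^{*}+C=0\). Considering Eq. 15 and calculating the eigenvalues of *UE*, we obtain the characteristic equation

$$\lambda^{N}=\prod_{i=1}^{N}u_{i}.$$

If \(N=4k\) and \(\prod_{i=1}^{N}u_{i}>0\), then \(\lambda=\pm\left(\prod_{i=1}^{N}u_{i}\right)^{1/N}i\) satisfies this equation because \(i^{4k}=1\), so *UE* has a pair of purely imaginary eigenvalues. Similarly, if \(N=4k+2\) and \(\prod_{i=1}^{N}u_{i}<0\), then \(\lambda=\pm\left|\prod_{i=1}^{N}u_{i}\right|^{1/N}i\) satisfies it because \(i^{4k+2}=-1\), so *UE* again has a pair of purely imaginary eigenvalues. According to Theorem 1, there exists a set \(\mathbb{P}\subset[0,1]^{N}\) such that the solution of the initial value problem of Eq. 15 with \(P(0)\in\mathbb{P}\) cannot converge. □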
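The eigenvalue argument can be checked with a few lines of Python. The sketch below (hypothetical helpers `ring_eigenvalues` and `has_pure_imaginary_pair`, assuming \(\prod_i u_i\neq 0\)) enumerates the *N* roots of \(\lambda^N=\prod_i u_i\), which are the eigenvalues of *UE* for the directed ring \(e_{i,i+1}=1\), and reports whether a purely imaginary pair exists. The outcomes agree with the two cases of Theorem 6; odd *N* never yields such a pair, since \((\beta i)^N\) is then imaginary while \(\prod_i u_i\) is real.

```python
import cmath
import math
from typing import List, Sequence


def ring_eigenvalues(u: Sequence[float]) -> List[complex]:
    """All N roots of lambda**N = prod(u) (assumes prod(u) != 0)."""
    n = len(u)
    prod = math.prod(u)
    radius = abs(prod) ** (1.0 / n)
    phase = 0.0 if prod > 0 else math.pi          # argument of prod(u)
    return [radius * cmath.exp(1j * (phase + 2 * math.pi * k) / n) for k in range(n)]


def has_pure_imaginary_pair(u: Sequence[float], tol: float = 1e-9) -> bool:
    """True if some root has (numerically) zero real part and nonzero imaginary part."""
    return any(abs(z.real) < tol and abs(z.imag) > tol for z in ring_eigenvalues(u))


print(has_pure_imaginary_pair([2, -2, 2, -2]))          # True  (N = 4, prod > 0)
print(has_pure_imaginary_pair([1, -1, 1, -1, 1, -1]))  # True  (N = 6, prod < 0)
print(has_pure_imaginary_pair([1, 1, 1, 1, 1, 1]))     # False (N = 6, prod > 0)
print(has_pure_imaginary_pair([2, 2, 2, -2]))          # False (N = 4, prod < 0)
```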

## Experimental simulation

In this section, we compare the empirical dynamics of the multiagent social learning framework composed of *PHC* learners with the theoretical predictions of our hybrid dynamical model. We perform two experiments, whose settings satisfy the conditions of Theorems 1 and 4, respectively.

### A non-convergence multiagent Game

Consider a four-agent game in which \(R_{i}\), *i*∈{1,2,3,4}, is the payoff matrix of agent *i* and the element \(e_{ij}\) of matrix *E* is the probability that player *i* selects player *j* in each interaction. In this game, we have \(u_{1}=u_{3}=2\), \(u_{2}=u_{4}=-2\), \(c_{1}=c_{3}=-1\), and \(c_{2}=c_{4}=1\). The unconstrained dynamic model of this game is then \(\dot{P}=UEP+C\), where \(U=\mathrm{diag}(2,-2,2,-2)\) and \(C=(-1,1,-1,1)^{T}\).

This game has a point \(P^{*}=(1/2,1/2,1/2,1/2)^{T}\in(0,1)^{4}\), which satisfies \(UEP^{*}+C=0\). The matrix *UE* has a pair of purely imaginary eigenvalues, \(\lambda_{1}=2i\) and \(\lambda_{2}=-2i\). The eigenvectors \(v_{1}=(0,1/2,0,1/2)^{T}\) and \(v_{2}=(1/2,0,1/2,0)^{T}\) correspond to \(\lambda_{1}\) and \(\lambda_{2}\). Let \(P(0)=P^{*}+k_{1}v_{1}+k_{2}v_{2}\). As long as \(k_{1}\) and \(k_{2}\) are sufficiently small, by Theorem 1 the solution of the initial value problem of game 1 with this *P*(0) does not converge.

Figure 1 plots the strategy dynamics from this *P*(0), with \(k_{1}=k_{2}=0.1\). Each of the four lines in Fig. 1 shows how the strategy of one agent changes over time. The strategies of the agents do not converge, so the simulation results are consistent with the theoretical prediction.
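A qualitative counterpart of Fig. 1 can be produced by simulating the hybrid dynamics (forward Euler with clipping) instead of the stochastic PHC agents. The matrix *E* of this game is not given in the text above, so the sketch assumes, for illustration, the directed ring \(e_{i,i+1}=1\) (indices modulo 4); this choice is consistent with the stated \(u_i\), \(c_i\), \(P^*\) and the eigenvalues ±2i. The helper `simulate_ring` and its step size are hypothetical.

```python
from typing import List, Sequence, Tuple


def simulate_ring(u: Sequence[float], c: Sequence[float], P0: Sequence[float],
                  dt: float = 1e-3, steps: int = 20_000) -> List[Tuple[float, ...]]:
    """Forward Euler for the ring dynamics G_i = u_i p_{i+1} + c_i (indices mod N).

    Clipping to [0, 1] plays the role of the boundary switching rule.
    Returns the whole trajectory, one tuple per time step.
    """
    n = len(P0)
    P = list(P0)
    traj = [tuple(P)]
    for _ in range(steps):
        G = [u[i] * P[(i + 1) % n] + c[i] for i in range(n)]
        P = [min(1.0, max(0.0, P[i] + dt * G[i])) for i in range(n)]
        traj.append(tuple(P))
    return traj


u, c = [2, -2, 2, -2], [-1, 1, -1, 1]
k1 = k2 = 0.1
v1, v2 = (0, 0.5, 0, 0.5), (0.5, 0, 0.5, 0)
P0 = [0.5 + k1 * a + k2 * b for a, b in zip(v1, v2)]   # P* + k1 v1 + k2 v2
traj = simulate_ring(u, c, P0)
tail = [p[0] for p in traj[-5000:]]                     # last 5 time units
print(round(min(tail), 3), round(max(tail), 3))         # p_1 keeps oscillating around 1/2
```

Forward Euler slowly inflates the closed orbits of a center, so for long runs a smaller `dt` or a higher-order integrator is preferable.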

### A convergence multi-agent Game

Consider a second four-agent game in which \(R_{i}\), *i*∈{1,2,3,4}, is the payoff matrix of agent *i* and the element \(e_{ij}\) of matrix *E* is the probability that player *i* selects player *j* in each interaction. In this game, we have \(u_{i}=2\) and \(c_{i}=-1\), *i*∈{1,2,3,4}. The unconstrained dynamic model of this game is then \(\dot{P}=UEP+C\), where \(U=2I\) and \(C=(-1,-1,-1,-1)^{T}\).

Because the matrix *UE* is symmetric, by Theorem 4 the solution of the initial value problem of this game with any \(P(0)\in[0,1]^{4}\) eventually converges.

Figure 2 plots the strategy dynamics from \(P(0)=(1/2,1/2,1/2,1/2)^{T}\). Each of the four lines in Fig. 2 shows how the strategy of one agent changes over time. The strategies of the agents eventually converge, which is consistent with the theoretical prediction.

## Conclusion

In this work, we proposed a multiagent social learning framework to model the behavior of agents in biological environments and theoretically analyzed the dynamics of the framework using nonlinear dynamics theory. We presented several sufficient conditions for convergence and for non-convergence and proved them theoretically; these conditions can be used to predict whether the system converges. Experimental results show that the predictions of our dynamic model are consistent with the simulation results.

As future work, a more extensive study of the dynamics of the multiagent social learning framework with *PHC* learners is needed. Other worthwhile directions include improving the *PHC* algorithm, developing more realistic multiagent social learning frameworks to model the interactions among cells in biological environments, and achieving better convergence performance based on our theoretical findings.


## Declarations

### Acknowledgements

We thank the reviewers for their valuable comments, which helped improve the quality of this work.

### Funding

This work has been partially sponsored by the National Science Foundation of China (Nos. 61572349 and 61572355).

### Availability of data and materials

All data generated or analysed during this study are included in this published article.

### About this supplement

This article has been published as part of Journal of Biomedical Semantics Volume 8 Supplement 1, 2017: Selected articles from the Biological Ontologies and Knowledge bases workshop. The full contents of the supplement are available online at https://jbiomedsem.biomedcentral.com/articles/supplements/volume-8-supplement-1.

### Authors’ contributions

CZ contributed to the algorithm design and theoretical analysis. SL had a main role in editing the manuscript. XL and ZF contributed equally to quality control and document reviewing. All authors read and approved the final manuscript.

### Ethics approval and consent to participate

Not applicable.

### Consent for publication

Not applicable.

### Competing interests

The authors declare that they have no competing interests.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


## References

1. Kang S, Kahan S, Mcdermott J, Flann N, Shmulevich I. Biocellion: accelerating computer simulation of multicellular biological system models. Bioinformatics. 2014; 30(21):3101–8.
2. Peng J, Bai K, Shang X, Wang G, Xue H, Jin S, Cheng L, Wang Y, Chen J. Predicting disease-related genes using integrated biomedical networks. BMC Genomics. 2017; 18(1):1043.
3. Malanchi I, Santamaria-Martínez A, Susanto E, Peng H, Lehr HA, Delaloye JF, Huelsken J. Interactions between cancer stem cells and their niche govern metastatic colonization. Nature. 2012; 481(7379):85–9.
4. Buehler MJ, Ballarini R. Materiomics: Multiscale Mechanics of Biological Materials and Structures. Vienna: Springer Vienna; 2013.
5. Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: A survey. J Artif Intell Res. 1996; 4(1):237–85.
6. Hao J, Huang D, Cai Y, Leung H-f. The dynamics of reinforcement social learning in networked cooperative multiagent systems. Eng Appl Artif Intell. 2017; 58:111–22.
7. Hao J, Leung HF, Ming Z. Multiagent reinforcement social learning toward coordination in cooperative multiagent systems. Acm Trans on Autonomous and Adaptive Systems. 2014; 9(4):374–8.
8. Hao J, Leung HF. The Dynamics of Reinforcement Social Learning in Cooperative Multiagent Systems. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. Beijing: AAAI Press: 2013. p. 184–90.
9. Busoniu L, Babuska R, De Schutter B. A comprehensive survey of multiagent reinforcement learning. IEEE Trans Syst Man Cybern C. 2008; 38(2):156–72.
10. Peng J, Li H, Liu Y, Juan L, Jiang Q, Wang Y, Chen J. Intego2: a web tool for measuring and visualizing gene semantic similarities using gene ontology. BMC Genomics. 2016; 17(5):530.
11. Peng J, Wang T, Wang J, Wang Y, Chen J. Extending gene ontology with gene association networks. Bioinformatics. 2016; 32(8):1185–94.
12. Torii M. Detecting concept mentions in biomedical text using hidden markov model: multiple concept types at once or one at a time? J Biomed Semant. 2014; 5(1):3–3.
13. Anderson ARA, Chaplain MAJ. Cheminform abstract: Continuous and discrete mathematical models of tumor-induced angiogenesis. ChemInform. 1999; 30(9):857–9943.
14. Xavier JB, Martinezgarcia E, Foster KR. Social evolution of spatial patterns in bacterial biofilms: when conflict drives disorder. Am Nat. 2009; 174(1):1–12.
15. Ferrer J, Prats C, López D. Individual-based modelling: An essential tool for microbiology. J Biol Phys. 2008; 34(1):19–37.
16. Jeannin-Girardon A, Ballet P, Rodin V. An Efficient Biomechanical Cell Model to Simulate Large Multi-cellular Tissue Morphogenesis: Application to Cell Sorting Simulation on GPU. In: Theory and Practice of Natural Computing: Second International Conference, TPNC 2013, Cáceres, Spain, December 3-5, 2013, Proceedings. Berlin: Springer Berlin Heidelberg: 2013. p. 96–107.
17. Matignon L, Laurent GJ, Fort-Piat NL. Independent reinforcement learners in cooperative markov games: a survey regarding coordination problems. Knowl Eng Rev. 2012; 27(1):1–31.
18. Bloembergen D, Tuyls K, Hennes D, Kaisers M. Evolutionary dynamics of multi-agent learning: a survey. J Artif Intell Res. 2015; 53(1):659–97.
19. Li J, Qiu M, Ming Z, Quan G, Qin X, Gu Z. Online optimization for scheduling preemptable tasks on iaas cloud systems. J Parallel Distrib Comput. 2012; 72(5):666–77.
20. Abdallah S, Lesser V. A multiagent reinforcement learning algorithm with non-linear dynamics. J Artif Intell Res. 2008; 33(1):521–49.
21. Zhang C, Lesser VR. Multi-agent learning with policy prediction. In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence. Atlanta: AAAI Press: 2010.
22. Chakraborty D, Stone P. Multiagent learning in the presence of memory-bounded agents. Auton Agent Multi-Agent Syst. 2014; 28(2):182–213.
23. Song S, Hao J, Liu Y, Sun J, Leung H-F, Zhang J. Improved EGT-Based robustness analysis of negotiation strategies in multiagent systems via model checking. IEEE Transactions on Human-Machine Systems. 2016; 46(2):197–208.
24. Bowling M, Veloso M. Multiagent learning using a variable learning rate. Artif Intell. 2002; 136(2):215–50.
25. Shilnikov LP, Shilnikov AL, Turaev DV, Chua LO. Methods of Qualitative Theory in Nonlinear Dynamics. Singapore: World Scientific; 1998.
26. Singh SP, Kearns MJ, Mansour Y. Nash convergence of gradient dynamics in general-sum games. In: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence. San Francisco: Morgan Kaufmann Publishers Inc.: 2000. p. 541–8.
27. Olshevsky V, Tyrtyshnikov EE. Matrix methods: theory, algorithms and applications: dedicated to the memory of Gene Golub. Hackensack: World Scientific; 2010. p. 604.