Hello! I’m Ilya Usov, a tech lead from the SunkeyToolkit service team for remote testing of mobile applications. In this article, I’ll talk about how we tried to investigate some of the probabilistic characteristics associated with the hardware loads of a mobile device farm.
We will briefly talk about the first service in the Russian Federation for remote testing of mobile applications MTS SunkeyToolkit told here. The base of the service is a farm of more than 300 mobile devices, it is a set of machines with different operating systems: MacOS and Ubuntu for devices with IOS and Android OS, respectively, to which devices are connected through hubs.
In reality, it looks like this:
We have developed a very extensive functionality:

mechanism of reservations (reservations) of devices;

mobile phone screen streaming;

interaction with the phone (swipe, tap, long tap, etc.) directly from the interface of our application and from anywhere in the country;

realtime log collection;

change of orientation;

entering text (including Russian) from the user’s physical keyboard;

SMS collection service and much more.
In addition, one user can launch up to 4 devices at the same time in different tabs.
We will tell you more about this functionality in the next articles, but now we will talk about the approach to predicting the average number of orders in the system and some characteristics related to the load on farm machines. To do this, we will try to put into practice the latest research in the field of queuing systems and heterogeneous Markov chains with continuous time.
Be careful, there’s the math next!
We take the following recent studies as a basis:
Concepts
Queuing system (CMS) is a system that serves the requirements that come into it. Maintenance of requirements in the CMO is carried out by service devices (servers/machines). A classic CMO contains from one to an infinite number of devices. Such systems can be studied using the theory of differential calculus and Markov chains.
Markov chain is a sequence of random events with a finite or countable number of outcomes, where the probability of the occurrence of each event depends only on the state reached in the previous event.
Application (requirement) is a service object in the SMO for which a typical maintenance task is being solved.
Queue is the sequence of requests awaiting service.
A stream of events is said to be random if the occurrence of events occurs at random points in time. For example, the flow of calls to the PBX, the flow of customers in the supermarket, the flow of server failures, and the like.
Stream It is characterized by the intensity λ — the frequency of occurrence of events, that is, the average number of events that occur per unit of time, and μ is the intensity of the service of the requirement.
Also, we will use all the basic designations and terms from “Markov chains and models with continuous time”, Zeifman A. I.
Description of the approach
In order to calculate the probability of an empty queue and the expected value of the average number of requests on the farm nodes, let’s consider the simplest situation (without taking into account machine breakdowns and their recovery) and take it as a basis Model M/M/2. For simplicity’s sake, let’s assume that we have only 2 nodes. The study of the characteristics of such a system is reduced to the solution of the Kolmogorov direct system (a system of ordinary differential equations) of the form:
where A is the transposed intensity matrix (the dependence on t is not shown for the sake of brevity):
And the main task is to determine the intensity of the receipt of requirements into the system and maintenance. To do this, we need to analyze the frequency of events, that is, the average number of events that occur per unit of time and service, respectively. Obviously, we need to consider these intensities as timedependent, since customers use the service unevenly during the day, the main load on our equipment is during working hours, and the number of tests or streams launched at night is noticeably smaller. After a detailed study of the statistics, we were able to determine the intensity of the receipt of claims and services in 8 working hours:
Now we will use the fourthorder RungeKurt method to solve the Kolmogorov direct system, with the help of the Python language, libraries: numpy, scipy, Matplotlib. We will also plot the average number of orders in the system (waiting value) and the probability of an empty queue in this queuing system. These graphs will clearly show the behavior of these system characteristics over time.
So, we’ve written a fairly simple algorithm to get a solution for this system:
def calc_solve(start_vector, t_list, function):
return solve_ivp(function, t_list, np.ravel(start_vector), method='RK45', atol=1e12, rtol=1e12)
"""Получаем решения с заданными нач. условиями (start_vector)"""
t_period = [0, t_end + 1]
start_vector = get_start_vector()
solution = calc_solve(start_vector, t_period, dot_function)
"""Считаем мат. ожидание"""
mean = calc_mean(solution.y)
Once we have the solution, we can plot the probability of an empty queue p_0
# For p_0
{
‘num’: 1,
‘y_label’: ‘Вероятность’,
‘x_label’: ‘t’,
‘lines’: [
{
‘t’: solution.t[:t_end_index],
‘sol’: np.transpose(solution.y[0, :t_end_index]),
‘label’: ‘X(0) = 50’,
},
]
},
# For Mean
{
‘num’: 3,
‘y_label’: ‘Среднее’,
‘x_label’: ‘t’,
‘lines’: [
{
‘t’: solution.t,
‘sol’: mean,
‘label’: ‘X(0) = 50′,
},
]
},
]
def build_charts(plots, legend=’upper right’):
for graph in plots:
plt.figure(graph[‘num’])
fig, ax = plt.subplots()
for pl in graph[‘lines’]:
ax.plot(pl[‘t’], pl[‘sol’], label=pl[‘label’])
ax.legend(loc=legend)
plt.xlabel(graph[‘x_label’])
plt.ylabel(graph[‘y_label’])
plt.grid(True)
plt.show()
Thus, we were able to plot the following graphs:1. the probability of an empty queue;
2. Average number of applications in the system.
These graphs allow you to see how the probability of an empty queue changes over time and the behavior of the average over 8 business hours.
Conclusion
We have described one of the approaches to studying the characteristics of the system associated with loads on equipment within the framework of the stochastic process, when events occur at a random point in time. It is worth noting that this approach is not applicable to all systems and products in practice. Probably, it is possible to take more complex models, in which, for example, breakdowns and car repairs are already taken into account (on this topic, we recommend that you familiarize yourself with Prendeville’s model in Ergodicity Bounds and Limiting Characteristics for a Modified Prendiville Model, Ilya Usov , Yacov Satin, Alexander Zeifman and Victor Korolev).
Also, with a more detailed study of the statistics of requirements receipts and their services based on logs in large and highload systems, it is possible to predict the load on individual system components and investigate other limit characteristics of the system. For example, the rate of convergence to the limit regime.
In general, queuing theory has many applications in practice in real life, for example, in modeling assemblytoorder systems (Irvani, S.M.R., Luangkesorn, K.L., SimchiLevi, D. 2010. On assemble to order systems with flexible customers. IIE Trans), production lines (Fadiloglu, M.M., Yeralan, S. 2002. Models of production lines as quasibirthdeath processes. Math. Comput. Model), wireless networks (Kim, Y.Y., Li, S. 1999. Performance evaluation of packet data services over cellular voice networks. Wirel. Netw. 5:211–219) and much more.
P.S. We are well aware that this article does not contain a full description of all terms and designations, and it is likely that it will be difficult for an unprepared reader to immerse himself in the context. Therefore, we have prepared a list of references for everyone who is interested in this topic:
1. https://elar.urfu.ru/bitstream/10995/117140/1/9785799635398_2022.pdf
2. Jain, M., Maheshwari, S. 2004. Npolicy for a machine repair systemwith spares and reneging. Appl. Math. Model. 28:513–531.
3. Jain, M., Ghimire, R.P. 1997. Machine repair queueing system with nonreliable server and heterogeneous services discipline. J. MACT. 30:105–115.
4. Wang, K. 1994. Profit analysis of the 𝑀/𝑀/𝑅 machine repair problem with spares and server breakdowns. J. Oper. Res. Soc. 45:539–548.
———
Acknowledgement and Usage Notice
The editorial team at TechBurst Magazine acknowledges the invaluable contribution of Илья Усов the author of the original article that forms the foundation of our publication. We sincerely appreciate the author’s work. All images in this publication are sourced directly from the original article, where a reference to the author’s profile is provided as well. This publication respects the author’s rights and enhances the visibility of their original work. If there are any concerns or the author wishes to discuss this matter further, we welcome an open dialogue to address potential issues and find an amicable resolution. Feel free to contact us through the ‘Contact Us’ section; the link is available in the website footer.