Hello.
This is the second article in a series in which we look at the architecture of core OpenStack services. The first one, in which I talked about Nova and general platform building patterns, is here. Today we will dive into the details of the implementation of the DBaaS service, which is called Trove in the platform. We will look at how the main components of the service are arranged and how they interact, touch upon some features of the implementation of security mechanisms, and also briefly discuss the features of code-style.
Trove Architecture
Functionally, the service can be divided into 2 main components:
- control-plane, which consists of the conductor, task manager, and API services;
- agent, which manages the DBMS nodes.
The control plane runs on the cluster controllers; the agent runs directly on customer instances.
API, as in Nova, is a few thin layers on top of multiple WSGI services.
Conductor is a thin layer whose task is to proxy messages between the agent and Trove. The idea of the conductor service, as in the case of Nova, is to provide communication with the agent without giving the latter direct access to the service database. However, whereas in Nova the conductor is also responsible for executing tasks, here the responsibility is split and the tasks are handled by the task manager.
Task manager takes care of the entire workload. Tasks arrive via RPC, are wrapped in an object, and run inside this service. This is how asynchrony is achieved.
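The flow described above can be illustrated with a minimal sketch. It is not Trove's code: the `Task` and `TaskManager` classes, the `cast` method, and the `create_instance` handler are hypothetical names, and an in-process queue stands in for the real message broker.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Task:
    """A unit of work built from an incoming message (hypothetical shape)."""
    method: str
    kwargs: Dict[str, Any] = field(default_factory=dict)


class TaskManager:
    """Toy stand-in for a task manager: consumes tasks in a worker thread."""

    def __init__(self):
        self._inbox = queue.Queue()   # stands in for the message broker
        self._handlers = {}
        self.results = []
        self._worker = threading.Thread(target=self._run, daemon=True)

    def register(self, name, handler):
        self._handlers[name] = handler

    def start(self):
        self._worker.start()

    def cast(self, method, **kwargs):
        """Fire-and-forget: wrap the call in a Task, enqueue it, return at once."""
        self._inbox.put(Task(method, kwargs))

    def stop(self):
        self._inbox.put(None)         # sentinel: tells the worker to exit
        self._worker.join()

    def _run(self):
        while True:
            task = self._inbox.get()
            if task is None:
                break
            handler = self._handlers[task.method]
            self.results.append(handler(**task.kwargs))


def create_instance(name, flavor):
    # Hypothetical handler; a real one would talk to Nova.
    return f"instance {name} created with flavor {flavor}"


manager = TaskManager()
manager.register("create_instance", create_instance)
manager.start()
manager.cast("create_instance", name="db-1", flavor="m1.small")  # returns immediately
manager.stop()
print(manager.results)
```

The caller of `cast` never waits for the handler, which is the point: the API can answer the user while the work continues in the task manager.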
Trove has its own RPC, built on the same principles and technologies as Nova’s, but completely separate from that of all other services. I described the semantics of RPC calls within OpenStack in detail earlier.
Trove can also interact with other platform services through its own drivers, for example with Neutron (the network service) or Compute. In this way, the service can request specific resources, such as an instance to host a DBMS.
Platform Features
Managing the DBMS replica set is the main goal of the service. Trove supports a fairly standard set of databases (datastores) for management, with some functions marked as work in progress.
Starting a cluster in the vanilla version of the service is available in experimental mode for only a few datastores. In general, it is suggested to use the native replication mechanisms of a particular DBMS. For example, MariaDB uses GTID. Despite this, in this article I will use the term “cluster” as a synonym for the “replica set” of a DBMS.
CLO currently only works with MySQL and PostgreSQL. This is because our service does not just proxy requests to OpenStack; we also extend its functionality. For example, our team has implemented the following functionality:
- the ability to make backup copies of individual databases;
- monitoring of the “health” of cluster nodes: a mechanism that recreates the replica set if it detects that one of the cluster nodes has become unresponsive, and ensures that the cluster is restored.
We will certainly add more DBMS in the near future.
Let’s take a look at the main features that Trove provides to the user.
Launch of the datastore service
Trove uses the Nova service to host the DBMS within a virtual server. Control over the instance resources remains on the Nova side, and all further interaction between the control plane and the cluster is done through the agent. On its side, Trove keeps a mapping to the instance IDs in the Nova database. Part of the service code consists of helper functions that implement interaction with other services in the platform, for example, instance resize on the Compute side.
After the API validates the call parameters, the task to create the instance is started: Trove contacts Nova and waits for the launch. Trove knows that the datastore has deployed correctly when the conductor starts receiving heartbeat messages from the agent. Depending on the intended replication scheme, you can pass the ID of the master node in the request to create an instance. In this case, the service will take a snapshot of the master and deploy the slave from that snapshot.
Agent Compromise as an Attack Vector
Given that the agent actually runs in the customer’s space, trust in it should be limited. If the DBMS is compromised, the attacker gains at least access to the mechanism for communicating with the broker.
To minimize possible damage, the architecture of interaction with the agent is built as follows:
- Starting with Victoria, Trove uses a Docker container to deploy the datastore itself. Even after a successful attack on the DBMS, the attacker would still need to escalate privileges to the parent OS.
- As I wrote above, Trove has its own message broker that is not connected to other services.
- For each newly created instance, a unique key is generated. This key is then used to encrypt the messages involved in any interaction with that particular agent.
Database operation within the datastore
When creating an instance, a specially prepared image is used; otherwise the creation mechanism does not differ from creating a regular VM in OpenStack. After the instance is deployed, the agent establishes a connection with the conductor service and sends it heartbeat messages, notifying it of the current status: NEW, BUILDING, ACTIVE, etc.
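As a rough illustration of the heartbeat idea, here is a minimal sketch of how a conductor-like component could track the last status reported by each agent. The `HeartbeatRegistry` class, its methods, and the 60-second timeout are assumptions made for the example, not Trove’s actual implementation or defaults.

```python
import time
from dataclasses import dataclass

HEARTBEAT_TIMEOUT = 60.0  # seconds; an assumed value, not a Trove default


@dataclass
class Heartbeat:
    status: str          # e.g. "NEW", "BUILDING", "ACTIVE"
    received_at: float   # timestamp of the last message


class HeartbeatRegistry:
    """Hypothetical conductor-side store of the latest heartbeat per instance."""

    def __init__(self):
        self._last = {}

    def record(self, instance_id, status, now=None):
        # `now` is injectable so the logic can be tested without sleeping.
        stamp = time.time() if now is None else now
        self._last[instance_id] = Heartbeat(status, stamp)

    def status(self, instance_id):
        beat = self._last.get(instance_id)
        return beat.status if beat else None

    def is_alive(self, instance_id, now=None):
        beat = self._last.get(instance_id)
        if beat is None:
            return False  # never heard from this agent
        current = time.time() if now is None else now
        return current - beat.received_at <= HEARTBEAT_TIMEOUT


registry = HeartbeatRegistry()
registry.record("inst-1", "BUILDING", now=0.0)
registry.record("inst-1", "ACTIVE", now=30.0)
print(registry.status("inst-1"), registry.is_alive("inst-1", now=80.0))
```

An instance that stops reporting simply ages out: once the gap since its last heartbeat exceeds the timeout, `is_alive` returns `False`.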
The DBMS itself is managed by the manager service running on the instance. This functionality is the responsibility of the driver, whose code is unique to each specific datastore. At this level, replication is configured, modules are installed if the datastore allows it, and DBMS configuration settings are applied.
If we lose contact with the master, we can initiate the process of re-electing it. All nodes in the cluster go to the EJECT status, and Trove tries to determine the replica with the most up-to-date copy of the data: it polls all slaves and sorts them by the ID of the last transaction. Once the new master is determined, Trove reconnects the nodes and synchronizes them, bringing all of them to the ACTIVE state. If necessary, it is possible to “promote” a specific instance in the replica set to master; there is a separate API method for this.
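The selection step can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `Replica` class and `elect_master` function are hypothetical, and the last transaction ID is modeled as a plain integer, whereas real GTIDs have a more complex structure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    instance_id: str
    last_txn_id: Optional[int]  # None if the replica did not answer the poll


def elect_master(replicas: List[Replica]) -> Replica:
    """Pick the reachable replica with the highest last transaction ID."""
    reachable = [r for r in replicas if r.last_txn_id is not None]
    if not reachable:
        raise RuntimeError("no reachable replica to promote")
    # Sort by the last applied transaction, most up-to-date first.
    ranked = sorted(reachable, key=lambda r: r.last_txn_id, reverse=True)
    return ranked[0]


candidates = [
    Replica("replica-a", 100),
    Replica("replica-b", 140),
    Replica("replica-c", None),  # unreachable during the poll
]
new_master = elect_master(candidates)
print(new_master.instance_id)
```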
Loading Your Own Configuration
Trove allows you to configure the DBMS. To do this, a separate configuration group entity is created for each specific project and datastore. This entity can then be applied to multiple instances running the same DBMS.
In general, there is no special magic here. The configuration is applied sequentially to all instances in the cluster using mechanisms implemented within the specific datastore agent: literally setting values through the CLI.
Backup & Restore
Trove uses 2 separate Docker containers to perform backup tasks. They are managed by the agent’s manager service, which starts and stops the containers as needed.
In the vanilla version, Trove allows you to back up all clusters in a project or a single cluster. The stream is read from the backup process, such as tar, and uploaded to storage. This is where the architecture of the service starts to get in the way. The OpenStack community is actively promoting its own implementation of object storage, Swift. Although the implementation of a particular storage backend is inherited from an abstract class, the API does not assume any other type of storage and relies exclusively on the semantics of Swift containers. This means that even if you want to implement backup storage on another system, you will have to change the API significantly. Thankfully, Ceph implements a Swift-compatible API, allowing it to be used as a backend for these tasks.
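To make the “read a stream from the backup process and upload it” idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Storage` base class, the `InMemoryStorage` backend, the chunk naming, and the `stream_backup` function are hypothetical, and a tiny Python one-liner stands in for a real backup command such as tar.

```python
import subprocess
import sys
from abc import ABC, abstractmethod

CHUNK_SIZE = 64 * 1024  # assumed chunk size for the example


class Storage(ABC):
    """Hypothetical storage interface; note the container-based semantics."""

    @abstractmethod
    def save_chunk(self, container, name, index, data):
        ...


class InMemoryStorage(Storage):
    """Test backend that keeps chunks in a dict instead of object storage."""

    def __init__(self):
        self.objects = {}

    def save_chunk(self, container, name, index, data):
        self.objects[(container, f"{name}_{index:08d}")] = data


def stream_backup(command, storage, container, name, chunk_size=CHUNK_SIZE):
    """Run `command`, read its stdout in chunks, and hand each chunk to storage."""
    index = 0
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            storage.save_chunk(container, name, index, chunk)
            index += 1
    if proc.returncode != 0:
        raise RuntimeError(f"backup process exited with {proc.returncode}")
    return index


# Stand-in for a real backup process: writes 10 bytes to stdout.
fake_backup = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 10)"]
storage = InMemoryStorage()
chunk_count = stream_backup(fake_backup, storage, "backups", "db-1", chunk_size=4)
print(chunk_count, sorted(key[1] for key in storage.objects))
```

Even in this toy version the interface is shaped around containers and named objects, which hints at why swapping in a storage system with different semantics is not a drop-in change.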
Code-style
The first thing that catches your eye when you read the code of the service is a much cleaner writing style, if you will. This is easily explained by the fact that the service is much smaller than Nova, but the difference is still very noticeable. For example, the logic of the services here does not “flow” from one to another.
You won’t see piles of decorators here, as in Nova, but solutions built on context managers can be found everywhere.
A big surprise for me was that the parameters of some functions are initialized with mutable elements. For those who don’t know why this is bad, you can read about it, for example, here. In short: such a parameter is initialized once, when the function is defined, and can be changed, which will affect its state in subsequent calls. Well, let’s hope that developers don’t allow themselves to modify the object between calls.
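The pitfall is easy to reproduce. The function names below are made up for the demonstration and are not taken from Trove’s code.

```python
def add_node_bad(node, nodes=[]):
    # The default list is created once, at function definition time,
    # and is shared by every call that omits `nodes`.
    nodes.append(node)
    return nodes


def add_node_good(node, nodes=None):
    # The usual idiom: use None as the default and create a fresh list per call.
    if nodes is None:
        nodes = []
    nodes.append(node)
    return nodes


first = add_node_bad("db-1")
second = add_node_bad("db-2")
print(second)           # ['db-1', 'db-2']: state leaked from the first call
print(first is second)  # True: both names point at the same default list

print(add_node_good("db-1"))  # ['db-1']
print(add_node_good("db-2"))  # ['db-2']
```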
Another characteristic feature is the excessive, in my opinion, use of architectural patterns. Here you have a factory, which is in fact just a nice name for the implementation of an abstract class obtained through the same dynamic loading, and singletons via global variables (here is why you shouldn’t do that)… And, of course, strategies are everywhere.
It starts with the user being asked to create a separate “strategy” for backup tasks and ends with this pattern being used heavily on literally every layer. The raccoon with the circular saw from refactoring.guru would be pleased.
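For readers unfamiliar with the combination, a “factory” built on dynamic loading usually boils down to importing a class by its dotted path taken from configuration. The sketch below uses only the standard library; the `load_class` helper is a hypothetical name, and `json.JSONEncoder` merely stands in for a configured strategy class.

```python
import importlib


def load_class(dotted_path):
    """Import and return a class given a 'package.module.ClassName' string."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# In a real service the path would come from a config option;
# here a standard-library class plays the role of the "strategy".
configured_strategy = "json.JSONEncoder"
strategy = load_class(configured_strategy)()
print(strategy.encode({"a": 1}))
```

The flexibility is real, since a new implementation only needs a config change, but so is the cost: the concrete class is no longer visible from the call site.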
Conclusion
The Trove service looks pretty neat compared to the monstrous Nova. The code is easy to read, and the functionality is clear and simple. However, there are also architectural surprises. But that’s the world of Open Source – you can customize it to suit your needs.
A question for you, the reader: what do you think the quality of open source projects depends on, and has it changed recently? Is it a problem of the competence of the developers or of the project’s maintainers?
———-
Acknowledgement and Usage Notice
The editorial team at TechBurst Magazine acknowledges the invaluable contribution of Харинский Алексей, the author of the original article that forms the foundation of our publication. We sincerely appreciate the author’s work. All images in this publication are sourced directly from the original article, where a reference to the author’s profile is provided as well. This publication respects the author’s rights and enhances the visibility of their original work. If there are any concerns or the author wishes to discuss this matter further, we welcome an open dialogue to address potential issues and find an amicable resolution. Feel free to contact us through the ‘Contact Us’ section; the link is available in the website footer.