In those years working specifically with application performance is very common, in war rooms especially, the search for a “culprit”; and the blame often lies with the database.
Before continuing I would like to make a caveat that today, with the focus on culture DevOps, the performance should not be in the search for a “culprit”, but rather a unique team seeking opportunities to optimize the environment.
When I use the term “seeking opportunities to optimize” the meaning is to solve the problem as quickly as possible, as often trying to understand why at a certain time of day the system is slow can take much more time than it identifying opportunities to make the environment resilient and scalable. Or even when the “culprit” is found, it may be the result of a lack of proper testing, change in the behavior of environment use, etc. That is why the “culprit” may not be so guilty.
Let’s return to the focus theme of this article.
Optimizing queries is a great first step!
Before I was a manager, I was (and am) one of the database-focused performance specialists, and it has always been very common in our analysis to find that the slowness came from the database, but in many cases the database reflected a construction problem in applications. For example, commands generating compilation on each run because of the version and object mapping done on the NHibernate layer. And in this case there is no point in setting the parameter:
cursor_sharing=force
If the database is Oracle, then builds will continue to occur as the table and column aliases are changing.
In some scenarios the main opportunities for database optimization are:
- Command rewriting;
- Written in the processing logic;
- Technology
- Index Creation.
Note that in these cases it is not the database’s fault, but the way the queries and processes were written.
Identify Containment
In some analyzes we identified that there was some containment in the application layer, specifically in code snippets (logic or through components), but when evaluating the database layer there was the identification of high volume of locks and the wrong view that this was the problem, when in fact bank time was about 6 percent of the total transaction time.
Other interesting scenarios were some cases where the command was written using subqueries in the select clause, and these subqueries had a range scan execution plan on some index or table. Imagine this command returning 300 lines and running 2K times per minute, each subquery of the command will run 600K times per minute. In this situation the internal mechanism of the database itself caused a lock in its structure, a type of serialization (latch), due to the high volume of readings in the same data blocks
What I am trying to show here is that it is not generally the fault of the bank.
I consider the bank “fault” in case of any version bugs, incorrect parameterization or something. In other cases the bank only reflects a problem in system design, either by coding, outdated components, etc.
Analyze application or database?
Initiating a database layer analysis of a problem may be a good strategy to eliminate the possibility of a layer problem or even to identify a problem that is generated in the other layers but that is reflecting in the DB.
Analysis from the database layer may reflect application adjustments, programming logic, or command script adjustments, but the application layer should not be forgotten as the focus is on a scalable and resilient environment.
Often initiating analysis by the application layer can be more assertive in terms of identifying the main bottleneck, but it may take more time as more than one layer must be evaluated.
Conclusion
In short, “bank blame” is relative, as sometimes the containment identified in the DB layer can have an insignificant “weight” in the total time of a transaction. So be careful when you say that it is DB’s fault, properly monitor your environment, evaluate the weight of the problem as a whole and attack the problem if it is a quick win, a silver bullet, but most of the time The adjustment will have to be done on the application layer.
We are increasingly in a DevOps world, a whole team building and supporting a solution, no Dev or Ops, no bank or application. Nor should it be the focus just asking more and more infra to run a system that is slow, from its conception to its use it must be developed, tested and prepared to be scalable and resilient.
By Ronaldo Sales – Service Manager