Query optimization means improving response times in a relational database management system. More generally, optimization is the process of modifying a system to improve its efficiency or its use of available resources.
In relational databases, SQL is the query language that developers most commonly use to obtain information from the database. Some queries become complex enough that designing them takes considerable time, and the result is not always an optimal response.
A cost-based optimizer generates a set of candidate execution plans for a query, assigns each plan a cost (an estimate expressed in terms of required input-output operations, CPU requirements, and other factors), and chooses the plan with the lowest cost. The set of execution plans is formed by examining the possible access paths (an index scan or a sequential scan) and join algorithms (sort-merge join, hash join, nested loops). Users cannot access the optimizer directly: once a query is sent to the server, it is first parsed and only then reaches the optimizer.
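The selection step described above can be sketched in a few lines. This is only an illustration: the `CandidatePlan` class, the two weights, and all the estimates are hypothetical, and a real optimizer derives its estimates from table statistics rather than from hard-coded numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePlan:
    name: str
    io_pages: float    # assumed estimate: pages read from disk
    cpu_tuples: float  # assumed estimate: tuples processed by the CPU


# Hypothetical weights: one page read is treated as 100x the cost of one tuple.
IO_WEIGHT = 1.0
CPU_WEIGHT = 0.01


def estimated_cost(plan):
    """Combine the I/O and CPU estimates into a single comparable number."""
    return IO_WEIGHT * plan.io_pages + CPU_WEIGHT * plan.cpu_tuples


def choose_plan(plans):
    """Return the candidate with the lowest estimated cost."""
    return min(plans, key=estimated_cost)


candidates = [
    CandidatePlan("sequential scan + hash join", io_pages=1200, cpu_tuples=150_000),
    CandidatePlan("index scan + nested loops", io_pages=300, cpu_tuples=40_000),
    CandidatePlan("sequential scan + sort-merge join", io_pages=1200, cpu_tuples=260_000),
]

best = choose_plan(candidates)
print(best.name)
```

With these made-up estimates the index-based plan wins because it reads far fewer pages; different weights or statistics could just as easily favor another plan.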
Most optimizers represent an execution plan as a tree of plan nodes. A plan node encapsulates a single operation in query execution. Intermediate results flow from the leaves of the tree to the root. The children of a node represent the operations whose outputs are the inputs of the parent node. For example, a join node has two children, representing the two operands of the join. The leaves of the tree represent operations that produce results by reading from disk, for example by performing an index scan or a sequential scan.
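A minimal sketch of such a tree follows. The `PlanNode` class and the operation labels are hypothetical and do not correspond to any particular database engine. A post-order traversal visits the children before their parent, which mirrors how intermediate results flow from the leaves to the root.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    operation: str
    children: list = field(default_factory=list)


def execution_order(node):
    """Post-order traversal: inputs (children) come before the node that consumes them."""
    order = []
    for child in node.children:
        order.extend(execution_order(child))
    order.append(node.operation)
    return order


# A join node with two children; both leaves read from disk.
plan = PlanNode(
    "hash join",
    [
        PlanNode("index scan on orders"),
        PlanNode("sequential scan on customers"),
    ],
)

for step in execution_order(plan):
    print(step)
```

The two scans are listed first and the join last, since the join cannot produce anything until its operands do.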
The efficiency of an execution plan is largely determined by the order in which the tables are joined. For example, when a small table is joined with much larger ones, joining the large tables first and the small table last generally takes longer than starting with the small table. Most optimizers determine the join order using a dynamic programming algorithm pioneered by IBM's System R database project. The algorithm works in stages, the first of which considers the ways to access each table, such as a sequential scan.
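The idea of choosing a join order by dynamic programming can be sketched as follows. This is a simplified illustration, not the System R algorithm itself: it only considers left-deep orders, it measures cost as the sum of estimated intermediate result sizes, and the table names, cardinalities, and selectivities are invented for the example.

```python
from itertools import combinations


def best_left_deep_order(cardinalities, selectivity):
    """Return (cost, order) for the cheapest left-deep join order.

    cardinalities: {table_name: row_count}
    selectivity:   {frozenset({a, b}): fraction of the cross product kept};
                   a missing pair means no join predicate (selectivity 1.0).
    """
    tables = sorted(cardinalities)

    def result_rows(subset):
        # Estimated size of joining every table in the subset.
        rows = 1.0
        for table in sorted(subset):
            rows *= cardinalities[table]
        for a, b in combinations(sorted(subset), 2):
            rows *= selectivity.get(frozenset((a, b)), 1.0)
        return rows

    # best[subset] = (cost so far, join order); single tables cost nothing here.
    best = {frozenset([t]): (0.0, (t,)) for t in tables}
    for size in range(2, len(tables) + 1):
        for combo in combinations(tables, size):
            subset = frozenset(combo)
            rows = result_rows(subset)
            options = []
            for last in combo:
                # Reuse the best plan already found for the smaller subset.
                prev_cost, prev_order = best[subset - {last}]
                options.append((prev_cost + rows, prev_order + (last,)))
            best[subset] = min(options)
    return best[frozenset(tables)]


cardinalities = {"small": 100, "big_a": 100_000, "big_b": 100_000}
selectivity = {
    frozenset(("small", "big_a")): 1e-5,
    frozenset(("big_a", "big_b")): 1e-5,
}

cost, order = best_left_deep_order(cardinalities, selectivity)
print(order, cost)
```

With these numbers the chosen order joins `small` and `big_a` first and leaves `big_b` for last, because the intermediate result of the two large tables would be about a thousand times larger.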
A tuple in a relation (table) corresponds to a row of that table. Tuples are unordered, because mathematically a relation is defined as a set and not as a list. For the same reason a relation contains no duplicate tuples: sets by definition do not allow duplicate elements. An important corollary is that a primary key always exists. Because tuples are unique, the combination of all the attributes of a table can always serve as a key. Usually it is not necessary to include every attribute, and some minimal combination is sufficient.
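These properties can be illustrated with a Python `set` of tuples. The sketch below uses an invented `(student, course, grade)` relation and a hypothetical `minimal_keys` helper that searches for the smallest attribute combinations whose values are unique. Note that uniqueness in one sample of data only suggests a candidate key; whether it really is a key depends on the meaning of the schema.

```python
from itertools import combinations

attributes = ("student", "course", "grade")

# A relation as a set of tuples: the repeated tuple is stored only once,
# and the order in which the tuples are written does not matter.
relation = {
    ("ana", "math", "A"),
    ("ana", "physics", "B"),
    ("luis", "math", "A"),
    ("luis", "physics", "A"),
    ("ana", "math", "A"),  # duplicate, collapses into the first tuple
}


def is_unique(relation, attributes, subset):
    """True if projecting onto `subset` still distinguishes every tuple."""
    positions = [attributes.index(name) for name in subset]
    projected = {tuple(row[i] for i in positions) for row in relation}
    return len(projected) == len(relation)


def minimal_keys(relation, attributes):
    """Smallest attribute combinations that are unique in this data."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            # Skip supersets of a key that was already found.
            if any(set(key) <= set(subset) for key in keys):
                continue
            if is_unique(relation, attributes, subset):
                keys.append(subset)
    return keys


print(len(relation))
print(minimal_keys(relation, attributes))
```

Here the full attribute set is trivially unique, but the search stops earlier: the pair `(student, course)` already identifies every tuple, so the third attribute is not needed in the key.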
What does optimization affect?
- The cost of communication and of access to secondary storage.
- The cost of storage.
- The cost of computing.
- The optimizer is also involved in updates and deletions.
- The optimization process