Since I have been doing more training of beginning users of the Denodo, there have been a lot of questions around performance best practices and optimization. This article is a quick summary of some of the high points of the Donodo documentation, which are typically useful.
However, I would like to point out that the performance of Denodo views is:
- usually, an ongoing process as your environment evolves and that your code changes
- also, the performance of Denodo views may involve elements be on the Denodo framework itself, such as source system databases
- and may require some administration configuration and reengineering to achieve your full benefits in terms of establishing environment sizing, data movement databases, use of bulk load processes, and maintenance processes (E. G., Scheduled index maintenance, scheduled statistics maintenance)
- furthermore, good general SQL and coding practices have a great deal to do with performance unrelated to the denotative toolset.
Avoid ‘Create View From Query’
Using ‘Create view from Query’ to create base views bypasses the denodo optimization engine and pushes directly to the data source as written.
Make Sure Primary Keys (PK) And Unique Indexes Have Been Set
Accurately setting the primary key (PK) on views (especially, base views): aides:
- The static optimization phase, primary keys, and unique indexes enable Join pruning and Aggregation push-down when appropriate
- The Primary Key is presented to consuming applications and RESTFUL web services
- Allow browsing across the associations of views with Data Catalog.
Mirror Source System Indexes
Aiding source database indexes to denodo base views aids the denodo optimizer to make appropriate decisions. However, avoid adding indexes on the table which do not exist in the source database. This will cause the optimizer to make incorrect execution plans and will undercut performance.
Note: Primary Keys (PK) are enforced by denodo, only used to enable optimization and application capabilities.
Apply Foreign Key (FK) And Referential Constraint Associations
An association represents a foreign key relation. However, when a referential constraint is applied to an association, every row of the ‘Dependent’ view has a matching value in the ‘Principal’ view, which meets the Condition mapping.
Adding Indexes To Cache Views
Adding Primary Keys and Unique indexes to cached tables also aids the optimizer and, if properly maintained, aids normal database operation when querying cache tables.
Gather and Maintain View Statistics
View statistics play an essential role, helping the optimizer make decisions about execution plans and data movement. Statistics are most important for base views and cached, especially the total rows, average size, and distinct view values.
Caching Derived Views
Caching large, long-running, complex views can improve performance and limit source system impacts and Denodo if the cache guidelines are followed. However, to optimize efficient cached views should have Primary Keys, Unique Indexes, performance indexes, and Statistics. See caching Guidelines for additional detail.
Use Effective Joins
Effective joins are essential to performant view. Here are some high-level tips to keep in mind when building joins:
- When possible, use simple join condition
- Join on primary keys or unique indexes
- Leverage Foreign Key (FK) and Primary Keys (PK). Especially when an association referential constraint is defined
- Use Inner joins when Possible
- When using outer joins, Organize joins by data source when using multiple data sources
- Avoid using view parameters and subqueries on the join condition
Use A Building Block Approach
Breaking views into discreet units allows the optimizer more opportuning to optimize SQL’s and performance. Here are a few tips for using the building block approach:
- Create views for different entities (Fact, Dimension, or Subject set)
- Build views for discreet and/or distinct data subsets
- Use SQL tuning rules to arrive at the smallest result set as soon as possible
- Tune each view individually
Let Denodo Determine Optimal Data Movement
Where possible, avoid manually assigning data movement strategy. Letting Denodo determine the optimal data movement strategy (assume other view optimizations have been applied) provides the greatest flexibility as the data changes across time. When precursor views are updated and/or tuned and Prevents errors due to data movement strategy conflicts.
Denodo Reference Links
Best Practices to Maximize Performance I: Modeling Big Data and Analytics Use Cases
Best Practices to Maximize Performance II: Configuring the Query Optimizer
Best Practices to Maximize Performance III: Caching
Best Practices to Maximize Performance IV: Detecting Bottlenecks in a Query