Tuning Your JCR Queries for AEM and Jackrabbit Oak

Application tuning is an activity that every developer and architect will encounter sooner or later, and AEM-based applications are no different. One of the activities that is particularly important with AEM 6 is query tuning. AEM 6 uses Jackrabbit Oak as its repository, so you need to know more about Oak to tune your queries. Apart from tuning the queries themselves, most of the time you will be tuning your indexes to suit the queries. AEM itself relies on many queries internally to function, so it is important to understand how queries and indexes work in AEM.

Update: This post was not written as a generic performance tuning guide for AEM; it covers only query tuning on the Oak repository, although much of an application's performance does depend on how well the repository is tuned. Adobe's recent service packs for AEM 6, and AEM 6.1 in general, are much more stable and performant. Please refer to Adobe's documentation for general guidelines on performance tuning.

While the instructions here are written primarily for AEM applications, non-AEM applications that use Jackrabbit Oak can also benefit.

There are no hard and fast rules guaranteeing that a particular index will work. This is very similar to database indexes: you have to look at the queries currently being fired and the current state of the repository to arrive at the best indexes, and often you get there only over time. Apply an index, re-evaluate your queries, then tune the queries further or try a different index. Repeat these steps until you find the best-performing index.

However, there are some basic guidelines that can help you define the best indexes for your application-specific needs:

Ground Rules (AEM Specific)

  1. Apply the latest Service Pack available and any Oak-specific hotfixes (if applicable). As of this writing, SP2 is publicly available.
  2. AEM has some built-in indexes as well. Do not arbitrarily re-index these OOTB indexes; indexing is a costly and resource-intensive operation. OOTB indexes should be re-indexed only when specifically advised by Adobe (there is a catch: sometimes you know that re-indexing a particular OOTB index will solve the problem at hand, but get buy-in from Adobe first). At other times you will have to create a new index (or index type) because the OOTB / existing index is inefficient.
  3. Be aware that there have been some important changes since Oak 1.0.9, especially related to Lucene indexes, so be wary of reference resources that advise on using Oak indexes. Check Oak's latest documentation on query and indexing.

Tools

  1. If you are using AEM, the best tool for debugging and explaining your queries, and for finding slow and popular ones, is the ACS AEM Tools package (0.0.20 and above). Once it is installed, you can access it at: http://yourinstance/etc/acs-tools/explain-query.html
  2. If that is not possible, you can fall back on JMX: the Felix console lets you identify slow and popular queries on any instance.
  3. You can also use the OSGi service org.apache.jackrabbit.api.jmx.QueryStatManagerMBean to find the slow and popular queries programmatically:
    1. queryStatManagerMBean.getSlowQueries().values()
    2. queryStatManagerMBean.getPopularQueries().values()
  4. Use the following debug logs for details on various aspects of query and indexing:
    1. Enable DEBUG logging on org.apache.jackrabbit.oak.query to see the details of query cost calculation (explain)
    2. Enable DEBUG logging on org.apache.jackrabbit.oak.plugins.index to see the details of indexing
  5. Use JMX to get a high-level overview of the Lucene indexes (OOTB or custom).
  6. To look at the actual indexes, you can stop the AEM / Oak instance and use the oak-run "console" mode to explore the indexes, among various other options:
    • java -jar oak-run-1.1.6.jar console /path/to/crx-quickstart/repository/segmentstore
      (the path to the Oak repository)
    • On the resulting console, run the following command to dump the index to the file system:
      • lc dump path/to/dump /path/to/luceneIndex/inRepository
      • eg.
        lc dump path/to/dump /oak:index/myCustomLuceneIndex
    • Now analyse the resulting dump using the Lucene index analysis tool Luke (the ; classpath separator below is for Windows; use : on Linux or macOS):
      java -XX:MaxPermSize=512m -classpath luke-with-deps.jar;oak-lucene-1.1.6.jar org.getopt.luke.Luke
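
As a minimal sketch of the JMX route mentioned in the Tools list, the following Java class reads the slow and popular query tables through the standard javax.management API rather than the typed MBean interface, so it compiles without any Oak or Jackrabbit dependency. The ObjectName pattern is an assumption about how Oak registers its query statistics MBean and should be checked in your JMX console; the attribute names SlowQueries and PopularQueries follow from the getter names listed above. The class name SlowQueryReport is hypothetical. The code has to run inside the same JVM as the repository (for example from an OSGi component); on a JVM without Oak it simply finds nothing.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.openmbean.TabularData;

public class SlowQueryReport {

    // Assumption: Oak registers its query statistics MBean under this domain and type.
    // Verify the exact name in your JMX console before relying on it.
    static final String QUERY_STAT_PATTERN = "org.apache.jackrabbit.oak:type=QueryStat,*";

    // Reads one tabular attribute from every matching MBean and returns each row as text.
    static List<String> readTable(String attribute) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        List<String> rows = new ArrayList<>();
        try {
            for (ObjectName name : server.queryNames(new ObjectName(QUERY_STAT_PATTERN), null)) {
                Object value = server.getAttribute(name, attribute);
                if (value instanceof TabularData) {
                    for (Object row : ((TabularData) value).values()) {
                        rows.add(String.valueOf(((CompositeData) row).values()));
                    }
                }
            }
        } catch (JMException e) {
            throw new IllegalStateException("Could not read attribute " + attribute, e);
        }
        return rows;
    }

    // Attribute names derived from getSlowQueries() / getPopularQueries().
    static List<String> slowQueries() {
        return readTable("SlowQueries");
    }

    static List<String> popularQueries() {
        return readTable("PopularQueries");
    }

    public static void main(String[] args) {
        System.out.println("Slow queries:");
        slowQueries().forEach(System.out::println);
        System.out.println("Popular queries:");
        popularQueries().forEach(System.out::println);
    }
}
```

Going through the generic MBeanServer keeps the sketch dependency-free; inside an OSGi bundle you would more likely inject the QueryStatManagerMBean service directly and call the getters shown above.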

    Guidelines

    The basic steps for tuning your queries can be summarized as follows:
    1. Identify the slow queries in your system. These are typically the ones that take a long time to execute because no suitable index exists and the repository falls back to node traversal. The system itself can be queried through JMX to identify the slow and popular queries.
    2. Tune your queries to ensure that they can use indexes whenever possible.
    3. Identify the properties and nodes that need to be indexed, based on the slow queries identified, and create Oak indexes for them.
    4. Verify your queries and indexes after creating the indexes. The indexes are optimal when the cost of the identified queries is at its lowest, with 0 being the best.
    5. Repeat steps 3 and 4 above (even creating different indexes for the same query) until you arrive at the most cost-effective index.

    Identification of Slow Queries

    Identify the slow queries by monitoring the error.log. Note the queries that traverse many nodes, where the log may look something like:
        org.apache.jackrabbit.oak.spi.query.Cursors$TraversingCursor Traversed 1000 nodes
        ... consider creating an index or changing the query 
    You can also identify the slowest and most popular queries in the system (which Jackrabbit Oak keeps track of) using JMX or ACS AEM Tools, as described in the "Tools" section above.
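
To make the log check above repeatable, here is a minimal sketch that scans log lines for the traversal warning and keeps those at or above a threshold. It matches only the "Traversed N nodes" fragment shown in the sample above; the class name TraversalLogScanner, its method names, and the threshold of 1000 are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TraversalLogScanner {

    // Matches the fragment of the traversal warning quoted in the sample log line.
    private static final Pattern TRAVERSED = Pattern.compile("Traversed (\\d+) nodes");

    // Returns the node count reported on the line, or -1 if it is not a traversal warning.
    static long traversedNodes(String line) {
        Matcher m = TRAVERSED.matcher(line);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    // Keeps only the lines that report at least `threshold` traversed nodes
    // (threshold is expected to be non-negative).
    static List<String> warningsAtOrAbove(List<String> lines, long threshold) {
        return lines.stream()
                .filter(line -> traversedNodes(line) >= threshold)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to error.log
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        warningsAtOrAbove(lines, 1000).forEach(System.out::println);
    }
}
```

Run it against a copy of error.log and start with the queries behind the largest counts; those are the best candidates for a new index or a rewritten query.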

      Indexing

      You can create multiple indexes for the same query. However, promote to your production instances only the one that proves to be the best at lowering the cost of the query at hand.

      Consider the following when deciding the type of index to be used:
        1. Full Text Index Types
          • Lucene & Solr
          • Aggregate Index - query time aggregation
        2. Lucene Property Index
          • Since Oak 1.0.9, Lucene Property indexes are preferred over other types of indexes
          • Async
          • Combines properties & full-text.
          • Supports property condition, range conditions, ordering, full-text
          • Index-time aggregation - performance improvements
          • Using Lucene & Solr.
          • Follow instructions for creating the index nodes as in the OAK's Lucene documentation.
        3. Property Index
          • Default Indexes
          • Synchronous
          • Unique or Non-Unique
          • eg. UUID
          • property equals query works best
        4. NodeType Index
          • Internally Uses Property Index
          • Uses primaryType and mixins to identify nodes
        5. Traversing Index
          • Traverses Repo
          • Does not store any data
          • Used to retrieve the given path or its child nodes (sometimes the parent nodes as well)
        6. Ordered Index
          • Async
          • Not recommended for more than 1000 nodes
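
To make the difference between a Property Index and a Lucene Property Index more concrete, the sketch below builds the properties of two index definition nodes as plain Java maps, so it runs without a repository. The property names (oak:QueryIndexDefinition, type, propertyNames, unique, async, compatVersion, indexRules, propertyIndex, reindex) reflect my understanding of Oak's index definition format and should be verified against the Oak documentation for your version. The class name, the indexed property myapp:status, and the child node name indexedProperty are hypothetical. In a real repository each map would be a node under /oak:index/.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class IndexDefinitionSketch {

    // Synchronous property index: suited to "property equals value" conditions.
    static Map<String, Object> propertyIndex(String propertyName, boolean unique) {
        Map<String, Object> def = new LinkedHashMap<>();
        def.put("jcr:primaryType", "oak:QueryIndexDefinition");
        def.put("type", "property");
        def.put("propertyNames", Collections.singletonList(propertyName));
        def.put("unique", unique);
        def.put("reindex", true); // asks Oak to build the index on first use
        return def;
    }

    // Asynchronous Lucene property index; nested maps stand in for child nodes.
    static Map<String, Object> lucenePropertyIndex(String nodeType, String propertyName) {
        Map<String, Object> property = new LinkedHashMap<>();
        property.put("name", propertyName);
        property.put("propertyIndex", true);

        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("indexedProperty", property); // hypothetical child node name

        Map<String, Object> rule = new LinkedHashMap<>();
        rule.put("properties", properties);

        Map<String, Object> indexRules = new LinkedHashMap<>();
        indexRules.put(nodeType, rule);

        Map<String, Object> def = new LinkedHashMap<>();
        def.put("jcr:primaryType", "oak:QueryIndexDefinition");
        def.put("type", "lucene");
        def.put("async", "async");
        def.put("compatVersion", 2);
        def.put("indexRules", indexRules);
        def.put("reindex", true);
        return def;
    }

    public static void main(String[] args) {
        System.out.println(propertyIndex("myapp:status", false));
        System.out.println(lucenePropertyIndex("nt:base", "myapp:status"));
    }
}
```

Modelling the definitions as maps is only a convenience for illustration; in practice you would create the nodes in CRXDE or ship them in a content package.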

      Index Management

      1. Currently, the indexes can be found under /oak:index/ on AEM.
      2. Once you have identified the best indexes, plan to install them as a package on AEM.
      3. Re-indexing is usually not needed. However, when you see a degradation in system performance, double-check whether any of your existing indexes needs a re-index.
      4. Re-indexing is a costly and resource-intensive operation. Do it only after an Oak upgrade or when the release notes for a patch or hotfix explicitly call for it.
      5. Avoid putting multiple properties in one index, as it may slow your queries; it is recommended to have one index per property.
      6. Set the system property oak.queryLimitInMemory to 200000 (or a similar value), for example with the JVM argument -Doak.queryLimitInMemory=200000. This caps the number of entries a query may hold in memory; the related property oak.queryLimitReads caps the number of nodes a query may read.

      Finally...

      Jackrabbit Oak is a project under active development, especially around indexing, so please check the official documentation for details. Indexing and tuning take a lot of observation and practice to get right.

      I would recommend watching a webinar on this topic by Thomas Muller and Marcel Reutegger on AEM Gems, which is a bit old (before OAK-1.0.9 was released), but immensely useful to understand indexing on Jackrabbit Oak.

      Happy Indexing!!!

