Employ ADX best practices

Azure Data Explorer (ADX) is an extremely capable tool for exploring "hot data" using the intuitive and powerful Kusto Query Language (KQL). For an overview of its capabilities, see ADX overview.

This article discusses how you can configure ADX for optimal performance.

Managing partitions for optimal performance

ADX provides several mechanisms for configuring its partitioning strategy to meet the needs of your datasets.

Choose an efficient shuffle key

The shuffle query strategy in Azure Data Explorer (ADX) improves performance when your query joins on a high-cardinality key (one with many unique values) and when your query is hitting limits due to data volume.

The shuffle query tells ADX how to group query data and distribute it across the cluster nodes during query execution. ADX partitions the data by the shuffle key, and each node processes one partition.
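To see why the cardinality of the key matters, the following sketch models the distribution step in plain Python. It is an illustrative model only: the SHA-256 hash, the node count, and the key values are assumptions made for the example and say nothing about how ADX hashes or places data internally.

```python
import hashlib
from collections import Counter

def rows_per_node(keys, node_count):
    """Route each row to a node by hashing its key; return the row count per node."""
    load = Counter()
    for key in keys:
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        node = int.from_bytes(digest[:8], "big") % node_count
        load[node] += 1
    return [load[n] for n in range(node_count)]

# High-cardinality key: 10,000 rows, each with a distinct (hypothetical) device ID.
high = rows_per_node((f"device-{i}" for i in range(10_000)), 4)

# Low-cardinality key: the same number of rows, but only two distinct values.
low = rows_per_node(("EU" if i % 2 else "US" for i in range(10_000)), 4)

print("high-cardinality key:", high)
print("low-cardinality key: ", low)
```

With the high-cardinality key every node receives a comparable share of rows. With only two distinct values, at most two nodes can receive any rows at all, so the remaining nodes sit idle.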

You can significantly improve performance by defining an effective shuffle query. Make sure to choose a shuffle key based on the query's summarization, join, make-series and partitioning keys.
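As a sketch of what a shuffle key looks like in a query, the helpers below assemble KQL text that passes the key through the `hint.shufflekey` query hint on `summarize` and `join`. The table names (`Telemetry`, `Devices`) and the column `DeviceId` are hypothetical, and the Python wrapper exists only to make the example self-contained. Check the exact hint syntax against the KQL reference for your cluster version.

```python
def summarize_with_shuffle(table: str, key: str) -> str:
    """Return KQL that counts rows per key and shuffles on that same key."""
    return f"{table}\n| summarize hint.shufflekey = {key} count() by {key}"

def join_with_shuffle(left: str, right: str, key: str) -> str:
    """Return KQL that joins two tables on a key and shuffles on that key."""
    return f"{left}\n| join hint.shufflekey = {key} ({right}) on {key}"

# Hypothetical tables and column, used only for illustration.
print(summarize_with_shuffle("Telemetry", "DeviceId"))
print(join_with_shuffle("Telemetry", "Devices", "DeviceId"))
```

In both cases the shuffle key is the same column the operator groups or joins on, which matches the guidance above to derive the key from the query's summarization or join keys.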

Enhance query performance by defining a custom partitioning policy

You can improve ADX performance for some queries by customizing the partitioning policy. See Supported partitioning scenarios to learn when a partitioning policy is recommended.
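A partitioning policy is a JSON object applied to a table with a management command. The sketch below builds such a command for a hash partition key on a string column. The table name `Telemetry` and column `TenantId` are hypothetical, and the property names (`PartitionKeys`, `Kind`, `Function`, `MaxPartitionCount`) reflect the documented policy shape as best recalled here, so verify them against the partitioning policy reference before applying anything to a cluster.

```python
import json

# KQL multi-line string literals are delimited by three backticks.
KQL_FENCE = "`" * 3

def partitioning_policy_command(table: str, column: str, max_partitions: int = 128) -> str:
    """Build a management command that hash-partitions a table on one string column."""
    policy = {
        "PartitionKeys": [
            {
                "ColumnName": column,
                "Kind": "Hash",
                "Properties": {
                    "Function": "XxHash64",
                    "MaxPartitionCount": max_partitions,
                },
            }
        ]
    }
    body = json.dumps(policy, indent=2)
    return f".alter table {table} policy partitioning {KQL_FENCE}\n{body}\n{KQL_FENCE}"

# Hypothetical table and column, used only for illustration.
print(partitioning_policy_command("Telemetry", "TenantId"))
```

The function only produces the command text. Sending it to a cluster (for example, through a Kusto client library or the web UI) is left out on purpose.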

Use a composite column as the partitioning key

ADX partitions on a single datetime or string column. However, you can partition on a composite string column that combines the values of multiple columns. For example, geospatial clustering can combine the 'longitude' and 'latitude' columns into a single string column.

Improve performance by ingesting and transforming data

ADX performs best when processing local data that has been transformed for optimal query performance.

Ingesting data locally for better performance

You should generally ingest data into ADX rather than query it from external data sources, because queries over ingested data are more efficient.

Transform geospatial data for efficient filtering

Geospatial clustering is a technique used to cluster geographically proximate locations into layers of grids. The grids have cells and each cell is encoded into a short string.

During ingestion, it's possible to define a transformation policy to enrich the data with a geospatial column. This column is then used for filtering, which is more efficient than filtering directly on the latitude and longitude columns.

Note: Setting transformation policies can slow down the ingestion process significantly.
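The cell encoding described in this section can be illustrated with geohash, one well-known scheme that maps a grid cell to a short string. The following is a minimal Python sketch of the standard geohash algorithm, included only to show the idea. It is not ADX's implementation, and inside ADX you would normally rely on a built-in function such as `geo_point_to_geohash()` or `geo_point_to_s2cell()` in the transformation rather than writing your own.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

print(geohash(57.64911, 10.40744, 5))  # u4pru
```

Because a shorter geohash is always a prefix of a longer one for the same point, a filter on the cell column becomes a string equality or prefix match. That is the reason filtering on such a column tends to be cheaper than comparing latitude and longitude ranges directly.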

Use Azure Data Factory to ingest data

Microsoft recommends using Azure Data Factory to ingest data from different data stores. ADF is integrated with ADX and supports data transformations.

ADF's copy activity is suitable for high-volume data ingestion in production scenarios. The copy activity provides high availability and reliability through load balancing, retry logic, and error handling. There's no limit on the amount of data you can ingest using the copy activity.

Managing ADX compute resources

You can customize how ADX compute resources are deployed and configured to support your business requirements.

Provide dedicated compute to different users and/or applications

ADX allows you to create multiple dedicated read-only clusters that are attached to your ADX database. This allows you to provide dedicated query compute for different users and/or applications (for example, you can run ingestion on the leader database while users run queries on a copy). To set this up, attach your database (the "leader database") to a separate ADX cluster as a read-only follower database.

Define capacity policies to prioritize workloads

Use the capacity policy to control how ADX prioritizes different data management operations. For example, to prioritize ingesting data into ADX, you can raise the ingestion core utilization capacity from the default 75% to 100%.

Note: Increasing the core utilization capacity of one data management operation can impact the performance of others.
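The ingestion example above can be sketched as a management command that raises the ingestion core utilization coefficient to 1.0 (100%). The property names `IngestionCapacity` and `CoreUtilizationCoefficient` and the `.alter-merge cluster policy capacity` command are given here as recalled from the capacity policy documentation. Treat them as assumptions and confirm them against the reference before running anything on a cluster.

```python
import json

# KQL multi-line string literals are delimited by three backticks.
KQL_FENCE = "`" * 3

def ingestion_capacity_command(core_utilization: float) -> str:
    """Build a command that sets the share of cores ingestion may use (0 < value <= 1)."""
    if not 0 < core_utilization <= 1:
        raise ValueError("core_utilization must be greater than 0 and at most 1")
    policy = {
        "IngestionCapacity": {
            "CoreUtilizationCoefficient": core_utilization,
        }
    }
    body = json.dumps(policy, indent=2)
    # alter-merge changes only the listed properties and leaves the rest of the policy as is.
    return f".alter-merge cluster policy capacity {KQL_FENCE}\n{body}\n{KQL_FENCE}"

print(ingestion_capacity_command(1.0))
```

Using the merge form keeps the change narrow: only the ingestion coefficient is touched, which makes it easier to revert if other data management operations slow down.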

For more information