This page explains how to use the make_set aggregation function in APL.
The make_set aggregation in APL (Axiom Processing Language) collects the unique values of a column into an array. It is useful when you group your data and want every distinct value of a field for each group, for example when grouping logs, traces, or events by a common attribute for further analysis.
Use make_set when you need the non-repeating values across the rows of a group, such as all unique HTTP methods in web server logs or all unique trace IDs in telemetry data.
If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.
Splunk SPL users
In Splunk SPL, the values function is similar to make_set in APL. Both return the distinct values of a field. The main difference is that values returns them as a multivalue field in lexicographical order, while make_set returns them as an array.
ANSI SQL users
In SQL, functions such as ARRAY_AGG(DISTINCT ...) or the MySQL-specific GROUP_CONCAT(DISTINCT ...) are commonly used to aggregate the unique values in a column. make_set in APL works similarly by aggregating the distinct values of a column into an array, and it is designed to perform well on large datasets.
column: The column from which unique values are aggregated.
limit: (Optional) The maximum number of unique values to return. Defaults to 128 if not specified.
make_set returns an array of unique values from the specified column.
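As a minimal sketch of how the two parameters fit together, the query below caps each set at 10 values. The dataset name ['sample-http-logs'] and the fields method and id are assumptions for illustration, not names defined on this page.

```kusto
// Assumed dataset and field names; replace them with your own.
['sample-http-logs']
| summarize make_set(method, 10) by id  // at most 10 unique methods per id
```

Leaving out the second argument falls back to the default limit.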
In this use case, you want to collect all unique HTTP methods used by each user in the log data.
Query
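A minimal sketch of such a query, assuming the logs live in a dataset named ['sample-http-logs'] (a hypothetical name) with id and method fields:

```kusto
// Dataset name is an assumption; id and method match the output columns.
['sample-http-logs']
| summarize make_set(method) by id
```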
Output
| id | make_set_method |
|---|---|
| user123 | ["GET", "POST"] |
| user456 | ["GET"] |
This query groups the log entries by id and returns all unique HTTP methods used by each user.
In this use case, you want to gather the unique service names involved in a trace.
Query
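A minimal sketch of such a query, assuming the spans live in a dataset named ['otel-demo-traces'] (a hypothetical name) with trace_id and service.name fields:

```kusto
// Dataset name is an assumption. Field names containing dots
// are written in bracket-and-quote form.
['otel-demo-traces']
| summarize make_set(['service.name']) by trace_id
```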
Output
| trace_id | make_set_service.name |
|---|---|
| traceA | ["frontend", "checkoutservice"] |
| traceB | ["cartservice"] |
This query groups the telemetry data by trace_id and collects the unique services involved in each trace.
In this use case, you want to collect all unique HTTP status codes for each country where the requests originated.
Query
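A minimal sketch of such a query, assuming the requests live in a dataset named ['sample-http-logs'] (a hypothetical name) with status and geo.country fields:

```kusto
// Dataset name is an assumption; status and geo.country match the output columns.
['sample-http-logs']
| summarize make_set(status) by ['geo.country']
```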
Output
| geo.country | make_set_status |
|---|---|
| USA | ["200", "404"] |
| UK | ["200"] |
This query collects all unique HTTP status codes returned for each country from which requests were made.
make_list: Similar to make_set, but returns all values, including duplicates, in a list. Use make_list if you want to preserve duplicates.
count: Use count when you need the total count rather than the unique values.
dcount: Use dcount when you need the number of unique values, rather than an array of them.
max: Use max when you are interested in the largest value rather than collecting values.
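To make the contrast between these aggregations concrete, the sketch below computes all of them side by side for each user. The dataset name ['sample-http-logs'] and the fields id, method, and req_duration_ms are assumptions for illustration only.

```kusto
// Assumed dataset and field names; replace them with your own.
['sample-http-logs']
| summarize
    unique_methods = make_set(method),      // distinct values only
    all_methods = make_list(method),        // every value, duplicates kept
    total_requests = count(),               // number of rows in the group
    method_count = dcount(method),          // number of distinct values
    slowest_request = max(req_duration_ms)  // single largest value
  by id
```

For a user with three GET requests and one POST request, unique_methods would hold two entries, all_methods four, total_requests would be 4, and method_count would be 2.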