This page explains how to use the union operator in APL.
The union
operator in APL allows you to combine the results of two or more queries into a single output. The operator is useful when you need to analyze or compare data from different datasets or tables in a unified manner. By using union
, you can merge multiple sets of records, keeping all data from the source tables without applying any aggregation or filtering.
The union
operator is particularly helpful in scenarios like log analysis, tracing OpenTelemetry events, or correlating security logs across multiple sources. You can use it to perform comprehensive investigations by bringing together information from different datasets into one query.
To understand how the union
operator works, consider these datasets:
Server requests
_time | status | method | trace_id |
---|---|---|---|
12:10 | 200 | GET | 1 |
12:15 | 200 | POST | 2 |
12:20 | 503 | POST | 3 |
12:25 | 200 | POST | 4 |
App logs
_time | trace_id | message |
---|---|---|
12:12 | 1 | foo |
12:21 | 3 | bar |
13:35 | 27 | baz |
Performing a union on Server requests
and Application logs
would result in a new dataset with all the rows from both DatasetA
and DatasetB
.
A union of requests and logs would produce the following result set:
_time | status | method | trace_id | message |
---|---|---|---|---|
12:10 | 200 | GET | 1 | |
12:12 | 1 | foo | ||
12:15 | 200 | POST | 2 | |
12:20 | 503 | POST | 3 | |
12:21 | 3 | bar | ||
12:25 | 200 | POST | 4 | |
13:35 | 27 | baz |
This result combines the rows and merges types for overlapping fields.
If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.
Splunk SPL users
In Splunk SPL, the append
command works similarly to the union
operator in APL. Both operators are used to combine multiple datasets. However, while append
in Splunk typically adds one dataset to the end of another, APL’s union
merges datasets while preserving all records.
ANSI SQL users
In ANSI SQL, the UNION
operator performs a similar function to the APL union
operator. Both are used to combine the results of two or more queries. However, SQL’s UNION
removes duplicates by default, whereas APL’s union
keeps all rows unless you use union with=kind=unique
.
T1, T2, T3, ...
: Tables or query results you want to combine into a single output.withsource
: Optional, adds a field to the output where each value specifies the source dataset of the row. Specify the name of this additional field in FieldName
.The union
operator returns all rows from the specified tables or queries. If fields overlap, they are merged. Non-overlapping fields are retained in their original form.
In log analysis, you can use the union
operator to combine HTTP logs from different sources, such as web servers and security systems, to analyze trends or detect anomalies.
Query
Output
_time | id | status | uri | method | geo.city | geo.country | req_duration_ms |
---|---|---|---|---|---|---|---|
2024-10-17 12:34:56 | user123 | 500 | /api/login | GET | London | UK | 345 |
2024-10-17 12:35:10 | user456 | 500 | /api/update-profile | POST | Berlin | Germany | 123 |
This query combines two datasets (HTTP logs and security logs) and filters the combined data to show only those entries where the HTTP status code is 500.
In log analysis, you can use the union
operator to combine HTTP logs from different sources, such as web servers and security systems, to analyze trends or detect anomalies.
Query
Output
_time | id | status | uri | method | geo.city | geo.country | req_duration_ms |
---|---|---|---|---|---|---|---|
2024-10-17 12:34:56 | user123 | 500 | /api/login | GET | London | UK | 345 |
2024-10-17 12:35:10 | user456 | 500 | /api/update-profile | POST | Berlin | Germany | 123 |
This query combines two datasets (HTTP logs and security logs) and filters the combined data to show only those entries where the HTTP status code is 500.
When working with OpenTelemetry traces, you can use the union
operator to combine tracing information from different services for a unified view of system performance.
Query
Output
_time | trace_id | span_id | [‘service.name’] | kind | status_code |
---|---|---|---|---|---|
2024-10-17 12:36:10 | trace-1234 | span-567 | frontend | server | error |
2024-10-17 12:38:20 | trace-7890 | span-345 | frontend | client | error |
This query combines traces from two different datasets and filters them to show only errors occurring in the frontend
service.
For security logs, the union
operator is useful to combine logs from different sources, such as intrusion detection systems (IDS) and firewall logs.
Query
Output
_time | id | status | uri | method | geo.city | geo.country | req_duration_ms |
---|---|---|---|---|---|---|---|
2024-10-17 12:34:56 | user789 | 200 | /api/login | GET | Berlin | Germany | 245 |
2024-10-17 12:40:22 | user456 | 404 | /api/nonexistent | GET | Munich | Germany | 532 |
This query combines web and security logs, then filters the results to show only those records where the request originated from Germany.
This example combines all rows from github-push-event
and github-pull-request-event
without any transformation or filtering.
This example combines the datasets, and then filters the data to only include rows where the method
is GET
.
This example combines the datasets and summarizes the data, counting the occurrences of each combination of content_type
and actor
.
This query combines GitHub pull request event logs and GitHub push events, filters by actions made by github-actions[bot]
, and displays key event details such as time
, repository
, commits
, head
, id
.
This example removes the content_type
and commits
field in the datasets sample-http-logs
and github-push-event
before combining the datasets.
This example performs a union and then filters the resulting set to only include rows where the method
is GET
.
After the union, the result is ordered by the type
field.
This example performs a union and then filters the resulting dataset for rows where content_type
contains the letter a
and city
is seattle
.
After the union, the query calculates the number of unique geo.city
and repo
entries in the combined dataset.
The example below returns the union of all datasets that match the pattern github*
and counts the number of events in each.
To maximize the effectiveness of the union operator in APL, here are some best practices to consider:
union
operator, ensure that the fields being merged have compatible data types.project
or project-away
to include or exclude specific fields. This can improve performance and the clarity of your results, especially when you only need a subset of the available data.This page explains how to use the union operator in APL.
The union
operator in APL allows you to combine the results of two or more queries into a single output. The operator is useful when you need to analyze or compare data from different datasets or tables in a unified manner. By using union
, you can merge multiple sets of records, keeping all data from the source tables without applying any aggregation or filtering.
The union
operator is particularly helpful in scenarios like log analysis, tracing OpenTelemetry events, or correlating security logs across multiple sources. You can use it to perform comprehensive investigations by bringing together information from different datasets into one query.
To understand how the union
operator works, consider these datasets:
Server requests
_time | status | method | trace_id |
---|---|---|---|
12:10 | 200 | GET | 1 |
12:15 | 200 | POST | 2 |
12:20 | 503 | POST | 3 |
12:25 | 200 | POST | 4 |
App logs
_time | trace_id | message |
---|---|---|
12:12 | 1 | foo |
12:21 | 3 | bar |
13:35 | 27 | baz |
Performing a union on Server requests
and Application logs
would result in a new dataset with all the rows from both DatasetA
and DatasetB
.
A union of requests and logs would produce the following result set:
_time | status | method | trace_id | message |
---|---|---|---|---|
12:10 | 200 | GET | 1 | |
12:12 | 1 | foo | ||
12:15 | 200 | POST | 2 | |
12:20 | 503 | POST | 3 | |
12:21 | 3 | bar | ||
12:25 | 200 | POST | 4 | |
13:35 | 27 | baz |
This result combines the rows and merges types for overlapping fields.
If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.
Splunk SPL users
In Splunk SPL, the append
command works similarly to the union
operator in APL. Both operators are used to combine multiple datasets. However, while append
in Splunk typically adds one dataset to the end of another, APL’s union
merges datasets while preserving all records.
ANSI SQL users
In ANSI SQL, the UNION
operator performs a similar function to the APL union
operator. Both are used to combine the results of two or more queries. However, SQL’s UNION
removes duplicates by default, whereas APL’s union
keeps all rows unless you use union with=kind=unique
.
T1, T2, T3, ...
: Tables or query results you want to combine into a single output.withsource
: Optional, adds a field to the output where each value specifies the source dataset of the row. Specify the name of this additional field in FieldName
.The union
operator returns all rows from the specified tables or queries. If fields overlap, they are merged. Non-overlapping fields are retained in their original form.
In log analysis, you can use the union
operator to combine HTTP logs from different sources, such as web servers and security systems, to analyze trends or detect anomalies.
Query
Output
_time | id | status | uri | method | geo.city | geo.country | req_duration_ms |
---|---|---|---|---|---|---|---|
2024-10-17 12:34:56 | user123 | 500 | /api/login | GET | London | UK | 345 |
2024-10-17 12:35:10 | user456 | 500 | /api/update-profile | POST | Berlin | Germany | 123 |
This query combines two datasets (HTTP logs and security logs) and filters the combined data to show only those entries where the HTTP status code is 500.
In log analysis, you can use the union
operator to combine HTTP logs from different sources, such as web servers and security systems, to analyze trends or detect anomalies.
Query
Output
_time | id | status | uri | method | geo.city | geo.country | req_duration_ms |
---|---|---|---|---|---|---|---|
2024-10-17 12:34:56 | user123 | 500 | /api/login | GET | London | UK | 345 |
2024-10-17 12:35:10 | user456 | 500 | /api/update-profile | POST | Berlin | Germany | 123 |
This query combines two datasets (HTTP logs and security logs) and filters the combined data to show only those entries where the HTTP status code is 500.
When working with OpenTelemetry traces, you can use the union
operator to combine tracing information from different services for a unified view of system performance.
Query
Output
_time | trace_id | span_id | [‘service.name’] | kind | status_code |
---|---|---|---|---|---|
2024-10-17 12:36:10 | trace-1234 | span-567 | frontend | server | error |
2024-10-17 12:38:20 | trace-7890 | span-345 | frontend | client | error |
This query combines traces from two different datasets and filters them to show only errors occurring in the frontend
service.
For security logs, the union
operator is useful to combine logs from different sources, such as intrusion detection systems (IDS) and firewall logs.
Query
Output
_time | id | status | uri | method | geo.city | geo.country | req_duration_ms |
---|---|---|---|---|---|---|---|
2024-10-17 12:34:56 | user789 | 200 | /api/login | GET | Berlin | Germany | 245 |
2024-10-17 12:40:22 | user456 | 404 | /api/nonexistent | GET | Munich | Germany | 532 |
This query combines web and security logs, then filters the results to show only those records where the request originated from Germany.
This example combines all rows from github-push-event
and github-pull-request-event
without any transformation or filtering.
This example combines the datasets, and then filters the data to only include rows where the method
is GET
.
This example combines the datasets and summarizes the data, counting the occurrences of each combination of content_type
and actor
.
This query combines GitHub pull request event logs and GitHub push events, filters by actions made by github-actions[bot]
, and displays key event details such as time
, repository
, commits
, head
, id
.
This example removes the content_type
and commits
field in the datasets sample-http-logs
and github-push-event
before combining the datasets.
This example performs a union and then filters the resulting set to only include rows where the method
is GET
.
After the union, the result is ordered by the type
field.
This example performs a union and then filters the resulting dataset for rows where content_type
contains the letter a
and city
is seattle
.
After the union, the query calculates the number of unique geo.city
and repo
entries in the combined dataset.
The example below returns the union of all datasets that match the pattern github*
and counts the number of events in each.
To maximize the effectiveness of the union operator in APL, here are some best practices to consider:
union
operator, ensure that the fields being merged have compatible data types.project
or project-away
to include or exclude specific fields. This can improve performance and the clarity of your results, especially when you only need a subset of the available data.