Group By

Use Group By to combine and aggregate data based on a selected field. The results of the operation are added to the project as a newly derived dataset.

Potential use cases for group by operations include:

  • Computing the average air quality statistics of a static location over a period of time.
  • Retrieving the total count of a specific feature on the map.
  • Calculating the distribution of a COVID-19 outbreak for any given region.

Run a Group By Operation

Follow these steps to run a group by operation in Studio:

  1. Navigate to the Columns tab.
  2. Click ⋮ More Options >> Group By on the column to base the group by operation on.

The selected column for the group by operation will appear at the top of the Group By panel. Click the selected column to choose a new column for the group by operation.

In this example, a dataset containing trees in San Francisco is grouped by h3_9, a column containing H3 index strings.


Selecting a column to use in a group by operation.
  1. Define aggregation rules for the group by operation.

A panel appears with fields to define the aggregation rules for combining data in each column.

First, select/unselect fields to include/exclude in the grouped dataset. If your dataset contains many columns you wish to remove, click Unselect all and then select only the columns you wish to include.

Next, change the aggregation methods for any column by clicking the purple aggregation text. Based on the data type of each column, Studio will provide a list of all applicable aggregation methods.

Defining aggregation methods for the Group By.

Defining aggregation methods for the Group By.

  1. Click Run to run the group by operation.

A window will appear containing a preview of the grouped dataset. If you notice any issues with the grouped dataset, alter the aggregation rules then click Rerun.

  1. Click Confirm to add the grouped dataset to the project.

The grouped dataset will appear connected to the source dataset. Studio may also automatically generate a new layer for the grouped dataset. You may start configuring this dataset to visualize the newly aggregated data.

In the example below, the newly aggregated count TreeID column visualizes the distribution of street trees throughout San Francisco in H3 cells at resolution 9. Green areas represent areas with more street trees, while brown areas represent areas with fewer street trees. The tooltip may also contain any field from the grouped dataset.

SF Tree data bucketed into H3 cells, colored by count.

SF Tree data bucketed into H3 cells, colored by count.

Available Group By Operations

Refer to the table below to see the supported aggregation options based on the data type of the columns in your dataset.

Data Types
Aggregation Method Numerical Time Categorical (String) Boolean Geometry H3
COUNT ✔️ ✔️ ✔️ ✔️ ✔️
SUM ✔️
MEAN ✔️
MAX ✔️ ✔️
MIN ✔️ ✔️
DEVIATION ✔️
VARIANCE ✔️
PERCENTILE (05, 25, 50, 75, 95) ✔️
MODE ✔️ ✔️ ✔️ ✔️
UNIQUE ✔️ ✔️ ✔️
MERGE ✔️ ✔️

Spatial Merge

Use spatial merge with geojson or h3 data to dissolve borders or otherwise combine geometries based on a common field. When merging geojson data, all polygons, LineStrings, Points, and other features will be merged. When merging h3 data, all h3 indexes that share the field will be merged.

Follow these steps to complete a spatial merge:

  1. Start a new group by operation, then select a field shared by the geometries you wish to merge.
  2. Choose any other columns you wish to include in the group-by operation.
  3. Select the Merge aggregation option for your polygon or h3 column.

Upon competing the group by operation, a new dataset will appear with your merged spatial objects.

Example


Grouping US election data by number number of party-controlled congressional districts.

In the above example, a dataset containing election data is loaded into Studio. This election data has the column _geojson, which contains GeoJSON MultiPolygon objects representing congressional districts, as well as a column PARTY_LEADING, indicating which party controls the congressional district.

To view the number of congressional districts controlled by an individual party, we run a group by operation on the STATENAME field. Then, we choose the Merge aggregation option on our _geojson column and Mode on our PARTY_LEADING column.

Upon completing the merge, a new layer is automatically generated with the new Merge _geojson column. We can quickly configure this layer to display the data calculated by the group by operation. In this case, we alter the color scale to have just two steps, then select the new Mode PARTY_LEADING column to fill the polygons. This map now shows the party that controls the most congressional districts in each US state.

Grouped Dataset Storage

The derived datasets created via group by operations are loaded with the project and are not automatically saved to your Studio dataset cloud storage.

If you wish to use a grouped dataset in other projects, use Studio's Extract feature.


Sign In