Announcing Apache Pinot 0.9

Mark Needham
Nov 18, 2021

We are excited to announce the release this week of Apache Pinot 0.9. Apache Pinot is a real-time distributed datastore designed to answer OLAP queries with high throughput and low latency.

This release is cut from commit 13c9ee9556498bb6dc4ab60734743edb8b89773c and contains the segment merge and roll up task, improvements to the Query Console UI, new SQL functions, as well as bug fixes and performance improvements. You can find a full list of everything included in the release notes.

Let’s have a look at some of the changes, with the help of the batch, realtime, and hybrid QuickStart configurations.

Segment Merge and Roll-up

Seunghyun Lee and Jiapeng Tao operate a large multi-tenant cluster at LinkedIn that serves a business metrics dashboard and noticed that their tables consisted of millions of small segments. This was leading to slow operations in Helix/Zookeeper, long running queries due to having too many tasks to process, as well as using more space because of a lack of compression.

To solve this problem they added the Segment Merge task, which compresses segments based on timestamps and rolls up/aggregates older data. The task can be run on a schedule or triggered manually via the Pinot REST API.

At the moment this feature is only available for offline tables but will be added for real-time tables in a future release. You can also watch a meetup presentation where Seunghyun and Jiapeng explain this feature in more detail.

Query Console UI

This release also sees improvements to Pinot’s query console UI.

Hristo Stoyanov made the SQL editor box expandable, Sanket Shah added tooltips, and Priyenbhai Patel made it possible to execute a query by pressing Cmd + Enter. You can see the changes in the animation below:

pinot query editor

New SQL functions

Yupeng Fu added the IN function, which returns a boolean that indicates if a value is contained in a list of values. An example of how to use this function is shown below:

SELECT yearID IN (1871, 1872, 1873) AS earlyYears, count(*)
FROM baseballStats
GROUP BY earlyYears

Atri Sharma added the LIKE predicate, which can be used to filter string columns:

SELECT count(*)
FROM baseballStats
WHERE playerID LIKE 'aa%'

Yash Agarwal added the MODE function, which returns the most frequent value from all rows in a column. If we want to find the most frequent yearID before 2012, we could run the following query:

FROM baseballStats
WHERE yearID < 2012

Most frequent yearID

We can also pass in an optional second parameter of ‘MIN’, ‘MAX’, or ‘AVG’, which will be used if we have multiple values with the same frequency. In this case, 2007 and 2008 appear the same number of times, so passing in AVG will return the average of those values:

FROM baseballStats
WHERE yearID < 2012

Most frequent yearID average

Lakshmanan Velusamy added functions that return serialized/raw values of the PercentileEst (QuantileDigest) and PercentileTDigest (TDigest) data structures. We can return the raw values at the 50th percentile by running the following query:

FROM baseballStats

These functions can also be used on multi value columns:

FROM airlineStats

New SQL Geo functions

Xiaotian (Jackie) Jiang added Geo scalar functions that convert from the WKT or WKB format to geometry/geography and from the geometry to WKB format:

SELECT stAsBinary(location),
FROM meetupRsvp

Geo scalar functions

Yupeng Fu added the geoToH3 function, which returns the H3 index address of a given point or lat/long coordinates:

SELECT group_name,
       geoToH3(group_lat, group_lon, 6),
       geoToH3(location, 6)
FROM meetupRsvp

GeoToH3 function

Other features

Bug fixes and optimizations


If you want to try out Apache Pinot, the following resources will help you get started:


Don’t miss the latest updates!

Get totally non-spammy, super helpful tips, stories, and resources from our network