Share This

Mark Needham

Mark Needham

November 18, 2021

We are excited to announce the release this week of Apache Pinot 0.9. Apache Pinot is a real-time distributed datastore designed to answer OLAP queries with high throughput and low latency.

This release is cut from commit 13c9ee9556498bb6dc4ab60734743edb8b89773c and contains the segment merge and roll up task, improvements to the Query Console UI, new SQL functions, as well as bug fixes and performance improvements. You can find a full list of everything included in the release notes.

Let’s have a look at some of the changes, with the help of the batch, realtime, and hybrid QuickStart configurations

Segment Merge and Roll-up

Seunghyun Lee and Jiapeng Tao operate a large multi-tenant cluster at LinkedIn that serves a business metrics dashboard, and noticed that their tables consisted of millions of small segments. This was leading to slow operations in Helix/Zookeeper, long running queries due to having too many tasks to process, as well as using more space because of a lack of compression.

To solve this problem they added the Segment Merge task, which compresses segments based on timestamps and rolls up/aggregates older data. The task can be run on a schedule or triggered manually via the Pinot REST API.

At the moment this feature is only available for offline tables, but will be added for real-time tables in a future release. You can also watch a meetup presentation where Seunghyun and Jiapeng explain this feature in more detail.

Query Console UI

This release also sees improvements to Pinot’s query console UI.

Hristo Stoyanov made the SQL editor box expandable, Sanket Shah added tooltips, and Priyenbhai Patel made it possible to execute a query by pressing Cmd + Enter. You can see the changes in the animation below:

Pinot Query Editor

New SQL functions

Yupeng Fu added the IN function, which returns a boolean that indicates if a value is contained in a list of values. An example of how to use this function is shown below:

SELECT yearID IN (1871, 1872, 1873) AS earlyYears, count(*)
FROM baseballStats
GROUP BY earlyYears
LIMIT 10

Atri Sharma added the LIKE predicate, which can be used to filter string columns:

SELECT count(*)
FROM baseballStats
WHERE playerID LIKE 'aa%'
LIMIT 100

Yash Agarwal added the MODE function, which returns the most frequent value from all rows in a column. If we want to find the most frequent yearID before 2012, we could run the following query:

SELECT MODE(yearID)
FROM baseballStats
WHERE yearID < 2012
Most frequent yearID

Most frequent yearID

We can also pass in an optional second parameter of ‘MIN’, ‘MAX’, or ‘AVG’, which will be used if we have multiple values with the same frequency. In this case 2007 and 2008 actually appear the same number of times, so passing in AVG will return the average of those values:

SELECT MODE(yearID, 'AVG')
FROM baseballStats
WHERE yearID < 2012
Most frequent yearID (average)

Most frequent yearID (average)

Lakshmanan Velusamy added functions that return serialized/raw values of the PercentileEst (QuantileDigest) and PercentileTDigest (TDigest) data structures. We can return the raw values at the 50th percentile by running the following query:

SELECT PERCENTILERAWEST50(runs), 
       PERCENTILERAWTDIGEST50(runs)
FROM baseballStats

These functions can also be used on multi value columns:

SELECT PERCENTILERAWEST50MV(DivAirportIDs), 
       PERCENTILERAWTDIGEST50MV(DivAirportIDs)
FROM airlineStats

New SQL Geo functions

Xiaotian (Jackie) Jiang added Geo scalar functions that convert from the WKT or WKB format to geometry/geography and from the geometry to WKB format:

SELECT stAsBinary(location), 
       stGeomFromWKB(stAsBinary(location)), 
       stGeogFromWKB(stAsBinary(location)), 
       stGeomFromText(stAsText(location)),
       stGeogFromText(stAsText(location))
FROM meetupRsvp 
LIMIT 10
Geo scalar functions

Geo scalar functions

Yupeng Fu added the geoToH3 function, which returns the H3 index address of a given point or lat/long coordinates:

SELECT group_name,
       group_lat,
       group_lon,
       venue_name,
       stAsText(location),
       geoToH3(group_lat, group_lon, 6),
       geoToH3(location, 6)
FROM meetupRsvp 
LIMIT 5
geoToH3 function

geoToH3 function

Other features

Bug fixes and optimizations

Resources

If you want to try out Apache Pinot, the following resources will help you get started:


Share This

Don’t miss the latest updates!

Receive release and feature updates, news, events, and more in our monthly newsletter!