Announcing Apache Pinot 0.9

Mark Needham
ByWritten byMark Needham
November 28, 20215 minutes read

We are excited to announce the release this week of Apache Pinot 0.9. Apache Pinot is a real-time distributed datastore designed to answer OLAP queries with high throughput and low latency.

This release is cut from commit 13c9ee9556498bb6dc4ab60734743edb8b89773c and contains the segment merge and roll up task, improvements to the Query Console UI, new SQL functions, as well as bug fixes and performance improvements. You can find a full list of everything included in the release notes.

Let’s have a look at some of the changes, with the help of the batch, realtime, and hybrid QuickStart configurations.

Segment Merge and Roll-up

Seunghyun Lee and Jiapeng Tao operate a large multi-tenant cluster at LinkedIn that serves a business metrics dashboard and noticed that their tables consisted of millions of small segments. This was leading to slow operations in Helix/Zookeeper, long running queries due to having too many tasks to process, as well as using more space because of a lack of compression.

To solve this problem they added the Segment Merge task, which compresses segments based on timestamps and rolls up/aggregates older data. The task can be run on a schedule or triggered manually via the Pinot REST API.

At the moment this feature is only available for offline tables but will be added for real-time tables in a future release. You can also watch a meetup presentation where Seunghyun and Jiapeng explain this feature in more detail.

Query Console UI

This release also sees improvements to Pinot’s query console UI.

Hristo Stoyanov made the SQL editor box expandable, Sanket Shah added tooltips, and Priyenbhai Patel made it possible to execute a query by pressing Cmd + Enter. You can see the changes in the animation below:

Query console UI improvements in Apache Pinot 0.9

New SQL functions

Yupeng Fu added the IN function, which returns a boolean that indicates if a value is contained in a list of values. An example of how to use this function is shown below:

SELECT yearID IN (1871, 1872, 1873) AS earlyYears, count(*)
FROM baseballStats
GROUP BY earlyYears
LIMIT 10
Copy

Atri Sharma added the LIKE predicate, which can be used to filter string columns:

SELECT count(*)
FROM baseballStats
WHERE playerID LIKE 'aa%'
LIMIT 100
Copy

Yash Agarwal added the MODE function, which returns the most frequent value from all rows in a column. If we want to find the most frequent yearID before 2012, we could run the following query:

SELECT MODE(yearID)
FROM baseballStats
WHERE yearID < 2012
Copy

Sample query result for the most frequent yearID before 2012 using improved features in Apache Pinot 0.9

We can also pass in an optional second parameter of ‘MIN’, ‘MAX’, or ‘AVG’, which will be used if we have multiple values with the same frequency. In this case, 2007 and 2008 appear the same number of times, so passing in AVG will return the average of those values:

SELECT MODE(yearID, 'AVG')
FROM baseballStats
WHERE yearID < 2012
Copy

Sample ‘AVG’ parameter query result in Apache Pinot 0.9

Lakshmanan Velusamy added functions that return serialized/raw values of the PercentileEst (QuantileDigest) and PercentileTDigest (TDigest) data structures. We can return the raw values at the 50th percentile by running the following query:

SELECT PERCENTILERAWEST50(runs),
       PERCENTILERAWTDIGEST50(runs)
FROM baseballStats
Copy

These functions can also be used on multi value columns:

SELECT PERCENTILERAWEST50MV(DivAirportIDs),
       PERCENTILERAWTDIGEST50MV(DivAirportIDs)
FROM airlineStats
Copy

New SQL Geo functions

Xiaotian (Jackie) Jiang added Geo scalar functions that convert from the WKT or WKB format to geometry/geography and from the geometry to WKB format:

SELECT stAsBinary(location),
       stGeomFromWKB(stAsBinary(location)),
       stGeogFromWKB(stAsBinary(location)),
       stGeomFromText(stAsText(location)),
       stGeogFromText(stAsText(location))
FROM meetupRsvp
LIMIT 10
Copy

Sample query results for geo scalar functions in Apache Pinot 0.9

Yupeng Fu added the geoToH3 function, which returns the H3 index address of a given point or lat/long coordinates:

SELECT group_name,
       group_lat,
       group_lon,
       venue_name,
       stAsText(location),
       geoToH3(group_lat, group_lon, 6),
       geoToH3(location, 6)
FROM meetupRsvp
LIMIT 5
Copy

Sample H3 index address query results in Apache Pinot 0.9

Other features

Bug fixes and optimizations

Resources

If you want to try out Apache Pinot, the following resources will help you get started:

Apache Pinot