What Is Apache Pinot?

Apache Pinot is an open source distributed database designed for real-time, user-facing analytics. It is an Online Analytical Processing (OLAP) database capable of low-latency query execution even at extremely high throughput. Pinot can ingest directly from event streaming sources such as Apache Kafka and make events available for querying immediately. It can also ingest data from Online Transaction Processing (OLTP) databases using Change Data Capture (CDC), or from batch sources such as data warehouses and cloud object stores. Queries are written in a subset of SQL.
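As a rough sketch of what querying looks like in practice, the Python below builds an HTTP request that sends a SQL statement to a Pinot broker. It assumes a broker at `localhost:8099` (a common default port) exposing a `/query/sql` endpoint that accepts a JSON body of the form `{"sql": "..."}`; the `pageviews` table and its columns are hypothetical. The request is built but not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request

# Assumption: a local Pinot broker on its default port, with the SQL query endpoint.
BROKER_URL = "http://localhost:8099/query/sql"


def build_query_request(sql: str, broker_url: str = BROKER_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a SQL query for a Pinot broker."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        broker_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical table and columns, used only for illustration.
req = build_query_request(
    "SELECT country, COUNT(*) AS views FROM pageviews "
    "GROUP BY country ORDER BY views DESC LIMIT 10"
)
print(req.get_method(), req.full_url)

# Against a live broker you would send it and parse the JSON reply, e.g.:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(result["resultTable"]["rows"])  # field names per our reading of the response format
```

Keeping request construction separate from sending makes the query easy to inspect or log; check the Pinot documentation for the exact response layout before relying on specific fields.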

Apache Pinot is a top-level project of the Apache Software Foundation (ASF).
Users can find out more about Apache Pinot at its official website: pinot.apache.org.

Advantages of Apache Pinot

Highly Scalable

Apache Pinot is a distributed system that can span thousands of nodes. These nodes act together as a single system when responding to queries.
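Behind this single-system view, Pinot splits each table into segments spread across servers; a broker sends a query to the servers hosting the relevant segments and merges their partial results. The toy, in-process sketch below models that scatter-gather pattern for a `COUNT(*) ... GROUP BY country` query. The server names, data, and functions are hypothetical, and a real cluster exchanges these results over the network.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]


def server_partial_count(segments: List[List[Row]], group_col: str) -> Counter:
    """Server-side step: aggregate only the segments this server hosts."""
    partial: Counter = Counter()
    for segment in segments:
        for row in segment:
            partial[row[group_col]] += 1
    return partial


def broker_merge(partials: List[Counter]) -> Counter:
    """Broker-side step: merge per-server partial counts into the final answer."""
    total: Counter = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts rather than replacing them
    return total


# Hypothetical cluster: each server hosts a list of segments; each segment is a list of rows.
servers = {
    "server-1": [[{"country": "US"}, {"country": "IN"}], [{"country": "US"}]],
    "server-2": [[{"country": "IN"}, {"country": "BR"}]],
}

# Scatter: every server computes a partial result independently.
partials = [server_partial_count(segs, "country") for segs in servers.values()]
# Gather: the broker combines them.
result = broker_merge(partials)
print(dict(result))  # {'US': 2, 'IN': 2, 'BR': 1}
```

Because each server only returns small partial aggregates, adding servers spreads the scanning work without growing the amount of data the broker has to merge by much.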

High Performance

Ingests real-time event streams at millions of events per second and makes the data queryable immediately, returning low-latency results without relying on data caching.

High Concurrency

Supports user-facing analytics, running hundreds of thousands of simultaneous queries across your data without performance bottlenecks.

Fault Tolerant

Automatically replicates and distributes data between nodes, using Apache Helix for highly resilient cluster management.
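Fault tolerance of this kind comes from keeping each piece of data (in Pinot, a segment) on more than one server, so losing a node does not make data unavailable. The sketch below is a hypothetical round-robin placement, not Pinot's actual segment assignment strategy, which Pinot's controller and Apache Helix manage; the server and segment names and the `replication` parameter are illustrative.

```python
from typing import Dict, List


def assign_replicas(segments: List[str], servers: List[str], replication: int) -> Dict[str, List[str]]:
    """Place each segment on `replication` distinct servers, rotating the start server per segment."""
    if not 1 <= replication <= len(servers):
        raise ValueError("replication must be between 1 and the number of servers")
    assignment: Dict[str, List[str]] = {}
    for i, segment in enumerate(segments):
        # Take `replication` consecutive servers starting at offset i (wrapping around);
        # they are distinct because replication <= len(servers).
        assignment[segment] = [servers[(i + r) % len(servers)] for r in range(replication)]
    return assignment


def still_queryable(assignment: Dict[str, List[str]], failed: str) -> bool:
    """True if every segment keeps at least one replica on a server other than `failed`."""
    return all(any(host != failed for host in hosts) for hosts in assignment.values())


servers = ["server-0", "server-1", "server-2"]
segments = [f"segment_{n}" for n in range(4)]
assignment = assign_replicas(segments, servers, replication=2)
for seg, hosts in assignment.items():
    print(seg, hosts)

# With two replicas per segment, any single server can fail without losing data.
print(all(still_queryable(assignment, s) for s in servers))  # True
```

With `replication=1`, by contrast, the failure of any server that hosts a segment leaves that segment unavailable.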

Flexible Indexing

Supports multiple types of indexes, including its unique star-tree index, which pre-aggregates data so users can tune the trade-off between storage space and query latency.
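The star-tree index works by pre-aggregating metrics across combinations of dimension values, where a special "star" value stands for "all values of this dimension". A real star-tree does not materialize every combination: it builds a tree along a configured dimension split order and stops splitting nodes that hold few enough records, which is where the user-tunable trade-off comes from. The simplified sketch below skips the tree and materializes a full pre-aggregated cube, only to show why a filter such as `country = 'US'` becomes a single lookup; the data, dimension names, and function are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

STAR = "*"  # stands for "all values of this dimension"


def build_preaggregates(rows: List[dict], dims: List[str], metric: str) -> Dict[Tuple[str, ...], int]:
    """Pre-aggregate SUM(metric) for every mix of concrete dimension values and STAR.

    Simplified: this materializes the full cube, whereas a real star-tree
    materializes only the nodes its configuration calls for.
    """
    agg: Dict[Tuple[str, ...], int] = defaultdict(int)
    for row in rows:
        # Each dimension contributes either the row's own value or STAR,
        # so each row feeds 2 ** len(dims) aggregate keys.
        for key in product(*[(row[d], STAR) for d in dims]):
            agg[key] += row[metric]
    return dict(agg)


rows = [
    {"country": "US", "browser": "chrome", "views": 10},
    {"country": "US", "browser": "safari", "views": 5},
    {"country": "IN", "browser": "chrome", "views": 7},
]
pre = build_preaggregates(rows, ["country", "browser"], "views")

# SUM(views) WHERE country = 'US': one lookup instead of scanning rows
print(pre[("US", STAR)])      # 15
# SUM(views) WHERE browser = 'chrome'
print(pre[(STAR, "chrome")])  # 17
# SUM(views) over the whole table
print(pre[(STAR, STAR)])      # 22
```

The full cube grows as the product of each dimension's cardinality plus one, which is exactly the storage cost a star-tree's configuration lets you cap.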

Fast Column Store

Stores data in a columnar format, well suited to efficient OLAP workloads. It also supports smart query routing and aggregation optimizations.
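A columnar layout lets a query read only the columns it references, and encoding a column against a sorted dictionary of its distinct values turns string comparisons into small-integer comparisons. Pinot dictionary-encodes columns by default (raw encoding is also available); the sketch below is a hypothetical in-memory illustration of the idea, not Pinot's on-disk segment format.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return a sorted dictionary of distinct values and the column rewritten as dictionary ids."""
    dictionary = sorted(set(values))
    id_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [id_of[v] for v in values]


def sum_where_equals(dictionary: List[str], ids: List[int], metric: List[int], value: str) -> int:
    """SUM(metric) WHERE column = value, reading only the two columns involved."""
    # Resolve the predicate to a dictionary id once (binary search on the sorted dictionary).
    pos = bisect_left(dictionary, value)
    if pos == len(dictionary) or dictionary[pos] != value:
        return 0  # value never occurs, so no row can match
    # Then compare integer ids instead of strings, row by row.
    return sum(m for i, m in zip(ids, metric) if i == pos)


# Hypothetical row-oriented input ...
rows = [
    {"country": "US", "browser": "chrome", "views": 10},
    {"country": "IN", "browser": "chrome", "views": 7},
    {"country": "US", "browser": "safari", "views": 5},
    {"country": "BR", "browser": "firefox", "views": 3},
]
# ... pivoted into one array per column.
columns: Dict[str, list] = {name: [r[name] for r in rows] for name in rows[0]}
country_dict, country_ids = dictionary_encode(columns["country"])

print(country_dict)  # ['BR', 'IN', 'US']
print(country_ids)   # [2, 1, 2, 0]
print(sum_where_equals(country_dict, country_ids, columns["views"], "US"))  # 15
```

Note that the `browser` column is never touched by the query, which is the main I/O saving a column store offers for analytical workloads.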

What Makes Apache Pinot so Fast?

Chapter 1: Query Lifecycle and Optimization Techniques

Discover Apache Pinot’s query lifecycle and optimization techniques, plus dive into its architecture.

Chinmay Soman + 1

Chapter 2: The Power of Indexing

Learn how Pinot uses indexes to optimize different kinds of data queries.

Chinmay Soman + 1

How Does Apache Pinot Compare to Other Real-Time Analytics Databases?

Comparing Three Real-Time OLAP Databases: Apache Pinot, Apache Druid, and ClickHouse

In this article, we attempt to provide a fair comparison of the three technologies, along with areas of strength and opportunities for improvement for each.

Chinmay Soman + 1

Discover More

Apache Pinot resources

Apache Pinot case studies

How Can I Get Started?

StarTree Developer

Our developer website will help you get started.

Apache Pinot Official Website

The official project website, hosted by the Apache Software Foundation.

Who uses it?

Apache Pinot is used across many organizations and industries. Here are just a few of the industries in which organizations have adopted Apache Pinot in production:

Social Media Platforms, Online Collaboration Tools, Banking and Financial Services, Video and Audio Streaming Services, Cybersecurity, Mobile Telephony, Retail Shopping and Payment Systems, Adtech and Martech, Transportation and Delivery Services

History

Traditional OLAP databases were oriented to batch processing of data: they received periodic dumps or syncs from Online Transaction Processing (OLTP) databases that occurred only every few hours or even once a day. As the field of “big data” grew, the need to shorten cycle times and handle orders of magnitude more data meant that old batch methodologies gave way to newer systems with ever-shorter windows for data updates. Software designers sped up this cycle through a process known as “microbatching,” which shortened the update cycle to every few seconds or minutes. But it still wasn’t “real-time.”

This was becoming an increasingly urgent problem in data-intensive organizations that had already implemented real-time event streaming systems such as Apache Kafka, which could produce millions of events per second. They needed complementary OLAP databases designed specifically for the real-time event streaming architectures that Apache Kafka enabled.

LinkedIn was the birthplace of Apache Kafka (incubated 2011, graduated 2012). Kafka’s widespread internal use at LinkedIn meant there was both a ready audience for a real-time analytical database and the technical infrastructure to support one within an event streaming data architecture. A team headed by Kishore Gopalakrishna created Pinot in 2014. Its first use case was to power the “Who’s Viewed Your Profile” feature. It was first announced to the world in a 2014 blog post, “Real-time Analytics at Massive Scale with Pinot”. In 2015 the project was open sourced.

By 2018 Pinot had entered incubation at the Apache Software Foundation (ASF), from which it graduated in 2021, becoming an Apache top-level project: Apache Pinot. By this time, in addition to LinkedIn, other pioneering organizations had begun to adopt it for their own use cases: Amazon-Eero, DoorDash, Factual/Foursquare, Stripe, Uber, Walmart, Weibo, WePay, and others.

Kishore Gopalakrishna eventually left LinkedIn, and, with a team of co-founders, created StarTree. StarTree Cloud is a Database-as-a-Service (DBaaS) powered by Apache Pinot, built to provide a fully managed platform for real-time analytics. By removing the burden of infrastructure management, companies can focus on delivering real-time insights to their end users.