SQL (beta)

Last updated:

|Edit this page

SQL insights enable you to directly access data in PostHog using SQL queries. They're powered by HogQL.

SQL commands

You create SQL queries out of a list of commands that shape what data, filters, aggregations, and ordering we want to see.

SQL queries in PostHog don’t require the trailing semi-colon (;) of traditional SQL queries.

SELECT

Use SELECT to select data (usually columns, transformations, or aggregations) from one or more tables in the database.

SQL
SELECT *
FROM events

Common values to select are * (representing all), event, timestamp, properties, and functions like count(). You can access properties using dot notation like person.properties.$initial_browser. These values can be found in the data management properties tab or inside tables in the database warehouse tab.

Add the DISTINCT clause to SELECT commands to keep only unique rows in query results.

SQL
SELECT DISTINCT person_id
FROM events

FROM

Use FROM to select the database table to run the queries on. In PostHog, examples include events, groups, raw_session_replay_events, and more listed in the data management database tab in PostHog.

SQL
SELECT session_id, min_first_timestamp, click_count
FROM raw_session_replay_events

JOIN

You can query over multiple tables together using the LEFT JOIN or INNER JOIN commands which takes one table before the command and another after the command and combines them based on the join condition using the ON keyword.

SQL
SELECT events.event, persons.is_identified
FROM events
LEFT JOIN persons ON events.person_id = persons.id

This is especially useful when querying using the data warehouse and querying external sources. For example, once you set up the Stripe connector, you can query for a count of events from your customers like this:

SQL
SELECT events.distinct_id, COUNT() AS event_count
FROM events
INNER JOIN prod_stripe_customer ON events.distinct_id = prod_stripe_customer.email
GROUP BY events.distinct_id
ORDER BY event_count DESC

WHERE

Use WHERE to filter rows based on specified conditions. These conditions can be:

  1. Comparison operators like =, <, or >=
  2. Logical operators like AND, OR, or NOT. These are often used to combine multiple conditions.
  3. Functions like toDate, today()
  4. Clauses like LIKE, IN, IS NULL, BETWEEN
SQL
SELECT *
FROM events
WHERE event = '$pageview'
AND toDate(timestamp) = today()
AND properties.$current_url LIKE '%/blog%'

WHERE is also useful for querying across multiple tables. For example, if you have the Hubspot connector set up, you can get a count of events for contacts with a query like this:

SQL
SELECT COUNT() AS event_count, distinct_id
FROM events
WHERE distinct_id IN (SELECT email FROM hubspot_contacts)
GROUP BY distinct_id
ORDER BY event_count DESC

GROUP BY

Use GROUP BY to group rows that have the same values in specified columns into summary rows.

SQL
select
properties.$os,
count()
from events
group by
properties.$os

ORDER BY

Use ORDER BY to sort the query results by one or more columns. You can specify order by ascending with ASC or descending with DESC.

SQL
SELECT
properties.$browser,
count()
FROM events
GROUP BY properties.$browser
ORDER BY count() DESC

AS

Use AS to alias columns or tables with different names. This makes the query and results more readable.

SQL
SELECT
properties.$current_url as current_url,
count() as current_url_count
FROM events
GROUP BY current_url
ORDER BY current_url_count DESC

LIMIT

Use LIMIT to restrict the number of rows returned by the query. It specifies the maximum number of rows the query should retrieve. By default, PostHog sets it at 100.

SQL
SELECT
properties.$lib as library,
count() as library_count
FROM events
WHERE properties.$lib != ''
GROUP BY library
ORDER BY library_count DESC
LIMIT 1

HAVING

Use HAVING with the GROUP BY command to filter the results based on aggregate function values. While WHERE filters rows before grouping, HAVING filters grouped results after aggregation.

SQL
SELECT
properties.$os,
count()
FROM events
GROUP BY
properties.$os
HAVING count() > 100

WITH

Use WITH to define a temporary result set that you can reference within a larger query. It helps break down complex queries into smaller parts. You can think of it as a function that returns a temporary table similar to using a subquery in a FROM command. The difference is that we query WITH subqueries each time they are used, potentially leading to slower queries.

SQL
with first_query as (
select
count() as first_count
from events
)
select
first_count
from first_query

Useful functions

HogQL includes many functions to aggregate and manipulate queried data. Below are some examples of some of the most popular SQL functions you can use in your insights.

Aggregate functions

These aggregate results for columns across all rows. They include:

  • count(): Counts the number of rows that match a certain condition.
  • avg(): Calculates the average numeric value of a column.
  • sum(): Calculates the total (sum) numeric value of a column.
  • max(), min(): Finds the maximum or minimum value of a column.
SQL
SELECT
avg(properties.$screen_height),
sum(properties.$screen_height),
max(properties.$screen_height),
min(properties.$screen_height)
FROM events
WHERE event = '$pageview' AND properties.$screen_height IS NOT NULL

You can find a full list of these in supported aggregations.

Regular functions

HogQL provides many functions for accessing, modifying, and calculating data from queries. Along with the one’s listed below, many basics include calculation operators (+, -, /, *), type conversions (toInt, toString), conditional statements (if, multiIf), and rounding (floor, round).

You can find a full list of these in supported ClickHouse functions.

JSON

You can access nested data in JSON and objects directly.

SQL
select properties.$set.$geoip_country_name
from events

You can parse JSON with JSONExtractRaw() to return a value.

SQL
SELECT
JSONExtractRaw(properties.$set) as set_properties
FROM events
WHERE properties.$set IS NOT NULL

Specialized JSONExtract functions exist for different data types including:

  • JSONExtractFloat
  • JSONExtractArrayRaw
  • JSONExtractString
  • JSONExtractBool

Array

  • arrayElement(arr, n): Retrieves the element with the index of n from the array arr.
  • arrayJoin(arr): Takes a row and generates multiple rows for the number of elements in the array. It copies all the column values, except the column where this function is applied. It replaces the applied column with the corresponding array value.
SQL
SELECT flag, count()
FROM (
SELECT arrayJoin(JSONExtractArrayRaw(assumeNotNull(properties.$active_feature_flags))) as flag
FROM events
WHERE event = '$pageview' and timestamp > '2023-08-01'
)
GROUP BY flag
ORDER BY count() desc

Read more in How to filter and breakdown arrays with HogQL.

Date and time

  • now(), today(), yesterday(): Returns the current time, date, or yesterday’s date respectively.
  • interval: A length of time for use in arithmetic operations with other dates and times.
  • toDayOfWeek, toHour, toMinute: Converts date number of day of week (1-7), hour in 24-hour time (0-23), and minute in hour (0-59).
  • toStartOfYear, toStartOfMonth, toMonday, toStartOfDay, toStartOfMinute: rounds date down to the nearest year, month, Monday, day, hour, or minute respectively
  • dateDiff('unit', startdate, enddate): Returns the count in unit between startdate and enddate.
  • formatDateTime: Formats a time according to a MySQL datetime format string.
SQL
SELECT
formatDateTime(now(), '%a %b %T') AS current_time,
toDayOfWeek(now()) AS current_day_of_week,
dateDiff('day', timestamp, now()) AS days_since_event
FROM events
WHERE timestamp > now() - interval 1 day

Read more examples in How to do time-based breakdowns (hour, minute, real time) and Using HogQL for advanced time and date filters.

String

  • extract: Extracts a fragment of a string using a regular expression.
  • concat: Concatenates strings listed without separator.
  • splitByChar, splitByString, splitByRegexp, splitByWhitespace: splits a string into substring separated by a specified character, string, regular expression, or whitespace character respectively.
  • match: Return whether the string matches a regular expression pattern.
  • replaceOne, replaceRegexpOne: Replace the first occurrence of matching a substring or regular expression pattern respectively with a replacement string.
  • trim: Remove specified characters (or whitespace) from the start or end of a string.
SQL
select
extract(elements_chain, '[:|"]attr__class="(.*?)"') as class_name,
concat(properties.$os, ' version: ', properties.$os_version),
replaceRegexpOne(properties.$current_url, '^/', 'site/') AS modified_current_url
from events
where event = '$autocapture'

Read more in How to analyze autocapture events with HogQL.

Sparkline

A sparkline is a tiny graph contained in one cell of your query result. As an argument, it takes an array of integers.

SQL
SELECT sparkline(range(1, 10)) FROM (SELECT 1)

You can use it to visualize queries, such as a 24-hour $pageview count for different $current_url values.

SQL
SELECT
pageview,
sparkline(arrayMap(h -> countEqual(groupArray(hour), h), range(0,23))),
count() as pageview_count
FROM
(
SELECT
properties.$current_url as pageview,
toHour(timestamp) AS hour
FROM
events
WHERE
timestamp > now () - interval 1 day
and event = '$pageview'
) subquery
GROUP BY
pageview
ORDER BY
pageview_count desc

You can also use it for art.

SQL
select
sparkline(arrayMap(a -> cos(toSecond(timestamp) + a/4), range(100 + 5 * toSecond(timestamp))))
from events

Accessing data

Strings and quotes

Quotation symbols work the same way they would work with ClickHouse, which inherits from ANSI SQL:

  • Single quotes (') for Strings literals.
  • Double quotes (") and Backticks (`) for DataBase identifiers.

For example:

SQL
SELECT * FROM events WHERE properties.`$browser` = 'Chrome'

Types

Types (and names) for the accessible data can be found in the database, properties tabs in data management as well as in the data warehouse tab for external sources. They include:

  • STRING (default)
  • JSON (accessible with dot or bracket notation)
  • DATETIME(in ISO-8601, read more in our data docs)
  • INTEGER
  • NUMERIC(AKA float)
  • BOOLEAN

For example:

SQL
SELECT round(properties.$screen_width * properties.$screen_height / 1000000, 2) as `Screen MegaPixels` FROM events LIMIT 100

This works because $screen_width and $screen_height are both defined as numeric properties. Thus you can multiply them.

To cast a string property into a different type, use type conversion functions, such astoString, toDate, toFloat, JSONExtractString, JSONExtractInt, and more.

Property access

To access a property stored on an event or person, just use dot notation. For example properties.$browser or person.properties.$initial_browser. You can also use bracket notation like properties['$feature/cool-flag'].

Nested property or JSON access, such as properties.$some.nested.property, works as well.

PostHog's properties include always include $ as a prefix, while custom properties do not (unless you add it).

Property identifiers must be known at query time. For dynamic access, use the JSON manipulation functions from below on the properties field directly.

Questions?

Was this page useful?

Next article

LLM insights (beta)

This integration is currently in private beta. If you'd like to be added to the beta, email Lior@posthog.com . We've teamed up with Langfuse to track metrics for LLM applications. Langfuse is an open source solution for monitoring LLM applications. It tracks metrics such as model costs, latency, token usage, and more. Combining your Langfuse and PostHog data makes it easy to answer questions like: What are my LLM costs by customer, model, and in total? How many of my users are interacting…

Read next article