From Data to Insights: Segmenting Airbnb’s Supply | by Alexandre Salama | The Airbnb Tech Blog | Nov, 2024

By: Alexandre Salama, Tim Abraham

At Airbnb, our supply comes from hosts who decide to list their spaces on our platform. Unlike traditional hotels, these spaces are not all interchangeable units in a building that are available to book year-round. Our hosts are people with different earnings objectives and schedule constraints, which leads to different levels of availability to host. Understanding these differences is a key input into how we develop our products, campaigns, and operations.

Over the years, we’ve created various ways to measure host availability, developing “features” that capture different aspects of how and when listings are available. However, these features provide an incomplete picture when viewed in isolation. For example, a ~30% availability rate could indicate two very different scenarios: a host who only accepts bookings on weekends, or a host whose listing is only available during a specific season, such as summer.

This is where segmentation comes in.

By combining multiple features, segmentation allows us to create discrete categories that represent the different availability patterns of hosts.

But traditional segmentation methodologies, such as “RFM” (Recency, Frequency, Monetary), focus on customer value rather than calendar dynamics, and are often limited to one-off analyses on small datasets. In contrast, we need an approach that can handle calendar data and daily inference for millions of listings.

To address these challenges, this blog post explores how Airbnb used segmentation to better understand host behavior at scale. By enriching availability data with novel features and applying machine learning techniques, we developed a practical and scalable approach to segment availability for millions of listings daily.

Consider Alice and Max, two hosts with identical 2-bedroom apartments on Airbnb. Alice only lists her property in the summer, while Max has it available year-round, reflecting two distinct hosting styles.

Alice’s seasonal availability suggests that she might live in the property most of the time, only renting it out during the summer months. Airbnb can assist her with seasonal pricing recommendations, onboarding guides for occasional hosts, and settings suggestions.

Conversely, Max’s full-time availability indicates a more professional hosting style, potentially his primary income source. Airbnb can provide him with advanced booking analytics, tools for managing multiple reservations, and guidance on earnings and tax implications.

Two Hosts with Similar Profiles (Illustrative)

How can we create a dataset that captures these important differences in hosting behavior?

Availability Rate

A first step is to capture the host’s “intention to be available” on a specific night. Availability can be analyzed from either a backward-looking (in the past) or a forward-looking (in the future) perspective. For simplicity, this post focuses on backward-looking availability, as it reflects the final state of a calendar after all changes in inventory, bookings, and cancellations have occurred. Forward-looking availability is not as straightforward because changes can still happen between the analysis date and the future dates being analyzed.

We consider both:

  • Nights Vacant: nights when the listing was listed as available for booking on Airbnb, and remained vacant.
  • Nights Booked: nights when the listing was listed as available for booking on Airbnb, and was later booked on Airbnb.

Consequently, we can calculate the corresponding Nights Intended to be Available, or Nights Available, for the 365-day look-back period as the sum of Nights Vacant and Nights Booked. We then divide it by 365 to obtain the corresponding Availability Rate.
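The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not Airbnb’s production code: the `NightStatus` states, the `availability_rate` function, and the toy calendar are all assumptions made for the example.

```python
from enum import Enum


class NightStatus(Enum):
    """Hypothetical per-night calendar states (names are assumptions)."""
    VACANT = "vacant"            # listed as available, never booked
    BOOKED = "booked"            # listed as available, later booked
    UNAVAILABLE = "unavailable"  # not offered by the host that night


def availability_rate(calendar):
    """Return (nights_available, availability_rate) for a look-back calendar."""
    if not calendar:
        raise ValueError("calendar must contain at least one night")
    nights_vacant = sum(1 for status in calendar if status is NightStatus.VACANT)
    nights_booked = sum(1 for status in calendar if status is NightStatus.BOOKED)
    # Nights Available = Nights Vacant + Nights Booked
    nights_available = nights_vacant + nights_booked
    return nights_available, nights_available / len(calendar)


# Toy 365-night look-back: 60 vacant, 50 booked, 255 unavailable nights.
calendar = (
    [NightStatus.VACANT] * 60
    + [NightStatus.BOOKED] * 50
    + [NightStatus.UNAVAILABLE] * 255
)
nights_available, rate = availability_rate(calendar)
print(nights_available, round(rate, 3))
```

Booked nights count toward availability here because the feature is meant to reflect the host’s intention to be available, not whether a guest ended up booking.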

Distribution of Listings by Availability Rate in the Previous Year (Illustrative)

From this distribution we observe:

  • A considerable proportion of listings has little-to-no availability (~0% availability rate).
  • Conversely, a significant proportion of listings has near-full availability (~100% availability rate).
  • Between these extremes, a significant set of listings emerges without strong breakpoints.

How can we further differentiate the listings that fall in the middle range?

Streakiness

For listings that aren’t at either end of the spectrum, availability rate on its own is insufficient for capturing the nuances of how a listing is made available throughout the month. Consider listings A and B, which both have a 50% availability rate in a given month.

Two Listings with Similar Availability Rates but Distinct Calendar Patterns

Although these listings have distinct availability patterns, they both have the same availability rate (50%)!

Listing A’s concentrated, block-like availability may lend itself to recommendations for weekly stay discounts, or advice for hosts who are away for a long stretch. That guidance may not be suitable for Listing B.

To capture this difference, we introduce “Streakiness”. In the example above, Listing A had 1 long streak of availability which was interrupted on night 16, while Listing B had 8 short streaks of availability, each lasting 2 nights before a 2-night break.

We define a streak as a consecutive sequence of availability with a minimum of 2 consecutive nights, followed by a subsequent period of at least 2 consecutive nights of unavailability, as described in the diagram below. Note that we initially considered using a single night of availability/unavailability as a threshold but found it to be a less reliable signal of the consistency that streakiness aims to measure.

Streak Definition

This leads us to the corresponding Streakiness feature, computed as the ratio of Streaks to the number of Nights Available (computed in the previous section). At this point, we have two relatively orthogonal features for our analysis: availability rate and streakiness.
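To make the streak definition concrete, here is a minimal Python sketch. It rests on a few assumptions that the definition leaves open: runs shorter than 2 nights are ignored (so a single unavailable night does not end a streak), a streak still open at the end of the window is counted, and a calendar with no available nights gets a streakiness of 0. The function names and the 32-night toy calendars are illustrative only.

```python
from itertools import groupby

MIN_RUN = 2  # threshold from the text: at least 2 consecutive nights


def count_streaks(available):
    """Count streaks in a sequence of booleans (True = night available)."""
    streaks = 0
    in_streak = False
    for is_available, group in groupby(available):
        run_length = sum(1 for _ in group)
        if run_length < MIN_RUN:
            continue  # assumption: single nights neither open nor close a streak
        if is_available:
            in_streak = True   # a long enough available run opens a streak
        elif in_streak:
            streaks += 1       # a long enough unavailable run closes it
            in_streak = False
    if in_streak:
        streaks += 1           # assumption: count a streak open at window end
    return streaks


def streakiness(available):
    """Streaks divided by Nights Available (0.0 if never available)."""
    nights_available = sum(available)
    if nights_available == 0:
        return 0.0
    return count_streaks(available) / nights_available


# Toy calendars loosely modeled on Listings A and B (both 50% available).
listing_a = [True] * 16 + [False] * 16           # one long block
listing_b = [True, True, False, False] * 8       # 2 nights on, 2 nights off

print(count_streaks(listing_a), streakiness(listing_a))
print(count_streaks(listing_b), streakiness(listing_b))
```

With identical availability rates, the block-like calendar scores low and the on-off calendar scores high, which is the distinction the feature is meant to capture.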

Combining Availability + Streakiness

Seasonality

We found that while availability and streakiness provide a solid foundation for measuring volume and consistency, they don’t capture a calendar’s “compactness”, in other words, its seasonality. For example, consider Listings C and D, which both have around 15% availability and 14 streaks:

  • Listing C concentrates its availability within a narrower block of time (summer season); see the first calendar below.
  • Listing D distributes its availability more evenly across multiple quarters; see the second calendar below.

Two Listings with Similar Availability Rates / Streakiness but Distinct Calendar Patterns

Seasonality plays a crucial role in Airbnb’s business, as guest demand and host availability fluctuate with changes in seasonal appeal, holidays, and local events. Given this, we propose to create a Quarters with at Least One Night of Availability feature.

Additionally, we create a Maximum Consecutive Months feature which captures streakiness at a yearly scale, highlighting the longest continuous period a listing is available. Together, these features give clearer insight into seasonal patterns.
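The two seasonality features can be sketched as follows. This is an illustrative reading, not the exact production definition: it assumes all dates fall in a single calendar year, uses calendar quarters, and treats a month as “available” if it contains at least one available night. The function names and toy calendars are hypothetical.

```python
from datetime import date, timedelta


def quarters_with_availability(available_dates):
    """Number of calendar quarters containing at least one available night."""
    return len({(d.month - 1) // 3 for d in available_dates})


def max_consecutive_months(available_dates):
    """Longest run of consecutive months with at least one available night.

    Assumption: dates belong to one calendar year, so months do not wrap.
    """
    months = sorted({d.month for d in available_dates})
    best = current = 0
    previous = None
    for month in months:
        if previous is not None and month == previous + 1:
            current += 1
        else:
            current = 1
        best = max(best, current)
        previous = month
    return best


# Toy Listing C: 55 consecutive nights in summer (about 15% of the year).
listing_c = [date(2024, 6, 15) + timedelta(days=i) for i in range(55)]
# Toy Listing D: 14 nights in each of four months spread across the year.
listing_d = [date(2024, m, day) for m in (1, 4, 7, 10) for day in range(1, 15)]

print(quarters_with_availability(listing_c), max_consecutive_months(listing_c))
print(quarters_with_availability(listing_d), max_consecutive_months(listing_d))
```

The two toy calendars have almost the same number of available nights, but the first is compact (few quarters, several consecutive months) while the second is spread out (every quarter, no consecutive months).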

Final dataset

The final feature set consists of all listings that were listed on the platform as of a broad set of dates. For each listing, we calculate the features we’ve designed in the previous sections. Then, we take a large, random sample across these dates. Finally, we scale the numerical features to ensure they’re on a comparable scale.

Sample Listings Depicting our Feature Set

We can now apply a K-means clustering algorithm to identify segments, testing models with K values from 2 to 10. Using the elbow plot to find the optimal number of clusters, we select 8 clusters as the best representation of our data.
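The scaling and clustering steps can be illustrated with a small, dependency-free sketch. It uses z-score standardization and a plain Lloyd’s-algorithm K-means written from scratch, which are assumptions: the post does not say which scaler or library was used, and a production pipeline would more likely rely on an established implementation. The synthetic feature rows are toy data, not Airbnb data.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so all features share a comparable scale."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, inertia)."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Keep the previous centroid if a cluster ends up empty.
            updated.append([mean(dim) for dim in zip(*members)] if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))
    return labels, inertia


# Toy rows: (availability_rate, streakiness, quarters, max_consecutive_months)
rng = random.Random(42)
rows = [
    [center_value + rng.gauss(0, 0.05) for center_value in center]
    for center in [(0.05, 0.0, 1, 1), (0.5, 0.3, 3, 4), (0.95, 0.02, 4, 12)]
    for _ in range(30)
]
scaled = standardize(rows)

# Elbow data: within-cluster sum of squares for K = 2..10.
inertia_by_k = {k: kmeans(scaled, k)[1] for k in range(2, 11)}
for k, inertia in inertia_by_k.items():
    print(k, round(inertia, 2))
```

Plotting `inertia_by_k` against K gives the elbow plot; the chosen K is where adding another cluster stops reducing inertia substantially.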

We now have our clusters, but they don’t have names yet. Our cluster naming process involves multiple steps:

  • Checking the distribution of each feature by cluster to identify strong differences (e.g., “cluster 1 has the highest availability rate”)
  • Randomly sampling listings from every cluster and visualizing their calendars
  • Iterating on naming with a cross-functional internal working group

The output of this process is summarized in the table below, while the following diagram displays a “typical” calendar for each cluster.

From Cluster Intuition to Cluster Name
Examples of Calendars by Cluster

Since we’re measuring a latent attribute (underlying host behavior patterns that don’t have “ground truth” labels), there is no perfectly accurate way to validate our segmentation. However, we can use various methodologies to ensure that it “makes sense” from a business perspective and reliably reflects real-life host behaviors.

We do so in three steps:

  • A/B Testing
  • Correlates of Availability Segments
  • User Experience (UX) Research

A/B Testing

In an A/B test, we assessed how the different segments previously used a feature that encouraged hosts to complete “recommended actions” (e.g., letting guests book their home last-minute) so they could earn a monetary incentive.

Example of Host-Facing Recommended Actions

We show the usage of the feature by each segment below. These results align with our intuition: hosts who use Airbnb rarely or for special occasions may not be interested in following recommendations, even when incentivized. Similarly, “Always On” hosts, who are already highly engaged and proactive in managing their listings, might prefer to rely on their own strategies rather than follow Airbnb’s suggestions. Hosts who fall somewhere in between, with moderate levels of engagement, may be the ideal target for incentives, as they are likely open to adjustments that could boost their performance.

Example of Heterogeneous Treatment Effects by Availability Segment
(“CI” = Confidence Interval)

Correlates of Availability Segments

We also validate our clusters by checking correlations with known attributes. For instance, we confirm that “Always On” listings are more likely to be managed by professionals, and that “Short Seasonal” listings are more common in ski or beach destinations.

Additionally, we know it is common to observe an increase in the number of listings around big events. As expected, we observe a rise in “Event Motivated” listings leading up to and during major event periods, reflecting hosts’ responsiveness to increased demand.

Impact of an Event on the % of Event-Motivated Listings (Illustrative)

UX Research

Finally, the UX Research team conducts host surveys to create qualitative personas, which we compare against our clusters to ensure they align with real-world behavior. For instance, we verify whether segments with high weekend availability match hosts who self-report preferring weekend rentals.

Now, we need to scale this segmentation to all our listings.

To achieve this, we use a decision tree algorithm. We train a model using our four features, with cluster labels from our K-means model as outputs. We also perform a train-test split to confirm the model accurately predicts each cluster.

This new model provides a simple, interpretable set of if-else rules to classify listings into clusters. Using the decision tree structure, we translate the model’s logic into a SQL query by converting the decision tree’s “IF” conditions into “CASE WHEN” statements. This integration enables the model to be propagated in our data warehouse.
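The translation from tree to SQL can be sketched as a walk over root-to-leaf paths, emitting one “WHEN” clause per leaf. Everything specific below is an assumption for illustration: the `Split` and `Leaf` classes, the column names, the `listing_features` table, the thresholds, and the toy tree itself (only the segment names “Always On” and “Short Seasonal” come from the post). A fitted tree from a machine learning library could be walked in the same way.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    segment: str  # cluster name predicted at this leaf


@dataclass
class Split:
    feature: str      # column name in a hypothetical feature table
    threshold: float
    left: "Node"      # branch taken when feature <= threshold
    right: "Node"     # branch taken when feature > threshold


Node = Union[Leaf, Split]


def tree_to_when_clauses(node, conditions=()):
    """Return one 'WHEN ... THEN ...' clause per root-to-leaf path."""
    if isinstance(node, Leaf):
        predicate = " AND ".join(conditions) or "TRUE"
        return [f"WHEN {predicate} THEN '{node.segment}'"]
    below = f"{node.feature} <= {node.threshold}"
    above = f"{node.feature} > {node.threshold}"
    return (
        tree_to_when_clauses(node.left, conditions + (below,))
        + tree_to_when_clauses(node.right, conditions + (above,))
    )


def tree_to_sql(root, table="listing_features"):
    """Render the tree as a SELECT with a CASE WHEN expression."""
    lines = ["SELECT", "  listing_id,", "  CASE"]
    lines += [f"    {clause}" for clause in tree_to_when_clauses(root)]
    lines += ["  END AS availability_segment", f"FROM {table}"]
    return "\n".join(lines)


# Toy tree with made-up thresholds, for illustration only.
toy_tree = Split(
    "availability_rate", 0.9,
    left=Split(
        "quarters_with_availability", 1.5,
        left=Leaf("Short Seasonal"),
        right=Leaf("placeholder_segment"),
    ),
    right=Leaf("Always On"),
)

sql = tree_to_sql(toy_tree)
print(sql)
```

Because every leaf maps to exactly one conjunction of conditions, the clauses are mutually exclusive and their order inside the “CASE” expression does not affect the result.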

Decision Tree Structure

At Airbnb, various teams leverage these segments: product teams to inform strategy and analyze heterogeneous treatment effects in A/B tests, marketing teams for targeted messaging, and UX research teams for insights into hosts’ motivations.

For instance, we uncovered an opportunity to boost Instant Book adoption among “Event Motivated” hosts, who may occasionally list their primary residence and prefer manual guest screening. Adding an option for hosts to only accept guests with a certain rating could make Instant Book more appealing to them, offering a balance between host control and booking efficiency.

Initially designed for listing availability data, this segmentation methodology has also been adapted to host activity data. We developed a second segmentation focused on days with “host engagement” (e.g., adjusting prices, updating policies, revising listing descriptions) to differentiate occasional “Settings Tinkerers” from frequent “Settings Optimizers.”

This approach can also be adapted to other industries where understanding temporal engagement is essential, for instance, to distinguish:

  • Social Media: casual lurkers vs. active content creators
  • Ridesharing: occasional drivers during peak demand vs. full-time drivers
  • Streaming Services: nighttime streamers vs. continuous streamers
  • E-commerce: sales/holiday enthusiasts vs. year-round shoppers

This blog post was a collaborative effort, with significant contributions from Tim Abraham, the main co-author. We’d also like to acknowledge the invaluable support of team members from multiple organizations, including (but not limited to) Regina Wu, Maggie Jarley, and Peter Coles.