Airbnb Brandometer: Powering Brand Perception Measurement on Social Media Data with AI | by Tiantian Zhang | The Airbnb Tech Blog | Apr, 2024

How we quantify brand perceptions from social media platforms through deep learning

By Tiantian Zhang, Shuai Shao (Shawn)

At Airbnb, we have developed Brandometer, a state-of-the-art natural language understanding (NLU) technique for understanding brand perception based on social media data.

Brand perception refers to the general feelings and experiences of customers with a company. Measuring brand perception quantitatively is an extremely challenging task. Traditionally, we rely on customer surveys to find out what customers think about a company. The downsides of such a qualitative study are the bias in sampling and the limitation in data scale. Social media data, on the other hand, is the largest consumer database where users share their experiences, and it is the ideal complementary consumer data for capturing brand perceptions.

Compared to conventional approaches that extract co-occurrence and count-based top relevant topics, Brandometer learns word embeddings and uses embedding distances to measure the relatedness of brand perceptions (e.g., ‘belonging’, ‘connected’, ‘reliable’). A word embedding represents words in the form of real-valued vectors, and it performs well in preserving the semantic meanings and relatedness of words. Word embeddings obtained from deep neural networks are arguably the most popular and revolutionary approaches in NLU. We explored a variety of word embedding models, from the quintessential algorithms Word2Vec and FastText to the more recent language model DeBERTa, and compared them in terms of generating reliable brand perception scores.

For concepts represented as words, we use the similarity between a concept's embedding and that of “Airbnb” to measure how important the concept is with respect to the Airbnb brand. We call this the Perception Score. Brand Perception is defined as the Cosine Similarity between Airbnb and the specific keyword:

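$$\mathrm{CosSim}(w) = \cos(\mathbf{v}_{\text{Airbnb}}, \mathbf{v}_w) = \frac{\mathbf{v}_{\text{Airbnb}} \cdot \mathbf{v}_w}{\lVert \mathbf{v}_{\text{Airbnb}} \rVert \, \lVert \mathbf{v}_w \rVert} \qquad \text{(Eq. 1)}$$

where $\mathbf{v}_{\text{Airbnb}}$ and $\mathbf{v}_w$ are the embeddings of “Airbnb” and of the keyword $w$.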
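As a minimal sketch of the cosine similarity score described here, the snippet below computes it in plain Python. The function names, the lowercase `"airbnb"` key, and the three-dimensional toy vectors are all assumptions made for illustration; they are not Brandometer's actual code or embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def perception_score(embeddings, keyword, brand="airbnb"):
    """Cosine similarity between the brand embedding and a keyword embedding."""
    return cosine_similarity(embeddings[brand], embeddings[keyword])


# Hypothetical toy embeddings, chosen only to make the arithmetic easy to follow.
toy_embeddings = {
    "airbnb": [1.0, 0.0, 1.0],
    "belonging": [1.0, 0.0, 1.0],
    "reliable": [0.0, 1.0, 0.0],
}
```

With these toy vectors, `perception_score(toy_embeddings, "belonging")` is approximately 1 (same direction as the brand vector) and `perception_score(toy_embeddings, "reliable")` is 0 (orthogonal to it).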

In this blog post, we will introduce how we process and understand social media data, capture brand perceptions via deep learning, and ‘convert’ the cosine similarities to calibrated Brandometer metrics. We will also share the insights derived from Brandometer metrics.

Problem Setup and Data

In order to measure brand perception on social media, we assessed all Airbnb-related mentions from 19 platforms (e.g., X (formerly known as Twitter), Facebook, Reddit, etc.) and generated word embeddings with state-of-the-art models.

In order to use social media data to generate meaningful word embeddings for the purpose of measuring brand perception, we overcame two challenges:

  • Quality: Social media posts are mostly user-generated with varying content such as status sharing and reviews, and can be very noisy.
  • Quantity: Social media post sparsity is another challenge. Considering that it typically takes some time for social media users to generate data in response to certain activities and events, a monthly rolling window maintains a good balance of promptness and detectability. Our monthly dataset is relatively small (around 20 million words) compared to a typical dataset used to train good-quality word embeddings (e.g., about 100 billion words for the Google News Word2Vec model). Warm-starting from pre-trained models did not help, since the in-domain data barely moved the learned embeddings.

We developed multiple data cleaning processes to improve data quality. At the same time, we innovated on the modeling techniques to mitigate the impact of data quantity and quality on word embedding quality.

In addition to data, we explored and compared multiple word embedding training techniques with the goal of generating reliable brand perception scores.


Word2Vec has been by far the simplest and most widely used word embedding model since 2013. We started by building CBOW-based Word2Vec models using Gensim. Word2Vec produced decent in-domain word embeddings and, more importantly, the concept of analogies. In our domain-specific word embeddings, we are able to capture analogies in the Airbnb domain, such as “host” - “provide” + “guest” ~= “need” and “city” - “mall” + “nature” ~= “park”.
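The analogy arithmetic can be sketched in a few lines of plain Python: form the vector a - b + c and return the nearest remaining word by cosine similarity. The `analogy` helper and the three-dimensional toy vectors below are hypothetical and hand-picked so that the “city” example works; real embeddings would come from a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(embeddings, a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = [w for w in embeddings if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


# Hypothetical toy vectors; the dimensions loosely stand for
# (urban, commercial, natural) and are not learned from data.
toy = {
    "city": [1.0, 1.0, 0.0],
    "mall": [0.0, 1.0, 0.0],
    "nature": [0.0, 0.0, 1.0],
    "park": [1.0, 0.0, 1.0],
    "office": [1.0, 1.0, 0.0],
    "beach": [0.0, 0.2, 1.0],
}

closest = analogy(toy, "city", "mall", "nature")
```

Here `closest` is `"park"`. With a trained Gensim model, a comparable query is usually issued through `model.wv.most_similar(positive=[...], negative=[...])`, which normalizes the vectors before combining them, so its rankings can differ slightly from this raw-vector version.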


FastText takes into account the internal structure of words, and is more robust to out-of-vocabulary words and smaller datasets. Moreover, inspired by Sense2Vec, we associate words with sentiments (i.e., POSITIVE, NEGATIVE, NEUTRAL), which forms brand perception concepts at the sentiment level.
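One way to picture the Sense2Vec-style idea is to rewrite tokens as `word|SENTIMENT` before training, so that each word-sentiment pair gets its own embedding. The sketch below assumes a sentiment label is already available for each post and that only a chosen set of concept words is tagged. Both are assumptions for illustration; the post does not describe how sentiments are assigned or which tokens are tagged.

```python
ALLOWED_SENTIMENTS = {"POSITIVE", "NEGATIVE", "NEUTRAL"}


def tag_with_sentiment(tokens, sentiment, concepts):
    """Append a sentiment label to concept tokens; other tokens are left unchanged."""
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unknown sentiment label: {sentiment}")
    return [f"{t}|{sentiment}" if t in concepts else t for t in tokens]


# Hypothetical post and concept list.
tagged = tag_with_sentiment(
    ["the", "host", "was", "reliable"],
    sentiment="POSITIVE",
    concepts={"reliable"},
)
```

After tagging, `tagged` is `["the", "host", "was", "reliable|POSITIVE"]`, and an embedding model trained on such tokens would learn separate vectors for `reliable|POSITIVE` and `reliable|NEGATIVE`.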


Recent progress in transformer-based language models (e.g., BERT) has significantly improved the performance of NLU tasks with the advantage of producing contextualized word embeddings. We developed DeBERTa-based word embeddings, which work better with smaller datasets and pay more attention to surrounding context via disentangled attention mechanisms. We trained everything from scratch (including the tokenizer) using Transformers, and the concatenated last attention layer embeddings resulted in the best word embeddings for our case.

Brand Perception Score Stabilization and Calibration

The variability of word embeddings has been widely studied (Borah, 2021). The causes range from the underlying stochastic nature of deep learning models (e.g., random initialization of word embeddings, or embedding training that converges to a local optimum of a global optimization criterion) to changes in the quantity and quality of the data corpus over time.

With Brandometer, we need to reduce the variability in embedding distances to generate stable time series tracking. Stable embedding distances help preserve the inherent patterns and structures present in the time series data, and hence contribute to better predictability of the tracking process. Additionally, they make the tracking process more robust to noisy fluctuations. We studied the influential factors and took the following steps to reduce the variability:

  1. Score averaging over repetitive training with bootstrap sampling
  2. Rank-based perception score

Score averaging over repetitive training with upsampling

For each month’s data, we trained N models with the same hyper-parameters, and took the average of the N perception scores as the final score for each concept. Meanwhile, we did upsampling to make sure that each model iterated on an equal number of data points across months.
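The averaging step can be sketched as follows, assuming upsampling means sampling posts with replacement until a fixed corpus size is reached. `train_and_score` is a hypothetical callable that trains one model on a corpus and returns a perception score per concept; `fake_train_and_score` is a stand-in that returns noisy scores so the example runs without a real model.

```python
import random
import statistics


def upsample(posts, target_size, rng):
    """Pad a corpus to target_size by sampling existing posts with replacement."""
    if not posts:
        raise ValueError("cannot upsample an empty corpus")
    if target_size <= len(posts):
        return list(posts)
    return list(posts) + rng.choices(posts, k=target_size - len(posts))


def averaged_scores(train_and_score, posts, n_models, target_size, seed=0):
    """Average per-concept scores over n_models independently seeded runs."""
    runs = []
    for i in range(n_models):
        rng = random.Random(seed + i)  # a different seed for each repetition
        corpus = upsample(posts, target_size, rng)
        runs.append(train_and_score(corpus, rng))
    return {c: statistics.fmean(run[c] for run in runs) for c in runs[0]}


def fake_train_and_score(corpus, rng):
    """Stand-in for real training: returns a noisy score around 0.5."""
    return {"belonging": 0.5 + rng.uniform(-0.1, 0.1)}


posts = ["post a", "post b", "post c"]
scores = averaged_scores(fake_train_and_score, posts, n_models=5, target_size=10)
```

Fixing `target_size` across months is what keeps the number of data points per model equal, and averaging over repetitions dampens the run-to-run noise of any single model.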

We defined variability as:


where CosSim(w) refers to the cosine-similarity-based perception score defined in Eq. 1, A refers to the algorithm, M refers to the time window (i.e., month), V refers to the vocabulary and |V| is the vocabulary size, and n refers to the number of repetitively trained models.

As N approaches 30, the score variability values converge and settle within a narrow interval. Hence, we picked N = 30 for all.
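To make the role of |V| and n concrete, here is one possible way to compute such a variability measure: take the variance of each word's score across the n runs and average it over the vocabulary. This specific form is an assumption chosen for illustration and may differ from the exact definition used for Brandometer. The `variability` function and the sample scores are hypothetical.

```python
import statistics


def variability(runs):
    """Mean over the vocabulary of each word's score variance across runs.

    `runs` is a list of dicts mapping word -> perception score, one dict per
    trained model. The formula is an assumed, illustrative definition.
    """
    if len(runs) < 2:
        raise ValueError("at least two runs are needed to measure variability")
    vocab = runs[0].keys()
    return statistics.fmean(
        statistics.pvariance([run[w] for run in runs]) for w in vocab
    )


# Hypothetical scores from two repeated trainings on the same month.
example_runs = [
    {"belonging": 0.1, "reliable": 0.5},
    {"belonging": 0.3, "reliable": 0.5},
]
example_variability = variability(example_runs)
```

Computing a measure like this for increasing n and plotting the values would show whether they settle into a narrow band, which is the kind of check that motivates fixing N at a single value.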