  • Polly

How Polly™ Builds Her Samples

askpolly is the self-serve version of Polly™, the world’s first AI capable of producing statistically valid public opinion research reports on demand from social media data.

askpolly’s patented algorithm, Conditional Independence Coupling (CIC), creates samples from online sources (Twitter, Reddit, and TikTok) with the same statistical independence and distribution as Random Digit Dialing. With CIC, askpolly’s sample sizes are in the hundreds of thousands.

Polly’s ability to forecast future behaviour is regularly tested in studies with known outcomes, such as elections, disease tracking and transit usage. Polly’s results have been peer-reviewed and presented by leading science bodies such as the IEEE and the National Academy of Sciences.

How Polly Builds Her Samples

  1. Choose any census metropolitan area of 5,000 people or more

  2. Select one person at random from that neighbourhood

  3. Crawl the network out from that person in ‘network hops’, looking for other people from that neighbourhood, until we find someone who is independent of the person we started from
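
The three steps above can be sketched as a breadth-first crawl over a social graph. This is only an illustration, not the patented CIC algorithm: the post does not say how independence is measured, so the sketch assumes that a person counts as independent once they are at least `min_hops` network hops from the starting person. The function names, the `min_hops` threshold, and the toy graph are all hypothetical.

```python
import random
from collections import deque

def pick_seed(area_of, area, rng):
    """Step 2: select one person at random from the chosen area."""
    candidates = sorted(p for p, a in area_of.items() if a == area)
    return rng.choice(candidates)

def find_independent_person(graph, area_of, seed, min_hops=3):
    """Step 3: crawl outward from `seed`, one hop at a time.

    Returns (person, hops) for the first person found in the seed's area
    who is at least `min_hops` away, or None if the crawl runs out.
    ASSUMPTION: hop distance stands in for statistical independence.
    """
    target_area = area_of[seed]
    visited = {seed}
    queue = deque([(seed, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops >= min_hops and area_of.get(person) == target_area:
            return person, hops
        for neighbour in graph.get(person, ()):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None

# Toy follower graph (made-up data): a chain a-b-c-d-e with a side branch c-x.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "x"],
    "d": ["c", "e"],
    "e": ["d"],
    "x": ["c"],
}
area_of = {"a": "Ottawa", "b": "Ottawa", "c": "Toronto",
           "d": "Ottawa", "e": "Ottawa", "x": "Ottawa"}

seed = pick_seed(area_of, "Ottawa", random.Random(0))
print(seed, find_independent_person(graph, area_of, seed))
print(find_independent_person(graph, area_of, "a", min_hops=3))  # -> ('d', 3)
```

In this sketch, "b" is skipped because it is only one hop from "a", and "c" is skipped because it is in a different area; "d" is the first person who is both far enough away and in the same area.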

After the Sample is Built: Balancing Demographics

  1. For each person in the sample, askpolly uses classifiers to generate a probabilistic demographic distribution based on the person’s:

    1. first and last name

    2. social avatar

    3. location

    4. text history

    5. description

  2. The outputs of these classifiers are combined through a Bayesian Belief Network to reach a consensus on age, gender and location. Privacy-preserving techniques, including removal of personally identifiable information, k-anonymity and differential privacy, are applied to the output to protect against inadvertent privacy leaks.

  3. Repeat

    1. Polly samples are made up of thousands of individuals, so this process is repeated thousands of times. Polly’s AI capabilities make this take days instead of months.

    2. A sample is finished when:

      1. The demographic makeup is balanced with that of the census data for the geo-location

      2. The number of individuals is large enough to reduce the possibility of random variation, but not so large that the wait time for askpolly results is affected.
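
The balancing steps above can be sketched in a few lines. This is a simplified stand-in, not Polly's implementation: instead of a Bayesian Belief Network, it combines classifier outputs by multiplying their probabilities and renormalising (a naive Bayes style combination that assumes the classifiers are conditionally independent and the prior is uniform). The age bands, the `tolerance` and `min_size` thresholds, and all function names are assumptions made for illustration. The privacy steps and the upper limit on sample size are not modelled.

```python
from math import prod

def consensus(distributions):
    """Combine several classifiers' distributions over the same categories.

    ASSUMPTION: simple product-and-renormalise, standing in for the
    Bayesian Belief Network described in the post.
    """
    categories = list(distributions[0])
    scores = {c: prod(d[c] for d in distributions) for c in categories}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("classifiers disagree completely; no consensus")
    return {c: s / total for c, s in scores.items()}

def expected_shares(people):
    """Expected share of each category, averaging per-person distributions."""
    categories = list(people[0])
    return {c: sum(p[c] for p in people) / len(people) for c in categories}

def is_finished(people, census, tolerance=0.05, min_size=1000):
    """A sample is done when it is big enough and matches the census shares.

    `tolerance` and `min_size` are hypothetical thresholds.
    """
    if len(people) < min_size:
        return False
    shares = expected_shares(people)
    return max(abs(shares[c] - census[c]) for c in census) <= tolerance

# Two hypothetical classifiers guessing one person's age band.
name_guess = {"18-34": 0.5, "35-54": 0.3, "55+": 0.2}
text_guess = {"18-34": 0.6, "35-54": 0.3, "55+": 0.1}
posterior = consensus([name_guess, text_guess])
print({band: round(p, 3) for band, p in posterior.items()})
```

Averaging the per-person distributions, rather than assigning each person a single hard label, keeps the classifiers' uncertainty in the sample-level totals that are compared against the census.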
