Amazon’s cloud division, Amazon Web Services (AWS), is reportedly investigating Perplexity AI, a fast-growing AI search startup, over allegations that it scraped websites without consent. The probe follows a Wired report that found a crawler apparently tied to Perplexity disregarding the Robots Exclusion Protocol (robots.txt).

The Robots Exclusion Protocol is an industry standard under which website owners place a robots.txt file at the root of their site specifying which pages automated crawlers should not access. The file is not legally binding and cannot technically block a bot, but reputable companies are generally expected to respect it.
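For readers unfamiliar with the mechanism, a minimal sketch in Python shows how a well-behaved crawler might consult robots.txt before fetching a page. The bot names ("ExampleBot", "OtherBot"), the site example.com and the rules are hypothetical; the check uses Python's standard-library `urllib.robotparser`. Nothing in the protocol forces a crawler to run such a check, which is why compliance is voluntary.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "ExampleBot" is barred from the whole site,
# while every other agent is only barred from /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler asks before every fetch and skips disallowed URLs.
print(parser.can_fetch("ExampleBot", "https://example.com/story"))      # False
print(parser.can_fetch("OtherBot", "https://example.com/story"))        # True
print(parser.can_fetch("OtherBot", "https://example.com/private/x"))    # False
```

A crawler that ignores these rules simply never makes the `can_fetch` call, or makes it and fetches the page anyway.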

Wired’s investigation identified a virtual machine on an AWS server, with an IP address linked to Perplexity, that bypassed Condé Nast’s robots.txt instructions. The machine reportedly visited Condé Nast properties hundreds of times over the past three months, potentially scraping content. Similar visits to The Guardian, Forbes, and The New York Times were also detected.

To test Perplexity’s practices further, Wired entered headlines or short descriptions of articles from these publications into the company’s chatbot. The tool returned results that closely resembled the original articles, with minimal attribution. This suggests Perplexity may be drawing on scraped content to generate its responses.

The investigation raises concerns about Perplexity’s data collection methods. Scraping websites without consent is widely considered unethical and may violate the sites’ terms of service. The quality and accuracy of an AI system also depend heavily on its underlying data, so relying on content of uncertain provenance could introduce biases or inaccuracies into Perplexity’s search results.

AWS confirmed the investigation to Wired, indicating that it is examining whether Perplexity’s practices violate its terms of service. The outcome could affect Perplexity’s ability to operate on the AWS cloud platform.

The incident highlights a broader debate about ethical data collection in the AI industry. As AI companies race to build more powerful models, responsible data acquisition is crucial. By investigating a well-funded startup such as Perplexity, Amazon is sending a strong message about the importance of respecting website owners’ wishes regarding their content.