Meta AI Used Public Posts from Facebook and Instagram for Training, Avoided Private Data

Isaac Adeyemi

7 months ago

Meta Platforms, the parent company of Facebook and Instagram, has revealed that it used public posts from these platforms to train parts of its new Meta AI virtual assistant. The company emphasized that it excluded private posts shared only among family and friends to respect users’ privacy.

In an interview with Reuters, Nick Clegg, Meta’s President of Global Affairs, said the company took steps to filter private information out of the public datasets used for training. He said the “vast majority” of the data used to train Meta AI was publicly available and that the company tried to avoid datasets containing excessive personal information.

Clegg cited LinkedIn as an example of a website whose content Meta deliberately chose not to use due to privacy concerns. This approach reflects Meta’s commitment to maintaining user privacy while developing AI technologies.

Tech companies, including Meta, OpenAI, and Google, have faced criticism for using web-scraped information to train their AI models without explicit permission. These models require extensive data to summarize information and generate various content, raising concerns about the use of private or copyrighted materials.

Meta AI, unveiled at Meta’s annual Connect conference, runs on a custom model based on the Llama 2 large language model. It also uses a new model called Emu to generate images from text prompts. The product can generate text, audio, and imagery, and it can access real-time information through a partnership with Microsoft’s Bing search engine.

Clegg noted that Meta AI’s training data included public posts from Facebook and Instagram, encompassing both text and photos. The image-generation component was trained on these posts, while the chat functions were built on Llama 2 and supplemented with publicly available and annotated datasets.

Furthermore, interactions with Meta AI may contribute to ongoing feature improvements. Meta has implemented safety restrictions, including a ban on creating photo-realistic images of public figures, to ensure responsible use of the technology.

Regarding copyrighted materials, Clegg said he expects litigation over whether creative content is covered by the existing fair use doctrine, which permits limited use of protected works for purposes such as commentary, research, and parody. Meta has introduced new terms of service that bar users from generating content that violates privacy and intellectual property rights.

The approach taken by Meta reflects its commitment to privacy and responsible AI development in an era of increasing scrutiny and legal challenges related to data usage and AI technology.