OpenAI, the company behind ChatGPT, continues to expand its AI offerings. It recently unveiled Sora, a text-to-video AI model that can generate both videos and images. However, the company faces scrutiny over how it sources its training data.

Amid allegations that it used copyrighted material without permission, OpenAI's CTO Mira Murati faced probing questions. During a recent Wall Street Journal interview, the journalist asked Murati what data Sora was trained on, and specifically which publicly available data OpenAI used, mentioning YouTube and Facebook.

Murati's response was evasive: she said only that OpenAI used publicly available and licensed data, and she declined to give details about the specific sources used to train Sora. This lack of transparency fuels concerns about the legitimacy and credibility of AI-generated content.

Notably, The New York Times has already sued OpenAI over the alleged unauthorized use of its articles. Such disputes underscore the ethical importance of disclosing the data sources used for AI training. When those sources stay hidden, it is difficult to judge whether a model's outputs rest on material that was used with permission.

Furthermore, as OpenAI aims to launch an AI-powered search engine, transparency becomes crucial. Recent events involving Google's image generator have heightened concerns about AI data practices. Without clear disclosure, public confidence in AI's future applications wanes, particularly at a time when many people already worry about workforce displacement.

Despite Microsoft's backing, OpenAI's secrecy breeds skepticism about its commitment to ethical AI development. As AI models become increasingly sophisticated, addressing transparency and data sourcing is paramount. Failing to do so could undermine society's trust in this revolutionary technology.
