The Dawn of Autonomous AI

Crafting Tasks and Mastering Skills

Harsha Angeri
DataDrivenInvestor


On the Brink of Self-Learning Autonomous AI Machines

In the burgeoning field of Artificial Intelligence, a new class of autonomous agents is emerging, exemplified by concepts like BabyAGI and AutoGPT. These agents are designed not only to perform tasks but also to conceive them, given a broader objective. This blog post is a deep dive into how an AI can independently chart its own path of learning: defining tasks and then acquiring the skills needed to execute them effectively.

The Genesis of Self-Tasking AI

The potential of AI extends far beyond executing predefined tasks. The latest developments suggest AI can now set its own agenda, defining what needs to be done to achieve a given objective. Inspired by the recent discourse on self-tasking AI agents, we can push the boundaries even further. Given an intent or objective, an autonomous agent can define the tasks required to reach that objective and perform them using tools. I have covered this before, and it will not be the focus of this blog.

For this blog, we assume that autonomous AI agents, given an objective, can break it into tasks and use the tools at their disposal to accomplish it. But what happens if the agent lacks the tools or skills for a given task? We humans would learn the skill and then accomplish the task. Can autonomous AI agents identify new skills and learn them?

The Framework of Learning

Our journey began with the challenge of not just identifying but defining what constitutes a skill in the context of a given task. We live in an increasingly API-driven world: from paying money to modifying selfies, there is an API for everything. These APIs are searchable and increasingly aggregated in one place on sites like Hugging Face and Rapid API, with the complete code to run them, including documentation, provided alongside. Here's an example of an AI model hosted on Hugging Face capable of sentiment analysis on text. The code to run it is only a few lines, provided on the model page, which a Large Language Model (LLM) like GPT-4 or Google Gemini can search for and execute.

from transformers import pipeline

distilled_student_sentiment_classifier = pipeline(
    model="lxyuan/distilbert-base-multilingual-cased-sentiments-student",
    return_all_scores=True,
)

# english
distilled_student_sentiment_classifier("I love this movie and i would watch it again and again!")
>> [[{'label': 'positive', 'score': 0.9731044769287109},
    {'label': 'neutral', 'score': 0.016910076141357422},
    {'label': 'negative', 'score': 0.009985478594899178}]]

# malay
distilled_student_sentiment_classifier("Saya suka filem ini dan saya akan menontonnya lagi dan lagi!")
>> [[{'label': 'positive', 'score': 0.9760093688964844},
    {'label': 'neutral', 'score': 0.01804516464471817},
    {'label': 'negative', 'score': 0.005945465061813593}]]

# japanese
distilled_student_sentiment_classifier("私はこの映画が大好きで、何度も見ます!")
>> [[{'label': 'positive', 'score': 0.9342429041862488},
    {'label': 'neutral', 'score': 0.040193185210227966},
    {'label': 'negative', 'score': 0.025563929229974747}]]

Hugging Face alone, for example, hosts over 400,000 models with code and APIs. In this internet-scale world, we define a skill as the ability to execute a specific task using such an API or code block.

In our experiment, the autonomous AI agent was prompted to explore the Hugging Face repository, identify skills that would equip it to handle a given task, check for the existence of executable code, and note that it had acquired a new skill. The first task was to give insights on a “video of a horse running”. The agent, running on Google Gemini (and also on GPT-4 to check repeatability), defines the list of skills it would require, searches Hugging Face, looks for code and APIs, and lists them. An API call that returns a success can be considered a skill that works.
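The loop described above can be sketched in a few lines. This is an illustrative sketch, not the agent's actual code: each candidate "skill" is a code block or API call found on a hub, and it is recorded as acquired only if a trial invocation succeeds. The candidate names and stubs here are hypothetical.

```python
def broken_endpoint():
    # Stand-in for code that was found but does not run
    raise RuntimeError("404: model endpoint not found")

def acquire_skills(candidates):
    """candidates maps a skill name to a zero-argument callable that
    wraps the API call (stubbed here; in practice, code found on a
    model hub such as Hugging Face)."""
    acquired = []
    for name, invoke in candidates.items():
        try:
            invoke()                # trial call: a success means the skill works
            acquired.append(name)
        except Exception:
            pass                    # code failed to run: not a skill yet
    return acquired

# Stubbed candidates for the "video of a horse running" task
candidates = {
    "object_detection": lambda: "ok",
    "audio_event_detection": lambda: "ok",
    "broken_endpoint": broken_endpoint,
}

print(acquire_skills(candidates))
# ['object_detection', 'audio_event_detection']
```

The key design choice is that execution is the test: the agent does not reason abstractly about whether code will work, it simply runs it and keeps what succeeds.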

The Dynamic Learning Matrix

The AI’s quest for knowledge was cataloged in a matrix that outlined each skill or sub-task, its source code and models, and its proficiency level or complexity. This matrix was the AI’s roadmap, guiding it through the process of skill acquisition.
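A minimal sketch of such a matrix might look as follows. The field names, entries, and complexity values are illustrative assumptions, not the author's actual schema:

```python
# Hypothetical learning matrix: one row per acquired skill
skill_matrix = [
    {"skill": "frame extraction",   "source": "OpenCV code block",          "complexity": 1},
    {"skill": "object detection",   "source": "hub model + inference API",  "complexity": 3},
    {"skill": "action recognition", "source": "video-classification model", "complexity": 4},
]

# Before searching for new code, the agent can consult the matrix
# to see what it can already do for a sub-task
acquired = {row["skill"] for row in skill_matrix}
print("object detection" in acquired)   # True
print("audio analysis" in acquired)     # False: this skill still needs to be learned
```

Keeping the source alongside each skill lets the agent re-execute what it has learned without searching again.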

The skills acquired to analyze a video of a horse running are shown below:

Pic by Author

Expanding the AI’s Horizon

The AI didn’t stop there; it went on to identify and learn skills for analyzing images, understanding blogs on coding techniques, and dissecting Spotify playlists. Four diverse tasks were provided to the agent, and it learned more than 30 skills. Each of these domains required a unique set of skills, all of which were meticulously documented:

Pic by Author

The Visualization: Data-Driven Insights

The line charts below present a visual narrative of the AI’s learning trajectory. The x-axis represents the order in which the skills were learned, while the y-axis quantifies the weighted cumulative proficiency (weighted by skill complexity), showcasing the AI’s evolving expertise.

Pic by Author

Below is the same chart split into subplots to show how our AI agent learned skills of various complexities.

Pic by Author
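The y-axis of these charts is simply a running sum of complexity weights in learning order. A minimal sketch, with made-up weights (the actual values come from the agent's matrix and are not reproduced here):

```python
from itertools import accumulate

# Illustrative complexity weights, in the order the skills were learned
complexities = [1, 3, 2, 4, 2]

# Weighted cumulative proficiency: the value plotted on the y-axis
curve = list(accumulate(complexities))
print(curve)   # [1, 4, 6, 10, 12]
```

Plotting `curve` against the learning order reproduces the shape of the trajectory: steep segments mark the acquisition of high-complexity skills.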

Skill Diversity

The range of skills the agent acquires is striking. For the first task it acquires audio analysis to identify the sound of a horse running, but it skips this for the “image of an Arsenal football star”, since an image, unlike a video, has no sound. That this kind of reasoning is built into LLMs like Google Gemini and GPT-4 makes it all the more fascinating. The third task, insights from a “blog on coding techniques”, prompted the agent to acquire not only NLP skills like topic extraction and sentiment analysis but also code-style checking, anticipating the presence of code snippets in the blog. Truly remarkable.

The Proclamation: The Future is Self-Learning

This exploration into the realm of autonomous AI agents demonstrates a significant leap in our understanding of AI’s capabilities. These agents promise a future where AI can not only carry out complex tasks but also define the scope of its objectives and independently chart the course of its own learning, acquiring skills from the broader internet and evolving.

This shift also has profound implications for software architecture. Today, the features of a piece of software can be thought of as skills: pre-defined and shipped. In the near future, a collection of autonomous agents, given an intent or objective, will learn the relevant skills and execute. Nothing needs to be pre-defined. The architecture of every piece of software will fundamentally change.

As we stand on the precipice of this new era, we can reimagine every machine, software or hardware, as a self-directed autonomous AI agent. The world will have self-updating apps, evolving social media platforms, healthcare that adapts to new research, infinite-story gaming, and more. The ability to define tasks and acquire new skills autonomously is not just a testament to technological progress but a beacon of a future where AI’s potential is truly limitless.

“Autonomous AI Agents won’t just be tools in our hands; they will become the architects of a new era, where every facet of technology, from the smallest line of code to the most complex systems, is dynamically conceived, executed, and evolved by these self-learning entities.” — Author


