The “NVIDIA and LangChain Generative AI Agents Developer Contest” concluded on Tuesday at 2 a.m., and I’m proud that I crossed the finish line 3 hours early with a little something:
Introducing 𝑨𝒊𝒍𝒆𝒆𝒏 𝟐: an AI office agent and my entry into the ‘NVIDIA and LangChain Generative AI Agents Developer Contest’ in the Small Language Model category. AI agents harness the reasoning, planning, and tool-use capabilities of language models to autonomously execute tasks. Aileen 2 extends these capabilities with visual perception, which lets it navigate Germany’s parliamentary TV Mediathek website and summarize the legislative proceedings published there. This is the initial use case showcasing the system.
Submission materials:
- LinkedIn post
- YouTube video
- GitHub repo
- (PaliGemma gated checkpoint shared privately)
It’s not as fancy as I originally wanted: I hit some dead ends and had to make some cuts to meet the submission deadline, but I learned a lot along the way. As one of the follow-up tasks, I want to investigate why the PaliGemma foundation model gave me different results on Apple Silicon vs. NVIDIA RTX.
[Update] 🏆