Dispatches from the Future: Meta Connect 2024 Highlights from Karthik Kannan, CTO at Envision

October 2, 2024
Alt: Me in a navy blue suit standing in front of a large "CONNECT" sign with colorful abstract designs. Next to me is a decorative stuffed llama with colorful tassels. The setting appears to be an outdoor area with plants in the background.

Introduction

The past week has been eventful, a bit like stepping out of a time machine.

I just wrapped up my first ever Meta Connect in sunny San Francisco. It's a defining moment for Envision: we're partnering with AI at Meta to bring AI to the over 300 million blind and low-vision people around the world.

That's not all: there was a slew of announcements, from open source AI to mixed reality glasses, that felt like the beginning of a new era.

It's hard to capture the excitement around the event but here are my main highlights!

My Agenda

Alt: Conference presentation slide stating "Glasses are a new AI device category".
Day 0 (24/09):

Meta kicked it off with an exclusive reception, gathering the Llama team and cutting-edge AI developers. The room buzzed with innovation, from AI for Indian farmers to AWS-challengers slashing intelligence costs.

The unifying theme at the dinner was how much innovation open source AI fosters. Open source AI can be fine-tuned for niche use cases, often at a tenth of the cost of deploying proprietary AI. This wasn't just Meta's vision: Amazon, Salesforce, and NVIDIA executives I met at the dinner echoed their commitment to open source.

Big thanks to Prince Gupta and Helen Suk, our partners on the Project Aria and Llama teams, for the invite. What exactly are our partnership plans? Stay tuned!

Day 1 (25/09):

Sunrise saw us joining the eager queue for the 10 AM keynote. We had front row seats to the future unfolding.

At Envision, we've been working on smart glasses for people who are blind or have low vision for over 7 years. We've always said that the best way to experience AI is through smart glasses, and that reality is starting to unfold right now.

Alt: Karthik Mahadevan & I sitting in an audience at the very front row of the Connect keynote. We're wearing blue Connect lanyards.

A highlight: AI at Meta spotlighted our work during the main Llama keynote. They showcased how Envision harnesses multimodal Llama to push accessibility boundaries, featuring us in a detailed blogpost.

Alt: A presentation slide showcasing Meta's AI technology. It features images of a woman wearing smart glasses, with a text overlay showing an AI-generated description of the scene: "It's a beautiful day and you can see a sunny Seattle skyline with modern buildings in the background." The slide demonstrates AI-powered scene description capabilities.

The evening of Day 1 saw me presenting a technical paper outlining our work with Llama.

It was a personal stretch: learning to craft a robust academic poster in just a week. We got a tremendous response from the Connect attendees who dropped by, including folks from AWS and Citibank who are interested in bringing Envision into the workplace.

Alt: Me standing at a table with a laptop, demonstrating to another attendee wearing a yellow shirt and glasses our work with the Project Aria Glasses.

Day 2 (26/09):

Day 2 brought a more relaxed vibe. I spent time connecting with new acquaintances from the conference and exploring the VR demos. Despite being firmly in the 'in-person board games' camp, I was impressed by the quality of VR entertainment. The immersive experiences on display highlighted the technology's rapid advancement, blending realism with imagination in compelling ways.

The Big Takeaways

🕶️ Project Orion

Even though we've been tracking the evolution of these glasses for a while, I still find myself amazed. Orion is truly a marvel that we'll someday take for granted. These glasses are fantastic for accessibility because they offer an unobtrusive, hands-free way to experience independence.

One of the biggest challenges is perfecting the display while keeping the glasses lightweight.

Imagine this: a blind person able to interact with surfaces like touchscreens that were once inaccessible, guided by audio beacons overlaid through AR glasses.

Even if these glasses are still years away, the journey to create them is already delivering so many useful advances for the blind and low vision community. Take Envision Glasses, for example—they're a fantastic piece of hardware that's both easy to use and powerful enough to make a real difference for users today. It's only onwards and upwards from here.

Project Orion

🦙 Llama 3.2

Meta's latest Llama release introduces multimodal capabilities, combining both vision and text components in its larger models, a significant leap over the text-only models that came before.

Being completely open source (open weights) with a permissive license means developers can fine-tune it however they want. For me, though, the standout feature is the smaller 1B and 3B models, which perform incredibly well and can be deployed directly on devices like smartphones and laptops.

A powerful trend in AI is that the cost of deploying models is dropping to near zero. Sam Altman's proclamation that "intelligence will be too cheap to meter" is quickly becoming reality.

The 1B and 3B Llama models can be deployed at no cost, yet they enhance the entire AI pipeline by performing small, specialized tasks swiftly. And let's not forget, compact models like Meta's Llama and Google's Gemma are huge wins for privacy.

Envision plans to tap into Llama 3.2's vision capabilities to build specialized models that can, for example, better understand document layouts, including the charts, figures, images and tables present in them.

What's also exciting is that Llama 3.2 seems to be the first series of models trained on images from smart glasses, making it a perfect fit for Envision.

Alt: A graph titled "LLAMA 3 OFFERS THE BEST PRICE-PER-PERFORMANCE OF ANY MODEL" displayed on a large screen at a conference. The graph shows the cost per million tokens for various language models over time, with Llama-3 models shown as the most cost-effective.

The graph above is one of the most compelling visuals about AI you'll see this year. It highlights how the cost of intelligence, through a combination of hardware advances and open source development, has dropped by a factor of 240.

The Random Bits

It's not all work at tech conferences—I enjoy the random bits just as much.

🍪 This year, I indulged in llama-themed and AI-generated cookies, and we even had a standup comedy show entirely orchestrated by Llama and two NYC comics!

Alt: A hand holding a clear plastic bag containing a decorated sugar cookie in the shape of a llama. The cookie is white with colorful icing details, including a green and blue saddle. In the background, there's a wicker basket with more packaged cookies and a small ceramic llama figurine on a wooden surface.

🖥️ Meta's shuttles are coder-friendly, with monitors and HDMI cables lying around. Want to keep working on your way to work? Just plug in, connect to the local Wi-Fi, and start coding... like me 🤣

Alt: The interior of a van or mobile workspace, showing a desk setup with two computer screens. One screen displays code, while the other shows text. A laptop is also visible. The workspace is positioned in front of a large window offering a view of an outdoor area with trees and buildings in the distance. The van's interior features modern amenities and appears to be a custom mobile office setup.

✨ I absolutely loved the Meta swag this year. Attendees could ask Meta AI to generate an image based on any prompt and have it printed on a tote bag. My prompt was two toddler AIs—one representing the East and one the West—hiking up a Californian landscape together. It wasn't any different than what I've seen with other image generation tools, honestly, but it's a nice conversation starter 🤷

Alt: A Meta Connect event badge and lanyard for Karthik Kannan from Envision AI, along with event pins and a card featuring two cartoon robots in a scenic forest setting at sunset.

Looking forward

2025 will be the year of AI on Glasses—a culmination of the past seven years we've dedicated at Envision, championing AI on glasses as the ultimate accessibility platform. Meta Connect 2024 marks the beginning of a tectonic shift, with Project Orion and Llama 3.2 standing out as game-changers for me.

Over the week, I connected with many more folks in the Bay Area. There's an electrifying energy in the air, a sense of great possibilities. As we stand on the brink of another revolution, it feels like we're not just witnessing the future—we're crafting it.