Open-source AI rivals 🚀, instant pro videos 🎬, Copilot’s 3D avatars 🕺

🛠️ Product Updates

Text

Google's Gemini AI is now in Gmail, offering email composition, thread summarization, and smart replies that perform solidly in early testing. The integration aims to streamline inbox management with features like cleanup prompts and enhanced searching capabilities. While the AI's response quality for composition and summarization impresses, users report significant inconsistencies with search functions and inbox management tasks, especially for complex queries across large email archives. Currently in its testing phase, Gemini in Gmail shows promise but remains unpredictable—signaling Google's commitment to AI-powered productivity while highlighting the work still needed before users can fully rely on these features.

DeepSeek's R1-0528 update has dramatically narrowed the gap with proprietary AI giants like GPT-4 and Gemini 2.5 Pro, while maintaining its open-source advantage. The 685-billion-parameter model now delivers comparable performance on math, coding, and logic benchmarks, with enhanced chain-of-thought reasoning and reduced hallucinations. What truly distinguishes this release is its full MIT-licensed accessibility—complete with weights, training recipes, and documentation on GitHub—allowing organizations to deploy locally or scale via APIs without the usage restrictions and token fees that plague proprietary alternatives.

ChatGPT has rolled out Operator, a new flight-booking assistant that seamlessly integrates with Bing Travel. The tool allows Pro users to simply describe their travel needs—like "direct daytime flight to Hong Kong for five with checked bags"—and Operator handles the rest, asking clarifying questions and filtering options in seconds. Currently available as a research preview for US subscribers, Operator works through its own browser interface, potentially bypassing login hurdles while uncovering hidden deals and planning complex multi-city itineraries—marking a significant step toward OpenAI's vision of AI as an everyday personal assistant.

xAI is rolling out three enhancements to Grok Web aimed at daily utility and creative work. The updates include a whimsical "Stars" idle animation, Google Calendar integration for viewing events and reminders, and an "Images Explorer" for searching and organizing visual content across categories. These additions complement Grok's recently expanded image generation and external tool integration. Together they signal Elon Musk's determination to turn Grok into a comprehensive AI assistant that pairs productivity features with creative tools and can challenge established competitors in a crowded AI platform market.

Google's Gemini 2.0 Flash model now integrates seamlessly with LangChain and Jina Search, enabling developers to build AI assistants capable of retrieving up-to-date information with proper citations. The new toolkit—requiring langchain-community 0.2.16 or higher—allows the AI to dynamically determine when external searches are necessary and incorporate real-time results into responses. This powerful combination addresses one of AI's persistent challenges: accessing current information beyond training data cutoffs. By configuring the model with a low temperature setting (0.1), developers can create assistants that deliver factually grounded answers while maintaining conversational fluidity.
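
The version requirement and temperature setting mentioned above can be checked before any assistant code runs. The following is a minimal sketch using only the Python standard library, not the toolkit's own setup code. The helper names (`parse_version`, `meets_minimum`, `installed_version`) and the `ASSISTANT_SETTINGS` dictionary are hypothetical, and the model identifier string is an assumption. Only the 0.2.16 minimum and the 0.1 temperature come from the text.

```python
from importlib import metadata
from typing import Optional, Tuple

# Minimum langchain-community version named in the text.
MIN_LANGCHAIN_COMMUNITY = "0.2.16"

# Hypothetical settings container. Only the temperature value comes from
# the text; the model identifier string is an assumption.
ASSISTANT_SETTINGS = {"model": "gemini-2.0-flash", "temperature": 0.1}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.2.16' into (0, 2, 16).

    Non-numeric suffixes such as 'rc1' are ignored, so a pre-release
    compares equal to its final release in this simplified check.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_LANGCHAIN_COMMUNITY) -> bool:
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package: str = "langchain-community") -> Optional[str]:
    """Look up the installed version, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("langchain-community is not installed")
    elif meets_minimum(found):
        print(f"langchain-community {found} satisfies >= {MIN_LANGCHAIN_COMMUNITY}")
    else:
        print(f"langchain-community {found} is older than {MIN_LANGCHAIN_COMMUNITY}")
```

A check like this is useful at startup because it fails fast with a clear message instead of surfacing a missing-feature error partway through a conversation.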

OpenAI has enhanced ChatGPT with five power-user features that transform it from chatbot to productivity assistant. The desktop app now offers screenshot integration and voice mode, while the new Projects feature lets users organize related conversations with shared context. Temporary Chat keeps one-off queries from cluttering your history, and model selection (GPT-4o, GPT-4.5, or o3-mini-high) helps optimize for different tasks. Most impressive is Canvas, which enables side-by-side collaborative editing—letting you refine AI-generated content without starting over with each revision. These upgrades align with OpenAI's 2025 roadmap to position ChatGPT as a comprehensive digital agent.

Microsoft is quietly testing "Live Portraits" for Copilot, an experimental feature that would allow users to select visual styles for male and female AI assistant avatars. Currently in early development, the interface leads to placeholder voice conversation screens, suggesting Microsoft's vision for more personalized AI interactions. Internal references to "3D generations" hint at dynamically expressive characters that could enhance voice-based engagement. This development appears to be evolving alongside the rumored "Copilot Characters" concept, potentially offering users a spectrum from predefined personas to fully customizable 3D avatars during web-based Copilot conversations.

Audio

ElevenLabs has supercharged its AI voice agents by integrating Anthropic's Claude Sonnet 4 model, delivering conversational experiences with enhanced intelligence and responsiveness. The upgrade brings sophisticated instruction-following for complex commands, improved contextual memory for natural dialogue flow, and reliable API-driven tool usage for real-world tasks like flight status checks. Developers can now build more capable voice applications with less time spent on workarounds, accessing Claude Sonnet 4 directly through ElevenLabs' dashboard or implementing it via a single line of HTML. This integration positions ElevenLabs at the forefront of conversational AI, blending ultra-realistic voice synthesis with next-generation language understanding.

ElevenLabs has unveiled a multimodal conversational AI system that seamlessly blends voice and text inputs, tackling a persistent challenge in voice-only interfaces. The new system lets users dynamically switch between speaking and typing—particularly valuable when handling tricky alphanumeric data like email addresses or credit card numbers that often trip up voice recognition. Available through HTML widgets, SDKs, and WebSocket for real-time communication, the feature builds upon ElevenLabs' voice technology that already supports 32+ languages. This pragmatic enhancement promises to boost interaction accuracy and completion rates across customer service, healthcare, and financial applications.

Video

CapCut's new desktop AI Video Generator transforms how creators produce professional videos without editing expertise. The three-step tool converts text prompts into complete videos with visuals, transitions, text-to-speech, music, and effects in minutes. Users simply input their prompt, select a style (cinematic, vlog, business), and refine the AI-generated draft before exporting. The feature represents a significant leap for content repurposing, allowing blogs and articles to become engaging videos with minimal effort. Combined with CapCut's other AI capabilities like Voice Changer and Auto Captions, it's becoming a comprehensive solution for time-conscious creators.

Microsoft has just added OpenAI's Sora text-to-video model to Azure AI Foundry, creating a dedicated "video playground" for developers and businesses. This controlled environment lets users experiment with Sora's capabilities, refine prompts, and test video generation for specific applications—from marketing content to realistic simulations. The integration aims to streamline content creation workflows, particularly benefiting entertainment, advertising, and education sectors. By providing early access within this structured framework, Microsoft balances innovation with responsible AI development while strengthening its position in the generative AI landscape.

Image

Black Forest Labs has unveiled FLUX.1 Kontext, a powerful suite of AI image generation models that's turning heads in the generative AI space. The new family delivers high-quality visuals with exceptional prompt adherence and renders photorealistic images up to 8x faster than leading competitors. The suite includes specialized variants—[pro] for consistent character development across multiple turns, and [max] for speed and precision. Available on Replicate with 200 free credits for newcomers, this release from the German startup (founded by Stability AI veterans) signals a serious challenge to established players in the increasingly competitive AI image generation market.

Leonardo.ai just unveiled Omni Editing, an advanced AI image modification tool powered by FLUX.1 Kontext and OpenAI's GPT-Image-1. This integration brings real-time editing capabilities through a streamlined prompt bar, allowing users to instantly adjust colors, styles, and lighting with natural language commands. The standout feature? GPT-Image-1 enables blending elements from up to four source images—perfect for maintaining consistency in fonts or characters across projects. With multi-image support for FLUX.1 Kontext coming soon, Leonardo.ai positions itself at the cutting edge of AI image manipulation tools for both casual creators and professionals.

🧪 Use Cases

Ryan Carson unveiled a streamlined three-step AI coding workflow for solo founders, eliminating the need for large development teams. The system uses AI tools such as Cursor, along with Model Context Protocol (MCP) integrations, to turn detailed product requirements into functional code through context definition, task automation, and iterative feedback. By breaking complex projects into AI-manageable chunks, founders can build sophisticated products while maintaining strategic oversight. Carson's approach tackles one of tech entrepreneurship's biggest barriers, development resources, and makes product creation more accessible to individual innovators.

OpenAI's GPT Store is showcasing specialized AI assistants that transform how users tackle specific tasks. Five standouts include Game Time, which demystifies complex game rules; Sora Video Prompter, crafting detailed prompts for OpenAI's video generator; Personal Color Analysis for fashion advice based on uploaded photos; Whimsical Diagrams for creating visual frameworks; and Movie Recommendations, which conducts nuanced conversations to deliver personalized entertainment suggestions. While the store contains its share of novelties, these purpose-built GPTs demonstrate how tailored AI models can deliver genuine utility in ways generic chatbots simply cannot.

💡 Insights

Coding

GitHub Copilot and Tabnine are carving distinct paths in AI coding assistance, with recent testing across ten real-world scenarios revealing their specialized strengths. Copilot, now featuring GPT-4.1 and Claude 3.5 Sonnet integration, excels at generating creative, context-aware code with polished documentation in popular languages at $10/month. Meanwhile, Tabnine's focus on privacy—offering offline modes and private codebase training—makes it the go-to for security-conscious enterprises, with broader language support starting at $39/month. The competitive divergence highlights AI's evolution beyond generic assistance to targeted solutions addressing specific developer workflows and organizational requirements.
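
To put the two quoted prices in team terms, here is a rough arithmetic sketch. It assumes both figures are per-seat monthly list prices with no discounts or annual-billing reductions, which the text does not confirm. The function names are illustrative only.

```python
# Entry prices quoted in the text, assumed here to be per seat per month.
COPILOT_MONTHLY_USD = 10
TABNINE_MONTHLY_USD = 39


def annual_cost(monthly_usd: int, seats: int) -> int:
    """Yearly cost in USD for a given number of seats at a flat monthly price."""
    return monthly_usd * 12 * seats


def annual_difference(seats: int) -> int:
    """How much more the Tabnine tier costs per year than the Copilot tier."""
    return annual_cost(TABNINE_MONTHLY_USD, seats) - annual_cost(COPILOT_MONTHLY_USD, seats)


if __name__ == "__main__":
    for seats in (1, 5, 20):
        print(seats, annual_cost(COPILOT_MONTHLY_USD, seats),
              annual_cost(TABNINE_MONTHLY_USD, seats), annual_difference(seats))
```

Under these assumptions, a five-person team would pay $600 per year at the Copilot price and $2,340 at the Tabnine price, a difference of $1,740.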

GitHub CEO Thomas Dohmke envisions AI as programming's evolution, not its replacement. Speaking about tools like GitHub Copilot, Dohmke argues that AI elevates software engineering by handling repetitive tasks—allowing developers to maintain their creative "flow state" and focus on complex problem-solving. By enabling engineers to describe intent in natural language, AI shifts programming to a higher level of abstraction. Though AI can generate functional code independently, Dohmke emphasizes human oversight remains essential for security and efficiency, predicting engineers will spend more time on system design and refining AI outputs rather than mundane coding tasks.

Text

Google Gemini now outperforms ChatGPT at meal planning, according to recent user testing. While ChatGPT offers a more visually appealing format with colorful icons and concise lists, Gemini delivers more practical value through comprehensive preparation instructions and detailed guidance. For single-meal recipes, Gemini provides thorough breakdowns and cooking methods, while its weekly meal plans include specific preparation steps and nutritional insights. Both platforms continue to evolve their approaches to everyday tasks, with Gemini leveraging Google's ecosystem advantages and ChatGPT focusing on interface simplicity and shopping list generation.

⭐ Reviews

Google's new AI Mode for Search falls short against competitor Perplexity in early comparisons. Despite Google's data advantages, the implementation delivers underwhelming results—suggesting outdated phones when asked for sub-$1000 options and presenting travel itineraries as dense text blocks without visuals. Users report a cluttered interface with shopping links dominating results, while Perplexity offers cleaner layouts with tables, images, and map views. Google's awkward workflow requires navigating to separate sections, with difficult-to-access chat history compounding usability issues. While Google's iteration speed suggests improvements may come quickly, Perplexity currently provides a more mature, intentional AI search experience.

 
