My journey in understanding AI began with a simple yet daunting task: integrating an AI assistant into my B2B marketplace platform. I was a fresh-faced programmer, eager to infuse our product with the magic of machine intelligence. The initial vision was straightforward – a chatbot to help users navigate listings, answer FAQs, maybe even negotiate deals. But very quickly, reality set in. In our early tests, the AI felt amnesiac. It would cheerfully answer a question about a product, but then act as if it never had that conversation moments later. It became apparent that a truly functional AI assistant couldn’t just be a stateless responder; it needed to behave like a state machine, carrying forward some memory of interactions. In other words, our AI needed persistent memory – a context that updated after every user query and response, just as a human would remember the gist of an ongoing discussion. This was my first major realization, and it hit hard: large language models (LLMs) like the ones powering our assistant “do NOT inherently remember things” on their own (blog.langchain.dev). To avoid being that forgetful coworker who asks the same questions over and over, our AI had to be engineered with an external memory mechanism from the ground up.
Grappling with Memory: Beyond the Context Window
Emboldened by this realization, I dove into the world of AI memory solutions. I wasn’t the first to face this problem, of course. Entire libraries and frameworks had sprung up to tackle it, with LangChain being one of the most prominent. LangChain provided “memory” components that could store conversation history and inject it into each prompt. On paper, it sounded like exactly what I needed. I imagined plugging in LangChain’s ConversationBufferMemory and solving our issues overnight. In practice, I ran into walls. The most glaring issue was the context window limit. Even with the latest models boasting 16k or 32k token contexts, a busy user could blow past that with a long chat or detailed product specs. Sure enough, simply keeping every message in memory quickly led to hitting context size limits, even on advanced LLMs (pinecone.io). The framework dutifully truncated old messages when the buffer got full, but this meant our AI started forgetting earlier parts of the conversation – the very thing I was trying to prevent!
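To make the failure mode concrete, here is roughly the pattern I started with – a minimal sketch of LangChain’s buffer memory (illustrative only; the exact import paths are version-dependent and newer LangChain releases have since moved or deprecated these classes):

```python
# A minimal sketch of the classic LangChain buffer-memory pattern
# (illustrative only; exact imports vary across LangChain versions).
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()

# Every exchange gets appended to the buffer...
memory.save_context(
    {"input": "Do you ship the industrial pumps to Germany?"},
    {"output": "Yes, we ship EU-wide; lead time is about two weeks."},
)
memory.save_context(
    {"input": "What was that lead time again?"},
    {"output": "About two weeks for Germany."},
)

# ...and the whole buffer is injected into the next prompt.
print(memory.load_memory_variables({})["history"])

# The catch: the buffer grows without bound, so a long chat eventually
# exceeds the model's context window and the oldest turns get dropped.
```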
I then experimented with vector databases and retrieval-augmented generation for long-term memory. The idea was to embed conversation snippets and store vectors, retrieving them when relevant. This added complexity (standing up a Pinecone DB, dealing with embedding models), and tuning it was non-trivial. I chuckle now recalling how I tried to get LangChain’s ConversationSummaryMemory to condense old chats into summaries. The summaries often ended up so abstract that they were useless for recall. After weeks of tinkering, I had to admit: our memory solution was unreliable. LangChain’s abstractions were powerful but overwhelming; the modular design combined with external services introduced so many points of failure that it felt like wrestling an octopus (blog.milvus.io). Each new approach – whether a different memory class or a custom hack – led to either insane token usage, latency issues, or subtle bugs where the AI would contradict itself because it forgot a detail that was genuinely conveyed earlier. As one developer aptly put it, these LLM tools often end up needing “heavy oversight, like mentoring an overenthusiastic junior developer” (linkedin.com). I felt exactly like a mentor to an AI intern with severe short-term memory loss.
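For the record, the retrieval idea itself is simple; my problems were all in the tuning. Here’s a stripped-down sketch of the pattern, with the vector store reduced to an in-memory list and the embedding model stubbed out (in my setup that was an external embedding API plus a Pinecone index):

```python
# Retrieval-style memory in miniature: store embedded snippets, pull back the
# closest ones at question time. The embed() stub stands in for a real
# embedding model; with random vectors the recall is obviously meaningless.
import numpy as np

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))  # placeholder
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

memory_store: list[tuple[str, np.ndarray]] = []

def remember(snippet: str) -> None:
    memory_store.append((snippet, embed(snippet)))

def recall(query: str, k: int = 3) -> list[str]:
    q = embed(query)
    ranked = sorted(memory_store, key=lambda item: -float(item[1] @ q))
    return [text for text, _ in ranked[:k]]

remember("User prefers suppliers with ISO 9001 certification.")
remember("User asked about bulk pricing for hydraulic valves.")

# At answer time, the top-k snippets get prepended to the prompt.
print(recall("Which certifications matter to this buyer?"))
```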
LangChain did teach me one valuable concept: memory is application-specific. Their blog pointed out that what to remember (and how to use it) depends on the use-case – a coding assistant recalls different things than a legal Q&A bot (blog.langchain.dev). I realized I needed to define clearly what our agent should remember about marketplace interactions (user preferences? prior products viewed? common clarification questions?) and design memory around that. Despite my struggles, this insight reframed memory not as a one-size-fits-all module, but as a tailor-fit component of the agent. I didn’t abandon the quest – far from it. But I stepped back from off-the-shelf solutions and started devising a custom memory module for our AI, one that stored key user preferences and session data in our own database after each interaction. In essence, I was manually implementing the state machine: the AI’s state transitioned and persisted between turns. This was messy but enlightening. By the time I had a rudimentary persistent memory working, I had developed a healthy skepticism for “plug-and-play” memory solutions and a deep appreciation that true long-term memory in AI is still an unsolved challenge. My iterative, trial-by-fire approach to memory laid the groundwork for ideas I’d refine much later (more on that at the end of this story).
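That hand-rolled version boiled down to a small persistence loop. A sketch of the idea (the table and field names are invented for illustration, not our actual schema):

```python
# The "manual state machine": after each turn, distill a few key facts and
# persist them; before the next turn, reload them and prepend as context.
import json
import sqlite3

db = sqlite3.connect("agent_state.db")
db.execute("""CREATE TABLE IF NOT EXISTS session_state (
    user_id TEXT PRIMARY KEY,
    state   TEXT NOT NULL      -- JSON blob of preferences and session facts
)""")

def load_state(user_id: str) -> dict:
    row = db.execute(
        "SELECT state FROM session_state WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}

def save_state(user_id: str, updates: dict) -> None:
    state = load_state(user_id)
    state.update(updates)  # the state "transition": merge in new facts
    db.execute(
        "INSERT OR REPLACE INTO session_state VALUES (?, ?)",
        (user_id, json.dumps(state)),
    )
    db.commit()

# After a turn where the buyer mentioned their region and what they viewed:
save_state("buyer-42", {"region": "EU", "last_viewed": "hydraulic valves"})
# Next turn, this dict gets serialized into the prompt before the new question:
print(load_state("buyer-42"))
```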
Chasing the New Coding Agents: Cursor, Copilot, Replit, Windsurf – Hype vs. Reality
Around the same time, the AI world was buzzing with the rise of AI coding assistants – tools that promised to revolutionize programming itself. As a young developer, I was naturally curious (and admittedly, a bit envious). Products like GitHub Copilot had already gained traction, and soon more advanced “coding agents” appeared: Cursor (an AI-paired code editor), Replit’s chat-based programmer (part of their Ghostwriter suite), a new open-source IDE assistant called Windsurf, and various others. These systems weren’t just autocomplete; they claimed to understand your project, make multi-file edits, debug, even run tests. I eagerly tried several of them, dreaming of an AI pair-programmer who could take on boring tasks and let me focus on creative coding. The results were… mixed.
First, the memory issue reared its head here as well. In theory, a coding agent working in an IDE should have a persistent workspace context – it has access to your files, so it shouldn’t forget function names or earlier steps it took. In practice, I saw the same kind of forgetfulness I’d battled in chatbots. My experiments with Cursor were especially telling. I’d watch it generate a chunk of code, create a debug log file to record some output as I asked – and then in the next step it would totally ignore that log file it just wrote, acting as if it didn’t exist. In one session, Cursor dutifully saved HTML output to a file as I requested, then minutes later it started printing the HTML to console, “figuring out the format” from scratch, not recalling the file it had created. It was maddening and oddly comical – the AI was literally forgetting what it did 5 minutes ago. I encountered similar memory blips with Replit’s agent; a Reddit user’s review echoed what I saw: the agent “starts very strong, but has trouble finishing any task as the sessions run out of memory.” In longer coding sessions, these tools would lose track of the project goals or past fixes, sometimes looping or suggesting things we’d already done. Clearly, robust long-term memory remained an open challenge in agents, even those embedded in coding environments.
The second problem I noticed was a misalignment with the user base and their real needs. This one is a bit more subjective, but let me explain. A lot of these AI coding assistants felt like they were designed to show off impressive capabilities rather than to smoothly augment a developer’s day-to-day workflow. For example, I loved that Cursor and Windsurf could generate entire functions and even multi-file scaffolding from a prompt. But when it came to debugging or refining that code, they often stumbled. They didn’t truly collaborate in the way an experienced human might; they’d either regurgitate documentation or confidently make changes that broke something else. A LinkedIn tech post I read compared these agents to overenthusiastic junior developers – high energy, fast output, but lacking judgment. That struck a chord. These tools sometimes felt misaligned with the pace and caution of professional development. They might appeal to a newbie looking to build an app quickly, or to a hacker automating small tasks. But in a production codebase, their unfiltered, sometimes reckless suggestions (like duplicating code or mis-refactoring in the Cursor examples) made me and my teammates uneasy. The user base of professional developers values reliability and predictability; having an AI dive in and make changes “autonomously” sounds cool but can violate the expected workflow. I realized that for many devs, an AI that suggests and explains is welcome, but one that silently edits multiple files or introduces unknown side-effects – not so much. The design of these agents hadn’t fully reconciled with what users (developers) are comfortable with. In short, there was a misalignment between the hype of autonomy and the reality of what users find useful.
Thirdly, I observed these coding agents had restricted module workflows – by which I mean they were constrained in the types of tasks or “modules” they could perform in sequence, limiting their usefulness. For instance, both Cursor and Windsurf touted “agent” modes, but as one reviewer noted, it was questionable if they were truly agents in an autonomous sense (builder.io). They often could generate code across files (great!), but they didn’t have a robust feedback loop. If I asked, “Now run the project’s tests and fix any failing cases,” they typically couldn’t (at least not without clunky workarounds). They also couldn’t easily step outside the IDE’s confines – for example, opening a browser to verify something, or updating a config on my behalf. Each tool had a predefined set of things it could do (write code, maybe read error output, suggest a commit message, etc.), but if you needed something just a bit beyond those built-in modules, you were out of luck. In essence, the workflows felt rigid. One blog pointed out the lack of a “robust debugging loop” in these products – they don’t iterate until correct by themselves. A true agent, as the article argued, would try an approach, evaluate the result, and iterate. Cursor and its peers didn’t really do that; if the code they generated had a bug, resolving it was largely back on me. It was clear that the promise of a self-improving coding agent was still unfulfilled – their autonomy was constrained by their workflow design.
Despite these critiques, I don’t mean to sound pessimistic. On the contrary, I was (and remain) excited by these tools. I saw glimpses of what could be the future – a world where AI agents handle the boilerplate of coding, or where I could high-level instruct “build me a simple analytics dashboard” and the agent handles everything from setting up the project to writing tests. Those glimpses kept me motivated. In fact, these experiences taught me valuable lessons about how memory, alignment with user needs, and flexible autonomy are key to any AI agent’s success. It also got me thinking about the bigger picture: if small startups could build these semi-functional agents, what could the tech giants accomplish with their resources? This train of thought naturally led me to consider the positions of companies like Microsoft, Google, and others in the “agentic AI” race.
The Unlikely Kingmaker: Microsoft’s Overlooked Advantage
By mid-2024, one thing became obvious to me: Microsoft was quietly assembling a fortress in the AI-driven development space. As a young programmer, I had grown up with VS Code and GitHub – indispensable tools in our community. Microsoft owned both. And then there was GitHub Codespaces, essentially cloud-hosted VS Code environments, and Azure’s growing portfolio of AI services. I started to suspect that if Microsoft could seamlessly integrate all these pieces – GitHub’s code repositories and issue trackers, VS Code’s editor and extensions, Codespaces’ cloud dev setups, and Azure’s AI models (including OpenAI’s models which Microsoft had exclusive access to) – they could dominate the entire AI developer lifecycle. Imagine: you open your project in VS Code, and an AI (like Copilot on steroids) already knows your project context from GitHub, can spin up a cloud environment instantly (Codespaces), can run and test in the cloud, and uses powerful models to not only suggest code but manage project tasks. This one company could provide the end-to-end pipeline, from coding to deployment, all enhanced by AI at every step. It was a chilling and exciting thought.
I wasn’t alone in that realization. Hints of this strategy surfaced in discussions online. Some pointed out that Visual Studio Code, while ostensibly open-source, had certain closed components – possibly to ensure Copilot and Microsoft’s cloud integrations worked better on the official version (news.ycombinator.com). Conspiracy or not, it’s true that Copilot worked out-of-the-box in VS Code, and not as smoothly elsewhere for a while. Microsoft was positioning their editor as the place where AI coding happens. There was even speculation (half-joking) that Microsoft’s true aim was to “watch every keystroke, borrowing your code… through the editor and GitHub” (news.ycombinator.com) – a reminder of the data control a fully integrated ecosystem would give them. Setting aside the cynicism, I saw a more straightforward competitive angle: if Microsoft unified GitHub, VS Code, Codespaces, and Azure AI, it could create an AI development experience others would struggle to match. None of the startups like Replit or Cursor had all those pieces – they either lacked the vast code knowledge of GitHub, or the polished editing experience of VS Code, or the compute infrastructure of Azure. Even giants like Google didn’t have a popular code editor or something like GitHub’s network effect among developers.
This “aha” moment made me slightly wary of investing too heavily in the smaller tools. Why learn the quirks of Windsurf or a niche AI IDE if, potentially, a year from now VS Code offers the same or better capabilities natively? And indeed, Microsoft was moving fast: GitHub Copilot X was announced with chat and CLI integration; Azure was offering OpenAI model APIs; Windows was adding AI features in the terminal and elsewhere. It felt like Microsoft was the sleeping giant in AI agents for devs – not as flashy as OpenAI’s ChatGPT launch, but slowly knitting together an unrivaled toolkit. I began to discuss this with peers, almost wanting to warn the cool new AI startups: “Watch out, if Redmond gets its act together, they have an all-in-one advantage.” If I, just a curious dev far from the centers of power, could see it, surely those companies saw it too. I recall an essay noting the “last 5 years of Microsoft ain’t gonna be the next 10 years” – hinting that Microsoft’s trajectory was changing. For once, this might mean steering into a more closed, vertically integrated approach (quite unlike the old embrace-open-source Satya image). It made sense: owning the whole stack could yield huge rewards in the AI age.
As a result, I slightly shifted my strategy. While I kept playing with new agents, I also doubled down on mastering the VS Code ecosystem and GitHub’s emerging AI tools. It seemed likely the eventual “winner” platform for coding AI would be linked to those. My broader takeaway here was that in the AI race, having all the pieces (data, tools, distribution) is incredibly strategic. This realization soon extended beyond coding. What about agentic AI in general, not just for coding? What pieces would a company need to dominate that? This question sent me down a more theoretical and philosophical path, pondering the foundations of agentic systems and who has them.
Foundations of Agentic AI: A Personal Deep-Dive
I started researching what makes an AI system truly agentic – meaning able to act autonomously in pursuit of goals, across diverse tasks. In essence, what would it take to build an AI “agent” that feels more like a coworker or assistant that can do things for you, as opposed to just answering questions? My journey here was both technical and philosophical. I read papers, tried small experiments, and even revisited some classic AI thinking. One influential idea came from the realization that humans possess different kinds of memory and reasoning; one paper (CoALA, I recall) mapped human procedural, semantic, and episodic memory to agent memory types (blog.langchain.dev). It dawned on me that an agent needs multiple layers of memory: short-term for immediate context, long-term for knowledge, and maybe a working memory for planning. This reinforced what I had learned hands-on. It wasn’t enough to only bolt on a vector database and call it “memory” – an agentic AI likely needs a spectrum of memory systems working in harmony (I’ll return to this idea in my conclusion, because it became a pet project of mine).
On the philosophical side, I stumbled on a compelling warning from investor Peter Thiel. He argued that AI, unlike crypto, is a centralizing technology – it tends to concentrate power rather than distribute it (hughhewitt.com). That stuck with me. If Thiel was right, whoever built the dominant agentic AI might wield enormous centralized power – knowing everything you ask, controlling which information or actions the AI takes on your behalf, etc. It’s both an opportunity and a risk. Thiel even fretted about governments using AI for totalitarian control, and emphasized being more worried about “surveillance AI” than hypothetical AGI doom scenarios. Indeed, one can see why: an agentic AI that becomes your life’s digital assistant could, if misused, track and influence you in subtle ways. This made me realize that the architecture of agentic systems shouldn’t be overly centralized in control – a healthy ecosystem might need open alternatives, user control, and transparency to prevent a single entity (be it a company or government) from having a monopoly on “intelligent agents”.
Bolstered by this perspective, I spent evenings concocting what I considered the core pillars of an agentic system. I asked myself: what are the fundamental components without which an AI agent cannot function effectively or safely? After much thought (and scribbling diagrams on my dorm whiteboard), I identified four pillars: Knowledge Base, Search Engine, Network, and User Interface. These corresponded to (1) having a solid foundation of static knowledge, (2) the ability to query dynamic information, (3) access to real-time context or social data, and (4) a way to interact with users fluidly. In my view, any serious agent would need all four. I’ll explain each pillar in a moment – why it’s critical and who (which companies or projects) have strengths in it. This personal research was my way of structuring the chaotic landscape. It also helped me gauge where various AI players stood – who was missing a pillar, who had a full stack, etc. Little did I know, around the same time, industry developments were mirroring these thoughts (Google, for instance, started talking about “agentic AI” with their Gemini launch (theverge.com), which made me grin that I was on the right track).
Before diving into the four pillars, I want to mention an inspiring development that occurred during my exploration: DeepSeek R1. In late 2024, I got wind of a lab called DeepSeek releasing an open-source model (codenamed “R1”) that purportedly rivaled the capabilities of the best proprietary models. Skeptical but intrigued, I tried it out once it was released (early 2025). To my surprise, the claims held up: DeepSeek-R1 achieved performance comparable to OpenAI’s latest reasoning model (“o1”) on math, code, and reasoning tasks, and it was fully open-source (huggingface.co). This was a big deal – it suggested that top-tier AI might not remain the exclusive domain of a few trillion-dollar companies. The project’s technical report explained how they used large-scale RL to imbue the model with reasoning skills, and they open-sourced not just the model but also smaller distilled variants. Seeing open-source innovation at this level inspired hope in me: maybe the future of agentic AI won’t be entirely centralized after all. If communities can rally to build models like DeepSeek R1, then the playing field might stay more balanced. I imagined a future where I could download a powerful “agent brain” and run it on my own server, plugging in the four pillars I identified, without needing to rely on a big corporation’s cloud. DeepSeek R1 was a bright light in that direction, and it energized me as I continued my agent research.
Now, with that context, let me break down the four foundational pillars I see for a good agentic system – and how they manifest in today’s technology landscape.
Figure: An agentic AI system (top) supported by four key pillars: a Knowledge Base for static info, a Search Engine for dynamic queries, a Network for context-rich real-time data, and a User Interface for human interaction. The agent draws on all four to perceive, reason, and act effectively.
1. Knowledge Base
The Knowledge Base is the pillar of static, structured information. This includes things like encyclopedic knowledge, textbooks, databases, company documentation, personal notes – essentially, content that doesn’t change every second and can be organized for efficient retrieval. Why is this critical? Because any intelligent agent needs a solid ground to stand on. Humans have long-term memory of facts and experiences; analogously, an AI agent should have access to a repository of vetted, reliable information. If I ask my agent “What’s the capital of Australia?” or “What products does our company offer in the Europe market?”, I expect an immediate correct answer – not for it to go search the web every time. That’s the role of the knowledge base: to serve as the agent’s internal library.
Most current AI assistants actually rely heavily on their knowledge base – though it’s often baked into the model’s training data (for LLMs like GPT-4, the “knowledge base” was essentially the entire internet up to a cutoff date!). But there are more explicit forms. For example, tools like Wikipedia or proprietary databases can be hooked up. In enterprise settings, the knowledge base might be your Confluence wiki or manuals. The key is that this information can be structured (in tables, graphs, documents) and often curated. Unlike the open web, it’s a finite set of content that an agent can trust to some degree.
When I think of who excels in this pillar, one obvious answer is Wikipedia for general knowledge – it’s static, vast, and structured via hyperlinks and categories. Many AI systems use Wikipedia as a backbone for factual Q&A. Also, computational knowledge engines like WolframAlpha (for factual queries, math, science) come to mind – though not as commonly talked about in the agent space, they represent structured knowledge that can be queried. On the personal side, I’ve been carefully curating my own notes and documents, hoping to one day plug them into an AI assistant as a personal knowledge base. I envision a not-too-distant future where each user might have their own private knowledge base (their emails, files, etc., that they permit an agent to use) so the agent can recall your specific facts (“your flight last week was on Tuesday at 5 PM”) as readily as general facts.
One challenge with knowledge bases is keeping them updated (they can become stale), but that’s where the next pillar comes in – search. However, the structured nature of the knowledge base also means an agent can do more than retrieve – it can reason over it. I’ve seen early examples where an agent uses a SQL database as a knowledge base, running queries to answer analytical questions. This structured querying is powerful and something purely generative models struggle with (they’d rather guess than precisely query). Thus, a good agentic system should integrate structured knowledge bases and use them appropriately – e.g., do arithmetic or lookups via a database rather than trying to have the neural net approximate it. My own attempts with this were modest (I had my marketplace bot fetch product data from our DB instead of relying on memory), but even that made it far more accurate on inventory queries.
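A toy version of what that looked like for the marketplace bot – routing inventory questions to a SQL lookup instead of letting the model guess (the schema and product names here are invented for illustration):

```python
# Answer inventory questions from a structured source, not from model memory.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (name TEXT, region TEXT, stock INTEGER)")
db.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    ("Hydraulic valve V-100", "EU", 230),
    ("Industrial pump P-7",   "EU", 12),
    ("Industrial pump P-7",   "US", 0),
])

def products_in_region(region: str) -> list[tuple[str, int]]:
    """Return (name, stock) for items currently available in a region."""
    return db.execute(
        "SELECT name, stock FROM products WHERE region = ? AND stock > 0",
        (region,),
    ).fetchall()

# "What do we offer in the Europe market?" gets answered from this result,
# which the LLM then phrases for the user.
print(products_in_region("EU"))
```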
In summary, the Knowledge Base pillar gives the agent depth and consistency. It’s the long-term memory and reference section. Without it, an agent either hallucinates or has to constantly rediscover facts. With it, the agent gains a reliable backbone of understanding about the world or its domain.
2. Search Engine – Dynamic Data Retrieval
If the knowledge base is the agent’s memory, the Search Engine is its vision and hearing – the way it actively gathers up-to-date information and scans the vast, unstructured world of data out there. Even a brilliant knowledge base will have gaps or outdated info. For anything current (“What’s the weather now?”, “Latest price of Bitcoin?”, or “Find me reviews of this new product”), the agent should turn to a search capability. In my view, an agent without search is like a scholar locked in a library that was last updated a year ago – knowledgeable but oblivious to new developments.
The search engine pillar involves a whole infrastructure stack: web crawlers to gather data, an indexer to organize it, ranking algorithms (often learning-based these days) to surface the best results, and increasingly, an integration with browsers or APIs to fetch the actual content behind links. This is an area where Google is king. It made me appreciate just how formidable Google’s infrastructure is; they have crawled and indexed hundreds of billions of pages (informationweek.com), and maintain that index in (near) real-time for news. An agent tapping into Google (or Bing) gains instant knowledge of the live internet that no static model can match. This is why even ChatGPT had to add a browsing plugin, and why Bing integrated GPT-4 into its search – combining LLMs with search is potent.
Google’s own evolution has been fascinating. They launched NotebookLM, an “AI notebook” that can incorporate search results and your documents to help answer questions (en.wikipedia.org). And then, with their Gemini model in late 2024, they introduced Deep Research (internally also called “DeepSearch” by some), which effectively uses Gemini to autonomously search the web and compile answers (theverge.com). As one Verge article put it, Google’s new AI could “scour the web on your behalf and write a detailed report” (theverge.com). This is precisely an agentic behavior: the AI decides what to search, clicks through results, and synthesizes an answer with sources. In tests, raters preferred Google’s Gemini 2.5 Deep Research outputs over other solutions 2-to-1 (blog.google) – a testament to Google’s strength in this pillar. They have the search infrastructure (the best index, arguably) and they are marrying it with LLMs effectively.
But it’s not just about web search. In a broader sense, this pillar is about an agent’s ability to query any external system for information: be it a web search, a database query, or an API call. For instance, an agent might use a search module to find relevant emails in your inbox when you ask “When did John confirm the meeting?”. That’s searching a personal index. Or it might search a graph database to reason about a knowledge graph, etc. The general principle is the agent isn’t stuck with what it internally knows; it can actively seek information.
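The plumbing for this is less exotic than it sounds: the model picks a retrieval tool and a query, the runtime executes it, and the result goes back into the context. A stubbed-out sketch of that dispatch pattern (the tool names and bodies here are placeholders, not any particular framework’s API):

```python
# Generic retrieval dispatch: the LLM names a tool and a query; the runtime
# runs it and feeds the result back. Tool bodies are stubs for illustration.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "web_search":   lambda q: f"[top web results for: {q}]",
    "email_search": lambda q: f"[inbox messages matching: {q}]",
    "db_query":     lambda q: f"[rows returned for: {q}]",
}

def retrieve(tool: str, query: str) -> str:
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    return TOOLS[tool](query)

# In practice the model itself emits something like
# ("email_search", "John meeting confirmation"), and the agent drafts its
# answer from whatever comes back.
print(retrieve("email_search", "John meeting confirmation"))
```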
One must note, search needs to be combined with verification. A smart agent should cross-check or quote sources (as Bing Chat started doing with citations). My experiments with hooking up an LLM to Google Search via API were eye-opening – the model could quickly get lost or spam queries without good planning. Google’s infrastructure advantage here isn’t just data, but also things like safe search filtering, and understanding of which sources are reliable (their ranking algorithm encodes a ton of human knowledge on quality). Even emerging tools like Perplexity.ai’s engine or OpenAI’s own “browse” mode use Google/Bing under the hood because building a good search index from scratch is a herculean task. This made me realize: any team aspiring to full agentic AI needs either a partnership with a search provider or to develop a specialized search for their domain.
To recap, the Search Engine pillar gives an agent currency and reach – access to the latest information and the ability to find needles in the haystack of data. An agent uses search not just to answer questions, but to gather evidence, discover new knowledge, and stay aligned with reality (reducing hallucinations by checking facts). This pillar, coupled with the knowledge base, forms the agent’s information diet – one static and structured, the other dynamic and broad.
3. Network – Social and Real-Time Context
The Network pillar is a bit more abstract, but I consider it the layer of real-time, context-rich interaction and social intelligence. What do I mean by this? Essentially, the agent’s connection to other entities – whether other agents, other people, or live data streams – and its ability to understand the subtleties of those interactions. This pillar encompasses things like social media feeds, chat networks, sensor networks (if you think IoT), or multi-agent communications. It’s about being plugged into the world’s current pulse and the messy realm of human context.
Why is this separate from the search engine? Because not all information is easily searchable or static. Some of it is the ongoing conversation. Think of Twitter (now X) – a firehose of real-time chatter; or Slack/Discord – where group conversations happen; or even the environment of a game or simulation with multiple agents interacting. To be truly agentic and helpful, an AI might need to tap into these streams. For instance, an agent that acts as your social media manager must read and react to tweets. An agent that’s a stock trading assistant might monitor market sentiment on news and Twitter. The Network pillar represents this capability.
From a technical standpoint, this is the hardest layer in my opinion. It’s one thing to fetch static info; it’s another to interpret context and dynamic human behavior. This is where social intelligence comes into play – understanding slang, trends, behavioral cues, even empathy. I recall reading that GPT-4 outperformed some humans on certain social reasoning tests (medium.com), which is impressive, but AIs still often fail spectacularly at understanding context or unspoken implications. Engineering social intelligence means delving into psychology and sociology. Human relationships, cultural nuances, timing, humor – these are complex. No surprise, many think this is the last frontier AI will conquer.
I saw this difficulty first-hand when experimenting with hooking an AI agent into a Slack channel for our team, as a kind of helper bot. The bot could answer questions fine (knowledge/search), but when people started joking or when two people gave it conflicting instructions, it got very confused. It lacked the nuance to navigate even a small team’s social dynamics. Similarly, consider personal assistant AIs: to schedule a meeting, the AI might need to negotiate a time between multiple people. That’s a social process (who is higher priority to accommodate? what implicit preferences does your boss have? etc.). Hard stuff!
However, it’s essential. Social and networked data provide context that pure documents can’t. It’s not just about being polite or empathetic (though that’s part of user experience), it’s also about understanding intent. A user’s tweet saying “Ugh, I’m so done with today” – should an agent interpret that as frustration and maybe proactively offer help or words of comfort? Possibly, but only a very attuned one could.
In terms of major players, this Network pillar is one where no single company has a clear lead, but some have unique assets. X (Twitter), if leveraged by Musk’s xAI for example, could feed an AI system with real-time social data (a rumored xAI feature, “DeepSearch,” was said to draw on Twitter’s data). Meta with Facebook/Instagram data, or WeChat in China, similarly have rich social graphs and content. But privacy and ethical issues loom large here – an AI that’s too plugged into personal networks could be creepy or manipulative. This might be why we haven’t seen as much direct integration of LLMs with personal social data yet (aside from some chatbots in Messenger/WhatsApp experiments).
It’s worth noting that this pillar also implies multi-agent systems – networks of AIs. Some researchers are letting LLMs talk to each other to solve problems, essentially forming a networked society of minds. There was that famous paper where AIs role-played villagers in a simulated town and produced very human-like social interactions. Those kinds of experiments show the potential, but also complexity: you might then need an agent-of-agents to monitor emergent behavior (so they don’t, say, conspire to do something unexpected!).
Ultimately, I labeled this pillar as the hardest to engineer because it combines real-time data overload with the deepest levels of human complexity (emotions, relationships, context). It’s an area that might require breakthroughs not just in AI algorithms but in understanding human behavior. For now, I think of it as the differentiator between a merely useful assistant and a truly compelling companion. The latter would need social intelligence, not just factual intelligence.
4. User Interface – UX & Interaction
Last but not least, the User Interface pillar – the point of contact between the human and the AI. Even the smartest agent is pointless if a user can’t effectively interact with it. This pillar is about UX design, modalities of interaction, and overall user experience. Early on, I took for granted that a chat box (like ChatGPT’s interface) was the pinnacle of AI UX. But as I explored more, I realized UI can make or break the usefulness of an agent.
Consider how different it feels to use voice (like talking to Siri/Alexa) versus typing to ChatGPT versus clicking buttons in a UI (like choosing from suggestions). Each modality has strengths and weaknesses. A voice agent is hands-free and natural in some contexts (driving, cooking), but terrible for complex outputs (it’s exhausting to listen to a long answer or impossible to display a chart via voice). A text chat is great for flexibility but can be inefficient for certain tasks (e.g., editing a large text document via chat commands is not ideal; a graphical UI is better). Therefore, an agentic system should ideally support multiple UI modes: chat, voice, visual display, perhaps even GUI manipulations.
OpenAI’s Canvas was something I first heard whispers about – an interface where the text or code you’re working on lives in an editable workspace beside the chat, and the AI can revise it in place rather than everything flowing through a linear transcript. I recall it being discussed in developer circles as a next step beyond the plain chat box. The idea of a canvas or whiteboard-style interaction with an AI (where the AI can draw diagrams or you can organize thoughts spatially) is super exciting. It could be great for brainstorming or design, for example.
Anthropic’s Artifacts points in a similar direction – when Claude produces something substantial (a block of code, a document, a diagram), it appears in a dedicated panel beside the conversation where you can view it and iterate on it, rather than scrolling past as chat text. That indicates a focus on making the AI’s output a tangible, inspectable object. It aligns with the notion that users might trust AI more if they can see its work, like an interactive notebook.
Then there’s Apple – which, despite being relatively quiet on generative AI until 2024, has a massive UI advantage. Apple’s whole ethos is great user interface. With rumors of them working on on-device LLMs and a unified Apple Intelligence system announced in iOS 18 (apple.com), I thought Apple could lead on intuitive AI UX. They introduced features like system-wide writing assistance and an on-device personal context engine (apple.com). Apple touted privacy (doing a lot on device) and seamlessly blending AI into apps, “to help users do the things that matter most,” as Tim Cook said (apple.com). However, as of now, Apple hasn’t delivered a front-and-center AI assistant beyond the familiar (and limited) Siri. Siri did get an AI makeover with some generative capabilities (techcrunch.com), but Apple’s been cautious. There’s potential: imagine an LLM-powered personal agent on the iPhone that can see through your camera, speak in Siri’s voice, and manipulate your apps – an agent that could, say, compose a message, book a calendar event, or create a Pages document by conversation and touch. Apple could revolutionize UI there, but whether they will is the question. So far, it seems they are incrementally adding AI rather than launching a bold new agent interface.
In the startup realm, xAI (Elon Musk’s AI venture) was reportedly building something called “DeepSearch”, or perhaps a new kind of AI-centric social network. If they combine it with Twitter/X, we might see novel UI paradigms – like an AI that lives in your social app, or a feed that is a conversation with an AI about the world’s news (a bit like Character.AI bots, but for news commentary? Pure speculation here).
From what I’ve seen, the UX imperative cannot be overstated. Many users bounce off AI tools if they’re too hard to use or don’t integrate into their flow. For instance, developers loved GitHub Copilot because it integrated into VS Code quietly – no extra UI, it just completed code as you typed (genius design for that context). On the flip side, I’ve seen potentially great AI tools falter because the interface was clunky (too many steps, or you had to copy-paste back and forth between the AI and your work). As we move forward, I suspect we’ll see a lot of innovation in UIs: things like AI copilots that sit alongside your applications (the way Microsoft is embedding Copilot across Office apps), mixed reality interfaces (talk to an agent that appears as an avatar in AR glasses?), or even physical embodiments (home assistant robots), which then open a whole other can of worms in UI design.
To conclude this section: a good agentic system is supported by all four pillars – a strong knowledge base, a powerful search engine, networked social/context awareness, and a user-friendly interface. You really need all of them to deliver a magical experience. As I assessed different players: Google, for example, has Knowledge (their index + Gmail/Docs for personal), Search (obviously), some Network (YouTube, maybe limited social though they failed at Google+), and decent UI experiments (Assistant, and now Gemini’s interfaces). Microsoft has Knowledge (via GitHub, and LinkedIn data too), Search (Bing, not as strong as Google but decent, plus they now integrate GPT into it), Network (LinkedIn could be an asset; they have Xbox Live too for another kind of network; and in enterprise, Teams data), and UI (Windows, Office – very strong existing UIs to integrate AI into). OpenAI had the model and knowledge (training data) but no native search or network or established UI platform – they’re trying to build those from scratch or through partnerships. These comparisons got me thinking about who is poised to lead or lag in the coming years of AI agent development. Which brings me to some predictions and opinions on the major AI players, based on what I’ve observed up to 2025.
The Road Ahead: Major Players and Predictions
Surveying the AI landscape as a curious developer and observer, I can’t help but be excited by some players and concerned for others. Here’s my personal take on who’s doing what right (and wrong), framed by the insights I’ve gathered on memory, agentic system pillars, and industry moves. These are, of course, just my opinions – but I’ll back them with reasoning and the trends I’ve witnessed.
Google’s Renaissance – Gemini 2.5 Pro and the Deep Search Turnaround: Just a couple of years ago, the narrative was that Google was on the back foot – surprised by OpenAI’s ChatGPT, scrambling with Bard, which initially underwhelmed. But wow, how they turned things around by late 2024 and 2025. Google’s Gemini 2.5 Pro (their latest large model as of now) is an absolute beast. By integrating it deeply with their search and productivity ecosystem, Google managed a one-two punch: they regained the quality crown in model capability and leveraged their existing strengths. The introduction of Deep Research (aptly nicknamed Deep Search by many) was a game-changer. It signaled Google’s commitment to agentic AI – letting their AI actually perform tasks like a research assistant. As we discussed, raters prefer its output to others by a large margin (blog.google). And features like generating reports with sources, or NotebookLM’s ability to crawl added URLs (reddit.com), show Google melding search with reasoning impressively.
I openly praise Google’s strategy here. They played to their pillars: Search (best in class), Knowledge (they have tons of data and knowledge graphs), UI (they built sensible interfaces like an improved chat in Search, and NotebookLM which is quite user-friendly for document analysis). The result is that Google feels relevant and perhaps leading again in AI. Anecdotally, I’ve seen previously skeptical peers start using Google’s AI features and saying they sometimes prefer it to ChatGPT, especially for research-heavy queries. It’s not perfect, but Google’s also deploying it widely (Android phones have gotten AI features, and there’s talk of Gemini Assistant across all Google services). In a way, Google is making AI a pervasive layer rather than a destination. That’s smart for user retention – instead of trying to lure people to a separate chatbot site, they inject AI into the familiar tools people already use (search, docs, etc.). This frictionless approach, combined with their technical leaps, has truly been a turnaround from the disarray post-Bard launch.
Anthropic’s Uncertain Path – Claude and a Possible Amazon Acquisition: I have a soft spot for Anthropic – they are like the earnest research lab turned startup, always talking about AI safety and constitutional AI, and their model Claude is quite good (especially v2, which excels at lengthy, thoughtful responses). However, in the race of giants, I fear Anthropic is in a weak position. They simply don’t have a distribution channel to match a ChatGPT or a Google. They partnered with Amazon (which invested ~$4B in them), making AWS their primary cloud and promising tight integration. This partnership hints at what might come: I wouldn’t be surprised if Amazon eventually acquires Anthropic outright. In fact, recent moves show Amazon increasing its ownership stake – converting investment notes into equity (geekwire.com) – which is often a prelude to a fuller merger. Amazon needs a flagship model, Anthropic needs a platform; it’s kind of a match made by necessity.
From a strategic view, Anthropic aligning with Amazon could bolster AWS’s AI offerings (Claude as an AWS service to rival Azure’s OpenAI service). But I also think it’s a bit sad, because it means one less independent player. Anthropic, despite a $61B valuation in funding rounds (cnbc.com), may not remain truly independent if they can’t find a massive user base or revenue stream on their own. Claude’s positioning has also been a bit confusing – it’s pitched as safer and more steerable, which is great, but they lacked a broad consumer product. There’s the Claude API, a Claude Pro subscription, and a Slack app, but no “Claude chatbot app” that went viral with millions of users. So their brand recognition among non-AI folks is low.
Given this, I predict Anthropic gets absorbed by Amazon within the next year or two. Amazon will integrate Claude into Alexa (reports already suggest they plan to do so (geekwire.com)) and into AWS offerings. This could actually breathe new life into Alexa by making it much smarter and more conversationally fluid. But it also means Anthropic’s more idealistic mission might get subsumed under Amazon’s more utilitarian goals (selling more Echo devices, keeping developers on AWS). We’ll have to see. Perhaps Anthropic will surprise by rapidly improving Claude to leapfrog others, but as a small(ish) player up against giants, the safer bet for them might be an acquisition exit.
AWS and the Missing Pillars – Why Amazon is Falling Behind: Extending the above, let’s talk about Amazon/AWS more broadly. It’s ironic – Amazon was a pioneer in AI assistants with Alexa, and AWS is huge in providing AI infrastructure – yet, Amazon themselves don’t strongly own any of the four pillars I identified. They don’t have a broad knowledge base or consumer data repository (they have shopping data for sure, but not general world knowledge like Google). They have no mainstream search engine (Alexa’s “search” is basically Bing under the hood for web queries, I believe). Their network/social presence is virtually nil (Twitch maybe, but that’s niche; they shuttered some social attempts). And user interface… well, Alexa is their UI, but it stagnated for years, and they don’t have a smartphone or OS platform beyond Echo speakers and Fire TV (which have limited reach).
Because of this, I see AWS falling behind in the AI platform race despite being a leader in cloud. AWS’s bet has been to be the neutral infrastructure, hosting others’ models (they offer HuggingFace, Anthropic, Stability AI models on AWS). That’s fine for serving enterprises, but it doesn’t create a consumer-facing agent of their own. In a world moving towards agentic AI, Amazon risks being just the invisible backend, while Apple, Google, OpenAI/Microsoft take the user mindshare with agents and assistants.
One example: Amazon’s own employees reportedly lamented Alexa’s decline, calling it a “colossal failure” in terms of missed potential (remember those news pieces about Amazon cutting a lot of the Alexa team around 2022?). They have since tried to revamp Alexa with a new LLM (possibly Claude). But I suspect it might be too little too late to capture developer excitement or consumer imagination. Alexa works for turning on lights and playing music, but people don’t see it as a general AI assistant in the way they see ChatGPT or even Siri now with new updates.
Without ownership of foundational pillars, Amazon will have to partner aggressively (like with Anthropic) and even then they might just be catching up. Another gap: user data. Amazon doesn’t have the kind of data Google or Meta have on user behavior (aside from shopping habits). Data is fuel for model training and personalization. Amazon’s strength is ecommerce and cloud – how do those translate to AI agents? Possibly shopping assistants (yes, I can imagine a great agent that knows your Amazon orders and helps plan purchases or handle support issues). But again, that’s a narrower use-case compared to something like a full personal assistant.
In my frank opinion, AWS’s AI strategy feels half-hearted. They talk about being the toolkit (SageMaker, Bedrock, etc.), but I don’t see them leading on innovation, more like following what’s proven. Unless they fully integrate Anthropic and really pour investment to create an “AWS Agent” that ties into business workflows (they could try something with Amazon’s office suites or Slack competitor Chime – but those aren’t strong offerings now), they’ll be more of a platform than a player. And platforms can be profitable, but they don’t get the buzz or necessarily the long-term differentiation once commoditized.
OpenAI’s Next Moves – Tooling, Stickiness, and Maybe Going Social: OpenAI, the prodigy that set this all in motion, now finds itself in an arms race with the big boys. ChatGPT was a revelation – it amassed 100 million users faster than any app in history at the time. But that initial explosion has tempered. Traffic data showed ChatGPT’s usage plateauing and even dipping after the peak in early 2023 (thewrap.com). Some of that was attributed to seasonality or competition, but it highlighted a key issue: user stickiness. Many people tried ChatGPT, were amazed, but then what? Did it integrate into their daily routine or work? For some yes, but for many, it remained an occasional tool, not a must-have.
OpenAI seems acutely aware of this and has been on a tooling blitz to increase engagement and “lock-in” users. They launched the ChatGPT plugin store, letting the AI use third-party tools from browsing to food ordering. They released an API so developers embed ChatGPT in their apps (indirect stickiness). They added features like custom instructions (to personalize responses a bit). And notably, they introduced ChatGPT “GPTs” – basically letting users create mini-agents with custom instructions or personas and share them. This last one is interesting: it could evolve into a kind of social network of AI bots, where people create and follow bots (e.g., someone publishes a “Travel Advisor GPT” and others use it). I suspect OpenAI might lean into this, effectively making a community or marketplace around ChatGPT. That starts looking like a network effect, which increases stickiness – you use ChatGPT not just to ask questions, but to find useful GPTs others made, or even to see what others are querying (imagine a feed of interesting prompts or AI-generated content – that’s basically a social network of AI usage).
There are even whispers (some from Altman’s talks) that OpenAI might consider some sort of “LinkedIn of AI” or a professional network where AI helps connect people, or a platform where your AI agent interacts with others’. They haven’t done that yet, but I wouldn’t be surprised if an OpenAI-branded consumer app beyond ChatGPT emerges, something that tries to capture more of users’ time and data. It could be an AI-centric knowledge network (sharing prompts, tips), or something like a meta-app that orchestrates other apps for you (like an AI layer on your phone that you’d stick with for convenience).
Another tactic: acquiring complementary tech. There have been persistent reports of OpenAI looking to acquire Windsurf (the AI code editor). That could make sense – GitHub is with Microsoft, Replit is independent (and a partner of Google’s per some announcements), Windsurf is up-and-coming. If OpenAI grabbed an IDE like Windsurf, they could incorporate GPT-4 (or GPT-5 in future) deeply into a coding environment they own, giving them a platform to rival VS Code/Copilot. OpenAI has also made investments and small acquisitions in adjacent tool startups, so they are thinking beyond just chat. I can see them making targeted acquisitions to fill gaps: perhaps an AI note-taking app (to integrate GPT into the personal knowledge workflow, rivaling NotebookLM), or an agent framework in the vein of AutoGPT to make multi-step processes easier for users.
The reason behind all this: OpenAI knows that currently AI users are not very sticky. People can swap between assistants easily (try Bard today, ChatGPT tomorrow, etc.), especially since a lot of usage is one-off queries (like search). There isn’t yet the equivalent of a “social graph” or personal investment that locks someone in. Compare it to, say, Gmail – you won’t switch email providers daily because your history and contacts are there. But you might switch AI tools on a whim if one gives better answers, since your data isn’t strongly tied up in any single one (not yet, anyway). This is why personalization is minimal so far – and that’s an opportunity. The first AI assistant that really knows you (with your permission) – your preferences, your style, your context – will become much stickier. OpenAI likely wants to be that, before the likes of Apple or Google swoop in with device-level personalization. So I predict OpenAI will push features that let ChatGPT remember more about you (safely and optionally) and integrate with things you care about (your calendar, your tasks, etc. – they have plugins for some of this already). They want to graduate from “stochastic parrot” that resets every new chat to a continuous assistant in your life. That’s how they keep you around.
In fact, Sam Altman once said (paraphrasing) that he sees AI as a new platform beyond mobile – whoever controls the AI assistant platform could be as powerful as Apple or Google are in mobile. OpenAI probably harbors ambitions to be that platform (with Microsoft’s backing). Whether they succeed against the platform giants is a big open question. It might require them to do things more like a consumer tech company (build apps, handle lots of user data carefully, respond to feature requests quickly) – areas where Google and Apple have way more experience. Nevertheless, I wouldn’t count them out. They are moving fast and have the brand name for AI right now.
Lastly, I want to touch on one player not explicitly asked about but looming: Meta (Facebook). Meta open-sourced Llama, which democratized model access, and they have massive social data. They’re a wildcard. If Meta decided to really bridge their social networks with LLM smarts (beyond just chatbots in Messenger), they could activate a huge user base. Imagine an AI that helps you write Facebook posts or an AI friend in WhatsApp. They’ve trialed some novelty chatbots with personas (like Snoop Dogg as a Dungeon Master AI in Messenger), but nothing earth-shattering yet. However, Meta’s openness (releasing models free) might earn them goodwill and an ecosystem of developers building on their models. In a way, Meta and OpenAI are opposites culturally (open vs closed). It’ll be interesting which approach leads to more stickiness – open models integrated everywhere (but maybe white-labeled, not Meta-branded) or closed but user-facing products.
In summary, I foresee Google continuing a strong push, possibly regaining a lead with Gemini and integrated deep search features; OpenAI expanding horizontally (tools, features, maybe vertical integrations) to maintain its lead and become stickier; Anthropic likely folding into Amazon to strengthen AWS (but not upsetting the balance too much at the consumer level immediately); Amazon playing catch-up, maybe surprising us if they launch a compelling Alexa 2.0 with Claude, but otherwise focusing on enterprise. Meanwhile, users will benefit from all this competition – but they might also feel overwhelmed. It’s reminiscent of the early smartphone wars: lots of players and approaches, but eventually a few ecosystems emerged. I suspect in a few years, we’ll have a clearer winner of the “personal AI agent” battle, and it may be the one that best combines technology, data, and user-centric design (those darn pillars again!).
Looking Ahead: My Journey Forward (and the Quest for a Better Memory)
Writing this narrative, from my first stumbling attempts at a stateful chatbot to examining the strategies of AI titans, has been a journey in itself. I feel like the curious young programmer I was a couple years ago has gained not just technical know-how, but a deeper appreciation for the interplay between technology and people. AI isn’t just code and models; it’s about how humans use it, how we embed our knowledge and ourselves into it, and what future that creates. It’s both thrilling and daunting.
On a personal note, I’m more excited than ever to contribute to this future. The shortcomings I encountered – especially in AI memory – have become the seeds of my own project. In spare hours, I’ve been prototyping a novel memory engine for AI agents. Drawing inspiration from human memory theories and my trials with LangChain and vector stores, I’m implementing a system with three layers of memory:
- Short-Term Memory (STM): This is like the immediate context window – it holds the latest conversation or task information, akin to what current chatbots already handle. I’m making it chunk-based, so it can dynamically choose what recent pieces are relevant to keep.
- Contextual Memory: I define this as the intermediate layer – it’s situation-specific memory that might last across a session or a particular context. For example, if the agent is helping plan a vacation, the contextual memory holds trip preferences, budget, airline choices, etc., throughout that session. Technically, I’m using a local vector database that updates as the session progresses, along with some symbolic tags (like “user_prefers = window seat”). It’s like the agent’s working notes that persist during the context.
- Long-Term Memory (LTM): This is the enduring knowledge base of the agent – accumulated facts about the user, the world, past dialogs that were important, outcomes of previous tasks. I’m experimenting with a hybrid of a knowledge graph and an episodic memory store (summaries of past interactions). The idea is to periodically distill the contextual memory from a session into a concise record and save it here, so the agent can recall “last month I helped you with a similar issue” or retain improvement feedback over time. This also includes plugging in static knowledge bases (so the line blurs between long-term “learned” knowledge and stored facts).
The real challenge is getting these layers to work together seamlessly – when a new query comes in, the agent should 1) recall relevant long-term info (from LTM), 2) bring in the important contextual memories if it’s a continuing topic, and 3) process everything with the fresh short-term input. It’s an architecture that I hope can make an AI feel more coherent and personalized over time, without running into the token limits so quickly.
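A deliberately simplified, runnable sketch of that loop is below. In the actual prototype, recall uses embeddings plus a knowledge graph; here recall is naive keyword overlap, and the class and method names are from my experiment, not any published library – it only shows how the three layers cooperate on each turn:

```python
# Three-layer memory in miniature: STM (recent turns), contextual (session
# notes), LTM (distilled records from past sessions). Recall here is naive
# keyword overlap, purely to illustrate the flow.
class MemoryEngine:
    def __init__(self):
        self.stm: list[str] = []              # short-term: recent turns, verbatim
        self.contextual: dict[str, str] = {}  # session-scoped working notes
        self.ltm: list[str] = []              # long-term: distilled session records

    def build_context(self, query: str) -> list[str]:
        words = set(query.lower().split())
        long_term = [f for f in self.ltm if words & set(f.lower().split())]
        session = [f"{k}: {v}" for k, v in self.contextual.items()]
        return long_term + session + self.stm[-6:] + [f"User: {query}"]

    def commit_turn(self, query: str, answer: str, notes: dict | None = None):
        self.stm += [f"User: {query}", f"Agent: {answer}"]
        self.contextual.update(notes or {})   # e.g. tag a stated preference

    def end_session(self):
        # Distill the session notes into a compact record for long-term recall.
        if self.contextual:
            self.ltm.append("Previous session: " +
                            "; ".join(f"{k}={v}" for k, v in self.contextual.items()))
        self.stm.clear()
        self.contextual.clear()

engine = MemoryEngine()
engine.commit_turn("I need a window seat on flights.", "Noted.",
                   notes={"user_prefers": "window seat"})
engine.end_session()
print(engine.build_context("What seat should I book for the Berlin flight?"))
```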
I’m excited to report some early wins: my prototype agent, when given a multi-turn technical troubleshooting task, was able to recall a user’s specific software environment from an earlier conversation (thanks to LTM) and also remembered to check a detail the user had emphasized at the start (stored in contextual memory) even 10 turns later when it became relevant again. It felt almost like it had attention in the human sense – not in the transformer sense of the word, but actually paying attention to what matters across time. There’s much to refine – managing memory growth, forgetting irrelevant stuff automatically, ensuring privacy and user control over what it remembers – but I see a glimmer of what could be a more persistent AI companion.
Throughout this journey, one thing that has kept me going is an innate sense of curiosity and optimism. The AI field moves at breakneck speed, yes, and it can be overwhelming. But it’s also an open playground for innovation. I often remind myself of how far things have come in just a short time: it’s 2025 and I can talk to an AI about my day like it’s a friend, or have it write code with me as if pair programming. Limitations aside, that was sci-fi not long ago. Now, with new memory systems, better alignment, and more integration, who knows what the next few years will bring? Perhaps I’ll have an AI mentor as well as an AI intern; perhaps agents will collaborate with us in art, science, and every discipline, unlocking creativity and productivity in ways we can’t yet imagine.
As a young developer in this whirlwind, I feel like an early traveler in a vast, unexplored land – each experiment, each project is like charting a new map. There will be pitfalls (and I’ve fallen into many!), there will be wrong turns (I’ve taken a few), but every step teaches something. And as I continue my journey, I carry with me the lessons from building state machines out of LLMs, the insights from critiquing cutting-edge tools, and the inspiration of a future where AI agents truly augment human potential.
I’m excited for that future. I’m working to shape it in my own small way, one memory improvement at a time, one agent experiment at a time. And I’m grateful to be a part of this grand adventure. After all, it’s not every day you get to watch (and help) the dawn of a new era in technology. With wide-eyed optimism and hard-earned pragmatism, I step forward – ready to build, learn, and collaborate in forging the next generation of AI. The journey continues, and I, the curious explorer, can’t wait to see what’s next.