Our startup has been building agents inside of a workspace for the past few months. We had our first version about nine months ago, and the UI looks almost identical to OpenAI's.
That said, it's going to be very interesting to see if one AI Agent platform becomes the standard, or if people end up having different AIs for different platforms.
On our end at least, we are planning to keep our agent builder, even if we end up using OpenAI on the backend. At the very least, we will maintain the agency to use another platform if we want.
So I guess I wasn't the only one who thought the next big thing was a service for building agents. Turns out, OpenAI has been working on almost the same concept.
Does anyone want to buy agentnexus.space?
Just kidding. I still think I need to work on agents that build agents. Even if something similar is built into ChatGPT. I think.
I’ve found the OpenAI Assistants API not really up to snuff in terms of predictable behavior yet.
That said, I’m very bullish on agents overall and expect that once they get their assistants behaving a bit more predictably we will see some cool stuff.
It’s really quite magical to see one of these think through how to solve a problem, use custom tools that you implement to help solve it, and come to a solution.
I'd agree that what is publicly available is nothing that hasn't been widely discussed as an agent framework since around March/April of this year, and many people had already hacked together their own version with an agent/RAG pipeline and an API to hide their requests behind.
I'm very sure anything revolutionary would have been more of a leap than deeply integrating an agent/RAG pipeline into the OpenAI API. They have the compute...
Introducing Open Agent Studio
We were the first startup approved by OpenAI to sell GPT-3 for automation in August 2021. Two years ago, we started working on solving fundamental roadblocks for general agents that break all other RPA tools today. It finally works--we've published a new multi-modal model that enables building future-proof agents that are robust even to design changes.
Agentic Process Automation (APA)
We introduce powerful new RPA concepts like "Semantic Targets" in simple language that are more robust and easier to use than the previous generation of brittle code selectors.
Agent Recorder
Records clicks, mouse movement, and keypresses to rebuild the automation graph for you with accurate semantic targets in English, making it as easy as possible to build agents and edit them in simple language.
Live Agents
Automate common processes and trigger them from visual cues across your business with agents that intelligently suggest automations from your library based on the context of the screen.
Prompt To No-Code Graph
Type open-ended automations as prompts to generate custom no-code graphs. Our framework can install any open source Python library, so you can ask for wild custom automations like generating movie trailers. Our generalized agent state machine can perform arbitrary tasks like booking a flight from Boston to LA or buying products on Amazon, and it even takes a different path each time since it considers each step using GPT-4 Vision.
Target Markets Untouched By AI
Open Agent Studio is not just another co-pilot--it's a no-code co-pilot builder that enables solutions that are impossible in all other RPA tools today. Our customers have a head start over the next few months to target markets previously untouched by AI with their deep industry insight. Subscribers have access to a free four-week course, which teaches how to evaluate product ideas and launch a custom agent with an enterprise-grade white label.
Key technical breakthroughs
Semantic targets are a 100% replacement for previous targeting strategies, solving a problem that frequently breaks all other RPA tools today when services update their designs.
Our own multi-modal model, Atlas-2, outperforms all public UI models today, using a mostly synthetic dataset we generated to get our accuracy from 95-100% for detecting UI elements.
Websocket server integration with the browser for advanced automations, including scraping links using semantic English targets.
Ships with multiple unlimited, free open models like Llama 2 and Mistral 7B, as well as top-performing models like GPT-4 Vision and Claude 3 Opus.
We're excited to see what will be built and appreciate your feedback on how to make it even better.
We've been exploring the space of AI agents and as part of our research we've made a list of both open source and closed source AI agents.
We've tried to include mostly just agents, but the lines sometimes get blurry, as it's sometimes hard to distinguish agents from SDKs.
At the moment, a lot of these agents are still in the "toys" stage, but we think the future looks very promising, especially since some of these agents are already pretty helpful.
Curious to hear what your experience with agents has been.
We're very aware that we'll need great agents to be able to compete with Devin and others. We're currently setting up evaluation pipelines to evaluate various agents against SWE-bench.
Our thesis is that a community experimenting with various agents and agent architectures will outpace a private company on a single track. We're building the notion of an "agent hub" out of the gates--anyone can plug into the Agent interface and contribute their work. We're also discussing how to build a meta-agent, which farms out specific tasks to sub-agents.
It's early days though--we've only just gotten things wired together in a sort-of working demo. Stay tuned!
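For readers wondering what a meta-agent farming tasks out to sub-agents might look like, here is a minimal sketch in Python. Everything in it is hypothetical: the `Agent` interface, the `can_handle`/`run` method names, and the keyword-based routing are illustrative assumptions, not the project's actual design. A real meta-agent would more likely ask a model to pick the sub-agent.

```python
from typing import Protocol


class Agent(Protocol):
    """Hypothetical plug-in interface; a real agent hub's interface may differ."""

    name: str

    def can_handle(self, task: str) -> bool: ...

    def run(self, task: str) -> str: ...


class KeywordAgent:
    """Toy sub-agent that claims any task mentioning one of its keywords."""

    def __init__(self, name, keywords):
        self.name = name
        self.keywords = keywords

    def can_handle(self, task):
        return any(word in task.lower() for word in self.keywords)

    def run(self, task):
        # A real sub-agent would call a model or tools here.
        return f"[{self.name}] handled: {task}"


class MetaAgent:
    """Routes each task to the first sub-agent that claims it."""

    def __init__(self, sub_agents):
        self.sub_agents = list(sub_agents)

    def run(self, tasks):
        results = []
        for task in tasks:
            agent = next((a for a in self.sub_agents if a.can_handle(task)), None)
            results.append(agent.run(task) if agent else f"[unassigned] {task}")
        return results
```

Usage would be something like `MetaAgent([KeywordAgent("tester", ["test"]), KeywordAgent("coder", ["fix"])]).run(["Fix the off-by-one bug"])`. Because sub-agents only need to satisfy the interface, anyone could contribute one without touching the router.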
Yes, this will be an interesting next experiment - adding agents with additional tools (also for example access to internal APIs) will be quite powerful.
Multi-agent setups with LLMs (AutoGPT, for example) were hyped some weeks ago. And OpenAI, with their specialized Bots(?) or whatever they introduced recently, is going in the same direction.
Last week, The Information reported that OpenAI is developing an AI agent to “automate complex tasks by effectively taking over a customer’s device”.
It fits into a broader trend, similar to what the Rabbit R1 is doing, which is teaching computers how to use our devices for us: perform cursor movements, click buttons, etc. Now, you might think that is a rather bad idea, and it probably is.
That said, the emergence of AI agents, powered by generative AI, represents a new generation of assistants that are more flexible, can do stuff for you, and in the near future may even act on your behalf.
I am slowly coming around to the same conclusion. It isn't always clear how some agent types are different from others. Sometimes the prompts expect JSON blobs and sometimes they expect something else. I tried it out because I could see the potential, but I don't think it's architected in a way that is suitable for things beyond simple PoCs.
It would probably be much better to start with the basic OpenAI API and then build on top of it.
What I find particularly frustrating is how hard it is to interface with my existing Python tools (not toy examples like adding two numbers, but somewhat complex analytics on top of structured data). If anybody has had any success interfacing with existing tools/scripts, I would love to know how you are going about it.
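One framework-free way to approach this, sketched below under some assumptions: wrap each existing function in a small registry, derive a JSON-schema-style description from its signature, and dispatch whatever call the model asks for. The `tool` decorator, the `TOOLS` registry, `dispatch`, and the `mean_by_group` function are all hypothetical names made up for illustration. The spec layout (name, description, parameters) resembles the function descriptions that tool-calling APIs commonly accept, but check your provider's docs for the exact shape. No model call is made here; `dispatch` just takes the JSON a model would return.

```python
import inspect
import json

# Hypothetical registry: tool name -> (function, spec shown to the model).
TOOLS = {}

_JSON_TYPES = {int: "integer", float: "number", str: "string",
               bool: "boolean", list: "array", dict: "object"}


def tool(fn):
    """Register an existing function and derive a schema from its signature."""
    props, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unknown types fall back to "string".
        props[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    spec = {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": props, "required": required},
    }
    TOOLS[fn.__name__] = (fn, spec)
    return fn


@tool
def mean_by_group(rows: list, group_key: str, value_key: str) -> dict:
    """Average value_key per group_key over a list of row dicts."""
    totals = {}
    for row in rows:
        total, count = totals.get(row[group_key], (0.0, 0))
        totals[row[group_key]] = (total + row[value_key], count + 1)
    return {group: total / count for group, (total, count) in totals.items()}


def dispatch(tool_call_json: str) -> str:
    """Run a model-requested call shaped like {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    fn, _spec = TOOLS[call["name"]]
    return json.dumps(fn(**call["arguments"]))
```

The existing analytics code stays untouched apart from the decorator; the specs in `TOOLS` are what you would send to the model, and `dispatch` is what you would call with its reply.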
“OpenAI do all the work” is a very wrong claim in this case.
It’s true that most of the complexity is solved by using an LLM, but that’s not everything. There is still a good amount of work to be done if you want to build an agent (or even an AI wrapper, if that’s what you’re implying).
The idea of making agents like CrewAI easier to quickly prototype is great.
It’s unclear though from the landing page how this is different from Make. All of the examples are things that could be done using the OpenAI node in Make / Zapier / n8n.