The sleeping giant has awoken!
For a while, it seemed like Amazon was playing catch-up in the race to offer its users (particularly the millions of developers building atop Amazon Web Services’ cloud infrastructure) compelling first-party AI models and tools.
But in late 2024, it debuted its own internal foundation model family, Amazon Nova, with text, image and even video generation capabilities, and last month saw a new Amazon Alexa voice assistant powered in part by Anthropic’s Claude family of models.
Then, on Monday, the e-commerce and cloud giant’s artificial general intelligence division, Amazon AGI, announced the release of Amazon Nova Act, an experimental developer kit for building AI agents that can navigate the web and complete tasks autonomously, powered by a custom, proprietary version of Amazon’s Nova large language model (LLM). Oh, and the software development kit (SDK) is open source under a permissive Apache 2.0 license, though the SDK is designed to work only with Amazon’s in-house custom Nova model, not any third-party ones.
The goal is to enable third-party developers to build AI agents capable of reliably performing tasks within web browsers.
But how does Amazon’s Nova Act stack up against other agent-building platforms on the market, such as Microsoft’s AutoGen, Salesforce’s Agentforce, and of course, OpenAI’s recently released open source Agents SDK?
A different, more thoughtful approach to AI agents
Since the public rise of large language models (LLMs), most “agent” systems have been limited to responding in natural language or providing information by querying knowledge bases.
Nova Act is part of the larger industry shift toward action-based agents: systems that can complete actual tasks across digital environments on behalf of the user. OpenAI’s new Responses API, which gives users access to its autonomous browser navigator, is one leading example, and developers can integrate it into AI agents through the OpenAI Agents SDK.
Amazon AGI emphasizes that current agent systems, while promising, struggle with reliability and often require human supervision, especially when handling multi-step or complex workflows.
Nova Act is specifically designed to address these limitations by providing a set of atomic, prescriptive commands that can be chained together into reliable workflows.
Deniz Birlikci, a Member of Technical Staff at Amazon, described the broader vision in a video introducing Nova Act: soon, there will be more AI agents than people browsing the web, carrying out tasks on behalf of users.
David Luan, VP of Amazon’s Autonomy Team and Head of AGI SF Lab, framed the mission more directly in a recent video call interview with VentureBeat: “We’ve created this new experimental AI model that is trained to perform actions in a web browser. Fundamentally, we think that agents are the building block of computing,” he said.
Luan, formerly a co-founder and CEO of Adept AI, joined Amazon in 2024 as part of an acqui-hire. Luan said he has long been a proponent of AI agents. “With Adept, we were the first company to really start working on AI agents. At this point, everybody knows how important agents are. It was pretty cool to be a bit ahead of our time,” he added.
What Nova Act offers devs
The Nova Act SDK provides developers with a framework for constructing web-based automation agents using natural language prompts broken down into clear, manageable steps.
Unlike typical LLM-powered agents that attempt entire workflows from a single prompt, often resulting in unreliable behavior, Nova Act is designed to incrementally execute smaller, verifiable tasks.
Some of the key features of Nova Act include:
- Fine-Grained Task Decomposition: Developers can break down complex digital workflows into smaller act() calls, each guiding the agent to perform specific UI interactions.
- Direct Browser Manipulation via Playwright: Nova Act integrates with Playwright, an open-source browser automation framework developed by Microsoft. Playwright allows developers to control web browsers programmatically (clicking elements, filling forms, or navigating pages) without relying solely on AI predictions. This integration is particularly useful for handling sensitive tasks such as entering passwords or credit card details. For example, instead of sending sensitive information to the model, developers can instruct Nova Act to focus on a password field and then use Playwright APIs to securely enter the password without the model ever “seeing” it. This approach helps strengthen security and privacy when automating web interactions.
- Python Integration: The SDK allows developers to interleave Python code with Nova Act commands, including standard Python tools such as breakpoints, assertions, or thread pooling for parallel execution.
- Structured Data Extraction: The SDK supports structured data extraction through Pydantic schemas, allowing agents to convert screen content into structured formats.
- Parallelization and Scheduling: Developers can run multiple Nova Act instances concurrently and schedule automated workflows without the need for continuous human oversight.
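To make the pattern behind these features concrete, here is a minimal Python sketch of step-by-step decomposition, keeping a secret out of the prompts, and thread-pooled parallel runs. It does not use the Nova Act SDK: `StubAgent`, `type_secret`, `log_in_and_search` and `run_parallel` are hypothetical names invented for illustration, and the stub only records the prompts it is given instead of driving a browser.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class StubAgent:
    """Hypothetical stand-in for a browser agent; NOT the Nova Act SDK."""
    start_url: str
    log: list = field(default_factory=list)

    def act(self, prompt: str) -> bool:
        # A real agent would drive a browser here; the stub only records the step.
        self.log.append(prompt)
        return True

    def type_secret(self, secret: str) -> None:
        # Stands in for direct browser automation (for example via Playwright):
        # the secret goes straight to the page and never appears in a prompt.
        self.log.append("<secret typed directly into the page>")


def log_in_and_search(site: str, query: str, password: str) -> list:
    agent = StubAgent(start_url=site)
    # Each step is small and is checked before the next one runs.
    assert agent.act("click the sign-in button")
    assert agent.act("focus the password field")
    agent.type_secret(password)
    assert agent.act(f"search for {query}")
    return agent.log


def run_parallel(sites: list, query: str, password: str) -> list:
    # One independent agent per site, executed on a thread pool.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(log_in_and_search, s, query, password) for s in sites]
        return [f.result() for f in futures]


if __name__ == "__main__":
    for steps in run_parallel(["https://a.example", "https://b.example"], "blue shoes", "s3cret"):
        print(steps)
```

The point of the sketch is the shape of the workflow: short prompts that can each be verified, a sensitive value that bypasses the model entirely, and ordinary Python concurrency wrapped around the agent calls.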
Luan emphasized that Nova Act is a tool for developers rather than a general-purpose chatbot. “Nova Act is built for developers. It’s not a chatbot you talk to for fun. It’s designed to let developers start building useful products,” he said.
For example, one of the sample workflows demonstrated in Amazon’s documentation shows how Nova Act can automate apartment searches by scraping rental listings and calculating biking distance to train stations, then sorting the results in a structured table.
Another showcased example uses Nova Act to order a specific salad from Sweetgreen every Tuesday, entirely hands-free and on a schedule, illustrating how developers can automate repeatable digital tasks in a way that feels reliable and customizable.
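The post-processing half of the apartment-search workflow can be sketched in plain Python. This is an illustration under stated assumptions, not Amazon’s sample code: a standard-library dataclass stands in for the Pydantic schema the SDK is described as supporting, and the `Listing` fields, the helper names and any rows passed in are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """Hypothetical schema for one extracted rental listing."""
    address: str
    monthly_rent: int     # US dollars
    bike_minutes: float   # biking time to the nearest train station


def parse_listing(raw: dict) -> Listing:
    # Coerce and validate one extracted record; raises on malformed input.
    listing = Listing(
        address=str(raw["address"]).strip(),
        monthly_rent=int(raw["monthly_rent"]),
        bike_minutes=float(raw["bike_minutes"]),
    )
    if listing.monthly_rent <= 0 or listing.bike_minutes < 0:
        raise ValueError(f"implausible listing: {raw!r}")
    return listing


def rank_by_commute(raw_rows: list) -> list:
    # Shortest bike ride first; cheaper rent breaks ties.
    listings = [parse_listing(row) for row in raw_rows]
    return sorted(listings, key=lambda item: (item.bike_minutes, item.monthly_rent))


def as_table(listings: list) -> str:
    # Render the ranked listings as a fixed-width text table.
    header = f"{'Address':<20}{'Rent':>8}{'Bike (min)':>12}"
    rows = [
        f"{item.address:<20}{item.monthly_rent:>8}{item.bike_minutes:>12.1f}"
        for item in listings
    ]
    return "\n".join([header, *rows])
```

Validating each record against a schema before sorting means a mis-read page surfaces as an error rather than as a silently wrong row in the final table.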
Benchmark performance and a focus on reliability
A central message in Amazon’s announcement is that reliability, not just intelligence, is the key barrier to widespread agent adoption.
Current state-of-the-art models are actually quite brittle at powering AI agents, with agents typically achieving 30% to 60% success rates on browser-based multi-step tasks, according to Amazon.
Nova Act, however, emphasizes a building-block approach, scoring over 90% on internal evaluations of tasks that challenge other models, such as interacting with dropdowns, date pickers, or pop-ups.
Luan underscored why that reliability focus matters. “What we’ve really focused on is how do you actually make agents reliable? If you ask it to update a record in Salesforce and it deletes your database one out of ten times, you’re probably never going to use it again,” he said.
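A back-of-the-envelope calculation shows why per-step reliability matters so much for multi-step workflows. The sketch below is an illustration, not Amazon’s evaluation methodology: it assumes every step succeeds independently with the same probability, which real workflows will not satisfy exactly, and `workflow_success_rate` is a hypothetical helper.

```python
def workflow_success_rate(step_success: float, steps: int) -> float:
    """End-to-end success rate if each step succeeds independently.

    Simplifying assumption: identical, independent steps, so the
    overall rate is step_success raised to the number of steps.
    """
    if not 0.0 <= step_success <= 1.0:
        raise ValueError("step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_success ** steps


if __name__ == "__main__":
    for p in (0.90, 0.99):
        for n in (1, 5, 10):
            rate = workflow_success_rate(p, n)
            print(f"per-step {p:.2f}, {n:>2} steps: {rate:.3f}")
```

Under this simplified model, a step that works nine times out of ten yields roughly a 35% end-to-end success rate over ten steps, which is why small gains in per-step reliability compound into large gains for whole workflows.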
Amazon AGI benchmarked Nova Act against competing models including Anthropic’s Claude 3.7 Sonnet and OpenAI’s CUA model. On the ScreenSpot Web Text benchmark, which tests instruction-following on textual screen elements, Nova Act achieved a score of 0.939, outperforming Claude 3.7 Sonnet (0.900) and OpenAI CUA (0.883).

On the ScreenSpot Web Icon benchmark, which focuses on visual UI elements, Nova Act scored 0.879, again ahead of the other models.
However, on the GroundUI Web benchmark, which tests general UI interaction, Nova Act scored 0.805, slightly behind its competitors.
These scores were measured internally by Amazon using consistent prompts and evaluation criteria.
Amazon also highlighted early results in Nova Act’s ability to generalize beyond standard environments.
For instance, team member Rick Liu demonstrated how the agent, without explicit training, successfully interacted with a pigeon-themed web game: assigning stats, battling opponents, and progressing in the game.
According to Luan, that ability to generalize is central to the long-term vision. “Our goal with Nova Act is to be a universal browser-use solution. We want an agent that can do anything you want to do on a computer for you,” he said.
Flexible for use in different clouds, but locked to Amazon’s Nova model
While Nova Act is accessible to developers globally through nova.amazon.com, Luan clarified that the system is tightly coupled to Amazon’s in-house Nova foundation models.
Developers cannot plug in external LLMs such as OpenAI’s GPT-4o or Anthropic’s Claude 3.7 Sonnet, unlike with OpenAI’s Agents SDK and, to a lesser extent, Microsoft’s AutoGen and Salesforce’s Agentforce platforms (which allow switching among some different provider companies and model families).
“Nova Act is a custom trained version of the Nova model,” he said. “It’s not just a scaffolding over a generic LLM. It’s natively trained to act on the internet on your behalf.”
However, Nova Act is not restricted to AWS environments. Developers can download the SDK and run it locally, in the cloud, or wherever they choose. “You don’t have to be on AWS to use it,” Luan said.
Thus, for businesses looking for maximum underlying model flexibility for their agents, Nova Act is probably not the best choice. However, for those looking for a purpose-built model specifically designed to navigate the web and perform actions across a wide variety of websites with very different user interfaces (UIs), it’s probably worth a look, especially if you’re already in the Amazon or AWS developer ecosystem.
Security, licensing and pricing
The Nova Act SDK is released under the Apache License, Version 2.0 (January 2004), an open source license. However, this applies only to the SDK software.
The Nova Act model itself, along with its weights and training data, is proprietary and remains closed-source. The approach is intentional, according to Luan, who explained that the model is tightly integrated and co-trained with the SDK to achieve reliability.
At launch, Nova Act is offered as a free research preview. There is no announced pricing for production use yet.
Luan described this phase as an opportunity for developers to experiment and build with the technology. “Our belief is that the majority of the most useful agent products haven’t yet been built. We want to enable anybody to build a really useful agent, whether for themselves or as a product,” he said.
Long term, Amazon plans to introduce production-grade terms, including usage-based billing and scaling guarantees, but those aren’t yet available.
What’s next for Nova Act?
The release of Nova Act reflects Amazon’s broader ambition to make action-oriented AI agents a foundational component of computing.
Luan summed up the opportunity ahead: “My personal dream is that agents become the building block of computing, and the best new startups and products get built on top of what our team is creating.”
The Nova Act SDK is available now for experimentation and prototyping on Amazon’s website and on GitHub.