
From hallucinations to hardware: Lessons from a real-world computer vision project gone sideways

Pulse Reporter
Last updated: June 28, 2025 10:13 pm


Computer vision projects rarely go exactly as planned, and this one was no exception. The idea was simple: Build a model that could look at a photo of a laptop and identify any physical damage, such as cracked screens, missing keys or broken hinges. It seemed like a straightforward use case for image models and large language models (LLMs), but it quickly turned into something more complicated.

Along the way, we ran into issues with hallucinations, unreliable outputs and images that were not even laptops. To solve these, we ended up applying an agentic framework in an atypical way: not for task automation, but to improve the model's performance.

In this post, we will walk through what we tried, what didn't work and how a combination of approaches eventually helped us build something reliable.

Where we started: Monolithic prompting

Our initial approach was fairly standard for a multimodal model. We used a single, large prompt to pass an image into an image-capable LLM and asked it to identify visible damage. This monolithic prompting strategy is simple to implement and works decently for clean, well-defined tasks. But real-world data rarely plays along.
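As a rough illustration of monolithic prompting, the sketch below sends one prompt plus the image to a model and parses the reply. The prompt wording, the JSON reply shape and the `call_model` callable are all assumptions made for this example; the article does not publish the real prompt or name the model client.

```python
import json
from typing import Callable, List

# Hypothetical prompt wording; the real prompt is not given in the article.
MONOLITHIC_PROMPT = (
    "You are inspecting a photo of a laptop. List every visible physical "
    "defect (for example cracked screen, missing keys, broken hinge). "
    'Reply with JSON only, in the form {"damage": ["..."]}.'
)


def detect_damage(image_bytes: bytes, call_model: Callable[[str, bytes], str]) -> List[str]:
    """Send one prompt plus the image and parse the reply into a list of findings."""
    raw = call_model(MONOLITHIC_PROMPT, image_bytes)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return []  # Unparseable reply: report nothing rather than guess.
    if not isinstance(parsed, dict):
        return []
    damage = parsed.get("damage", [])
    if not isinstance(damage, list):
        return []
    return [str(item) for item in damage]


def fake_model(prompt: str, image: bytes) -> str:
    """Stand-in for a real multimodal model client, used only for this demo."""
    return '{"damage": ["cracked screen"]}'
```

Nothing in this design checks whether the photo shows a laptop at all, and nothing verifies the model's claims, which is where the problems described next come from.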

We ran into three major issues early on:

  • Hallucinations: The model would sometimes invent damage that didn't exist or mislabel what it was seeing.
  • Junk image detection: It had no reliable way to flag images that weren't even laptops. Pictures of desks, walls or people occasionally slipped through and received nonsensical damage reports.
  • Inconsistent accuracy: The combination of these problems made the model too unreliable for operational use.

This was the point when it became clear we would need to iterate.

First fix: Mixing image resolutions

One thing we noticed was how much image quality affected the model's output. Users uploaded all kinds of images, ranging from sharp and high-resolution to blurry. This led us to refer to research highlighting how image resolution impacts deep learning models.

We trained and tested the model using a mix of high- and low-resolution images. The idea was to make the model more resilient to the wide range of image qualities it would encounter in practice. This helped improve consistency, but the core issues of hallucination and junk image handling persisted.
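One way to build such a mixed-resolution set is to degrade a share of the sharp images before training and testing. The sketch below is only an assumption about how that could be done: the block-averaging method, the `low_res_fraction` and `factor` parameters and the list-of-lists image format are hypothetical, chosen so the example runs with the standard library alone.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale pixels, rows of values in [0, 1]


def degrade(image: Image, factor: int) -> Image:
    """Average each factor x factor block and repeat it, keeping the original size."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, factor):
        for left in range(0, width, factor):
            rows = range(top, min(top + factor, height))
            cols = range(left, min(left + factor, width))
            mean = sum(image[r][c] for r in rows for c in cols) / (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


def mix_resolutions(images: List[Image], low_res_fraction: float = 0.5,
                    factor: int = 2, seed: int = 0) -> List[Image]:
    """Return the images with roughly low_res_fraction of them degraded."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    return [degrade(img, factor) if rng.random() < low_res_fraction else img
            for img in images]
```

In a real pipeline the same idea would normally be expressed with an image library's resize or blur operations rather than hand-written loops.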

The multimodal detour: Text-only LLM goes multimodal

Encouraged by recent experiments in combining image captioning with text-only LLMs, like the technique covered in The Batch, where captions are generated from images and then interpreted by a language model, we decided to give it a try.

Here's how it works:

  • The LLM starts by generating multiple possible captions for an image.
  • Another model, called a multimodal embedding model, checks how well each caption fits the image. In this case, we used SigLIP to score the similarity between the image and the text.
  • The system keeps the top few captions based on these scores.
  • The LLM uses these top captions to write new ones, trying to get closer to what the image actually shows.
  • It repeats this process until the captions stop improving, or it hits a set limit.
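The loop described in these steps can be sketched as follows. This is a minimal sketch, not the system the authors built: `propose` stands in for the LLM that writes captions, `score` stands in for the SigLIP image-text similarity, and the toy versions below (a word-overlap score and a fixed caption rewriter) are hypothetical so the example runs without any model.

```python
from typing import Callable, List


def refine_captions(propose: Callable[[List[str]], List[str]],
                    score: Callable[[str], float],
                    keep: int = 2, max_rounds: int = 5) -> List[str]:
    """Propose captions, keep the best-scoring few, and repeat until no gain."""
    best: List[str] = []
    best_score = float("-inf")
    for _ in range(max_rounds):  # the "set limit" on iterations
        candidates = list(dict.fromkeys(best + propose(best)))  # dedupe, keep order
        ranked = sorted(candidates, key=score, reverse=True)[:keep]
        if not ranked:
            break
        top_score = score(ranked[0])
        if top_score <= best_score:
            break  # captions stopped improving
        best, best_score = ranked, top_score
    return best


# Toy stand-ins (assumptions for the demo, not real models).
TRUTH = {"laptop", "cracked", "screen"}


def toy_score(caption: str) -> float:
    """Pretend similarity: how many 'true' words the caption contains."""
    return float(len(TRUTH & set(caption.split())))


def toy_propose(seeds: List[str]) -> List[str]:
    """Pretend LLM: start with two guesses, then elaborate on the seeds."""
    if not seeds:
        return ["a laptop", "a desk"]
    return [seed + " with a cracked screen" for seed in seeds]
```

Calling `refine_captions(toy_propose, toy_score)` stops after the third round, once the top score no longer rises. The scorer only ranks text against the image; it cannot veto a caption that describes damage that is not there.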

While clever in theory, this approach introduced new problems for our use case:

  • Persistent hallucinations: The captions themselves sometimes included imaginary damage, which the LLM then confidently reported.
  • Incomplete coverage: Even with multiple captions, some issues were missed entirely.
  • Increased complexity, little benefit: The added steps made the system more complicated without reliably outperforming the previous setup.

It was an interesting experiment, but ultimately not a solution.

A creative use of agentic frameworks

This was the turning point. While agentic frameworks are usually used for orchestrating task flows (think agents coordinating calendar invites or customer service actions), we wondered if breaking down the image interpretation task into smaller, specialized agents might help.

We built an agentic framework structured like this:

  • Orchestrator agent: It checked the image and identified which laptop components were visible (screen, keyboard, chassis, ports).
  • Component agents: Dedicated agents inspected each component for specific damage types; for example, one for cracked screens, another for missing keys.
  • Junk detection agent: A separate agent flagged whether the image was even a laptop in the first place.
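The structure above can be sketched as plain function composition. Everything here is an assumption for illustration: the article names no framework, so each agent is modeled as a callable, the image is a dictionary of hypothetical fields, and running the junk check first is a design choice of this sketch rather than a documented detail.

```python
from typing import Any, Callable, Dict, List

Image = Dict[str, Any]  # hypothetical stand-in for real image data
ComponentAgent = Callable[[Image], List[str]]


def inspect_laptop(image: Image,
                   is_laptop: Callable[[Image], bool],
                   find_components: Callable[[Image], List[str]],
                   component_agents: Dict[str, ComponentAgent]) -> Dict[str, Any]:
    """Junk check, then orchestrator, then one focused agent per visible component."""
    if not is_laptop(image):  # junk detection agent
        return {"junk": True, "findings": {}}
    findings: Dict[str, List[str]] = {}
    for component in find_components(image):  # orchestrator agent
        agent = component_agents.get(component)
        if agent is not None:  # components without an agent are skipped
            findings[component] = agent(image)
    return {"junk": False, "findings": findings}


# Toy agents reading hypothetical fields instead of calling a model.
def toy_is_laptop(image: Image) -> bool:
    return image.get("kind") == "laptop"


def toy_find_components(image: Image) -> List[str]:
    return list(image.get("visible", []))


def toy_screen_agent(image: Image) -> List[str]:
    return ["cracked screen"] if "crack" in image.get("defects", []) else []


def toy_keyboard_agent(image: Image) -> List[str]:
    return ["missing keys"] if "missing_key" in image.get("defects", []) else []


AGENTS = {"screen": toy_screen_agent, "keyboard": toy_keyboard_agent}
```

Because each agent answers one narrow question, its output is easy to trace back to a component, and a component with no registered agent simply produces no finding.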

This modular, task-driven approach produced much more precise and explainable results. Hallucinations dropped dramatically, junk images were reliably flagged and each agent's task was simple and focused enough to control quality well.

The blind spots: Trade-offs of an agentic approach

As effective as this was, it was not perfect. Two main limitations showed up:

  • Increased latency: Running multiple sequential agents added to the total inference time.
  • Coverage gaps: Agents could only detect issues they were explicitly programmed to look for. If an image showed something unexpected that no agent was tasked with identifying, it would go unnoticed.

We needed a way to balance precision with coverage.

The hybrid solution: Combining agentic and monolithic approaches

To bridge the gaps, we created a hybrid system:

  1. The agentic framework ran first, handling precise detection of known damage types and junk images. We limited the number of agents to the most essential ones to improve latency.
  2. Then, a monolithic image LLM prompt scanned the image for anything else the agents might have missed.
  3. Finally, we fine-tuned the model using a curated set of images for high-priority use cases, like frequently reported damage scenarios, to further improve accuracy and reliability.
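The first two steps can be sketched as a small wrapper. This is an illustration under assumptions, not the authors' code: `run_agents` and `monolithic_scan` are hypothetical callables standing in for the agentic framework and the catch-all prompt, the report keys are invented for this example, and the fine-tuning in step 3 happens offline and is not shown.

```python
from typing import Any, Callable, Dict, List


def hybrid_inspect(image: Any,
                   run_agents: Callable[[Any], Dict[str, Any]],
                   monolithic_scan: Callable[[Any], List[str]]) -> Dict[str, Any]:
    """Run the agents first, then add only what the catch-all prompt finds on top."""
    agent_report = run_agents(image)
    if agent_report["junk"]:
        # Junk images stop here, so the broad prompt never sees them.
        return {"junk": True, "agent_findings": [], "extra_findings": []}
    agent_findings: List[str] = list(agent_report["findings"])
    seen = set(agent_findings)
    extra: List[str] = []
    for item in monolithic_scan(image):
        if item not in seen:  # drop repeats of what the agents already reported
            seen.add(item)
            extra.append(item)
    return {"junk": False, "agent_findings": agent_findings, "extra_findings": extra}


# Toy stand-ins for the two stages (assumptions for the demo).
def toy_agents(image: Any) -> Dict[str, Any]:
    if image == "desk":
        return {"junk": True, "findings": []}
    return {"junk": False, "findings": ["cracked screen"]}


def toy_scan(image: Any) -> List[str]:
    return ["cracked screen", "sticker residue", "sticker residue"]
```

Keeping the two result lists separate lets a reviewer treat the broad-scan extras with more caution than the agent findings.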

This combination gave us the precision and explainability of the agentic setup, the broad coverage of monolithic prompting and the confidence boost of targeted fine-tuning.

What we learned

A few things became clear by the time we wrapped up this project:

  • Agentic frameworks are more versatile than they get credit for: While they are usually associated with workflow management, we found they could meaningfully boost model performance when applied in a structured, modular way.
  • Blending different approaches beats relying on just one: The combination of precise, agent-based detection alongside the broad coverage of LLMs, plus a bit of fine-tuning where it mattered most, gave us far more reliable outcomes than any single method on its own.
  • Visual models are prone to hallucinations: Even the more advanced setups can jump to conclusions or see things that aren't there. It takes a thoughtful system design to keep those mistakes in check.
  • Image quality variety makes a difference: Training and testing with both clear, high-resolution images and everyday, lower-quality ones helped the model stay resilient when faced with unpredictable, real-world photos.
  • You need a way to catch junk images: A dedicated check for junk or unrelated pictures was one of the simplest changes we made, and it had an outsized impact on overall system reliability.

Final thoughts

What started as a simple idea, using an LLM prompt to detect physical damage in laptop images, quickly turned into a much deeper experiment in combining different AI techniques to tackle unpredictable, real-world problems. Along the way, we realized that some of the most useful tools were ones not originally designed for this type of work.

Agentic frameworks, often seen as workflow utilities, proved surprisingly effective when repurposed for tasks like structured damage detection and image filtering. With a bit of creativity, they helped us build a system that was not just more accurate, but easier to understand and manage in practice.

Shruti Tiwari is an AI product manager at Dell Technologies.

Vadiraj Kulkarni is a data scientist at Dell Technologies.
