Congrats! I think the space is very interesting. I was a founder of a similar Windows CUA infra / RPA agents startup but pivoted. My thoughts:
1) The funny thing about determinism is deciding how deterministic you should be about when to break out of it; it's kind of a recursive problem. Agents are inherently very tough to guardrail on an action space as big as CUA's. The guys from Browser Use realized this as well and built workflow-use. You could try RL or finetuning per task instead, but that isn't viable (economically or technically) right now.
2) As you know, it's a very client-facing, customized solution space. You might find this interesting; it reflects my thoughts on the space as well. Tough to scale as a fresh startup unless you really niche down on some specific workflows: https://x.com/erikdunteman/status/1923140514549043413 (he is also building in the deterministic agent space now, funnily enough).
3) It actually is annoyingly expensive with Claude if you break prompt caching, which you have to at some point if you feed in every screenshot, etc. You mentioned you use multiple models (I guess UI-TARS/OmniParser?), but in the comments you said Claude?
4) Ultimately the big bet in the RPA space, as again you know, is that the TAM won't shrink much even as more and more SAPs, ERPs, etc. implement APIs. Of course the big money will always be in ancient apps that won't, but then again in that space UiPath and the others have a chokehold (and their agentic tech was actually surprisingly good when I had a look 3 months ago).
Good luck in any case! I feel like it's one of those spaces where we are definitely still a touch too early, but it's such a big market that there is plenty of room for a lot of people.
mahmoud-almadi 13 hours ago [-]
Thanks! Really appreciate the awesome thoughts.
1) You're totally right about this problem! We handle it with intelligent caching and heavy prompt/context engineering, and those measures have been controlling agent behavior pretty well.
2) The key to scaling is building a tool that developers can pick up and learn themselves, and that's what we're seeing here. Using our docs and the controls we built for agent behavior, developers have been able to get the behavior they want from our computer use agents on Cyberdesk.
3) You're correct about cost here as well, but with well-defined workflows, cache breaks only happen on a minority of the agent's runs.
4) Great point! The beauty of computer use is that it can solve problems RPA couldn't solve at all. The TAM will increase significantly when computer use agents start working really well. We've already seen this with our customers: they're able to build automations with Cyberdesk that they weren't able to with RPA. So while the TAM might bleed some ERPs and legacy apps as they implement APIs, I think it's going to grow at a much faster rate than it will shrink.
kjellsbells 9 hours ago [-]
This is great, but a part of me wonders if our industry isn't putting a band-aid on a problem that we ourselves created.
Consider your typical early-2000s-era Windows app. It would expect a mouse, but for power users, keyboard shortcuts were available for every action, even if clunky. For example, Alt+F, Tab, Tab, Tab to get to some input field, enter text, then Tab, Alt+R, Return.
By about 2015 these were all straightforwardly scriptable with AutoHotkey and similar tools.
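To illustrate, here's roughly that flow sketched in Python with pyautogui (the Alt+F / Alt+R accelerators are just the hypothetical ones from the example above):

    import pyautogui  # pip install pyautogui

    pyautogui.hotkey("alt", "f")           # open the (hypothetical) File menu
    pyautogui.press("tab", presses=3)      # tab down to the input field
    pyautogui.typewrite("some text", interval=0.02)
    pyautogui.press("tab")
    pyautogui.hotkey("alt", "r")           # the hypothetical Alt+R accelerator
    pyautogui.press("enter")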
But too late: by 2015 even Windows users were using web apps, where the keyboard bindings are variable or non-existent, where the entire UI can change overnight, etc. I see some RPA approaches desperately trying to decode the DOM or match pixel elements. It's wild, as you point out.
I guess what I'm wondering is whether going after legacy Windows apps is a small TAM that's already largely solved, whereas the SPA/webapp market is gigantic, growing every day, and woefully, miserably broken as far as automation is concerned.
mattfrommars 17 hours ago [-]
Looks great for automating workloads in Windows desktop applications. I'd love to understand more deeply how your application works. So the set of commands your backend sends is click, scroll, screenshot? Does it send a command to, say, type characters into an input field? How is it able to pinpoint a text field from a screenshot? Is the LLM reliable enough to pinpoint the x and y to click on a field?
Also, to run this at large scale, does it become prohibitively expensive to run daily across thousands of custom workflows? I assume this runs in the cloud.
sgtwompwomp 17 hours ago [-]
Thanks! And yes, our pathfinder agents use Sonnet 4's precise coordinate generation capabilities. You give it a screenshot and a task, and it can output the exact coordinates of where to click on an input field, for example.
And yes we've found the computer use models are quite reliable.
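As a rough illustration (not our actual code), asking Anthropic's computer-use tool for a click target looks like this; the model id, screen size, and task prompt are placeholders:

    import base64
    import anthropic  # pip install anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    with open("screen.png", "rb") as f:
        screenshot_b64 = base64.b64encode(f.read()).decode()

    resp = client.beta.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model id
        max_tokens=1024,
        tools=[{
            "type": "computer_20250124",   # Anthropic's computer-use tool schema
            "name": "computer",
            "display_width_px": 1280,
            "display_height_px": 800,
        }],
        messages=[{"role": "user", "content": [
            {"type": "image", "source": {"type": "base64",
                                         "media_type": "image/png",
                                         "data": screenshot_b64}},
            {"type": "text", "text": "Click the 'Patient Name' input field."},
        ]}],
        betas=["computer-use-2025-01-24"],
    )

    # The model replies with a tool_use block whose input carries the action,
    # e.g. {"action": "left_click", "coordinate": [412, 267]}.
    for block in resp.content:
        if block.type == "tool_use":
            print(block.input)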
Great questions on scale: the whole way we designed our engine is that on the happy path, we actually make very few LLM calls. The agent runs deterministically, only checking at critical spots whether an anomaly occurred (if one did, we fall back to computer use to take it home). If not, our system can complete an entire task end to end at a cost on the order of $0.0001.
So it's a hybrid system at the end of the day. This gives us really low costs at scale, plus speed and reliability improvements (since on the happy path, we run exactly what has worked before).
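Concretely, the happy-path loop we're describing looks something like this; a rough, illustrative sketch with made-up names, and an exact-match hash standing in for the real (fuzzier) anomaly check:

    import io
    import hashlib
    import pyautogui  # pip install pyautogui

    cached_steps = [
        {"action": "click", "xy": (412, 267), "expect": "a3f9c2d1"},  # hypothetical region hash
        {"action": "type", "text": "Jane Doe", "xy": None, "expect": None},
    ]

    def region_hash(xy, size=64):
        """Hash the pixels around the target: a sub-cent anomaly check, no LLM."""
        x, y = xy
        img = pyautogui.screenshot(region=(x - size // 2, y - size // 2, size, size))
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        return hashlib.sha256(buf.getvalue()).hexdigest()[:8]

    def cua_fallback(step):
        """Placeholder: hand the prompt plus a fresh screenshot to the
        computer-use model and let it take over this step."""
        raise NotImplementedError

    def run(steps):
        for step in steps:
            if step["expect"] and region_hash(step["xy"]) != step["expect"]:
                cua_fallback(step)  # anomaly detected at a checkpoint
                continue
            if step["action"] == "click":
                pyautogui.click(*step["xy"])
            elif step["action"] == "type":
                pyautogui.typewrite(step["text"])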
throw03172019 21 hours ago [-]
Looks great. For the EMR use cases, do you sign BAAs? Which CUA models are being used? No data retention?
mahmoud-almadi 20 hours ago [-]
We sign BAAs with all our healthcare customers + all our vendors. Currently using Claude computer use. We have zero-data-retention agreements signed with both Anthropic and OpenAI, so none of the information sent to their LLMs is ever retained.
hermitcrab 19 hours ago [-]
>none of the information sent to their LLMs is ever retained
Is it possible to verify that?
sgtwompwomp 19 hours ago [-]
Yup! We have signed certificates that explicitly state this, with all LLM providers we use.
feisty0630 19 hours ago [-]
That's not "verification" by any definition of the word.
mahmoud-almadi 18 hours ago [-]
Good point. We can show a customer the certificate to verify that we have that policy in place with our vendors, but you're correct that we haven't gone as far as asking Anthropic or OpenAI for proof that they aren't retaining any of our data. What we did do is get their SOC 2 Type II reports, which showed no significant security vulnerabilities that would impact our usage of their services. So we've been operating under the assumption that they're honoring our signed agreement, within the context of those SOC 2 Type II reports, and our customers have been okay with that. But we're definitely open to pursuing that kind of proof at some point.
feisty0630 6 hours ago [-]
All of which has nothing to do with OpenAI or Anthropic deciding to use your data??? SOC 2 Type II is completely irrelevant.
You've got two companies that basically built their entire business upon stealing people's content, and they've given you a piece of paper saying "trust me bro".
piltdownman 4 hours ago [-]
Welcome to the invalidated EU-US Safe Harbour, the invalidated EU-US Privacy Shield, and the soon-to-be invalidated EU-US Data Privacy Framework (DPF) and Transatlantic Data Privacy Framework (TADPF).
Digital sovereignty and respect for privacy and local laws are the exception in this domain, not the expectation.
As Max Schrems puts it: "Instead of stable legal limitations, the EU agreed to executive promises that can be overturned in seconds. Now that the first Trump waves hit this deal, it quickly throws many EU businesses into a legal limbo."
After recently terrifying the EU with the truth in an ill-advised blogpost, Microsoft are now attempting the concept of a 'Sovereign Public Cloud' with a supposedly transparent and indelible access-log service called Data Guardian.

https://blogs.microsoft.com/on-the-issues/2025/04/30/europea...

https://www.lightreading.com/cloud/microsoft-shows-who-reall...
If Nation States can't manage to keep their grubby hands off your data, private US Companies obliged to co-operate with Intelligence Apparatus certainly won't be.
DaiPlusPlus 17 hours ago [-]
Honestly, I'm surprised your lawyers let you post that here.
+1 for honesty and transparency
sethhochberg 17 hours ago [-]
Typically with this sort of thing, the way it really works is that you, the startup, use a service provider (like OpenAI) that publishes its own external audit reports (like a SOC 2 Type II). The SOC 2 auditors will see that the service provider has a policy for how it handles customer data covered by Agreement XYZ, and will require evidence that the provider is following its policies about not using that data for undeclared purposes or whatever else.
Audit rights are all about who has the most power in a given situation. Just like very few customers are big enough to go to AWS and say "let us audit you", you're not going to get that right with a vendor like Anthropic or OpenAI unless you're certifiably huge, and even then it will come with lots of caveats. Instead, you trust the audit results they publish and implicitly are trusting the auditors they hire.
Whether that is sufficient level of trust is really up to the customer buying the service. There's a reason many companies sell on-prem hosted solutions or even support airgapped deployments, because no level of external trust is quite enough. But for many other companies and industries, some level of trust in a reputable auditor is acceptable.
mahmoud-almadi 17 hours ago [-]
Thanks for the breakdown, Seth! We did indeed get their SOC 2 Type II reports and made sure they showed no significant security vulnerabilities that would impact our usage of their services.
downrightmike 18 hours ago [-]
Is it a 3rd party that is verifying?
mahmoud-almadi 18 hours ago [-]
We haven't looked into this kind of approach yet, but definitely worthwhile to do at some point!
bozhark 17 hours ago [-]
[flagged]
mahmoud-almadi 17 hours ago [-]
Right now we are treating the policies we signed with our LLM vendors as verification of a zero-data-retention policy. We also got their SOC 2 Type II reports, which showed no significant security vulnerabilities that would impact our usage of their services. We're doing our best to deliver value while taking as many security precautions as possible: our own data retention policy, encryption of data at rest and in transit, row-level security, SOC 2 Type I and HIPAA compliance (in observation for Type II), and secret managers. We have other measures planned, like de-identifying screenshots before sending them up. Would love to get your thoughts on any other security measures you would recommend!
herval 19 hours ago [-]
I’m guessing OP is asking if it’s possible to verify they’re honoring the contract and deleting the data?
bozhark 17 hours ago [-]
Nope.
mwcampbell 15 hours ago [-]
Have you looked at using accessibility APIs, such as UI Automation on Windows, to augment screenshots and simulated mouse clicks?
mahmoud-almadi 10 hours ago [-]
Not yet! Vision-only has been doing pretty well so far, but we're definitely looking into that at some point. Thanks for the suggestion!
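For anyone curious, the suggested augmentation would look roughly like this with pywinauto; the window title and control name here are hypothetical:

    from pywinauto import Application  # pip install pywinauto

    app = Application(backend="uia").connect(title_re=".*Patient Chart.*")
    win = app.top_window()

    # Resolve the control through the accessibility tree instead of (or as a
    # cross-check on) coordinates from a vision model.
    name_box = win.child_window(title="Patient Name", control_type="Edit")
    name_box.set_edit_text("Jane Doe")
    print(name_box.rectangle().mid_point())  # ground-truth bounds for the click target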
throw03172019 14 hours ago [-]
Isn’t this an optional feature for developers? They can disable it / remove the names of the buttons, etc to make RPA harder?
rkagerer 20 hours ago [-]
Personally I think this approach is flawed because it runs in the cloud. If it were an agent I could run locally I'd be much more interested.
mahmoud-almadi 20 hours ago [-]
Are you referring to the LLM being used or to where the actions (click, type, etc.) are executed? The actual actions can be executed on any Windows machine, so execution can take place locally on your device. The LLMs we're using right now are cloud LLMs; we haven't built a self-hosted LLM option yet. Can I ask what reservations you have about running in the cloud? We have zero-data retention signed with our LLM vendors, so none of the data sent to them is ever retained.
iptq 18 hours ago [-]
If this can't run full-local, isn't that basically a botnet? You're talking about installing a kernel-level driver that receives instructions on what to do from a cloud service.
mahmoud-almadi 18 hours ago [-]
Great point! Yes, you are correct: the actual "agent" lives in the cloud, and its actions are executed by a proxy running on the desktop. Hopefully at some point we can set up a straightforward installation procedure to run the AI models entirely on the desktop, but that's constrained by desktop specs for now. VMs and desktops with the specs to handle that would be prohibitively expensive for a lot of teams trying to build these automations.
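The proxy pattern itself is simple. A rough illustration (the endpoint and payload schema are made up, not our real protocol):

    import time
    import requests   # pip install requests
    import pyautogui  # pip install pyautogui

    CLOUD = "https://example.invalid/api/next-action"  # placeholder endpoint

    def execute(action):
        if action["kind"] == "click":
            pyautogui.click(action["x"], action["y"])
        elif action["kind"] == "type":
            pyautogui.typewrite(action["text"])
        elif action["kind"] == "screenshot":
            pyautogui.screenshot("frame.png")  # would be uploaded for the next model step

    while True:
        action = requests.get(CLOUD, timeout=30).json()
        if action.get("kind") == "done":
            break
        execute(action)
        time.sleep(0.1)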
rm_-rf_slash 17 hours ago [-]
Out of curiosity, what would the minimum specs need to be in order to run this locally?
My PC is just good enough to run a DeepSeek distill. Is that on par with the requirements for your model?
sgtwompwomp 17 hours ago [-]
There isn't a viable computer use model that can be run locally yet, unfortunately. I'm extremely excited for the day that happens, though. Essentially, the key capability that makes a model a computer use model is precise coordinate generation.
So if you come across a local model that can do that well, let us know! We're also keeping a close watch.
ciaranmca 16 hours ago [-]
Haven’t looked into them much but I thought the Chinese labs had released some for this kind of thing
mahmoud-almadi 10 hours ago [-]
You are correct: ByteDance did release UI-TARS, which sounds like a really good open-source computer use model according to some articles I read. You could run that locally. We haven't tested it, so I don't know how it performs, but it sounds like it's definitely worth exploring!
rkagerer 16 hours ago [-]
What would it take to train your own?
mahmoud-almadi 12 hours ago [-]
I don't know much about training your own computer use model, other than that it would probably be a very hefty, very expensive task.
However, I believe ByteDance released UI-TARS, which is said to be an excellent open-source computer use model according to some articles I read. You could run that locally. We haven't tested it, so I don't know how it performs, but it sounds like it's definitely worth exploring!
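If you serve it behind an OpenAI-compatible endpoint (e.g. with vLLM), calling it would look roughly like this. The served model name is a placeholder, and UI-TARS expects its own action-space prompt format, so check ByteDance's repo before copying this:

    import base64
    from openai import OpenAI  # pip install openai

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    with open("screen.png", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="ui-tars-7b",  # hypothetical served-model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": "Click the Save button."},
            ],
        }],
    )
    print(resp.choices[0].message.content)  # the model emits a structured action, e.g. click(x, y)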
rkagerer 16 hours ago [-]
I'm talking about the LLM (and any other infrastructure involved). Reasons are:
- Pricing. If I grow to do this at scale, I don't want to be paying per-action, per-month, per-token, etc.
- Privacy. I don't want my data, screenshots, whatever being sent to you or the cloud AI providers.
- Control. I don't want to be vulnerable to you or other third parties going bankrupt, arbitrarily deciding to kill the product or its dependencies, or restructuring plans/pricing/etc. I also want to be able to keep my day-to-day operations running even if there's a major cloud outage (that's one reason we're still using this "old fashioned", non-cloud software in the first place).
I think I'm simply not your target market.
I advise several companies who could be (they run "legacy" software with vast teams of human operators whose daily tasks include some portion of work that would be a good candidate for increased automation), but most of them are in a space where one or more of the above factors would be potential deal breakers.
The retention agreements between you and your vendors are great (I mean that sincerely), but I'm not party to them so they don't do anything for me. If you offered a contractual agreement with some teeth in it (e.g. underwritten or bond-backed to the tune of several digits, committing to specific security-related measures that are audited, with a tacit acknowledgement that any proven breach of contract in and of itself constitutes damages), it could go a long way toward addressing the privacy issues.
In terms of pricing it feels like the core of your product is an outside vendor's computer-operating AI model, and you've written a prompt wrapper and plumbing around it that ferries screenshots and directives back and forth. This could be totally awesome for a small scale customer that wants to dip their toes into AI automation and try it out as a turnkey solution. But the moat doesn't seem very big, and I'd need to be convinced it's a really slick solution in order to favour that route instead of rolling my own wrapper.
Please don't take this the wrong way, it's just one datapoint of feedback and I do wish you luck with your venture.
mahmoud-almadi 12 hours ago [-]
These points you're making are excellent!
Self-hosting is inevitably part of our roadmap. Cyberdesk will have a future where we host our entire agentic framework on your own servers, AI models and the whole backend included.
I can totally see myself having the same preferences as you if I were you with regards to cost, privacy, and control.
The unique value in Cyberdesk lies beyond being a wrapper around a computer use AI model. Our intelligent caching is built on large evals that help us produce prompts reliable enough for the caching to work well in the first place. On top of that, there are several tools that make the agent useful (importing/exporting files, failsafes, taking actions using data that was read during the same run). Rebuilding Cyberdesk, while possible, would require at the very least several weeks of very rapid iteration. So for a dev team that wants to build the best computer use agent in the world, I guess that's doable. But for a team trying to be the best "X" in their particular industry, it's probably going to be a time sink that takes away from their ability to compete in their space, which is why Cyberdesk is a great choice for them.
I hope you keep an eye on what we're doing! I really like your insights here and I'm curious to see what you think as we evolve over the next months and years. Maybe when we do full self hosting you'll be a customer :)
Unfortunately, these scripting tools are just untenable when dealing with so many desktop flows that all have changing UIs and random popups. You end up having to repair all of them all the time; in fact, there's a whole consulting industry out there that does exactly this all day.
The whole idea of Cyberdesk is that the prompt is the source of truth: once the system learns a task via CUA, it follows that cache most of the time, falling back to CUA (which follows the prompt) only when it has to. And that anomaly is then cached too.
So over time, the system just learns, and gets cheaper and faster.
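Mechanically, that learning step can be as simple as splicing the recovery actions into the cached plan (purely illustrative names and schema):

    import json

    def record_cua_resolution(cache_path, step_index, recovery_actions):
        """Persist what the computer-use model just did so the next run
        replays it deterministically instead of calling the model again."""
        with open(cache_path) as f:
            cache = json.load(f)
        # Replace the step that failed with the actions that actually worked.
        cache["steps"][step_index:step_index + 1] = recovery_actions
        with open(cache_path, "w") as f:
            json.dump(cache, f, indent=2)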
gerdesj 12 hours ago [-]
I used AutoIt to remove old AV from roughly 6000 PCs across 20-odd countries back in 2002. I still use it from ZENworks on some customer sites, 20+ years later.
Old-school Windows apps are not "flowing"; they generally use a toolkit, and AutoIt is able to use the Windows APIs to note window handles or the text in various widgets and so on, and act on them.
These are not complicated beasts - they are largely deterministic. If you have to go off-piste and deal with moguls, there is a mode called "adlib" where you deal with unusual cases.
I find it a bit unpleasant that you describe part of my job as "untenable". I'm sure you didn't mean it as such. I'm still just as cheap as I was 20 years ago and probably a bit quicker now too!
MetaWhirledPeas 14 hours ago [-]
Can it do assertions? This could be useful for testing old software.
sgtwompwomp 13 hours ago [-]
Yup, a few of our clients need to verify something in the software, so we support an agentic step where we look at the screen and can verify whether something exists, whether a step was completed, etc.!
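As a sketch of how such an assertion step can work without a vision model, OCR over a screenshot (requires the Tesseract binary; the banner text and function name are hypothetical):

    import pyautogui      # pip install pyautogui
    import pytesseract    # pip install pytesseract

    def assert_on_screen(expected):
        """Fail the workflow if the expected text is not visible on screen."""
        text = pytesseract.image_to_string(pyautogui.screenshot())
        if expected.lower() not in text.lower():
            raise AssertionError(f"Expected {expected!r} on screen, got: {text[:200]!r}")

    assert_on_screen("Claim submitted")  # hypothetical success banner in a legacy EMR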
MortyWaves 18 hours ago [-]
Frankly quite insulting to call any Windows app legacy
mahmoud-almadi 18 hours ago [-]
Sorry it came off that way! Could you elaborate on that thought?