Original date published: June 22, 2026
What I've worked on and What I'm working on now
I've been reflecting on the impact I've made in my career so far, the impact I would like to make in the future, and the problems that interest me most in the current chaotic technology landscape.
I started my professional career building large-scale data pipelines to gather distributed health data for early detection of pandemics and health outbreaks in developing countries. This involved creating efficient data gathering tools that could be easily deployed in local hospitals and low-resource environments. In addition, I implemented a data search and scraping tool that gathered health-related news, reports and surveys from relevant websites, creating an enriched web of information we could query alongside the data obtained directly from hospitals, data aggregators and other data vendors. This was my first attempt at leading and managing an engineering project, and I was able to see how my work could drive policies and decisions that saved actual lives.
During my MSc I had the opportunity to dive deeper into my curiosity about AI and robotics. LLMs were just about taking off, and agents were just within the fringes of possibility. I decided that LLMs would be very effective for creating embodied intelligence, so I set out to make an LLM effective at controlling a robot arm in a collaborative setting with a human: the human could issue voice commands and the robot would execute tasks within its scene based on those commands. I continued the investigation as a research associate, exploring how foundation models could be used beyond manipulation for more complex interaction and navigation.
Following my master's, I got involved with a project funded by Innovate UK to build vertically integrated vehicle telemetry and route optimization systems for home-to-school transportation vendors of the Birmingham City Council. For most of the project I was forward deployed with the largest logistics vendor, working directly with their operations and maintenance teams to build telemetry hardware as well as AI software pipelines for vehicle route optimization, predictive maintenance and orchestration. The project is currently projected to deliver a 10x return on the initial investment.
Through the journey, I got to work across the entire technology stack: embedded systems (designing PCBs, circuits and hardware components), robotics engineering (hacking ROS nodes and working on visual calibration to make trajectory executions reliable), software engineering (infrastructure deployments, backend, frontend and mobile engineering), as well as systems thinking and systems engineering. I’ve learnt a lot from these experiences, but now I think mostly about what is next for my career. The world is changing rapidly with AI, and there is an avalanche coming for the entire economy that will create a period of deep uncertainty and opportunity. What are the most important problems I care about now that are going to be extremely important in the future? I’ve articulated some of the problems I am excited about and will be focusing on moving forward.
I’ve spent a lot of time away thinking about what to work on next. There are three key market trends I have observed over the past few months. First, the frontier labs are disaggregating into various independent vendors, forming a robust AI ecosystem. Second, security and safety of agents are no longer buzz terms but practical requirements for teams looking to unleash AI within their companies and personal lives. Finally, physical AI infrastructure and tooling lags significantly behind its digital counterpart.
After a few weeks of synthesis, here are some open problems I am looking to explore in the coming months:
-
Domain specific evaluation construction tooling for enterprises and AI native companies/labs/data vendors
Agentic evaluations currently skew towards software engineering or generalist use-cases. As enterprises and AI native companies start to deploy their own AI tooling, they are now evaluating open-weights models as well as proprietary models. While frontier proprietary models lead in benchmarks, benchmark performance often does not represent the deployment needs of the organization, and cheaper, less performant models may be more than enough to meet those needs. Yet the bills for proprietary model use continue to pile up. I want to build tools that allow such organizations to easily create or generate evaluations and explore the Pareto frontier of performance, cost and latency for their AI deployment use-case.
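To make the Pareto frontier idea concrete, here is a minimal sketch of how candidate models could be compared once an organization has scores from its own evaluation set. The `EvalResult` fields, the model names and all the numbers are hypothetical and chosen only for illustration; they are not measurements of any real model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EvalResult:
    """One model's results on a domain-specific eval (all values hypothetical)."""
    model: str
    score: float      # task success rate, higher is better
    cost_usd: float   # cost per 1,000 tasks, lower is better
    latency_s: float  # median latency per task, lower is better


def dominates(a: EvalResult, b: EvalResult) -> bool:
    """True if `a` is at least as good as `b` on every axis and strictly better on one."""
    no_worse = (a.score >= b.score
                and a.cost_usd <= b.cost_usd
                and a.latency_s <= b.latency_s)
    strictly_better = (a.score > b.score
                       or a.cost_usd < b.cost_usd
                       or a.latency_s < b.latency_s)
    return no_worse and strictly_better


def pareto_frontier(results: List[EvalResult]) -> List[EvalResult]:
    """Keep only the results that no other result dominates (input order preserved)."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


# Made-up example data, not real benchmark numbers.
results = [
    EvalResult("frontier-proprietary", score=0.92, cost_usd=40.0, latency_s=3.0),
    EvalResult("mid-open-weights", score=0.88, cost_usd=6.0, latency_s=1.5),
    EvalResult("small-open-weights", score=0.70, cost_usd=1.0, latency_s=0.5),
    EvalResult("slow-open-weights", score=0.85, cost_usd=8.0, latency_s=4.0),
]

frontier = pareto_frontier(results)
for r in frontier:
    print(r.model, r.score, r.cost_usd, r.latency_s)
```

In this toy data, `slow-open-weights` drops out because `mid-open-weights` beats it on all three axes. The remaining three models are each a defensible choice depending on how the organization weighs quality against cost and latency.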
-
Runtime safety and security for AI agents
AI safety is a real risk factor in AI deployment, ranging from simple model misbehaviors to adversarial actors taking advantage of open AI systems. As companies consider moving away from API-provided frontier models to more economical open-weights models, they will become directly responsible for the security of their AI models and agents. The biggest hindrance to wide adoption of AI agents such as OpenClaw and Hermes is the immense security vulnerabilities that these AI systems expose you to. There has been a lot of work on red teaming AI models. I’m interested in creating blue teaming techniques to secure and control AI deployments so that they do not harm themselves or their users.
-
Computer/Mobile Use Agents
There has been an explosion of terminal use agents. In fact, what makes agents like Claude Code and Codex very powerful is their ability to execute arbitrary terminal commands to achieve a particular outcome. While this works for most use cases, as evidenced by the broad and rapid adoption of AI agents, there is still a gap in having the agent directly manipulate the user interface. A lot of people have flagged this as a dead end and unnecessary: most modern tools have APIs, and those that don’t can simply be bypassed by writing a scraper or some type of RPA software. These arguments are valid. However, the goal is not just to have the agent execute the task but to have it execute the task within the same context and environment as the “user”. I believe that having computer use agents work reliably and under tight time bounds, while optionally collaborating with the user, will create a new wave of opportunities in AI applications and downstream applications.
-
Visuospatial memory systems for Physical AI agents
We have seen a rise in the use of epistemic memory for AI agents through advanced retrieval techniques such as GraphRAG. These have been an incredibly valuable infrastructure asset, as a good memory system is critical for agent performance in long-horizon and complex tasks. Physical AI agents are AI agents that are embodied in some form to perceive, and in some cases manipulate, the physical world. This means that they operate on an information space that goes beyond text: they act on a vast amount of visual data from camera sensors. Passing all of this information into the context would be practically impossible. Selecting the right video frames, at the right time steps, and passing them into the context as efficiently as possible is an incredibly challenging open problem that the current AI memory providers are not positioned to solve, because they have designed their systems around large amounts of non-temporal text data. This problem involves working with large streams of vision data that are both space and time dependent, i.e. information retrieval would depend not just on the content but also on relevant physical metadata about each data point. Solving it would unlock a lot of new opportunities in personal robotics, intelligent smart glasses and AI visual inspections.
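As a rough illustration of retrieval that depends on both content and physical metadata, the sketch below filters stored frames by a time window and a spatial radius before ranking them by embedding similarity. Everything here is an assumption made for the example: the `FrameRecord` layout, the 2D positions, the toy two-dimensional embeddings and the `retrieve` function are hypothetical, and a real system would use a vision encoder and a proper spatio-temporal index rather than a linear scan.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FrameRecord:
    """A stored video frame with the physical metadata attached (hypothetical schema)."""
    frame_id: str
    timestamp_s: float               # capture time in seconds
    position: Tuple[float, float]    # camera position in metres, in some map frame
    embedding: Tuple[float, ...]     # content embedding from an assumed vision encoder


def cosine(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    """Cosine similarity, returning 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(frames: List[FrameRecord],
             query_embedding: Tuple[float, ...],
             t_start: float, t_end: float,
             center: Tuple[float, float], radius_m: float,
             k: int = 1) -> List[FrameRecord]:
    """Filter by time window and distance from `center`, then rank by content similarity."""
    candidates = [
        f for f in frames
        if t_start <= f.timestamp_s <= t_end
        and math.dist(f.position, center) <= radius_m
    ]
    candidates.sort(key=lambda f: cosine(f.embedding, query_embedding), reverse=True)
    return candidates[:k]


# Toy data: three frames near the same spot at different times, one elsewhere.
frames = [
    FrameRecord("kitchen-early", 10.0, (1.0, 1.0), (1.0, 0.0)),
    FrameRecord("kitchen-late", 50.0, (1.2, 0.9), (0.9, 0.1)),
    FrameRecord("hallway", 52.0, (8.0, 3.0), (1.0, 0.0)),
    FrameRecord("kitchen-other", 55.0, (0.8, 1.1), (0.0, 1.0)),
]

hits = retrieve(frames, query_embedding=(1.0, 0.0),
                t_start=40.0, t_end=60.0,
                center=(1.0, 1.0), radius_m=1.0, k=1)
print([f.frame_id for f in hits])
```

Note that the `hallway` frame matches the query content exactly but is excluded by the spatial filter, and `kitchen-early` is excluded by the time window. That is the behaviour a text-only, non-temporal memory store cannot express.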
These are the core ideas I’m going to be exploring in the coming months. I’ll share my findings and expand on them as I gather more context on the problem, and interact with more users and engineers tackling these challenges.