In just 2 days, we built from scratch a system of robots that assists you at your desk. You can have it clean or organize your desk just by talking to a voice agent.
For example, you can tell the robot to "put the screwdriver away", and it picks up the screwdriver from one end of the table and puts it in the bin at the other end.
Behind the scenes, several systems work together.
First, the robots are orchestrated by a voice agent running on LiveKit. The agent has access to an overview of the table and uses it to plan actions based on the user's commands.
If the user gives a task like "clean the table", the agent first checks which objects are on the table and then uses its tools to accomplish the goal.
The agent has access to 3 tools.
The first tool is move_to, a proportional (P) control loop that drives the robot's slider to any object on the table. The robot is localized using an AprilTag, while the object is localized using a VLM.
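To make the move_to loop concrete, here is a minimal Python sketch of a proportional controller on a one-dimensional slider. The callables `read_slider_pos` and `send_velocity`, the gain, speed clamp and tolerance values, and the `SimSlider` toy model are all hypothetical stand-ins rather than the project's code; in the real system the slider position would come from the AprilTag and the target position from the VLM.

```python
from typing import Callable


def move_to(
    read_slider_pos: Callable[[], float],    # hypothetical: slider position from the AprilTag (m)
    target_pos: float,                       # hypothetical: object position from the VLM (m)
    send_velocity: Callable[[float], None],  # hypothetical: velocity command to the slider (m/s)
    kp: float = 2.0,                         # proportional gain (illustrative value)
    max_speed: float = 0.2,                  # speed clamp in m/s (illustrative value)
    tolerance: float = 0.005,                # stop within 5 mm of the target
    max_steps: int = 1000,
) -> bool:
    """Drive the slider toward target_pos; return True once within tolerance."""
    for _ in range(max_steps):
        error = target_pos - read_slider_pos()
        if abs(error) < tolerance:
            send_velocity(0.0)
            return True
        # P control: command proportional to the error, clamped to a safe speed.
        send_velocity(max(-max_speed, min(max_speed, kp * error)))
    send_velocity(0.0)  # give up safely: stop the slider
    return False


class SimSlider:
    """Toy slider that integrates velocity commands over a fixed time step."""

    def __init__(self, pos: float = 0.0, dt: float = 0.05) -> None:
        self.pos = pos
        self.dt = dt

    def read(self) -> float:
        return self.pos

    def command(self, velocity: float) -> None:
        self.pos += velocity * self.dt


sim = SimSlider(pos=0.0)
print(move_to(sim.read, 0.6, sim.command))  # prints True
```

Clamping the command keeps the slider at a safe speed while it is far from the target; close to the target the command shrinks with the error, so the slider slows down as it arrives.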
The second tool is run_policy, which lets us run any of our trained ACT policies on demand.
We collected data and trained 2 policies at SPC:
The third tool is run_molmo, which uses MolmoACT2, a vision-language-action model (VLA) that has shown strong generalization across various robot embodiments. We fine-tuned it on all the datasets we gathered at SPC so that the model could better adapt to our embodiment. Our results show that the model can reliably localize and pick up objects that were not in the training dataset, demonstrating its ability to generalize. However, the model's motions are jittery and slow to converge. We suspect this is due to our naive approach to remote inference and state sampling on the SO101 arm.
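The three tools can be thought of as functions the agent calls in sequence. The sketch below is a hypothetical illustration in plain Python, not the LiveKit Agents API or the project's implementation: the tool bodies are stubs that only record each call, and the plan format and the policy names `pick_object` and `place_in_bin` are placeholders invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolBox:
    """Stub versions of the agent's three tools; each only logs its call."""

    log: list[str] = field(default_factory=list)

    def move_to(self, target: str) -> str:
        # Real version: P loop driving the slider to the VLM-localized object.
        self.log.append(f"move_to({target})")
        return "ok"

    def run_policy(self, name: str) -> str:
        # Real version: run a trained ACT policy on the arm.
        self.log.append(f"run_policy({name})")
        return "ok"

    def run_molmo(self, instruction: str) -> str:
        # Real version: query the remote, fine-tuned MolmoACT2 model.
        self.log.append(f"run_molmo({instruction})")
        return "ok"

    def registry(self) -> dict[str, Callable[..., str]]:
        return {"move_to": self.move_to, "run_policy": self.run_policy,
                "run_molmo": self.run_molmo}


def execute_plan(tools: ToolBox, plan: list[tuple[str, dict[str, Any]]]) -> list[str]:
    """Run (tool_name, kwargs) steps in order, stopping at the first failure."""
    registry = tools.registry()
    results = []
    for name, kwargs in plan:
        if name not in registry:
            raise ValueError(f"unknown tool: {name}")
        result = registry[name](**kwargs)
        results.append(result)
        if result != "ok":
            break
    return results


# A plan the agent might produce for "put the screwdriver away" (hypothetical).
plan = [
    ("move_to", {"target": "screwdriver"}),
    ("run_policy", {"name": "pick_object"}),
    ("move_to", {"target": "bin"}),
    ("run_policy", {"name": "place_in_bin"}),
]
tools = ToolBox()
execute_plan(tools, plan)
print(tools.log)
```

Stopping at the first failed step is only one simple choice; an agent could instead report the failure back to the language model and let it re-plan.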
Under the hood, the entire system runs on our own arbitration and networking infrastructure: the robot is not controlled from a single computer.
The voice agent runs on one laptop, the ACT policies on a second, and the slider control on a third, while MolmoACT2 runs on an H200 instance in Finland. This is made possible by our prior work, LiveKit Portal.
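One way to read "arbitration" in a setup like this is that only one controller may command the arm at any moment. The following is a generic sketch of lease-based arbitration under that assumption; the `Arbiter` class, its lease timeout, and the controller names are hypothetical and do not describe how LiveKit Portal actually works.

```python
import time
from typing import Callable, Optional


class Arbiter:
    """Grants exclusive, expiring control of the arm to one controller at a time."""

    def __init__(self, lease_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lease_s = lease_s
        self.clock = clock
        self.owner: Optional[str] = None
        self.expires_at = 0.0

    def acquire(self, who: str) -> bool:
        """Grant or renew control if the arm is free, the lease expired, or it is already ours."""
        now = self.clock()
        if self.owner in (None, who) or now >= self.expires_at:
            self.owner = who
            self.expires_at = now + self.lease_s
            return True
        return False

    def release(self, who: str) -> None:
        if self.owner == who:
            self.owner = None

    def forward(self, who: str, command: list[float],
                send: Callable[[list[float]], None]) -> bool:
        """Pass the command to the arm only if `who` holds a valid lease."""
        if self.owner == who and self.clock() < self.expires_at:
            send(command)
            return True
        return False


# Demo with a fake clock (controller names are hypothetical).
t = [0.0]
arb = Arbiter(lease_s=1.0, clock=lambda: t[0])
print(arb.acquire("act_laptop"))    # True: the arm was free
print(arb.acquire("molmo_server"))  # False: arm is leased to act_laptop
t[0] = 1.5                          # act_laptop stopped renewing its lease
print(arb.acquire("molmo_server"))  # True: the old lease expired
```

Expiring leases mean that a controller which crashes or loses its network connection gives up the arm automatically, which matters when one of the controllers sits on a remote GPU instance.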
The datasets, models, and source code for this project can be found here: https://github.com/livekit-examples/embodied-ai-hackathon. The prior work used in this project is as follows:
Through this project, we show that robotics is a systems problem, and we explore how humans can interact with robots, how robots should be deployed in the wild, and what the future might look like.
Finally, a big thank you to SPC for hosting this awesome hackathon. We're really grateful for the help and hospitality that the partners and members there showed us.