
Interview with the CEO of Qunhe Technology, one of Hangzhou's AI "Six Little Dragons": Physical AI is just the beginning of a new era; in entrepreneurship, first build the hammer, then change the world

Physical AI is just the beginning of a new era. In the future, homes, offices, and factories will be filled with embodied robots and humanoid robots. Entrepreneurship means taking a hammer and looking for nails, but first you have to build the hammer. Only China could accumulate these digital assets; the United States does not have an industrial system this large, and there may be no other country in the world operating at such a scale. The scale is so large that production cannot rely on manual labor and must depend on automated production in unmanned factories.
In a corner of Hangzhou Future Technology City, a technology company that started out in an attic is using "digital twin + physical AI" as its brush to sketch a new landscape for the global robotics industry. It is Qunhe Technology, hailed as one of Hangzhou's "Six Little Dragons" and a hidden champion in the field of spatial intelligence.
The company's co-founder and chairman, Huang Xiaohuang, recently sat down for an in-depth interview on CCTV's "Face to Face" program, tracing his entrepreneurial journey from "looking for nails with a hammer" to "defining spatial intelligence infrastructure."
Here are the key points from the conversation:
Physical AI is just the beginning of a new era. In the future world, say ten or twenty years from now, all devices will be intelligent, and homes, offices, and factories will be filled with embodied robots or humanoid robots.
The tech race is not a zero-sum game. When everyone collaborates, the cake may suddenly grow 100 or 1,000 times larger.
When I was working at NVIDIA, the whole company's methodology was to first create impressive technology and then spare no expense finding applications for it. I was influenced by that methodology, which is essentially looking for nails with a hammer. You first need to create the hammer.
We need to determine whether this sector has many players, thousands of them, or just one or two. If we are only serving one or two big clients, it becomes project-based work. We still dream of being a product-oriented company, not a project-oriented one.
I cut physically accurate rendering from half an hour to an hour per image down to 10 seconds.
Pushing this forward matters more than how much money we make. We believe that once we create value, we will definitely make money in the future. But if we don't push this forward, no one else in the world may have the resources to do it, and it will stay stuck forever.
(Why were you able to find and accumulate this data when American companies could not?) We got there through our Industry 4.0 work. All the digital assets we have accumulated in the digital world correspond one-to-one with the physical world and can actually be produced and manufactured. That is a very important milestone.
The United States does not have an industrial system this large; I believe there may be no other country in the world operating at such a scale. The scale is so large that production cannot rely on manual labor and must depend on automated production in unmanned factories.
Here is the full conversation:
Huang Xiaohuang:
You can see that this simulates an earthquake; you need to train your robot not to fall during the earthquake and to continue working.
Dong Qian:
Why train it in such extreme scenarios?
Huang Xiaohuang:
Because this scenario is difficult to recreate in the physical world, yet you need to make sure the robot can keep working, or at least not malfunction, during an earthquake. The only place to do that training is this digital world.
Voiceover:
In Hangzhou, Zhejiang, this seemingly ordinary office space hides another world: a digital training ground for robots. Over 14 years of entrepreneurship, Qunhe Technology has built up significant advantages in spatial intelligence here, earning a place among Hangzhou's "Six Little Dragons" and a significant position internationally.
Huang Xiaohuang:
We believe this is just the beginning of a new era.
Dong Qian:
What new era?
Huang Xiaohuang:
It's the era of physical AI.
The Era of Physical AI Has Just Begun
Voiceover:
Physical AI can be understood as artificial intelligence that understands the rules of physics. Only by understanding the rules of physics can autonomous machines, such as robots and self-driving cars, perceive, understand, and execute complex operations in the real physical world.
Huang Xiaohuang:
To put it simply: suppose in the future you buy a robot to do work for you, and it first has to fall down dozens of times in your home before it can start working. Would you be scared? You definitely would. That's why it needs to make all its mistakes in a digital world before it comes to your home to work for real.
Dong Qian:
What can you do during this process?
Huang Xiaohuang:
First, we need to help train the robot. We are positioning ourselves early for this future world so that humans and machines see the same thing. In the image you see now, this is what humans see, and below is what the robot sees; the darker areas are closer to you, and the lighter areas are farther away.
Voiceover:
When Huang Xiaohuang, co-founder and chairman of Qunhe Technology, explains physical AI, spatial intelligence, and how robots are trained, he constantly has to simplify and reach for examples. Before Hangzhou's "Six Little Dragons" rose to fame, he rarely appeared in the media; at heart, he is a technology enthusiast.
Huang Xiaohuang:
To put it simply, in the future world, say ten or twenty years from now, all the devices around you will be intelligent. Your home, your office, and your factory will all be filled with embodied robots or humanoid robots, and even things like your cameras will one day be intelligent.
Dong Qian:
What does it mean for cameras to be intelligent?
Huang Xiaohuang:
For example, today a colleague still has to sit behind the camera, but before long it will have its own "brain," working by itself, taking pictures, and holding conversations.
Dong Qian:
Are you trying to break up my team?
Huang Xiaohuang:
No, each of them might manage ten cameras, and they will take on more advanced tasks.
Imagine that all the devices around you, which are lifeless today, come alive and serve you. Each robot needs training; it needs to perceive its environment. They all need to rehearse their actions in a shared digital version of the physical world before they can collaborate. Otherwise, ten robots around you all doing things "randomly" won't work either.
Dong Qian:
How is this related to the training you just described?
Huang Xiaohuang:
The SpatialLM we released is used to help devices understand your space.
Voiceover:
Compared with robot training, Qunhe Technology's entrepreneurial story is much easier to understand.
Dong Qian:
Why are these three words your company culture: simple, focused, open?
Huang Xiaohuang:
We distilled them back then, mainly to emphasize simplicity: communication at work should be simple and free of bureaucracy. Then I think I brought up focus, and Chen Hang probably came up with openness. Everyone later agreed on them, and they became the company's DNA, which has stuck ever since.
Voiceover:
In 2007, Huang Xiaohuang graduated from Chu Kochen Honors College at Zhejiang University and went to the University of Illinois at Urbana-Champaign to pursue a Ph.D. His research focused on high-performance computing on GPUs (graphics processing units). Before completing his studies, he joined NVIDIA, where his main work was developing parallel-computing programming frameworks and the surrounding ecosystem for GPU chips. However, just a year later, he decided to leave NVIDIA.
Dong Qian:
Were you interested in that work?
Huang Xiaohuang:
Yes, very interested.
Dong Qian:
Since you are very interested, why did you leave after just a year?
Huang Xiaohuang:
Because at that time, NVIDIA's entire system was still focused on desktop computers.
If you go back ten years or so, all the media were saying the future belonged to mobile and then to the cloud. Who could say desktops and even laptops wouldn't disappear? At the time, I asked my boss in the group: why not do cloud computing, that is, move the GPU to the cloud? Why keep doing research for desktops? Then...
Dong Qian:
Why do you think NVIDIA didn't think of such an idea?
Huang Xiaohuang:
I was puzzled too at first. The stock price was quite low back then, and I wondered whether the company had fallen a bit behind the times.
Dong Qian:
You felt that NVIDIA was limiting your ideas at that time. When you decided to leave, how much preparation did you make?
Huang Xiaohuang:
I was actually quite young and impulsive at that time, and I didn't make much preparation.
Dong Qian:
Did you have any...
Huang Xiaohuang:
I had nothing at the time; I figured that's just how young people are. When I discussed my ideas with my manager, he said the company had no plans to go in that direction for now and wasn't doing cloud computing or mobile. So did it still have a future? I might as well do it myself.
First Build a Hammer, Then Change the World
Voiceover:
In 2011, mainstream opinion still regarded NVIDIA as a consumer electronics hardware company. Although Geoffrey Hinton was already using NVIDIA GPUs to train deep neural networks, most people had not yet realized that the parallel computing power of GPUs would become the foundation of computing for the coming explosion of artificial intelligence.
At this point, Huang Xiaohuang saw the potential of combining the enormous computing power of GPUs with cloud deployment.
He invited Chen Hang from Zhejiang University and Zhu Hao from Tsinghua University to start a business together. Their venture would use GPUs for fast rendering of graphics and images in the cloud. Rendering is the process of converting three-dimensional models or scenes into two-dimensional images or videos through algorithms.
Huang Xiaohuang:
At that time, I also sold my NVIDIA stock, and my partners and I pooled several hundred thousand RMB. Was that enough? At that age, I didn't have a very good sense of money.
Dong Qian:
How old were you?
Huang Xiaohuang:
I was about 25. Back then, I thought several hundred thousand was a lot, because over a decade ago you could buy a house in China for that amount.
Voiceover:
In a very short time, the young founding team built a high-performance GPU cluster out of low-cost graphics cards, with device-cloud coordination, significantly reducing computing costs while speeding up computation. However, the hot concept among investors at the time was still mobile internet, and when Huang Xiaohuang sought funding in Silicon Valley, he was turned down every time. At their lowest point, Zhejiang Province happened to be running an investment promotion drive in Silicon Valley, and Huang Xiaohuang and his partners decided to return to China to start their business.
Huang Xiaohuang:
After returning to China, we started our business in an attic.
Dong Qian:
Why an attic after you came back?
Huang Xiaohuang:
Because back then, starting a business in a garage or an attic felt very cool. Later, I realized that this setting didn't fit China's startup ecosystem at all. It was just too strange to bring job candidates into a bedroom for interviews.
Voiceover:
In 2012, Hinton and his students crushed traditional algorithms in an image recognition competition using deep convolutional neural networks, opening a new chapter in the AI revolution, and GPUs became famous overnight. Through its cooperation with Amazon, NVIDIA began to enter the cloud services battlefield. By then, Qunhe Technology's young team was well down the road of looking for nails with a hammer, and their hammer was a GPU-based, physically accurate rendering engine. Physically accurate means that the rendered image matches the real physical world across its various parameters.
Huang Xiaohuang:
I cut physically accurate rendering from half an hour to an hour per image down to 10 seconds.
Dong Qian:
Why do you insist on physical accuracy?
Huang Xiaohuang:
Because physical accuracy has stronger universality and stability.
Dong Qian:
As someone immersed in technology, did you ever think about the connection between this technology and the real world?
Huang Xiaohuang:
At that time, I actually didn't think that much about it. When I was working at NVIDIA, the whole company's methodology was to first create very powerful technology and then spare no expense finding applications for it. That was the methodology I absorbed; to put it bluntly, you take a hammer and look for nails. You first need to create the hammer.
Voiceover:
This hammer could be used to render film visual effects, but the payback period was too long. It could also be used in gaming, but mobile games back then did not demand high graphics quality, so their technology ultimately landed in the home decoration industry.
Huang Xiaohuang:
We need to determine whether there are many companies in this sector, thousands of them, or just one or two.
Dong Qian:
What’s the difference?
Huang Xiaohuang:
If we are only serving one or two big clients, it becomes project-based work. We still hope to realize our dream of being a product-based company, not a project-based one.
Dong Qian:
So why did home decoration companies need you? And how did you build your influence?
Huang Xiaohuang:
That was precisely when China's real estate market was booming. When they were trying to sign contracts, a rendering used to take a week, and by then the client might already have been snatched away by a competitor. So for them, speed was time, and time was competitiveness.
Dong Qian:
So your emergence just happened to meet their needs.
Huang Xiaohuang:
I remember the first offline event we held: we showcased our demo, and the whole venue erupted. We ran so many card payments that we burned out two POS machines.
Dong Qian:
Swiping cards sounds wonderful; it means money coming in every day. Were you more focused on the money coming in, or on the market taking shape?
Huang Xiaohuang:
The market taking shape. Someone was genuinely willing to pay for it, which showed it was real demand, not just talk.
Voiceover:
However, as the user base expanded, the technical challenges for Huang Xiaohuang and his team also increased exponentially.
Huang Xiaohuang:
The most embarrassing moment I remember was when I went to give a client an on-site demonstration, and halfway through, the server crashed and the entire cluster went down. All I could do was say I needed the restroom, hide in there messaging nonstop to ask whether it was fixed, and only come out once it was repaired.
Dong Qian:
What problem lies behind this embarrassing moment?
Huang Xiaohuang:
Behind it, it really comes down to accumulated technical know-how. The most typical issue we ran into was that temperatures would often exceed 100 degrees Celsius, and the graphics cards would start malfunctioning. We had to come up with all kinds of ways to cool them down. As explorers in this field, we did hit many unexpected problems, and they were significant challenges for us at the time.
Voiceover:
In 2013, Qunhe Technology launched its flagship product Kujiale, a space design software that became an instant hit thanks to its 10-second rendering, attracting large numbers of designers and becoming the go-to design software in the home furnishing industry.
Dong Qian:
In this process, by insisting on physical accuracy, were you also able to collect data from thousands of households?
Huang Xiaohuang:
I think we should call it accumulation rather than collection. Accumulation means that when people use our product for design, their work keeps accumulating here.
Dong Qian:
What kind of data is being accumulated? Can you give an example?
Huang Xiaohuang:
It includes every single component inside, even where a nail should be, or where a hinge should be; all of this data exists.
Voiceover:
As the industrial chain and data scale behind the home decoration industry expanded, Huang Xiaohuang and his team naturally extended their technological advantages to Industry 4.0. Physically accurate data lets design drawings feed directly into factory production, and this step brought even more data accumulation.
Huang Xiaohuang:
I vaguely felt that there must be some value we hadn't discovered in this, so we formed a research team to study it and publish papers.
Dong Qian:
While you are accumulating data, you are also conducting your own research on this data.
Huang Xiaohuang:
Researching how to use it. Before 2018, our internal discussions kept coming back to this: we clearly had a gold mine, right? But we just couldn't get the gold out. That was the feeling.
Embrace Openness to Outpace the Times
Voiceover:
In 2018, drawing on the massive indoor space data accumulated through its own business, Qunhe Technology collaborated with several domestic and foreign universities to launch the InteriorNet dataset.
There were already many well-known datasets internationally, but most consisted of static or non-interactive data. InteriorNet is one of the few datasets composed of interactive 3D data and is the world's largest deep learning dataset for indoor scene understanding. Most importantly, it is free and open source.
Huang Xiaohuang:
Before 2018, we had no idea how to train on this data. So we decided to open it up.
Dong Qian:
What did you gain from this?
Huang Xiaohuang:
At that time, our cash flow was positive and we were quite profitable, so we weren't thinking much about making money from it. Second, we had been exploring internally for two or three years and really had no clue. Rather than let it rot in our hands, it was better to let everyone try it together. Maybe at some point someone would have a brilliant idea and discover something.
Dong Qian:
Why did you think that way, that you'd rather let others use it than let it rot in your hands?
Huang Xiaohuang:
To be honest, I care more about pushing this forward than about how much money I can make. We believe that once we create value, we will definitely make money in the future. But if we don't push this forward, no one else in the world may have the resources to do it, and it will stay stuck forever.
Dong Qian:
If I can't do it myself, I'll gather the best minds from around the world. But what do you gain?
Huang Xiaohuang:
What I gain is this: I believe these things should be part of the shared wealth of all humanity. Everyone should work together to make a breakthrough, and who makes money later is down to luck; otherwise, there won't even be an opportunity. If you keep it to yourself and can't crack it, and others don't have the means to research it, then this sector will simply die, right?
Dong Qian:
Is it possible that, from what you've opened up, others might discover a mine, perhaps an even bigger one, and make you less valuable?
Huang Xiaohuang:
It is possible, and that risk does exist. But at least you still hold the mine, and you know others can get something out of it. At worst, you can cooperate, right? That's why openness has always been one of our company's values. We believe that, especially in tech, it's not a zero-sum game. By cooperating, the cake might suddenly grow 100 or 1,000 times larger. If you don't cooperate, everyone might end up with nothing.
Voiceover:
Shortly after the dataset was opened, Qunhe Technology received an email from a tech giant in Silicon Valley, hoping to collaborate with them.
Huang Xiaohuang:
At first, I thought it was a scam.
Dong Qian:
How could you think that?
Huang Xiaohuang:
Because I never imagined such a big player would seek collaboration with us.
Dong Qian:
If you compare them to an elephant, what would you be?
Huang Xiaohuang:
Maybe not even an ant.
Dong Qian:
So the effect of open-sourcing showed up right away. Once you contribute your own capabilities, others start building on them, and you can climb up step by step.
Huang Xiaohuang:
Yes, definitely, because the research labs of players that big are much stronger than ours. After we collaborated, they published papers, and from those papers we realized it could be done this way. By our own assessment at the time, we could never have achieved this on our own.
Voiceover:
At the time, the tech giant was struggling with a shortage of large volumes of physically accurate synthetic data for robot training. This collaboration marked the first time Qunhe Technology's dataset was applied to spatial intelligence training.
Huang Xiaohuang:
Gradually, everyone realized that this path was feasible.
Dong Qian:
Hmm, which path is feasible?
Huang Xiaohuang:
The path of spatial training, of understanding space. It is feasible, and it works with synthetic data. This means all your future devices will develop intelligence in the physical world.
I don't know if you used a robotic vacuum cleaner ten years ago; it could only sweep blindly around the house in a fixed zigzag pattern. But back then, I saw academic papers suggesting that robotic vacuum cleaners could understand their environment the way humans do. If you tell it to clean under the table, it should know where the table is.
At this point, understanding the environment becomes very important. When the devices around you can comprehend the world in front of them, they can have many new functions. We first need to turn it into a thriving, massive ecosystem, so that we can share in the "cake."
Dong Qian:
So can you make it thrive and grow that large through your own efforts?
Huang Xiaohuang:
I wouldn't call it our own effort; we are contributing our modest share to the whole industry. It's not just us; the whole industry should be doing this. That includes autonomous driving, which is actually part of this machine intelligence, and the forces at work in autonomous driving may be even greater.
Voiceover:
In the real world, training robots is costly and hard to scale, while training robots on data runs into a bottleneck: high-quality 3D data is scarce.
Synthetic data is therefore a more cost-effective and potentially limitless source of training data. The dataset launched by Qunhe Technology has been adopted by several universities, including Imperial College London, the University of Southern California, and Zhejiang University, becoming a representative piece of infrastructure for indoor AI vision training.
Dong Qian:
I have a question: why were you able to find and accumulate this physically accurate data, while other companies, including those in the United States, have not been able to?
Huang Xiaohuang:
We got there through our Industry 4.0 work. All the digital assets we have accumulated in the digital world correspond one-to-one with the physical world and can actually be produced and manufactured. That is a very important milestone.
The United States does not have an industrial system this large, so what its companies had before was not very useful. I don't think there is another country in the world operating at such a scale. And the scale is so large that production cannot rely on manual labor and must rely on automated factories.
Dong Qian:
So the times make the person. You may have this hammer, but if you miss the era, you might never find a nail like this to hit.
Huang Xiaohuang:
Yes, in a way, it is also quite fortunate. But on the other hand, we have always felt that this technology is valuable.
Voiceover:
In March 2025, Qunhe Technology released and open-sourced its self-developed spatial understanding model, SpatialLM. Combined with its previously released spatial intelligence platform SpatialVerse, it lets robots complete a closed training loop, from cognitive understanding to action and interaction. With embodied intelligence growing explosively, Qunhe Technology now has a chance to become one of the cloud infrastructure giants for spatial intelligence training.
Dong Qian:
By analogy with today's large language models, what would your SpatialLM and SpatialVerse correspond to?
Huang Xiaohuang:
SpatialVerse is like the corpus of a large language model, while SpatialLM is like the large language model itself. It is still relatively primitive for now; I'd put it roughly at the GPT-2.5 to GPT-3.0 stage.
Dong Qian:
But you are unique in doing this, right?
Huang Xiaohuang:
So we will continue to iterate in the future.
Dong Qian:
In a way, you are like the company behind ChatGPT.
Huang Xiaohuang:
Yes, but they are closed, and we are open.
Dong Qian:
What difference will your openness, versus their closedness, make?
Huang Xiaohuang:
I'm focused on where our business will be in ten or twenty years. We first need to lay the infrastructure, and then our real capabilities can be unleashed. I believe that for this generation of Chinese entrepreneurs, embracing open source may create greater value.
Dong Qian:
So this brings us back to your original reason for starting a business. Even now, what drives you?
Huang Xiaohuang:
We have always believed that if you know your technology has value and the sector is thriving, you will definitely get a share of the pie. Moreover, you have to be genuinely interested in what you're doing; then even if you fail, you will still feel happy, excited, and fulfilled along the way. In the end, even if you don't make money, you will feel it was worth it.