Wang Jian: Why Alibaba Is Going "de-IOE"

by anonymous on 2013-08-08 10:41:02

From "commercial software" to "open-source software," and finally to proprietary technology and cloud computing services: what changes are taking place in Alibaba's IT back end?

In the past year, the high-profile Alibaba has garnered plenty of attention.

From the RMB 19.1 billion in transactions on "Singles' Day," to its rapid split into seven companies and 25 business units, then to its aggressive expansion into the financial industry and its large-scale acquisitions in the mobile internet sector... The Ali ecosystem constructed by Jack Ma is evolving from a city-state into an "empire."

Behind all these high-profile moves, as Alibaba grows from a city-state into an "empire," one low-key individual plays a crucial role: Dr. Wang Jian.

In 2008, Wang Jian joined Alibaba as Group Chief Architect, a role now titled Chief Technology Officer (CTO). Jack Ma tasked the former Deputy Director of Microsoft Research Asia with helping to establish a world-class technical team for Alibaba Group and with building the group's technical architecture and foundational technology platform.

After joining Alibaba, Wang Jian, with his technical background and scholarly demeanor, proposed the concept of "de-IOE" (removing IBM minicomputers, Oracle databases, and EMC storage devices from the IT infrastructure) and began integrating the essence of cloud computing into Alibaba's IT DNA.

This work shifted Alibaba's IT strategy from reliance on "commercial software" to embracing "open-source software," and eventually to strong proprietary technology and cloud computing service capabilities. It also laid the groundwork for the establishment of "AliCloud Computing Co., Ltd." in 2009. On July 28, 2011, AliCloud's self-developed "Apsara" cloud computing platform began offering commercial cloud services to the public. This flexible IT infrastructure supports the rapid changes in Alibaba Group's business and points to a promising new profit model, cloud services, beyond the core Taobao and Alipay businesses.

In October 2012, at the AliCloud Developer Conference, Wang Jian said, "AliCloud can achieve break-even within 24 months." Ten months later, in an interview with Business Value, he stood by that promise. As Alibaba's touchpoint in the IT field, AliCloud leaves ample room for imagination about its future.

Meanwhile, the "de-IOE" effort that Wang Jian first proposed has never stopped. In fact, Alibaba's "de-IOE" movement sparked a new trend in how large enterprises build their IT infrastructure and put significant pressure on international giants like IBM and Oracle to transform their business in China.

"Platform, finance, and data" are the three major businesses Jack Ma has promised for the Alibaba Group. The group's IT framework supports the development of all three, and "AliCloud" has become the important IT touchpoint through which Alibaba reaches more small and medium-sized enterprises. To examine Alibaba's IT strategy, Business Value publisher Liu Xiangming spoke with Alibaba's Chief Technology Officer and AliCloud President Wang Jian at the AliCloud headquarters in Hangzhou. ITValue will present this conversation in four serialized articles.

[Part One]

Wang Jian: Why do I oppose some companies' "de-IOE" movements?

In the office of Alibaba's Chief Technology Officer and AliCloud President Wang Jian, there is a wall filled with books. Among them, "Breasts and Money," "Steve Jobs Biography," and "Principles and Paradigms of Distributed Systems" are placed on the third row to the right.

Using these three books to sum up Wang Jian seems quite fitting. With a background in engineering psychology, Wang leads his team with a touch of magical realism, naming AliCloud products after figures from traditional mythology: the core technology engine is called "Apsara," after a water-loving deity; the collaborative scheduling system is named "Nüwa"; the distributed file system is called "Pangu"; and so on.

Since joining Alibaba Group in November 2008, Wang Jian has alternated between his roles as Group CTO and AliCloud President. An entire wall of his office is covered with "Cloud OS" mock-ups, and late-night meetings in the "Zhongkui Dao - Dispute Resolution Room" next to the office have become routine. Wang Jian shows a product manager's sensitivity and persistence.

In Wang Jian's view, cloud computing is a revolution: one that uses services to overturn the old mindset of IT construction from the era of traditional software and hardware. Wang summarizes the relationship between the "de-IOE" movement and AliCloud as follows: "de-IOE" fundamentally changed the foundation of Alibaba Group's IT architecture, laying the groundwork for embracing cloud computing and producing computing services. The essence of "de-IOE" is decentralization, which makes it possible to build on commodity PCs that can be purchased anywhere, and that is the primary condition for cloud computing to take root.

From "de-IOE" to embracing open-source technology, and from supporting the entire group's IT needs to shaping AliCloud's future, all of this gives Wang Jian a somewhat mysterious aura. Members of the ITValue community (China's largest knowledge-sharing community for CIOs) submitted questions through posts and WeChat discussion groups. Wang Jian believes that "exchanges with CIOs will become the source of AliCloud's competitiveness, and Chinese users are pushing cloud computing to new heights."

Q: Why is "IOE" a problem for the development of Internet companies? What level of technical expertise do "de-IOE" and open source require of IT teams? (By Wang Hua, IT Director of Bausch & Lomb China)

A: Different people understand the reasons for "de-IOE" differently. What I fear most is seeing those reasons reduced to one of two extremes: treating it purely as a cost issue for enterprises, or simplistically turning it into a question of whether to use foreign products and technologies. Cost reduction is the first visible benefit of "de-IOE," but the fundamental reason lies elsewhere. In the Internet era, most enterprises, not just Internet companies, find that the technologies provided by IOE can hardly meet their computing needs, which constrains their long-term development. As a technical path, relying on specialized hardware is dangerous, while an architecture built on commodity PCs available everywhere is the safest choice for Alibaba and most enterprises in the long run. As for costs, I would say that today's discussions of open-source technology address only the cost of using the software and ignore the cost of upgrading and maintaining it.

Theoretically, as long as computational power is sufficient, "IOE" can certainly be removed! But in practice, "de-IOE" poses technical challenges and risks. For the majority of enterprises, "de-IOE" is not simply a matter of swapping old software and hardware for new, but of replacing old methods with new ones: thoroughly changing the IT infrastructure by means of cloud computing.

"IOE" is a product of the software era, the era of "buying computers." In the cloud computing era, enterprises buy "computing" instead. The best way to "de-IOE" is to adopt cloud computing, not to replace existing machines with new ones. This is an industry-wide change rather than a strategic choice, and the challenge lies in whether you accept the disappearance of so-called "private clouds" and move from trusting traditional software and hardware vendors to trusting cloud computing as a secure service.

Q: Will the difficulties in IT construction be resolved when enterprises all adopt open-source technology, and there are no longer software suppliers but only service suppliers? (By Bing Zhe, CIO of Ningbo Fangtai Kitchenware Co., Ltd.)

A: I want to answer this based on Alibaba Group's own technological development. Alibaba's technology evolved from relying mainly on commercial software, to open-source software, and then further toward proprietary technology or cloud computing.

Alibaba initially depended on commercial software. On the database side, it went from running an Oracle RAC cluster with over 20 nodes (the largest in Asia at the time), to becoming one of the most capable enterprises in developing and using the open-source database MySQL, to developing its own relational database, OceanBase, which has already been applied in different business scenarios. On the data processing side, it went from Oracle to Hadoop clusters (including one of the industry's largest single clusters by node count), and then to ODPS on its own Apsara platform. This evolutionary path shows that commercial software, open-source software, and proprietary technology always complement each other. For different enterprises, it's just a matter of proportion. For large Internet companies, proprietary technology becomes very important. Many of Alibaba's proprietary technologies, such as the core Apsara platform, are made available to others through cloud computing, so cloud computing is another route suitable for many enterprises. I believe that in the future, for most enterprises, cloud computing platforms + proprietary technology + open-source technology will become increasingly important.

In many contexts, people habitually equate openness directly with open-source, but open-source is not the only means of openness. In the Internet era, cloud computing is a new form of openness. In many business and application scenarios, compared to cloud computing, open-source may not necessarily be the best approach. Open-source software is still software, and it is a product of the software era, requiring significant investment in maintenance and upgrades. Today, the development of cloud computing benefits from the development of open-source software, but the emergence of cloud computing will also impact the application of open-source software. For example, when cloud computing provides relational database services, you need to consider whether you should purchase or use commercial database software or open-source database software. AliCloud's cooperation with institutions like CODE.CSDN and Open China aims to better integrate the open-source community and technology with cloud computing to serve users well.

In the Internet era, software running in data centers is complex and therefore hard to operate. Moving from using software (including open-source software) to operating software in data centers is a fundamental change. Most enterprises lack this service capability, and given the division of labor in society, not every enterprise can be expected to take on such operations. Cloud computing gives enterprises an opportunity to enjoy the best services and computing power without purchasing software. This is why, in the Internet era, the basic pattern will be cloud computing platforms + open-source software + proprietary technology, while traditional commercial software that relies on license fees and service fees will see its market shrink.

Q: What are the technical systems of AliCloud, Tmall, and Taobao? How do they collaborate during the de-IOE process? (By Long Geng, Technical Director of China Southern Airlines)

A: The actual technical applications of Alipay, Tmall, Taobao, and AliCloud indeed differ, and we encounter the same challenges that all companies face, but we are fortunate that Alibaba's entire technical system works in synergy.

The overall efficiency of Alibaba's technical collaboration can be illustrated by three examples. First, during the "de-IOE" process, a consensus on technical direction and a habit of collaboration gradually formed. When Taobao started this work, it would have been impossible without close collaboration among the technical, product, and business teams. Second, in 2011, all the back-end operations and maintenance departments of the Alibaba Group were centralized under the CTO, forming a unified technical support department. People from different subsidiaries and divisions had to align on everything from tools to concepts, and the results showed that this approach adapted well to the rapid development and change of the group's business and could handle the challenges of technical services in a large Internet enterprise. The technical and organizational challenges we faced, and the price we paid to learn from them, can serve as lessons for other enterprises. Third, business-driven technical collaboration, such as Taobao's "Jushita" and Alipay's "Jubao Pen," naturally led to customers' IT systems running on AliCloud's "Apsara" platform.

Q: How does AliCloud eliminate users' concerns about cloud security? How is user access speed ensured? Are there any simple exit mechanisms and migration tools? (By Zhu Mingsheng, Vice President of Zhoushan Yi Hotels and Resorts Group)

A: Security concerns are, in essence, trust issues, and the only way to solve trust issues is transparency. AliCloud welcomes all CIOs to challenge us on cloud security; transparency really is the only way. At a meeting of provincial telecommunications administration bureau directors convened by the Ministry of Industry and Information Technology, I also said that we have a strong demand for government regulation! We could work out a method, starting from your CIO club, to establish a very transparent mechanism, and AliCloud is willing to be regulated.

User access speed involves two aspects: one is the speed of the Internet itself, and the other is the response speed of the service. Today, our cloud computing services solve the interconnection problems between different operators, and the speed of the Internet itself is sufficient to meet the needs. However, the architecture of application services can also affect response time, which requires technical improvements. Many people move their services and applications to the cloud by directly transferring the original architecture, which is something we need to work together with enterprises to solve.

Regarding the exit mechanism, many of our current customers migrated from Amazon, and likewise some customers have moved elsewhere because they were dissatisfied with our services. Cloud computing is sticky, but today's environment is one of open competition, and no one is able to fence customers in. We also provide corresponding migration tools, but any migration comes at a cost. I believe this can be discussed alongside trust issues, and it requires norms shared by everyone.

On "Singles' Day" in 2012, Alibaba recorded a remarkable single-day transaction volume of RMB 19.1 billion. The AliCloud computing platform processed orders for 20% of Taobao and Tmall merchants, achieving zero failures and zero order omissions despite a sudden surge in traffic. On the cloud platform, some Tmall merchants even reached nearly 600,000 orders per day, with transaction volumes matching the total one-day order volume of China's B2C market outside of Taobao at the time.

Prior to the "Singles' Day" event, some merchants did not fully trust the AliCloud platform and continued to deploy on their own IT infrastructure. When the transaction volume surged during the event, and server capacity became insufficient, Alibaba quickly assisted these merchants in migrating to the AliCloud platform within minutes, ensuring smooth transactions. Some companies completed this migration around midnight.

This article is the second part of the dialogue between Liu Xiangming, publisher of Business Value, and Wang Jian, Chief Technology Officer of Alibaba and President of AliCloud, at the AliCloud headquarters in Hangzhou:

Alibaba's Secrets Behind Singles' Day

Q: What key issues must be resolved for Taobao to successfully handle the high traffic and transactions on "Singles' Day"?

A: Taobao's Singles' Day must resolve three key issues: payment issues, Taobao's own issues, and issues related to Taobao's clients.

The difficulty with payment lies with the banks. Taobao and Tmall's total transaction amount was RMB 19.1 billion, with 102.8 million transactions. For Taobao, knowing the exact number of settlements is very important because, under conditions where banks cannot bear the load, as many transactions as possible need to be completed through Alipay. The technical capability of Alipay determines the system's load-bearing capacity.

In the past, when transaction volumes were too high, bank support would encounter problems, and Alipay would queue up transactions, delaying them. However, the transaction volume on Singles' Day was too large, and prolonged delays would cause significant user experience issues. Therefore, Alipay launched an early activity encouraging users to pre-deposit money into Alipay, reaching a scale of tens of billions at the time. This alleviated the pressure on banks, shifting the burden directly onto Alipay's system, allowing the transactions on Singles' Day to proceed smoothly. "Singles' Day" demonstrated that Alipay's payment system is world-class.
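
The balance-first idea described above can be sketched in a few lines of Python. This is a toy illustration, not Alipay's implementation: the PaymentRouter class, its field names, and the per-window bank_capacity figure are all assumptions made for the example. It only shows why pre-deposited balances take pressure off the bank channel: a payment covered by a prepaid balance never reaches the bank, and only the remainder competes for limited bank capacity or waits in a queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PaymentRouter:
    """Toy model (hypothetical): prepaid balance first, bank channel second."""

    bank_capacity: int  # assumed number of bank payments accepted per window
    balances: dict = field(default_factory=dict)   # user -> prepaid balance (RMB)
    bank_used: int = 0                             # bank payments used this window
    delayed: deque = field(default_factory=deque)  # payments waiting for the bank

    def pay(self, user: str, amount: int) -> str:
        # Path 1: a sufficient prepaid balance settles inside the payment
        # system itself, so the bank is never contacted.
        if self.balances.get(user, 0) >= amount:
            self.balances[user] -= amount
            return "paid_from_balance"
        # Path 2: use the bank channel while it still has capacity.
        if self.bank_used < self.bank_capacity:
            self.bank_used += 1
            return "paid_via_bank"
        # Otherwise the payment is queued and completed later (delayed).
        self.delayed.append((user, amount))
        return "queued"
```

In this sketch, the more users hold a prepaid balance, the fewer payments fall through to the second and third branches, which is the effect the pre-deposit campaign was aiming for.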

Secondly, Taobao's own challenge lies in keeping the system stable and completing an astronomical number of transactions in the face of sudden concurrent traffic and unexpected situations. Movie theaters build emergency exits, yet in a real emergency audiences may ignore the exit signs and get hurt; in the same way, Taobao cannot predict how users will behave in an emergency. Places that normally have no traffic might suddenly see a frightening spike, and any local issue could evolve into a global one. This is the challenging part. On Singles' Day, nearly a thousand technical and business staff from the Alibaba Group sat together on one floor, handling problems through a command system, with nearly 500 contingency plans prepared in advance. Improvising a fix on the spot would certainly be too late, and a single wrong keystroke could crash the system. "Singles' Day" demonstrated that Taobao and Tmall's transaction systems are world-class.

Thirdly, resolving client issues mainly involves using Alibaba's own technical strength to solve customer problems, such as the "Jushita" project, deploying the entire transaction process of Tmall and Taobao sellers on the AliCloud computing platform to ensure the stability of their transaction systems and thus guarantee the smooth completion of their transactions.

In the past, if a seller's IT infrastructure was weak, data exchange between systems might fail because of network issues, causing transactions to fail. During "Singles' Day," there was a saying: "20% of transactions were completed on the cloud," meaning that 20% of the transactions involved sellers whose back-end systems were deployed on AliCloud. Previously, a buyer would click to purchase and then pay, triggering actions along two IT paths: one connecting to Alipay to ensure funds were available for payment, and the other entering the seller's ERP system to confirm inventory availability and reduce stock accordingly. Moving the seller's entire ERP system onto the cloud is harder, and more significant, than merely scaling website traffic, because everything from invoice printing to shipping must pass through this system. Here different ISVs (independent software vendors) play a key role, which illustrates the value of an ecosystem.

Sellers moving their ERPs to the cloud proves their trust in cloud computing. Our goal this year is for 70%-80% of transactions to be completed on the cloud. This isn't just about saving costs for customers but also about helping sellers earn more. This reflects the essence: cloud computing is a transformation that creates business value for customers far beyond the cost itself.
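
The two-path purchase flow described in this answer (a funds check on the payment side and a stock check in the seller's ERP) can be sketched as follows. The Wallet and SellerERP classes and the place_order function are hypothetical names invented for this illustration. A real deployment would make these calls across a network and would need retries and rollback, which the sketch leaves out; it only shows that an order is confirmed when both paths succeed, and that nothing is changed when either one fails.

```python
from dataclasses import dataclass


@dataclass
class Wallet:
    """Hypothetical stand-in for the buyer's payment account."""
    balance: int  # available funds in RMB


@dataclass
class SellerERP:
    """Hypothetical stand-in for the seller's ERP inventory."""
    stock: dict  # sku -> units on hand


def place_order(wallet: Wallet, erp: SellerERP, sku: str, qty: int, unit_price: int) -> str:
    total = qty * unit_price
    # Path 1: confirm that funds are available for payment.
    if wallet.balance < total:
        return "rejected: insufficient funds"
    # Path 2: confirm that the seller's ERP has enough stock.
    if erp.stock.get(sku, 0) < qty:
        return "rejected: out of stock"
    # Both checks passed: take payment and reduce stock together.
    wallet.balance -= total
    erp.stock[sku] -= qty
    return "confirmed"
```

If the ERP side of this flow runs on weak seller infrastructure, the second path is where a transaction is most likely to fail, which is why hosting it on the same platform as the rest of the flow matters.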