OpenAI Unveils Jalapeño Chip, Developed in Nine Months, to Cut Inference Costs by About 50 Percent

2026-06-25 21:39

Woofun AI reports that OpenAI has officially unveiled its first custom artificial intelligence chip, Jalapeño, developed in partnership with Broadcom. The launch marks a decisive step away from reliance on third-party hardware, and the company aims to deploy the processor by the end of this year. Broadcom CEO Hock Tan told Reuters in an interview that Jalapeño's performance rivals that of NVIDIA's Blackwell architecture and the TPU from Alphabet's Google. The chip has already been tested in the lab running the GPT-5.3-Codex-Spark model, where it met its targets for both power consumption and computational performance. The project is the first phase of a multi-generation chip roadmap intended to secure OpenAI's long-term infrastructure needs.

The project divides the work precisely among the partners. OpenAI was responsible for the chip's architectural design, tailoring the cores, serving systems, and product requirements directly to its own models. Broadcom handled the physical implementation of the silicon, along with the networking and connectivity technologies required for high-speed data transmission. Canadian electronics manufacturer Celestica contributed expertise in circuit board fabrication, rack assembly, and system integration, turning the raw silicon into a complete server system ready for mass production. OpenAI is reported to be entrusting TSMC with manufacturing the chips, drawing on the foundry's advanced capabilities to scale production.

The speed of Jalapeño's development has sparked considerable discussion in the industry. Some observers found the pace of change astonishing: 'A few years ago, I never would have imagined that AI companies would start designing their own chips.' This sentiment reflects a broader anxiety about the future role of established hardware giants such as NVIDIA and AMD as more AI labs develop custom hardware. The chip's name also drew attention, with one commentator observing, 'The fact that OpenAI named this chip Jalapeño, a type of chili pepper, speaks volumes about the intensity of the competition.' While some netizens made humorous memes about the name, others questioned the branding, with one remarking, 'When it comes to naming, OpenAI might be the worst in history.' Richard Ho, head of OpenAI's hardware team, described the thinking behind the chip, explaining that the Jalapeño processor is engineered to work quickly and efficiently with the large models that power many AI applications. 'We believe it will perform well in all future iterations of these large models,' he said.

Architecturally, Jalapeño is designed specifically for large-model inference rather than training. It is intended for deployment in products such as ChatGPT, Codex, API services, and future agentic products. Its primary goals are high throughput, low latency, and high energy efficiency in large-scale interactive AI applications. One netizen highlighted the economic imperative behind the move: 'Some software-based profit margins simply cannot be maintained at exajoule-level inference scales. To further reduce the cost per token, developing custom ASICs is an essential infrastructure transformation.' Richard Ho explained that Jalapeño's architectural optimizations came from close collaboration between the hardware team and OpenAI's research teams, drawing on a deep understanding of the critical components of cutting-edge AI models, including cores, memory management, networking, and serving mechanisms. Although OpenAI is still evaluating final performance metrics, early tests indicate that Jalapeño can operate close to the theoretical limits of its hardware on important workloads. The architecture reduces data movement and balances compute, memory, and networking resources, yielding higher real-world utilization than traditional designs. The approach prioritizes real-world efficiency in large-model inference over simply adding more raw computing power.
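To see why utilization matters as much as chip price, cost per token can be modeled as an accelerator's hourly cost divided by the tokens it actually produces per hour. The minimal sketch below assumes that simple model; the function name and every number in it (a $10-per-hour accelerator, 10,000 tokens per second at peak, 40 or 80 percent utilization) are hypothetical and are not Jalapeño, NVIDIA, or OpenAI figures.

```python
def cost_per_million_tokens(hourly_cost_usd: float,
                            peak_tokens_per_sec: float,
                            utilization: float) -> float:
    """Return the dollar cost of generating one million tokens on one accelerator.

    utilization is the fraction of peak throughput actually achieved, in (0, 1].
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    # Tokens actually produced in one hour at the given utilization.
    tokens_per_hour = peak_tokens_per_sec * utilization * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    baseline = cost_per_million_tokens(10.0, 10_000, 0.4)
    higher_utilization = cost_per_million_tokens(10.0, 10_000, 0.8)
    cheaper_hardware = cost_per_million_tokens(5.0, 10_000, 0.4)
    print(f"baseline:           ${baseline:.4f}")            # $0.6944
    print(f"higher utilization: ${higher_utilization:.4f}")  # $0.3472
    print(f"cheaper hardware:   ${cheaper_hardware:.4f}")    # $0.3472
```

Under these assumptions, doubling utilization from 40 to 80 percent halves the cost per token exactly as halving the hourly hardware cost does, which is why designs that raise real-world utilization can matter as much as cheaper silicon.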

According to Woofun AI, the chip's heat dissipation performance exceeded initial expectations, and OpenAI has classified it as an 'Intelligence Processor' rather than merely an 'AI accelerator.' Jalapeño went from the start of design to readiness for mass production in only nine months, which OpenAI believes is one of the fastest ASIC development cycles in high-performance advanced semiconductors. OpenAI attributed the speed not only to close collaboration between its engineering team and Broadcom but also to Broadcom's extensive experience in the sector.

OpenAI also used its own models in specific design and optimization steps, a strategy the company says is helping improve the infrastructure needed for its future operations. This suggests that AI companies are no longer merely chip users but are becoming an integral part of the chip design process. OpenAI argues that if AI can help engineers develop better chips more quickly, it could lower the industry's overall computing costs and make advanced AI more widely accessible. Hock Tan had previously said that the Jalapeño accelerator could cut costs by approximately 50 percent compared with typical AI graphics processing units (GPUs).

Jalapeño is not a one-off project but the first step in a joint effort by OpenAI and Broadcom to build multiple generations of computing platforms. Broadcom expects the first chips to be commercially available at Microsoft and other partners by the end of this year, although OpenAI notes that mass production will not begin until next year. OpenAI's long-term goal is to reach 10 gigawatts of computing capacity on custom chips by 2029. 'This gives OpenAI full control over the entire ecosystem,' said Ho. OpenAI sees Jalapeño as a continued expansion of its full-stack platform, extending from products and models down to the underlying hardware. Some netizens observed that 'Some of the key focus areas in the next round of AI competition may be infrastructure rather than just intelligent algorithms themselves.' Others compared the Jalapeño project with SpaceX's deal with Cursor, noting that although the two look very different, they reflect the same underlying structural shift: Jalapeño represents control over the infrastructure that supports intelligence (chips, computing power, and networking), while Cursor represents control over the 'workflow layer' where intelligence is actually applied. As one observer concluded, 'As the capabilities of cutting-edge models continue to improve, the competitive advantage is gradually shifting away from the models themselves. In the next decade, the companies that win in the AI race may not be those with the smartest models but those that can master the strongest "technology stacks" surrounding those models.'

Greg Brockman, President and Co-founder of OpenAI, said, 'The world is entering an economy driven by computing.' Jalapeño is part of OpenAI's long-term strategy to build full-stack infrastructure and make computing power more readily available. The aim is to make AI faster, more reliable, and more affordable for individuals and businesses so that it can tackle more important problems. In OpenAI's view, the advantage of a full-stack approach is that different components can be optimized together toward a common goal: faster, more reliable, and more cost-effective models. Better infrastructure improves computing efficiency, which supports better training and inference and in turn leads to more powerful models and better products. As more people use these technologies, OpenAI can reinvest its revenue in the next generation of infrastructure, creating a virtuous cycle linking computing power, models, products, and commercialization.

With its first chip, OpenAI has avoided direct competition with companies such as NVIDIA and Google by focusing on a specific segment of the market. Training and inference infrastructure are beginning to diverge, although many inference workloads still run on infrastructure similar to that used for training.

However, as inference becomes more widespread, demand for inference services will grow significantly and gradually become the main driver of computing requirements. Compared with training, inference is more sensitive to cost, energy efficiency, and response time, and its workloads are easier to target with hardware optimized for specific use cases. Inference infrastructure is therefore likely to move toward more specialized hardware, and OpenAI is clearly focusing on this area: while training continues to rely on external chips such as NVIDIA's, OpenAI has chosen to develop its own inference chips, initially for internal use.

NVIDIA, by contrast, does not use separate chips for training and inference; it relies on a general-purpose GPU architecture that handles both. Its Hopper and Blackwell GPUs, for example, are used for training as well as inference.

However, NVIDIA does market and package certain products specifically for inference. The Blackwell platform, for instance, is clearly positioned as a large-model inference platform: NVIDIA claims that the GB300 NVL72 rack-scale system can significantly reduce the cost per token in agentic inference scenarios, and it emphasizes 'AI inference at scale.' Google's TPU, similarly, is an ASIC designed specifically for matrix multiplication, tensor operations, and Transformer-based deep learning. Its goal is to make tensor computation more efficient and to integrate tightly with Google's software stack, data centers, and model frameworks, giving it advantages in cost, power consumption, and interconnect over general-purpose GPUs. Google also offers inference-oriented products, such as TPU v5e, which handles both training and inference on one chip, and the v6e-8 configuration, which is optimized for inference and serves multiple inference workloads with just eight chips.

The shift toward custom hardware has raised concerns among market participants. 'Once inference becomes your biggest cost factor, you no longer just rent chips; you start building your own. Everyone who is still renting out computing power should probably be a bit concerned right now,' said a netizen.

Additionally, whether OpenAI eventually makes its chips available to outside customers could affect companies such as Groq, which claims its fast, low-cost inference services never fail, even at critical moments. Reuters first reported in 2023 that OpenAI was exploring developing its own chips. OpenAI considered producing its chips entirely in-house and even sought funding for a massive project to build a network of chip fabrication plants.

However, due to the high costs and time required, the company has put this ambitious plan on hold and focused on its own chip design efforts instead.

Behind this decision is a shortage of computing resources: AI labs, including OpenAI, are struggling to secure enough capacity to run the latest and most powerful AI applications. As a result, some leading companies have turned to developing their own chips to reduce costs and provide alternatives to NVIDIA's GPUs, which are currently widely used for AI. Meta Platforms, Amazon, and Google have also worked with chip designers such as Broadcom and Marvell, which offer specialized design services and intellectual property that are often difficult to replicate in-house. In April this year, Reuters reported that Anthropic was also considering developing its own AI chips. One of the most direct impacts of generative AI on the semiconductor industry is a surge in demand for CPUs, GPUs, and AI accelerators.

McKinsey estimates that by 2030, non-generative AI applications will require approximately 15 million logic wafers, of which about 7 million will use nodes larger than 3 nanometers and about 8 million will use nodes of 3 nanometers or smaller. Generative AI will add another 1.2 million to 3.6 million logic wafers, all requiring nodes of 3 nanometers or smaller. Based on current capacity plans, the global industry is expected to produce approximately 15 million logic wafers at nodes of 7 nanometers or smaller by 2030. This means generative AI could create a supply gap of 1 million to 4 million advanced logic wafers, concentrated in the 3-nanometer-and-below segment. McKinsey estimates that 3 to 9 new logic fabs may need to be built by 2030 to close this gap. Given the huge investment required, long construction periods, and complex equipment and supply chain considerations, this is a critical issue the semiconductor industry must address well in advance.
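The McKinsey figures can be cross-checked with simple arithmetic. The sketch below adds the generative-AI increment to the non-generative baseline and divides the supply gap by the number of new fabs to get the output each fab would need. It assumes the figures are annual wafer volumes in 2030 and that the low and high ends of each range pair up (smaller gap with fewer fabs); the per-fab numbers are our derivation, not McKinsey's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A low/high estimate range."""
    low: float
    high: float


# McKinsey figures as quoted in the article (millions of logic wafers, 2030).
NON_GENAI_WAFERS = 15.0
GENAI_EXTRA_WAFERS = Span(1.2, 3.6)
SUPPLY_GAP_WAFERS = Span(1.0, 4.0)
NEW_FABS = Span(3, 9)


def total_logic_demand() -> Span:
    """Non-generative-AI demand plus the generative-AI increment."""
    return Span(NON_GENAI_WAFERS + GENAI_EXTRA_WAFERS.low,
                NON_GENAI_WAFERS + GENAI_EXTRA_WAFERS.high)


def implied_output_per_fab() -> Span:
    """Millions of wafers per year each new fab would need to supply.

    Assumption: the smaller gap pairs with fewer fabs, the larger with more.
    """
    return Span(SUPPLY_GAP_WAFERS.low / NEW_FABS.low,
                SUPPLY_GAP_WAFERS.high / NEW_FABS.high)


if __name__ == "__main__":
    demand = total_logic_demand()
    per_fab = implied_output_per_fab()
    print(f"total logic demand: {demand.low:.1f}-{demand.high:.1f} million wafers")
    print(f"per new fab: {per_fab.low * 1e6 / 12:,.0f}-"
          f"{per_fab.high * 1e6 / 12:,.0f} wafers/month")
```

Under that pairing, total logic demand comes to roughly 16.2 to 18.6 million wafers, and each new fab would need to turn out roughly 28,000 to 37,000 wafers per month.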

On the training side, future architectures are likely to keep following today's high-performance cluster model, in which data center servers are connected by high-bandwidth, low-latency networks. McKinsey noted that today's mainstream high-performance generative AI servers typically contain two CPUs and eight GPUs, and that most training workloads will still use this CPU+GPU architecture in 2030. GPUs and AI accelerators may also evolve toward system-in-package designs that coexist with existing architectures for a long time. Inference will look different: by 2030, more servers dedicated to inference are expected to combine CPUs with multiple custom AI accelerators, most of them ASIC-based. Because ASICs can be optimized for specific AI tasks, they are expected to offer lower costs, higher energy efficiency, and better performance in large-scale inference.

Notably, Hock Tan told Reuters that because AI is driving a surge in memory demand, Broadcom's profit margins on custom chips are lower than on some of its other products, such as network switching chips. Tan explained that AI chips require large amounts of high-bandwidth memory, which weighs on Broadcom's margins for custom AI chips. He also noted that South Korea's SK Hynix and Samsung Electronics supply memory chips to Broadcom. Generative AI has mainly driven demand for two types of DRAM: high-bandwidth memory (HBM), used in GPUs and AI accelerators, and DDR memory, used with CPUs. HBM offers higher bandwidth and is essential for today's AI training and high-performance inference.

However, HBM requires more silicon area than DDR to store the same amount of data, which also makes it harder to manufacture. SK Hynix is one of the biggest beneficiaries of the AI memory shortage, but its HBM production capacity is severely strained, and its key customers have likely reserved large volumes in advance. SK Hynix previously stated that all of its DRAM, HBM, and NAND flash products for 2026 were sold out. Micron's latest earnings report also indicates that the AI memory shortage may continue beyond 2027, suggesting an industry-wide HBM shortage.

Various companies are working to increase memory capacity, but doing so is not straightforward and poses challenges for both hardware and software design. One of the most critical issues is the 'memory wall': limited memory capacity and bandwidth are becoming the bottleneck on overall system performance. Even if a compute chip has high peak performance, the system will still be held back if data cannot be read, moved, and processed quickly enough. The industry is exploring various solutions. For example, static random access memory (SRAM) can increase the amount of memory placed close to compute, but its high cost limits widespread use. Future algorithms may also reduce the memory each inference task requires, slowing the overall growth in memory demand. Another uncertainty is the architecture of AI accelerators: some may need less memory than CPU+GPU systems. As inference workloads grow, AI accelerators may become more popular by 2030, which could make memory demand grow more slowly than some initial projections suggest. This divergence in infrastructure strategy marks a pivotal moment, as the basis of competition shifts from models to the full technology stack.
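The 'memory wall' described above is commonly reasoned about with the roofline model: attainable throughput is the lesser of peak compute and memory bandwidth multiplied by arithmetic intensity (floating-point operations per byte moved). The sketch below uses a hypothetical accelerator (1,000 TFLOP/s peak, 4 TB/s memory bandwidth), not Jalapeño, GPU, or TPU specifications, plus an assumed rough approximation of the arithmetic intensity of weight-streaming large-model decoding.

```python
def attainable_tflops(peak_tflops: float, bandwidth_tb_per_s: float,
                      intensity_flop_per_byte: float) -> float:
    """Roofline model: throughput is capped by compute or by memory bandwidth.

    TB/s multiplied by FLOP/byte gives TFLOP/s, matching the units of peak_tflops.
    """
    return min(peak_tflops, bandwidth_tb_per_s * intensity_flop_per_byte)


def ridge_point(peak_tflops: float, bandwidth_tb_per_s: float) -> float:
    """Arithmetic intensity (FLOP/byte) above which a workload is compute-bound."""
    return peak_tflops / bandwidth_tb_per_s


def decode_intensity(batch_size: int, bytes_per_weight: float) -> float:
    """Rough FLOP/byte for one weight-streaming decode step of a large model.

    Each weight is read from memory once per step and used in one multiply-add
    (2 FLOPs) per sequence in the batch. KV-cache and activation traffic are
    ignored, so this overstates the true intensity.
    """
    return 2 * batch_size / bytes_per_weight


if __name__ == "__main__":
    PEAK_TFLOPS, BANDWIDTH_TB_S = 1000.0, 4.0  # hypothetical accelerator
    print(f"ridge point: {ridge_point(PEAK_TFLOPS, BANDWIDTH_TB_S):.0f} FLOP/byte")
    for batch in (1, 16, 256):
        ai = decode_intensity(batch, bytes_per_weight=2.0)  # 16-bit weights
        perf = attainable_tflops(PEAK_TFLOPS, BANDWIDTH_TB_S, ai)
        print(f"batch {batch:>3}: {ai:6.1f} FLOP/byte -> {perf:7.1f} TFLOP/s "
              f"({perf / PEAK_TFLOPS:.1%} of peak)")
```

With these made-up numbers, a single-sequence decode step reaches only 4 TFLOP/s (0.4 percent of peak), and utilization climbs with batch size until the workload crosses the 250 FLOP/byte ridge point. This is the sense in which memory bandwidth and capacity, rather than peak compute, often set the ceiling on inference throughput.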

Disclaimer: Views are the author's own and do not represent the platform. Do not reproduce without permission. Content is for reference only, not investment advice. Trade at your own risk.
