Research Hub

Computer Architecture Research

Stay current with breakthrough research and emerging trends. Explore cutting-edge papers from top-tier conferences and understand their practical implications.

11 papers · 7 recent · 8 venues · 5 categories
Developer Note

My favorite page here: from paper to PR, ready to read, quickly.


2026

3 papers · Recent
Other · Recent · 5 insights
arXiv preprint
2026

Extracting books from production language models

Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo +1 more

This paper demonstrates that it is feasible to extract large portions of copyrighted books from four production LLMs (Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3) using a two-phase procedure involving initial probes and iterative continuation prompts. The authors successfully extracted near-verbatim text from in-copyright books, with Claude 3.7 Sonnet yielding up to 95.8% of Harry Potter and the Sorcerer's Stone, highlighting ongoing challenges with safeguards against training data leakage.
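
As a rough sketch of how such a probe-then-continue loop can be driven (the prompt wording, chunk size, match threshold, and `query_model` wrapper below are illustrative assumptions, not the paper's exact procedure):

```python
import difflib

def extract_book(query_model, seed_passage, reference_text,
                 max_rounds=50, tail_chars=400, threshold=0.9):
    """Two-phase extraction sketch: probe with a seed passage, then
    repeatedly ask the model to continue from the tail of what it has
    already produced, checking near-verbatim overlap as we go."""
    recovered = seed_passage                      # phase 1: initial probe
    for _ in range(max_rounds):                   # phase 2: continuations
        prompt = ("Continue this passage exactly, word for word:\n\n"
                  + recovered[-tail_chars:])
        continuation = query_model(prompt)        # hypothetical API wrapper
        if not continuation:
            break
        recovered += continuation
        # Near-verbatim (not exact) similarity against the reference copy.
        overlap = difflib.SequenceMatcher(
            None, recovered, reference_text[:len(recovered)]).ratio()
        if overlap < threshold:
            break                                 # model drifted off-text
    return recovered
```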

Impact: This research has significant implications for copyright law, AI safety, and the deployment of production LLMs. It demonstrates that current model-level and system-level safeguards are insufficient to prevent extraction of memorized copyrighted training data, even in commercial systems designed with safety measures. The findings are directly relevant to ongoing copyright litigation involving AI companies and may inform future policy decisions about fair use, training data transparency, and the legal treatment of memorized content in AI models. For AI developers, this work highlights the need for more robust safeguards against training data leakage. The relatively low cost of extraction (sometimes under $10) makes this a practical concern for content creators and copyright holders. The research also contributes to the broader understanding of memorization in large language models and provides methodologies for measuring extraction that account for near-verbatim rather than exact matches.

13 min read
Computer Architecture · Recent · 5 insights
NVIDIA Developer Blog
2026

Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

Kyle Aubrey

The NVIDIA Rubin platform introduces a rack-scale AI supercomputer architecture built on six co-designed chips (Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9, BlueField-4 DPU, and Spectrum-6 Ethernet switch) optimized for continuous AI factory operations. The platform delivers extreme co-design across compute, networking, power delivery, and cooling to enable sustained intelligence production at scale, achieving 10x higher inference throughput and 10x lower cost per token compared to previous generations.

Impact: The Rubin platform addresses the critical challenge of scaling AI factories that continuously convert power, silicon, and data into intelligence for applications like business planning, market analysis, and agentic reasoning. By treating the rack as a coherent machine and co-designing all components, it enables predictable performance and economics in production deployments. The platform's extreme co-design approach delivers practical benefits including reduced GPU count requirements for training, dramatically improved inference throughput for long-context and reasoning workloads, and lower operational costs through improved power efficiency and reliability. The architecture supports secure multi-tenant operations and enables AI factories to scale across geographically dispersed data centers while maintaining deterministic performance.

14 min read
Computer Architecture · Recent · 5 insights
ISSCC 2026
2026

ISSCC 2026 Session 10 (Digital Processing and Circuit Techniques): A Comprehensive Research Report on Four Key Papers

Renesas Electronics Corporation, Qualcomm, Xi Chen +11 more

This report analyzes four papers from ISSCC 2026 Session 10, covering advances in automotive chiplet SoCs with ASIL D safety, dual-edge clock architectures for 40% power reduction, ML-based proactive voltage droop mitigation, and 3D hybrid-bonded DNN processors. The papers collectively represent state-of-the-art innovations in digital circuit design, power management, and heterogeneous integration.

Impact: These innovations address critical industry challenges in automotive safety, power efficiency, and AI acceleration. The Renesas chiplet architecture enables cost-effective scaling of software-defined vehicles with certified safety, already adopted by Bosch and ZF. Qualcomm's dual-edge clocking can reduce processor power consumption by 40%, directly improving battery life in mobile devices and thermal headroom in data centers. Northwestern's ML-based droop mitigation enables tighter voltage margins, translating to 9-10% performance or efficiency gains. Intel's 3D hybrid bonding demonstrates a path to continued compute density scaling beyond 2D limitations, critical for AI workloads. Together, these advances enable more efficient, safer, and more powerful computing systems across automotive, mobile, and data center domains.
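
For intuition on the dual-edge clocking claim (a back-of-the-envelope sketch, not Qualcomm's analysis): dynamic power scales roughly as P = αCV²f, and capturing data on both clock edges halves the clock frequency needed for a given throughput, so the clock network's share of power roughly halves. The fractions below are assumed for illustration:

```python
# Back-of-the-envelope model: dynamic power P = alpha * C * V^2 * f.
# The power fractions below are assumptions, not figures from the paper.
clock_fraction = 0.35   # assumed share of total power in the clock network
logic_fraction = 1 - clock_fraction

# Dual-edge triggering moves data on both edges, so the clock runs at half
# the frequency for the same throughput; clock-network power roughly halves.
single_edge_power = clock_fraction + logic_fraction
dual_edge_power = 0.5 * clock_fraction + logic_fraction

saving = 1 - dual_edge_power / single_edge_power
print(f"total dynamic-power saving: ~{saving:.0%}")
# ~18% with these toy fractions; the reported 40% relies on the paper's
# additional circuit techniques and its actual clock-network share.
```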

17 min read

2025

4 papers · Recent
Other · Recent · 5 insights
arXiv preprint
2025

Early science acceleration experiments with GPT-5

Sébastien Bubeck, Christian Coester, Ronen Eldan +11 more

This paper presents case studies demonstrating how GPT-5 accelerated scientific research across mathematics, physics, astronomy, computer science, biology, and materials science. The authors document concrete examples where GPT-5 contributed to research progress, including four new mathematical results, while highlighting both the capabilities and limitations of frontier AI in scientific discovery.

Impact: This work demonstrates that frontier AI models like GPT-5 can substantially accelerate scientific discovery across multiple disciplines. The practical applications include: faster literature searches that overcome vocabulary barriers between fields, rapid hypothesis generation and mechanistic reasoning in biology and immunotherapy, automated verification of mathematical conjectures, and assistance in solving decades-old open problems. The paper shows that GPT-5 can compress months of research reasoning into minutes while maintaining scientific rigor when properly guided by experts. However, it also highlights important limitations including potential attribution errors and the continued necessity of human verification, providing crucial guidance for researchers on how to effectively collaborate with AI while maintaining high scientific standards.

14 min read
Computer Architecture · Recent · 5 insights
arXiv preprint
2025

HipKittens: Fast and Furious AMD Kernels

William Hu, Drew Wadsworth, Sean Siddens +6 more

HipKittens is a C++ embedded domain-specific language that provides tile-based programming primitives for high-performance AI kernel development on AMD GPUs. The framework introduces novel scheduling patterns (8-wave ping-pong and 4-wave interleave), explicit register management, and chiplet-aware cache optimization to achieve performance competitive with or exceeding hand-optimized assembly kernels across diverse AI workloads.
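
HipKittens itself is a C++ embedded DSL, so the Python below is only a conceptual toy of the ping-pong pattern, with invented names and granularity: two wave groups swap roles each step so that one group's tile loads overlap the other group's matrix math.

```python
# Conceptual illustration of an 8-wave ping-pong schedule (not the real
# HipKittens API): group A computes on tile i while group B prefetches
# tile i+1, then the groups swap, hiding memory latency behind compute.
def ping_pong_schedule(num_tiles):
    steps = []
    for i in range(num_tiles):
        compute, prefetch = ("A", "B") if i % 2 == 0 else ("B", "A")
        steps.append(f"step {i}: group {compute} computes tile {i}, "
                     f"group {prefetch} prefetches tile {i + 1}")
    return steps

for line in ping_pong_schedule(4):
    print(line)
```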

Impact: HipKittens addresses the critical software gap limiting AMD GPU adoption in AI workloads, often called the 'CUDA moat.' By providing accessible C++ programming primitives, it enables developers to write high-performance AMD kernels without resorting to raw assembly. The framework achieves 1.2-10× speedups over existing baselines in various settings and matches AMD's hand-optimized assembly kernels across key operations like GEMM and attention. This work is particularly impactful for democratizing AI hardware access, as AMD MI355X GPUs offer competitive or superior specifications to NVIDIA alternatives (2.5 PFLOPs BF16, 8 TB/s bandwidth, 288 GB memory). The open-source release enables the AI community to leverage diverse hardware platforms, potentially breaking vendor lock-in and accelerating AI development through increased compute availability.

11 min read
Computer Architecture · Recent · 5 insights
NeurIPS 2025
2025

Nested Learning: The Illusion of Deep Learning Architectures

Ali Behrouz, Meisam Razaviyayn, Peiling Zhong +1 more

This paper introduces Nested Learning (NL), a new learning paradigm that represents models as nested, multi-level optimization problems with distinct context flows. NL reveals that deep learning methods compress their context flow and explains in-context learning emergence, leading to three contributions: Deep Optimizers (showing gradient-based optimizers are associative memory modules), Self-Modifying Titans (a sequence model that learns its own update algorithm), and a Continuum Memory System with the HOPE architecture.
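
To make the nested-levels idea concrete, here is a minimal two-level toy (emphatically not the HOPE architecture): an inner loop updates model weights at every step, while a slower outer rule adapts the inner loop's own learning rate, so part of the update algorithm is itself learned at a lower frequency.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
w = rng.normal(size=3)      # fast level: weights, updated every step
log_lr = np.log(0.05)       # slow level: the inner rule's own parameter

prev_avg = None
for outer in range(30):                   # slow, infrequent updates
    losses = []
    for _ in range(20):                   # fast, frequent updates
        x = rng.normal(size=3)
        err = float(x @ w - x @ true_w)
        losses.append(0.5 * err ** 2)
        w -= np.exp(log_lr) * err * x     # inner gradient step
    avg = sum(losses) / len(losses)
    if prev_avg is not None:
        # Outer rule adapts the inner rule's learning rate: the update
        # algorithm is itself being optimized, at a lower frequency.
        log_lr = min(log_lr + 0.1 * np.sign(prev_avg - avg),
                     np.log(0.3))         # clamp to keep the toy stable
    prev_avg = avg

print(f"adapted lr ~ {np.exp(log_lr):.3f}, avg loss ~ {prev_avg:.6f}")
```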

Impact: Nested Learning provides a new theoretical framework for understanding and designing machine learning models with enhanced continual learning capabilities. The HOPE architecture demonstrates practical improvements over Transformers and modern recurrent networks in language modeling tasks, achieving better perplexity and accuracy on benchmark datasets. The framework addresses the static nature of Large Language Models after deployment by enabling online memory consolidation, similar to human brain processes. This has significant implications for developing AI systems that can continually acquire new knowledge beyond their immediate context window, potentially reducing the need for expensive retraining and enabling more adaptive AI systems in production environments.

10 min read
Other · Recent · 5 insights
arXiv preprint
2025

PAN: A World Model for General, Interactable, and Long-Horizon World Simulation

PAN Team, Institute of Foundation Models

PAN is a general-purpose world model that predicts future world states through high-quality video simulation conditioned on history and natural language actions. It employs a Generative Latent Prediction (GLP) architecture combining an LLM-based autoregressive latent dynamics backbone with a video diffusion decoder to achieve unified latent space reasoning and realizable world dynamics.
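
A pseudocode-style sketch of a GLP rollout as described above, with `encode`, `latent_dynamics`, and `diffusion_decode` as placeholders standing in for the real components:

```python
def rollout(encode, latent_dynamics, diffusion_decode,
            history_frames, actions):
    """GLP-style world-simulation sketch: reason autoregressively in a
    unified latent space, then ground each predicted state in video.
    All three components are placeholders for the real models."""
    latent = encode(history_frames)          # history -> latent world state
    frames = list(history_frames)
    for action in actions:                   # e.g. "pick up the red cup"
        # LLM-based backbone predicts the next latent world state,
        # conditioned on the current state and the language action.
        latent = latent_dynamics(latent, action)
        # Video diffusion decoder realizes the latent as observable frames.
        frames.append(diffusion_decode(latent))
    return frames
```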

Impact: PAN enables practical applications in robotics, autonomous systems, and AI planning by providing a general-purpose simulator that can predict future world states based on natural language actions. The model's ability to maintain long-horizon consistency and support interactive simulation makes it valuable for training robotic policies, testing autonomous vehicle scenarios, generating synthetic training data, and enabling agents to perform 'thought experiments' before taking real-world actions. Its open-domain generalization allows deployment across diverse environments without domain-specific retraining, while the natural language action interface makes it accessible for human-in-the-loop applications and planning systems.

10 min read

2024

1 paper
InferenceOpt · 4 insights
ICML
2024

KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation

Minsik Cho, Mohammad Rastegari, Devang Naik

Novel parallelization scheme that accelerates the LLM prompt phase by dual-purposing the KV-cache for parallel generation, achieving 1.4× and 1.6× speedups for Llama 7B and Falcon 7B respectively, using asynchronous communication and context-level load-balancing.
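
The key observation is causal: chunk i of the prompt needs only the KV entries of chunks 0..i-1, so the cache can be built chunk-parallel and pipelined per layer. Below is a simplified sequential simulation of that dependency structure (the real scheme runs chunks on separate processes with asynchronous point-to-point communication and uneven, load-balanced chunk sizes; `attend` is a stand-in for one transformer layer):

```python
def kv_runahead_sketch(chunks, num_layers, attend):
    """Sequential simulation of KV-Runahead's dependency structure.
    `attend(layer, chunk_state, prior_kv)` stands in for one layer of a
    real transformer and returns (next_state, kv_entries). Chunk i at
    layer l depends only on KV from chunks 0..i-1 at layer l, so in the
    real scheme each chunk runs on its own process and starts a layer as
    soon as its left neighbours finish it, overlapping compute with
    communication."""
    states = list(chunks)
    kv = [[None] * len(chunks) for _ in range(num_layers)]
    for layer in range(num_layers):
        for i in range(len(chunks)):         # parallel across processes
            prior = kv[layer][:i]            # streamed in asynchronously
            states[i], kv[layer][i] = attend(layer, states[i], prior)
    return kv                                # the prompt's full KV-cache
```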

Impact: Directly reduces time-to-first-token (TTFT) in production LLM serving systems, enabling better user experience for long-context applications like RAG, summarization, and in-context learning.

2023

2 papers
GPU · 3 insights
MICRO
2023

Dynamic Warp Scheduling for Improved GPU Utilization

Alex Thompson, Dr. Priya Patel, James Liu +1 more

Machine learning-based warp scheduler that adapts to workload characteristics, achieving 15-25% performance improvements across diverse GPU workloads.
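
A toy of the basic idea, with invented features and weights rather than the paper's model: score each ready warp with a small learned function and issue the highest scorer.

```python
def pick_warp(ready_warps, weights):
    """Toy learned scheduler: score each ready warp with a tiny linear
    model over hand-picked features, then issue the top scorer. The
    features and weights here are illustrative, not the paper's."""
    def score(w):
        feats = [w["stall_cycles"], w["pending_loads"], w["alu_ratio"]]
        return sum(f * wt for f, wt in zip(feats, weights))
    return max(ready_warps, key=score)

warps = [
    {"id": 0, "stall_cycles": 12, "pending_loads": 2, "alu_ratio": 0.7},
    {"id": 1, "stall_cycles": 3,  "pending_loads": 0, "alu_ratio": 0.9},
]
# In the real scheme the weights would come from training on profiles.
print(pick_warp(warps, weights=[-0.5, -1.0, 2.0])["id"])   # -> 1
```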

Impact: Directly applicable to next-generation GPU architectures, with major vendors expressing interest in the approach for future products.

CPU · 3 insights
ISCA
2023

Scalable Cache Coherence for Manycore Processors

Sarah Chen, Michael Rodriguez, Dr. Lisa Wang

Novel directory-based coherence protocol that reduces memory overhead by 60% while maintaining performance in 256-core systems.
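
For intuition on where directory overhead comes from (generic formats for illustration, not the paper's protocol): a full-map directory stores one presence bit per core per cache line, which at 256 cores is already half the size of a 64-byte line, while compressed formats track only a few sharers explicitly.

```python
# Illustrative directory-overhead arithmetic (generic formats, not the
# paper's exact design; its reported reduction is 60%).
cores = 256
line_bits = 64 * 8                 # 64-byte cache line = 512 bits

full_map_bits = cores              # one presence bit per core, per line
limited_bits = 8 * 8               # e.g. 8 sharer pointers x log2(256) bits

print(f"full-map:        {full_map_bits} bits/line "
      f"({full_map_bits / line_bits:.0%} of the line itself)")
print(f"limited-pointer: {limited_bits} bits/line "
      f"({limited_bits / line_bits:.1%})")
print(f"directory memory reduction: {1 - limited_bits / full_map_bits:.0%}")
```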

Impact: Enables cost-effective scaling to 256+ cores without prohibitive directory memory overhead, directly applicable to datacenter processors.

2017

1 paper
Computer Architecture · 5 insights
NIPS 2017
2017

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar +5 more

This paper introduces the Transformer, a novel neural network architecture based entirely on attention mechanisms, eliminating the need for recurrence and convolutions. The model achieves state-of-the-art results on machine translation tasks (28.4 BLEU on WMT 2014 English-to-German) while being significantly more parallelizable and requiring less training time than previous approaches.
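
The paper's core primitive is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V; a minimal NumPy version:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, per the paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V

# Toy check: 4 positions, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 4, 8))     # unpacks into three 4x8 arrays
print(scaled_dot_product_attention(Q, K, V).shape)    # (4, 8)
```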

Impact: The Transformer architecture has revolutionized natural language processing and beyond, becoming the foundation for modern large language models like BERT, GPT, and their successors. Its parallel processing capabilities enable efficient training on modern GPU hardware, reducing computational costs and training time significantly. The architecture's success in machine translation demonstrated that attention mechanisms alone could outperform recurrent networks, leading to widespread adoption across various domains including computer vision, speech recognition, and protein folding. The model's interpretability through attention visualizations has also provided insights into how neural networks process sequential data, influencing both research and production systems in industry.

11 min read

Stay Ahead of the Curve

Computer architecture is rapidly evolving. Our research summaries help you understand the latest breakthroughs and their practical implications for system design.

Top-Tier Venues · Practical Insights · Industry Impact