Agent Papers

Updated on 2026.04.29

Publish Date	Title	Authors	PDF
2026-04-28	AOI: Context-Aware Multi-Agent Operations via Dynamic Scheduling and Hierarchical Memory Compression	Zishan Bai et.al.	2512.13956
2026-04-28	From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills	Qiliang Liang et.al.	2604.24026
2026-04-28	CacheFlow: Efficient LLM Serving with 3D-Parallel KV Cache Restoration	Sean Nian et.al.	2604.25080
2026-04-27	Kwai Summary Attention Technical Report	Chenglong Chu et.al.	2604.24432
2026-04-27	PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference	Ishan Patel et.al.	2604.24971
2026-04-27	Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate	John Seon Keun Yi et.al.	2604.24881_(ACL)
2026-04-26	JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training	Zhengding Hu et.al.	2604.23838
2026-04-25	KLong: Training LLM Agent for Extremely Long-horizon Tasks	Yue Liu et.al.	2602.17547_(DIS)
2026-04-23	From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation	Bartosz Balis et.al.	2604.21910
2026-04-22	Sutradhara: An Intelligent Orchestrator-Engine Co-design for Tool-based Agentic Inference	Anish Biswas et.al.	2601.12967
2026-04-22	FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation	Yinpeng Wu et.al.	2603.09046
2026-04-22	Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms	Ari Azarafrooz et.al.	2604.21131
2026-04-20	AIT Academy: Cultivating the Complete Agent with a Confucian Three-Domain Curriculum	Jiaqi Li et.al.	2604.17989
2026-04-19	Beyond Static Snapshots: A Grounded Evaluation Framework for Language Models at the Agentic Frontier	Jazmia Henry et.al.	2604.17573
2026-04-19	EmbodiedHead: Real-Time Listening and Speaking Avatar for Conversational Agents	Yu Zhang et.al.	2604.17211
2026-04-19	Hive: A Multi-Agent Infrastructure for Algorithm- and Task-Level Scaling	Zizhang Luo et.al.	2604.17353
2026-04-18	HiveMind: OS-Inspired Scheduling for Concurrent LLM Agent Workloads	Justice Owusu Agyemang et.al.	2604.17111
2026-04-17	CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling	Runsong Zhao et.al.	2602.01766_(ACL)
2026-04-16	LLMOrbit: A Circular Taxonomy of Large Language Models - From Scaling Walls to Agentic AI Systems	Badri N. Patro et.al.	2601.14053
2026-04-16	Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines	Marcel Wagenländer et.al.	2604.15186
2026-04-16	ARGUS: Agentic GPU Optimization Guided by Data-Flow Invariants	Haohui Mai et.al.	2604.18616
2026-04-15	AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent	Wenyue Hua et.al.	2604.06296
2026-04-14	When Less Latent Leads to Better Relay: Information-Preserving Compression for Latent Multi-Agent LLM Collaboration	Yiping Li et.al.	2604.13349
2026-04-13	FlashMem: Distilling Intrinsic Latent Memory via Computation Reuse	Yubo Hou et.al.	2601.05505
2026-04-13	MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens	Yu Chen et.al.	2603.23516
2026-04-13	From Agent Loops to Structured Graphs: A Scheduler-Theoretic Framework for LLM Agent Execution	Hu Wei et.al.	2604.11378
2026-04-13	MAFIG: Multi-agent Driven Formal Instruction Generation Framework	Shixing Zhao et.al.	2604.10989
2026-04-13	ProbeLogits: Kernel-Level LLM Inference Primitives for AI-Native Operating Systems	Daeyeon Son et.al.	2604.11943
2026-04-11	CodeComp: Structural KV Cache Compression for Agentic Coding	Qiujiang Chen et.al.	2604.10235
2026-04-11	WaterAdmin: Orchestrating Community Water Distribution Optimization via AI Agents	Jiaqi Wen et.al.	2604.10343
2026-04-09	Quine: Realizing LLM Agents as Native POSIX Processes	Hao Ke et.al.	2603.18030_(ICS)
2026-04-09	B-PASTE: Beam-Aware Pattern-Guided Speculative Execution for Resource-Constrained LLM Agents	Yanfei Song et.al.	2604.16469
2026-04-08	LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification	Penghui Yang et.al.	2502.17421_(ACL)
2026-04-08	PEARL: Self-Evolving Assistant for Time Management with Reinforcement Learning	Bingxuan Li et.al.	2601.11957
2026-04-08	The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project	Huamin Chen et.al.	2603.21354
2026-04-08	ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces	Xiangyi Li et.al.	2604.05172
2026-04-07	AutoSOTA: An End-to-End Automated Research System for State-of-the-Art AI Model Discovery	Yu Li et.al.	2604.05550
2026-04-07	RL-VLA$^3$: A Flexible and Asynchronous Reinforcement Learning Framework for VLA Training	Haoran Sun et.al.	2602.05765
2026-04-07	ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache	Shao Wang et.al.	2604.06370
2026-04-07	Learning to Interrupt in Language-based Multi-agent Communication	Danqing Wang et.al.	2604.06452
2026-04-06	RoboPhD: Evolving Diverse Complex Agents Under Tight Evaluation Budgets	Andrew Borthwick et.al.	2604.04347
2026-04-05	Comparative reversal learning reveals rigid adaptation in LLMs under non-stationary uncertainty	Haomiaomiao Wang et.al.	2604.04182
2026-04-04	LightThinker++: From Reasoning Compression to Memory Management	Yuqi Zhu et.al.	2604.03679
2026-04-03	Glia: A Human-Inspired AI for Automated Systems Design and Optimization	Pouya Hamadanian et.al.	2510.27176
2026-04-03	TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing	Zhuohang Bian et.al.	2604.03143
2026-04-03	Let’s Have a Conversation: Designing and Evaluating LLM Agents for Interactive Optimization	Joshua Drossman et.al.	2604.02666
2026-04-01	Universal YOCO for Efficient Depth Scaling	Yutao Sun et.al.	2604.01220
2026-03-31	KAIJU: An Executive Kernel for Intent-Gated Execution of LLM Agents	Cormac Guerin et.al.	2604.02375
2026-03-30	Heddle: A Distributed Orchestration System for Agentic RL Rollout	Zili Zhang et.al.	2603.28101
2026-03-30	Multi-Agent Home Energy Management Assistant	Wooyoung Jung et.al.	2602.15219
2026-03-30	CSAttention: Centroid-Scoring Attention for Accelerating LLM Inference	Chuxu Song et.al.	2604.08584
2026-03-28	Simulating Human Cognition: Heartbeat-Driven Autonomous Thinking Activity Scheduling for LLM-based AI systems	Hong Su et.al.	2604.14178
2026-03-26	Large Language Models as Optimization Controllers: Adaptive Continuation for SIMP Topology Optimization	Shaoliang Yang et.al.	2603.25099
2026-03-25	LMetric: Simple is Better - Multiplication May Be All You Need for LLM Request Scheduling	Dingyan Zhang et.al.	2603.15202
2026-03-25	Language-Grounded Multi-Agent Planning for Personalized and Fair Participatory Urban Sensing	Xusen Guo et.al.	2603.24014
2026-03-23	Chimera: Latency- and Performance-Aware Multi-agent Serving for Heterogeneous LLMs	Kangqi Ni et.al.	2603.22206
2026-03-23	AI Co-Scientist for Ranking: Discovering Novel Search Ranking Models alongside LLM-based AI Agents with Cloud Computing Access	Liwei Wu et.al.	2603.22376_(ACL)
2026-03-22	Improving Coherence and Persistence in Agentic AI for System Optimization	Pantea Karimi et.al.	2603.21321
2026-03-22	CALVO: Improve Serving Efficiency for LLM Inferences with Intense Network Demands	Weiye Wang et.al.	2603.21257
2026-03-18	IEMAS: An Incentive-Efficiency Routing Framework for Open Agentic Web Ecosystems	Hongze Liu et.al.	2603.17302
2026-03-17	EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval	Zebin Yang et.al.	2510.18546_(NeurIPS)
2026-03-17	Efficient LLM Serving for Agentic Workflows: A Data Systems Perspective	Noppanat Wadlom et.al.	2603.16104
2026-03-17	MetaClaw: Just Talk – An Agent That Meta-Learns and Evolves in the Wild	Peng Xia et.al.	2603.17187
2026-03-15	OxyGen: Unified KV Cache Management for Vision-Language-Action Models under Multi-Task Parallelism	Xiangyu Li et.al.	2603.14371
2026-03-14	ATCC: Adaptive Concurrency Control for Unforeseen Agentic Transactions	Weixing Zhou et.al.	2603.13906
2026-03-14	Retrieve, Schedule, Reflect: LLM Agents for Chip QoR Optimization	Yikang Ouyang et.al.	2603.13767
2026-03-14	Justitia: Fair and Efficient Scheduling of Task-parallel LLM Agents with Selective Pampering	Mingyan Yang et.al.	2510.17015
2026-03-13	AgentRM: An OS-Inspired Resource Manager for LLM Agent Systems	Jianshu She et.al.	2603.13110
2026-03-13	ARL-Tangram: Unleash the Resource Efficiency in Agentic Reinforcement Learning	Bangjun Xiao et.al.	2603.13019
2026-03-13	Orla: A Library for Serving LLM-Based Multi-Agent Systems	Rana Shahout et.al.	2603.13605
2026-03-12	Slow-Fast Inference: Training-Free Inference Acceleration via Within-Sentence Support Stability	Xingyu Xie et.al.	2603.12038
2026-03-12	Characterizing Performance-Energy Trade-offs of Large Language Models in Multi-Request Workflows	Md. Monzurul Amin Ifath et.al.	2604.09611
2026-03-11	Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents	Kaiyu Zhou et.al.	2601.10955
2026-03-11	BLITZRANK: Principled Zero-shot Ranking Agents with Tournament Graphs	Sheshansh Agrawal et.al.	2602.05448
2026-03-10	ThunderAgent: A Simple, Fast and Program-Aware Agentic Inference System	Hao Kang et.al.	2602.13692
2026-03-09	Advancing Automated Algorithm Design via Evolutionary Stagewise Design with LLMs	Chen Lu et.al.	2603.07970
2026-03-09	Not All Prefills Are Equal: PPD Disaggregation for Multi-turn LLM Serving	Zongze Li et.al.	2603.13358
2026-03-09	Can LLMs Perceive Time? An Empirical Investigation	Aniketh Garikaparthi et.al.	2604.00010_(ICLR)
2026-03-06	Characterizing Faults in Agentic AI: A Taxonomy of Types, Symptoms, and Root Causes	Mehil B Shah et.al.	2603.06847
2026-03-05	PerfGuard: A Performance-Aware Agent for Visual Content Generation	Zhipeng Chen et.al.	2601.22571_(FG)
2026-03-04	ChatNeuroSim: An LLM Agent Framework for Automated Compute-in-Memory Accelerator Deployment and Optimization	Ming-Yen Lee et.al.	2603.08745
2026-03-03	ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution	Liu Yang et.al.	2603.02510
2026-03-03	CraniMem: Cranial Inspired Gated and Bounded Memory for Agentic Systems	Pearl Mody et.al.	2603.15642
2026-03-02	SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents	Gyuhyeon Seo et.al.	2509.24282_(ICLR)
2026-03-02	Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents	Heyang Gao et.al.	2510.03253_(ICLR)
2026-03-02	HeRo: Adaptive Orchestration of Agentic RAG on Heterogeneous Mobile SoC	Maoliang Li et.al.	2603.01661_(DAC)
2026-03-01	MagicAgent: Towards Generalized Agent Planning	Xuhui Ren et.al.	2602.19000
2026-02-28	ZeroDVFS: Zero-Shot LLM-Guided Core and Frequency Allocation for Embedded Platforms	Mohammad Pivezhandi et.al.	2601.08166
2026-02-28	RelayCaching: Accelerating LLM Collaboration via Decoding KV Cache Reuse	Yingsheng Geng et.al.	2603.13289
2026-02-27	Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis	Pengfei Zhang et.al.	2602.15909_(ICLR)
2026-02-27	SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning	Sanjay Kariyappa et.al.	2602.22603
2026-02-27	Efficient Long-Horizon GUI Agents via Training-Free KV Cache Compression	Bowen Zhou et.al.	2603.00188
2026-02-27	Verifier-Bound Communication for LLM Agents: Certified Bounds on Covert Signaling	Om Tailor et.al.	2603.00381
2026-02-27	ICaRus: Identical Cache Reuse for Efficient Multi Model Inference	Sunghyeon Woo et.al.	2603.13281
2026-02-26	DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference	Yongtong Wu et.al.	2602.21548
2026-02-25	Two-Stage Active Distribution Network Voltage Control via LLM-RL Collaboration: A Hybrid Knowledge-Data-Driven Approach	Xu Yang et.al.	2602.21715
2026-02-24	Towards Efficient Agents: A Co-Design of Inference Architecture and System	Weizhe Lin et.al.	2512.18337
2026-02-24	Hierarchical Decision Mamba Meets Agentic AI: A Novel Approach for RAN Slicing in 6G	Md Arafat Habib et.al.	2512.23502_(Networking)
2026-02-24	ST-EVO: Towards Generative Spatio-Temporal Evolution of Multi-Agent Communication Topologies	Xingjian Wu et.al.	2602.14681
2026-02-24	Architecting AgentOS: From Token-Level Context to Emergent System-Level Intelligence	ChengYou Li et.al.	2602.20934
2026-02-23	ContextPilot: Fast Long-Context Inference via Context Reuse	Yinsicheng Jiang et.al.	2511.03475
2026-02-22	KVComm: Enabling Efficient LLM Communication through Selective KV Sharing	Xiangyu Shi et.al.	2510.03346_(ICLR)
2026-02-20	Aurora: Neuro-Symbolic AI Driven Advising Agent	Lorena Amanda Quincoso Lugones et.al.	2602.17999
2026-02-20	Alignment in Time: Peak-Aware Orchestration for Long-Horizon Agentic Systems	Hanjing Shi et.al.	2602.17910
2026-02-19	Position: AI Agents Are Not (Yet) a Panacea for Social Simulation	Yiming Li et.al.	2603.00113
2026-02-19	ARKV: Adaptive and Resource-Efficient KV Cache Management under Limited Memory Budget for Long-Context Inference in LLMs	Jianlong Lei et.al.	2603.08727
2026-02-17	Agent Memory Below the Prompt: Persistent Q4 KV Cache for Multi-Agent LLM Inference on Edge Devices	Yakov Pyotr Shkolnikov et.al.	2603.04428
2026-02-16	OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety	Sanidhya Vijayvargiya et.al.	2507.06134_(ICLR)
2026-02-16	Efficient Multi-round LLM Inference over Disaggregated Serving	Wenhao He et.al.	2602.14516
2026-02-15	HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling	Xiaochen Zhao et.al.	2602.13933
2026-02-12	PrefillShare: A Shared Prefill Module for KV Reuse in Multi-LLM Disaggregated Serving	Sunghyeon Woo et.al.	2602.12029
2026-02-12	Deep Kernel Fusion for Transformers	Zixi Zhang et.al.	2602.11808
2026-02-12	IMAGAgent: Orchestrating Multi-Turn Image Editing via Constraint-Aware Planning and Reflection	Fei Shen et.al.	2603.29602
2026-02-10	Learning to Evict from Key-Value Cache	Luca Moschella et.al.	2602.10238
2026-02-10	ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue	Ruike Cao et.al.	2603.02216_(ICLR)
2026-02-09	Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems	Lang Feng et.al.	2602.08847
2026-02-08	DeltaKV: Residual-Based KV Cache Compression via Long-Range Similarity	Jitai Hao et.al.	2602.08005
2026-02-06	Lemon Agent Technical Report	Haipeng Jiang et.al.	2602.07092
2026-02-05	AriadneMem: Threading the Maze of Lifelong Memory for LLM Agents	Wenhui Zhu et.al.	2603.03290
2026-02-03	Cortex: Achieving Low-Latency, Cost-Efficient Remote Data Access For LLM via Semantic-Aware Knowledge Caching	Chaoyi Ruan et.al.	2509.17360
2026-02-03	ENGRAM: Effective, Lightweight Memory Orchestration for Conversational Agents	Daivik Patel et.al.	2511.12960
2026-02-03	LegalOne: A Family of Foundation Models for Reliable Legal Reasoning	Haitao Li et.al.	2602.00642
2026-02-03	Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems	Haibo Jin et.al.	2602.03695
2026-02-02	AgentRAN: An Agentic AI Architecture for Autonomous Control of Open 6G Networks	Maxime Elkael et.al.	2508.17778
2026-02-02	Implicit Bias in LLMs for Transgender Populations	Micaela Hirsch et.al.	2602.13253
2026-02-01	PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing	Junyi Hou et.al.	2512.02589
2026-02-01	LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents	Hyesung Jeon et.al.	2602.01053
2026-01-31	FastTTS: Accelerating Test-Time Scaling for Edge LLM Reasoning	Hao Mark Chen et.al.	2509.00195_(ASPLOS)
2026-01-30	Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live	Hanchen Li et.al.	2511.02230
2026-01-30	Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models	Elias Hossain et.al.	2510.17098
2026-01-30	CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control	Qiaoling Chen et.al.	2601.22705
2026-01-29	Emergent Coordination in Multi-Agent Systems via Pressure Fields and Temporal Decay	Roland Rodriguez et.al.	2601.08129
2026-01-29	CORE: Toward Ubiquitous 6G Intelligence Through Collaborative Orchestration of Large Language Model Agents Over Hierarchical Edge	Zitong Yu et.al.	2601.21822
2026-01-29	Heterogeneous Computing: The Key to Powering the Future of AI Agent Inference	Yiren Zhao et.al.	2601.22001
2026-01-27	GTA: Generative Traffic Agents for Simulating Realistic Mobility Behavior	Simon Lämmer et.al.	2601.16778_(CHI)
2026-01-21	IB-GRPO: Aligning LLM-based Learning Path Recommendation with Educational Objectives via Indicator-Based Group Relative Policy Optimization	Shuai Wang et.al.	2601.14686
2026-01-20	IGAA: Intent-Driven General Agentic AI for Edge Services Scheduling using Generative Meta Learning	Yan Sun et.al.	2601.13702
2026-01-19	Batch Query Processing and Optimization for Agentic Workflows	Junyi Shen et.al.	2509.02121
2026-01-19	IntAgent: NWDAF-Based Intent LLM Agent Towards Advanced Next Generation Networks	Abdelrahman Soliman et.al.	2601.13114
2026-01-18	Rethinking the Value of Multi-Agent Workflow: A Strong Single Agent Baseline	Jiawei Xu et.al.	2601.12307
2026-01-16	AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems	Weiyi Wang et.al.	2601.11354
2026-01-13	StackPilot: Autonomous Function Agents for Scalable and Environment-Free Code Execution	Xinkui Zhao et.al.	2508.11665
2026-01-13	When KV Cache Reuse Fails in Multi-Agent Systems: Cross-Candidate Interaction is Crucial for LLM Judges	Sichu Liang et.al.	2601.08343
2026-01-13	Unleashing Tool Engineering and Intelligence for Agentic AI in Next-Generation Communication Networks	Yinqiu Liu et.al.	2601.08259
2026-01-12	OpenTinker: Separating Concerns in Agentic Reinforcement Learning	Siqi Zhu et.al.	2601.07376
2026-01-11	RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction	Haonan Bian et.al.	2601.06966
2026-01-08	Nalar: An agent serving framework	Marco Laju et.al.	2601.05109
2026-01-07	NEMO-4-PAYPAL: Leveraging NVIDIA’s Nemo Framework for empowering PayPal’s Commerce Agent	Sudhanshu Garg et.al.	2512.21578
2026-01-06	Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC	Xinming Wei et.al.	2506.24045
2026-01-03	Chimera: Harnessing Multi-Agent LLMs for Automatic Insider Threat Simulation	Jiongchi Yu et.al.	2508.07745_(NDSS)
2026-01-03	Warp-Cortex: An Asynchronous, Memory-Efficient Architecture for Million-Agent Cognitive Scaling on Consumer Hardware	Jorge L. Ruiz Williams et.al.	2601.01298
2026-01-03	ReliabilityBench: Evaluating LLM Agent Reliability Under Production-Like Stress Conditions	Aayush Gupta et.al.	2601.06112
2025-12-31	Context-aware LLM-based AI Agents for Human-centered Energy Management Systems in Smart Buildings	Tianzhi He et.al.	2512.25055
2025-12-30	SmartFlow Reinforcement Learning and Agentic AI for Bike-Sharing Optimisation	Aditya Sreevatsa K et.al.	2601.00868
2025-12-28	Accelerating Language Model Workflows with Prompt Choreography	TJ Bai et.al.	2512.23049_(ACL)
2025-12-27	Agentic Auto-Scheduling: An Experimental Study of LLM-Guided Loop Optimization	Massinissa Merouani et.al.	2511.00592_(CHI)
2025-12-27	AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent	Haipeng Luo et.al.	2512.20745
2025-12-24	V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval	Donghyuk Kim et.al.	2512.12284_(HPCA)
2025-12-22	JITServe: SLO-aware LLM Serving with Imprecise Request Information	Wei Zhang et.al.	2504.20068
2025-12-21	IntelliCode: A Multi-Agent LLM Tutoring System with Centralized Learner Modeling	Jones David et.al.	2512.18669_(ACL)
2025-12-21	Does It Tie Out? Towards Autonomous Legal Agents in Venture Capital	Pierre Colombo et.al.	2512.18658
2025-12-18	MEPIC: Memory Efficient Position Independent Caching for LLM Serving	Qian Wang et.al.	2512.16822
2025-12-16	SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching	Xinye Zhao et.al.	2509.24832
2025-12-16	Astraea: A State-Aware Scheduling Engine for LLM-Powered Agents	Hongqiu Ni et.al.	2512.14142
2025-12-16	Model-First Reasoning LLM Agents: Reducing Hallucinations through Explicit Problem Modeling	Annu Rana et.al.	2512.14474
2025-12-15	Beyond Training: Enabling Self-Evolution of Agents with MOBIMEM	Zibin Liu et.al.	2512.15784
2025-12-12	Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows	Yinwei Dai et.al.	2511.20975
2025-12-08	EvoMem: Improving Multi-Agent Planning with Dual-Evolving Memory	Wenzhe Fan et.al.	2511.01912
2025-12-07	Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning	Yulei Qin et.al.	2509.22601
2025-12-05	MARINE: Theoretical Optimization and Design for Multi-Agent Recursive IN-context Enhancement	Hongwei Zhang et.al.	2512.07898
2025-12-03	KVNAND: Efficient On-Device Large Language Model Inference Using DRAM-Free In-Flash Computing	Lishuo Deng et.al.	2512.03608
2025-12-02	DynTaskMAS: A Dynamic Task Graph-driven Framework for Asynchronous and Parallel LLM-based Multi-Agent Systems	Junwei Yu et.al.	2503.07675
2025-11-29	A CPU-Centric Perspective on Agentic AI	Ritik Raj et.al.	2511.00739
2025-11-28	Beyond Curve Fitting: Neuro-Symbolic Agents for Context-Aware Epidemic Forecasting	Joongwon Chae et.al.	2511.23276
2025-11-28	LegalWebAgent: Empowering Access to Justice via LLM-Based Web Agents	Jinzhe Tan et.al.	2512.04105
2025-11-27	Optimizing NetGPT via Routing-Based Synergy and Reinforcement Learning	Yuxuan Chen et.al.	2511.22217
2025-11-27	Q-KVComm: Efficient Multi-Agent Communication Via Adaptive KV Cache Compression	Boris Kriuk et.al.	2512.17914
2025-11-25	Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design	Zixiao Huang et.al.	2511.20048
2025-11-25	Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation	Inferix Team et.al.	2511.20714
2025-11-23	Hybrid Agentic AI and Multi-Agent Systems in Smart Manufacturing	Mojtaba A. Farahani et.al.	2511.18258
2025-11-18	KnowCoder-A1: Incentivizing Agentic Reasoning Capability with Outcome Supervision for KBQA	Zhuo Chen et.al.	2510.25101
2025-11-05	ALAS: Transactional and Dynamic Multi-Agent LLM Planning	Longling Geng et.al.	2511.03094
2025-11-05	FREESH: Fair, Resource- and Energy-Efficient Scheduling for LLM Serving on Heterogeneous GPUs	Xuan He et.al.	2511.00807_(ISS)
2025-11-04	LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context	Yudong Li et.al.	2511.02366
2025-11-04	Optimal-Agent-Selection: State-Aware Routing Framework for Efficient Multi-Agent Collaboration	Jingbo Wang et.al.	2511.02200
2025-11-04	Using Span Queries to Optimize for Cache and Attention Locality	Paul Castro et.al.	2511.02749
2025-11-03	Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving	Chengying Huan et.al.	2511.01633
2025-11-03	TPS-Bench: Evaluating AI Agents’ Tool Planning & Scheduling Abilities in Compounding Tasks	Hanwen Xu et.al.	2511.01527
2025-11-02	HEXGEN-FLOW: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL	You Peng et.al.	2505.05286
2025-11-01	KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems	Hancheng Ye et.al.	2510.12872_(FAST)
2025-10-31	Tokencake: A KV-Cache-centric Serving Framework for LLM-based Multi-Agent Applications	Zhuohang Bian et.al.	2510.18586
2025-10-30	Agentic AI Home Energy Management System: A Large Language Model Framework for Residential Load Scheduling	Reda El Makroum et.al.	2510.26603
2025-10-28	Pie: A Programmable Serving System for Emerging LLM Applications	In Gim et.al.	2510.24051_(SOSP)
2025-10-28	Improving LLM Reasoning via Dependency-Aware Query Decomposition and Logic-Parallel Content Expansion	Xianjun Gao et.al.	2510.24390
2025-10-28	From Narrative to Action: A Hierarchical LLM-Agent Framework for Human Mobility Generation	Qiumeng Li et.al.	2510.24802
2025-10-26	SwiftSolve: A Self-Iterative, Complexity-Aware Multi-Agent Framework for Competitive Programming	Adhyayan Veer Singh et.al.	2510.22626
2025-10-23	Accelerating Mobile Language Model via Speculative Decoding and NPU-Coordinated Execution	Zhiyang Chen et.al.	2510.15312
2025-10-22	PTFA: An LLM-based Agent that Facilitates Online Consensus Building through Parallel Thinking	Wen Gu et.al.	2503.12499
2025-10-21	The Trust Paradox in LLM-Based Multi-Agent Systems: When Collaboration Becomes a Security Vulnerability	Zijie Xu et.al.	2510.18563
2025-10-19	STARK: Strategic Team of Agents for Refining Kernels	Juncheng Dong et.al.	2510.16996
2025-10-18	Ripple Effect Protocol: Coordinating Agent Populations	Ayush Chopra et.al.	2510.16572
2025-10-16	Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies	Mason Nakamura et.al.	2510.14312
2025-10-15	Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving	Nikos Pagonas et.al.	2510.14126
2025-10-14	Evaluating the Quality of Randomness and Entropy in Tasks Supported by Large Language Models	Rabimba Karanjai et.al.	2510.12080
2025-10-13	Part II: ROLL Flash – Accelerating RLVR and Agentic Training with Asynchrony	Han Lu et.al.	2510.11345
2025-10-13	ShishuLM: Lightweight Language Model with Hybrid Decoder-MLP Architecture and Paired Weight Sharing	Shivanshu Kumar et.al.	2510.13860
2025-10-13	StreamAgent: Towards Anticipatory Agents for Streaming Video Understanding	Haolin Yang et.al.	2508.01875
2025-10-11	Agentic Troubleshooting Guide Automation for Incident Management	Jiayi Mao et.al.	2510.10074
2025-10-10	OrcaLoca: An LLM Agent Framework for Software Issue Localization	Zhongming Yu et.al.	2502.00350
2025-10-10	StreamingVLM: Real-Time Understanding for Infinite Video Streams	Ruyi Xu et.al.	2510.09608
2025-10-08	FLEET: Formal Language-Grounded Scheduling for Heterogeneous Robot Teams	Corban Rivera et.al.	2510.07417
2025-10-06	Multi-Agent Collaborative Intelligence: Dual-Dial Control for Reliable LLM Reasoning	Edward Y. Chang et.al.	2510.04488
2025-10-06	ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering	Yuki Imajuku et.al.	2506.09050_(NeurIPS)
2025-10-03	Automatic Building Code Review: A Case Study	Hanlong Wan et.al.	2510.02634
2025-10-01	GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness	Kung-Hsiang Huang et.al.	2510.00536
2025-09-30	Towards Agentic OS: An LLM Agent Framework for Linux Schedulers	Yusheng Zheng et.al.	2509.01245
2025-09-30	AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges	Ranjan Sapkota et.al.	2505.10468
2025-09-27	Runtime Adaptive Pruning for LLM Inference	Huanrong Liu et.al.	2505.17138
2025-09-26	LLM Assisted Alpha Fairness for 6 GHz WiFi and NR_U Coexistence: An Agentic Orchestrator for Throughput, Energy, and SLA	Qun Wang et.al.	2510.17814
2025-09-26	ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration	Gaole Dai et.al.	2509.21823
2025-09-25	Nova: Real-Time Agentic Vision-Language Model Serving with Adaptive Cross-Stage Parallelization	Yuhang Xu et.al.	2509.21301
2025-09-24	CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks	Jiewei Chen et.al.	2509.19855
2025-09-20	Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games	Niv Eckhaus et.al.	2506.05309
2025-09-19	Overhearing LLM Agents: A Survey, Taxonomy, and Roadmap	Andrew Zhu et.al.	2509.16325
2025-09-17	CrowdAgent: Multi-Agent Managed Multi-Source Annotation System	Maosheng Qin et.al.	2509.14030
2025-08-30	LLM-Assisted Iterative Evolution with Swarm Intelligence Toward SuperBrain	Li Weigang et.al.	2509.00510
2025-08-29	Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward	Yong Deng et.al.	2508.12800
2025-08-20	Entropy-Constrained Strategy Optimization in Urban Floods: A Multi-Agent Framework with LLM and Knowledge Graph Integration	Peilin Ji et.al.	2508.14654
2025-08-18	Datarus-R1: An Adaptive Multi-Step Reasoning LLM for Automated Data Analysis	Ayoub Ben Chaliah et.al.	2508.13382
2025-08-18	AutoChemSchematic AI: Agentic Physics-Aware Automation for Chemical Manufacturing Scale-Up	Sakhinana Sagar Srinivas et.al.	2505.24584
2025-08-11	From Natural Language to Solver-Ready Power System Optimization: An LLM-Assisted, Validation-in-the-Loop Framework	Yunkai Hu et.al.	2508.08147
2025-08-09	Kairos: Low-latency Multi-Agent Serving with Shared LLMs and Excessive Loads in the Public Cloud	Jinyuan Chen et.al.	2508.06948
2025-08-06	AquaChat++: LLM-Assisted Multi-ROV Inspection for Aquaculture Net Pens with Integrated Battery Management and Thruster Fault Tolerance	Abdelhaleem Saad et.al.	2508.06554
2025-08-05	REALM-Bench: A Benchmark for Evaluating Multi-Agent Systems on Real-world, Dynamic Planning and Scheduling Tasks	Longling Geng et.al.	2502.18836
2025-08-01	CyGATE: Game-Theoretic Cyber Attack-Defense Engine for Patch Strategy Optimization	Yuning Jiang et.al.	2508.00478
2025-07-29	Forecasting LLM Inference Performance via Hardware-Agnostic Analytical Modeling	Rajeev Patwari et.al.	2508.00904
2025-07-29	StaffPro: an LLM Agent for Joint Staffing and Profiling	Alessio Maritan et.al.	2507.21636
2025-07-21	LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra	Seth Karten et.al.	2507.15815
2025-07-18	DREAMS: Density Functional Theory Based Research Engine for Agentic Materials Simulation	Ziqi Wang et.al.	2507.14267
2025-07-18	CodeEdu: A Multi-Agent Collaborative Platform for Personalized Coding Education	Jianing Zhao et.al.	2507.13814
2025-07-14	DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving	Yuhan Liu et.al.	2411.02820
2025-07-10	KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows	Zaifeng Pan et.al.	2507.07400
2025-07-09	Gradientsys: A Multi-Agent LLM Scheduler with ReAct Orchestration	Xinyuan Song et.al.	2507.06520
2025-07-07	StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling	Meng Wei et.al.	2507.05240
2025-06-28	FairMarket-RL: LLM-Guided Fairness Shaping for Multi-Agent Reinforcement Learning in Peer-to-Peer Markets	Shrenik Jadhav et.al.	2506.22708
2025-06-26	CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation	Nicolas Bougie et.al.	2506.21805
2025-06-26	MobiVerse: Scaling Urban Mobility Simulation with Hybrid Lightweight Domain-Specific Generator and Large Language Models	Yifan Liu et.al.	2506.21784
2025-06-25	MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation	Gurusha Juneja et.al.	2506.20737
2025-06-16	AlphaEvolve: A coding agent for scientific and algorithmic discovery	Alexander Novikov et.al.	2506.13131
2025-06-11	SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems	Peiran Li et.al.	2506.07564
2025-06-09	DeepServe: Serverless Large Language Model Serving at Scale	Junhao Hu et.al.	2501.14417
2025-06-07	EconEvals: Benchmarks and Litmus Tests for LLM Agents in Unknown Environments	Sara Fish et.al.	2503.18825
2025-06-04	AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance	Dhaval Patel et.al.	2506.03828
2025-06-01	A Survey of LLM $\times$ DATA	Xuanhe Zhou et.al.	2505.18458
2025-05-28	Design and testing of an agent chatbot supporting decision making with public transport data	Luca Fantin et.al.	2505.22698
2025-05-26	Task Memory Engine: Spatial Memory for Robust Multi-Step LLM Agents	Ye Ye et.al.	2505.19436
2025-05-23	Curriculum Guided Reinforcement Learning for Efficient Multi Hop Retrieval Augmented Generation	Yuelyu Ji et.al.	2505.17391
2025-05-20	Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation	Peter Baile Chen et.al.	2505.14398
2025-05-19	Learning Virtual Machine Scheduling in Cloud Computing through Language Agents	JieHao Wu et.al.	2505.10117
2025-05-18	ALAS: A Stateful Multi-LLM Agent Framework for Disruption-Aware Planning	Edward Y. Chang et.al.	2505.12501
2025-05-17	Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents	Tiannuo Yang et.al.	2505.12065
2025-05-17	OptimAI: Optimization from Natural Language Using LLM-Powered AI Agents	Raghav Thind et.al.	2504.16918
2025-04-24	Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents	Yueying Li et.al.	2504.07347
2025-04-21	PLANET: A Collection of Benchmarks for Evaluating LLMs’ Planning Capabilities	Haoming Li et.al.	2504.14773
2025-04-02	MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding	Ranajoy Sadhukhan et.al.	2408.11049
2025-04-01	Personality-Driven Decision-Making in LLM-Based Autonomous Agents	Lewis Newsham et.al.	2504.00727
2025-04-01	HERA: Hybrid Edge-cloud Resource Allocation for Cost-Efficient AI Agents	Shiyi Liu et.al.	2504.00434
2025-03-25	Agent-Initiated Interaction in Phone UI Automation	Noam Kahlon et.al.	2503.19537
2025-03-19	Exploring Large Language Models for Word Games: Who is the Spy?	Chentian Wei et.al.	2503.15235
2025-03-12	COLA: A Scalable Multi-Agent Framework For Windows UI Task Automation	Di Zhao et.al.	2503.09263
2025-03-11	LLM4MAC: An LLM-Driven Reinforcement Learning Framework for MAC Protocol Emergence	Renxuan Tan et.al.	2503.08123
2025-03-05	Pretrained LLMs as Real-Time Controllers for Robot Operated Serial Production Line	Muhammad Waseem et.al.	2503.03889
2025-02-28	ARIES: Autonomous Reasoning with LLMs on Interactive Thought Graph Environments	Pedro Gimenes et.al.	2502.21208
2025-02-27	TripCraft: A Benchmark for Spatio-Temporally Fine Grained Travel Planning	Soumyabrata Chaudhuri et.al.	2502.20508
2025-02-20	Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents	Axel Backlund et.al.	2502.15840
2025-02-20	Plan-over-Graph: Towards Parallelable LLM Agent Schedule	Shiqi Zhang et.al.	2502.14563
2025-02-19	Autellix: An Efficient Serving Engine for LLM Agents as General Programs	Michael Luo et.al.	2502.13965
2025-02-16	Streamlining the Collaborative Chain of Models into A Single Forward Pass in Generation-Based Tasks	Yuanjie Lyu et.al.	2502.11083
2025-02-06	Division-of-Thoughts: Harnessing Hybrid Language Model Synergy for Efficient On-Device Agents	Chenyang Shao et.al.	2502.04392
2025-01-29	MACI: Multi-Agent Collaborative Intelligence for Adaptive Reasoning and Temporal Planning	Edward Y. Chang et.al.	2501.16689
2025-01-27	LLM-powered Multi-agent Framework for Goal-oriented Learning in Intelligent Tutoring System	Tianfu Wang et.al.	2501.15749_(WWW)
2025-01-14	CuAsmRL: Optimizing GPU SASS Schedules via Deep Reinforcement Learning	Guoliang He et.al.	2501.08071_(CGO)
2024-12-24	TimelyLLM: Segmented LLM Serving System for Time-sensitive Robotic Applications	Neiwen Ling et.al.	2412.18695
2024-12-21	SYMPHONY: Improving Memory Management for LLM Inference Workloads	Saurabh Agarwal et.al.	2412.16434
2024-11-02	NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference	Xuanlin Jiang et.al.	2411.01142
2024-06-06	SGLang: Efficient Execution of Structured Language Model Programs	Lianmin Zheng et.al.	2312.07104
2024-05-14	Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis	Yao Fu et.al.	2405.08944