LLM Papers

Updated on 2025.08.02

Publish Date Title Authors PDF Code
2025-07-23 KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider Jiahao Wang et.al. 2506.02634 link
2025-07-23 GTA: Grouped-head latenT Attention Luoyang Sun et.al. 2506.17286 null
2025-07-23 DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training Zhixin Wang et.al. 2507.13833 null
2025-07-23 Learning to Extract Rational Evidence via Reinforcement Learning for Retrieval-Augmented Generation Xinping Zhao et.al. 2507.15586 null
2025-07-23 Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning Yanjun Zheng et.al. 2507.16802 null
2025-07-23 WAKENLLM: Evaluating Reasoning Potential and Stability in LLMs via Fine-Grained Benchmarking Zipeng Ling et.al. 2507.16199 null
2025-07-23 HyDRA: A Hybrid-Driven Reasoning Architecture for Verifiable Knowledge Graphs Adrian Kaiser et.al. 2507.15917 null
2025-07-23 Thinking Isn’t an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations Zhao Song et.al. 2507.17699 null
2025-07-23 Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks Ilias Chatzistefanidis et.al. 2507.17695 null
2025-07-23 CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning Lingxiao Tang et.al. 2507.17548 null
2025-07-23 Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning Yu Li et.al. 2507.17512 null
2025-07-23 An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models Haoran Sun et.al. 2507.17477 null
2025-07-23 MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs Alexander R. Fabbri et.al. 2507.17476 null
2025-07-23 Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning Situo Zhang et.al. 2507.17448 null
2025-07-23 HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs Zhaolin Cai et.al. 2507.17394 null
2025-07-23 DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning Chuzhan Hao et.al. 2507.17365 null
2025-07-23 R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning Zhuokun Chen et.al. 2507.17307 null
2025-07-23 Triple X: A LLM-Based Multilingual Speech Recognition System for the INTERSPEECH2025 MLC-SLM Challenge Miaomiao Gao et.al. 2507.17288 null
2025-07-23 Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance Rishi Parekh et.al. 2507.17273 null
2025-07-23 Agent Identity Evals: Measuring Agentic Identity Elija Perrier et.al. 2507.17257 null
2025-07-23 R4ec: A Reasoning, Reflection, and Refinement Framework for Recommendation Systems Hao Gu et.al. 2507.17249 null
2025-07-23 HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery Haoran Jiang et.al. 2507.17209 null
2025-07-23 Improving LLMs’ Generalized Reasoning Abilities by Graph Problems Qifan Zhang et.al. 2507.17168 null
2025-07-23 CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards Cheng Liu et.al. 2507.17147 null
2025-07-23 Resilient Multi-Agent Negotiation for Medical Supply Chains: Integrating LLMs and Blockchain for Transparent Coordination Mariam ALMutairi et.al. 2507.17134 null
2025-07-22 CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning Xiaoya Li et.al. 2507.14111 null
2025-07-22 Reasoning Models Can be Easily Hacked by Fake Reasoning Bias Qian Wang et.al. 2507.13758 null
2025-07-22 Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters Shanbo Cheng et.al. 2507.13618 null
2025-07-22 Gemini 2.5 Pro Capable of Winning Gold at IMO 2025 Yichen Huang et.al. 2507.15855 null
2025-07-22 X-Intelligence 3.0: Training and Evaluating Reasoning LLM for Semiconductor Display Xiaolin Yan et.al. 2507.14430 null
2025-07-22 Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Hongyin Luo et.al. 2507.16784 null
2025-07-22 WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding Ran Wang et.al. 2507.16768 null
2025-07-22 Towards Compute-Optimal Many-Shot In-Context Learning Shahriar Golchin et.al. 2507.16217 null
2025-07-22 ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Chi-Pin Huang et.al. 2507.16815 null
2025-07-22 LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs Da-Chen Lian et.al. 2507.16809 null
2025-07-22 When LLMs Copy to Think: Uncovering Copy-Guided Attacks in Reasoning LLMs Yue Li et.al. 2507.16773 null
2025-07-22 P-CoT: A Pedagogically-motivated Participatory Chain-of-Thought Prompting for Phonological Reasoning in LLMs Dongjun Jang et.al. 2507.16656 null
2025-07-22 Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications Jean Lelong et.al. 2507.16507 null
2025-07-22 Learning Temporal Abstractions via Variational Homomorphisms in Option-Induced Abstract MDPs Chang Li et.al. 2507.16473 null
2025-07-22 LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning Bo Hou et.al. 2507.16395 null
2025-07-22 Re:Form – Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny Chuanhao Yan et.al. 2507.16331 null
2025-07-22 Mind the Gap: Evaluating the Representativeness of Quantitative Medical Language Reasoning LLM Benchmarks for African Disease Burdens Fred Mutisya et.al. 2507.16322 null
2025-07-22 Perovskite-R1: A Domain-Specialized LLM for Intelligent Discovery of Precursor Additives and Experimental Design Xin-De Wang et.al. 2507.16307 null
2025-07-22 Do Large Language Models Have a Planning Theory of Mind? Evidence from MindGames: a Multi-Step Persuasion Task Jared Moore et.al. 2507.16196 null
2025-07-22 Emergent Cognitive Convergence via Implementation: A Structured Loop Reflecting Four Theories of Mind (A Position Paper) Myung Ho Kim et.al. 2507.16184 null
2025-07-22 LoRA is All You Need for Safety Alignment of Reasoning LLMs Yihao Xue et.al. 2507.17075 null
2025-07-22 Controllable Hybrid Captioner for Improved Long-form Video Understanding Kuleen Sasse et.al. 2507.17047 null
2025-07-22 Text-to-SPARQL Goes Beyond English: Multilingual Question Answering Over Knowledge Graphs through Human-Inspired Reasoning Aleksandr Perevalov et.al. 2507.16971 null
2025-07-22 AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation Nima Fathi et.al. 2507.16940 null
2025-07-22 CausalStep: A Benchmark for Explicit Stepwise Causal Reasoning in Videos Xuchen Li et.al. 2507.16878 null
2025-07-21 A Survey of Context Engineering for Large Language Models Lingrui Mei et.al. 2507.13334 null
2025-07-21 The Impact of Language Mixing on Bilingual LLM Reasoning Yihao Li et.al. 2507.15849 null
2025-07-21 Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning Sneheel Sarangi et.al. 2507.15788 null
2025-07-21 Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR Jiakang Wang et.al. 2507.15778 null
2025-07-21 A Framework for Analyzing Abnormal Emergence in Service Ecosystems Through LLM-based Agent Intention Mining Yifan Shen et.al. 2507.15770 null
2025-07-21 Understanding Large Language Models’ Ability on Interdisciplinary Research Yuanhao Shen et.al. 2507.15736 null
2025-07-21 BEnchmarking LLMs for Ophthalmology (BELO) for Ophthalmological Knowledge and Reasoning Sahana Srinivasan et.al. 2507.15717 null
2025-07-21 Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked? Seok Hwan Song et.al. 2507.15707 null
2025-07-21 CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models Congmin Zheng et.al. 2507.15698 null
2025-07-21 P3: Prompts Promote Prompting Xinyu Zhang et.al. 2507.15675 null
2025-07-21 BugScope: Learn to Find Bugs Like Human Jinyao Guo et.al. 2507.15671 null
2025-07-21 PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors Yimeng Chen et.al. 2507.15550 null
2025-07-21 LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning Cole Robertson et.al. 2507.15521 null
2025-07-21 Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language Models Kaiyan Chang et.al. 2507.15512 null
2025-07-21 AlgoSimBench: Identifying Algorithmically Similar Problems for Competitive Programming Jierui Li et.al. 2507.15378 null
2025-07-21 StackTrans: From Large Language Model to Large Pushdown Automata Model Kechi Zhang et.al. 2507.15343 null
2025-07-21 Reasoning Models are Test Exploiters: Rethinking Multiple-Choice Narun Raman et.al. 2507.15337 null
2025-07-21 Input Reduction Enhanced LLM-based Program Repair Boyang Yang et.al. 2507.15251 null
2025-07-21 SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search Xiaofeng Shi et.al. 2507.15245 null
2025-07-21 FaultLine: Automated Proof-of-Vulnerability Generation Using LLM Agents Vikram Nitin et.al. 2507.15241 null
2025-07-21 Solving Formal Math Problems by Decomposition and Iterative Reflection Yichi Zhou et.al. 2507.15225 null
2025-07-21 Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization Shengchao Liu et.al. 2507.16110 null
2025-07-21 Deep Researcher with Test-Time Diffusion Rujun Han et.al. 2507.16075 null
2025-07-21 Learning without training: The implicit dynamics of in-context learning Benoit Dherin et.al. 2507.16003 null
2025-07-21 Does More Inference-Time Compute Really Help Robustness? Tong Wu et.al. 2507.15974 null
2025-07-20 Lizard: An Efficient Linearization Framework for Large Language Models Chien Van Nguyen et.al. 2507.09025 null
2025-07-20 Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback Yiyuan Yang et.al. 2507.15066 null
2025-07-20 WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization Zhengwei Tao et.al. 2507.15061 null
2025-07-20 Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding Yuanhan Zhang et.al. 2507.15028 null
2025-07-20 RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback Qiaoyu Tang et.al. 2507.15024 null
2025-07-20 EduThink4AI: Translating Educational Critical Thinking into Multi-Agent LLM Systems Xinmeng Hou et.al. 2507.15015 null
2025-07-20 AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning Yi Zhang et.al. 2507.14987 null
2025-07-20 MUR: Momentum Uncertainty guided Reasoning for Large Language Models Hang Yan et.al. 2507.14958 null
2025-07-20 LEKIA: A Framework for Architectural Alignment via Expert Knowledge Injection Boning Zhao et.al. 2507.14944 null
2025-07-20 Feedback-Induced Performance Decline in LLM-Based Decision-Making Xiao Yang et.al. 2507.14906 null
2025-07-20 InsightX Agent: An LMM-based Agentic Framework with Integrated Tools for Reliable X-ray NDT Analysis Jiale Liu et.al. 2507.14899 null
2025-07-20 MEKiT: Multi-source Heterogeneous Knowledge Injection Method via Instruction Tuning for Emotion-Cause Pair Extraction Shiyi Mu et.al. 2507.14887 null
2025-07-20 Large Language Model as An Operator: An Experience-Driven Solution for Distribution Network Voltage Control Xu Yang et.al. 2507.14800 null
2025-07-20 Exploring the In-Context Learning Capabilities of LLMs for Money Laundering Detection in Financial Graphs Erfan Pirmorad et.al. 2507.14785 null
2025-07-20 LeAdQA: LLM-Driven Context-Aware Temporal Grounding for Video Question Answering Xinxin Dong et.al. 2507.14784 null
2025-07-20 Omni-Think: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards Derek Li et.al. 2507.14783 null
2025-07-19 Draft-based Approximate Inference for LLMs Kevin Galim et.al. 2506.08373 link
2025-07-19 Dynamic Context Tuning for Retrieval-Augmented Generation: Enhancing Multi-Turn Planning and Tool Adaptation Jubin Abhishek Soni et.al. 2506.11092 null
2025-07-19 Mind the Gap: A Review of Arabic Post-Training Datasets and Their Limitations Mohammed Alkhowaiter et.al. 2507.14688 null
2025-07-19 Agentic Satellite-Augmented Low-Altitude Economy and Terrestrial Networks: A Survey on Generative Approaches Xiaozheng Gao et.al. 2507.14633 null
2025-07-19 Retrieval-Augmented Clinical Benchmarking for Contextual Model Testing in Kenyan Primary Care: A Methodology Paper Fred Mutisya et.al. 2507.14615 null
2025-07-19 What do Large Language Models know about materials? Adrian Ehrenhofer et.al. 2507.14586 null
2025-07-19 Explainable Collaborative Problem Solving Diagnosis with BERT using SHAP and its Implications for Teacher Adoption Kester Wong et.al. 2507.14584 null
2025-07-19 Amico: An Event-Driven Modular Framework for Persistent and Embedded Autonomy Hongyi Yang et.al. 2507.14513 null
2025-07-18 LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues Haoyang Li et.al. 2507.13681 null
2025-07-18 DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration Xiyun Li et.al. 2507.14088 null
2025-07-18 Efficient Temporal Tokenization for Mobility Prediction with Large Language Models Haoyu He et.al. 2507.14017 null
2025-07-18 DUALRec: A Hybrid Sequential and Language Model Framework for Context-Aware Movie Recommendation Yitong Li et.al. 2507.13957 null
2025-07-18 Cross-modal Causal Intervention for Alzheimer’s Disease Prediction Yutao Jin et.al. 2507.13956 null
2025-07-18 InTraVisTo: Inside Transformer Visualisation Tool Nicolò Brunello et.al. 2507.13858 null
2025-07-18 Team of One: Cracking Complex Video QA with Model Synergy Jun Xie et.al. 2507.13820 null
2025-07-18 Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques Niveen O. Jaffal et.al. 2507.13629 null
2025-07-18 BifrostRAG: Bridging Dual Knowledge Graphs for Multi-Hop Question Answering in Construction Safety Yuxin Zhang et.al. 2507.13625 null
2025-07-18 Fail Fast, or Ask: Mitigating the Deficiencies of Reasoning LLMs with Human-in-the-Loop Systems Engineering Michael J. Zellinger et.al. 2507.14406 null
2025-07-18 NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers Sarunas Kalade et.al. 2507.14403 null
2025-07-18 NetIntent: Leveraging Large Language Models for End-to-End Intent-Based SDN Automation Md. Kamrul Hossain et.al. 2507.14398 null
2025-07-18 ProofCompass: Enhancing Specialized Provers with LLM Guidance Nicolas Wischermann et.al. 2507.14335 null
2025-07-18 How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMs Karin de Langis et.al. 2507.14307 null
2025-07-18 A Simple “Try Again” Can Elicit Multi-Turn LLM Reasoning Licheng Liu et.al. 2507.14295 null
2025-07-18 Impact of Code Context and Prompting Strategies on Automated Unit Test Generation with Modern General-Purpose Large Language Models Jakub Walczak et.al. 2507.14256 null
2025-07-17 LLM-Driven Dual-Level Multi-Interest Modeling for Recommendation Ziyan Wang et.al. 2507.10917 null
2025-07-17 MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks Artem Chervyakov et.al. 2507.12284 null
2025-07-17 Aime: Towards Fully-Autonomous Multi-Agent Framework Yexuan Shi et.al. 2507.11988 null
2025-07-17 VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding Shihao Wang et.al. 2507.13353 null
2025-07-17 VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Senqiao Yang et.al. 2507.13348 null
2025-07-17 Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes Tyler Loakman et.al. 2507.13335 null
2025-07-17 The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner Zhouqi Hua et.al. 2507.13332 null
2025-07-17 QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation Jiazheng Li et.al. 2507.13266 null
2025-07-17 HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models Ashray Gupta et.al. 2507.13238 null
2025-07-17 Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities Hao Sun et.al. 2507.13158 null
2025-07-17 SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models Xiangyu Dong et.al. 2507.13152 null
2025-07-17 MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems Yu Cui et.al. 2507.13038 null
2025-07-17 Probabilistic Soundness Guarantees in LLM Reasoning Chains Weiqiu You et.al. 2507.12948 null
2025-07-17 Agentar-DeepFinance-300K: A Large-Scale Financial Dataset via Systematic Chain-of-Thought Synthesis Optimization Xiaoke Zhao et.al. 2507.12901 null
2025-07-17 VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks Jian Yao et.al. 2507.12885 null
2025-07-17 DEMONSTRATE: Zero-shot Language to Robotic Control via Multi-task Demonstration Learning Rahel Rickenbach et.al. 2507.12855 null
2025-07-17 A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models Weijieying Ren et.al. 2507.12774 null
2025-07-17 osmAG-LLM: Zero-Shot Open-Vocabulary Object Navigation via Semantic Maps and Large Language Models Reasoning Fujing Xie et.al. 2507.12753 null
2025-07-17 TransEvalnia: Reasoning-based Evaluation and Ranking of Translations Richard Sproat et.al. 2507.12724 null
2025-07-17 Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation Genki Kusano et.al. 2507.13525 null
2025-07-17 Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers Liang Lin et.al. 2507.13474 null
2025-07-17 Intent-Based Network for RAN Management with Large Language Models Fransiscus Asisi Bimo et.al. 2507.14230 null
2025-07-17 Why Braking? Scenario Extraction and Reasoning Utilizing LLM Yin Wu et.al. 2507.15874 null
2025-07-16 Simple Mechanistic Explanations for Out-Of-Context Reasoning Atticus Wang et.al. 2507.08218 null
2025-07-16 The Challenge of Teaching Reasoning to LLMs Without RL or Distillation Wei Du et.al. 2507.09850 null
2025-07-16 Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs Yangning Li et.al. 2507.09477 null
2025-07-16 Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize? Yanjian Zhang et.al. 2507.11423 null
2025-07-16 GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning Ziru Liu et.al. 2507.10628 null
2025-07-16 IAM: Efficient Inference through Attention Mapping between Different-scale LLMs Yi Zhao et.al. 2507.11953 null
2025-07-16 Assessing the Value of Visual Input: A Benchmark of Multimodal Large Language Models for Robotic Path Planning Jacinto Colan et.al. 2507.12391 null
2025-07-16 Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics Meysam Alizadeh et.al. 2507.12372 null
2025-07-16 Thought Purity: Defense Paradigm For Chain-of-Thought Attack Zihao Xue et.al. 2507.12314 null
2025-07-16 Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning Yuhao Chen et.al. 2507.12215 null
2025-07-16 Findings of MEGA: Maths Explanation with LLMs using the Socratic Method for Active Learning Tosin Adewumi et.al. 2507.12079 null
2025-07-16 Evaluating the Ability of Large Language Models to Reason about Cardinal Directions, Revisited Anthony G Cohn et.al. 2507.12059 null
2025-07-16 Watch, Listen, Understand, Mislead: Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation Sahid Hossain Mustakim et.al. 2507.11968 null
2025-07-16 PoTPTQ: A Two-step Power-of-Two Post-training for LLMs Xinyu Wang et.al. 2507.11959 null
2025-07-16 The benefits of query-based KGQA systems for complex and temporal questions in LLM era Artem Alekseev et.al. 2507.11954 null
2025-07-16 Hyperphantasia: A Benchmark for Evaluating the Mental Visualization Capabilities of Multimodal LLMs Mohammad Shahab Sepehri et.al. 2507.11932 null
2025-07-16 Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training Mingjie Liu et.al. 2507.12507 null
2025-07-16 PARAM-1 BharatGen 2.9B Model Kundeshwar Pundalik et.al. 2507.13390 null
2025-07-15 ContextCache: Context-Aware Semantic Cache for Multi-Turn Queries in Large Language Models Jianxin Yan et.al. 2506.22791 null
2025-07-15 VerifyBench: A Systematic Benchmark for Evaluating Reasoning Verifiers Across Domains Xuzhao Li et.al. 2507.09884 null
2025-07-15 Mixture of LoRA Experts with Multi-Modal and Multi-Granularity LLM Generative Error Correction for Accented Speech Recognition Bingshen Mu et.al. 2507.09116 null
2025-07-15 Bridging Literature and the Universe Via A Multi-Agent Large Language Model System Xiaowen Zhang et.al. 2507.08958 null
2025-07-15 MIRAGE: KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving Ruihao Li et.al. 2507.11507 null
2025-07-15 KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding Luohe Shi et.al. 2507.11273 null
2025-07-15 How Many Instructions Can LLMs Follow at Once? Daniel Jaroslawicz et.al. 2507.11538 null
2025-07-15 DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering Yinsheng Li et.al. 2507.11527 null
2025-07-15 Modeling Code: Is Text All You Need? Daniel Nichols et.al. 2507.11467 null
2025-07-15 LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer Yaoxian Dong et.al. 2507.11457 null
2025-07-15 KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? Soumadeep Saha et.al. 2507.11408 null
2025-07-15 DCR: Quantifying Data Contamination in LLMs Evaluation Cheng Xu et.al. 2507.11405 null
2025-07-15 Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs Gabriel Bo et.al. 2507.11371 null
2025-07-15 Guiding LLM Decision-Making with Fairness Reward Models Zara Hall et.al. 2507.11344 null
2025-07-15 LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification Fengxiao Tang et.al. 2507.11310 null
2025-07-15 Taming Uncertainty via Automation: Observing, Analyzing, and Optimizing Agentic AI Systems Dany Moshkovich et.al. 2507.11277 null
2025-07-15 FMC: Formalization of Natural Language Mathematical Competition Problems Jiaxuan Xie et.al. 2507.11275 null
2025-07-15 An Agentic Flow for Finite State Machine Extraction using Prompt Chaining Fares Wael et.al. 2507.11222 null
2025-07-15 LLM-Augmented Symptom Analysis for Cardiovascular Disease Risk Prediction: A Clinical NLP Haowei Yang et.al. 2507.11052 null
2025-07-15 Teach Me Sign: Stepwise Prompting LLM for Sign Language Production Zhaoyi An et.al. 2507.10972 null
2025-07-15 Modeling Understanding of Story-Based Analogies Using Large Language Models Kalit Inani et.al. 2507.10957 null
2025-07-15 Artificial Finance: How AI Thinks About Money Orhan Erdem et.al. 2507.10933 null
2025-07-15 Evaluating Generated Commit Messages with Large Language Models Qunhong Zeng et.al. 2507.10906 null
2025-07-15 General Modular Harness for LLM Agents in Multi-Turn Gaming Environments Yuxuan Zhang et.al. 2507.11633 null
2025-07-14 InstCache: A Predictive Cache for LLM Serving Longwei Zou et.al. 2411.13820 null
2025-07-14 DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving Yuhan Liu et.al. 2411.02820 null
2025-07-14 GR-LLMs: Recent Advances in Generative Recommendation Based on Large Language Models Zhen Yang et.al. 2507.06507 null
2025-07-14 PyVision: Agentic Vision with Dynamic Tooling Shitian Zhao et.al. 2507.07998 null
2025-07-14 Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code Keqin Bao et.al. 2507.07498 null
2025-07-14 ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism Zedong Liu et.al. 2507.10069 null
2025-07-14 Fusing LLM Capabilities with Routing Data Tao Feng et.al. 2507.10540 null
2025-07-14 CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks Hongchao Jiang et.al. 2507.10535 null
2025-07-14 Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Mingqi Wu et.al. 2507.10532 null
2025-07-14 DeepResearch$^{\text{Eco}}$: A Recursive Agentic Workflow for Complex Scientific Question Answering in Ecology Jennifer D’Souza et.al. 2507.10522 null
2025-07-14 Referential ambiguity and clarification requests: comparing human and LLM behaviour Chris Madge et.al. 2507.10445 null
2025-07-14 Prompt Informed Reinforcement Learning for Visual Coverage Path Planning Venkat Margapuri et.al. 2507.10284 null
2025-07-14 Toward Real-World Table Agents: Capabilities, Workflows, and Design Principles for LLM-based Table Intelligence Jiaming Tian et.al. 2507.10281 null
2025-07-14 Breaking the Myth: Can Small Models Infer Postconditions Too? Gehao Zhang et.al. 2507.10182 null
2025-07-14 Fusing Large Language Models with Temporal Transformers for Time Series Forecasting Chen Su et.al. 2507.10098 null
2025-07-14 Foundation Model Driven Robotics: A Comprehensive Review Muhammad Tayyab Khan et.al. 2507.10087 null
2025-07-14 LLMShot: Reducing snapshot testing maintenance via LLMs Ergün Batuhan Kaynak et.al. 2507.10062 null
2025-07-14 Towards Applying Large Language Models to Complement Single-Cell Foundation Models Steven Palayew et.al. 2507.10039 null
2025-07-14 Deep Hidden Cognition Facilitates Reliable Chain-of-Thought Reasoning Zijun Chen et.al. 2507.10007 null
2025-07-14 DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models Luolin Xiong et.al. 2507.09955 null
2025-07-14 Mechanistic Interpretability of LoRA-Adapted Language Models for Nuclear Reactor Safety Applications Yoon Pyo Lee et.al. 2507.09931 null
2025-07-14 ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models Yongheng Zhang et.al. 2507.09876 null
2025-07-14 Warehouse Spatial Question Answering with LLM Agent Hsiang-Wei Huang et.al. 2507.10778 null
2025-07-14 ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning Zhengyue Zhao et.al. 2507.11500 null
2025-07-14 Enhancing the Capabilities of Large Language Models for API calls through Knowledge Graphs Ye Yang et.al. 2507.10630 null
2025-07-14 Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning Zheng Zhang et.al. 2507.10624 null
2025-07-14 Game Theory Meets LLM and Agentic AI: Reimagining Cybersecurity for the Age of Intelligent Threats Quanyan Zhu et.al. 2507.10621 null
2025-07-14 Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding Ishraq Khan et.al. 2507.12482 null
2025-07-14 LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models Dachuan Shi et.al. 2507.14204 null
2025-07-13 Perception-Aware Policy Optimization for Multimodal Reasoning Zhenhailong Wang et.al. 2507.06448 null
2025-07-13 Prompting for Performance: Exploring LLMs for Configuring Software Helge Spieker et.al. 2507.09790 null
2025-07-13 Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations Bradley P. Allen et.al. 2507.09751 null
2025-07-13 Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces Baturay Saglam et.al. 2507.09709 null
2025-07-13 Can AI Rely on the Systematicity of Truth? The Challenge of Modelling Normative Domains Matthieu Queloz et.al. 2507.09676 null
2025-07-13 Can Group Relative Policy Optimization Improve Thai Legal Reasoning and Question Answering? Pawitsapak Akarajaradwong et.al. 2507.09638 null
2025-07-13 AICrypto: A Comprehensive Benchmark For Evaluating Cryptography Capabilities of Large Language Models Yu Wang et.al. 2507.09580 null
2025-07-13 Reframing SAR Target Recognition as Visual Reasoning: A Chain-of-Thought Dataset with Multimodal LLMs Chaoran Li et.al. 2507.09535 null
2025-07-13 Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them Neel Rajani et.al. 2507.10616 null
2025-07-12 DrugMCTS: a drug repurposing framework combining multi-agent, RAG and Monte Carlo Tree Search Zerui Yang et.al. 2507.07426 null
2025-07-12 LLM-Stackelberg Games: Conjectural Reasoning Equilibria and Their Applications to Spearphishing Quanyan Zhu et.al. 2507.09407 null
2025-07-12 StockSim: A Dual-Mode Order-Level Simulator for Evaluating Multi-Agent LLMs in Financial Markets Charidimos Papadakis et.al. 2507.09255 null
2025-07-12 Towards Spatial Audio Understanding via Question Answering Parthasaarathy Sudarsanam et.al. 2507.09195 null
2025-07-12 Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models Ameen Ali et.al. 2507.09185 null
2025-07-12 OPENXRD: A Comprehensive Benchmark and Enhancement Framework for LLM/MLLM XRD Question Answering Ali Vosoughi et.al. 2507.09155 null
2025-07-12 CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards Taolin Zhang et.al. 2507.09104 null
2025-07-12 Learning from Synthetic Labs: Language Models as Auction Participants Anand Shah et.al. 2507.09083 null
2025-07-12 Emergence of Hierarchical Emotion Organization in Large Language Models Bo Zhao et.al. 2507.10599 null
2025-07-12 PLEX: Perturbation-free Local Explanations for LLM-Based Text Classification Yogachandran Rahulamathavan et.al. 2507.10596 null
2025-07-12 LLM-Powered Quantum Code Transpilation Nazanin Siavash et.al. 2507.12480 null
2025-07-11 Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model Jing Liang et.al. 2507.06892 null
2025-07-11 StarDojo: Benchmarking Open-Ended Behaviors of Agentic Multimodal LLMs in Production-Living Simulations with Stardew Valley Weihao Tan et.al. 2507.07445 null
2025-07-11 InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching Yilun Wang et.al. 2507.08523 null
2025-07-11 xpSHACL: Explainable SHACL Validation using Retrieval-Augmented Generation and Large Language Models Gustavo Correa Publio et.al. 2507.08432 null
2025-07-11 One Token to Fool LLM-as-a-Judge Yulai Zhao et.al. 2507.08794 null
2025-07-11 ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way Rajarshi Roy et.al. 2507.08679 null
2025-07-11 Introspection of Thought Helps AI Agents Haoran Sun et.al. 2507.08664 null
2025-07-11 Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning Xingguang Ji et.al. 2507.08649 null
2025-07-11 A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1 Marcin Pietroń et.al. 2507.08621 null
2025-07-11 Agentic Large Language Models for Conceptual Systems Engineering and Design Soheyl Massoudi et.al. 2507.08619 null
2025-07-11 AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs Florian Grötschla et.al. 2507.08616 null
2025-07-11 AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling Preslav Aleksandrov et.al. 2507.08567 null
2025-07-11 The AI Language Proficiency Monitor – Tracking the Progress of LLMs on Multilingual Benchmarks David Pomerenke et.al. 2507.08538 null
2025-07-11 From Language to Logic: A Bi-Level Framework for Structured Reasoning Keying Yang et.al. 2507.08501 null
2025-07-11 LLaPa: A Vision-Language Model Framework for Counterfactual-Aware Procedural Planning Shibo Sun et.al. 2507.08496 null
2025-07-11 Using Large Language Models for Legal Decision-Making in Austrian Value-Added Tax Law: An Experimental Study Marina Luketina et.al. 2507.08468 null
2025-07-11 ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains Zilu Dong et.al. 2507.08427 null
2025-07-11 Understanding Driving Risks using Large Language Models: Toward Elderly Driver Assessment Yuki Yoshihara et.al. 2507.08367 null
2025-07-11 What Factors Affect LLMs and RLLMs in Financial Question Answering? Peng Wang et.al. 2507.08339 null
2025-07-11 Agent Safety Alignment via Reinforcement Learning Zeyang Sha et.al. 2507.08270 null
2025-07-11 A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning Hiroshi Yoshihara et.al. 2507.08267 null
2025-07-11 InsightBuild: LLM-Powered Causal Reasoning in Smart Building Systems Pinaki Prasad Guha Neogi et.al. 2507.08235 null
2025-07-11 Making VLMs More Robot-Friendly: Self-Critical Distillation of Low-Level Procedural Reasoning Chan Young Park et.al. 2507.08224 null
2025-07-11 OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique Wasi Uddin Ahmad et.al. 2507.09075 null
2025-07-11 Infinite Video Understanding Dell Zhang et.al. 2507.09068 null
2025-07-11 ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making Bharadwaj Ravichandran et.al. 2507.09037 null
2025-07-11 How to Train a Leader: Hierarchical Reasoning in Multi-Agent LLMs Andrew Estornell et.al. 2507.08960 null
2025-07-11 GraphRunner: A Multi-Stage Framework for Efficient and Accurate Graph-Based Retrieval Savini Kashmira et.al. 2507.08945 null
2025-07-11 Optimizing Sequential Multi-Step Tasks with Parallel LLM Agents Enhao Zhang et.al. 2507.08944 null
2025-07-11 From Sequence to Structure: Uncovering Substructure Reasoning in Transformers Xinnan Dai et.al. 2507.10435 null
2025-07-11 An Offline Mobile Conversational Agent for Mental Health Support: Learning from Emotional Dialogues and Psychological Texts with Student-Centered Evaluation Vimaleswar A et.al. 2507.10580 null
2025-07-11 Can Large Language Models Understand As Well As Apply Patent Regulations to Pass a Hands-On Patent Attorney Test? Bhakti Khera et.al. 2507.10576 null
2025-07-10 Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs Jiakun Fan et.al. 2506.03296 null
2025-07-10 A Survey on Latent Reasoning Rui-Jie Zhu et.al. 2507.06203 null
2025-07-10 Skywork-R1V3 Technical Report Wei Shen et.al. 2507.06167 null
2025-07-10 Rethinking Verification for LLM Code Generation: From Generation to Testing Zihan Ma et.al. 2507.06920 null
2025-07-10 Shifting from Ranking to Set Selection for Retrieval Augmented Generation Dahyun Lee et.al. 2507.06838 null
2025-07-10 Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs Jeongseok Hyun et.al. 2507.07990 null
2025-07-10 KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows Zaifeng Pan et.al. 2507.07400 null
2025-07-10 Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs Ziyue Li et.al. 2507.07996 null
2025-07-10 Automating Expert-Level Medical Reasoning Evaluation of Large Language Models Shuang Zhou et.al. 2507.07988 null
2025-07-10 MIRIX: Multi-Agent Memory System for LLM-Based Agents Yu Wang et.al. 2507.07957 null
2025-07-10 DocCHA: Towards LLM-Augmented Interactive Online diagnosis System Xinyi Liu et.al. 2507.07870 null
2025-07-10 MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving Lu Xu et.al. 2507.07818 null
2025-07-10 SURPRISE3D: A Dataset for Spatial Understanding and Reasoning in Complex 3D Scenes Jiaxin Huang et.al. 2507.07781 null
2025-07-10 When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance Peizhang Shao et.al. 2507.07748 null
2025-07-10 Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization Chengtao Jian et.al. 2507.07723 null
2025-07-10 Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought Shin’ya Yamaguchi et.al. 2507.07685 null
2025-07-10 PlanQA: A Benchmark for Spatial Reasoning in LLMs using Structured Representations Fedor Rodionov et.al. 2507.07644 null
2025-07-10 Position: We Need An Algorithmic Understanding of Generative AI Oliver Eberle et.al. 2507.07544 null
2025-07-10 PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving Mihir Parmar et.al. 2507.07495 null
2025-07-10 RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning Hongzhi Zhang et.al. 2507.07451 null
2025-07-10 SAND: Boosting LLM Agents with Self-Taught Action Deliberation Yu Xia et.al. 2507.07441 null
2025-07-10 Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores Vivek Chari et.al. 2507.08143 null
2025-07-10 Krul: Efficient State Restoration for Multi-turn Conversations with Dynamic Cross-layer KV Sharing Junyi Wen et.al. 2507.08045 null
2025-07-10 Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions Quanyan Zhu et.al. 2507.08208 null
2025-07-10 CTRLS: Chain-of-Thought Reasoning via Latent State-Transition Junda Wu et.al. 2507.08182 null
2025-07-10 TableReasoner: Advancing Table Reasoning Framework with Large Language Models Sishi Xiong et.al. 2507.08046 null
2025-07-09 Saffron-1: Safety Inference Scaling Ruizhong Qiu et.al. 2506.06444 link
2025-07-09 Can LLMs Play Ô Ăn Quan Game? A Study of Multi-Step Planning and Decision Making Sang Quang Nguyen et.al. 2507.03711 null
2025-07-09 FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models Bo Pang et.al. 2507.06057 null
2025-07-09 Measuring how changes in code readability attributes affect code quality evaluation by Large Language Models Igor Regis da Silva Simoes et.al. 2507.05289 null
2025-07-09 SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference Qian Chen et.al. 2507.06567 null
2025-07-09 SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep Layers Zicong Tang et.al. 2507.06517 null
2025-07-09 Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor Vatsal Agarwal et.al. 2507.07106 null
2025-07-09 Evaluating Large Multimodal Models for Nutrition Analysis: A Benchmark Enriched with Contextual Metadata Bruce Coburn et.al. 2507.07048 null
2025-07-09 First Return, Entropy-Eliciting Explore Tianyu Zheng et.al. 2507.07017 null
2025-07-09 Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs Yahan Yu et.al. 2507.06999 null
2025-07-09 Are They All Good? Evaluating the Quality of CoTs in LLM-based Code Generation Binquan Zhang et.al. 2507.06980 null
2025-07-09 Adaptive Termination for Multi-round Parallel Reasoning: An Universal Semantic Entropy-Guided Framework Zenan Xu et.al. 2507.06829 null
2025-07-09 PenTest2.0: Towards Autonomous Privilege Escalation Using GenAI Haitham S. Al-Sinani et.al. 2507.06742 null
2025-07-09 A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding Zhenyang Liu et.al. 2507.06719 null
2025-07-09 From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization Xinjie Chen et.al. 2507.06573 null
2025-07-09 Gradientsys: A Multi-Agent LLM Scheduler with ReAct Orchestration Xinyuan Song et.al. 2507.06520 null
2025-07-09 Towards LLM-based Root Cause Analysis of Hardware Design Failures Siyu Qiu et.al. 2507.06512 null
2025-07-09 Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning Ziyang Wang et.al. 2507.06485 null
2025-07-09 Bridging the Plausibility-Validity Gap by Fine-Tuning a Reasoning-Enhanced LLM for Chemical Synthesis and Discovery Malikussaid et.al. 2507.07328 null
2025-07-09 Frontier LLMs Still Struggle with Simple Reasoning Tasks Alan Malek et.al. 2507.07313 null
2025-07-09 ViDove: A Translation Agent System with Multimodal Context and Memory-Augmented Reasoning Yichen Lu et.al. 2507.07306 null
2025-07-09 CRISP: Complex Reasoning with Interpretable Step-based Plans Matan Vetzler et.al. 2507.08037 null
2025-07-09 Integrating External Tools with Large Language Models to Improve Accuracy Nripesh Niketan et.al. 2507.08034 null
2025-07-09 RAG Safety: Exploring Knowledge Poisoning Attacks to Retrieval-Augmented Generation Tianzhe Zhao et.al. 2507.08862 null
2025-07-08 Activation Steering for Chain-of-Thought Compression Seyedarmin Azizi et.al. 2507.04742 null
2025-07-08 MemOS: A Memory OS for AI System Zhiyu Li et.al. 2507.03724 null
2025-07-08 Coding Triangle: How Does Large Language Model Understand Code? Taolin Zhang et.al. 2507.06138 null
2025-07-08 PrefixAgent: An LLM-Powered Design Framework for Efficient Prefix Adder Optimization Dongsheng Zuo et.al. 2507.06127 null
2025-07-08 Hierarchical Interaction Summarization and Contrastive Prompting for Explainable Recommendations Yibin Liu et.al. 2507.06044 null
2025-07-08 Conditional Multi-Stage Failure Recovery for Embodied Agents Youmna Farag et.al. 2507.06016 null
2025-07-08 CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation Kushal Gajjar et.al. 2507.06013 null
2025-07-08 DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations Nicholas Popovič et.al. 2507.05997 null
2025-07-08 Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval Haiwen Li et.al. 2507.05970 null
2025-07-08 Current Practices for Building LLM-Powered Reasoning Tools Are Ad Hoc – and We Can Do Better Aaron Bembenek et.al. 2507.05886 null
2025-07-08 KERAG_R: Knowledge-Enhanced Retrieval-Augmented Generation for Recommendation Zeyuan Meng et.al. 2507.05863 null
2025-07-08 Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models Léa Dubois et.al. 2507.05822 null
2025-07-08 Creating a customisable freely-accessible Socratic AI physics tutor Eugenio Tufino et.al. 2507.05795 null
2025-07-08 LeAD: The LLM Enhanced Planning System Converged with End-to-end Autonomous Driving Yuhang Zhang et.al. 2507.05754 null
2025-07-08 ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark He Wang et.al. 2507.05727 null
2025-07-08 Large Language Models for Agent-Based Modelling: Current and possible uses across the modelling cycle Loïs Vanhée et.al. 2507.05723 null
2025-07-08 LLMs are Introvert Litian Zhang et.al. 2507.05638 null
2025-07-08 Flipping Knowledge Distillation: Leveraging Small Models’ Expertise to Enhance LLMs in Text Matching Mingzhe Li et.al. 2507.05617 null
2025-07-08 Structured Task Solving via Modular Embodied Intelligence: A Case Study on Rubik’s Cube Chongshan Fan et.al. 2507.05607 null
2025-07-08 MLlm-DR: Towards Explainable Depression Recognition with MultiModal Large Language Models Wei Zhang et.al. 2507.05591 null
2025-07-08 ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models Jiaxu Tian et.al. 2507.05568 null
2025-07-08 Enhancing Test-Time Scaling of Large Language Models with Hierarchical Retrieval-Augmented MCTS Alex ZH Dou et.al. 2507.05557 null
2025-07-08 An Ensemble Embedding Approach for Improving Semantic Caching Performance in LLM-based Systems Shervin Ghaffari et.al. 2507.07061 null
2025-07-08 Exploring Task Performance with Interpretable Models via Sparse Auto-Encoders Shun Wang et.al. 2507.06427 null
2025-07-08 Bridging AI and Software Security: A Comparative Vulnerability Assessment of LLM Agent Deployment Paradigms Tarek Gasmi et.al. 2507.06323 null
2025-07-08 Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate A. Bochkov et.al. 2507.07129 null
2025-07-08 “Amazing, They All Lean Left” – Analyzing the Political Temperaments of Current LLMs W. Russell Neuman et.al. 2507.08027 null
2025-07-07 Eka-Eval: A Comprehensive Evaluation Framework for Large Language Models in Indian Languages Samridhi Raj Sinha et.al. 2507.01853 null
2025-07-07 StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Meng Wei et.al. 2507.05240 null
2025-07-07 The Case for Instance-Optimized LLMs in OLAP Databases Bardia Mohammadi et.al. 2507.04967 null
2025-07-07 Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation Daichi Mukunoki et.al. 2507.04697 null
2025-07-07 Spatio-Temporal LLM: Reasoning about Environments and Actions Haozhen Zheng et.al. 2507.05258 null
2025-07-07 Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions Yuanzhe Hu et.al. 2507.05257 null
2025-07-07 Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Yana Wei et.al. 2507.05255 null
2025-07-07 From Fragments to Facts: A Curriculum-Driven DPO Approach for Generating Hindi News Veracity Explanations Pulkit Bansal et.al. 2507.05179 null
2025-07-07 CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale Jonathan Hyun et.al. 2507.05178 null
2025-07-07 VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots Danil S. Grigorev et.al. 2507.05118 null
2025-07-07 From Autonomy to Agency: Agentic Vehicles for Human-Centered Mobility Systems Jiangbo Yu et.al. 2507.04996 null
2025-07-07 MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction Kaleem Ullah Qasim et.al. 2507.04893 null
2025-07-07 Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations A. Bochkov et.al. 2507.04886 null
2025-07-07 FurniMAS: Language-Guided Furniture Decoration using Multi-Agent System Toan Nguyen et.al. 2507.04770 null
2025-07-07 ABench-Physics: Benchmarking Physical Reasoning in LLMs via High-Difficulty and Dynamic Physics Problems Yiming Zhang et.al. 2507.04766 null
2025-07-07 Large Language Models for Network Intrusion Detection Systems: Foundations, Implementations, and Future Directions Shuo Yang et.al. 2507.04752 null
2025-07-07 LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction Sungmin Lee et.al. 2507.04748 null
2025-07-07 Why We Feel What We Feel: Joint Detection of Emotions and Their Opinion Triggers in E-commerce Arnav Attri et.al. 2507.04708 null
2025-07-07 UrbanMind: Towards Urban General Intelligence via Tool-Enhanced Retrieval-Augmented Generation and Multilevel Optimization Kai Yang et.al. 2507.04706 null
2025-07-07 Trojan Horse Prompting: Jailbreaking Conversational Multimodal Models by Forging Assistant Message Wei Duan et.al. 2507.04673 null
2025-07-07 VectorLLM: Human-like Extraction of Structured Building Contours vis Multimodal LLMs Tao Zhang et.al. 2507.04664 null
2025-07-07 Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? Yun Qu et.al. 2507.04632 null
2025-07-07 Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences Yusong Zhang et.al. 2507.04621 null
2025-07-07 “Lost-in-the-Later”: Framework for Quantifying Contextual Grounding in Large Language Models Yufei Tao et.al. 2507.05424 null
2025-07-07 Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning Jaedong Hwang et.al. 2507.05418 null
2025-07-07 On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study Riccardo Alberghi et.al. 2507.05362 null
2025-07-07 MindFlow: Revolutionizing E-commerce Customer Support with Multimodal LLM Agents Ming Gong et.al. 2507.05330 null
2025-07-07 Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving Zhenwen Liang et.al. 2507.06804 null
2025-07-07 DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning Shreyas Vinaya Sathyanarayana et.al. 2507.07060 null
2025-07-07 Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding Nidhi Bhatia et.al. 2507.07120 null
2025-07-06 KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs Yuzhang Xie et.al. 2507.02773 null
2025-07-06 ESSA: Evolutionary Strategies for Scalable Alignment Daria Korotyshova et.al. 2507.04453 null
2025-07-06 SFOOD: A Multimodal Benchmark for Comprehensive Food Attribute Analysis Beyond RGB with Spectral Insights Zhenbo Xu et.al. 2507.04412 null
2025-07-06 LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers Jingze Zhu et.al. 2507.04404 null
2025-07-06 Computed Tomography Visual Question Answering with Cross-modal Feature Graphing Yuanhe Tian et.al. 2507.04333 null
2025-07-06 LearnLens: LLM-Enabled Personalised, Curriculum-Grounded Feedback with Educators in the Loop Runcong Zhao et.al. 2507.04295 null
2025-07-06 AutoLayout: Closed-Loop Layout Synthesis via Slow-Fast Collaborative Reasoning Weixing Chen et.al. 2507.04293 null
2025-07-06 M$^3$-Med: A Benchmark for Multi-lingual, Multi-modal, and Multi-hop Reasoning in Medical Instructional Video Understanding Shenxi Liu et.al. 2507.04289 null
2025-07-05 SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs Jiahui Wang et.al. 2506.05344 link
2025-07-05 SymbolicThought: Integrating Language Models and Symbolic Reasoning for Consistent and Interpretable Human Relationship Understanding Runcong Zhao et.al. 2507.04189 null
2025-07-05 From Legal Text to Tech Specs: Generative AI’s Interpretation of Consent in Privacy Law Aniket Kesari et.al. 2507.04185 null
2025-07-05 Dissecting Clinical Reasoning in Language Models: A Comparative Study of Prompts and Model Adaptation Strategies Mael Jullien et.al. 2507.04142 null
2025-07-05 A Technical Survey of Reinforcement Learning Techniques for Large Language Models Saksham Sahai Srivastava et.al. 2507.04136 null
2025-07-05 BYOKG-RAG: Multi-Strategy Graph Retrieval for Knowledge Graph Question Answering Costas Mavromatis et.al. 2507.04127 null
2025-07-05 Beyond Independent Passages: Adaptive Passage Combination Retrieval for Retrieval Augmented Open-Domain Question Answering Ting-Wen Ko et.al. 2507.04069 null
2025-07-05 LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models Gaurav Srivastava et.al. 2507.04023 null
2025-07-05 Nunchi-Bench: Benchmarking Language Models on Cultural Reasoning with a Focus on Korean Superstition Kyuhee Kim et.al. 2507.04014 null
2025-07-05 Toward Better Generalisation in Uncertainty Estimators: Leveraging Data-Agnostic Features Thuy An Ha et.al. 2507.03998 null
2025-07-05 CoT-Segmenter: Enhancing OOD Detection in Dense Road Scenes via Chain-of-Thought Reasoning Jeonghyo Song et.al. 2507.03984 null
2025-07-05 A Comparative Study of Specialized LLMs as Dense Retrievers Hengran Zhang et.al. 2507.03958 null
2025-07-05 CortexDebate: Debating Sparsely and Equally for Multi-Agent Debate Yiliu Sun et.al. 2507.03928 null
2025-07-05 A Survey on Proactive Defense Strategies Against Misinformation in Large Language Models Shuliang Liu et.al. 2507.05288 null
2025-07-04 Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought Tencent Hunyuan Team et.al. 2505.15431 null
2025-07-04 Economic Evaluation of LLMs Michael J. Zellinger et.al. 2507.03834 null
2025-07-04 Agent-Based Detection and Resolution of Incompleteness and Ambiguity in Interactions with Large Language Models Riya Naik et.al. 2507.03726 null
2025-07-04 Towards Machine Theory of Mind with Large Language Model-Augmented Inverse Planning Rebekah A. Gelpí et.al. 2507.03682 null
2025-07-04 Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs Valentina Wu et.al. 2507.03659 null
2025-07-04 EvoAgentX: An Automated Framework for Evolving Agentic Workflows Yingxu Wang et.al. 2507.03616 null
2025-07-04 Benchmarking Vector, Graph and Hybrid Retrieval Augmented Generation (RAG) Pipelines for Open Radio Access Networks (ORAN) Sarat Ahmad et.al. 2507.03608 null
2025-07-04 Causal-SAM-LLM: Large Language Models as Causal Reasoners for Robust Medical Segmentation Tao Tang et.al. 2507.03585 null
2025-07-04 AI-VaxGuide: An Agentic RAG-Based LLM for Vaccination Decisions Abdellah Zeggai et.al. 2507.03493 null
2025-07-04 REAL: Benchmarking Abilities of Large Language Models for Housing Transactions and Services Kexin Zhu et.al. 2507.03477 null
2025-07-04 ElliottAgents: A Natural Language-Driven Multi-Agent System for Stock Market Analysis and Prediction Jarosław A. Chudziak et.al. 2507.03435 null
2025-07-04 Graph Repairs with Large Language Models: An Empirical Study Hrishikesh Terdalkar et.al. 2507.03410 null
2025-07-04 Effects of structure on reasoning in instance-level Self-Discover Sachith Gunasekara et.al. 2507.03347 null
2025-07-04 Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky Ashutosh Hathidara et.al. 2507.03336 null
2025-07-04 Read Quietly, Think Aloud: Decoupling Comprehension and Reasoning in LLMs Yuanxin Wang et.al. 2507.03327 null
2025-07-04 LTLCrit: A Temporal Logic-based LLM Critic for Safe and Efficient Embodied Agents Anand Gokhale et.al. 2507.03293 null
2025-07-04 CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs Bruce Yang et.al. 2507.03254 null
2025-07-04 Efficient Knowledge Graph Construction and Retrieval from Unstructured Text for Large-Scale RAG Systems Congmin Min et.al. 2507.03226 null
2025-07-03 Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding Chengyue Wu et.al. 2505.22618 null
2025-07-03 Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Ziyue Li et.al. 2506.21551 null
2025-07-03 Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations Wenhao Wang et.al. 2507.01930 null
2025-07-03 Self-Guided Process Reward Optimization with Redefined Step-wise Advantage for Process Reinforcement Learning Wu Fei et.al. 2507.01551 null
2025-07-03 Symbolic or Numerical? Understanding Physics Problem Solving in Reasoning LLMs Nifu Dan et.al. 2507.01334 null
2025-07-03 Mixture of Reasonings: Teach Large Language Models to Reason with Adaptive Strategies Tao Xiong et.al. 2507.00606 null
2025-07-03 OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding Ramchalam Kinattinkara Ramakrishnan et.al. 2507.02659 null
2025-07-03 Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation Jiaer Xia et.al. 2507.02859 null
2025-07-03 MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs Purbesh Mitra et.al. 2507.02851 null
2025-07-03 StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason Kaiyi Zhang et.al. 2507.02841 null
2025-07-03 SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model Wencheng Zhang et.al. 2507.02822 null
2025-07-03 Multimodal Mathematical Reasoning with Diverse Solving Perspective Wenhao Shi et.al. 2507.02804 null
2025-07-03 Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models Riccardo Cantini et.al. 2507.02799 null
2025-07-03 Moral Responsibility or Obedience: What Do We Want from AI? Joseph Boland et.al. 2507.02788 null
2025-07-03 Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs Ken Tsui et.al. 2507.02778 null
2025-07-03 Knowledge Protocol Engineering: A New Paradigm for AI in Domain-Specific Knowledge Work Guangwei Zhang et.al. 2507.02760 null
2025-07-03 Early Signs of Steganographic Capabilities in Frontier LLMs Artur Zolkowski et.al. 2507.02737 null
2025-07-03 Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving Matthieu Zimmer et.al. 2507.02726 null
2025-07-03 Control at Stake: Evaluating the Security Landscape of LLM-Driven Email Agents Jiangrong Wu et.al. 2507.02699 null
2025-07-03 VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning Siran Chen et.al. 2507.02626 null
2025-07-03 Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory Kenneth Payne et.al. 2507.02618 null
2025-07-03 DynamiCare: A Dynamic Multi-Agent Framework for Interactive and Open-Ended Medical Decision-Making Tianqi Shang et.al. 2507.02616 null
2025-07-03 WebSailor: Navigating Super-human Reasoning for Web Agent Kuan Li et.al. 2507.02592 null
2025-07-03 Clarifying Before Reasoning: A Coq Prover with Structural Context Yanzhen Lu et.al. 2507.02541 null
2025-07-03 CyberRAG: An agentic RAG cyber attack classification and reporting tool Francesco Blefari et.al. 2507.02424 null
2025-07-03 OMS: On-the-fly, Multi-Objective, Self-Reflective Ad Keyword Generation via LLM Agent Bowen Chen et.al. 2507.02353 null
2025-07-03 Misaligned from Within: Large Language Models Reproduce Our Double-Loop Learning Blindness Tim Rogers et.al. 2507.02283 null
2025-07-03 Uncertainty-aware Reward Design Process Yang Yang et.al. 2507.02256 null
2025-07-03 Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation Jungkoo Kang et.al. 2507.02253 null
2025-07-03 HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference Weishu Deng et.al. 2507.03153 null
2025-07-03 RCA Copilot: Transforming Network Data into Actionable Insights via Large Language Models Alexander Shan et.al. 2507.03224 null
2025-07-03 MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks Dumitran Adrian Marius et.al. 2507.03162 null
2025-07-03 ReliableMath: Benchmark of Reliable Mathematical Reasoning on Large Language Models Boyang Xue et.al. 2507.03133 null
2025-07-03 RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents Peisong Wang et.al. 2507.03112 null
2025-07-03 Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization Marco Simoni et.al. 2507.03051 null
2025-07-03 Counterfactual Tuning for Temporal Sensitivity Enhancement in Large Language Model-based Recommendation Yutian Liu et.al. 2507.03047 null
2025-07-02 Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU He Sun et.al. 2506.20187 null
2025-07-02 EdgeLoRA: An Efficient Multi-Tenant LLM Serving System on Edge Devices Zheyu Shen et.al. 2507.01438 null
2025-07-02 The Thin Line Between Comprehension and Persuasion in LLMs Adrian de Wynter et.al. 2507.01936 null
2025-07-02 AI4Research: A Survey of Artificial Intelligence for Scientific Research Qiguang Chen et.al. 2507.01903 null
2025-07-02 MiCoTA: Bridging the Learnability Gap with Intermediate CoT and Teacher Assistants Dongyi Ding et.al. 2507.01887 null
2025-07-02 Bridging UI Design and chatbot Interactions: Applying Form-Based Principles to Conversational Agents Sanjay Krishna Anbalagan et.al. 2507.01862 null
2025-07-02 Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training Ismail Labiad et.al. 2507.01752 null
2025-07-02 Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture Bochen Han et.al. 2507.01701 null
2025-07-02 Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling Zeyu Huang et.al. 2507.01679 null
2025-07-02 Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems Zhaoyan Sun et.al. 2507.01599 null
2025-07-02 Is External Information Useful for Stance Detection with LLMs? Quang Minh Nguyen et.al. 2507.01543 null
2025-07-02 SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism Beitao Chen et.al. 2507.01513 null
2025-07-02 Agent-as-Tool: A Study on the Hierarchical Decision Making with Reinforcement Learning Yanfei Zhang et.al. 2507.01489 null
2025-07-02 BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments Yibo Qiu et.al. 2507.01485 null
2025-07-02 A Large Language Model for Chemistry and Retrosynthesis Predictions Yueqing Zhang et.al. 2507.01444 null
2025-07-02 RALLY: Role-Adaptive LLM-Driven Yoked Navigation for Agentic UAV Swarms Ziyao Wang et.al. 2507.01378 null
2025-07-02 AI Agents and Agentic AI-Navigating a Plethora of Concepts for Future Manufacturing Yinwang Ren et.al. 2507.01376 null
2025-07-02 Beyond Black-Box AI: Interpretable Hybrid Systems for Dementia Care Matthew JY Kang et.al. 2507.01282 null
2025-07-02 Evaluating Large Language Models for Multimodal Simulated Ophthalmic Decision-Making in Diabetic Retinopathy and Glaucoma Screening Cindy Lie Tabuse et.al. 2507.01278 null
2025-07-02 Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess Dongyoon Hwang et.al. 2507.00726 null
2025-07-02 $μ^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation Siyou Li et.al. 2507.00316 null
2025-07-02 Data Diversification Methods In Alignment Enhance Math Performance In LLMs Berkan Dokmeci et.al. 2507.02173 null
2025-07-02 Synergizing Logical Reasoning, Knowledge Management and Collaboration in Multi-Agent LLM System Adam Kostka et.al. 2507.02170 null
2025-07-02 Reasoning or Not? A Comprehensive Evaluation of Reasoning LLMs for Dialogue Summarization Keyan Jin et.al. 2507.02145 null
2025-07-02 Structural Code Search using Natural Language Queries Ben Limpanukorn et.al. 2507.02107 null
2025-07-02 Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs Mohammad Ali Alomrani et.al. 2507.02076 null
2025-07-02 Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges Sanjeda Akter et.al. 2507.02074 null
2025-07-01 SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Bo Liu et.al. 2506.24119 null
2025-07-01 PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning Xingke Yang et.al. 2507.01216 null
2025-07-01 FlashDP: Private Training Large Language Models with Efficient DP-SGD Liangyu Wang et.al. 2507.01154 null
2025-07-01 VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator Zhican Wang et.al. 2507.00797 null
2025-07-01 EARN: Efficient Inference Acceleration for LLM-based Generative Recommendation by Register Tokens Chaoqun Yang et.al. 2507.00715 null
2025-07-01 Reasoning as an Adaptive Defense for Safety Taeyoun Kim et.al. 2507.00971 null
2025-07-01 Large Language Model Powered Intelligent Urban Agents: Concepts, Capabilities, and Applications Jindong Han et.al. 2507.00914 null
2025-07-01 Mathematics Isn’t Culture-Free: Probing Cultural Gaps via Entity and Scenario Perturbations Aditya Tomar et.al. 2507.00883 null
2025-07-01 HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning Zhi Jing et.al. 2507.00833 null
2025-07-01 ProxAnn: Use-Oriented Evaluations of Topic Models and Document Clustering Alexander Hoyle et.al. 2507.00828 null
2025-07-01 Many LLMs Are More Utilitarian Than One Anita Keshmirian et.al. 2507.00814 null
2025-07-01 Language-Unlocked ViT (LUViT): Empowering Self-Supervised Vision Transformers with LLMs Selim Kuzucu et.al. 2507.00754 null
2025-07-01 AI Analyst: Framework and Comprehensive Evaluation of Large Language Models for Financial Time Series Report Generation Elizabeth Fons et.al. 2507.00718 null
2025-07-01 Large Reasoning Models are not thinking straight: on the unreliability of thinking trajectories Jhouben Cuesta-Ramirez et.al. 2507.00711 null
2025-07-01 Toward Edge General Intelligence with Multiple-Large Language Model (Multi-LLM): Architecture, Trust, and Orchestration Haoxiang Luo et.al. 2507.00672 null
2025-07-01 Cognitive Load-Aware Inference: A Neuro-Symbolic Framework for Optimizing the Token Economy of Large Language Models Yilun Zhang et.al. 2507.00653 null
2025-07-01 ChatHLS: Towards Systematic Design Automation and Optimization for High-Level Synthesis Runkai Li et.al. 2507.00642 null
2025-07-01 Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Maggie Huan et.al. 2507.00432 null
2025-07-01 ASTRO: Teaching Language Models to Reason by Reflecting and Backtracking In-Context Joongwon Kim et.al. 2507.00417 null
2025-07-01 Causal Prompting for Implicit Sentiment Analysis with Large Language Models Jing Ren et.al. 2507.00389 null
2025-07-01 STELLA: Self-Evolving LLM Agent for Biomedical Research Ruofan Jin et.al. 2507.02004 null
2025-07-01 Dynamic Strategy Adaptation in Multi-Agent Environments with Large Language Models Shaurya Mallampati et.al. 2507.02002 null
2025-06-30 RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference Yaoqi Chen et.al. 2505.02922 null
2025-06-30 The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements Bingchen Zhao et.al. 2506.22419 null
2025-06-30 EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework Chen Wang et.al. 2506.22200 null
2025-06-30 Large Language Models Don’t Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective Anselm R. Strohmaier et.al. 2506.24006 null
2025-06-30 Advancing Multi-Step Mathematical Reasoning in Large Language Models through Multi-Layered Self-Reflection with Auto-Prompting André de Souza Loureiro et.al. 2506.23888 null
2025-06-30 Garbage In, Reasoning Out? Why Benchmark Scores are Unreliable and What to Do About It Seyed Mahed Mousavi et.al. 2506.23864 null
2025-06-30 A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents Hang Su et.al. 2506.23844 null
2025-06-30 DABstep: Data Agent Benchmark for Multi-step Reasoning Alex Egg et.al. 2506.23719 null
2025-06-30 If You Had to Pitch Your Ideal Software – Evaluating Large Language Models to Support User Scenario Writing for User Experience Experts and Laypersons Patrick Stadler et.al. 2506.23694 null
2025-06-30 PokéAI: A Goal-Generating, Battle-Optimizing Multi-agent System for Pokemon Red Zihao Liu et.al. 2506.23689 null
2025-06-30 Interactive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language Models Rock Yuren Pang et.al. 2506.23678 null
2025-06-30 Act-With-Think: Chunk Auto-Regressive Modeling for Generative Recommendation Yifan Wang et.al. 2506.23643 null
2025-06-30 What to Keep and What to Drop: Adaptive Table Filtering Framework Jang Won June et.al. 2506.23463 null
2025-06-30 Two-Stage Reasoning-Infused Learning: Improving Classification with LLM-Generated Reasoning Mads Henrichsen et.al. 2507.00214 null
2025-06-30 Thinking About Thinking: SAGE-nano’s Inverse Reasoning for Self-Aware Language Models Basab Jha et.al. 2507.00092 null
2025-06-30 State and Memory is All You Need for Robust and Reliable AI Agents Matthew Muhoberac et.al. 2507.00081 null
2025-06-29 Comparative Evaluation of ChatGPT and DeepSeek Across Key NLP Tasks: Strengths, Weaknesses, and Domain-Specific Performance Wael Etaiwi et.al. 2506.18501 null
2025-06-29 Do LLMs Dream of Discrete Algorithms? Claudionor Coelho Jr et.al. 2506.23408 null
2025-06-29 GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields Shunsuke Yasuki et.al. 2506.23352 null
2025-06-29 Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games David Guzman Piedrahita et.al. 2506.23276 null
2025-06-29 Predicting thinking time in Reasoning models Hans Peter Lynsgøe Raaschou-jensen et.al. 2506.23274 null
2025-06-29 Token Activation Map to Visually Explain Multimodal LLMs Yi Li et.al. 2506.23270 null
2025-06-29 Benchmarking Deep Search over Heterogeneous Enterprise Data Prafulla Kumar Choubey et.al. 2506.23139 null
2025-06-29 Format-Adapter: Improving Reasoning Capability of LLMs by Adapting Suitable Format Dingzirui Wang et.al. 2506.23133 null
2025-06-29 Are Large Language Models Capable of Deep Relational Reasoning? Insights from DeepSeek-R1 and Benchmark Comparisons Chi Chiu So et.al. 2506.23128 null
2025-06-29 Decoding Memes: Benchmarking Narrative Role Classification across Multilingual and Multimodal Models Shivam Sharma et.al. 2506.23122 null
2025-06-29 Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation Zhenhua Ning et.al. 2506.23120 null
2025-06-29 Repair Ingredients Are All You Need: Improving Large Language Model-Based Program Repair via Repair Ingredients Search Jiayi Zhang et.al. 2506.23100 null
2025-06-29 Boosting LLM’s Molecular Structure Elucidation with Knowledge Enhanced Tree Search Reasoning Xiang Zhuang et.al. 2506.23056 null
2025-06-29 AURA: Agent for Understanding, Reasoning, and Automated Tool Use in Voice-Driven Tasks Leander Melroy Maben et.al. 2506.23049 null
2025-06-28 Efficiently Serving Large Multimodal Models Using EPD Disaggregation Gursimran Singh et.al. 2501.05460 link
2025-06-28 Agent-to-Agent Theory of Mind: Testing Interlocutor Awareness among Large Language Models Younwoo Choi et.al. 2506.22957 null
2025-06-28 Evaluating and Improving Large Language Models for Competitive Program Generation Minnan Wei et.al. 2506.22954 null
2025-06-28 Improving Rationality in the Reasoning Process of Language Models through Self-playing Game Pinzheng Wang et.al. 2506.22920 null
2025-06-28 ReasonBridge: Efficient Reasoning Transfer from Closed to Open-Source Language Models Ziqi Zhong et.al. 2506.22865 null
2025-06-28 Prompting without Panic: Attribute-aware, Zero-shot, Test-Time Calibration Ramya Hebbalaguppe et.al. 2506.22819 null
2025-06-27 Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference Yaohua Tang et.al. 2502.15294 null
2025-06-27 Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research Team Weilun Yu et.al. 2506.18348 null
2025-06-27 SegChange-R1: LLM-Augmented Remote Sensing Change Detection Fei Zhou et.al. 2506.17944 null
2025-06-27 KunLunBaizeRAG: Reinforcement Learning Driven Inference Performance Leap for Large Language Models Cheng Li et.al. 2506.19466 null
2025-06-27 QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization Danush Khanna et.al. 2506.22396 null
2025-06-27 SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference Yongchao He et.al. 2506.22033 null
2025-06-27 A Survey of LLM Inference Systems James Pan et.al. 2506.21901 null
2025-06-27 Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment Yue Zhang et.al. 2506.22385 null
2025-06-27 Probabilistic Optimality for Inference-time Scaling Youkang Wang et.al. 2506.22376 null
2025-06-27 Concept-Level AI for Telecom: Moving Beyond Large Language Models Viswanath Kumarskandpriya et.al. 2506.22359 null
2025-06-27 Training Language Model to Critique for Better Refinement Tianshu Yu et.al. 2506.22157 null
2025-06-27 Lost at the Beginning of Reasoning Baohao Liao et.al. 2506.22058 null
2025-06-27 LMPVC and Policy Bank: Adaptive voice control for industrial robots with code generating LLMs and reusable Pythonic policies Ossi Parikka et.al. 2506.22028 null
2025-06-27 Literature-Grounded Novelty Assessment of Scientific Ideas Simra Shahid et.al. 2506.22026 null
2025-06-27 More Vulnerable than You Think: On the Stability of Tool-Integrated LLM Agents Weimin Xiong et.al. 2506.21967 null
2025-06-27 CAL-RAG: Retrieval-Augmented Multi-Agent Generation for Content-Aware Layout Design Najmeh Forouzandehmehr et.al. 2506.21934 null
2025-06-27 ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation Reza Yousefi Maragheh et.al. 2506.21931 null
2025-06-27 SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding Zhao Jin et.al. 2506.21924 null
2025-06-27 URSA: The Universal Research and Scientific Agent Michael Grosskopf et.al. 2506.22653 null
2025-06-27 ReCo: Reminder Composition Mitigates Hallucinations in Vision-Language Models Sotirios Panagiotis Chytas et.al. 2506.22636 null
2025-06-27 The Hidden Link Between RLHF and Contrastive Learning Xufei Lv et.al. 2506.22578 null
2025-06-27 MetaCipher: A General and Extensible Reinforcement Learning Framework for Obfuscation-Based Jailbreak Attacks on Black-Box LLMs Boyuan Chen et.al. 2506.22557 null
2025-06-26 From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents Weizhi Zhang et.al. 2506.18959 null
2025-06-26 Enhancing User Engagement in Socially-Driven Dialogue through Interactive LLM Alignments Jiashuo Wang et.al. 2506.21497 null
2025-06-26 Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning Xin Xu et.al. 2506.21285 null
2025-06-26 HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context Qize Yang et.al. 2506.21277 null
2025-06-26 Complexity-aware fine-tuning Andrey Goncharov et.al. 2506.21220 null
2025-06-26 Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? Haoang Chi et.al. 2506.21215 null
2025-06-26 $T^3$: Multi-level Tree-based Automatic Program Repair with Large Language Models Quanming Liu et.al. 2506.21211 null
2025-06-26 MT2-CSD: A New Dataset and Multi-Semantic Knowledge Fusion Method for Conversational Stance Detection Fuqiang Niu et.al. 2506.21053 null
2025-06-26 Large Language Models Acing Chartered Accountancy Jatin Gupta et.al. 2506.21031 null
2025-06-26 STEP Planner: Constructing cross-hierarchical subgoal tree as an embodied long-horizon task planner Zhou Tianxing et.al. 2506.21030 null
2025-06-26 LLM-guided Chemical Process Optimization with a Multi-Agent Approach Tong Zeng et.al. 2506.20921 null
2025-06-26 FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing Advait Gupta et.al. 2506.20911 null
2025-06-26 Evaluating List Construction and Temporal Understanding capabilities of Large Language Models Alexandru Dumitru et.al. 2506.21783 null
2025-06-26 THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning? Xin Wang et.al. 2506.21763 null
2025-06-26 Hierarchical Reasoning Model Guan Wang et.al. 2506.21734 null
2025-06-26 SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents Wanxin Tian et.al. 2506.21669 null
2025-06-26 APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization Minjie Hong et.al. 2506.21655 null
2025-06-26 Weak-to-Strong GraphRAG: Aligning Weak Retrievers with Large Language Models for Graph-based Retrieval Augmented Generation Deyu Zou et.al. 2506.22518 null
2025-06-25 No Free Lunch: Rethinking Internal Feedback for LLM Reasoning Yanzhi Zhang et.al. 2506.17219 null
2025-06-25 Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning Lixin Wu et.al. 2506.18330 null
2025-06-25 Thought Anchors: Which LLM Reasoning Steps Matter? Paul C. Bogdan et.al. 2506.19143 null
2025-06-25 Semantic Caching for Improving Web Affordability Hafsa Akbar et.al. 2506.20420 null
2025-06-25 Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs Sonia K. Murthy et.al. 2506.20666 null
2025-06-25 The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind Andrei Lupu et.al. 2506.20664 null
2025-06-25 Memento: Note-Taking for Your Future Self Chao Wan et.al. 2506.20642 null
2025-06-25 Video Perception Models for 3D Scene Synthesis Rui Huang et.al. 2506.20601 null
2025-06-25 Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios Wenbin Gan et.al. 2506.20531 null
2025-06-25 Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards Charles Arnal et.al. 2506.20520 null
2025-06-25 ReCode: Updating Code API Knowledge with Reinforcement Learning Haoze Wu et.al. 2506.20495 null
2025-06-25 Generative AI for Vulnerability Detection in 6G Wireless Networks: Advances, Case Study, and Future Directions Shuo Yang et.al. 2506.20488 null
2025-06-25 Automatic Demonstration Selection for LLM-based Tabular Data Classification Shuchu Han et.al. 2506.20451 null
2025-06-25 An Agentic System for Rare Disease Diagnosis with Traceable Reasoning Weike Zhao et.al. 2506.20430 null
2025-06-25 SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models Dipayan Saha et.al. 2506.20415 null
2025-06-25 Tabular Feature Discovery With Reasoning Type Exploration Sungwon Han et.al. 2506.20357 null
2025-06-25 Enterprise Large Language Model Evaluation Benchmark Liya Wang et.al. 2506.20274 null
2025-06-25 Enhancing Large Language Models through Structured Reasoning Yubo Dong et.al. 2506.20241 null
2025-06-25 SEED: A Structural Encoder for Embedding-Driven Decoding in Time Series Prediction with LLMs Fengze Li et.al. 2506.20167 null
2025-06-25 A Modular Multitask Reasoning Framework Integrating Spatio-temporal Models and LLMs Kethmi Hirushini Hettige et.al. 2506.20073 null
2025-06-25 Omniwise: Predicting GPU Kernels Performance with LLMs Zixian Wang et.al. 2506.20886 null
2025-06-25 Uncovering Hidden Violent Tendencies in LLMs: A Demographic Analysis via Behavioral Vignettes Quintin Myers et.al. 2506.20822 null
2025-06-25 MultiFinRAG: An Optimized Multimodal Retrieval-Augmented Generation (RAG) Framework for Financial Question Answering Chinmay Gondhalekar et.al. 2506.20821 null
2025-06-25 Dynamic Context-Aware Prompt Recommendation for Domain-Specific AI Applications Xinye Tang et.al. 2506.20815 null
2025-06-25 Towards Probabilistic Question Answering Over Tabular Data Chen Shen et.al. 2506.20747 null
2025-06-25 Test-time Scaling Techniques in Theoretical Physics – A Comparison of Methods on the TPBench Dataset Zhiqi Gao et.al. 2506.20729 null
2025-06-24 ReDit: Reward Dithering for Improved LLM Policy Optimization Chenxing Wei et.al. 2506.18631 null
2025-06-24 Understanding Reasoning in Thinking Language Models via Steering Vectors Constantin Venhoff et.al. 2506.18167 null
2025-06-24 KAG-Thinker: Interactive Thinking and Deep Reasoning in LLMs via Knowledge-Augmented Generation Dalong Zhang et.al. 2506.17728 null
2025-06-24 AnTKV: Anchor Token-Aware Sub-Bit Vector Quantization for KV Cache in Large Language Models Zeyu Li et.al. 2506.19505 null
2025-06-24 Mem4Nav: Boosting Vision-and-Language Navigation in Urban Environments with a Hierarchical Spatial-Cognition Long-Short Memory System Lixuan He et.al. 2506.19433 null
2025-06-24 JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning Ai Han et.al. 2506.19846 null
2025-06-24 MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration Yucheng Zhou et.al. 2506.19835 null
2025-06-24 KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality Baochang Ren et.al. 2506.19807 null
2025-06-24 KnowML: Improving Generalization of ML-NIDS with Attack Knowledge Graphs Xin Fan Guo et.al. 2506.19802 null
2025-06-24 Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study Yuqi Zhu et.al. 2506.19794 null
2025-06-24 Automatic Prompt Optimization for Knowledge Graph Construction: Insights from an Empirical Study Nandana Mihindukulasooriya et.al. 2506.19773 null
2025-06-24 SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning Yuqian Fu et.al. 2506.19767 null
2025-06-24 Breaking Barriers: Do Reinforcement Post Training Gains Transfer To Unseen Domains? Chuxuan Hu et.al. 2506.19733 null
2025-06-24 ECCoT: A Framework for Enhancing Effective Cognition via Chain of Thought in Large Language Model Zhenke Duan et.al. 2506.19599 null
2025-06-24 KnowMap: Efficient Knowledge-Driven Task Adaptation for LLMs Kelin Fu et.al. 2506.19527 null
2025-06-24 Commonsense Generation and Evaluation for Dialogue Systems using Large Language Models Marcos Estecha-Garitagoitia et.al. 2506.19483 null
2025-06-24 Can Large Language Models Capture Human Annotator Disagreements? Jingwei Ni et.al. 2506.19467 null
2025-06-24 RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought v1 Yu Xie et.al. 2506.19235 null
2025-06-24 Augmenting Multi-Agent Communication with State Delta Trajectory Yichen Tang et.al. 2506.19209 null
2025-06-24 Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning Saloni Dash et.al. 2506.20020 null
2025-06-24 Inference Scaled GraphRAG: Improving Multi Hop Question Answering on Knowledge Graphs Travis Thompson et.al. 2506.19967 null
2025-06-24 Prover Agent: An Agent-based Framework for Formal Mathematical Proofs Kaito Baba et.al. 2506.19923 null
2025-06-23 RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding Guanzheng Chen et.al. 2502.20330 link
2025-06-23 RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought Junbo Qiao et.al. 2506.16796 link
2025-06-23 SLR: An Automated Synthesis Framework for Scalable Logical Reasoning Lukas Helff et.al. 2506.15787 null
2025-06-23 CommVQ: Commutative Vector Quantization for KV Cache Compression Junyan Li et.al. 2506.18879 null
2025-06-23 ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs Jiaru Zou et.al. 2506.18896 null
2025-06-23 OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization Yiyou Sun et.al. 2506.18880 null
2025-06-23 LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Yuhao Wu et.al. 2506.18841 null
2025-06-23 Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories Islem Bouzenia et.al. 2506.18824 null
2025-06-23 Existing LLMs Are Not Self-Consistent For Simple Tasks Zhenru Lin et.al. 2506.18781 null
2025-06-23 Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training Jonathan Cook et.al. 2506.18777 null
2025-06-23 MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis Yuting Zhang et.al. 2506.18512 null
2025-06-23 MeRF: Motivation-enhanced Reinforcement Finetuning for Large Reasoning Models Junjie Zhang et.al. 2506.18485 null
2025-06-23 TReB: A Comprehensive Benchmark for Evaluating Table Reasoning Capabilities of Large Language Models Ce Li et.al. 2506.18421 null
2025-06-23 Evaluating Causal Explanation in Medical Reports with LLM-Based and Human-Aligned Metrics Yousang Cho et.al. 2506.18387 null
2025-06-23 LOGICPO: Efficient Translation of NL-based Logical Problems to FOL using LLMs and Preference Optimization Koushik Viswanadha et.al. 2506.18383 null
2025-06-23 Less Data Less Tokens: Multilingual Unification Learning for Efficient Test-Time Reasoning in LLMs Kang Chen et.al. 2506.18341 null
2025-06-23 TranslationCorrect: A Unified Framework for Machine Translation Post-Editing with Predictive Error Assistance Syed Mekael Wasti et.al. 2506.18337 null
2025-06-23 LLM-Integrated Digital Twins for Hierarchical Resource Allocation in 6G Networks Majumder Haider et.al. 2506.18293 null
2025-06-23 RLPR: Extrapolating RLVR to General Domains without Verifiers Tianyu Yu et.al. 2506.18254 null
2025-06-23 Distilling Tool Knowledge into Language Models via Back-Translated Traces Xingyue Huang et.al. 2506.19171 null
2025-06-23 Command-V: Pasting LLM Behaviors via Activation Profiles Barry Wang et.al. 2506.19140 null
2025-06-23 Human-Aligned Faithfulness in Toxicity Explanations of LLMs Ramaravind K. Mothilal et.al. 2506.19113 null
2025-06-23 Baba is LLM: Reasoning in a Game with Dynamic Rules Fien van Wetten et.al. 2506.19095 null
2025-06-23 Language Models Might Not Understand You: Evaluating Theory of Mind via Story Prompting Nathaniel Getachew et.al. 2506.19089 null
2025-06-23 MFTCXplain: A Multilingual Benchmark Dataset for Evaluating the Moral Reasoning of LLMs through Hate Speech Multi-hop Explanation Jackson Trager et.al. 2506.19073 null
2025-06-23 Mirage of Mastery: Memorization Tricks LLMs into Artificially Inflated Self-Knowledge Sahil Kale et.al. 2506.18998 null
2025-06-23 SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications Jinyang Li et.al. 2506.18951 null
2025-06-22 Integrating LLMs and Digital Twins for Adaptive Multi-Robot Task Allocation in Construction Min Deng et.al. 2506.18178 null
2025-06-22 Programming Quantum Computers with Large Language Models Elena R. Henderson et.al. 2506.18125 null
2025-06-22 Mental Health Equity in LLMs: Leveraging Multi-Hop Question Answering to Detect Amplified and Silenced Perspectives Batool Haider et.al. 2506.18116 null
2025-06-22 InspireDebate: Multi-Dimensional Subjective-Objective Evaluation-Guided Reasoning and Optimization for Debating Fuyu Wang et.al. 2506.18102 null
2025-06-22 Deep Research Agents: A Systematic Examination And Roadmap Yuxuan Huang et.al. 2506.18096 null
2025-06-22 Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective Jianyu Wang et.al. 2506.17930 null
2025-06-22 Leveraging Large Language Model for Intelligent Log Processing and Autonomous Debugging in Cloud AI Platforms Cheng Ji et.al. 2506.17900 null
2025-06-22 How Alignment Shrinks the Generative Horizon Chenghao Yang et.al. 2506.17871 null
2025-06-21 Bayesian Social Deduction with Graph-Informed Language Models Shahab Rahimirad et.al. 2506.17788 null
2025-06-21 PAGENT: Learning to Patch Software Engineering Agents Haoran Xue et.al. 2506.17772 null
2025-06-21 Towards a Unified Textual Graph Framework for Spectral Reasoning via Physical and Chemical Information Fusion Jiheng Liang et.al. 2506.17761 null
2025-06-21 Resource-Friendly Dynamic Enhancement Chain for Multi-Hop Question Answering Binquan Ji et.al. 2506.17692 null
2025-06-21 Measuring and Augmenting Large Language Models for Solving Capture-the-Flag Challenges Zimo Ji et.al. 2506.17644 null
2025-06-21 Answer-Centric or Reasoning-Driven? Uncovering the Latent Memory Anchor in LLMs Yang Wu et.al. 2506.17630 null
2025-06-21 CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning Kailing Li et.al. 2506.17629 null
2025-06-21 Scene-R1: Video-Grounded Large Language Models for 3D Scene Reasoning without 3D Annotations Zhihao Yuan et.al. 2506.17545 null
2025-06-21 DuaShepherd: Integrating Stepwise Correctness and Potential Rewards for Mathematical Reasoning Yuanhao Wu et.al. 2506.17533 null
2025-06-21 Do LLMs Know When to Flip a Coin? Strategic Randomization through Reasoning and Experience Lingyu Yang et.al. 2506.18928 null
2025-06-20 Domain Specific Benchmarks for Evaluating Multimodal Large Language Models Khizar Anjum et.al. 2506.12958 null
2025-06-20 Towards AI Search Paradigm Yuchen Li et.al. 2506.17188 null
2025-06-20 When Can Model-Free Reinforcement Learning be Enough for Thinking? Josiah P. Hanna et.al. 2506.17124 null
2025-06-20 Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving Chuxue Cao et.al. 2506.17104 null
2025-06-20 Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation Jiahao Cheng et.al. 2506.17088 null
2025-06-20 Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs Ricardo Rei et.al. 2506.17080 null
2025-06-20 From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers Jingtong Su et.al. 2506.17052 null
2025-06-20 Latent Concept Disentanglement in Transformer-based Language Models Guan Zhe Hong et.al. 2506.16975 null
2025-06-20 LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation Tongtian Yue et.al. 2506.16691 null
2025-06-20 Distilling On-device Language Models for Robot Planning with Minimal Human Intervention Zachary Ravichandran et.al. 2506.17486 null
2025-06-20 Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? Mingyuan Wu et.al. 2506.17417 null
2025-06-19 Serving Large Language Models on Huawei CloudMatrix384 Pengfei Zuo et.al. 2506.12708 null
2025-06-19 MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation Xueqing Peng et.al. 2506.14028 null
2025-06-19 LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning Haoyue Zhang et.al. 2506.15969 null
2025-06-19 SemAgent: A Semantics Aware Program Repair Agent Anvith Pabba et.al. 2506.16650 null
2025-06-19 LLM-based Satisfiability Checking of String Requirements by Consistent Data and Checker Generation Boqi Chen et.al. 2506.16639 null
2025-06-19 Robust Reward Modeling via Causal Rubrics Pragya Srivastava et.al. 2506.16507 null
2025-06-19 SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity Samir Khaki et.al. 2506.16500 null
2025-06-19 ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning Zexi Liu et.al. 2506.16499 null
2025-06-19 Grounding Language Models with Semantic Digital Twins for Robotic Planning Mehreen Naeem et.al. 2506.16493 null
2025-06-19 How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? Giuseppe Lando et.al. 2506.16450 null
2025-06-19 Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights Zhiyuan Liang et.al. 2506.16406 null
2025-06-19 TrajSceneLLM: A Multimodal Perspective on Semantic GPS Trajectory Analysis Chunhou Ji et.al. 2506.16401 link
2025-06-19 OJBench: A Competition Level Code Benchmark For Large Language Models Zhexu Wang et.al. 2506.16395 null
2025-06-19 From LLM-anation to LLM-orchestrator: Coordinating Small Models for Data Labeling Yao Lu et.al. 2506.16393 null
2025-06-19 RiOT: Efficient Prompt Refinement with Residual Optimization Tree Chenyi Zhou et.al. 2506.16389 link
2025-06-19 Large Language Models in Argument Mining: A Survey Hao Li et.al. 2506.16383 null
2025-06-19 SHREC and PHEONA: Using Large Language Models to Advance Next-Generation Computational Phenotyping Sarah Pungitore et.al. 2506.16359 null
2025-06-19 Explainable Rule Application via Structured Prompting: A Neural-Symbolic Approach Albert Sadowski et.al. 2506.16335 link
2025-06-19 SGIC: A Self-Guided Iterative Calibration Framework for RAG Guanhua Chen et.al. 2506.16172 null
2025-06-19 Under the Shadow of Babel: How Language Shapes Reasoning in LLMs Chenxi Wang et.al. 2506.16151 null
2025-06-19 GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning Yi Chen et.al. 2506.16141 link
2025-06-19 Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Fixing Kai Huang et.al. 2506.16136 null
2025-06-19 AutoV: Learning to Retrieve Visual Prompt for Large Vision-Language Models Yuan Zhang et.al. 2506.16112 null
2025-06-19 Human-Centered Shared Autonomy for Motor Planning, Learning, and Control Applications MH Farhadi et.al. 2506.16044 null
2025-06-19 DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling Fei Wang et.al. 2506.16043 null
2025-06-19 SimuPanel: A Novel Immersive Multi-Agent System to Simulate Interactive Expert Panel Discussion Xiangyang He et.al. 2506.16010 null
2025-06-19 Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases Yubeen Bae et.al. 2506.17336 link
2025-06-19 LMR-BENCH: Evaluating LLM Agent’s Ability on Reproducing Language Modeling Research Shuo Yan et.al. 2506.17335 null
2025-06-19 Large Language Models for Spreadsheets: Benchmarking Progress and Evaluating Performance with FLARE Simon Thorne et.al. 2506.17330 null
2025-06-18 Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations Amey Agrawal et.al. 2409.17264 null
2025-06-18 Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs Ling Team et.al. 2506.14731 null
2025-06-18 AIn’t Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation Leah von der Heyde et.al. 2506.14634 null
2025-06-18 Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models Chenchen Yuan et.al. 2506.14625 link
2025-06-18 eLLM: Elastic Memory Management Framework for Efficient LLM Serving Jiale Xu et.al. 2506.15155 null
2025-06-18 CC-LEARN: Cohort-based Consistency Learning Xiao Ye et.al. 2506.15662 null
2025-06-18 Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability Yusuke Sakai et.al. 2506.15629 null
2025-06-18 Managing Complex Failure Analysis Workflows with LLM-based Reasoning and Acting Agents Aline Dobrovsky et.al. 2506.15567 null
2025-06-18 Lessons from Training Grounded LLMs with Verifiable Rewards Shang Hong Sim et.al. 2506.15522 null
2025-06-18 Optimizing Web-Based AI Query Retrieval with GPT Integration in LangChain A CoT-Enhanced Prompt Engineering Approach Wenqi Guan et.al. 2506.15512 null
2025-06-18 SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling Md Imbesat Hassan Rizvi et.al. 2506.15498 link
2025-06-18 RE-IMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation Xinnuo Xu et.al. 2506.15455 null
2025-06-18 AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need Zhouhong Gu et.al. 2506.15451 link
2025-06-18 DeVisE: Behavioral Testing of Medical Large Language Models Camila Zurdo Tagliabue et.al. 2506.15339 null
2025-06-18 Cohort Discovery: A Survey on LLM-Assisted Clinical Trial Recruitment Shrestha Ghosh et.al. 2506.15301 null
2025-06-18 MinosEval: Distinguishing Factoid and Non-Factoid for Tailored Open-Ended QA Evaluation with LLMs Yongqi Fan et.al. 2506.15215 link
2025-06-18 ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs Feng He et.al. 2506.15211 null
2025-06-18 Learning-Time Encoding Shapes Unlearning in LLMs Ruihan Wu et.al. 2506.15076 link
2025-06-18 HEAL: An Empirical Study on Hallucinations in Embodied Agents Driven by Large Language Models Trishna Chakraborty et.al. 2506.15065 null
2025-06-18 Truncated Proximal Policy Optimization Tiantian Fan et.al. 2506.15050 null
2025-06-18 Language Models can perform Single-Utterance Self-Correction of Perturbed Reasoning Sam Silver et.al. 2506.15894 null
2025-06-18 Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute Sheng Liu et.al. 2506.15882 null
2025-06-18 MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents Zijian Zhou et.al. 2506.15841 null
2025-06-18 Context Matters! Relaxing Goals with LLMs for Feasible 3D Scene Planning Emanuele Musumeci et.al. 2506.15828 null
2025-06-18 Veracity: An Open-Source AI Fact-Checking System Taylor Lynn Curtis et.al. 2506.15794 null
2025-06-18 ETrace: Event-Driven Vulnerability Detection in Smart Contracts via LLM-Based Trace Analysis Chenyang Peng et.al. 2506.15790 null
2025-06-18 PentaRAG: Large-Scale Intelligent Knowledge Retrieval for Enterprise LLM Applications Abu Hanif Muhammad Syarubany et.al. 2506.21593 null
2025-06-18 Representation Consistency for Accurate and Coherent LLM Answer Aggregation Junqi Jiang et.al. 2506.21590 null
2025-06-17 Unified Software Engineering agent as AI Software Engineer Leonhard Applis et.al. 2506.14683 null
2025-06-17 Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality Yuto Harada et.al. 2506.14681 null
2025-06-17 Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shot Xiang Cheng et.al. 2506.14641 null
2025-06-17 NetRoller: Interfacing General and Specialized Models for End-to-End Autonomous Driving Ren Xin et.al. 2506.14589 link
2025-06-17 Automatic Qiskit Code Refactoring Using Large Language Models José Manuel Suárez et.al. 2506.14535 null
2025-06-17 M2BeamLLM: Multimodal Sensing-empowered mmWave Beam Prediction with Large Language Models Can Zheng et.al. 2506.14532 null
2025-06-17 SIRI-Bench: Challenging VLMs’ Spatial Intelligence through Complex Reasoning Tasks Zijian Song et.al. 2506.14512 null
2025-06-17 LLM-Powered Swarms: A New Frontier or a Conceptual Stretch? Muhammad Atta Ur Rahman et.al. 2506.14496 null
2025-06-17 How Far Can LLMs Improve from Experience? Measuring Test-Time Learning Ability in LLMs with Human Comparison Jiayin Wang et.al. 2506.14448 link
2025-06-17 Excessive Reasoning Attack on Reasoning LLMs Wai Man Si et.al. 2506.14374 null
2025-06-17 ELLIS Alicante at CQs-Gen 2025: Winning the critical thinking questions shared task: LLM-based question generation and selection Lucile Favero et.al. 2506.14371 null
2025-06-17 A Vision for Geo-Temporal Deep Research Systems: Towards Comprehensive, Transparent, and Reproducible Geo-Temporal Information Synthesis Bruno Martins et.al. 2506.14345 null
2025-06-17 ADRD: LLM-Driven Autonomous Driving Based on Rule-based Decision Systems Fanzhi Zeng et.al. 2506.14299 null
2025-06-17 Large Language Model Empowered Design of Fluid Antenna Systems: Challenges, Frameworks, and Case Studies for 6G Chao Wang et.al. 2506.14288 null
2025-06-17 Improving LoRA with Variational Learning Bai Cong et.al. 2506.14280 null
2025-06-17 Don’t throw the baby out with the bathwater: How and why deep learning for ARC Jack Cole et.al. 2506.14276 null
2025-06-17 Re-Initialization Token Learning for Tool-Augmented Large Language Models Chenghao Li et.al. 2506.14248 link
2025-06-17 Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs Xumeng Wen et.al. 2506.14245 null
2025-06-17 Causes in neuron diagrams, and testing causal reasoning in Large Language Models. A glimpse of the future of philosophy? Louis Vervoort et.al. 2506.14239 null
2025-06-17 Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team Md Tanzib Hosain et.al. 2506.14234 null
2025-06-17 MIST: Towards Multi-dimensional Implicit Bias and Stereotype Evaluation of LLMs via Theory of Mind Yanlin Li et.al. 2506.14161 null
2025-06-17 S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models Tao He et.al. 2506.14158 null
2025-06-17 InsertRank: LLMs can reason over BM25 scores to Improve Listwise Reranking Rahul Seetharaman et.al. 2506.14086 null
2025-06-17 AI-Facilitated Analysis of Abstracts and Conclusions: Flagging Unsubstantiated Claims and Ambiguous Pronouns Evgeny Markhasin et.al. 2506.13172 null
2025-06-17 AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving Wentao Zhang et.al. 2506.12508 link
2025-06-17 LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification Penghui Yang et.al. 2502.17421 link
2025-06-17 Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching Qizheng Zhang et.al. 2506.14852 null
2025-06-17 Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective Zhoujun Cheng et.al. 2506.14965 link
2025-06-17 Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework Mohna Chakraborty et.al. 2506.14948 null
2025-06-17 CALM: Contextual Analog Logic with Multimodality Maxwell J. Jacobson et.al. 2506.14936 null
2025-06-17 MDBench: A Synthetic Multi-Document Reasoning Benchmark Generated with Knowledge Guidance Joseph J. Peper et.al. 2506.14927 null
2025-06-16 Steering LLM Thinking with Budget Guidance Junyan Li et.al. 2506.13752 link
2025-06-16 Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability Shova Kuikel et.al. 2506.13746 link
2025-06-16 TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning Junru Zhang et.al. 2506.13705 link
2025-06-16 Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text Amr Mohamed et.al. 2506.14012 link
2025-06-16 Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences Stas Bekman et.al. 2506.13996 link
2025-06-16 How Does LLM Reasoning Work for Code? A Survey and a Call to Action Ira Ceka et.al. 2506.13932 null
2025-06-16 Spec2RTL-Agent: Automated Hardware Code Generation from Complex Specifications Using LLM Agent Systems Zhongzhi Yu et.al. 2506.13905 null
2025-06-16 Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles Antara Raaghavi Bhattacharya et.al. 2506.13886 null
2025-06-16 Balancing Knowledge Delivery and Emotional Comfort in Healthcare Conversational Systems Shang-Chi Tsai et.al. 2506.13692 null
2025-06-16 An LLM’s Apology: Outsourcing Awkwardness in the Age of AI Twm Stone et.al. 2506.13685 link
2025-06-16 LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning Miho Koda et.al. 2506.13841 link
2025-06-16 EvolvTrip: Enhancing Literary Character Understanding with Temporal Theory-of-Mind Graphs Bohao Yang et.al. 2506.13641 link
2025-06-16 An Empirical Study of LLM-as-a-Judge: How Design Choices Impact Evaluation Reliability Yusuke Yamauchi et.al. 2506.13639 null
2025-06-16 FreeQ-Graph: Free-form Querying with Semantic Consistent Scene Graph for 3D Scene Understanding Chenlu Zhan et.al. 2506.13629 null
2025-06-16 CAMS: A CityGPT-Powered Agentic Framework for Urban Human Mobility Simulation Yuwei Du et.al. 2506.13599 null
2025-06-16 Understand the Implication: Learning to Think for Pragmatic Understanding Settaluri Lakshmi Sravanthi et.al. 2506.13559 null
2025-06-16 Implicit and Explicit Research Quality Score Probabilities from ChatGPT Mike Thelwall et.al. 2506.13525 null
2025-06-16 BOW: Bottlenecked Next Word Exploration Ming Shen et.al. 2506.13502 null
2025-06-16 Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study Zhengyu Hu et.al. 2506.13464 null
2025-06-16 From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs Alsharif Abuadbba et.al. 2506.13434 null
2025-06-16 RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis Pengzuo Wu et.al. 2506.13405 null
2025-06-16 Decompositional Reasoning for Graph Retrieval with Large Language Models Valentin Six et.al. 2506.13380 null
2025-06-16 Socratic RL: A Novel Framework for Efficient Knowledge Acquisition through Iterative Reflection and Viewpoint Distillation Xiangfan Wu et.al. 2506.13358 null
2025-06-16 StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns Luanbo Wan et.al. 2506.13356 null
2025-06-16 Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks Yifei Xu et.al. 2506.13351 null
2025-06-16 Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers Wooseok Seo et.al. 2506.13342 link
2025-06-16 Towards Pervasive Distributed Agentic Generative AI – A State of The Art Gianni Molinari et.al. 2506.13324 null
2025-06-16 Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models James Chua et.al. 2506.13206 null
2025-06-16 Breaking Thought Patterns: A Multi-Dimensional Reasoning Framework for LLMs Xintong Tang et.al. 2506.13192 null
2025-06-16 Enhancing Large Language Models with Reliable Knowledge Graphs Qinggang Zhang et.al. 2506.13178 null
2025-06-16 Rethinking Test-Time Scaling for Medical AI: Model and Task-Aware Strategies for LLMs and VLMs Gyutaek Oh et.al. 2506.13102 null
2025-06-16 Discerning What Matters: A Multi-Dimensional Assessment of Moral Competence in LLMs Daniel Kilov et.al. 2506.13082 null
2025-06-16 MotiveBench: How Far Are We From Human-Like Motivational Reasoning in Large Language Models? Xixian Yong et.al. 2506.13065 null
2025-06-16 Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning Haibo Qiu et.al. 2506.13056 null
2025-06-16 Just Go Parallel: Improving the Multilingual Capabilities of Large Language Models Muhammad Reza Qorib et.al. 2506.13044 null
2025-06-16 Knowledge Graph Fusion with Large Language Models for Accurate, Explainable Manufacturing Process Planning Danny Hoang et.al. 2506.13026 null
2025-06-16 Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making Claudio Fanconi et.al. 2506.11887 null
2025-06-15 I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference Zibo Gao et.al. 2505.06738 null
2025-06-15 SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models Xinyi Zhao et.al. 2506.12992 link
2025-06-15 Multi-document Summarization through Multi-document Event Relation Graph Reasoning in LLMs: a case study in Framing Bias Mitigation Yuanyuan Lei et.al. 2506.12978 null
2025-06-15 Scaling Test-time Compute for LLM Agents King Zhu et.al. 2506.12928 null
2025-06-15 PersonaFeedback: A Large-scale Human-annotated Benchmark For Personalization Meiling Tao et.al. 2506.12915 null
2025-06-15 SciDA: Scientific Dynamic Assessor of LLMs Junting Zhou et.al. 2506.12909 null
2025-06-15 WereWolf-Plus: An Update of Werewolf Game setting Based on DSGBench Xinyuan Xia et.al. 2506.12841 null
2025-06-15 Mastering Da Vinci Code: A Comparative Study of Transformer, LLM, and PPO-based Agents LeCheng Zhang et.al. 2506.12801 null
2025-06-15 MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution Yibo Wang et.al. 2506.12728 null
2025-06-15 Humanity’s Last Code Exam: Can Advanced LLMs Conquer Human’s Hardest Code Competition? Xiangyang Li et.al. 2506.12713 link
2025-06-15 Building Trustworthy AI by Addressing its 16+2 Desiderata with Goal-Directed Commonsense Reasoning Alexis R. Tudor et.al. 2506.12667 null
2025-06-14 Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics Jiarui Liu et.al. 2506.12657 null
2025-06-14 Towards Building General Purpose Embedding Models for Industry 4.0 Agents Christodoulos Constantinides et.al. 2506.12607 null
2025-06-14 OneEval: Benchmarking LLM Knowledge-intensive Reasoning over Diverse Knowledge Bases Yongrui Chen et.al. 2506.12577 null
2025-06-14 RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking Shuo Yang et.al. 2506.12538 null
2025-06-14 Detection, Classification, and Mitigation of Gender Bias in Large Language Models Xiaoqing Cheng et.al. 2506.12527 null
2025-06-14 Graph of Verification: Structured Verification of LLM Reasoning with Directed Acyclic Graphs Jiwei Fang et.al. 2506.12509 null
2025-06-14 From Outcomes to Processes: Guiding PRM Learning from ORM for Inference-Time Alignment Bin Xie et.al. 2506.12446 null
2025-06-14 Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics Asifullah Khan et.al. 2506.12365 null
2025-06-14 QiMeng-Attention: SOTA Attention Operator is generated by SOTA Attention Algorithm Qirui Zhou et.al. 2506.12355 null
2025-06-14 Information Suppression in Large Language Models: Auditing, Quantifying, and Characterizing Censorship in DeepSeek Peiran Qiu et.al. 2506.12349 null
2025-06-14 Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning Xiaotian Zhang et.al. 2506.12307 null
2025-06-14 Unveiling Confirmation Bias in Chain-of-Thought Reasoning Yue Wan et.al. 2506.12301 null
2025-06-14 The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason Shanchao Liang et.al. 2506.12286 null
2025-06-14 ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression Guangda Liu et.al. 2412.03213 link
2025-06-13 Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache Xiaoran Liu et.al. 2506.11886 null
2025-06-13 Lag-Relative Sparse Attention In Long Context Training Manlai Liang et.al. 2506.11498 null
2025-06-13 Efficient Long-Context LLM Inference via KV Cache Clustering Jie Hu et.al. 2506.11418 null
2025-06-13 Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles Qingyan Wei et.al. 2506.10848 link
2025-06-13 Investigating the Potential of Large Language Model-Based Router Multi-Agent Architectures for Foundation Design Automation: A Task Classification and Expert Selection Study Sompote Youwai et.al. 2506.13811 null
2025-06-13 From Emergence to Control: Probing and Modulating Self-Reflection in Language Models Xudong Zhu et.al. 2506.12217 link
2025-06-13 Supernova Event Dataset: Interpreting Large Language Model’s Personality through Critical Event Analysis Pranav Agarwal et.al. 2506.12189 null
2025-06-13 Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs Chenqian Le et.al. 2506.12182 null
2025-06-13 code_transformed: The Influence of Large Language Models on Code Yuliang Xu et.al. 2506.12014 null
2025-06-13 Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making Xiaopeng Yuan et.al. 2506.12012 null
2025-06-13 How Visual Representations Map to Language Feature Space in Multimodal LLMs Constantin Venhoff et.al. 2506.11976 null
2025-06-13 Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback Dongwei Jiang et.al. 2506.11930 null
2025-06-13 LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? Zihan Zheng et.al. 2506.11928 null
2025-06-13 TreeRL: LLM Reinforcement Learning with On-Policy Tree Search Zhenyu Hou et.al. 2506.11902 link
2025-06-13 MapQaTor: An Extensible Framework for Efficient Annotation of Map-Based QA Datasets Mahir Labib Dihan et.al. 2412.21015 link
2025-06-12 SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding Ziyi Zhang et.al. 2506.11309 null
2025-06-11 SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems Peiran Li et.al. 2506.07564 null
2025-06-10 Activated LoRA: Fine-tuned LLMs for Intrinsics Kristjan Greenewald et.al. 2504.12397 link
2025-06-09 Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models Haoyu Wang et.al. 2506.07334 null
2025-06-09 MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts Wei Tao et.al. 2506.07533 null
2025-06-08 Paged Attention Meets FlexAttention: Unlocking Long-Context Efficiency in Deployed Inference Thomas Joshi et.al. 2506.07311 null
2025-06-08 MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache Akshat Sharma et.al. 2411.18077 null
2025-06-05 Inference-Time Hyper-Scaling with KV Cache Compression Adrian Łańcucki et.al. 2506.05345 null
2025-06-05 Unleashing Hour-Scale Video Training for Long Video-Language Understanding Jingyang Lin et.al. 2506.05332 null
2025-06-05 MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs Zhenyan Lu et.al. 2506.13772 null
2025-06-05 ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration Xianglong Yan et.al. 2505.24357 null
2025-06-04 Homogeneous Keys, Heterogeneous Values: Exploiting Local KV Cache Asymmetry for Long-Context LLMs Wanyun Cui et.al. 2506.05410 null
2025-06-04 AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models Yifeng Gu et.al. 2506.03762 null
2025-06-04 AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism Zhepei Wei et.al. 2506.03700 link
2025-06-04 HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing Minghui Liu et.al. 2412.16187 null
2025-06-04 KVPR: Efficient LLM Inference with I/O-Aware KV Cache Partial Recomputation Chaoyi Jiang et.al. 2411.17089 link
2025-06-03 A$^2$ATS: Retrieval-Based KV Cache Reduction via Windowed Rotary Position Embedding and Query-Aware Vector Quantization Junhui He et.al. 2502.12665 null
2025-06-03 SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation Jialong Wu et.al. 2412.13649 link
2025-06-02 Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts Spencer Banasik et.al. 2506.01827 null
2025-06-02 SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation Aurick Qiao et.al. 2410.03960 null
2025-06-02 SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications Gabriele Oliaro et.al. 2411.04975 link
2025-06-01 Earley-Driven Dynamic Pruning for Efficient Structured Decoding Xintong Sun et.al. 2506.01151 null
2025-06-01 A Survey of LLM $\times$ DATA Xuanhe Zhou et.al. 2505.18458 link
2025-05-31 Accelerating Diffusion LLMs via Adaptive Parallel Decoding Daniel Israel et.al. 2506.00413 null
2025-05-31 QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design Benjamin Schneider et.al. 2505.16175 link
2025-05-31 KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference Xing Li et.al. 2502.04420 link
2025-05-30 HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts Neil He et.al. 2505.24722 link
2025-05-30 Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching Juan Wisznia et.al. 2505.24643 null
2025-05-30 SkyLB: A Locality-Aware Cross-Region Load Balancer for LLM Inference Tian Xia et.al. 2505.24095 null
2025-05-30 RaaS: Reasoning-Aware Attention Sparsity for Efficient LLM Reasoning Junhao Hu et.al. 2502.11147 null
2025-05-30 Learn from the Past: Fast Sparse Indexing for Large Language Model Decoding Feiyu Yao et.al. 2506.15704 null
2025-05-29 EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse Tianyu Guo et.al. 2505.21889 link
2025-05-29 Wireless Agentic AI with Retrieval-Augmented Multimodal Semantic Perception Guangyuan Liu et.al. 2505.23275 null
2025-05-29 EmbAdvisor: Adaptive Cache Management for Sustainable LLM Serving Yuyang Tian et.al. 2505.23970 null
2025-05-29 KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction Jang-Hyun Kim et.al. 2505.23416 link
2025-05-28 Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference Yue Zhu et.al. 2505.21919 null
2025-05-28 Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference Donghyeon Joo et.al. 2505.22913 link
2025-05-28 Scaling Reasoning without Attention Xueliang Zhao et.al. 2505.22425 null
2025-05-28 InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing Shuaiyi Li et.al. 2505.22156 null
2025-05-28 gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling Tianyu Guo et.al. 2504.14775 link
2025-05-27 Hardware-Efficient Attention for Fast Decoding Ted Zadouri et.al. 2505.21487 null
2025-05-27 SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences Jungyoub Cha et.al. 2505.20776 link
2025-05-27 TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization Dingyu Yao et.al. 2505.19586 link
2025-05-27 EPIC: Efficient Position-Independent Caching for Serving Large Language Models Junhao Hu et.al. 2410.15332 null
2025-05-26 HAMburger: Accelerating LLM Inference via Token Smashing Jingyu Liu et.al. 2505.20438 null
2025-05-26 O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering Jianbiao Mei et.al. 2505.16582 link
2025-05-26 RAP: Runtime-Adaptive Pruning for LLM Inference Huanrong Liu et.al. 2505.17138 null
2025-05-26 SLOT: Sample-specific Language Model Optimization at Test-time Yang Hu et.al. 2505.12392 link
2025-05-26 PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving Ahmet Caner Yüzügüler et.al. 2501.08192 null
2025-05-26 UniICL: An Efficient Unified Framework Unifying Compression, Selection, and Generation Jun Gao et.al. 2405.17062 null
2025-05-26 BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems Yuxin Wang et.al. 2401.17644 link
2025-05-25 Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps Jie Ou et.al. 2505.12731 null
2025-05-24 Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing Zhaoyuan Su et.al. 2506.02006 null
2025-05-24 Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query Yixuan Wang et.al. 2505.20334 null
2025-05-24 PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs Tengxuan Liu et.al. 2505.18610 link
2025-05-24 PersonaX: A Recommendation Agent Oriented User Modeling Framework for Long Behavior Sequence Yunxiao Shi et.al. 2503.02398 link
2025-05-23 FlashForge: Ultra-Efficient Prefix-Aware Attention for LLM Decoding Zhibin Wang et.al. 2505.17694 null
2025-05-23 Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence Amirhosein Ghasemabadi et.al. 2505.20325 null
2025-05-23 NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache Donghyun Son et.al. 2505.18231 null
2025-05-23 Titanus: Enabling KV Cache Pruning and Quantization On-the-Fly for LLM Acceleration Peilin Chen et.al. 2505.17787 link
2025-05-23 ThinkLess: A Training-Free Inference-Efficient Method for Reducing Reasoning Redundancy Gengyang Li et.al. 2505.15684 null
2025-05-23 Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Gleb Rodionov et.al. 2504.06261 link
2025-05-22 Zebra-Llama: Towards Extremely Efficient Hybrid Models Mingyu Yang et.al. 2505.17272 null
2025-05-22 T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning Amartya Chakraborty et.al. 2505.16986 null
2025-05-22 NQKV: A KV Cache Quantization Scheme Based on Normal Distribution Characteristics Zhihang Cai et.al. 2505.16210 null
2025-05-22 HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving Zhiwen Chen et.al. 2505.15793 null
2025-05-21 Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models Jingcong Liang et.al. 2505.16056 link
2025-05-21 A Federated Splitting Framework for LLMs: Security, Efficiency, and Adaptability Zishuai Zhang et.al. 2505.15683 link
2025-05-21 FlowKV: Enhancing Multi-Turn Conversational Coherence in LLMs via Isolated Key-Value Cache Management Xiang Liu et.al. 2505.15347 null
2025-05-21 LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval Zhenyu Ning et.al. 2505.15269 null
2025-05-21 AutoData: A Multi-Agent System for Open Web Data Collection Tianyi Ma et.al. 2505.15859 link
2025-05-21 Effective and Efficient Schema-aware Information Extraction Using On-Device Large Language Models Zhihao Wen et.al. 2505.14992 null
2025-05-21 Can LLMs Maintain Fundamental Abilities under KV Cache Compression? Xiang Liu et.al. 2502.01941 null
2025-05-20 Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning Jiwon Song et.al. 2505.13866 link
2025-05-20 SkyMemory: A LEO Edge Cache for Transformer Inference Optimization and Scale Out Thomas Sandholm et.al. 2505.14427 null
2025-05-20 Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation Peter Baile Chen et.al. 2505.14398 null
2025-05-20 CE-LSLM: Efficient Large-Small Language Model Inference and Communication via Cloud-Edge Collaboration Pengyan Zhu et.al. 2505.14085 null
2025-05-20 KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments Junyoung Park et.al. 2504.15364 null
2025-05-20 Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding Sakhinana Sagar Srinivas et.al. 2504.01281 null
2025-05-20 Online Scheduling for LLM Inference with KV Cache Constraints Patrick Jaillet et.al. 2502.07115 null
2025-05-19 FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference Guangda Liu et.al. 2505.13109 null
2025-05-19 AD-AGENT: A Multi-agent Framework for End-to-end Anomaly Detection Tiankai Yang et.al. 2505.12594 link
2025-05-19 SubGCache: Accelerating Graph-based RAG with Subgraph-level KV Cache Qiuyu Zhu et.al. 2505.10951 null
2025-05-19 FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension Jushi Kai et.al. 2505.00570 null
2025-05-18 KVmix: Gradient-Based Layer Importance-Aware Mixed-Precision Quantization for KV Cache Fei Li et.al. 2506.08018 null
2025-05-16 Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models Camille Couturier et.al. 2505.11271 null
2025-05-16 Accurate KV Cache Quantization with Outlier Tokens Tracing Yi Su et.al. 2505.10938 link
2025-05-16 KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse Huan Yang et.al. 2503.16525 null
2025-05-14 SALM: A Multi-Agent Framework for Language Model-Driven Social Network Simulation Gaurav Koley et.al. 2505.09081 link
2025-05-14 Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization Minsu Kim et.al. 2503.18599 null
2025-05-13 Enhancing Cache-Augmented Generation (CAG) with Adaptive Contextual Compression for Scalable Knowledge Integration Rishabh Agrawal et.al. 2505.08261 null
2025-05-13 Gradual Binary Search and Dimension Expansion: A general method for activation quantization in LLMs Lucas Maisonnave et.al. 2504.13989 null
2025-05-12 SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models Hang Wu et.al. 2505.07680 null
2025-05-12 Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains Ibne Farabi Shihab et.al. 2505.07274 null
2025-05-12 Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity Guang Yan et.al. 2505.07239 null
2025-05-12 PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications Kuntai Du et.al. 2505.07203 null
2025-05-11 Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-aware Cache Compression Feng Cheng et.al. 2505.06901 null
2025-05-09 Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM Zehao Fan et.al. 2505.05772 null
2025-05-08 A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency Sihyeong Park et.al. 2505.01658 link
2025-05-05 Large Language Model Partitioning for Low-Latency Inference at the Edge Dimitrios Kafetzis et.al. 2505.02533 null
2025-05-01 Spill The Beans: Exploiting CPU Cache Side-Channels to Leak Tokens from Large Language Models Andrew Adiletta et.al. 2505.00817 null
2025-05-01 QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving Yujun Lin et.al. 2405.04532 link
2025-04-29 CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks Rui Wang et.al. 2504.21228 null
2025-04-28 semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage Ke Hong et.al. 2504.19867 null
2025-04-25 ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Hanshi Sun et.al. 2410.21465 link
2025-04-24 L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference Qingyuan Liu et.al. 2504.17584 null
2025-04-22 SeaLLM: Service-Aware and Latency-Optimized Resource Sharing for Large Language Model Inference Yihao Zhao et.al. 2504.15720 null
2025-04-22 Optimizing SLO-oriented LLM Serving with PD-Multiplexing Weihao Cui et.al. 2504.14489 null
2025-04-21 Splitwiser: Efficient LM inference with constrained resources Asad Aali et.al. 2505.03763 link
2025-04-21 LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Shang Yang et.al. 2502.14866 link
2025-04-21 FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving Zihao Ye et.al. 2501.01005 link
2025-04-21 Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design Rui Xie et.al. 2503.18869 null
2025-04-20 Understanding and Optimizing Multi-Stage AI Inference Pipelines Abhimanyu Rajeshkumar Bambhaniya et.al. 2504.09775 null
2025-04-19 Improving the Serving Performance of Multi-LoRA Large Language Models via Efficient LoRA and KV Cache Management Hang Zhang et.al. 2505.03756 null
2025-04-18 LogicTree: Structured Proof Exploration for Coherent and Rigorous Logical Reasoning with Large Language Models Kang He et.al. 2504.14089 null
2025-04-18 HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing Myunghyun Rhee et.al. 2504.16112 null
2025-04-16 Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading Kihyun Kim et.al. 2504.11816 link
2025-04-16 Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs Hyungwoo Lee et.al. 2504.11765 null
2025-04-15 Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints Ruicheng Ao et.al. 2504.11320 link
2025-04-14 AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference Yangshen Deng et.al. 2504.10326 null
2025-04-14 KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference Yuxuan Tian et.al. 2504.09936 null
2025-04-13 Efficient LLM Serving on Hybrid Real-time and Best-effort Requests Wan Borui et.al. 2504.09590 null
2025-04-11 Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash Fucheng Jia et.al. 2504.08378 null
2025-04-11 Boosting Universal LLM Reward Design through Heuristic Reward Observation Space Evolution Zen Kit Heng et.al. 2504.07596 null
2025-04-10 Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving Shihong Gao et.al. 2504.07494 link
2025-04-10 Marconi: Prefix Caching for the Era of Hybrid LLMs Rui Pan et.al. 2411.19379 null
2025-04-10 UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference Weikai Xu et.al. 2504.07479 null
2025-04-09 Saliency-driven Dynamic Token Pruning for Large Language Models Yao Tao et.al. 2504.04514 null
2025-04-08 Unifying KV Cache Compression for Large Language Models with LeanKV Yanqi Zhang et.al. 2412.03131 null
2025-04-08 SPIRe: Boosting LLM Inference Throughput with Speculative Decoding Sanjit Neelam et.al. 2504.06419 null
2025-04-08 HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference Shuzhang Zhong et.al. 2504.05897 link
2025-04-08 Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching Yanhao Dong et.al. 2504.06319 null
2025-04-07 AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design Yanbiao Liang et.al. 2505.03745 null
2025-04-03 CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion Jiayi Yao et.al. 2405.16444 link
2025-04-03 HyperRAG: Enhancing Quality-Efficiency Tradeoffs in Retrieval-Augmented Generation with Reranker KV-Cache Reuse Yuwei An et.al. 2504.02921 null
2025-04-03 LLM Library Learning Fails: A LEGO-Prover Case Study Ian Berlot-Attwell et.al. 2504.03048 null
2025-04-02 MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Ranajoy Sadhukhan et.al. 2408.11049 link
2025-04-01 SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching Yuxuan Zhu et.al. 2504.00970 null
2025-04-01 Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB Anas Dorbani et.al. 2504.01157 null
2025-04-01 Knowledge-Aware Iterative Retrieval for Multi-Agent Systems Seyoung Song et.al. 2503.13275 null
2025-03-31 Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving Wei Gao et.al. 2503.24000 link
2025-03-30 PQCache: Product Quantization-based KVCache for Long Context LLM Inference Hailin Zhang et.al. 2407.12820 null
2025-03-30 Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference Wei Tao et.al. 2503.23294 null
2025-03-27 Solving AI Foundational Model Latency with Telco Infrastructure Sebastian Barros et.al. 2504.03708 null
2025-03-27 WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference Youhui Zuo et.al. 2503.17922 link
2025-03-25 LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation Han Chen et.al. 2503.19950 link
2025-03-24 Jenga: Effective Memory Management for Serving LLM with Heterogeneity Chen Zhang et.al. 2503.18292 null
2025-03-24 Mitigating KV Cache Competition to Enhance User Experience in LLM Inference Haiying Shen et.al. 2503.13773 null
2025-03-24 EconoServe: Maximizing Multi-Resource Utilization with SLO Guarantees in LLM Serving Haiying Shen et.al. 2411.06364 null
2025-03-24 xKV: Cross-Layer SVD for KV-Cache Compression Chi-Chih Chang et.al. 2503.18893 link
2025-03-21 MKG-Rank: Enhancing Large Language Models with Knowledge Graph for Multilingual Medical Question Answering Feiyang Li et.al. 2503.16131 null
2025-03-20 Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models Keda Tao et.al. 2503.16257 null
2025-03-20 SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs Shibo Jie et.al. 2503.16163 null
2025-03-17 AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications Haiying Shen et.al. 2503.13737 null
2025-03-16 CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences Ziran Qin et.al. 2503.12491 null
2025-03-12 PRISM: Efficient Long-Range Reasoning With Short-Context LLMs Dulhan Jayalath et.al. 2412.18914 null
2025-03-11 FastCache: Optimizing Multimodal LLM Serving through Lightweight KV-Cache Compression Framework Jianian Zhu et.al. 2503.08461 null
2025-03-09 Seesaw: High-throughput LLM Inference via Model Re-sharding Qidong Su et.al. 2503.06433 null
2025-03-07 DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference Jinwei Yao et.al. 2404.00242 null
2025-03-06 LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression Souvik Kundu et.al. 2503.04982 null
2025-03-06 Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning Giulio Corallo et.al. 2503.04973 null
2025-03-06 Markov Chain of Thought for Efficient Mathematical Reasoning Wen Yang et.al. 2410.17635 null
2025-03-05 Enhancing Memory Efficiency in Large Language Model Training Through Chronos-aware Pipeline Parallelism Xinyuan Lin et.al. 2503.03182 null
2025-03-03 WeightedKV: Attention Scores Weighted Key-Value Cache Merging for Large Language Models Jian Yuan et.al. 2503.01330 null
2025-03-01 Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving Qihui Zhou et.al. 2503.00392 null
2025-02-27 Dynamic Parallel Tree Search for Efficient LLM Reasoning Yifu Ding et.al. 2502.16235 null
2025-02-27 ThinK: Thinner Key Cache by Query-Driven Pruning Yuhui Xu et.al. 2407.21018 null
2025-02-24 ELMo-Tune-V2: LLM-Assisted Full-Cycle Auto-Tuning to Optimize LSM-Based Key-Value Stores Viraj Thakkar et.al. 2502.17606 link
2025-02-24 The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? Zhenheng Tang et.al. 2502.17535 null
2025-02-22 AIBrix: Towards Scalable, Cost-Effective Large Language Model Inference Infrastructure The AIBrix Team et.al. 2504.03648 null
2025-02-20 SpinQuant: LLM quantization with learned rotations Zechun Liu et.al. 2405.16406 null
2025-02-20 Compute Or Load KV Cache? Why Not Both? Shuowei Jin et.al. 2410.03065 null
2025-02-17 Does RAG Really Perform Bad For Long-Context Processing? Kun Luo et.al. 2502.11444 null
2025-02-12 The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems Linke Song et.al. 2409.20002 null
2025-02-11 HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment Youhe Jiang et.al. 2502.07903 null
2025-02-10 MARM: Unlocking the Future of Recommendation Systems through Memory Augmentation and Scalable Complexity Xiao Lv et.al. 2411.09425 null
2025-02-08 ProMoE: Fast MoE-based LLM Serving using Proactive Caching Xiaoniu Song et.al. 2410.22134 null
2025-02-07 fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving Hanfei Yu et.al. 2502.05370 null
2025-02-05 Accessible and Portable LLM Inference by Compiling Computational Graphs into SQL Wenbo Sun et.al. 2502.02818 null
2025-02-05 Qrazor: Reliable and Effortless 4-bit LLM Quantization by Significant Data Razoring Dongyoung Lee et.al. 2501.13331 null
2025-02-05 Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation Shubham Agarwal et.al. 2502.15734 null
2025-02-04 LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation Xuan Zhang et.al. 2410.13846 link
2025-02-02 RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations Zunhai Su et.al. 2501.16383 null
2025-02-01 QSpec: Speculative Decoding with Complementary Quantization Schemes Juntao Zhao et.al. 2410.11305 null
2025-01-30 State Stream Transformer (SST): Emergent Metacognitive Behaviours Through Latent State Persistence Thea Aviss et.al. 2501.18356 null
2025-01-29 vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention Ramya Prabhu et.al. 2405.04437 link
2025-01-27 PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization Mengzhao Chen et.al. 2410.05265 link
2025-01-25 Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads Xingyang He et.al. 2501.15113 null
2025-01-24 Locality-aware Fair Scheduling in LLM Serving Shiyi Cao et.al. 2501.14312 null
2025-01-24 Serving Long-Context LLMs at the Mobile Edge: Test-Time Reinforcement Learning-based Model Caching and Inference Offloading Minrui Xu et.al. 2501.14205 null
2025-01-24 EchoLM: Accelerating LLM Serving with Real-time Knowledge Distillation Yifan Yu et.al. 2501.12689 null
2025-01-23 A Training-free Sub-quadratic Cost Transformer Model Serving Framework With Hierarchically Pruned Attention Heejun Lee et.al. 2406.09827 null
2025-01-22 Yi-Lightning Technical Report Alan Wake et.al. 2412.01253 null
2025-01-17 BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching Zhen Zheng et.al. 2412.03594 null
2025-01-12 Mell: Memory-Efficient Large Language Model Serving via Multi-GPU KV Cache Management Liu Qianli et.al. 2501.06709 null
2025-01-06 The Power of Negative Zero: Datatype Customization for Quantized Large Language Models Yuzong Chen et.al. 2501.04052 link
2025-01-02 MSWA: Refining Local Attention with Multi-Scale Window Attention Yixing Xu et.al. 2501.01039 null
2025-01-02 A Survey on Large Language Model Acceleration based on KV Cache Management Haoyang Li et.al. 2412.19442 link
2024-12-31 RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Di Liu et.al. 2409.10516 link
2024-12-23 Deliberation in Latent Space via Differentiable Cache Augmentation Luyang Liu et.al. 2412.17747 null
2024-12-21 SYMPHONY: Improving Memory Management for LLM Inference Workloads Saurabh Agarwal et.al. 2412.16434 null
2024-12-21 MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool Cunchen Hu et.al. 2406.17565 null
2024-12-18 MagicPIG: LSH Sampling for Efficient LLM Generation Zhuoming Chen et.al. 2410.16179 link
2024-12-18 Semantic Convergence: Harmonizing Recommender Systems via Two-Stage Alignment and Behavioral Semantic Tokenization Guanghan Li et.al. 2412.13771 null
2024-12-17 A System for Microserving of LLMs Hongyi Jin et.al. 2412.12488 null
2024-12-16 CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation Hongxuan Zhang et.al. 2412.11741 null
2024-12-13 KVDirect: Distributed Disaggregated LLM Inference Shiyang Chen et.al. 2501.14743 null
2024-12-12 PowerInfer-2: Fast Large Language Model Inference on a Smartphone Zhenliang Xue et.al. 2406.06282 null
2024-12-05 A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts Suyu Ge et.al. 2410.01485 null
2024-11-27 FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving Ao Shen et.al. 2411.18424 null
2024-11-24 Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments Nikoleta Iliakopoulou et.al. 2411.17741 null
2024-11-14 Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning Yu Fu et.al. 2410.19258 link
2024-11-08 Eigen Attention: Attention in Low-Rank Space for KV Cache Compression Utkarsh Saxena et.al. 2408.05646 link
2024-11-02 NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference Xuanlin Jiang et.al. 2411.01142 null
2024-10-31 ALISE: Accelerating Large Language Model Serving with Speculative Scheduling Youpeng Zhao et.al. 2410.23537 null
2024-10-29 LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism Bingyang Wu et.al. 2404.09526 link
2024-10-25 Fast Inference for Augmented Large Language Models Rana Shahout et.al. 2410.18248 null
2024-10-24 Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design Ruisi Cai et.al. 2410.19123 link
2024-10-23 Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching Jie Peng et.al. 2410.14740 null
2024-10-23 ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference Xin He et.al. 2410.17954 null
2024-10-21 Do Large Language Models Need a Content Delivery Network? Yihua Cheng et.al. 2409.13761 link
2024-10-16 COMET: Towards Practical W4A4KV4 LLMs Serving Lian Liu et.al. 2410.12168 null
2024-10-09 LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management Yi Xiong et.al. 2410.00428 null
2024-10-08 KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches Jiayi Yuan et.al. 2407.01527 link
2024-10-07 Fast State Restoration in LLM Serving with HCache Shiwei Gao et.al. 2410.05004 null
2024-10-07 KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head Isaac Rehg et.al. 2410.00161 link
2024-10-04 LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy Rongzhi Zhang et.al. 2410.03111 null
2024-10-04 Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization Seungwoo Son et.al. 2406.12016 null
2024-10-03 Preble: Efficient Distributed Prompt Scheduling for LLM Serving Vikranth Srivatsa et.al. 2407.00023 link
2024-10-01 Self-controller: Controlling LLMs with Multi-round Step-by-step Self-awareness Xiao Peng et.al. 2410.00359 null
2024-09-23 Steward: Natural Language Web Automation Brian Tang et.al. 2409.15441 link
2024-09-23 BlockLLM: Multi-tenant Finer-grained Serving for Large Language Models Bodun Hu et.al. 2404.18322 null
2024-09-23 SEAL: Suite for Evaluating API-use of LLMs Woojeong Kim et.al. 2409.15523 null
2024-09-21 LLM-dCache: Improving Tool-Augmented LLMs with GPT-Driven Localized Data Caching Simranjit Singh et.al. 2406.06799 null
2024-09-11 Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU Zhenyu Ning et.al. 2409.09086 null
2024-09-04 SparQ Attention: Bandwidth-Efficient LLM Inference Luka Ribar et.al. 2312.04985 link
2024-08-05 SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving Andreas Kosmas Kakolyris et.al. 2408.05235 null
2024-08-04 TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding Hanshi Sun et.al. 2404.11912 link
2024-08-01 Intermittent Semi-working Mask: A New Masking Paradigm for LLMs Mingcong Lu et.al. 2408.00539 null
2024-08-01 ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition Lu Ye et.al. 2402.15220 link
2024-07-22 vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving Jiale Xu et.al. 2407.15309 link
2024-07-22 Dissecting Multiplication in Transformers: Insights into LLMs Luyu Qiu et.al. 2407.15360 link
2024-07-21 Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks Zheng Wang et.al. 2407.08454 null
2024-07-18 QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead Amir Zandieh et.al. 2406.03482 link
2024-07-11 Bifurcated Attention: Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs Ben Athiwaratkun et.al. 2403.08845 null
2024-07-09 Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving Ruoyu Qin et.al. 2407.00079 link
2024-06-30 Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention Bin Gao et.al. 2403.19708 null
2024-06-28 InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management Wonbeom Lee et.al. 2406.19707 null
2024-06-19 VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework Zhi Yao et.al. 2406.13399 null
2024-06-16 EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism Yanxi Chen et.al. 2312.04916 link
2024-06-08 QCQA: Quality and Capacity-aware grouped Query Attention Vinay Joshi et.al. 2406.10247 null
2024-06-06 SGLang: Efficient Execution of Structured Language Model Programs Lianmin Zheng et.al. 2312.07104 link
2024-05-31 Cached Model-as-a-Resource: Provisioning Large Language Model Agents for Edge Intelligence in Space-air-ground Integrated Networks Minrui Xu et.al. 2403.05826 null
2024-05-13 Hydragen: High-Throughput LLM Inference with Shared Prefixes Jordan Juravsky et.al. 2402.05099 link
2024-04-15 Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models Siyan Zhao et.al. 2404.09529 link
2024-03-26 ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching Youpeng Zhao et.al. 2403.17312 null
2024-03-18 FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines Jiaao He et.al. 2403.11421 null
2024-03-12 GPT-4V(ision) is a Generalist Web Agent, if Grounded Boyuan Zheng et.al. 2401.01614 link
2024-03-11 Large Language Models as Tool Makers Tianle Cai et.al. 2305.17126 link
2024-02-16 When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and Alignment Minrui Xu et.al. 2401.07764 null
2024-02-04 LLM-Enhanced Data Management Xuanhe Zhou et.al. 2402.02643 link
2024-01-16 GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching Cong Guo et.al. 2401.08156 link
2024-01-16 LLMs for Test Input Generation for Semantic Caches Zafaryab Rasool et.al. 2401.08138 null
2023-06-09 S³: Increasing GPU Utilization during Generative Inference for Higher Throughput Yunho Jin et.al. 2306.06000 null