LLM Papers
Updated on 2025.08.02
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-07-23 | KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider | Jiahao Wang et.al. | 2506.02634 | link |
2025-07-23 | GTA: Grouped-head latenT Attention | Luoyang Sun et.al. | 2506.17286 | null |
2025-07-23 | DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training | Zhixin Wang et.al. | 2507.13833 | null |
2025-07-23 | Learning to Extract Rational Evidence via Reinforcement Learning for Retrieval-Augmented Generation | Xinping Zhao et.al. | 2507.15586 | null |
2025-07-23 | Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning | Yanjun Zheng et.al. | 2507.16802 | null |
2025-07-23 | WAKENLLM: Evaluating Reasoning Potential and Stability in LLMs via Fine-Grained Benchmarking | Zipeng Ling et.al. | 2507.16199 | null |
2025-07-23 | HyDRA: A Hybrid-Driven Reasoning Architecture for Verifiable Knowledge Graphs | Adrian Kaiser et.al. | 2507.15917 | null |
2025-07-23 | Thinking Isn’t an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations | Zhao Song et.al. | 2507.17699 | null |
2025-07-23 | Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks | Ilias Chatzistefanidis et.al. | 2507.17695 | null |
2025-07-23 | CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning | Lingxiao Tang et.al. | 2507.17548 | null |
2025-07-23 | Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning | Yu Li et.al. | 2507.17512 | null |
2025-07-23 | An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models | Haoran Sun et.al. | 2507.17477 | null |
2025-07-23 | MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs | Alexander R. Fabbri et.al. | 2507.17476 | null |
2025-07-23 | Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning | Situo Zhang et.al. | 2507.17448 | null |
2025-07-23 | HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs | Zhaolin Cai et.al. | 2507.17394 | null |
2025-07-23 | DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning | Chuzhan Hao et.al. | 2507.17365 | null |
2025-07-23 | R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning | Zhuokun Chen et.al. | 2507.17307 | null |
2025-07-23 | Triple X: A LLM-Based Multilingual Speech Recognition System for the INTERSPEECH2025 MLC-SLM Challenge | Miaomiao Gao et.al. | 2507.17288 | null |
2025-07-23 | Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance | Rishi Parekh et.al. | 2507.17273 | null |
2025-07-23 | Agent Identity Evals: Measuring Agentic Identity | Elija Perrier et.al. | 2507.17257 | null |
2025-07-23 | R4ec: A Reasoning, Reflection, and Refinement Framework for Recommendation Systems | Hao Gu et.al. | 2507.17249 | null |
2025-07-23 | HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery | Haoran Jiang et.al. | 2507.17209 | null |
2025-07-23 | Improving LLMs’ Generalized Reasoning Abilities by Graph Problems | Qifan Zhang et.al. | 2507.17168 | null |
2025-07-23 | CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards | Cheng Liu et.al. | 2507.17147 | null |
2025-07-23 | Resilient Multi-Agent Negotiation for Medical Supply Chains: Integrating LLMs and Blockchain for Transparent Coordination | Mariam ALMutairi et.al. | 2507.17134 | null |
2025-07-22 | CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | Xiaoya Li et.al. | 2507.14111 | null |
2025-07-22 | Reasoning Models Can be Easily Hacked by Fake Reasoning Bias | Qian Wang et.al. | 2507.13758 | null |
2025-07-22 | Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters | Shanbo Cheng et.al. | 2507.13618 | null |
2025-07-22 | Gemini 2.5 Pro Capable of Winning Gold at IMO 2025 | Yichen Huang et.al. | 2507.15855 | null |
2025-07-22 | X-Intelligence 3.0: Training and Evaluating Reasoning LLM for Semiconductor Display | Xiaolin Yan et.al. | 2507.14430 | null |
2025-07-22 | Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning | Hongyin Luo et.al. | 2507.16784 | null |
2025-07-22 | WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding | Ran Wang et.al. | 2507.16768 | null |
2025-07-22 | Towards Compute-Optimal Many-Shot In-Context Learning | Shahriar Golchin et.al. | 2507.16217 | null |
2025-07-22 | ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning | Chi-Pin Huang et.al. | 2507.16815 | null |
2025-07-22 | LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs | Da-Chen Lian et.al. | 2507.16809 | null |
2025-07-22 | When LLMs Copy to Think: Uncovering Copy-Guided Attacks in Reasoning LLMs | Yue Li et.al. | 2507.16773 | null |
2025-07-22 | P-CoT: A Pedagogically-motivated Participatory Chain-of-Thought Prompting for Phonological Reasoning in LLMs | Dongjun Jang et.al. | 2507.16656 | null |
2025-07-22 | Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications | Jean Lelong et.al. | 2507.16507 | null |
2025-07-22 | Learning Temporal Abstractions via Variational Homomorphisms in Option-Induced Abstract MDPs | Chang Li et.al. | 2507.16473 | null |
2025-07-22 | LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning | Bo Hou et.al. | 2507.16395 | null |
2025-07-22 | Re:Form – Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny | Chuanhao Yan et.al. | 2507.16331 | null |
2025-07-22 | Mind the Gap: Evaluating the Representativeness of Quantitative Medical Language Reasoning LLM Benchmarks for African Disease Burdens | Fred Mutisya et.al. | 2507.16322 | null |
2025-07-22 | Perovskite-R1: A Domain-Specialized LLM for Intelligent Discovery of Precursor Additives and Experimental Design | Xin-De Wang et.al. | 2507.16307 | null |
2025-07-22 | Do Large Language Models Have a Planning Theory of Mind? Evidence from MindGames: a Multi-Step Persuasion Task | Jared Moore et.al. | 2507.16196 | null |
2025-07-22 | Emergent Cognitive Convergence via Implementation: A Structured Loop Reflecting Four Theories of Mind (A Position Paper) | Myung Ho Kim et.al. | 2507.16184 | null |
2025-07-22 | LoRA is All You Need for Safety Alignment of Reasoning LLMs | Yihao Xue et.al. | 2507.17075 | null |
2025-07-22 | Controllable Hybrid Captioner for Improved Long-form Video Understanding | Kuleen Sasse et.al. | 2507.17047 | null |
2025-07-22 | Text-to-SPARQL Goes Beyond English: Multilingual Question Answering Over Knowledge Graphs through Human-Inspired Reasoning | Aleksandr Perevalov et.al. | 2507.16971 | null |
2025-07-22 | AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation | Nima Fathi et.al. | 2507.16940 | null |
2025-07-22 | CausalStep: A Benchmark for Explicit Stepwise Causal Reasoning in Videos | Xuchen Li et.al. | 2507.16878 | null |
2025-07-21 | A Survey of Context Engineering for Large Language Models | Lingrui Mei et.al. | 2507.13334 | null |
2025-07-21 | The Impact of Language Mixing on Bilingual LLM Reasoning | Yihao Li et.al. | 2507.15849 | null |
2025-07-21 | Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning | Sneheel Sarangi et.al. | 2507.15788 | null |
2025-07-21 | Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR | Jiakang Wang et.al. | 2507.15778 | null |
2025-07-21 | A Framework for Analyzing Abnormal Emergence in Service Ecosystems Through LLM-based Agent Intention Mining | Yifan Shen et.al. | 2507.15770 | null |
2025-07-21 | Understanding Large Language Models’ Ability on Interdisciplinary Research | Yuanhao Shen et.al. | 2507.15736 | null |
2025-07-21 | BEnchmarking LLMs for Ophthalmology (BELO) for Ophthalmological Knowledge and Reasoning | Sahana Srinivasan et.al. | 2507.15717 | null |
2025-07-21 | Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked? | Seok Hwan Song et.al. | 2507.15707 | null |
2025-07-21 | CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models | Congmin Zheng et.al. | 2507.15698 | null |
2025-07-21 | P3: Prompts Promote Prompting | Xinyu Zhang et.al. | 2507.15675 | null |
2025-07-21 | BugScope: Learn to Find Bugs Like Human | Jinyao Guo et.al. | 2507.15671 | null |
2025-07-21 | PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors | Yimeng Chen et.al. | 2507.15550 | null |
2025-07-21 | LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning | Cole Robertson et.al. | 2507.15521 | null |
2025-07-21 | Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language Models | Kaiyan Chang et.al. | 2507.15512 | null |
2025-07-21 | AlgoSimBench: Identifying Algorithmically Similar Problems for Competitive Programming | Jierui Li et.al. | 2507.15378 | null |
2025-07-21 | StackTrans: From Large Language Model to Large Pushdown Automata Model | Kechi Zhang et.al. | 2507.15343 | null |
2025-07-21 | Reasoning Models are Test Exploiters: Rethinking Multiple-Choice | Narun Raman et.al. | 2507.15337 | null |
2025-07-21 | Input Reduction Enhanced LLM-based Program Repair | Boyang Yang et.al. | 2507.15251 | null |
2025-07-21 | SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search | Xiaofeng Shi et.al. | 2507.15245 | null |
2025-07-21 | FaultLine: Automated Proof-of-Vulnerability Generation Using LLM Agents | Vikram Nitin et.al. | 2507.15241 | null |
2025-07-21 | Solving Formal Math Problems by Decomposition and Iterative Reflection | Yichi Zhou et.al. | 2507.15225 | null |
2025-07-21 | Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization | Shengchao Liu et.al. | 2507.16110 | null |
2025-07-21 | Deep Researcher with Test-Time Diffusion | Rujun Han et.al. | 2507.16075 | null |
2025-07-21 | Learning without training: The implicit dynamics of in-context learning | Benoit Dherin et.al. | 2507.16003 | null |
2025-07-21 | Does More Inference-Time Compute Really Help Robustness? | Tong Wu et.al. | 2507.15974 | null |
2025-07-20 | Lizard: An Efficient Linearization Framework for Large Language Models | Chien Van Nguyen et.al. | 2507.09025 | null |
2025-07-20 | Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback | Yiyuan Yang et.al. | 2507.15066 | null |
2025-07-20 | WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization | Zhengwei Tao et.al. | 2507.15061 | null |
2025-07-20 | Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding | Yuanhan Zhang et.al. | 2507.15028 | null |
2025-07-20 | RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback | Qiaoyu Tang et.al. | 2507.15024 | null |
2025-07-20 | EduThink4AI: Translating Educational Critical Thinking into Multi-Agent LLM Systems | Xinmeng Hou et.al. | 2507.15015 | null |
2025-07-20 | AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning | Yi Zhang et.al. | 2507.14987 | null |
2025-07-20 | MUR: Momentum Uncertainty guided Reasoning for Large Language Models | Hang Yan et.al. | 2507.14958 | null |
2025-07-20 | LEKIA: A Framework for Architectural Alignment via Expert Knowledge Injection | Boning Zhao et.al. | 2507.14944 | null |
2025-07-20 | Feedback-Induced Performance Decline in LLM-Based Decision-Making | Xiao Yang et.al. | 2507.14906 | null |
2025-07-20 | InsightX Agent: An LMM-based Agentic Framework with Integrated Tools for Reliable X-ray NDT Analysis | Jiale Liu et.al. | 2507.14899 | null |
2025-07-20 | MEKiT: Multi-source Heterogeneous Knowledge Injection Method via Instruction Tuning for Emotion-Cause Pair Extraction | Shiyi Mu et.al. | 2507.14887 | null |
2025-07-20 | Large Language Model as An Operator: An Experience-Driven Solution for Distribution Network Voltage Control | Xu Yang et.al. | 2507.14800 | null |
2025-07-20 | Exploring the In-Context Learning Capabilities of LLMs for Money Laundering Detection in Financial Graphs | Erfan Pirmorad et.al. | 2507.14785 | null |
2025-07-20 | LeAdQA: LLM-Driven Context-Aware Temporal Grounding for Video Question Answering | Xinxin Dong et.al. | 2507.14784 | null |
2025-07-20 | Omni-Think: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards | Derek Li et.al. | 2507.14783 | null |
2025-07-19 | Draft-based Approximate Inference for LLMs | Kevin Galim et.al. | 2506.08373 | link |
2025-07-19 | Dynamic Context Tuning for Retrieval-Augmented Generation: Enhancing Multi-Turn Planning and Tool Adaptation | Jubin Abhishek Soni et.al. | 2506.11092 | null |
2025-07-19 | Mind the Gap: A Review of Arabic Post-Training Datasets and Their Limitations | Mohammed Alkhowaiter et.al. | 2507.14688 | null |
2025-07-19 | Agentic Satellite-Augmented Low-Altitude Economy and Terrestrial Networks: A Survey on Generative Approaches | Xiaozheng Gao et.al. | 2507.14633 | null |
2025-07-19 | Retrieval-Augmented Clinical Benchmarking for Contextual Model Testing in Kenyan Primary Care: A Methodology Paper | Fred Mutisya et.al. | 2507.14615 | null |
2025-07-19 | What do Large Language Models know about materials? | Adrian Ehrenhofer et.al. | 2507.14586 | null |
2025-07-19 | Explainable Collaborative Problem Solving Diagnosis with BERT using SHAP and its Implications for Teacher Adoption | Kester Wong et.al. | 2507.14584 | null |
2025-07-19 | Amico: An Event-Driven Modular Framework for Persistent and Embedded Autonomy | Hongyi Yang et.al. | 2507.14513 | null |
2025-07-18 | LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues | Haoyang Li et.al. | 2507.13681 | null |
2025-07-18 | DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration | Xiyun Li et.al. | 2507.14088 | null |
2025-07-18 | Efficient Temporal Tokenization for Mobility Prediction with Large Language Models | Haoyu He et.al. | 2507.14017 | null |
2025-07-18 | DUALRec: A Hybrid Sequential and Language Model Framework for Context-Aware Movie Recommendation | Yitong Li et.al. | 2507.13957 | null |
2025-07-18 | Cross-modal Causal Intervention for Alzheimer’s Disease Prediction | Yutao Jin et.al. | 2507.13956 | null |
2025-07-18 | InTraVisTo: Inside Transformer Visualisation Tool | Nicolò Brunello et.al. | 2507.13858 | null |
2025-07-18 | Team of One: Cracking Complex Video QA with Model Synergy | Jun Xie et.al. | 2507.13820 | null |
2025-07-18 | Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques | Niveen O. Jaffal et.al. | 2507.13629 | null |
2025-07-18 | BifrostRAG: Bridging Dual Knowledge Graphs for Multi-Hop Question Answering in Construction Safety | Yuxin Zhang et.al. | 2507.13625 | null |
2025-07-18 | Fail Fast, or Ask: Mitigating the Deficiencies of Reasoning LLMs with Human-in-the-Loop Systems Engineering | Michael J. Zellinger et.al. | 2507.14406 | null |
2025-07-18 | NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers | Sarunas Kalade et.al. | 2507.14403 | null |
2025-07-18 | NetIntent: Leveraging Large Language Models for End-to-End Intent-Based SDN Automation | Md. Kamrul Hossain et.al. | 2507.14398 | null |
2025-07-18 | ProofCompass: Enhancing Specialized Provers with LLM Guidance | Nicolas Wischermann et.al. | 2507.14335 | null |
2025-07-18 | How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMs | Karin de Langis et.al. | 2507.14307 | null |
2025-07-18 | A Simple “Try Again” Can Elicit Multi-Turn LLM Reasoning | Licheng Liu et.al. | 2507.14295 | null |
2025-07-18 | Impact of Code Context and Prompting Strategies on Automated Unit Test Generation with Modern General-Purpose Large Language Models | Jakub Walczak et.al. | 2507.14256 | null |
2025-07-17 | LLM-Driven Dual-Level Multi-Interest Modeling for Recommendation | Ziyan Wang et.al. | 2507.10917 | null |
2025-07-17 | MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks | Artem Chervyakov et.al. | 2507.12284 | null |
2025-07-17 | Aime: Towards Fully-Autonomous Multi-Agent Framework | Yexuan Shi et.al. | 2507.11988 | null |
2025-07-17 | VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding | Shihao Wang et.al. | 2507.13353 | null |
2025-07-17 | VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning | Senqiao Yang et.al. | 2507.13348 | null |
2025-07-17 | Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes | Tyler Loakman et.al. | 2507.13335 | null |
2025-07-17 | The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner | Zhouqi Hua et.al. | 2507.13332 | null |
2025-07-17 | QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation | Jiazheng Li et.al. | 2507.13266 | null |
2025-07-17 | HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models | Ashray Gupta et.al. | 2507.13238 | null |
2025-07-17 | Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities | Hao Sun et.al. | 2507.13158 | null |
2025-07-17 | SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models | Xiangyu Dong et.al. | 2507.13152 | null |
2025-07-17 | MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems | Yu Cui et.al. | 2507.13038 | null |
2025-07-17 | Probabilistic Soundness Guarantees in LLM Reasoning Chains | Weiqiu You et.al. | 2507.12948 | null |
2025-07-17 | Agentar-DeepFinance-300K: A Large-Scale Financial Dataset via Systematic Chain-of-Thought Synthesis Optimization | Xiaoke Zhao et.al. | 2507.12901 | null |
2025-07-17 | VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks | Jian Yao et.al. | 2507.12885 | null |
2025-07-17 | DEMONSTRATE: Zero-shot Language to Robotic Control via Multi-task Demonstration Learning | Rahel Rickenbach et.al. | 2507.12855 | null |
2025-07-17 | A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models | Weijieying Ren et.al. | 2507.12774 | null |
2025-07-17 | osmAG-LLM: Zero-Shot Open-Vocabulary Object Navigation via Semantic Maps and Large Language Models Reasoning | Fujing Xie et.al. | 2507.12753 | null |
2025-07-17 | TransEvalnia: Reasoning-based Evaluation and Ranking of Translations | Richard Sproat et.al. | 2507.12724 | null |
2025-07-17 | Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation | Genki Kusano et.al. | 2507.13525 | null |
2025-07-17 | Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers | Liang Lin et.al. | 2507.13474 | null |
2025-07-17 | Intent-Based Network for RAN Management with Large Language Models | Fransiscus Asisi Bimo et.al. | 2507.14230 | null |
2025-07-17 | Why Braking? Scenario Extraction and Reasoning Utilizing LLM | Yin Wu et.al. | 2507.15874 | null |
2025-07-16 | Simple Mechanistic Explanations for Out-Of-Context Reasoning | Atticus Wang et.al. | 2507.08218 | null |
2025-07-16 | The Challenge of Teaching Reasoning to LLMs Without RL or Distillation | Wei Du et.al. | 2507.09850 | null |
2025-07-16 | Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs | Yangning Li et.al. | 2507.09477 | null |
2025-07-16 | Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize? | Yanjian Zhang et.al. | 2507.11423 | null |
2025-07-16 | GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning | Ziru Liu et.al. | 2507.10628 | null |
2025-07-16 | IAM: Efficient Inference through Attention Mapping between Different-scale LLMs | Yi Zhao et.al. | 2507.11953 | null |
2025-07-16 | Assessing the Value of Visual Input: A Benchmark of Multimodal Large Language Models for Robotic Path Planning | Jacinto Colan et.al. | 2507.12391 | null |
2025-07-16 | Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics | Meysam Alizadeh et.al. | 2507.12372 | null |
2025-07-16 | Thought Purity: Defense Paradigm For Chain-of-Thought Attack | Zihao Xue et.al. | 2507.12314 | null |
2025-07-16 | Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning | Yuhao Chen et.al. | 2507.12215 | null |
2025-07-16 | Findings of MEGA: Maths Explanation with LLMs using the Socratic Method for Active Learning | Tosin Adewumi et.al. | 2507.12079 | null |
2025-07-16 | Evaluating the Ability of Large Language Models to Reason about Cardinal Directions, Revisited | Anthony G Cohn et.al. | 2507.12059 | null |
2025-07-16 | Watch, Listen, Understand, Mislead: Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation | Sahid Hossain Mustakim et.al. | 2507.11968 | null |
2025-07-16 | PoTPTQ: A Two-step Power-of-Two Post-training for LLMs | Xinyu Wang et.al. | 2507.11959 | null |
2025-07-16 | The benefits of query-based KGQA systems for complex and temporal questions in LLM era | Artem Alekseev et.al. | 2507.11954 | null |
2025-07-16 | Hyperphantasia: A Benchmark for Evaluating the Mental Visualization Capabilities of Multimodal LLMs | Mohammad Shahab Sepehri et.al. | 2507.11932 | null |
2025-07-16 | Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training | Mingjie Liu et.al. | 2507.12507 | null |
2025-07-16 | PARAM-1 BharatGen 2.9B Model | Kundeshwar Pundalik et.al. | 2507.13390 | null |
2025-07-15 | ContextCache: Context-Aware Semantic Cache for Multi-Turn Queries in Large Language Models | Jianxin Yan et.al. | 2506.22791 | null |
2025-07-15 | VerifyBench: A Systematic Benchmark for Evaluating Reasoning Verifiers Across Domains | Xuzhao Li et.al. | 2507.09884 | null |
2025-07-15 | Mixture of LoRA Experts with Multi-Modal and Multi-Granularity LLM Generative Error Correction for Accented Speech Recognition | Bingshen Mu et.al. | 2507.09116 | null |
2025-07-15 | Bridging Literature and the Universe Via A Multi-Agent Large Language Model System | Xiaowen Zhang et.al. | 2507.08958 | null |
2025-07-15 | MIRAGE: KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving | Ruihao Li et.al. | 2507.11507 | null |
2025-07-15 | KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding | Luohe Shi et.al. | 2507.11273 | null |
2025-07-15 | How Many Instructions Can LLMs Follow at Once? | Daniel Jaroslawicz et.al. | 2507.11538 | null |
2025-07-15 | DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering | Yinsheng Li et.al. | 2507.11527 | null |
2025-07-15 | Modeling Code: Is Text All You Need? | Daniel Nichols et.al. | 2507.11467 | null |
2025-07-15 | LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer | Yaoxian Dong et.al. | 2507.11457 | null |
2025-07-15 | KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? | Soumadeep Saha et.al. | 2507.11408 | null |
2025-07-15 | DCR: Quantifying Data Contamination in LLMs Evaluation | Cheng Xu et.al. | 2507.11405 | null |
2025-07-15 | Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs | Gabriel Bo et.al. | 2507.11371 | null |
2025-07-15 | Guiding LLM Decision-Making with Fairness Reward Models | Zara Hall et.al. | 2507.11344 | null |
2025-07-15 | LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification | Fengxiao Tang et.al. | 2507.11310 | null |
2025-07-15 | Taming Uncertainty via Automation: Observing, Analyzing, and Optimizing Agentic AI Systems | Dany Moshkovich et.al. | 2507.11277 | null |
2025-07-15 | FMC: Formalization of Natural Language Mathematical Competition Problems | Jiaxuan Xie et.al. | 2507.11275 | null |
2025-07-15 | An Agentic Flow for Finite State Machine Extraction using Prompt Chaining | Fares Wael et.al. | 2507.11222 | null |
2025-07-15 | LLM-Augmented Symptom Analysis for Cardiovascular Disease Risk Prediction: A Clinical NLP | Haowei Yang et.al. | 2507.11052 | null |
2025-07-15 | Teach Me Sign: Stepwise Prompting LLM for Sign Language Production | Zhaoyi An et.al. | 2507.10972 | null |
2025-07-15 | Modeling Understanding of Story-Based Analogies Using Large Language Models | Kalit Inani et.al. | 2507.10957 | null |
2025-07-15 | Artificial Finance: How AI Thinks About Money | Orhan Erdem et.al. | 2507.10933 | null |
2025-07-15 | Evaluating Generated Commit Messages with Large Language Models | Qunhong Zeng et.al. | 2507.10906 | null |
2025-07-15 | General Modular Harness for LLM Agents in Multi-Turn Gaming Environments | Yuxuan Zhang et.al. | 2507.11633 | null |
2025-07-14 | InstCache: A Predictive Cache for LLM Serving | Longwei Zou et.al. | 2411.13820 | null |
2025-07-14 | DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving | Yuhan Liu et.al. | 2411.02820 | null |
2025-07-14 | GR-LLMs: Recent Advances in Generative Recommendation Based on Large Language Models | Zhen Yang et.al. | 2507.06507 | null |
2025-07-14 | PyVision: Agentic Vision with Dynamic Tooling | Shitian Zhao et.al. | 2507.07998 | null |
2025-07-14 | Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code | Keqin Bao et.al. | 2507.07498 | null |
2025-07-14 | ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism | Zedong Liu et.al. | 2507.10069 | null |
2025-07-14 | Fusing LLM Capabilities with Routing Data | Tao Feng et.al. | 2507.10540 | null |
2025-07-14 | CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks | Hongchao Jiang et.al. | 2507.10535 | null |
2025-07-14 | Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Mingqi Wu et.al. | 2507.10532 | null |
2025-07-14 | DeepResearch$^{\text{Eco}}$: A Recursive Agentic Workflow for Complex Scientific Question Answering in Ecology | Jennifer D’Souza et.al. | 2507.10522 | null |
2025-07-14 | Referential ambiguity and clarification requests: comparing human and LLM behaviour | Chris Madge et.al. | 2507.10445 | null |
2025-07-14 | Prompt Informed Reinforcement Learning for Visual Coverage Path Planning | Venkat Margapuri et.al. | 2507.10284 | null |
2025-07-14 | Toward Real-World Table Agents: Capabilities, Workflows, and Design Principles for LLM-based Table Intelligence | Jiaming Tian et.al. | 2507.10281 | null |
2025-07-14 | Breaking the Myth: Can Small Models Infer Postconditions Too? | Gehao Zhang et.al. | 2507.10182 | null |
2025-07-14 | Fusing Large Language Models with Temporal Transformers for Time Series Forecasting | Chen Su et.al. | 2507.10098 | null |
2025-07-14 | Foundation Model Driven Robotics: A Comprehensive Review | Muhammad Tayyab Khan et.al. | 2507.10087 | null |
2025-07-14 | LLMShot: Reducing snapshot testing maintenance via LLMs | Ergün Batuhan Kaynak et.al. | 2507.10062 | null |
2025-07-14 | Towards Applying Large Language Models to Complement Single-Cell Foundation Models | Steven Palayew et.al. | 2507.10039 | null |
2025-07-14 | Deep Hidden Cognition Facilitates Reliable Chain-of-Thought Reasoning | Zijun Chen et.al. | 2507.10007 | null |
2025-07-14 | DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models | Luolin Xiong et.al. | 2507.09955 | null |
2025-07-14 | Mechanistic Interpretability of LoRA-Adapted Language Models for Nuclear Reactor Safety Applications | Yoon Pyo Lee et.al. | 2507.09931 | null |
2025-07-14 | ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models | Yongheng Zhang et.al. | 2507.09876 | null |
2025-07-14 | Warehouse Spatial Question Answering with LLM Agent | Hsiang-Wei Huang et.al. | 2507.10778 | null |
2025-07-14 | ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning | Zhengyue Zhao et.al. | 2507.11500 | null |
2025-07-14 | Enhancing the Capabilities of Large Language Models for API calls through Knowledge Graphs | Ye Yang et.al. | 2507.10630 | null |
2025-07-14 | Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning | Zheng Zhang et.al. | 2507.10624 | null |
2025-07-14 | Game Theory Meets LLM and Agentic AI: Reimagining Cybersecurity for the Age of Intelligent Threats | Quanyan Zhu et.al. | 2507.10621 | null |
2025-07-14 | Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding | Ishraq Khan et.al. | 2507.12482 | null |
2025-07-14 | LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models | Dachuan Shi et.al. | 2507.14204 | null |
2025-07-13 | Perception-Aware Policy Optimization for Multimodal Reasoning | Zhenhailong Wang et.al. | 2507.06448 | null |
2025-07-13 | Prompting for Performance: Exploring LLMs for Configuring Software | Helge Spieker et.al. | 2507.09790 | null |
2025-07-13 | Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations | Bradley P. Allen et.al. | 2507.09751 | null |
2025-07-13 | Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces | Baturay Saglam et.al. | 2507.09709 | null |
2025-07-13 | Can AI Rely on the Systematicity of Truth? The Challenge of Modelling Normative Domains | Matthieu Queloz et.al. | 2507.09676 | null |
2025-07-13 | Can Group Relative Policy Optimization Improve Thai Legal Reasoning and Question Answering? | Pawitsapak Akarajaradwong et.al. | 2507.09638 | null |
2025-07-13 | AICrypto: A Comprehensive Benchmark For Evaluating Cryptography Capabilities of Large Language Models | Yu Wang et.al. | 2507.09580 | null |
2025-07-13 | Reframing SAR Target Recognition as Visual Reasoning: A Chain-of-Thought Dataset with Multimodal LLMs | Chaoran Li et.al. | 2507.09535 | null |
2025-07-13 | Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them | Neel Rajani et.al. | 2507.10616 | null |
2025-07-12 | DrugMCTS: a drug repurposing framework combining multi-agent, RAG and Monte Carlo Tree Search | Zerui Yang et.al. | 2507.07426 | null |
2025-07-12 | LLM-Stackelberg Games: Conjectural Reasoning Equilibria and Their Applications to Spearphishing | Quanyan Zhu et.al. | 2507.09407 | null |
2025-07-12 | StockSim: A Dual-Mode Order-Level Simulator for Evaluating Multi-Agent LLMs in Financial Markets | Charidimos Papadakis et.al. | 2507.09255 | null |
2025-07-12 | Towards Spatial Audio Understanding via Question Answering | Parthasaarathy Sudarsanam et.al. | 2507.09195 | null |
2025-07-12 | Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models | Ameen Ali et.al. | 2507.09185 | null |
2025-07-12 | OPENXRD: A Comprehensive Benchmark and Enhancement Framework for LLM/MLLM XRD Question Answering | Ali Vosoughi et.al. | 2507.09155 | null |
2025-07-12 | CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards | Taolin Zhang et.al. | 2507.09104 | null |
2025-07-12 | Learning from Synthetic Labs: Language Models as Auction Participants | Anand Shah et.al. | 2507.09083 | null |
2025-07-12 | Emergence of Hierarchical Emotion Organization in Large Language Models | Bo Zhao et.al. | 2507.10599 | null |
2025-07-12 | PLEX: Perturbation-free Local Explanations for LLM-Based Text Classification | Yogachandran Rahulamathavan et.al. | 2507.10596 | null |
2025-07-12 | LLM-Powered Quantum Code Transpilation | Nazanin Siavash et.al. | 2507.12480 | null |
2025-07-11 | Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model | Jing Liang et.al. | 2507.06892 | null |
2025-07-11 | StarDojo: Benchmarking Open-Ended Behaviors of Agentic Multimodal LLMs in Production-Living Simulations with Stardew Valley | Weihao Tan et.al. | 2507.07445 | null |
2025-07-11 | InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching | Yilun Wang et.al. | 2507.08523 | null |
2025-07-11 | xpSHACL: Explainable SHACL Validation using Retrieval-Augmented Generation and Large Language Models | Gustavo Correa Publio et.al. | 2507.08432 | null |
2025-07-11 | One Token to Fool LLM-as-a-Judge | Yulai Zhao et.al. | 2507.08794 | null |
2025-07-11 | ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way | Rajarshi Roy et.al. | 2507.08679 | null |
2025-07-11 | Introspection of Thought Helps AI Agents | Haoran Sun et.al. | 2507.08664 | null |
2025-07-11 | Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning | Xingguang Ji et.al. | 2507.08649 | null |
2025-07-11 | A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1 | Marcin Pietroń et.al. | 2507.08621 | null |
2025-07-11 | Agentic Large Language Models for Conceptual Systems Engineering and Design | Soheyl Massoudi et.al. | 2507.08619 | null |
2025-07-11 | AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs | Florian Grötschla et.al. | 2507.08616 | null |
2025-07-11 | AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling | Preslav Aleksandrov et.al. | 2507.08567 | null |
2025-07-11 | The AI Language Proficiency Monitor – Tracking the Progress of LLMs on Multilingual Benchmarks | David Pomerenke et.al. | 2507.08538 | null |
2025-07-11 | From Language to Logic: A Bi-Level Framework for Structured Reasoning | Keying Yang et.al. | 2507.08501 | null |
2025-07-11 | LLaPa: A Vision-Language Model Framework for Counterfactual-Aware Procedural Planning | Shibo Sun et.al. | 2507.08496 | null |
2025-07-11 | Using Large Language Models for Legal Decision-Making in Austrian Value-Added Tax Law: An Experimental Study | Marina Luketina et.al. | 2507.08468 | null |
2025-07-11 | ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains | Zilu Dong et.al. | 2507.08427 | null |
2025-07-11 | Understanding Driving Risks using Large Language Models: Toward Elderly Driver Assessment | Yuki Yoshihara et.al. | 2507.08367 | null |
2025-07-11 | What Factors Affect LLMs and RLLMs in Financial Question Answering? | Peng Wang et.al. | 2507.08339 | null |
2025-07-11 | Agent Safety Alignment via Reinforcement Learning | Zeyang Sha et.al. | 2507.08270 | null |
2025-07-11 | A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Hiroshi Yoshihara et.al. | 2507.08267 | null |
2025-07-11 | InsightBuild: LLM-Powered Causal Reasoning in Smart Building Systems | Pinaki Prasad Guha Neogi et.al. | 2507.08235 | null |
2025-07-11 | Making VLMs More Robot-Friendly: Self-Critical Distillation of Low-Level Procedural Reasoning | Chan Young Park et.al. | 2507.08224 | null |
2025-07-11 | OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique | Wasi Uddin Ahmad et.al. | 2507.09075 | null |
2025-07-11 | Infinite Video Understanding | Dell Zhang et.al. | 2507.09068 | null |
2025-07-11 | ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making | Bharadwaj Ravichandran et.al. | 2507.09037 | null |
2025-07-11 | How to Train a Leader: Hierarchical Reasoning in Multi-Agent LLMs | Andrew Estornell et.al. | 2507.08960 | null |
2025-07-11 | GraphRunner: A Multi-Stage Framework for Efficient and Accurate Graph-Based Retrieval | Savini Kashmira et.al. | 2507.08945 | null |
2025-07-11 | Optimizing Sequential Multi-Step Tasks with Parallel LLM Agents | Enhao Zhang et.al. | 2507.08944 | null |
2025-07-11 | From Sequence to Structure: Uncovering Substructure Reasoning in Transformers | Xinnan Dai et.al. | 2507.10435 | null |
2025-07-11 | An Offline Mobile Conversational Agent for Mental Health Support: Learning from Emotional Dialogues and Psychological Texts with Student-Centered Evaluation | Vimaleswar A et.al. | 2507.10580 | null |
2025-07-11 | Can Large Language Models Understand As Well As Apply Patent Regulations to Pass a Hands-On Patent Attorney Test? | Bhakti Khera et.al. | 2507.10576 | null |
2025-07-10 | Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs | Jiakun Fan et.al. | 2506.03296 | null |
2025-07-10 | A Survey on Latent Reasoning | Rui-Jie Zhu et.al. | 2507.06203 | null |
2025-07-10 | Skywork-R1V3 Technical Report | Wei Shen et.al. | 2507.06167 | null |
2025-07-10 | Rethinking Verification for LLM Code Generation: From Generation to Testing | Zihan Ma et.al. | 2507.06920 | null |
2025-07-10 | Shifting from Ranking to Set Selection for Retrieval Augmented Generation | Dahyun Lee et.al. | 2507.06838 | null |
2025-07-10 | Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs | Jeongseok Hyun et.al. | 2507.07990 | null |
2025-07-10 | KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows | Zaifeng Pan et.al. | 2507.07400 | null |
2025-07-10 | Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs | Ziyue Li et.al. | 2507.07996 | null |
2025-07-10 | Automating Expert-Level Medical Reasoning Evaluation of Large Language Models | Shuang Zhou et.al. | 2507.07988 | null |
2025-07-10 | MIRIX: Multi-Agent Memory System for LLM-Based Agents | Yu Wang et.al. | 2507.07957 | null |
2025-07-10 | DocCHA: Towards LLM-Augmented Interactive Online diagnosis System | Xinyi Liu et.al. | 2507.07870 | null |
2025-07-10 | MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving | Lu Xu et.al. | 2507.07818 | null |
2025-07-10 | SURPRISE3D: A Dataset for Spatial Understanding and Reasoning in Complex 3D Scenes | Jiaxin Huang et.al. | 2507.07781 | null |
2025-07-10 | When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance | Peizhang Shao et.al. | 2507.07748 | null |
2025-07-10 | Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization | Chengtao Jian et.al. | 2507.07723 | null |
2025-07-10 | Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought | Shin’ya Yamaguchi et.al. | 2507.07685 | null |
2025-07-10 | PlanQA: A Benchmark for Spatial Reasoning in LLMs using Structured Representations | Fedor Rodionov et.al. | 2507.07644 | null |
2025-07-10 | Position: We Need An Algorithmic Understanding of Generative AI | Oliver Eberle et.al. | 2507.07544 | null |
2025-07-10 | PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving | Mihir Parmar et.al. | 2507.07495 | null |
2025-07-10 | RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning | Hongzhi Zhang et.al. | 2507.07451 | null |
2025-07-10 | SAND: Boosting LLM Agents with Self-Taught Action Deliberation | Yu Xia et.al. | 2507.07441 | null |
2025-07-10 | Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores | Vivek Chari et.al. | 2507.08143 | null |
2025-07-10 | Krul: Efficient State Restoration for Multi-turn Conversations with Dynamic Cross-layer KV Sharing | Junyi Wen et.al. | 2507.08045 | null |
2025-07-10 | Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions | Quanyan Zhu et.al. | 2507.08208 | null |
2025-07-10 | CTRLS: Chain-of-Thought Reasoning via Latent State-Transition | Junda Wu et.al. | 2507.08182 | null |
2025-07-10 | TableReasoner: Advancing Table Reasoning Framework with Large Language Models | Sishi Xiong et.al. | 2507.08046 | null |
2025-07-09 | Saffron-1: Safety Inference Scaling | Ruizhong Qiu et.al. | 2506.06444 | link |
2025-07-09 | Can LLMs Play Ô Ăn Quan Game? A Study of Multi-Step Planning and Decision Making | Sang Quang Nguyen et.al. | 2507.03711 | null |
2025-07-09 | FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models | Bo Pang et.al. | 2507.06057 | null |
2025-07-09 | Measuring how changes in code readability attributes affect code quality evaluation by Large Language Models | Igor Regis da Silva Simoes et.al. | 2507.05289 | null |
2025-07-09 | SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference | Qian Chen et.al. | 2507.06567 | null |
2025-07-09 | SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep Layers | Zicong Tang et.al. | 2507.06517 | null |
2025-07-09 | Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor | Vatsal Agarwal et.al. | 2507.07106 | null |
2025-07-09 | Evaluating Large Multimodal Models for Nutrition Analysis: A Benchmark Enriched with Contextual Metadata | Bruce Coburn et.al. | 2507.07048 | null |
2025-07-09 | First Return, Entropy-Eliciting Explore | Tianyu Zheng et.al. | 2507.07017 | null |
2025-07-09 | Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs | Yahan Yu et.al. | 2507.06999 | null |
2025-07-09 | Are They All Good? Evaluating the Quality of CoTs in LLM-based Code Generation | Binquan Zhang et.al. | 2507.06980 | null |
2025-07-09 | Adaptive Termination for Multi-round Parallel Reasoning: An Universal Semantic Entropy-Guided Framework | Zenan Xu et.al. | 2507.06829 | null |
2025-07-09 | PenTest2.0: Towards Autonomous Privilege Escalation Using GenAI | Haitham S. Al-Sinani et.al. | 2507.06742 | null |
2025-07-09 | A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding | Zhenyang Liu et.al. | 2507.06719 | null |
2025-07-09 | From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization | Xinjie Chen et.al. | 2507.06573 | null |
2025-07-09 | Gradientsys: A Multi-Agent LLM Scheduler with ReAct Orchestration | Xinyuan Song et.al. | 2507.06520 | null |
2025-07-09 | Towards LLM-based Root Cause Analysis of Hardware Design Failures | Siyu Qiu et.al. | 2507.06512 | null |
2025-07-09 | Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning | Ziyang Wang et.al. | 2507.06485 | null |
2025-07-09 | Bridging the Plausibility-Validity Gap by Fine-Tuning a Reasoning-Enhanced LLM for Chemical Synthesis and Discovery | Malikussaid et.al. | 2507.07328 | null |
2025-07-09 | Frontier LLMs Still Struggle with Simple Reasoning Tasks | Alan Malek et.al. | 2507.07313 | null |
2025-07-09 | ViDove: A Translation Agent System with Multimodal Context and Memory-Augmented Reasoning | Yichen Lu et.al. | 2507.07306 | null |
2025-07-09 | CRISP: Complex Reasoning with Interpretable Step-based Plans | Matan Vetzler et.al. | 2507.08037 | null |
2025-07-09 | Integrating External Tools with Large Language Models to Improve Accuracy | Nripesh Niketan et.al. | 2507.08034 | null |
2025-07-09 | RAG Safety: Exploring Knowledge Poisoning Attacks to Retrieval-Augmented Generation | Tianzhe Zhao et.al. | 2507.08862 | null |
2025-07-08 | Activation Steering for Chain-of-Thought Compression | Seyedarmin Azizi et.al. | 2507.04742 | null |
2025-07-08 | MemOS: A Memory OS for AI System | Zhiyu Li et.al. | 2507.03724 | null |
2025-07-08 | Coding Triangle: How Does Large Language Model Understand Code? | Taolin Zhang et.al. | 2507.06138 | null |
2025-07-08 | PrefixAgent: An LLM-Powered Design Framework for Efficient Prefix Adder Optimization | Dongsheng Zuo et.al. | 2507.06127 | null |
2025-07-08 | Hierarchical Interaction Summarization and Contrastive Prompting for Explainable Recommendations | Yibin Liu et.al. | 2507.06044 | null |
2025-07-08 | Conditional Multi-Stage Failure Recovery for Embodied Agents | Youmna Farag et.al. | 2507.06016 | null |
2025-07-08 | CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation | Kushal Gajjar et.al. | 2507.06013 | null |
2025-07-08 | DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations | Nicholas Popovič et.al. | 2507.05997 | null |
2025-07-08 | Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval | Haiwen Li et.al. | 2507.05970 | null |
2025-07-08 | Current Practices for Building LLM-Powered Reasoning Tools Are Ad Hoc – and We Can Do Better | Aaron Bembenek et.al. | 2507.05886 | null |
2025-07-08 | KERAG_R: Knowledge-Enhanced Retrieval-Augmented Generation for Recommendation | Zeyuan Meng et.al. | 2507.05863 | null |
2025-07-08 | Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models | Léa Dubois et.al. | 2507.05822 | null |
2025-07-08 | Creating a customisable freely-accessible Socratic AI physics tutor | Eugenio Tufino et.al. | 2507.05795 | null |
2025-07-08 | LeAD: The LLM Enhanced Planning System Converged with End-to-end Autonomous Driving | Yuhang Zhang et.al. | 2507.05754 | null |
2025-07-08 | ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark | He Wang et.al. | 2507.05727 | null |
2025-07-08 | Large Language Models for Agent-Based Modelling: Current and possible uses across the modelling cycle | Loïs Vanhée et.al. | 2507.05723 | null |
2025-07-08 | LLMs are Introvert | Litian Zhang et.al. | 2507.05638 | null |
2025-07-08 | Flipping Knowledge Distillation: Leveraging Small Models’ Expertise to Enhance LLMs in Text Matching | Mingzhe Li et.al. | 2507.05617 | null |
2025-07-08 | Structured Task Solving via Modular Embodied Intelligence: A Case Study on Rubik’s Cube | Chongshan Fan et.al. | 2507.05607 | null |
2025-07-08 | MLlm-DR: Towards Explainable Depression Recognition with MultiModal Large Language Models | Wei Zhang et.al. | 2507.05591 | null |
2025-07-08 | ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models | Jiaxu Tian et.al. | 2507.05568 | null |
2025-07-08 | Enhancing Test-Time Scaling of Large Language Models with Hierarchical Retrieval-Augmented MCTS | Alex ZH Dou et.al. | 2507.05557 | null |
2025-07-08 | An Ensemble Embedding Approach for Improving Semantic Caching Performance in LLM-based Systems | Shervin Ghaffari et.al. | 2507.07061 | null |
2025-07-08 | Exploring Task Performance with Interpretable Models via Sparse Auto-Encoders | Shun Wang et.al. | 2507.06427 | null |
2025-07-08 | Bridging AI and Software Security: A Comparative Vulnerability Assessment of LLM Agent Deployment Paradigms | Tarek Gasmi et.al. | 2507.06323 | null |
2025-07-08 | Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate | A. Bochkov et.al. | 2507.07129 | null |
2025-07-08 | “Amazing, They All Lean Left” – Analyzing the Political Temperaments of Current LLMs | W. Russell Neuman et.al. | 2507.08027 | null |
2025-07-07 | Eka-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages | Samridhi Raj Sinha et.al. | 2507.01853 | null |
2025-07-07 | StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling | Meng Wei et.al. | 2507.05240 | null |
2025-07-07 | The Case for Instance-Optimized LLMs in OLAP Databases | Bardia Mohammadi et.al. | 2507.04967 | null |
2025-07-07 | Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation | Daichi Mukunoki et.al. | 2507.04697 | null |
2025-07-07 | Spatio-Temporal LLM: Reasoning about Environments and Actions | Haozhen Zheng et.al. | 2507.05258 | null |
2025-07-07 | Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions | Yuanzhe Hu et.al. | 2507.05257 | null |
2025-07-07 | Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning | Yana Wei et.al. | 2507.05255 | null |
2025-07-07 | From Fragments to Facts: A Curriculum-Driven DPO Approach for Generating Hindi News Veracity Explanations | Pulkit Bansal et.al. | 2507.05179 | null |
2025-07-07 | CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale | Jonathan Hyun et.al. | 2507.05178 | null |
2025-07-07 | VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots | Danil S. Grigorev et.al. | 2507.05118 | null |
2025-07-07 | From Autonomy to Agency: Agentic Vehicles for Human-Centered Mobility Systems | Jiangbo Yu et.al. | 2507.04996 | null |
2025-07-07 | MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction | Kaleem Ullah Qasim et.al. | 2507.04893 | null |
2025-07-07 | Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations | A. Bochkov et.al. | 2507.04886 | null |
2025-07-07 | FurniMAS: Language-Guided Furniture Decoration using Multi-Agent System | Toan Nguyen et.al. | 2507.04770 | null |
2025-07-07 | ABench-Physics: Benchmarking Physical Reasoning in LLMs via High-Difficulty and Dynamic Physics Problems | Yiming Zhang et.al. | 2507.04766 | null |
2025-07-07 | Large Language Models for Network Intrusion Detection Systems: Foundations, Implementations, and Future Directions | Shuo Yang et.al. | 2507.04752 | null |
2025-07-07 | LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction | Sungmin Lee et.al. | 2507.04748 | null |
2025-07-07 | Why We Feel What We Feel: Joint Detection of Emotions and Their Opinion Triggers in E-commerce | Arnav Attri et.al. | 2507.04708 | null |
2025-07-07 | UrbanMind: Towards Urban General Intelligence via Tool-Enhanced Retrieval-Augmented Generation and Multilevel Optimization | Kai Yang et.al. | 2507.04706 | null |
2025-07-07 | Trojan Horse Prompting: Jailbreaking Conversational Multimodal Models by Forging Assistant Message | Wei Duan et.al. | 2507.04673 | null |
2025-07-07 | VectorLLM: Human-like Extraction of Structured Building Contours vis Multimodal LLMs | Tao Zhang et.al. | 2507.04664 | null |
2025-07-07 | Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? | Yun Qu et.al. | 2507.04632 | null |
2025-07-07 | Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences | Yusong Zhang et.al. | 2507.04621 | null |
2025-07-07 | “Lost-in-the-Later”: Framework for Quantifying Contextual Grounding in Large Language Models | Yufei Tao et.al. | 2507.05424 | null |
2025-07-07 | Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning | Jaedong Hwang et.al. | 2507.05418 | null |
2025-07-07 | On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study | Riccardo Alberghi et.al. | 2507.05362 | null |
2025-07-07 | MindFlow: Revolutionizing E-commerce Customer Support with Multimodal LLM Agents | Ming Gong et.al. | 2507.05330 | null |
2025-07-07 | Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving | Zhenwen Liang et.al. | 2507.06804 | null |
2025-07-07 | DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning | Shreyas Vinaya Sathyanarayana et.al. | 2507.07060 | null |
2025-07-07 | Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding | Nidhi Bhatia et.al. | 2507.07120 | null |
2025-07-06 | KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs | Yuzhang Xie et.al. | 2507.02773 | null |
2025-07-06 | ESSA: Evolutionary Strategies for Scalable Alignment | Daria Korotyshova et.al. | 2507.04453 | null |
2025-07-06 | SFOOD: A Multimodal Benchmark for Comprehensive Food Attribute Analysis Beyond RGB with Spectral Insights | Zhenbo Xu et.al. | 2507.04412 | null |
2025-07-06 | LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers | Jingze Zhu et.al. | 2507.04404 | null |
2025-07-06 | Computed Tomography Visual Question Answering with Cross-modal Feature Graphing | Yuanhe Tian et.al. | 2507.04333 | null |
2025-07-06 | LearnLens: LLM-Enabled Personalised, Curriculum-Grounded Feedback with Educators in the Loop | Runcong Zhao et.al. | 2507.04295 | null |
2025-07-06 | AutoLayout: Closed-Loop Layout Synthesis via Slow-Fast Collaborative Reasoning | Weixing Chen et.al. | 2507.04293 | null |
2025-07-06 | M$^3$-Med: A Benchmark for Multi-lingual, Multi-modal, and Multi-hop Reasoning in Medical Instructional Video Understanding | Shenxi Liu et.al. | 2507.04289 | null |
2025-07-05 | SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs | Jiahui Wang et.al. | 2506.05344 | link |
2025-07-05 | SymbolicThought: Integrating Language Models and Symbolic Reasoning for Consistent and Interpretable Human Relationship Understanding | Runcong Zhao et.al. | 2507.04189 | null |
2025-07-05 | From Legal Text to Tech Specs: Generative AI’s Interpretation of Consent in Privacy Law | Aniket Kesari et.al. | 2507.04185 | null |
2025-07-05 | Dissecting Clinical Reasoning in Language Models: A Comparative Study of Prompts and Model Adaptation Strategies | Mael Jullien et.al. | 2507.04142 | null |
2025-07-05 | A Technical Survey of Reinforcement Learning Techniques for Large Language Models | Saksham Sahai Srivastava et.al. | 2507.04136 | null |
2025-07-05 | BYOKG-RAG: Multi-Strategy Graph Retrieval for Knowledge Graph Question Answering | Costas Mavromatis et.al. | 2507.04127 | null |
2025-07-05 | Beyond Independent Passages: Adaptive Passage Combination Retrieval for Retrieval Augmented Open-Domain Question Answering | Ting-Wen Ko et.al. | 2507.04069 | null |
2025-07-05 | LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Gaurav Srivastava et.al. | 2507.04023 | null |
2025-07-05 | Nunchi-Bench: Benchmarking Language Models on Cultural Reasoning with a Focus on Korean Superstition | Kyuhee Kim et.al. | 2507.04014 | null |
2025-07-05 | Toward Better Generalisation in Uncertainty Estimators: Leveraging Data-Agnostic Features | Thuy An Ha et.al. | 2507.03998 | null |
2025-07-05 | CoT-Segmenter: Enhancing OOD Detection in Dense Road Scenes via Chain-of-Thought Reasoning | Jeonghyo Song et.al. | 2507.03984 | null |
2025-07-05 | A Comparative Study of Specialized LLMs as Dense Retrievers | Hengran Zhang et.al. | 2507.03958 | null |
2025-07-05 | CortexDebate: Debating Sparsely and Equally for Multi-Agent Debate | Yiliu Sun et.al. | 2507.03928 | null |
2025-07-05 | A Survey on Proactive Defense Strategies Against Misinformation in Large Language Models | Shuliang Liu et.al. | 2507.05288 | null |
2025-07-04 | Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought | Tencent Hunyuan Team et.al. | 2505.15431 | null |
2025-07-04 | Economic Evaluation of LLMs | Michael J. Zellinger et.al. | 2507.03834 | null |
2025-07-04 | Agent-Based Detection and Resolution of Incompleteness and Ambiguity in Interactions with Large Language Models | Riya Naik et.al. | 2507.03726 | null |
2025-07-04 | Towards Machine Theory of Mind with Large Language Model-Augmented Inverse Planning | Rebekah A. Gelpí et.al. | 2507.03682 | null |
2025-07-04 | Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs | Valentina Wu et.al. | 2507.03659 | null |
2025-07-04 | EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Yingxu Wang et.al. | 2507.03616 | null |
2025-07-04 | Benchmarking Vector, Graph and Hybrid Retrieval Augmented Generation (RAG) Pipelines for Open Radio Access Networks (ORAN) | Sarat Ahmad et.al. | 2507.03608 | null |
2025-07-04 | Causal-SAM-LLM: Large Language Models as Causal Reasoners for Robust Medical Segmentation | Tao Tang et.al. | 2507.03585 | null |
2025-07-04 | AI-VaxGuide: An Agentic RAG-Based LLM for Vaccination Decisions | Abdellah Zeggai et.al. | 2507.03493 | null |
2025-07-04 | REAL: Benchmarking Abilities of Large Language Models for Housing Transactions and Services | Kexin Zhu et.al. | 2507.03477 | null |
2025-07-04 | ElliottAgents: A Natural Language-Driven Multi-Agent System for Stock Market Analysis and Prediction | Jarosław A. Chudziak et.al. | 2507.03435 | null |
2025-07-04 | Graph Repairs with Large Language Models: An Empirical Study | Hrishikesh Terdalkar et.al. | 2507.03410 | null |
2025-07-04 | Effects of structure on reasoning in instance-level Self-Discover | Sachith Gunasekara et.al. | 2507.03347 | null |
2025-07-04 | Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky | Ashutosh Hathidara et.al. | 2507.03336 | null |
2025-07-04 | Read Quietly, Think Aloud: Decoupling Comprehension and Reasoning in LLMs | Yuanxin Wang et.al. | 2507.03327 | null |
2025-07-04 | LTLCrit: A Temporal Logic-based LLM Critic for Safe and Efficient Embodied Agents | Anand Gokhale et.al. | 2507.03293 | null |
2025-07-04 | CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs | Bruce Yang et.al. | 2507.03254 | null |
2025-07-04 | Efficient Knowledge Graph Construction and Retrieval from Unstructured Text for Large-Scale RAG Systems | Congmin Min et.al. | 2507.03226 | null |
2025-07-03 | Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding | Chengyue Wu et.al. | 2505.22618 | null |
2025-07-03 | Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test | Ziyue Li et.al. | 2506.21551 | null |
2025-07-03 | Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations | Wenhao Wang et.al. | 2507.01930 | null |
2025-07-03 | Self-Guided Process Reward Optimization with Redefined Step-wise Advantage for Process Reinforcement Learning | Wu Fei et.al. | 2507.01551 | null |
2025-07-03 | Symbolic or Numerical? Understanding Physics Problem Solving in Reasoning LLMs | Nifu Dan et.al. | 2507.01334 | null |
2025-07-03 | Mixture of Reasonings: Teach Large Language Models to Reason with Adaptive Strategies | Tao Xiong et.al. | 2507.00606 | null |
2025-07-03 | OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding | Ramchalam Kinattinkara Ramakrishnan et.al. | 2507.02659 | null |
2025-07-03 | Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation | Jiaer Xia et.al. | 2507.02859 | null |
2025-07-03 | MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs | Purbesh Mitra et.al. | 2507.02851 | null |
2025-07-03 | StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason | Kaiyi Zhang et.al. | 2507.02841 | null |
2025-07-03 | SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model | Wencheng Zhang et.al. | 2507.02822 | null |
2025-07-03 | Multimodal Mathematical Reasoning with Diverse Solving Perspective | Wenhao Shi et.al. | 2507.02804 | null |
2025-07-03 | Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models | Riccardo Cantini et.al. | 2507.02799 | null |
2025-07-03 | Moral Responsibility or Obedience: What Do We Want from AI? | Joseph Boland et.al. | 2507.02788 | null |
2025-07-03 | Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs | Ken Tsui et.al. | 2507.02778 | null |
2025-07-03 | Knowledge Protocol Engineering: A New Paradigm for AI in Domain-Specific Knowledge Work | Guangwei Zhang et.al. | 2507.02760 | null |
2025-07-03 | Early Signs of Steganographic Capabilities in Frontier LLMs | Artur Zolkowski et.al. | 2507.02737 | null |
2025-07-03 | Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving | Matthieu Zimmer et.al. | 2507.02726 | null |
2025-07-03 | Control at Stake: Evaluating the Security Landscape of LLM-Driven Email Agents | Jiangrong Wu et.al. | 2507.02699 | null |
2025-07-03 | VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning | Siran Chen et.al. | 2507.02626 | null |
2025-07-03 | Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory | Kenneth Payne et.al. | 2507.02618 | null |
2025-07-03 | DynamiCare: A Dynamic Multi-Agent Framework for Interactive and Open-Ended Medical Decision-Making | Tianqi Shang et.al. | 2507.02616 | null |
2025-07-03 | WebSailor: Navigating Super-human Reasoning for Web Agent | Kuan Li et.al. | 2507.02592 | null |
2025-07-03 | Clarifying Before Reasoning: A Coq Prover with Structural Context | Yanzhen Lu et.al. | 2507.02541 | null |
2025-07-03 | CyberRAG: An agentic RAG cyber attack classification and reporting tool | Francesco Blefari et.al. | 2507.02424 | null |
2025-07-03 | OMS: On-the-fly, Multi-Objective, Self-Reflective Ad Keyword Generation via LLM Agent | Bowen Chen et.al. | 2507.02353 | null |
2025-07-03 | Misaligned from Within: Large Language Models Reproduce Our Double-Loop Learning Blindness | Tim Rogers et.al. | 2507.02283 | null |
2025-07-03 | Uncertainty-aware Reward Design Process | Yang Yang et.al. | 2507.02256 | null |
2025-07-03 | Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation | Jungkoo Kang et.al. | 2507.02253 | null |
2025-07-03 | HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference | Weishu Deng et.al. | 2507.03153 | null |
2025-07-03 | RCA Copilot: Transforming Network Data into Actionable Insights via Large Language Models | Alexander Shan et.al. | 2507.03224 | null |
2025-07-03 | MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks | Dumitran Adrian Marius et.al. | 2507.03162 | null |
2025-07-03 | ReliableMath: Benchmark of Reliable Mathematical Reasoning on Large Language Models | Boyang Xue et.al. | 2507.03133 | null |
2025-07-03 | RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents | Peisong Wang et.al. | 2507.03112 | null |
2025-07-03 | Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization | Marco Simoni et.al. | 2507.03051 | null |
2025-07-03 | Counterfactual Tuning for Temporal Sensitivity Enhancement in Large Language Model-based Recommendation | Yutian Liu et.al. | 2507.03047 | null |
2025-07-02 | Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU | He Sun et.al. | 2506.20187 | null |
2025-07-02 | EdgeLoRA: An Efficient Multi-Tenant LLM Serving System on Edge Devices | Zheyu Shen et.al. | 2507.01438 | null |
2025-07-02 | The Thin Line Between Comprehension and Persuasion in LLMs | Adrian de Wynter et.al. | 2507.01936 | null |
2025-07-02 | AI4Research: A Survey of Artificial Intelligence for Scientific Research | Qiguang Chen et.al. | 2507.01903 | null |
2025-07-02 | MiCoTA: Bridging the Learnability Gap with Intermediate CoT and Teacher Assistants | Dongyi Ding et.al. | 2507.01887 | null |
2025-07-02 | Bridging UI Design and chatbot Interactions: Applying Form-Based Principles to Conversational Agents | Sanjay Krishna Anbalagan et.al. | 2507.01862 | null |
2025-07-02 | Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training | Ismail Labiad et.al. | 2507.01752 | null |
2025-07-02 | Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture | Bochen Han et.al. | 2507.01701 | null |
2025-07-02 | Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling | Zeyu Huang et.al. | 2507.01679 | null |
2025-07-02 | Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems | Zhaoyan Sun et.al. | 2507.01599 | null |
2025-07-02 | Is External Information Useful for Stance Detection with LLMs? | Quang Minh Nguyen et.al. | 2507.01543 | null |
2025-07-02 | SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism | Beitao Chen et.al. | 2507.01513 | null |
2025-07-02 | Agent-as-Tool: A Study on the Hierarchical Decision Making with Reinforcement Learning | Yanfei Zhang et.al. | 2507.01489 | null |
2025-07-02 | BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments | Yibo Qiu et.al. | 2507.01485 | null |
2025-07-02 | A Large Language Model for Chemistry and Retrosynthesis Predictions | Yueqing Zhang et.al. | 2507.01444 | null |
2025-07-02 | RALLY: Role-Adaptive LLM-Driven Yoked Navigation for Agentic UAV Swarms | Ziyao Wang et.al. | 2507.01378 | null |
2025-07-02 | AI Agents and Agentic AI-Navigating a Plethora of Concepts for Future Manufacturing | Yinwang Ren et.al. | 2507.01376 | null |
2025-07-02 | Beyond Black-Box AI: Interpretable Hybrid Systems for Dementia Care | Matthew JY Kang et.al. | 2507.01282 | null |
2025-07-02 | Evaluating Large Language Models for Multimodal Simulated Ophthalmic Decision-Making in Diabetic Retinopathy and Glaucoma Screening | Cindy Lie Tabuse et.al. | 2507.01278 | null |
2025-07-02 | Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess | Dongyoon Hwang et.al. | 2507.00726 | null |
2025-07-02 | $μ^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation | Siyou Li et.al. | 2507.00316 | null |
2025-07-02 | Data Diversification Methods In Alignment Enhance Math Performance In LLMs | Berkan Dokmeci et.al. | 2507.02173 | null |
2025-07-02 | Synergizing Logical Reasoning, Knowledge Management and Collaboration in Multi-Agent LLM System | Adam Kostka et.al. | 2507.02170 | null |
2025-07-02 | Reasoning or Not? A Comprehensive Evaluation of Reasoning LLMs for Dialogue Summarization | Keyan Jin et.al. | 2507.02145 | null |
2025-07-02 | Structural Code Search using Natural Language Queries | Ben Limpanukorn et.al. | 2507.02107 | null |
2025-07-02 | Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs | Mohammad Ali Alomrani et.al. | 2507.02076 | null |
2025-07-02 | Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges | Sanjeda Akter et.al. | 2507.02074 | null |
2025-07-01 | SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning | Bo Liu et.al. | 2506.24119 | null |
2025-07-01 | PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning | Xingke Yang et.al. | 2507.01216 | null |
2025-07-01 | FlashDP: Private Training Large Language Models with Efficient DP-SGD | Liangyu Wang et.al. | 2507.01154 | null |
2025-07-01 | VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator | Zhican Wang et.al. | 2507.00797 | null |
2025-07-01 | EARN: Efficient Inference Acceleration for LLM-based Generative Recommendation by Register Tokens | Chaoqun Yang et.al. | 2507.00715 | null |
2025-07-01 | Reasoning as an Adaptive Defense for Safety | Taeyoun Kim et.al. | 2507.00971 | null |
2025-07-01 | Large Language Model Powered Intelligent Urban Agents: Concepts, Capabilities, and Applications | Jindong Han et.al. | 2507.00914 | null |
2025-07-01 | Mathematics Isn’t Culture-Free: Probing Cultural Gaps via Entity and Scenario Perturbations | Aditya Tomar et.al. | 2507.00883 | null |
2025-07-01 | HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning | Zhi Jing et.al. | 2507.00833 | null |
2025-07-01 | ProxAnn: Use-Oriented Evaluations of Topic Models and Document Clustering | Alexander Hoyle et.al. | 2507.00828 | null |
2025-07-01 | Many LLMs Are More Utilitarian Than One | Anita Keshmirian et.al. | 2507.00814 | null |
2025-07-01 | Language-Unlocked ViT (LUViT): Empowering Self-Supervised Vision Transformers with LLMs | Selim Kuzucu et.al. | 2507.00754 | null |
2025-07-01 | AI Analyst: Framework and Comprehensive Evaluation of Large Language Models for Financial Time Series Report Generation | Elizabeth Fons et.al. | 2507.00718 | null |
2025-07-01 | Large Reasoning Models are not thinking straight: on the unreliability of thinking trajectories | Jhouben Cuesta-Ramirez et.al. | 2507.00711 | null |
2025-07-01 | Toward Edge General Intelligence with Multiple-Large Language Model (Multi-LLM): Architecture, Trust, and Orchestration | Haoxiang Luo et.al. | 2507.00672 | null |
2025-07-01 | Cognitive Load-Aware Inference: A Neuro-Symbolic Framework for Optimizing the Token Economy of Large Language Models | Yilun Zhang et.al. | 2507.00653 | null |
2025-07-01 | ChatHLS: Towards Systematic Design Automation and Optimization for High-Level Synthesis | Runkai Li et.al. | 2507.00642 | null |
2025-07-01 | Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning | Maggie Huan et.al. | 2507.00432 | null |
2025-07-01 | ASTRO: Teaching Language Models to Reason by Reflecting and Backtracking In-Context | Joongwon Kim et.al. | 2507.00417 | null |
2025-07-01 | Causal Prompting for Implicit Sentiment Analysis with Large Language Models | Jing Ren et.al. | 2507.00389 | null |
2025-07-01 | STELLA: Self-Evolving LLM Agent for Biomedical Research | Ruofan Jin et.al. | 2507.02004 | null |
2025-07-01 | Dynamic Strategy Adaptation in Multi-Agent Environments with Large Language Models | Shaurya Mallampati et.al. | 2507.02002 | null |
2025-06-30 | RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference | Yaoqi Chen et.al. | 2505.02922 | null |
2025-06-30 | The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements | Bingchen Zhao et.al. | 2506.22419 | null |
2025-06-30 | EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework | Chen Wang et.al. | 2506.22200 | null |
2025-06-30 | Large Language Models Don’t Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective | Anselm R. Strohmaier et.al. | 2506.24006 | null |
2025-06-30 | Advancing Multi-Step Mathematical Reasoning in Large Language Models through Multi-Layered Self-Reflection with Auto-Prompting | André de Souza Loureiro et.al. | 2506.23888 | null |
2025-06-30 | Garbage In, Reasoning Out? Why Benchmark Scores are Unreliable and What to Do About It | Seyed Mahed Mousavi et.al. | 2506.23864 | null |
2025-06-30 | A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents | Hang Su et.al. | 2506.23844 | null |
2025-06-30 | DABstep: Data Agent Benchmark for Multi-step Reasoning | Alex Egg et.al. | 2506.23719 | null |
2025-06-30 | If You Had to Pitch Your Ideal Software – Evaluating Large Language Models to Support User Scenario Writing for User Experience Experts and Laypersons | Patrick Stadler et.al. | 2506.23694 | null |
2025-06-30 | PokéAI: A Goal-Generating, Battle-Optimizing Multi-agent System for Pokemon Red | Zihao Liu et.al. | 2506.23689 | null |
2025-06-30 | Interactive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language Models | Rock Yuren Pang et.al. | 2506.23678 | null |
2025-06-30 | Act-With-Think: Chunk Auto-Regressive Modeling for Generative Recommendation | Yifan Wang et.al. | 2506.23643 | null |
2025-06-30 | What to Keep and What to Drop: Adaptive Table Filtering Framework | Jang Won June et.al. | 2506.23463 | null |
2025-06-30 | Two-Stage Reasoning-Infused Learning: Improving Classification with LLM-Generated Reasoning | Mads Henrichsen et.al. | 2507.00214 | null |
2025-06-30 | Thinking About Thinking: SAGE-nano’s Inverse Reasoning for Self-Aware Language Models | Basab Jha et.al. | 2507.00092 | null |
2025-06-30 | State and Memory is All You Need for Robust and Reliable AI Agents | Matthew Muhoberac et.al. | 2507.00081 | null |
2025-06-29 | Comparative Evaluation of ChatGPT and DeepSeek Across Key NLP Tasks: Strengths, Weaknesses, and Domain-Specific Performance | Wael Etaiwi et.al. | 2506.18501 | null |
2025-06-29 | Do LLMs Dream of Discrete Algorithms? | Claudionor Coelho Jr et.al. | 2506.23408 | null |
2025-06-29 | GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields | Shunsuke Yasuki et.al. | 2506.23352 | null |
2025-06-29 | Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games | David Guzman Piedrahita et.al. | 2506.23276 | null |
2025-06-29 | Predicting thinking time in Reasoning models | Hans Peter Lynsgøe Raaschou-jensen et.al. | 2506.23274 | null |
2025-06-29 | Token Activation Map to Visually Explain Multimodal LLMs | Yi Li et.al. | 2506.23270 | null |
2025-06-29 | Benchmarking Deep Search over Heterogeneous Enterprise Data | Prafulla Kumar Choubey et.al. | 2506.23139 | null |
2025-06-29 | Format-Adapter: Improving Reasoning Capability of LLMs by Adapting Suitable Format | Dingzirui Wang et.al. | 2506.23133 | null |
2025-06-29 | Are Large Language Models Capable of Deep Relational Reasoning? Insights from DeepSeek-R1 and Benchmark Comparisons | Chi Chiu So et.al. | 2506.23128 | null |
2025-06-29 | Decoding Memes: Benchmarking Narrative Role Classification across Multilingual and Multimodal Models | Shivam Sharma et.al. | 2506.23122 | null |
2025-06-29 | Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation | Zhenhua Ning et.al. | 2506.23120 | null |
2025-06-29 | Repair Ingredients Are All You Need: Improving Large Language Model-Based Program Repair via Repair Ingredients Search | Jiayi Zhang et.al. | 2506.23100 | null |
2025-06-29 | Boosting LLM’s Molecular Structure Elucidation with Knowledge Enhanced Tree Search Reasoning | Xiang Zhuang et.al. | 2506.23056 | null |
2025-06-29 | AURA: Agent for Understanding, Reasoning, and Automated Tool Use in Voice-Driven Tasks | Leander Melroy Maben et.al. | 2506.23049 | null |
2025-06-28 | Efficiently Serving Large Multimodal Models Using EPD Disaggregation | Gursimran Singh et.al. | 2501.05460 | link |
2025-06-28 | Agent-to-Agent Theory of Mind: Testing Interlocutor Awareness among Large Language Models | Younwoo Choi et.al. | 2506.22957 | null |
2025-06-28 | Evaluating and Improving Large Language Models for Competitive Program Generation | Minnan Wei et.al. | 2506.22954 | null |
2025-06-28 | Improving Rationality in the Reasoning Process of Language Models through Self-playing Game | Pinzheng Wang et.al. | 2506.22920 | null |
2025-06-28 | ReasonBridge: Efficient Reasoning Transfer from Closed to Open-Source Language Models | Ziqi Zhong et.al. | 2506.22865 | null |
2025-06-28 | Prompting without Panic: Attribute-aware, Zero-shot, Test-Time Calibration | Ramya Hebbalaguppe et.al. | 2506.22819 | null |
2025-06-27 | Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference | Yaohua Tang et.al. | 2502.15294 | null |
2025-06-27 | Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research Team | Weilun Yu et.al. | 2506.18348 | null |
2025-06-27 | SegChange-R1: LLM-Augmented Remote Sensing Change Detection | Fei Zhou et.al. | 2506.17944 | null |
2025-06-27 | KunLunBaizeRAG: Reinforcement Learning Driven Inference Performance Leap for Large Language Models | Cheng Li et.al. | 2506.19466 | null |
2025-06-27 | QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization | Danush Khanna et.al. | 2506.22396 | null |
2025-06-27 | SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference | Yongchao He et.al. | 2506.22033 | null |
2025-06-27 | A Survey of LLM Inference Systems | James Pan et.al. | 2506.21901 | null |
2025-06-27 | Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment | Yue Zhang et.al. | 2506.22385 | null |
2025-06-27 | Probabilistic Optimality for Inference-time Scaling | Youkang Wang et.al. | 2506.22376 | null |
2025-06-27 | Concept-Level AI for Telecom: Moving Beyond Large Language Models | Viswanath Kumarskandpriya et.al. | 2506.22359 | null |
2025-06-27 | Training Language Model to Critique for Better Refinement | Tianshu Yu et.al. | 2506.22157 | null |
2025-06-27 | Lost at the Beginning of Reasoning | Baohao Liao et.al. | 2506.22058 | null |
2025-06-27 | LMPVC and Policy Bank: Adaptive voice control for industrial robots with code generating LLMs and reusable Pythonic policies | Ossi Parikka et.al. | 2506.22028 | null |
2025-06-27 | Literature-Grounded Novelty Assessment of Scientific Ideas | Simra Shahid et.al. | 2506.22026 | null |
2025-06-27 | More Vulnerable than You Think: On the Stability of Tool-Integrated LLM Agents | Weimin Xiong et.al. | 2506.21967 | null |
2025-06-27 | CAL-RAG: Retrieval-Augmented Multi-Agent Generation for Content-Aware Layout Design | Najmeh Forouzandehmehr et.al. | 2506.21934 | null |
2025-06-27 | ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation | Reza Yousefi Maragheh et.al. | 2506.21931 | null |
2025-06-27 | SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding | Zhao Jin et.al. | 2506.21924 | null |
2025-06-27 | URSA: The Universal Research and Scientific Agent | Michael Grosskopf et.al. | 2506.22653 | null |
2025-06-27 | ReCo: Reminder Composition Mitigates Hallucinations in Vision-Language Models | Sotirios Panagiotis Chytas et.al. | 2506.22636 | null |
2025-06-27 | The Hidden Link Between RLHF and Contrastive Learning | Xufei Lv et.al. | 2506.22578 | null |
2025-06-27 | MetaCipher: A General and Extensible Reinforcement Learning Framework for Obfuscation-Based Jailbreak Attacks on Black-Box LLMs | Boyuan Chen et.al. | 2506.22557 | null |
2025-06-26 | From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents | Weizhi Zhang et.al. | 2506.18959 | null |
2025-06-26 | Enhancing User Engagement in Socially-Driven Dialogue through Interactive LLM Alignments | Jiashuo Wang et.al. | 2506.21497 | null |
2025-06-26 | Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning | Xin Xu et.al. | 2506.21285 | null |
2025-06-26 | HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context | Qize Yang et.al. | 2506.21277 | null |
2025-06-26 | Complexity-aware fine-tuning | Andrey Goncharov et.al. | 2506.21220 | null |
2025-06-26 | Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? | Haoang Chi et.al. | 2506.21215 | null |
2025-06-26 | $T^3$: Multi-level Tree-based Automatic Program Repair with Large Language Models | Quanming Liu et.al. | 2506.21211 | null |
2025-06-26 | MT2-CSD: A New Dataset and Multi-Semantic Knowledge Fusion Method for Conversational Stance Detection | Fuqiang Niu et.al. | 2506.21053 | null |
2025-06-26 | Large Language Models Acing Chartered Accountancy | Jatin Gupta et.al. | 2506.21031 | null |
2025-06-26 | STEP Planner: Constructing cross-hierarchical subgoal tree as an embodied long-horizon task planner | Zhou Tianxing et.al. | 2506.21030 | null |
2025-06-26 | LLM-guided Chemical Process Optimization with a Multi-Agent Approach | Tong Zeng et.al. | 2506.20921 | null |
2025-06-26 | FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing | Advait Gupta et.al. | 2506.20911 | null |
2025-06-26 | Evaluating List Construction and Temporal Understanding capabilities of Large Language Models | Alexandru Dumitru et.al. | 2506.21783 | null |
2025-06-26 | THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning? | Xin Wang et.al. | 2506.21763 | null |
2025-06-26 | Hierarchical Reasoning Model | Guan Wang et.al. | 2506.21734 | null |
2025-06-26 | SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents | Wanxin Tian et.al. | 2506.21669 | null |
2025-06-26 | APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization | Minjie Hong et.al. | 2506.21655 | null |
2025-06-26 | Weak-to-Strong GraphRAG: Aligning Weak Retrievers with Large Language Models for Graph-based Retrieval Augmented Generation | Deyu Zou et.al. | 2506.22518 | null |
2025-06-25 | No Free Lunch: Rethinking Internal Feedback for LLM Reasoning | Yanzhi Zhang et.al. | 2506.17219 | null |
2025-06-25 | Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Lixin Wu et.al. | 2506.18330 | null |
2025-06-25 | Thought Anchors: Which LLM Reasoning Steps Matter? | Paul C. Bogdan et.al. | 2506.19143 | null |
2025-06-25 | Semantic Caching for Improving Web Affordability | Hafsa Akbar et.al. | 2506.20420 | null |
2025-06-25 | Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs | Sonia K. Murthy et.al. | 2506.20666 | null |
2025-06-25 | The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind | Andrei Lupu et.al. | 2506.20664 | null |
2025-06-25 | Memento: Note-Taking for Your Future Self | Chao Wan et.al. | 2506.20642 | null |
2025-06-25 | Video Perception Models for 3D Scene Synthesis | Rui Huang et.al. | 2506.20601 | null |
2025-06-25 | Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios | Wenbin Gan et.al. | 2506.20531 | null |
2025-06-25 | Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards | Charles Arnal et.al. | 2506.20520 | null |
2025-06-25 | ReCode: Updating Code API Knowledge with Reinforcement Learning | Haoze Wu et.al. | 2506.20495 | null |
2025-06-25 | Generative AI for Vulnerability Detection in 6G Wireless Networks: Advances, Case Study, and Future Directions | Shuo Yang et.al. | 2506.20488 | null |
2025-06-25 | Automatic Demonstration Selection for LLM-based Tabular Data Classification | Shuchu Han et.al. | 2506.20451 | null |
2025-06-25 | An Agentic System for Rare Disease Diagnosis with Traceable Reasoning | Weike Zhao et.al. | 2506.20430 | null |
2025-06-25 | SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models | Dipayan Saha et.al. | 2506.20415 | null |
2025-06-25 | Tabular Feature Discovery With Reasoning Type Exploration | Sungwon Han et.al. | 2506.20357 | null |
2025-06-25 | Enterprise Large Language Model Evaluation Benchmark | Liya Wang et.al. | 2506.20274 | null |
2025-06-25 | Enhancing Large Language Models through Structured Reasoning | Yubo Dong et.al. | 2506.20241 | null |
2025-06-25 | SEED: A Structural Encoder for Embedding-Driven Decoding in Time Series Prediction with LLMs | Fengze Li et.al. | 2506.20167 | null |
2025-06-25 | A Modular Multitask Reasoning Framework Integrating Spatio-temporal Models and LLMs | Kethmi Hirushini Hettige et.al. | 2506.20073 | null |
2025-06-25 | Omniwise: Predicting GPU Kernels Performance with LLMs | Zixian Wang et.al. | 2506.20886 | null |
2025-06-25 | Uncovering Hidden Violent Tendencies in LLMs: A Demographic Analysis via Behavioral Vignettes | Quintin Myers et.al. | 2506.20822 | null |
2025-06-25 | MultiFinRAG: An Optimized Multimodal Retrieval-Augmented Generation (RAG) Framework for Financial Question Answering | Chinmay Gondhalekar et.al. | 2506.20821 | null |
2025-06-25 | Dynamic Context-Aware Prompt Recommendation for Domain-Specific AI Applications | Xinye Tang et.al. | 2506.20815 | null |
2025-06-25 | Towards Probabilistic Question Answering Over Tabular Data | Chen Shen et.al. | 2506.20747 | null |
2025-06-25 | Test-time Scaling Techniques in Theoretical Physics – A Comparison of Methods on the TPBench Dataset | Zhiqi Gao et.al. | 2506.20729 | null |
2025-06-24 | ReDit: Reward Dithering for Improved LLM Policy Optimization | Chenxing Wei et.al. | 2506.18631 | null |
2025-06-24 | Understanding Reasoning in Thinking Language Models via Steering Vectors | Constantin Venhoff et.al. | 2506.18167 | null |
2025-06-24 | KAG-Thinker: Interactive Thinking and Deep Reasoning in LLMs via Knowledge-Augmented Generation | Dalong Zhang et.al. | 2506.17728 | null |
2025-06-24 | AnTKV: Anchor Token-Aware Sub-Bit Vector Quantization for KV Cache in Large Language Models | Zeyu Li et.al. | 2506.19505 | null |
2025-06-24 | Mem4Nav: Boosting Vision-and-Language Navigation in Urban Environments with a Hierarchical Spatial-Cognition Long-Short Memory System | Lixuan He et.al. | 2506.19433 | null |
2025-06-24 | JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning | Ai Han et.al. | 2506.19846 | null |
2025-06-24 | MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration | Yucheng Zhou et.al. | 2506.19835 | null |
2025-06-24 | KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Baochang Ren et.al. | 2506.19807 | null |
2025-06-24 | KnowML: Improving Generalization of ML-NIDS with Attack Knowledge Graphs | Xin Fan Guo et.al. | 2506.19802 | null |
2025-06-24 | Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study | Yuqi Zhu et.al. | 2506.19794 | null |
2025-06-24 | Automatic Prompt Optimization for Knowledge Graph Construction: Insights from an Empirical Study | Nandana Mihindukulasooriya et.al. | 2506.19773 | null |
2025-06-24 | SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning | Yuqian Fu et.al. | 2506.19767 | null |
2025-06-24 | Breaking Barriers: Do Reinforcement Post Training Gains Transfer To Unseen Domains? | Chuxuan Hu et.al. | 2506.19733 | null |
2025-06-24 | ECCoT: A Framework for Enhancing Effective Cognition via Chain of Thought in Large Language Model | Zhenke Duan et.al. | 2506.19599 | null |
2025-06-24 | KnowMap: Efficient Knowledge-Driven Task Adaptation for LLMs | Kelin Fu et.al. | 2506.19527 | null |
2025-06-24 | Commonsense Generation and Evaluation for Dialogue Systems using Large Language Models | Marcos Estecha-Garitagoitia et.al. | 2506.19483 | null |
2025-06-24 | Can Large Language Models Capture Human Annotator Disagreements? | Jingwei Ni et.al. | 2506.19467 | null |
2025-06-24 | RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought v1 | Yu Xie et.al. | 2506.19235 | null |
2025-06-24 | Augmenting Multi-Agent Communication with State Delta Trajectory | Yichen Tang et.al. | 2506.19209 | null |
2025-06-24 | Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning | Saloni Dash et.al. | 2506.20020 | null |
2025-06-24 | Inference Scaled GraphRAG: Improving Multi Hop Question Answering on Knowledge Graphs | Travis Thompson et.al. | 2506.19967 | null |
2025-06-24 | Prover Agent: An Agent-based Framework for Formal Mathematical Proofs | Kaito Baba et.al. | 2506.19923 | null |
2025-06-23 | RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding | Guanzheng Chen et.al. | 2502.20330 | link |
2025-06-23 | RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought | Junbo Qiao et.al. | 2506.16796 | link |
2025-06-23 | SLR: An Automated Synthesis Framework for Scalable Logical Reasoning | Lukas Helff et.al. | 2506.15787 | null |
2025-06-23 | CommVQ: Commutative Vector Quantization for KV Cache Compression | Junyan Li et.al. | 2506.18879 | null |
2025-06-23 | ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs | Jiaru Zou et.al. | 2506.18896 | null |
2025-06-23 | OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization | Yiyou Sun et.al. | 2506.18880 | null |
2025-06-23 | LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning | Yuhao Wu et.al. | 2506.18841 | null |
2025-06-23 | Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories | Islem Bouzenia et.al. | 2506.18824 | null |
2025-06-23 | Existing LLMs Are Not Self-Consistent For Simple Tasks | Zhenru Lin et.al. | 2506.18781 | null |
2025-06-23 | Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training | Jonathan Cook et.al. | 2506.18777 | null |
2025-06-23 | MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis | Yuting Zhang et.al. | 2506.18512 | null |
2025-06-23 | MeRF: Motivation-enhanced Reinforcement Finetuning for Large Reasoning Models | Junjie Zhang et.al. | 2506.18485 | null |
2025-06-23 | TReB: A Comprehensive Benchmark for Evaluating Table Reasoning Capabilities of Large Language Models | Ce Li et.al. | 2506.18421 | null |
2025-06-23 | Evaluating Causal Explanation in Medical Reports with LLM-Based and Human-Aligned Metrics | Yousang Cho et.al. | 2506.18387 | null |
2025-06-23 | LOGICPO: Efficient Translation of NL-based Logical Problems to FOL using LLMs and Preference Optimization | Koushik Viswanadha et.al. | 2506.18383 | null |
2025-06-23 | Less Data Less Tokens: Multilingual Unification Learning for Efficient Test-Time Reasoning in LLMs | Kang Chen et.al. | 2506.18341 | null |
2025-06-23 | TranslationCorrect: A Unified Framework for Machine Translation Post-Editing with Predictive Error Assistance | Syed Mekael Wasti et.al. | 2506.18337 | null |
2025-06-23 | LLM-Integrated Digital Twins for Hierarchical Resource Allocation in 6G Networks | Majumder Haider et.al. | 2506.18293 | null |
2025-06-23 | RLPR: Extrapolating RLVR to General Domains without Verifiers | Tianyu Yu et.al. | 2506.18254 | null |
2025-06-23 | Distilling Tool Knowledge into Language Models via Back-Translated Traces | Xingyue Huang et.al. | 2506.19171 | null |
2025-06-23 | Command-V: Pasting LLM Behaviors via Activation Profiles | Barry Wang et.al. | 2506.19140 | null |
2025-06-23 | Human-Aligned Faithfulness in Toxicity Explanations of LLMs | Ramaravind K. Mothilal et.al. | 2506.19113 | null |
2025-06-23 | Baba is LLM: Reasoning in a Game with Dynamic Rules | Fien van Wetten et.al. | 2506.19095 | null |
2025-06-23 | Language Models Might Not Understand You: Evaluating Theory of Mind via Story Prompting | Nathaniel Getachew et.al. | 2506.19089 | null |
2025-06-23 | MFTCXplain: A Multilingual Benchmark Dataset for Evaluating the Moral Reasoning of LLMs through Hate Speech Multi-hop Explanation | Jackson Trager et.al. | 2506.19073 | null |
2025-06-23 | Mirage of Mastery: Memorization Tricks LLMs into Artificially Inflated Self-Knowledge | Sahil Kale et.al. | 2506.18998 | null |
2025-06-23 | SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications | Jinyang Li et.al. | 2506.18951 | null |
2025-06-22 | Integrating LLMs and Digital Twins for Adaptive Multi-Robot Task Allocation in Construction | Min Deng et.al. | 2506.18178 | null |
2025-06-22 | Programming Quantum Computers with Large Language Models | Elena R. Henderson et.al. | 2506.18125 | null |
2025-06-22 | Mental Health Equity in LLMs: Leveraging Multi-Hop Question Answering to Detect Amplified and Silenced Perspectives | Batool Haider et.al. | 2506.18116 | null |
2025-06-22 | InspireDebate: Multi-Dimensional Subjective-Objective Evaluation-Guided Reasoning and Optimization for Debating | Fuyu Wang et.al. | 2506.18102 | null |
2025-06-22 | Deep Research Agents: A Systematic Examination And Roadmap | Yuxuan Huang et.al. | 2506.18096 | null |
2025-06-22 | Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Jianyu Wang et.al. | 2506.17930 | null |
2025-06-22 | Leveraging Large Language Model for Intelligent Log Processing and Autonomous Debugging in Cloud AI Platforms | Cheng Ji et.al. | 2506.17900 | null |
2025-06-22 | How Alignment Shrinks the Generative Horizon | Chenghao Yang et.al. | 2506.17871 | null |
2025-06-21 | Bayesian Social Deduction with Graph-Informed Language Models | Shahab Rahimirad et.al. | 2506.17788 | null |
2025-06-21 | PAGENT: Learning to Patch Software Engineering Agents | Haoran Xue et.al. | 2506.17772 | null |
2025-06-21 | Towards a Unified Textual Graph Framework for Spectral Reasoning via Physical and Chemical Information Fusion | Jiheng Liang et.al. | 2506.17761 | null |
2025-06-21 | Resource-Friendly Dynamic Enhancement Chain for Multi-Hop Question Answering | Binquan Ji et.al. | 2506.17692 | null |
2025-06-21 | Measuring and Augmenting Large Language Models for Solving Capture-the-Flag Challenges | Zimo Ji et.al. | 2506.17644 | null |
2025-06-21 | Answer-Centric or Reasoning-Driven? Uncovering the Latent Memory Anchor in LLMs | Yang Wu et.al. | 2506.17630 | null |
2025-06-21 | CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning | Kailing Li et.al. | 2506.17629 | null |
2025-06-21 | Scene-R1: Video-Grounded Large Language Models for 3D Scene Reasoning without 3D Annotations | Zhihao Yuan et.al. | 2506.17545 | null |
2025-06-21 | DuaShepherd: Integrating Stepwise Correctness and Potential Rewards for Mathematical Reasoning | Yuanhao Wu et.al. | 2506.17533 | null |
2025-06-21 | Do LLMs Know When to Flip a Coin? Strategic Randomization through Reasoning and Experience | Lingyu Yang et.al. | 2506.18928 | null |
2025-06-20 | Domain Specific Benchmarks for Evaluating Multimodal Large Language Models | Khizar Anjum et.al. | 2506.12958 | null |
2025-06-20 | Towards AI Search Paradigm | Yuchen Li et.al. | 2506.17188 | null |
2025-06-20 | When Can Model-Free Reinforcement Learning be Enough for Thinking? | Josiah P. Hanna et.al. | 2506.17124 | null |
2025-06-20 | Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving | Chuxue Cao et.al. | 2506.17104 | null |
2025-06-20 | Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation | Jiahao Cheng et.al. | 2506.17088 | null |
2025-06-20 | Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs | Ricardo Rei et.al. | 2506.17080 | null |
2025-06-20 | From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers | Jingtong Su et.al. | 2506.17052 | null |
2025-06-20 | Latent Concept Disentanglement in Transformer-based Language Models | Guan Zhe Hong et.al. | 2506.16975 | null |
2025-06-20 | LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation | Tongtian Yue et.al. | 2506.16691 | null |
2025-06-20 | Distilling On-device Language Models for Robot Planning with Minimal Human Intervention | Zachary Ravichandran et.al. | 2506.17486 | null |
2025-06-20 | Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? | Mingyuan Wu et.al. | 2506.17417 | null |
2025-06-19 | Serving Large Language Models on Huawei CloudMatrix384 | Pengfei Zuo et.al. | 2506.12708 | null |
2025-06-19 | MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation | Xueqing Peng et.al. | 2506.14028 | null |
2025-06-19 | LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning | Haoyue Zhang et.al. | 2506.15969 | null |
2025-06-19 | SemAgent: A Semantics Aware Program Repair Agent | Anvith Pabba et.al. | 2506.16650 | null |
2025-06-19 | LLM-based Satisfiability Checking of String Requirements by Consistent Data and Checker Generation | Boqi Chen et.al. | 2506.16639 | null |
2025-06-19 | Robust Reward Modeling via Causal Rubrics | Pragya Srivastava et.al. | 2506.16507 | null |
2025-06-19 | SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity | Samir Khaki et.al. | 2506.16500 | null |
2025-06-19 | ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning | Zexi Liu et.al. | 2506.16499 | null |
2025-06-19 | Grounding Language Models with Semantic Digital Twins for Robotic Planning | Mehreen Naeem et.al. | 2506.16493 | null |
2025-06-19 | How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | Giuseppe Lando et.al. | 2506.16450 | null |
2025-06-19 | Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights | Zhiyuan Liang et.al. | 2506.16406 | null |
2025-06-19 | TrajSceneLLM: A Multimodal Perspective on Semantic GPS Trajectory Analysis | Chunhou Ji et.al. | 2506.16401 | link |
2025-06-19 | OJBench: A Competition Level Code Benchmark For Large Language Models | Zhexu Wang et.al. | 2506.16395 | null |
2025-06-19 | From LLM-anation to LLM-orchestrator: Coordinating Small Models for Data Labeling | Yao Lu et.al. | 2506.16393 | null |
2025-06-19 | RiOT: Efficient Prompt Refinement with Residual Optimization Tree | Chenyi Zhou et.al. | 2506.16389 | link |
2025-06-19 | Large Language Models in Argument Mining: A Survey | Hao Li et.al. | 2506.16383 | null |
2025-06-19 | SHREC and PHEONA: Using Large Language Models to Advance Next-Generation Computational Phenotyping | Sarah Pungitore et.al. | 2506.16359 | null |
2025-06-19 | Explainable Rule Application via Structured Prompting: A Neural-Symbolic Approach | Albert Sadowski et.al. | 2506.16335 | link |
2025-06-19 | SGIC: A Self-Guided Iterative Calibration Framework for RAG | Guanhua Chen et.al. | 2506.16172 | null |
2025-06-19 | Under the Shadow of Babel: How Language Shapes Reasoning in LLMs | Chenxi Wang et.al. | 2506.16151 | null |
2025-06-19 | GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning | Yi Chen et.al. | 2506.16141 | link |
2025-06-19 | Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Fixing | Kai Huang et.al. | 2506.16136 | null |
2025-06-19 | AutoV: Learning to Retrieve Visual Prompt for Large Vision-Language Models | Yuan Zhang et.al. | 2506.16112 | null |
2025-06-19 | Human-Centered Shared Autonomy for Motor Planning, Learning, and Control Applications | MH Farhadi et.al. | 2506.16044 | null |
2025-06-19 | DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling | Fei Wang et.al. | 2506.16043 | null |
2025-06-19 | SimuPanel: A Novel Immersive Multi-Agent System to Simulate Interactive Expert Panel Discussion | Xiangyang He et.al. | 2506.16010 | null |
2025-06-19 | Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases | Yubeen Bae et.al. | 2506.17336 | link |
2025-06-19 | LMR-BENCH: Evaluating LLM Agent’s Ability on Reproducing Language Modeling Research | Shuo Yan et.al. | 2506.17335 | null |
2025-06-19 | Large Language Models for Spreadsheets: Benchmarking Progress and Evaluating Performance with FLARE | Simon Thorne et.al. | 2506.17330 | null |
2025-06-18 | Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations | Amey Agrawal et.al. | 2409.17264 | null |
2025-06-18 | Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | Ling Team et.al. | 2506.14731 | null |
2025-06-18 | AIn’t Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation | Leah von der Heyde et.al. | 2506.14634 | null |
2025-06-18 | Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models | Chenchen Yuan et.al. | 2506.14625 | link |
2025-06-18 | eLLM: Elastic Memory Management Framework for Efficient LLM Serving | Jiale Xu et.al. | 2506.15155 | null |
2025-06-18 | CC-LEARN: Cohort-based Consistency Learning | Xiao Ye et.al. | 2506.15662 | null |
2025-06-18 | Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability | Yusuke Sakai et.al. | 2506.15629 | null |
2025-06-18 | Managing Complex Failure Analysis Workflows with LLM-based Reasoning and Acting Agents | Aline Dobrovsky et.al. | 2506.15567 | null |
2025-06-18 | Lessons from Training Grounded LLMs with Verifiable Rewards | Shang Hong Sim et.al. | 2506.15522 | null |
2025-06-18 | Optimizing Web-Based AI Query Retrieval with GPT Integration in LangChain A CoT-Enhanced Prompt Engineering Approach | Wenqi Guan et.al. | 2506.15512 | null |
2025-06-18 | SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling | Md Imbesat Hassan Rizvi et.al. | 2506.15498 | link |
2025-06-18 | RE-IMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation | Xinnuo Xu et.al. | 2506.15455 | null |
2025-06-18 | AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Zhouhong Gu et.al. | 2506.15451 | link |
2025-06-18 | DeVisE: Behavioral Testing of Medical Large Language Models | Camila Zurdo Tagliabue et.al. | 2506.15339 | null |
2025-06-18 | Cohort Discovery: A Survey on LLM-Assisted Clinical Trial Recruitment | Shrestha Ghosh et.al. | 2506.15301 | null |
2025-06-18 | MinosEval: Distinguishing Factoid and Non-Factoid for Tailored Open-Ended QA Evaluation with LLMs | Yongqi Fan et.al. | 2506.15215 | link |
2025-06-18 | ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs | Feng He et.al. | 2506.15211 | null |
2025-06-18 | Learning-Time Encoding Shapes Unlearning in LLMs | Ruihan Wu et.al. | 2506.15076 | link |
2025-06-18 | HEAL: An Empirical Study on Hallucinations in Embodied Agents Driven by Large Language Models | Trishna Chakraborty et.al. | 2506.15065 | null |
2025-06-18 | Truncated Proximal Policy Optimization | Tiantian Fan et.al. | 2506.15050 | null |
2025-06-18 | Language Models can perform Single-Utterance Self-Correction of Perturbed Reasoning | Sam Silver et.al. | 2506.15894 | null |
2025-06-18 | Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute | Sheng Liu et.al. | 2506.15882 | null |
2025-06-18 | MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents | Zijian Zhou et.al. | 2506.15841 | null |
2025-06-18 | Context Matters! Relaxing Goals with LLMs for Feasible 3D Scene Planning | Emanuele Musumeci et.al. | 2506.15828 | null |
2025-06-18 | Veracity: An Open-Source AI Fact-Checking System | Taylor Lynn Curtis et.al. | 2506.15794 | null |
2025-06-18 | ETrace: Event-Driven Vulnerability Detection in Smart Contracts via LLM-Based Trace Analysis | Chenyang Peng et.al. | 2506.15790 | null |
2025-06-18 | PentaRAG: Large-Scale Intelligent Knowledge Retrieval for Enterprise LLM Applications | Abu Hanif Muhammad Syarubany et.al. | 2506.21593 | null |
2025-06-18 | Representation Consistency for Accurate and Coherent LLM Answer Aggregation | Junqi Jiang et.al. | 2506.21590 | null |
2025-06-17 | Unified Software Engineering agent as AI Software Engineer | Leonhard Applis et.al. | 2506.14683 | null |
2025-06-17 | Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality | Yuto Harada et.al. | 2506.14681 | null |
2025-06-17 | Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shot | Xiang Cheng et.al. | 2506.14641 | null |
2025-06-17 | NetRoller: Interfacing General and Specialized Models for End-to-End Autonomous Driving | Ren Xin et.al. | 2506.14589 | link |
2025-06-17 | Automatic Qiskit Code Refactoring Using Large Language Models | José Manuel Suárez et.al. | 2506.14535 | null |
2025-06-17 | M2BeamLLM: Multimodal Sensing-empowered mmWave Beam Prediction with Large Language Models | Can Zheng et.al. | 2506.14532 | null |
2025-06-17 | SIRI-Bench: Challenging VLMs’ Spatial Intelligence through Complex Reasoning Tasks | Zijian Song et.al. | 2506.14512 | null |
2025-06-17 | LLM-Powered Swarms: A New Frontier or a Conceptual Stretch? | Muhammad Atta Ur Rahman et.al. | 2506.14496 | null |
2025-06-17 | How Far Can LLMs Improve from Experience? Measuring Test-Time Learning Ability in LLMs with Human Comparison | Jiayin Wang et.al. | 2506.14448 | link |
2025-06-17 | Excessive Reasoning Attack on Reasoning LLMs | Wai Man Si et.al. | 2506.14374 | null |
2025-06-17 | ELLIS Alicante at CQs-Gen 2025: Winning the critical thinking questions shared task: LLM-based question generation and selection | Lucile Favero et.al. | 2506.14371 | null |
2025-06-17 | A Vision for Geo-Temporal Deep Research Systems: Towards Comprehensive, Transparent, and Reproducible Geo-Temporal Information Synthesis | Bruno Martins et.al. | 2506.14345 | null |
2025-06-17 | ADRD: LLM-Driven Autonomous Driving Based on Rule-based Decision Systems | Fanzhi Zeng et.al. | 2506.14299 | null |
2025-06-17 | Large Language Model Empowered Design of Fluid Antenna Systems: Challenges, Frameworks, and Case Studies for 6G | Chao Wang et.al. | 2506.14288 | null |
2025-06-17 | Improving LoRA with Variational Learning | Bai Cong et.al. | 2506.14280 | null |
2025-06-17 | Don’t throw the baby out with the bathwater: How and why deep learning for ARC | Jack Cole et.al. | 2506.14276 | null |
2025-06-17 | Re-Initialization Token Learning for Tool-Augmented Large Language Models | Chenghao Li et.al. | 2506.14248 | link |
2025-06-17 | Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs | Xumeng Wen et.al. | 2506.14245 | null |
2025-06-17 | Causes in neuron diagrams, and testing causal reasoning in Large Language Models. A glimpse of the future of philosophy? | Louis Vervoort et.al. | 2506.14239 | null |
2025-06-17 | Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | Md Tanzib Hosain et.al. | 2506.14234 | null |
2025-06-17 | MIST: Towards Multi-dimensional Implicit Bias and Stereotype Evaluation of LLMs via Theory of Mind | Yanlin Li et.al. | 2506.14161 | null |
2025-06-17 | S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models | Tao He et.al. | 2506.14158 | null |
2025-06-17 | InsertRank: LLMs can reason over BM25 scores to Improve Listwise Reranking | Rahul Seetharaman et.al. | 2506.14086 | null |
2025-06-17 | AI-Facilitated Analysis of Abstracts and Conclusions: Flagging Unsubstantiated Claims and Ambiguous Pronouns | Evgeny Markhasin et.al. | 2506.13172 | null |
2025-06-17 | AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving | Wentao Zhang et.al. | 2506.12508 | link |
2025-06-17 | LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification | Penghui Yang et.al. | 2502.17421 | link |
2025-06-17 | Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching | Qizheng Zhang et.al. | 2506.14852 | null |
2025-06-17 | Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective | Zhoujun Cheng et.al. | 2506.14965 | link |
2025-06-17 | Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework | Mohna Chakraborty et.al. | 2506.14948 | null |
2025-06-17 | CALM: Contextual Analog Logic with Multimodality | Maxwell J. Jacobson et.al. | 2506.14936 | null |
2025-06-17 | MDBench: A Synthetic Multi-Document Reasoning Benchmark Generated with Knowledge Guidance | Joseph J. Peper et.al. | 2506.14927 | null |
2025-06-16 | Steering LLM Thinking with Budget Guidance | Junyan Li et.al. | 2506.13752 | link |
2025-06-16 | Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability | Shova Kuikel et.al. | 2506.13746 | link |
2025-06-16 | TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning | Junru Zhang et.al. | 2506.13705 | link |
2025-06-16 | Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text | Amr Mohamed et.al. | 2506.14012 | link |
2025-06-16 | Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences | Stas Bekman et.al. | 2506.13996 | link |
2025-06-16 | How Does LLM Reasoning Work for Code? A Survey and a Call to Action | Ira Ceka et.al. | 2506.13932 | null |
2025-06-16 | Spec2RTL-Agent: Automated Hardware Code Generation from Complex Specifications Using LLM Agent Systems | Zhongzhi Yu et.al. | 2506.13905 | null |
2025-06-16 | Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles | Antara Raaghavi Bhattacharya et.al. | 2506.13886 | null |
2025-06-16 | Balancing Knowledge Delivery and Emotional Comfort in Healthcare Conversational Systems | Shang-Chi Tsai et.al. | 2506.13692 | null |
2025-06-16 | An LLM’s Apology: Outsourcing Awkwardness in the Age of AI | Twm Stone et.al. | 2506.13685 | link |
2025-06-16 | LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning | Miho Koda et.al. | 2506.13841 | link |
2025-06-16 | EvolvTrip: Enhancing Literary Character Understanding with Temporal Theory-of-Mind Graphs | Bohao Yang et.al. | 2506.13641 | link |
2025-06-16 | An Empirical Study of LLM-as-a-Judge: How Design Choices Impact Evaluation Reliability | Yusuke Yamauchi et.al. | 2506.13639 | null |
2025-06-16 | FreeQ-Graph: Free-form Querying with Semantic Consistent Scene Graph for 3D Scene Understanding | Chenlu Zhan et.al. | 2506.13629 | null |
2025-06-16 | CAMS: A CityGPT-Powered Agentic Framework for Urban Human Mobility Simulation | Yuwei Du et.al. | 2506.13599 | null |
2025-06-16 | Understand the Implication: Learning to Think for Pragmatic Understanding | Settaluri Lakshmi Sravanthi et.al. | 2506.13559 | null |
2025-06-16 | Implicit and Explicit Research Quality Score Probabilities from ChatGPT | Mike Thelwall et.al. | 2506.13525 | null |
2025-06-16 | BOW: Bottlenecked Next Word Exploration | Ming Shen et.al. | 2506.13502 | null |
2025-06-16 | Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study | Zhengyu Hu et.al. | 2506.13464 | null |
2025-06-16 | From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs | Alsharif Abuadbba et.al. | 2506.13434 | null |
2025-06-16 | RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis | Pengzuo Wu et.al. | 2506.13405 | null |
2025-06-16 | Decompositional Reasoning for Graph Retrieval with Large Language Models | Valentin Six et.al. | 2506.13380 | null |
2025-06-16 | Socratic RL: A Novel Framework for Efficient Knowledge Acquisition through Iterative Reflection and Viewpoint Distillation | Xiangfan Wu et.al. | 2506.13358 | null |
2025-06-16 | StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns | Luanbo Wan et.al. | 2506.13356 | null |
2025-06-16 | Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks | Yifei Xu et.al. | 2506.13351 | null |
2025-06-16 | Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers | Wooseok Seo et.al. | 2506.13342 | link |
2025-06-16 | Towards Pervasive Distributed Agentic Generative AI – A State of The Art | Gianni Molinari et.al. | 2506.13324 | null |
2025-06-16 | Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models | James Chua et.al. | 2506.13206 | null |
2025-06-16 | Breaking Thought Patterns: A Multi-Dimensional Reasoning Framework for LLMs | Xintong Tang et.al. | 2506.13192 | null |
2025-06-16 | Enhancing Large Language Models with Reliable Knowledge Graphs | Qinggang Zhang et.al. | 2506.13178 | null |
2025-06-16 | Rethinking Test-Time Scaling for Medical AI: Model and Task-Aware Strategies for LLMs and VLMs | Gyutaek Oh et.al. | 2506.13102 | null |
2025-06-16 | Discerning What Matters: A Multi-Dimensional Assessment of Moral Competence in LLMs | Daniel Kilov et.al. | 2506.13082 | null |
2025-06-16 | MotiveBench: How Far Are We From Human-Like Motivational Reasoning in Large Language Models? | Xixian Yong et.al. | 2506.13065 | null |
2025-06-16 | Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning | Haibo Qiu et.al. | 2506.13056 | null |
2025-06-16 | Just Go Parallel: Improving the Multilingual Capabilities of Large Language Models | Muhammad Reza Qorib et.al. | 2506.13044 | null |
2025-06-16 | Knowledge Graph Fusion with Large Language Models for Accurate, Explainable Manufacturing Process Planning | Danny Hoang et.al. | 2506.13026 | null |
2025-06-16 | Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making | Claudio Fanconi et.al. | 2506.11887 | null |
2025-06-15 | I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference | Zibo Gao et.al. | 2505.06738 | null |
2025-06-15 | SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models | Xinyi Zhao et.al. | 2506.12992 | link |
2025-06-15 | Multi-document Summarization through Multi-document Event Relation Graph Reasoning in LLMs: a case study in Framing Bias Mitigation | Yuanyuan Lei et.al. | 2506.12978 | null |
2025-06-15 | Scaling Test-time Compute for LLM Agents | King Zhu et.al. | 2506.12928 | null |
2025-06-15 | PersonaFeedback: A Large-scale Human-annotated Benchmark For Personalization | Meiling Tao et.al. | 2506.12915 | null |
2025-06-15 | SciDA: Scientific Dynamic Assessor of LLMs | Junting Zhou et.al. | 2506.12909 | null |
2025-06-15 | WereWolf-Plus: An Update of Werewolf Game setting Based on DSGBench | Xinyuan Xia et.al. | 2506.12841 | null |
2025-06-15 | Mastering Da Vinci Code: A Comparative Study of Transformer, LLM, and PPO-based Agents | LeCheng Zhang et.al. | 2506.12801 | null |
2025-06-15 | MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution | Yibo Wang et.al. | 2506.12728 | null |
2025-06-15 | Humanity’s Last Code Exam: Can Advanced LLMs Conquer Human’s Hardest Code Competition? | Xiangyang Li et.al. | 2506.12713 | link |
2025-06-15 | Building Trustworthy AI by Addressing its 16+2 Desiderata with Goal-Directed Commonsense Reasoning | Alexis R. Tudor et.al. | 2506.12667 | null |
2025-06-14 | Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics | Jiarui Liu et.al. | 2506.12657 | null |
2025-06-14 | Towards Building General Purpose Embedding Models for Industry 4.0 Agents | Christodoulos Constantinides et.al. | 2506.12607 | null |
2025-06-14 | OneEval: Benchmarking LLM Knowledge-intensive Reasoning over Diverse Knowledge Bases | Yongrui Chen et.al. | 2506.12577 | null |
2025-06-14 | RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking | Shuo Yang et.al. | 2506.12538 | null |
2025-06-14 | Detection, Classification, and Mitigation of Gender Bias in Large Language Models | Xiaoqing Cheng et.al. | 2506.12527 | null |
2025-06-14 | Graph of Verification: Structured Verification of LLM Reasoning with Directed Acyclic Graphs | Jiwei Fang et.al. | 2506.12509 | null |
2025-06-14 | From Outcomes to Processes: Guiding PRM Learning from ORM for Inference-Time Alignment | Bin Xie et.al. | 2506.12446 | null |
2025-06-14 | Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics | Asifullah Khan et.al. | 2506.12365 | null |
2025-06-14 | QiMeng-Attention: SOTA Attention Operator is generated by SOTA Attention Algorithm | Qirui Zhou et.al. | 2506.12355 | null |
2025-06-14 | Information Suppression in Large Language Models: Auditing, Quantifying, and Characterizing Censorship in DeepSeek | Peiran Qiu et.al. | 2506.12349 | null |
2025-06-14 | Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning | Xiaotian Zhang et.al. | 2506.12307 | null |
2025-06-14 | Unveiling Confirmation Bias in Chain-of-Thought Reasoning | Yue Wan et.al. | 2506.12301 | null |
2025-06-14 | The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason | Shanchao Liang et.al. | 2506.12286 | null |
2025-06-14 | ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression | Guangda Liu et.al. | 2412.03213 | link |
2025-06-13 | Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache | Xiaoran Liu et.al. | 2506.11886 | null |
2025-06-13 | Lag-Relative Sparse Attention In Long Context Training | Manlai Liang et.al. | 2506.11498 | null |
2025-06-13 | Efficient Long-Context LLM Inference via KV Cache Clustering | Jie Hu et.al. | 2506.11418 | null |
2025-06-13 | Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles | Qingyan Wei et.al. | 2506.10848 | link |
2025-06-13 | Investigating the Potential of Large Language Model-Based Router Multi-Agent Architectures for Foundation Design Automation: A Task Classification and Expert Selection Study | Sompote Youwai et.al. | 2506.13811 | null |
2025-06-13 | From Emergence to Control: Probing and Modulating Self-Reflection in Language Models | Xudong Zhu et.al. | 2506.12217 | link |
2025-06-13 | Supernova Event Dataset: Interpreting Large Language Model’s Personality through Critical Event Analysis | Pranav Agarwal et.al. | 2506.12189 | null |
2025-06-13 | Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs | Chenqian Le et.al. | 2506.12182 | null |
2025-06-13 | code_transformed: The Influence of Large Language Models on Code | Yuliang Xu et.al. | 2506.12014 | null |
2025-06-13 | Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making | Xiaopeng Yuan et.al. | 2506.12012 | null |
2025-06-13 | How Visual Representations Map to Language Feature Space in Multimodal LLMs | Constantin Venhoff et.al. | 2506.11976 | null |
2025-06-13 | Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback | Dongwei Jiang et.al. | 2506.11930 | null |
2025-06-13 | LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? | Zihan Zheng et.al. | 2506.11928 | null |
2025-06-13 | TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | Zhenyu Hou et.al. | 2506.11902 | link |
2025-06-13 | MapQaTor: An Extensible Framework for Efficient Annotation of Map-Based QA Datasets | Mahir Labib Dihan et.al. | 2412.21015 | link |
2025-06-12 | SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding | Ziyi Zhang et.al. | 2506.11309 | null |
2025-06-11 | SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems | Peiran Li et.al. | 2506.07564 | null |
2025-06-10 | Activated LoRA: Fine-tuned LLMs for Intrinsics | Kristjan Greenewald et.al. | 2504.12397 | link |
2025-06-09 | Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models | Haoyu Wang et.al. | 2506.07334 | null |
2025-06-09 | MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts | Wei Tao et.al. | 2506.07533 | null |
2025-06-08 | Paged Attention Meets FlexAttention: Unlocking Long-Context Efficiency in Deployed Inference | Thomas Joshi et.al. | 2506.07311 | null |
2025-06-08 | MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache | Akshat Sharma et.al. | 2411.18077 | null |
2025-06-05 | Inference-Time Hyper-Scaling with KV Cache Compression | Adrian Łańcucki et.al. | 2506.05345 | null |
2025-06-05 | Unleashing Hour-Scale Video Training for Long Video-Language Understanding | Jingyang Lin et.al. | 2506.05332 | null |
2025-06-05 | MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs | Zhenyan Lu et.al. | 2506.13772 | null |
2025-06-05 | ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration | Xianglong Yan et.al. | 2505.24357 | null |
2025-06-04 | Homogeneous Keys, Heterogeneous Values: Exploiting Local KV Cache Asymmetry for Long-Context LLMs | Wanyun Cui et.al. | 2506.05410 | null |
2025-06-04 | AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models | Yifeng Gu et.al. | 2506.03762 | null |
2025-06-04 | AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism | Zhepei Wei et.al. | 2506.03700 | link |
2025-06-04 | HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing | Minghui Liu et.al. | 2412.16187 | null |
2025-06-04 | KVPR: Efficient LLM Inference with I/O-Aware KV Cache Partial Recomputation | Chaoyi Jiang et.al. | 2411.17089 | link |
2025-06-03 | A$^2$ATS: Retrieval-Based KV Cache Reduction via Windowed Rotary Position Embedding and Query-Aware Vector Quantization | Junhui He et.al. | 2502.12665 | null |
2025-06-03 | SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation | Jialong Wu et.al. | 2412.13649 | link |
2025-06-02 | Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts | Spencer Banasik et.al. | 2506.01827 | null |
2025-06-02 | SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation | Aurick Qiao et.al. | 2410.03960 | null |
2025-06-02 | SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications | Gabriele Oliaro et.al. | 2411.04975 | link |
2025-06-01 | Earley-Driven Dynamic Pruning for Efficient Structured Decoding | Xintong Sun et.al. | 2506.01151 | null |
2025-06-01 | A Survey of LLM $\times$ DATA | Xuanhe Zhou et.al. | 2505.18458 | link |
2025-05-31 | Accelerating Diffusion LLMs via Adaptive Parallel Decoding | Daniel Israel et.al. | 2506.00413 | null |
2025-05-31 | QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design | Benjamin Schneider et.al. | 2505.16175 | link |
2025-05-31 | KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference | Xing Li et.al. | 2502.04420 | link |
2025-05-30 | HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts | Neil He et.al. | 2505.24722 | link |
2025-05-30 | Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching | Juan Wisznia et.al. | 2505.24643 | null |
2025-05-30 | SkyLB: A Locality-Aware Cross-Region Load Balancer for LLM Inference | Tian Xia et.al. | 2505.24095 | null |
2025-05-30 | RaaS: Reasoning-Aware Attention Sparsity for Efficient LLM Reasoning | Junhao Hu et.al. | 2502.11147 | null |
2025-05-30 | Learn from the Past: Fast Sparse Indexing for Large Language Model Decoding | Feiyu Yao et.al. | 2506.15704 | null |
2025-05-29 | EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse | Tianyu Guo et.al. | 2505.21889 | link |
2025-05-29 | Wireless Agentic AI with Retrieval-Augmented Multimodal Semantic Perception | Guangyuan Liu et.al. | 2505.23275 | null |
2025-05-29 | EmbAdvisor: Adaptive Cache Management for Sustainable LLM Serving | Yuyang Tian et.al. | 2505.23970 | null |
2025-05-29 | KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction | Jang-Hyun Kim et.al. | 2505.23416 | link |
2025-05-28 | Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference | Yue Zhu et.al. | 2505.21919 | null |
2025-05-28 | Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference | Donghyeon Joo et.al. | 2505.22913 | link |
2025-05-28 | Scaling Reasoning without Attention | Xueliang Zhao et.al. | 2505.22425 | null |
2025-05-28 | InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing | Shuaiyi Li et.al. | 2505.22156 | null |
2025-05-28 | gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling | Tianyu Guo et.al. | 2504.14775 | link |
2025-05-27 | Hardware-Efficient Attention for Fast Decoding | Ted Zadouri et.al. | 2505.21487 | null |
2025-05-27 | SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences | Jungyoub Cha et.al. | 2505.20776 | link |
2025-05-27 | TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization | Dingyu Yao et.al. | 2505.19586 | link |
2025-05-27 | EPIC: Efficient Position-Independent Caching for Serving Large Language Models | Junhao Hu et.al. | 2410.15332 | null |
2025-05-26 | HAMburger: Accelerating LLM Inference via Token Smashing | Jingyu Liu et.al. | 2505.20438 | null |
2025-05-26 | O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering | Jianbiao Mei et.al. | 2505.16582 | link |
2025-05-26 | RAP: Runtime-Adaptive Pruning for LLM Inference | Huanrong Liu et.al. | 2505.17138 | null |
2025-05-26 | SLOT: Sample-specific Language Model Optimization at Test-time | Yang Hu et.al. | 2505.12392 | link |
2025-05-26 | PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving | Ahmet Caner Yüzügüler et.al. | 2501.08192 | null |
2025-05-26 | UniICL: An Efficient Unified Framework Unifying Compression, Selection, and Generation | Jun Gao et.al. | 2405.17062 | null |
2025-05-26 | BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems | Yuxin Wang et.al. | 2401.17644 | link |
2025-05-25 | Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps | Jie Ou et.al. | 2505.12731 | null |
2025-05-24 | Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing | Zhaoyuan Su et.al. | 2506.02006 | null |
2025-05-24 | Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query | Yixuan Wang et.al. | 2505.20334 | null |
2025-05-24 | PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs | Tengxuan Liu et.al. | 2505.18610 | link |
2025-05-24 | PersonaX: A Recommendation Agent Oriented User Modeling Framework for Long Behavior Sequence | Yunxiao Shi et.al. | 2503.02398 | link |
2025-05-23 | FlashForge: Ultra-Efficient Prefix-Aware Attention for LLM Decoding | Zhibin Wang et.al. | 2505.17694 | null |
2025-05-23 | Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence | Amirhosein Ghasemabadi et.al. | 2505.20325 | null |
2025-05-23 | NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache | Donghyun Son et.al. | 2505.18231 | null |
2025-05-23 | Titanus: Enabling KV Cache Pruning and Quantization On-the-Fly for LLM Acceleration | Peilin Chen et.al. | 2505.17787 | link |
2025-05-23 | ThinkLess: A Training-Free Inference-Efficient Method for Reducing Reasoning Redundancy | Gengyang Li et.al. | 2505.15684 | null |
2025-05-23 | Hogwild! Inference: Parallel LLM Generation via Concurrent Attention | Gleb Rodionov et.al. | 2504.06261 | link |
2025-05-22 | Zebra-Llama: Towards Extremely Efficient Hybrid Models | Mingyu Yang et.al. | 2505.17272 | null |
2025-05-22 | T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning | Amartya Chakraborty et.al. | 2505.16986 | null |
2025-05-22 | NQKV: A KV Cache Quantization Scheme Based on Normal Distribution Characteristics | Zhihang Cai et.al. | 2505.16210 | null |
2025-05-22 | HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving | Zhiwen Chen et.al. | 2505.15793 | null |
2025-05-21 | Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models | Jingcong Liang et.al. | 2505.16056 | link |
2025-05-21 | A Federated Splitting Framework for LLMs: Security, Efficiency, and Adaptability | Zishuai Zhang et.al. | 2505.15683 | link |
2025-05-21 | FlowKV: Enhancing Multi-Turn Conversational Coherence in LLMs via Isolated Key-Value Cache Management | Xiang Liu et.al. | 2505.15347 | null |
2025-05-21 | LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval | Zhenyu Ning et.al. | 2505.15269 | null |
2025-05-21 | AutoData: A Multi-Agent System for Open Web Data Collection | Tianyi Ma et.al. | 2505.15859 | link |
2025-05-21 | Effective and Efficient Schema-aware Information Extraction Using On-Device Large Language Models | Zhihao Wen et.al. | 2505.14992 | null |
2025-05-21 | Can LLMs Maintain Fundamental Abilities under KV Cache Compression? | Xiang Liu et.al. | 2502.01941 | null |
2025-05-20 | Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning | Jiwon Song et.al. | 2505.13866 | link |
2025-05-20 | SkyMemory: A LEO Edge Cache for Transformer Inference Optimization and Scale Out | Thomas Sandholm et.al. | 2505.14427 | null |
2025-05-20 | Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation | Peter Baile Chen et.al. | 2505.14398 | null |
2025-05-20 | CE-LSLM: Efficient Large-Small Language Model Inference and Communication via Cloud-Edge Collaboration | Pengyan Zhu et.al. | 2505.14085 | null |
2025-05-20 | KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments | Junyoung Park et.al. | 2504.15364 | null |
2025-05-20 | Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding | Sakhinana Sagar Srinivas et.al. | 2504.01281 | null |
2025-05-20 | Online Scheduling for LLM Inference with KV Cache Constraints | Patrick Jaillet et.al. | 2502.07115 | null |
2025-05-19 | FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference | Guangda Liu et.al. | 2505.13109 | null |
2025-05-19 | AD-AGENT: A Multi-agent Framework for End-to-end Anomaly Detection | Tiankai Yang et.al. | 2505.12594 | link |
2025-05-19 | SubGCache: Accelerating Graph-based RAG with Subgraph-level KV Cache | Qiuyu Zhu et.al. | 2505.10951 | null |
2025-05-19 | FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension | Jushi Kai et.al. | 2505.00570 | null |
2025-05-18 | KVmix: Gradient-Based Layer Importance-Aware Mixed-Precision Quantization for KV Cache | Fei Li et.al. | 2506.08018 | null |
2025-05-16 | Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models | Camille Couturier et.al. | 2505.11271 | null |
2025-05-16 | Accurate KV Cache Quantization with Outlier Tokens Tracing | Yi Su et.al. | 2505.10938 | link |
2025-05-16 | KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse | Huan Yang et.al. | 2503.16525 | null |
2025-05-14 | SALM: A Multi-Agent Framework for Language Model-Driven Social Network Simulation | Gaurav Koley et.al. | 2505.09081 | link |
2025-05-14 | Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization | Minsu Kim et.al. | 2503.18599 | null |
2025-05-13 | Enhancing Cache-Augmented Generation (CAG) with Adaptive Contextual Compression for Scalable Knowledge Integration | Rishabh Agrawal et.al. | 2505.08261 | null |
2025-05-13 | Gradual Binary Search and Dimension Expansion: A general method for activation quantization in LLMs | Lucas Maisonnave et.al. | 2504.13989 | null |
2025-05-12 | SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models | Hang Wu et.al. | 2505.07680 | null |
2025-05-12 | Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains | Ibne Farabi Shihab et.al. | 2505.07274 | null |
2025-05-12 | Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity | Guang Yan et.al. | 2505.07239 | null |
2025-05-12 | PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications | Kuntai Du et.al. | 2505.07203 | null |
2025-05-11 | Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-aware Cache Compression | Feng Cheng et.al. | 2505.06901 | null |
2025-05-09 | Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM | Zehao Fan et.al. | 2505.05772 | null |
2025-05-08 | A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency | Sihyeong Park et.al. | 2505.01658 | link |
2025-05-05 | Large Language Model Partitioning for Low-Latency Inference at the Edge | Dimitrios Kafetzis et.al. | 2505.02533 | null |
2025-05-01 | Spill The Beans: Exploiting CPU Cache Side-Channels to Leak Tokens from Large Language Models | Andrew Adiletta et.al. | 2505.00817 | null |
2025-05-01 | QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | Yujun Lin et.al. | 2405.04532 | link |
2025-04-29 | CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks | Rui Wang et.al. | 2504.21228 | null |
2025-04-28 | semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage | Ke Hong et.al. | 2504.19867 | null |
2025-04-25 | ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference | Hanshi Sun et.al. | 2410.21465 | link |
2025-04-24 | L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference | Qingyuan Liu et.al. | 2504.17584 | null |
2025-04-22 | SeaLLM: Service-Aware and Latency-Optimized Resource Sharing for Large Language Model Inference | Yihao Zhao et.al. | 2504.15720 | null |
2025-04-22 | Optimizing SLO-oriented LLM Serving with PD-Multiplexing | Weihao Cui et.al. | 2504.14489 | null |
2025-04-21 | Splitwiser: Efficient LM inference with constrained resources | Asad Aali et.al. | 2505.03763 | link |
2025-04-21 | LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention | Shang Yang et.al. | 2502.14866 | link |
2025-04-21 | FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Zihao Ye et.al. | 2501.01005 | link |
2025-04-21 | Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design | Rui Xie et.al. | 2503.18869 | null |
2025-04-20 | Understanding and Optimizing Multi-Stage AI Inference Pipelines | Abhimanyu Rajeshkumar Bambhaniya et.al. | 2504.09775 | null |
2025-04-19 | Improving the Serving Performance of Multi-LoRA Large Language Models via Efficient LoRA and KV Cache Management | Hang Zhang et.al. | 2505.03756 | null |
2025-04-18 | LogicTree: Structured Proof Exploration for Coherent and Rigorous Logical Reasoning with Large Language Models | Kang He et.al. | 2504.14089 | null |
2025-04-18 | HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing | Myunghyun Rhee et.al. | 2504.16112 | null |
2025-04-16 | Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading | Kihyun Kim et.al. | 2504.11816 | link |
2025-04-16 | Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs | Hyungwoo Lee et.al. | 2504.11765 | null |
2025-04-15 | Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Ruicheng Ao et.al. | 2504.11320 | link |
2025-04-14 | AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference | Yangshen Deng et.al. | 2504.10326 | null |
2025-04-14 | KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference | Yuxuan Tian et.al. | 2504.09936 | null |
2025-04-13 | Efficient LLM Serving on Hybrid Real-time and Best-effort Requests | Wan Borui et.al. | 2504.09590 | null |
2025-04-11 | Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash | Fucheng Jia et.al. | 2504.08378 | null |
2025-04-11 | Boosting Universal LLM Reward Design through Heuristic Reward Observation Space Evolution | Zen Kit Heng et.al. | 2504.07596 | null |
2025-04-10 | Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving | Shihong Gao et.al. | 2504.07494 | link |
2025-04-10 | Marconi: Prefix Caching for the Era of Hybrid LLMs | Rui Pan et.al. | 2411.19379 | null |
2025-04-10 | UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference | Weikai Xu et.al. | 2504.07479 | null |
2025-04-09 | Saliency-driven Dynamic Token Pruning for Large Language Models | Yao Tao et.al. | 2504.04514 | null |
2025-04-08 | Unifying KV Cache Compression for Large Language Models with LeanKV | Yanqi Zhang et.al. | 2412.03131 | null |
2025-04-08 | SPIRe: Boosting LLM Inference Throughput with Speculative Decoding | Sanjit Neelam et.al. | 2504.06419 | null |
2025-04-08 | HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference | Shuzhang Zhong et.al. | 2504.05897 | link |
2025-04-08 | Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching | Yanhao Dong et.al. | 2504.06319 | null |
2025-04-07 | AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design | Yanbiao Liang et.al. | 2505.03745 | null |
2025-04-03 | CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion | Jiayi Yao et.al. | 2405.16444 | link |
2025-04-03 | HyperRAG: Enhancing Quality-Efficiency Tradeoffs in Retrieval-Augmented Generation with Reranker KV-Cache Reuse | Yuwei An et.al. | 2504.02921 | null |
2025-04-03 | LLM Library Learning Fails: A LEGO-Prover Case Study | Ian Berlot-Attwell et.al. | 2504.03048 | null |
2025-04-02 | MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding | Ranajoy Sadhukhan et.al. | 2408.11049 | link |
2025-04-01 | SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching | Yuxuan Zhu et.al. | 2504.00970 | null |
2025-04-01 | Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB | Anas Dorbani et.al. | 2504.01157 | null |
2025-04-01 | Knowledge-Aware Iterative Retrieval for Multi-Agent Systems | Seyoung Song et.al. | 2503.13275 | null |
2025-03-31 | Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving | Wei Gao et.al. | 2503.24000 | link |
2025-03-30 | PQCache: Product Quantization-based KVCache for Long Context LLM Inference | Hailin Zhang et.al. | 2407.12820 | null |
2025-03-30 | Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference | Wei Tao et.al. | 2503.23294 | null |
2025-03-27 | Solving AI Foundational Model Latency with Telco Infrastructure | Sebastian Barros et.al. | 2504.03708 | null |
2025-03-27 | WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference | Youhui Zuo et.al. | 2503.17922 | link |
2025-03-25 | LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Han Chen et.al. | 2503.19950 | link |
2025-03-24 | Jenga: Effective Memory Management for Serving LLM with Heterogeneity | Chen Zhang et.al. | 2503.18292 | null |
2025-03-24 | Mitigating KV Cache Competition to Enhance User Experience in LLM Inference | Haiying Shen et.al. | 2503.13773 | null |
2025-03-24 | EconoServe: Maximizing Multi-Resource Utilization with SLO Guarantees in LLM Serving | Haiying Shen et.al. | 2411.06364 | null |
2025-03-24 | xKV: Cross-Layer SVD for KV-Cache Compression | Chi-Chih Chang et.al. | 2503.18893 | link |
2025-03-21 | MKG-Rank: Enhancing Large Language Models with Knowledge Graph for Multilingual Medical Question Answering | Feiyang Li et.al. | 2503.16131 | null |
2025-03-20 | Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models | Keda Tao et.al. | 2503.16257 | null |
2025-03-20 | SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs | Shibo Jie et.al. | 2503.16163 | null |
2025-03-17 | AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications | Haiying Shen et.al. | 2503.13737 | null |
2025-03-16 | CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences | Ziran Qin et.al. | 2503.12491 | null |
2025-03-12 | PRISM: Efficient Long-Range Reasoning With Short-Context LLMs | Dulhan Jayalath et.al. | 2412.18914 | null |
2025-03-11 | FastCache: Optimizing Multimodal LLM Serving through Lightweight KV-Cache Compression Framework | Jianian Zhu et.al. | 2503.08461 | null |
2025-03-09 | Seesaw: High-throughput LLM Inference via Model Re-sharding | Qidong Su et.al. | 2503.06433 | null |
2025-03-07 | DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference | Jinwei Yao et.al. | 2404.00242 | null |
2025-03-06 | LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression | Souvik Kundu et.al. | 2503.04982 | null |
2025-03-06 | Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning | Giulio Corallo et.al. | 2503.04973 | null |
2025-03-06 | Markov Chain of Thought for Efficient Mathematical Reasoning | Wen Yang et.al. | 2410.17635 | null |
2025-03-05 | Enhancing Memory Efficiency in Large Language Model Training Through Chronos-aware Pipeline Parallelism | Xinyuan Lin et.al. | 2503.03182 | null |
2025-03-03 | WeightedKV: Attention Scores Weighted Key-Value Cache Merging for Large Language Models | Jian Yuan et.al. | 2503.01330 | null |
2025-03-01 | Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving | Qihui Zhou et.al. | 2503.00392 | null |
2025-02-27 | Dynamic Parallel Tree Search for Efficient LLM Reasoning | Yifu Ding et.al. | 2502.16235 | null |
2025-02-27 | ThinK: Thinner Key Cache by Query-Driven Pruning | Yuhui Xu et.al. | 2407.21018 | null |
2025-02-24 | ELMo-Tune-V2: LLM-Assisted Full-Cycle Auto-Tuning to Optimize LSM-Based Key-Value Stores | Viraj Thakkar et.al. | 2502.17606 | link |
2025-02-24 | The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? | Zhenheng Tang et.al. | 2502.17535 | null |
2025-02-22 | AIBrix: Towards Scalable, Cost-Effective Large Language Model Inference Infrastructure | The AIBrix Team et.al. | 2504.03648 | null |
2025-02-20 | SpinQuant: LLM quantization with learned rotations | Zechun Liu et.al. | 2405.16406 | null |
2025-02-20 | Compute Or Load KV Cache? Why Not Both? | Shuowei Jin et.al. | 2410.03065 | null |
2025-02-17 | Does RAG Really Perform Bad For Long-Context Processing? | Kun Luo et.al. | 2502.11444 | null |
2025-02-12 | The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems | Linke Song et.al. | 2409.20002 | null |
2025-02-11 | HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment | Youhe Jiang et.al. | 2502.07903 | null |
2025-02-10 | MARM: Unlocking the Future of Recommendation Systems through Memory Augmentation and Scalable Complexity | Xiao Lv et.al. | 2411.09425 | null |
2025-02-08 | ProMoE: Fast MoE-based LLM Serving using Proactive Caching | Xiaoniu Song et.al. | 2410.22134 | null |
2025-02-07 | fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving | Hanfei Yu et.al. | 2502.05370 | null |
2025-02-05 | Accessible and Portable LLM Inference by Compiling Computational Graphs into SQL | Wenbo Sun et.al. | 2502.02818 | null |
2025-02-05 | Qrazor: Reliable and Effortless 4-bit LLM Quantization by Significant Data Razoring | Dongyoung Lee et.al. | 2501.13331 | null |
2025-02-05 | Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation | Shubham Agarwal et.al. | 2502.15734 | null |
2025-02-04 | LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation | Xuan Zhang et.al. | 2410.13846 | link |
2025-02-02 | RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations | Zunhai Su et.al. | 2501.16383 | null |
2025-02-01 | QSpec: Speculative Decoding with Complementary Quantization Schemes | Juntao Zhao et.al. | 2410.11305 | null |
2025-01-30 | State Stream Transformer (SST): Emergent Metacognitive Behaviours Through Latent State Persistence | Thea Aviss et.al. | 2501.18356 | null |
2025-01-29 | vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention | Ramya Prabhu et.al. | 2405.04437 | link |
2025-01-27 | PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization | Mengzhao Chen et.al. | 2410.05265 | link |
2025-01-25 | Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads | Xingyang He et.al. | 2501.15113 | null |
2025-01-24 | Locality-aware Fair Scheduling in LLM Serving | Shiyi Cao et.al. | 2501.14312 | null |
2025-01-24 | Serving Long-Context LLMs at the Mobile Edge: Test-Time Reinforcement Learning-based Model Caching and Inference Offloading | Minrui Xu et.al. | 2501.14205 | null |
2025-01-24 | EchoLM: Accelerating LLM Serving with Real-time Knowledge Distillation | Yifan Yu et.al. | 2501.12689 | null |
2025-01-23 | A Training-free Sub-quadratic Cost Transformer Model Serving Framework With Hierarchically Pruned Attention | Heejun Lee et.al. | 2406.09827 | null |
2025-01-22 | Yi-Lightning Technical Report | Alan Wake et.al. | 2412.01253 | null |
2025-01-17 | BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching | Zhen Zheng et.al. | 2412.03594 | null |
2025-01-12 | Mell: Memory-Efficient Large Language Model Serving via Multi-GPU KV Cache Management | Liu Qianli et.al. | 2501.06709 | null |
2025-01-06 | The Power of Negative Zero: Datatype Customization for Quantized Large Language Models | Yuzong Chen et.al. | 2501.04052 | link |
2025-01-02 | MSWA: Refining Local Attention with Multi-ScaleWindow Attention | Yixing Xu et.al. | 2501.01039 | null |
2025-01-02 | A Survey on Large Language Model Acceleration based on KV Cache Management | Haoyang Li et.al. | 2412.19442 | link |
2024-12-31 | RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval | Di Liu et.al. | 2409.10516 | link |
2024-12-23 | Deliberation in Latent Space via Differentiable Cache Augmentation | Luyang Liu et.al. | 2412.17747 | null |
2024-12-21 | SYMPHONY: Improving Memory Management for LLM Inference Workloads | Saurabh Agarwal et.al. | 2412.16434 | null |
2024-12-21 | MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool | Cunchen Hu et.al. | 2406.17565 | null |
2024-12-18 | MagicPIG: LSH Sampling for Efficient LLM Generation | Zhuoming Chen et.al. | 2410.16179 | link |
2024-12-18 | Semantic Convergence: Harmonizing Recommender Systems via Two-Stage Alignment and Behavioral Semantic Tokenization | Guanghan Li et.al. | 2412.13771 | null |
2024-12-17 | A System for Microserving of LLMs | Hongyi Jin et.al. | 2412.12488 | null |
2024-12-16 | CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation | Hongxuan Zhang et.al. | 2412.11741 | null |
2024-12-13 | KVDirect: Distributed Disaggregated LLM Inference | Shiyang Chen et.al. | 2501.14743 | null |
2024-12-12 | PowerInfer-2: Fast Large Language Model Inference on a Smartphone | Zhenliang Xue et.al. | 2406.06282 | null |
2024-12-05 | A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts | Suyu Ge et.al. | 2410.01485 | null |
2024-11-27 | FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Ao Shen et.al. | 2411.18424 | null |
2024-11-24 | Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments | Nikoleta Iliakopoulou et.al. | 2411.17741 | null |
2024-11-14 | Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning | Yu Fu et.al. | 2410.19258 | link |
2024-11-08 | Eigen Attention: Attention in Low-Rank Space for KV Cache Compression | Utkarsh Saxena et.al. | 2408.05646 | link |
2024-11-02 | NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference | Xuanlin Jiang et.al. | 2411.01142 | null |
2024-10-31 | ALISE: Accelerating Large Language Model Serving with Speculative Scheduling | Youpeng Zhao et.al. | 2410.23537 | null |
2024-10-29 | LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism | Bingyang Wu et.al. | 2404.09526 | link |
2024-10-25 | Fast Inference for Augmented Large Language Models | Rana Shahout et.al. | 2410.18248 | null |
2024-10-24 | Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | Ruisi Cai et.al. | 2410.19123 | link |
2024-10-23 | Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching | Jie Peng et.al. | 2410.14740 | null |
2024-10-23 | ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference | Xin He et.al. | 2410.17954 | null |
2024-10-21 | Do Large Language Models Need a Content Delivery Network? | Yihua Cheng et.al. | 2409.13761 | link |
2024-10-16 | COMET: Towards Partical W4A4KV4 LLMs Serving | Lian Liu et.al. | 2410.12168 | null |
2024-10-09 | LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management | Yi Xiong et.al. | 2410.00428 | null |
2024-10-08 | KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches | Jiayi Yuan et.al. | 2407.01527 | link |
2024-10-07 | Fast State Restoration in LLM Serving with HCache | Shiwei Gao et.al. | 2410.05004 | null |
2024-10-07 | KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head | Isaac Rehg et.al. | 2410.00161 | link |
2024-10-04 | LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy | Rongzhi Zhang et.al. | 2410.03111 | null |
2024-10-04 | Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization | Seungwoo Son et.al. | 2406.12016 | null |
2024-10-03 | Preble: Efficient Distributed Prompt Scheduling for LLM Serving | Vikranth Srivatsa et.al. | 2407.00023 | link |
2024-10-01 | Self-controller: Controlling LLMs with Multi-round Step-by-step Self-awareness | Xiao Peng et.al. | 2410.00359 | null |
2024-09-23 | Steward: Natural Language Web Automation | Brian Tang et.al. | 2409.15441 | link |
2024-09-23 | BlockLLM: Multi-tenant Finer-grained Serving for Large Language Models | Bodun Hu et.al. | 2404.18322 | null |
2024-09-23 | SEAL: Suite for Evaluating API-use of LLMs | Woojeong Kim et.al. | 2409.15523 | null |
2024-09-21 | LLM-dCache: Improving Tool-Augmented LLMs with GPT-Driven Localized Data Caching | Simranjit Singh et.al. | 2406.06799 | null |
2024-09-11 | Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU | Zhenyu Ning et.al. | 2409.09086 | null |
2024-09-04 | SparQ Attention: Bandwidth-Efficient LLM Inference | Luka Ribar et.al. | 2312.04985 | link |
2024-08-05 | SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving | Andreas Kosmas Kakolyris et.al. | 2408.05235 | null |
2024-08-04 | TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding | Hanshi Sun et.al. | 2404.11912 | link |
2024-08-01 | Intermittent Semi-working Mask: A New Masking Paradigm for LLMs | Mingcong Lu et.al. | 2408.00539 | null |
2024-08-01 | ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition | Lu Ye et.al. | 2402.15220 | link |
2024-07-22 | vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving | Jiale Xu et.al. | 2407.15309 | link |
2024-07-22 | Dissecting Multiplication in Transformers: Insights into LLMs | Luyu Qiu et.al. | 2407.15360 | link |
2024-07-21 | Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks | Zheng Wang et.al. | 2407.08454 | null |
2024-07-18 | QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead | Amir Zandieh et.al. | 2406.03482 | link |
2024-07-11 | Bifurcated Attention: Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs | Ben Athiwaratkun et.al. | 2403.08845 | null |
2024-07-09 | Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | Ruoyu Qin et.al. | 2407.00079 | link |
2024-06-30 | Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention | Bin Gao et.al. | 2403.19708 | null |
2024-06-28 | InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management | Wonbeom Lee et.al. | 2406.19707 | null |
2024-06-19 | VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework | Zhi Yao et.al. | 2406.13399 | null |
2024-06-16 | EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism | Yanxi Chen et.al. | 2312.04916 | link |
2024-06-08 | QCQA: Quality and Capacity-aware grouped Query Attention | Vinay Joshi et.al. | 2406.10247 | null |
2024-06-06 | SGLang: Efficient Execution of Structured Language Model Programs | Lianmin Zheng et.al. | 2312.07104 | link |
2024-05-31 | Cached Model-as-a-Resource: Provisioning Large Language Model Agents for Edge Intelligence in Space-air-ground Integrated Networks | Minrui Xu et.al. | 2403.05826 | null |
2024-05-13 | Hydragen: High-Throughput LLM Inference with Shared Prefixes | Jordan Juravsky et.al. | 2402.05099 | link |
2024-04-15 | Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models | Siyan Zhao et.al. | 2404.09529 | link |
2024-03-26 | ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching | Youpeng Zhao et.al. | 2403.17312 | null |
2024-03-18 | FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines | Jiaao He et.al. | 2403.11421 | null |
2024-03-12 | GPT-4V(ision) is a Generalist Web Agent, if Grounded | Boyuan Zheng et.al. | 2401.01614 | link |
2024-03-11 | Large Language Models as Tool Makers | Tianle Cai et.al. | 2305.17126 | link |
2024-02-16 | When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and Alignment | Minrui Xu et.al. | 2401.07764 | null |
2024-02-04 | LLM-Enhanced Data Management | Xuanhe Zhou et.al. | 2402.02643 | link |
2024-01-16 | GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching | Cong Guo et.al. | 2401.08156 | link |
2024-01-16 | LLMs for Test Input Generation for Semantic Caches | Zafaryab Rasool et.al. | 2401.08138 | null |
2023-06-09 | S$^{3}$: Increasing GPU Utilization during Generative Inference for Higher Throughput | Yunho Jin et.al. | 2306.06000 | null |