# Awesome-LLM-IoT-Papers

**Repository Path**: KAIWEILIUCC/awesome-llm-io-t-papers

## Basic Information

- **Project Name**: Awesome-LLM-IoT-Papers
- **Description**: A collection of papers on LLM applications in the IoT field.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-11
- **Last Updated**: 2026-01-21

## Categories & Tags

**Categories**: Uncategorized

**Tags**: IOT, networking, system, Network, internet-of-things

## README

# Awesome-LLM-IoT-Papers

[![Awesome](https://awesome.re/badge.svg)](https://awesome.re) ![](https://img.shields.io/github/last-commit/KAIWEILIUCC/Awesome-LLM-IoT-Papers?color=green)

## Table of Contents

- [📜 Surveys](#surveys)
- [🎲 LLM Agents](#llm-agents)
- [🚀 Edge FM](#edge-fm)
- [🔊 Sensor Data Understanding](#sensor-data-understanding)
- [📊 Sensor Data Generation](#sensor-data-generation)
- [💻 Code Generation](#code-generation)
- [💡 Interesting Applications](#interesting-applications)
- [🧑‍⚕️ Smart Health](#smart-health)
- [🤖 Robotics](#robotics)
- [💻 Human-computer Interaction](#human-computer-interaction)
- [🔗 Resources](#resources)

## Surveys

[A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis](https://arxiv.org/abs/2506.12263) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.12263)

[The role of Large Language Models in addressing IoT challenges: A systematic literature review](https://www.sciencedirect.com/science/article/pii/S0167739X25001244)

[Foundation Models for CPS-IoT: Opportunities and Challenges](https://arxiv.org/abs/2501.16368) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2501.16368)

[LLMs and IoT: A Comprehensive Survey on Large Language Models and the Internet of Things](https://www.techrxiv.org/users/894996/articles/1271502/master/file/data/LLMs%20and%20IoT_A_Comprehensive_Survey_V2/LLMs%20and%20IoT_A_Comprehensive_Survey_V2.pdf?inline=true)

[A Review on Edge Large Language Models: Design, Execution, and Applications](https://dl.acm.org/doi/full/10.1145/3719664)

[Large Language Models for Network Intrusion Detection Systems: Foundations, Implementations, and Future Directions](https://arxiv.org/abs/2507.04752) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.04752)

[Toward Edge General Intelligence via Large Language Models: Opportunities and Challenges](https://ieeexplore.ieee.org/abstract/document/10876185)

[Large Language Models in Smart Grid: Applications and Risks](https://link.springer.com/chapter/10.1007/978-3-031-96146-5_1)

[Small Language Models: Survey, Measurements, and Insights](https://arxiv.org/abs/2409.15790) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2409.15790)

[From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities](https://arxiv.org/abs/2412.11694) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.11694)

## LLM Agents

AutoDroid: [LLM-powered Task Automation in Android](https://arxiv.org/abs/2308.15272) (MobiCom 2024) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2308.15272)

MobileGPT: [Augmenting LLM with Human-like App Memory for Mobile Task Automation](https://arxiv.org/abs/2312.03003) (MobiCom 2024) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.03003) [![Star](https://img.shields.io/github/stars/mobilegptsys/MobileGPT.svg?style=social&label=Star)](https://github.com/mobilegptsys/MobileGPT.git)

[Poster: Enabling Agent-centric Interaction on Smartphones with LLM-based UI Reassembling](https://dl.acm.org/doi/10.1145/3643832.3661432) (MobiSys 2024)

[LLMind: Orchestrating AI and IoT with LLM for Complex Task Execution](https://arxiv.org/abs/2312.09007) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.09007)

[LLMind 2.0: Distributed IoT Automation with Natural Language M2M Communication and Lightweight LLM Agents](https://www.arxiv.org/abs/2508.13920) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://www.arxiv.org/abs/2508.13920)

TaskSense: [A Translation-like Approach for Tasking Heterogeneous Sensor Systems with LLMs](https://dl.acm.org/doi/10.1145/3715014.3722070) (SenSys 2025)

[Toward Sensor-In-the-Loop LLM Agent: Benchmarks and Implications](https://doi.org/10.1145/3715014.3722082) (SenSys 2025)

ContextAgent: [Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions](https://arxiv.org/abs/2505.14668) (NeurIPS 2025) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2505.14668)

SensorMCP: [A Model Context Protocol Server for Custom Sensor Tool Creation](https://guoyunqi.com/assets/pdf/sensormcp-netaisys2025.pdf) (NetAISys 2025)

[Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment](https://arxiv.org/abs/2503.15937) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.15937)

[Empowering Agentic Video Analytics Systems with Video Language Models](https://arxiv.org/abs/2505.00254) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2505.00254)

AutoBridge: [Automating Smart Device Integration with Centralized Platform](https://arxiv.org/abs/2507.23178) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.23178)

[LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction](https://arxiv.org/abs/2507.04748) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.04748)

[Towards Privacy-Preserving and Personalized Smart Homes via Tailored Small Language Models](https://arxiv.org/abs/2507.08878) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.08878)

UIShift: [Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning](https://arxiv.org/abs/2505.12493) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2505.12493)

MCPWorld: [A Unified Benchmarking Testbed for API, GUI, and Hybrid Computer Use Agents](https://arxiv.org/abs/2506.07672) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.07672)

MobileViews: [A Large-Scale Mobile GUI Dataset](https://arxiv.org/abs/2409.14337) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2409.14337)

AutoDroid-V2: [Boosting SLM-based GUI Agents via Code Generation](https://arxiv.org/abs/2412.18116) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.18116)

LLM-Explorer: [Towards Efficient and Affordable LLM-based Exploration for Mobile Apps](https://arxiv.org/abs/2505.10593) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2505.10593)

[Process-Supervised Reinforcement Learning for Interactive Multimodal Tool-Use Agents](https://arxiv.org/abs/2509.14480) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2509.14480)

InfiGUIAgent: [A Multimodal Generalist GUI Agent with Native Reasoning and Reflection](https://arxiv.org/abs/2501.04575) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2501.04575)

AgentCPM-GUI: [Building Mobile-Use Agents with Reinforcement Fine-Tuning](https://arxiv.org/abs/2506.01391) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.01391)

FingerTip 20K: [A Benchmark for Proactive and Personalized Mobile LLM Agents](https://arxiv.org/abs/2507.21071) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.21071)

AgentSense: [LLMs Empower Generalizable and Explainable Web-Based Participatory Urban Sensing](https://arxiv.org/abs/2510.19661) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2510.19661)

IndusGCC: [A Data Benchmark and Evaluation Framework for GUI-Based General Computer Control in Industrial Automation](https://www.arxiv.org/abs/2509.01199) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://www.arxiv.org/abs/2509.01199)

IoT-MCP: [Bridging LLMs and IoT Systems Through Model Context Protocol](https://arxiv.org/abs/2510.01260) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2510.01260)

AsyncVoice Agent: [Real-Time Explanation for LLM Planning and Reasoning](https://arxiv.org/abs/2510.16156) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2510.16156)

VeriSafe Agent: [Safeguarding Mobile GUI Agent via Logic-based Action Verification](https://arxiv.org/abs/2503.18492) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.18492)

DroidCall: [A Dataset for LLM-powered Android Intent Invocation](https://arxiv.org/abs/2412.00402) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.00402)

[More Than Meets the Eye? Uncovering the Reasoning-Planning Disconnect in Training Vision-Language Driving Models](https://arxiv.org/abs/2510.04532) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2510.04532)

[BAMAS: Structuring Budget-Aware Multi-Agent Systems](https://arxiv.org/abs/2511.21572) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2511.21572)

HiveMind: [Contribution-Guided Online Prompt Optimization of LLM Multi-Agent Systems](https://www.arxiv.org/abs/2512.06432) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://www.arxiv.org/abs/2512.06432)

DIMGen: [Dynamic Intent Macro Generation for Efficient LLM-Driven Mobile Automation](https://dl.acm.org/doi/10.1145/3737904.3768533)

NESTFUL: [A Benchmark for Evaluating LLMs on Nested Sequences of API Calls](https://arxiv.org/abs/2409.03797) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2409.03797)

## Edge FM

MELTing Point: [Mobile Evaluation of Language Transformers](https://arxiv.org/abs/2403.12844) (MobiCom 2024) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.12844)

Your Data, Your Model: [A Framework for Training and Deploying Foundational Language Models for Embedded Sensing](https://dl.acm.org/doi/10.1145/3636534.3695901) (MobiCom 2024)

[Federated Black-box Prompt Tuning System for Large Language Models on the Edge](https://dl.acm.org/doi/10.1145/3636534.3698856) (MobiCom 2024)

EdgeFM: [Leveraging Foundation Model for Open-set Learning on the Edge](https://arxiv.org/abs/2311.10986) (SenSys 2023) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.10986)

Confidant: [Customizing Transformer-based LLMs via Collaborative Training on Mobile Devices](https://yshu.org/paper/mobicom25confidant.pdf) (MobiCom 2025)

[Dynamic Sparse Attention on Mobile SoCs](https://arxiv.org/abs/2508.16703) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2508.16703)

MobiEdit: [Resource-efficient Knowledge Editing for Personalized On-device LLMs](https://www.arxiv.org/abs/2506.13772) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://www.arxiv.org/abs/2506.13772)

PhoneLM: [An Efficient and Capable Small Language Model Family through Principled Pre-training](https://arxiv.org/abs/2411.05046) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.05046)

[LLM as a System Service on Mobile Devices](https://arxiv.org/abs/2403.11805) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.11805)

LoRASuite: [Efficient LoRA Adaptation Across Large Language Model Upgrades](https://arxiv.org/abs/2505.13515) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2505.13515)

ELMS: [Elasticized Large Language Models On Mobile Devices](https://arxiv.org/pdf/2409.09071) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/pdf/2409.09071)

[Serving MoE Models on Resource-constrained Edge Devices via Dynamic Expert Swapping](https://ieeexplore.ieee.org/document/11022729)

HAPE: [Hardware-Aware LLM Pruning For Efficient On-Device Inference Optimization](https://dl.acm.org/doi/10.1145/3744244)

[Demystifying Small Language Models for Edge Deployment](https://aclanthology.org/2025.acl-long.718/)

TinyLLM: [A Framework for Training and Deploying Language Models at the Edge Computers](https://arxiv.org/abs/2412.15304) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.15304)

Confidant: [Customizing Transformer-based LLMs via Collaborative Edge Training](https://arxiv.org/abs/2311.13381) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.13381)

Modality Plug-and-Play: [Runtime Modality Adaptation in LLM-Driven Autonomous Mobile Systems](https://dl.acm.org/doi/10.1145/3680207.3723491)

D2MoE: [Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving](https://arxiv.org/abs/2504.15299) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2504.15299)

PerCache: [Predictive Hierarchical Cache for RAG Applications on Mobile Devices](https://arxiv.org/abs/2601.11553) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2601.11553)

## Sensor Data Understanding

[Improving On-Device LLMs' Sensory Understanding with Embedding Interpolations](https://dl.acm.org/doi/10.1145/3636534.3697456) (MobiCom 2024)
Penetrative AI: [Making LLMs Comprehend the Physical World](https://arxiv.org/abs/2310.09605) (ACL 2024) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2310.09605)

[Exploring the Capabilities of LLMs for IMU-based Fine-grained Human Activity Understanding](https://arxiv.org/abs/2504.02878) (ACM FMSys 2025) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2504.02878)

ContextLLM: [Meaningful Context Reasoning from Multi-Sensor and Multi-Device Data Using LLMs](https://dl.acm.org/doi/10.1145/3708468.3711892) (HotMobile 2025)

SensorBench: [Benchmarking LLMs in Coding-Based Sensor Processing](https://dl.acm.org/doi/10.1145/3708468.3711882) (HotMobile 2025)

[Making Sensing Interactive and Descriptive with LLMs: Context Reasoning from Multi-Sensor Data](https://dl.acm.org/doi/10.1145/3708468.3715686) (HotMobile 2025)

Babel: [A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment](https://arxiv.org/abs/2407.17777) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.17777)

[Empowering Agentic Video Analytics Systems with Video Language Models](https://arxiv.org/abs/2505.00254) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2505.00254)

SensorQA: [A Question Answering Benchmark for Daily-Life Monitoring](https://arxiv.org/abs/2501.04974) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2501.04974)

ChainStream: [An LLM-based Framework for Unified Synthetic Sensing](https://arxiv.org/abs/2412.15240) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.15240)

MASTER: [A Multi-modal Foundation Model for Human Activity Recognition](https://dl.acm.org/doi/10.1145/3749511)

LLM-CoSen: [Revisiting Collaborative Sensing With Large Language Models (LLMs)](https://ieeexplore.ieee.org/document/11051039)

## Sensor Data Generation

[High Resolution Millimeter Wave Imaging Based on FMCW Radar Systems at W-Band](https://dl.acm.org/doi/10.1145/3711875.3729162)

SHADE-AD: [An LLM-Based Framework for Synthesizing Activity Data of Alzheimer's Patients](https://dl.acm.org/doi/10.1145/3715014.3722062) (SenSys 2025)

DailyLLM: [Context-Aware Activity Log Generation Using Multi-Modal Sensors and LLMs](https://arxiv.org/abs/2507.13737) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.13737)

Spider: [Any-to-Many Multimodal LLM](https://arxiv.org/abs/2411.09439) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.09439)

CCC: [Cross-modal Contrastive Creator for End-to-End Sign Language Generation](https://link.springer.com/article/10.1007/s44443-025-00418-3)

## Code Generation

AutoIOT: [LLM-Driven Automated Natural Language Programming for AIoT Applications](https://arxiv.org/abs/2503.05346) (MobiCom 2025) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.05346) [![Star](https://img.shields.io/github/stars/lemingshen/AutoIOT.svg?style=social&label=Star)](https://github.com/lemingshen/AutoIOT.git)

GPIoT: [Tailoring Small Language Models for IoT Program Synthesis and Development](https://arxiv.org/abs/2503.00686) (SenSys 2025) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.00686) [![Star](https://img.shields.io/github/stars/lemingshen/GPIoT.svg?style=social&label=Star)](https://github.com/lemingshen/GPIoT.git)

CheckMate: [LLM-Powered Approximate Intermittent Computing](https://dl.acm.org/doi/10.1145/3715014.3722056) (SenSys 2025)

[Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis](https://dl.acm.org/doi/abs/10.1145/3658617.3697616)

[LLM for Complex Signal Processing in FPGA-based Software Defined Radios: A Case Study on FFT](https://ieeexplore.ieee.org/document/10757597)

WebGen-Agent: [Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning](https://arxiv.org/abs/2509.22644) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2509.22644)

DeepFeature: [Iterative Context-aware Feature Generation for Wearable Biosignals](https://arxiv.org/abs/2512.08379) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2512.08379)

## Interesting Applications

TransCompressor: [LLM-Powered Multimodal Data Compression for Smart Transportation](https://arxiv.org/abs/2411.16020) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.16020)

Llambda: [An LLM-Empowered Low-Resolution Vision System for On-Device Human Behavior Understanding](https://arxiv.org/pdf/2505.01743) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/pdf/2505.01743)

IoT-LLM: [Enhancing Real-World IoT Task Reasoning with Large Language Models](https://arxiv.org/abs/2410.02429) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.02429)

LightLLM: [A Versatile Large Language Model for Predictive Light Sensing](https://arxiv.org/abs/2411.15211) (SenSys 2025) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.15211)

FlexiFly: [Interfacing the Physical World with Foundation Models Empowered by Reconfigurable Drone Systems](https://arxiv.org/abs/2403.12853) (SenSys 2025) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.12853)

SocialMind: [An LLM-based Proactive AR Social Assistive System with Human-like Perception for In-situ Live Interactions](https://arxiv.org/abs/2412.04036) (IMWUT 2025) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.04036)

TRAMBA: [A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms](https://arxiv.org/abs/2405.01242) (IMWUT 2024) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2405.01242)

[Exploring Foundation Models in Detecting Concerning Daily Functioning in Psychotherapeutic Context Based on Images from Smart Home Devices](https://ieeexplore.ieee.org/abstract/document/10590355) (ACM FMSys 2024)

Sensor2Scene: [Foundation Model-Driven Interactive Realities](https://ieeexplore.ieee.org/abstract/document/10590268) (ACM FMSys 2024)

[See Where You Read with Eye Gaze Tracking and Large Language Model](https://arxiv.org/abs/2409.19454) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2409.19454)

[Empower Vision Applications with LoRA LMM](https://arxiv.org/abs/2411.00915) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.00915)

[Can we make FCC Experts out of LLMs?](https://dl.acm.org/doi/10.1145/3708468.3711885) (HotMobile 2025)

RouteLLM: [A Large Language Model with Native Route Context Understanding to Enable Context-Aware Reasoning](https://dl.acm.org/doi/10.1145/3749552) (IMWUT 2025)

[Congestion Control System Optimization with Large Language Models](https://arxiv.org/abs/2508.16074) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2508.16074)

[Enhancing User Engagement in Socially-Driven Dialogue through Interactive LLM Alignments](https://arxiv.org/abs/2506.21497) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.21497)

SecurityLingua: [Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression](https://arxiv.org/abs/2506.12707) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.12707)

[Toward Foundation Models for Online Complex Event Detection in CPS-IoT: A Case Study](https://dl.acm.org/doi/abs/10.1145/3722565.3727198)

RealFactBench: [A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking](https://arxiv.org/abs/2506.12538) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.12538)

[Large Language Model-Guided Disentangled Belief Representation Learning on Polarized Social Graphs](https://ieeexplore.ieee.org/abstract/document/10637650)

[Decoding the Silent Majority: Inducing Belief Augmented Social Graph with Large Language Model for Response Forecasting](https://arxiv.org/abs/2310.13297) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2310.13297)

SA-OOSC: [A Multimodal LLM-Distilled Semantic Communication Framework for Enhanced Coding Efficiency with Scenario Understanding](https://arxiv.org/abs/2509.07436) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2509.07436)

[Leveraging AI Agents for Autonomous Networks: A Reference Architecture and Empirical Studies](https://arxiv.org/abs/2509.08312) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2509.08312)

[Inducing Causal World Models in LLMs for Zero-Shot Physical Reasoning](https://arxiv.org/abs/2507.19855) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.19855)

[Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models](https://arxiv.org/abs/2508.12587) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2508.12587)

[Large Language Models for Wireless Communications: From Adaptation to Autonomy](https://arxiv.org/abs/2507.21524) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.21524)

ThermiKit: [Edge-Optimized LWIR Analytics with Agent-Driven Interactions](https://dl.acm.org/doi/10.1145/3737905.3769284)

[All You Need for Object Detection: From Pixels, Points, and Prompts to Next-Gen Fusion and Multimodal LLMs/VLMs in Autonomous Vehicles](https://arxiv.org/abs/2510.26641) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2510.26641)

[Edge-IoT and MLLMs for Education and Scene Understanding: Assisting Vision and Hearing-Impaired Individuals](https://ieeexplore.ieee.org/document/11048799)

LLM-Assisted IoT Testing: [Finding Conformance Bugs in Matter SDKs](https://dl.acm.org/doi/10.1145/3680207.3765257)

CBM-RAG: [Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models](https://arxiv.org/abs/2504.20898) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2504.20898)

[Multimodal LLM for Patient Activity Recognition: Integrating Video, Audio, and Text in Clinical Environments](https://pubmed.ncbi.nlm.nih.gov/41052200/)

MedSeg-R: [Reasoning Segmentation in Medical Images with Multimodal Large Language Models](https://arxiv.org/abs/2506.10465) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.10465)

[Reasoning Visual Language Model for Chest X-Ray Analysis](https://arxiv.org/abs/2510.23968) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2510.23968)

## Smart Health

DrHouse: [An LLM-empowered Diagnostic Reasoning System through Harnessing Outcomes from Sensor Data and Expert Knowledge](https://arxiv.org/abs/2405.12541) (IMWUT 2024) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2405.12541)

[LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices](https://arxiv.org/abs/2403.10779) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.10779)

AutoLife: [Automatic Life Journaling with Smartphones and LLMs](https://arxiv.org/abs/2412.15714) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.15714)

MDTeamGPT: [A Self-Evolving LLM-based Multi-Agent Framework for Multi-Disciplinary Team Medical Consultation](https://arxiv.org/abs/2503.13856) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.13856)

[Introduction to the Special Issue on Large Language Models, Conversational Systems, and Generative AI in Health—Part 1](https://dl.acm.org/doi/10.1145/3723454)

CataractBot: [An LLM-powered Expert-in-the-Loop Chatbot for Cataract Patients](https://dl.acm.org/doi/10.1145/3729479)

GLOSS: [Group of LLMs for Open-ended Sensemaking of Passive Sensing Data for Health and Wellbeing](https://dl.acm.org/doi/10.1145/3749474)

Demo: Myotrainer: [Muscle-Aware Motion Analysis and Feedback System for In-Home Resistance Training](https://dl.acm.org/doi/10.1145/3666025.3699397)

[A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer](https://arxiv.org/abs/2508.16569) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2508.16569)

DermINO: [Hybrid Pretraining for a Versatile Dermatology Foundation Model](https://arxiv.org/abs/2508.12190) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2508.12190)

DynamiCare: [A Dynamic Multi-Agent Framework for Interactive and Open-Ended Medical Decision-Making](https://arxiv.org/abs/2507.02616) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.02616)

[Demo Abstract: An LLM-Powered Multimodal Mobile Sensing System for Personalized and Interactive Health Behavior Analysis](https://dl.acm.org/doi/10.1145/3715014.3724376)

DocCHA: [Towards LLM-Augmented Interactive Online Diagnosis System](https://arxiv.org/abs/2507.07870) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.07870)

MMedAgent: [Learning to Use Medical Tools with Multi-modal Agent](https://arxiv.org/abs/2407.02483) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.02483)

MedAgentSim: [Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions](https://arxiv.org/abs/2503.22678) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.22678)

MegaAgent: [A Large-Scale Autonomous LLM-based Multi-Agent System Without Predefined SOPs](https://arxiv.org/abs/2408.09955) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2408.09955)

Fleming-VL: [Towards Universal Medical Visual Reasoning with Multimodal LLMs](https://arxiv.org/abs/2511.00916) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2511.00916)

Lingshu: [A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning](https://arxiv.org/abs/2506.07044) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.07044)

SynLLM: [A Comparative Analysis of Large Language Models for Medical Tabular Synthetic Data Generation via Prompt Engineering](https://arxiv.org/abs/2508.08529) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2508.08529)

Myo-Trainer: [A Vision-based Muscle-Aware Motion Feedback System for In-Home Resistance Training](https://dl.acm.org/doi/10.1145/3680207.3765243)

MedAide: [Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration](https://arxiv.org/abs/2410.12532) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.12532)

MedPlan: [A Two-Stage RAG-Based System for Personalized Medical Plan Generation](https://arxiv.org/abs/2503.17900) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.17900)

## Robotics

RoboInspector: [Unveiling the Unreliability of Policy Code for LLM-enabled Robotic Manipulation](https://arxiv.org/abs/2508.21378) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2508.21378)

NaVILA: [Legged Robot Vision-Language-Action Model for Navigation](https://arxiv.org/pdf/2412.04453) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/pdf/2412.04453)

[Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation](https://arxiv.org/abs/2407.05890) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.05890)

FSR-VLN: [Fast and Slow Reasoning for Vision-Language Navigation with Hierarchical Multi-modal Scene Graph](https://arxiv.org/abs/2509.13733) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2509.13733)

[Multi-robot Rigid Formation Navigation via Synchronous Motion and Discrete-time Communication-Control Optimization](https://www.arxiv.org/abs/2510.02624) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://www.arxiv.org/abs/2510.02624)

Expertise need not monopolize: [Action-Specialized Mixture of Experts for Vision-Language-Action Learning](https://arxiv.org/abs/2510.14300) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2510.14300)

See, Point, Fly: [A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation](https://arxiv.org/abs/2509.22653) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2509.22653)

Guide-LLM: [An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments](https://arxiv.org/abs/2410.20666) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.20666)

## Human-computer Interaction

Memoro: [Using Large Language Models to Realize a Concise Interface for Real-Time Memory Augmentation](https://arxiv.org/abs/2403.02135) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.02135)
[Exploring Large Language Model as an Interactive Sports Coach: Lessons from a Single-Subject Half Marathon Preparation](https://arxiv.org/abs/2509.26593) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2509.26593)

[TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication](https://arxiv.org/abs/2504.01708) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2504.01708)

## Resources