# Awesome-LLM-IoT-Papers

**Repository Path**: KAIWEILIUCC/awesome-llm-io-t-papers

## Basic Information

- **Project Name**: Awesome-LLM-IoT-Papers
- **Description**: A collection of papers on LLM applications in the IoT field.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-11
- **Last Updated**: 2026-01-21

## Categories & Tags

**Categories**: Uncategorized

**Tags**: IOT, networking, system, Network, internet-of-things

## README

# Awesome-LLM-IoT-Papers

[![Awesome](https://awesome.re/badge.svg)](https://awesome.re) ![](https://img.shields.io/github/last-commit/KAIWEILIUCC/Awesome-LLM-IoT-Papers?color=green)

## Table of Contents

- [📜 Surveys](#surveys)
- [🎲 LLM Agents](#llm-agents)
- [🚀 Edge FM](#edge-fm)
- [🔊 Sensor Data Understanding](#sensor-data-understanding)
- [📊 Sensor Data Generation](#sensor-data-generation)
- [💻 Code Generation](#code-generation)
- [💡 Interesting Applications](#interesting-applications)
- [🧑‍⚕️ Smart Health](#smart-health)
- [🤖 Robotics](#robotics)
- [💻 Human-computer Interaction](#human-computer-interaction)
- [🔗 Resources](#resources)

## Surveys

[A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis](https://arxiv.org/abs/2506.12263) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.12263)

[The role of Large Language Models in addressing IoT challenges: A systematic literature review](https://www.sciencedirect.com/science/article/pii/S0167739X25001244)

[Foundation Models for CPS-IoT: Opportunities and Challenges](https://arxiv.org/abs/2501.16368) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2501.16368)

[LLMs and IoT: A Comprehensive Survey on Large Language Models and the Internet of Things](https://www.techrxiv.org/users/894996/articles/1271502/master/file/data/LLMs%20and%20IoT_A_Comprehensive_Survey_V2/LLMs%20and%20IoT_A_Comprehensive_Survey_V2.pdf?inline=true)

[A Review on Edge Large Language Models: Design, Execution, and Applications](https://dl.acm.org/doi/full/10.1145/3719664)

[Large Language Models for Network Intrusion Detection Systems: Foundations, Implementations, and Future Directions](https://arxiv.org/abs/2507.04752) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.04752)

[Toward Edge General Intelligence via Large Language Models: Opportunities and Challenges](https://ieeexplore.ieee.org/abstract/document/10876185)

[Large Language Models in Smart Grid: Applications and Risks](https://link.springer.com/chapter/10.1007/978-3-031-96146-5_1)

[Small Language Models: Survey, Measurements, and Insights](https://arxiv.org/abs/2409.15790) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2409.15790)

[From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities](https://arxiv.org/abs/2412.11694) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.11694)

## LLM Agents

AutoDroid: [LLM-powered Task Automation in Android](https://arxiv.org/abs/2308.15272) (MobiCom 2024) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2308.15272)

MobileGPT: [Augmenting LLM with Human-like App Memory for Mobile Task Automation](https://arxiv.org/abs/2312.03003) (MobiCom 2024) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.03003) [![Star](https://img.shields.io/github/stars/mobilegptsys/MobileGPT.svg?style=social&label=Star)](https://github.com/mobilegptsys/MobileGPT.git)

[Poster: Enabling Agent-centric Interaction on Smartphones with LLM-based UI Reassembling](https://dl.acm.org/doi/10.1145/3643832.3661432) (MobiSys 2024)

[LLMind: Orchestrating AI and IoT with LLM for Complex Task Execution](https://arxiv.org/abs/2312.09007) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.09007)

[LLMind 2.0: Distributed IoT Automation with Natural Language M2M Communication and Lightweight LLM Agents](https://www.arxiv.org/abs/2508.13920) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://www.arxiv.org/abs/2508.13920)

TaskSense: [A Translation-like Approach for Tasking Heterogeneous Sensor Systems with LLMs](https://dl.acm.org/doi/10.1145/3715014.3722070) (SenSys 2025)

[Toward Sensor-In-the-Loop LLM Agent: Benchmarks and Implications](https://doi.org/10.1145/3715014.3722082) (SenSys 2025)

ContextAgent: [Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions](https://arxiv.org/abs/2505.14668) (NeurIPS 2025) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2505.14668)

SensorMCP: [A Model Context Protocol Server for Custom Sensor Tool Creation](https://guoyunqi.com/assets/pdf/sensormcp-netaisys2025.pdf) (NetAISys 2025)

[Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment](https://arxiv.org/abs/2503.15937) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.15937)

[Empowering Agentic Video Analytics Systems with Video Language Models](https://arxiv.org/abs/2505.00254) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2505.00254)

AutoBridge: [Automating Smart Device Integration with Centralized Platform](https://arxiv.org/abs/2507.23178) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.23178)

[LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction](https://arxiv.org/abs/2507.04748) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.04748)

[Towards Privacy-Preserving and Personalized Smart Homes via Tailored Small Language Models](https://arxiv.org/abs/2507.08878) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.08878)

UIShift: [Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning](https://arxiv.org/abs/2505.12493) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2505.12493)

MCPWorld: [A Unified Benchmarking Testbed for API, GUI, and Hybrid Computer Use Agents](https://arxiv.org/abs/2506.07672) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.07672)

MobileViews: [A Large-Scale Mobile GUI Dataset](https://arxiv.org/abs/2409.14337) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2409.14337)

AutoDroid-V2: [Boosting SLM-based GUI Agents via Code Generation](https://arxiv.org/abs/2412.18116) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.18116)

LLM-Explorer: [Towards Efficient and Affordable LLM-based Exploration for Mobile Apps](https://arxiv.org/abs/2505.10593) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2505.10593)

[Process-Supervised Reinforcement Learning for Interactive Multimodal Tool-Use Agents](https://arxiv.org/abs/2509.14480) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2509.14480)

InfiGUIAgent: [A Multimodal Generalist GUI Agent with Native Reasoning and Reflection](https://arxiv.org/abs/2501.04575) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2501.04575)

AgentCPM-GUI: [Building Mobile-Use Agents with Reinforcement Fine-Tuning](https://arxiv.org/abs/2506.01391) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.01391)

FingerTip 20K: [A Benchmark for Proactive and Personalized Mobile LLM Agents](https://arxiv.org/abs/2507.21071) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.21071)

AgentSense: [LLMs Empower Generalizable and Explainable Web-Based Participatory Urban Sensing](https://arxiv.org/abs/2510.19661) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2510.19661)

IndusGCC: [A Data Benchmark and Evaluation Framework for GUI-Based General Computer Control in Industrial Automation](https://www.arxiv.org/abs/2509.01199) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://www.arxiv.org/abs/2509.01199)

IoT-MCP: [Bridging LLMs and IoT Systems Through Model Context Protocol](https://arxiv.org/abs/2510.01260) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2510.01260)

AsyncVoice Agent: [Real-Time Explanation for LLM Planning and Reasoning](https://arxiv.org/abs/2510.16156) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2510.16156)

VeriSafe Agent: [Safeguarding Mobile GUI Agent via Logic-based Action Verification](https://arxiv.org/abs/2503.18492) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.18492)

DroidCall: [A Dataset for LLM-powered Android Intent Invocation](https://arxiv.org/abs/2412.00402) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.00402)

[More Than Meets the Eye? Uncovering the Reasoning-Planning Disconnect in Training Vision-Language Driving Models](https://arxiv.org/abs/2510.04532) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2510.04532)

[BAMAS: Structuring Budget-Aware Multi-Agent Systems](https://arxiv.org/abs/2511.21572) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2511.21572)

HiveMind: [Contribution-Guided Online Prompt Optimization of LLM Multi-Agent Systems](https://www.arxiv.org/abs/2512.06432) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://www.arxiv.org/abs/2512.06432)

DIMGen: [Dynamic Intent Macro Generation for Efficient LLM-Driven Mobile Automation](https://dl.acm.org/doi/10.1145/3737904.3768533)

NESTFUL: [A Benchmark for Evaluating LLMs on Nested Sequences of API Calls](https://arxiv.org/abs/2409.03797) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2409.03797)

## Edge FM

MELTing Point: [Mobile Evaluation of Language Transformers](https://arxiv.org/abs/2403.12844) (MobiCom 2024) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.12844)

Your Data, Your Model: [A Framework for Training and Deploying Foundational Language Models for Embedded Sensing](https://dl.acm.org/doi/10.1145/3636534.3695901) (MobiCom 2024)

[Federated Black-box Prompt Tuning System for Large Language Models on the Edge](https://dl.acm.org/doi/10.1145/3636534.3698856) (MobiCom 2024)

EdgeFM: [Leveraging Foundation Model for Open-set Learning on the Edge](https://arxiv.org/abs/2311.10986) (SenSys 2023) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.10986)

Confidant: [Customizing Transformer-based LLMs via Collaborative Training on Mobile Devices](https://yshu.org/paper/mobicom25confidant.pdf) (MobiCom 2025)

[Dynamic Sparse Attention on Mobile SoCs](https://arxiv.org/abs/2508.16703) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2508.16703)

MobiEdit: [Resource-efficient Knowledge Editing for Personalized On-device LLMs](https://www.arxiv.org/abs/2506.13772) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://www.arxiv.org/abs/2506.13772)

PhoneLM: [An Efficient and Capable Small Language Model Family through Principled Pre-training](https://arxiv.org/abs/2411.05046) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.05046)

[LLM as a System Service on Mobile Devices](https://arxiv.org/abs/2403.11805) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.11805)

LoRASuite: [Efficient LoRA Adaptation Across Large Language Model Upgrades](https://arxiv.org/abs/2505.13515) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2505.13515)

ELMS: [Elasticized Large Language Models On Mobile Devices](https://arxiv.org/pdf/2409.09071) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/pdf/2409.09071)

[Serving MoE Models on Resource-constrained Edge Devices via Dynamic Expert Swapping](https://ieeexplore.ieee.org/document/11022729)

HAPE: [Hardware-Aware LLM Pruning For Efficient On-Device Inference Optimization](https://dl.acm.org/doi/10.1145/3744244)

[Demystifying Small Language Models for Edge Deployment](https://aclanthology.org/2025.acl-long.718/)

TinyLLM: [A Framework for Training and Deploying Language Models at the Edge Computers](https://arxiv.org/abs/2412.15304) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.15304)

Confidant: [Customizing Transformer-based LLMs via Collaborative Edge Training](https://arxiv.org/abs/2311.13381) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.13381)

Modality Plug-and-Play: [Runtime Modality Adaptation in LLM-Driven Autonomous Mobile Systems](https://dl.acm.org/doi/10.1145/3680207.3723491)

D2MoE: [Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving](https://arxiv.org/abs/2504.15299) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2504.15299)

PerCache: [Predictive Hierarchical Cache for RAG Applications on Mobile Devices](https://arxiv.org/abs/2601.11553) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2601.11553)

## Sensor Data Understanding

[Improving On-Device LLMs' Sensory Understanding with Embedding Interpolations](https://dl.acm.org/doi/10.1145/3636534.3697456) (MobiCom 2024)
Penetrative AI: [Making LLMs Comprehend the Physical World](https://arxiv.org/abs/2310.09605) (ACL 2024) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2310.09605)

[Exploring the Capabilities of LLMs for IMU-based Fine-grained Human Activity Understanding](https://arxiv.org/abs/2504.02878) (ACM FMSys 2025) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2504.02878)

ContextLLM: [Meaningful Context Reasoning from Multi-Sensor and Multi-Device Data Using LLMs](https://dl.acm.org/doi/10.1145/3708468.3711892) (HotMobile 2025)

SensorBench: [Benchmarking LLMs in Coding-Based Sensor Processing](https://dl.acm.org/doi/10.1145/3708468.3711882) (HotMobile 2025)

[Making Sensing Interactive and Descriptive with LLMs: Context Reasoning from Multi-Sensor Data](https://dl.acm.org/doi/10.1145/3708468.3715686) (HotMobile 2025)

Babel: [A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment](https://arxiv.org/abs/2407.17777) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.17777)

[Empowering Agentic Video Analytics Systems with Video Language Models](https://arxiv.org/abs/2505.00254) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2505.00254)

SensorQA: [A Question Answering Benchmark for Daily-Life Monitoring](https://arxiv.org/abs/2501.04974) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2501.04974)

ChainStream: [An LLM-based Framework for Unified Synthetic Sensing](https://arxiv.org/abs/2412.15240) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.15240)

MASTER: [A Multi-modal Foundation Model for Human Activity Recognition](https://dl.acm.org/doi/10.1145/3749511)

LLM-CoSen: [Revisiting Collaborative Sensing With Large Language Models (LLMs)](https://ieeexplore.ieee.org/document/11051039)

## Sensor Data Generation

[High Resolution Millimeter Wave Imaging Based on FMCW Radar Systems at W-Band](https://dl.acm.org/doi/10.1145/3711875.3729162)

SHADE-AD: [An LLM-Based Framework for Synthesizing Activity Data of Alzheimer's Patients](https://dl.acm.org/doi/10.1145/3715014.3722062) (SenSys 2025)

DailyLLM: [Context-Aware Activity Log Generation Using Multi-Modal Sensors and LLMs](https://arxiv.org/abs/2507.13737) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.13737)

Spider: [Any-to-Many Multimodal LLM](https://arxiv.org/abs/2411.09439) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.09439)

CCC: [Cross-modal Contrastive Creator for End-to-End Sign Language Generation](https://link.springer.com/article/10.1007/s44443-025-00418-3)

## Code Generation

AutoIOT: [LLM-Driven Automated Natural Language Programming for AIoT Applications](https://arxiv.org/abs/2503.05346) (MobiCom 2025) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.05346) [![Star](https://img.shields.io/github/stars/lemingshen/AutoIOT.svg?style=social&label=Star)](https://github.com/lemingshen/AutoIOT.git)

GPIoT: [Tailoring Small Language Models for IoT Program Synthesis and Development](https://arxiv.org/abs/2503.00686) (SenSys 2025) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.00686) [![Star](https://img.shields.io/github/stars/lemingshen/GPIoT.svg?style=social&label=Star)](https://github.com/lemingshen/GPIoT.git)

CheckMate: [LLM-Powered Approximate Intermittent Computing](https://dl.acm.org/doi/10.1145/3715014.3722056) (SenSys 2025)

[Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis](https://dl.acm.org/doi/abs/10.1145/3658617.3697616)

[LLM for Complex Signal Processing in FPGA-based Software Defined Radios: A Case Study on FFT](https://ieeexplore.ieee.org/document/10757597)

WebGen-Agent: [Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning](https://arxiv.org/abs/2509.22644) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2509.22644)

DeepFeature: [Iterative Context-aware Feature Generation for Wearable Biosignals](https://arxiv.org/abs/2512.08379) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2512.08379)

## Interesting Applications

TransCompressor: [LLM-Powered Multimodal Data Compression for Smart Transportation](https://arxiv.org/abs/2411.16020) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.16020)

Llambda: [An LLM-Empowered Low-Resolution Vision System for On-Device Human Behavior Understanding](https://arxiv.org/pdf/2505.01743) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/pdf/2505.01743)

IoT-LLM: [Enhancing Real-World IoT Task Reasoning with Large Language Models](https://arxiv.org/abs/2410.02429) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.02429)

LightLLM: [A Versatile Large Language Model for Predictive Light Sensing](https://arxiv.org/abs/2411.15211) (SenSys 2025) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.15211)

FlexiFly: [Interfacing the Physical World with Foundation Models Empowered by Reconfigurable Drone Systems](https://arxiv.org/abs/2403.12853) (SenSys 2025) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.12853)

SocialMind: [An LLM-based Proactive AR Social Assistive System with Human-like Perception for In-situ Live Interactions](https://arxiv.org/abs/2412.04036) (IMWUT 2025) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.04036)

TRAMBA: [A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms](https://arxiv.org/abs/2405.01242) (IMWUT 2024) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2405.01242)

[Exploring Foundation Models in Detecting Concerning Daily Functioning in Psychotherapeutic Context Based on Images from Smart Home Devices](https://ieeexplore.ieee.org/abstract/document/10590355) (ACM FMSys 2024)

Sensor2Scene: [Foundation Model-Driven Interactive Realities](https://ieeexplore.ieee.org/abstract/document/10590268) (ACM FMSys 2024)

[See Where You Read with Eye Gaze Tracking and Large Language Model](https://arxiv.org/abs/2409.19454) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2409.19454)

[Empower Vision Applications with LoRA LMM](https://arxiv.org/abs/2411.00915) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.00915)

[Can we make FCC Experts out of LLMs?](https://dl.acm.org/doi/10.1145/3708468.3711885) (HotMobile 2025)

RouteLLM: [A Large Language Model with Native Route Context Understanding to Enable Context-Aware Reasoning](https://dl.acm.org/doi/10.1145/3749552) (IMWUT 2025)

[Congestion Control System Optimization with Large Language Models](https://arxiv.org/abs/2508.16074) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2508.16074)

[Enhancing User Engagement in Socially-Driven Dialogue through Interactive LLM Alignments](https://arxiv.org/abs/2506.21497) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.21497)

SecurityLingua: [Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression](https://arxiv.org/abs/2506.12707) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.12707)

[Toward Foundation Models for Online Complex Event Detection in CPS-IoT: A Case Study](https://dl.acm.org/doi/abs/10.1145/3722565.3727198)

RealFactBench: [A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking](https://arxiv.org/abs/2506.12538) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.12538)

[Large Language Model-Guided Disentangled Belief Representation Learning on Polarized Social Graphs](https://ieeexplore.ieee.org/abstract/document/10637650)

[Decoding the Silent Majority: Inducing Belief Augmented Social Graph with Large Language Model for Response Forecasting](https://arxiv.org/abs/2310.13297) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2310.13297)

SA-OOSC: [A Multimodal LLM-Distilled Semantic Communication Framework for Enhanced Coding Efficiency with Scenario Understanding](https://arxiv.org/abs/2509.07436) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2509.07436)

[Leveraging AI Agents for Autonomous Networks: A Reference Architecture and Empirical Studies](https://arxiv.org/abs/2509.08312) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2509.08312)

[Inducing Causal World Models in LLMs for Zero-Shot Physical Reasoning](https://arxiv.org/abs/2507.19855) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.19855)

[Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models](https://arxiv.org/abs/2508.12587) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2508.12587)

[Large Language Models for Wireless Communications: From Adaptation to Autonomy](https://arxiv.org/abs/2507.21524) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.21524)

ThermiKit: [Edge-Optimized LWIR Analytics with Agent-Driven Interactions](https://dl.acm.org/doi/10.1145/3737905.3769284)

[All You Need for Object Detection: From Pixels, Points, and Prompts to Next-Gen Fusion and Multimodal LLMs/VLMs in Autonomous Vehicles](https://arxiv.org/abs/2510.26641) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2510.26641)

[Edge-IoT and MLLMs for Education and Scene Understanding: Assisting Vision and Hearing-Impaired Individuals](https://ieeexplore.ieee.org/document/11048799)

LLM-Assisted IoT Testing: [Finding Conformance Bugs in Matter SDKs](https://dl.acm.org/doi/10.1145/3680207.3765257)

CBM-RAG: [Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models](https://arxiv.org/abs/2504.20898) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2504.20898)

[Multimodal LLM for Patient Activity Recognition: Integrating Video, Audio, and Text in Clinical Environments](https://pubmed.ncbi.nlm.nih.gov/41052200/)

MedSeg-R: [Reasoning Segmentation in Medical Images with Multimodal Large Language Models](https://arxiv.org/abs/2506.10465) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.10465)

[Reasoning Visual Language Model for Chest X-Ray Analysis](https://arxiv.org/abs/2510.23968) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2510.23968)

## Smart Health

DrHouse: [An LLM-empowered Diagnostic Reasoning System through Harnessing Outcomes from Sensor Data and Expert Knowledge](https://arxiv.org/abs/2405.12541) (IMWUT 2024) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2405.12541)

[LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices](https://arxiv.org/abs/2403.10779) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.10779)

AutoLife: [Automatic Life Journaling with Smartphones and LLMs](https://arxiv.org/abs/2412.15714) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.15714)

MDTeamGPT: [A Self-Evolving LLM-based Multi-Agent Framework for Multi-Disciplinary Team Medical Consultation](https://arxiv.org/abs/2503.13856) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.13856)

[Introduction to the Special Issue on Large Language Models, Conversational Systems, and Generative AI in Health—Part 1](https://dl.acm.org/doi/10.1145/3723454)

CataractBot: [An LLM-powered Expert-in-the-Loop Chatbot for Cataract Patients](https://dl.acm.org/doi/10.1145/3729479)

GLOSS: [Group of LLMs for Open-ended Sensemaking of Passive Sensing Data for Health and Wellbeing](https://dl.acm.org/doi/10.1145/3749474)

Demo: Myotrainer: [Muscle-Aware Motion Analysis and Feedback System for In-Home Resistance Training](https://dl.acm.org/doi/10.1145/3666025.3699397)

[A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer](https://arxiv.org/abs/2508.16569) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2508.16569)

DermINO: [Hybrid Pretraining for a Versatile Dermatology Foundation Model](https://arxiv.org/abs/2508.12190) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2508.12190)

DynamiCare: [A Dynamic Multi-Agent Framework for Interactive and Open-Ended Medical Decision-Making](https://arxiv.org/abs/2507.02616) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.02616)

[Demo Abstract: An LLM-Powered Multimodal Mobile Sensing System for Personalized and Interactive Health Behavior Analysis](https://dl.acm.org/doi/10.1145/3715014.3724376)

DocCHA: [Towards LLM-Augmented Interactive Online Diagnosis System](https://arxiv.org/abs/2507.07870) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2507.07870)

MMedAgent: [Learning to Use Medical Tools with Multi-modal Agent](https://arxiv.org/abs/2407.02483) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.02483)

MedAgentSim: [Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions](https://arxiv.org/abs/2503.22678) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.22678)

MegaAgent: [A Large-Scale Autonomous LLM-based Multi-Agent System Without Predefined SOPs](https://arxiv.org/abs/2408.09955) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2408.09955)

Fleming-VL: [Towards Universal Medical Visual Reasoning with Multimodal LLMs](https://arxiv.org/abs/2511.00916) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2511.00916)

Lingshu: [A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning](https://arxiv.org/abs/2506.07044) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2506.07044)

SynLLM: [A Comparative Analysis of Large Language Models for Medical Tabular Synthetic Data Generation via Prompt Engineering](https://arxiv.org/abs/2508.08529) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2508.08529)

Myo-Trainer: [A Vision-based Muscle-Aware Motion Feedback System for In-Home Resistance Training](https://dl.acm.org/doi/10.1145/3680207.3765243)

MedAide: [Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration](https://arxiv.org/abs/2410.12532) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.12532)

MedPlan: [A Two-Stage RAG-Based System for Personalized Medical Plan Generation](https://arxiv.org/abs/2503.17900) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.17900)

## Robotics

RoboInspector: [Unveiling the Unreliability of Policy Code for LLM-enabled Robotic Manipulation](https://arxiv.org/abs/2508.21378) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2508.21378)

NaVILA: [Legged Robot Vision-Language-Action Model for Navigation](https://arxiv.org/pdf/2412.04453) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/pdf/2412.04453)

[Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation](https://arxiv.org/abs/2407.05890) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.05890)

FSR-VLN: [Fast and Slow Reasoning for Vision-Language Navigation with Hierarchical Multi-modal Scene Graph](https://arxiv.org/abs/2509.13733) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2509.13733)

[Multi-robot Rigid Formation Navigation via Synchronous Motion and Discrete-time Communication-Control Optimization](https://www.arxiv.org/abs/2510.02624) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://www.arxiv.org/abs/2510.02624)

Expertise need not monopolize: [Action-Specialized Mixture of Experts for Vision-Language-Action Learning](https://arxiv.org/abs/2510.14300) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2510.14300)

See, Point, Fly: [A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation](https://arxiv.org/abs/2509.22653) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2509.22653)

Guide-LLM: [An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments](https://arxiv.org/abs/2410.20666) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.20666)

## Human-computer Interaction

Memoro: [Using Large Language Models to Realize a Concise Interface for Real-Time Memory Augmentation](https://arxiv.org/abs/2403.02135) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.02135)
[Exploring Large Language Model as an Interactive Sports Coach: Lessons from a Single-Subject Half Marathon Preparation](https://arxiv.org/abs/2509.26593) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2509.26593)

[TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication](https://arxiv.org/abs/2504.01708) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2504.01708)

## Resources