TY - JOUR
T1 - Spike-Driven Lightweight Large Language Model With Evolutionary Computation
AU - Zhang, Malu
AU - Wei, Wenjie
AU - Zhou, Zijian
AU - Liu, Wanlong
AU - Zhang, Jie
AU - Belatreche, Ammar
AU - Yang, Yang
PY - 2025/9/5
Y1 - 2025/9/5
N2 - Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, but their deployment in resource-constrained environments remains challenging due to substantial memory and computational requirements. Benefiting from the sparse, event-driven computation paradigm of Spiking Neural Networks (SNNs), some research has focused on designing spike-based language models. However, existing spike-based language models achieve only partial computational efficiency gains and fail to address memory constraints comprehensively. In this paper, we propose an evolved and quantized spike-driven language model (EQ-SpikeLM) to address these challenges. This model incorporates two primary innovations. First, inspired by the artificial bee colony algorithm in evolutionary computation, we propose an architecture evolution method, namely ABC-Arc, which optimizes network topology by systematically removing redundant neural pathways. Second, a dynamic post-training quantization (DynPTQ) strategy is developed for the evolved SpikeLM, enabling the conversion of floating-point parameters to lower-bit precision without requiring model retraining. By combining these two methods, EQ-SpikeLM significantly reduces storage and computational demands while preserving model performance. Experimental evaluation on the GLUE benchmark demonstrates EQ-SpikeLM's ability to maintain performance equivalent to its uncompressed counterpart, with a substantial reduction in both model size and power consumption. These results position EQ-SpikeLM as a viable approach for deploying large language models in resource-constrained edge computing scenarios.
KW - Evolutionary computation
KW - Large language model
KW - Model compression
KW - Spiking neural networks
UR - https://www.scopus.com/pages/publications/105015334789
U2 - 10.1109/TEVC.2025.3606613
DO - 10.1109/TEVC.2025.3606613
M3 - Article
AN - SCOPUS:105015334789
SN - 1089-778X
JO - IEEE Transactions on Evolutionary Computation
JF - IEEE Transactions on Evolutionary Computation
ER -