Assistant Professor. Before joining UCF, I worked as a Senior Research Scientist at Samsung Research AI Center. I obtained my Ph.D. and M.S. degrees from the Luddy School at Indiana University Bloomington. My research interests lie in general machine learning, deep learning, and efficient/private systems. I am particularly interested in improving the efficiency, privacy, and security of deep learning systems, making deep learning more accessible to the general community, and advancing interdisciplinary research on computer vision, natural language processing, and science tasks by designing novel algorithms, models, and systems. Publications | Research Group | Teaching | CV | Google Scholar
Prospective BS/MS/PhD students: We are recruiting highly motivated PhD students to join our lab starting Spring 2024. Underrepresented minority applicants are encouraged to apply. Please email me at qian.lou@ucf.edu if you are interested. After the interview, please apply through the CS department and include my name as a possible advisor in your application. Have questions? Check out my answers in the PhD Advisor Guide.
02 / 2023 | TrojViT: Trojan Insertion in Vision Transformers is accepted by CVPR 2023. |
02 / 2023 | Primer: Privacy-preserving Transformer on Encrypted Data is accepted by DAC 2023. |
01 / 2023 | TrojText: Test-time Invisible Textual Trojan Insertion is accepted by ICLR 2023. |
10 / 2022 | Weighted value decomposition on language model is accepted by EMNLP 2022. |
09 / 2022 | LITE-MDETR won the silver award in Samsung's best paper evaluation. |
03 / 2022 | LITE-MDETR is accepted by CVPR 2022. |
02 / 2022 | MATCHA is accepted by DAC 2022. |
01 / 2022 | Language Model Compression is accepted by ICLR 2022. |
01 / 2022 | DictFormer is accepted by ICLR 2022. |
12 / 2021 | SAFENet won the Samsung Research America Q4 best paper award. |
11 / 2021 | coxHE is accepted by DATE 2022. |
08 / 2021 | CryptoGRU is accepted by EMNLP 2021. |
05 / 2021 | HEMET is accepted by ICML 2021. |
05 / 2021 | Qian received a Luddy Outstanding Research Award. |
09 / 2020 | Three papers were accepted by NeurIPS 2020. |
Jiaqi Xue | Ardhi Yudha (collaboration) | Mansour Al Ghanim (collaboration) | Nicolas Gonzalez
NeurIPS: Reviewer
ICML: Reviewer
ICLR: Reviewer
AAAI: Senior Program Committee
CVPR: Reviewer
ECCV: Reviewer
2022 Fall: CAP 5106 Computer Architecture
2023 Spring: CAP 6614 Current Topics in Machine Learning
TrojViT: Trojan Insertion in Vision Transformers Directly transplanting CNN-specific backdoor attacks to ViTs yields only a low clean-data accuracy and a low attack success rate. In this paper, we propose a stealthy and practical ViT-specific backdoor attack, TrojViT. Rather than the area-wise trigger used by CNN-specific backdoor attacks, TrojViT generates a patch-wise trigger, designed through patch salience ranking and an attention-target loss, to build a Trojan composed of a few vulnerable bits in the parameters of a ViT stored in DRAM memory.
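The patch salience ranking step can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the gradient tensor, shapes, and function name here are hypothetical stand-ins; in practice the gradients would come from backpropagating the attack loss through a real ViT.

```python
import numpy as np

def rank_patches_by_salience(patch_grads, k=3):
    """Rank ViT input patches by salience and return the top-k indices.

    patch_grads: (num_patches, patch_dim) gradients of the attack loss
    w.r.t. each embedded patch (hypothetical values for illustration).
    """
    # Salience of a patch = L1 norm of its gradient: patches whose
    # perturbation moves the loss most are the best trigger locations.
    salience = np.abs(patch_grads).sum(axis=1)
    return np.argsort(salience)[::-1][:k]

# Toy example: 6 patches with 4-dimensional gradients.
rng = np.random.default_rng(0)
grads = rng.normal(size=(6, 4))
top = rank_patches_by_salience(grads, k=2)
```

Restricting the trigger to the top-ranked patches is what makes the attack patch-wise rather than area-wise.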
Audit and Improve Robustness of Private Neural Networks on Encrypted Data Performing neural network inference on encrypted data without decryption is a popular way to enable privacy-preserving neural networks (PNet) as a service. Compared with regular neural networks deployed for machine-learning-as-a-service, PNet requires additional encodings, e.g., quantized-precision numbers and polynomial activations. Encrypted input also introduces novel challenges for adversarial robustness and security. To the best of our knowledge, we are the first to study questions including (i) whether PNet is more robust against adversarial inputs than regular neural networks, and (ii) how to design a robust PNet given encrypted input without decryption.
Lite-MDETR: A Lightweight Multi-Modal Detector We present a lightweight modulated detector, Lite-MDETR, to facilitate efficient end-to-end multi-modal understanding on mobile devices. The key primitive is Dictionary-Lookup-Transformations (DLT), proposed to replace Linear Transformation (LT) in multi-modal detectors, where each weight in an LT is approximately factorized into a smaller dictionary, indices, and coefficients.
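The dictionary factorization above can be sketched in a few lines of NumPy. This is a simplified illustration, not the Lite-MDETR code; the shapes, atom count, and function name are hypothetical:

```python
import numpy as np

def dlt_reconstruct(dictionary, indices, coeffs):
    """Approximate a weight matrix from a small shared dictionary.

    dictionary: (n_atoms, d)  dictionary atoms
    indices:    (rows, t)     which atoms each weight row uses
    coeffs:     (rows, t)     per-row mixing coefficients
    Returns a (rows, d) approximation of the original weight matrix.
    """
    # Each row of W is rebuilt as a combination of t dictionary atoms,
    # so only the dictionary plus t indices and t coefficients per row
    # need to be stored instead of the full matrix.
    return np.einsum('rt,rtd->rd', coeffs, dictionary[indices])

rng = np.random.default_rng(1)
D = rng.normal(size=(8, 16))           # 8 atoms of dimension 16
idx = rng.integers(0, 8, size=(32, 2)) # each row picks 2 atoms
c = rng.normal(size=(32, 2))
W_approx = dlt_reconstruct(D, idx, c)
```

With 2 atoms per row, the storage drops from 32×16 weights to 8×16 dictionary entries plus 32×2 indices and coefficients.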
DictFormer: Tiny Transformer with Shared Dictionary We introduce DictFormer, with an efficient shared dictionary, to provide a compact, fast, and accurate transformer model. DictFormer significantly reduces redundancy in the transformer's parameters by replacing them with a compact shared dictionary, a few unshared coefficients, and indices. DictFormer also enables faster computation, since expensive weight multiplications are converted into cheap shared dictionary look-ups and a few linear projections.
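The computational saving can be illustrated with a minimal sketch, assuming a dense coefficient matrix for simplicity (the real model uses few, sparse coefficients, and the layer/shape names here are hypothetical):

```python
import numpy as np

def dict_linear(x, dictionary, coeffs):
    """Linear layer via a shared dictionary (DictFormer-style sketch).

    Instead of y = x @ W with a full (d_in, d_out) weight matrix,
    store a small dictionary D (d_in, n_atoms) shared across layers and
    per-layer coefficients C (n_atoms, d_out), so that W ~= D @ C.
    """
    x_dict = x @ dictionary  # shared look-ups: computable once, reusable
                             # by every layer that shares D
    return x_dict @ coeffs   # a few cheap per-layer linear projections

rng = np.random.default_rng(2)
x = rng.normal(size=(4, 64))
D = rng.normal(size=(64, 8))   # 8 shared dictionary atoms
C = rng.normal(size=(8, 32))   # small per-layer coefficients
y = dict_linear(x, D, C)
```

Because `x @ D` is shared, each additional layer only pays for the small `x_dict @ C` projection rather than a full 64×32 weight multiplication.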
SAFENet: A Secure, Accurate and Fast Neural Network Inference A cryptographic neural network inference service is an efficient way to allow two parties to execute neural network inference without revealing either party's data or model. Nevertheless, existing cryptographic neural network inference services suffer from enormous running latency; in particular, the latency of a communication-expensive cryptographic activation function is three orders of magnitude higher than that of a plaintext-domain activation function, and activations are necessary components of modern neural networks. Slow cryptographic activation has therefore become the primary obstacle to efficient cryptographic inference. In this paper, we propose a new technique, SAFENet, to enable a Secure, Accurate and Fast nEural Network inference service. To speed up secure inference and guarantee inference accuracy, SAFENet includes channel-wise activation approximation with multiple-degree options.
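The idea of polynomial activation approximation can be sketched as follows. This is only an illustration under stated assumptions: the coefficient values below are a common degree-2 ReLU surrogate on [-1, 1], not SAFENet's learned per-channel coefficients, and the function name is hypothetical:

```python
import numpy as np

def poly_activation(x, coeffs):
    """Replace ReLU with a low-degree polynomial (crypto evaluates + and *).

    x:      (batch, channels) pre-activations
    coeffs: (channels, 3) per-channel [a0, a1, a2] for a0 + a1*x + a2*x^2,
    so each channel can carry its own approximation degree/coefficients.
    """
    a0, a1, a2 = coeffs[:, 0], coeffs[:, 1], coeffs[:, 2]
    return a0 + a1 * x + a2 * x * x

x = np.array([[-1.0, -0.5, 0.0, 0.5, 1.0]])        # 5 channels, batch of 1
coeffs = np.tile([0.25, 0.5, 0.25], (5, 1))        # same surrogate per channel
y = poly_activation(x, coeffs)
```

The surrogate matches ReLU exactly at the interval endpoints (0 at x = -1, 1 at x = 1) while using only additions and multiplications, which cryptographic protocols support cheaply.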
HEMET: A Homomorphic-Encryption-Friendly Privacy-Preserving Mobile Neural Network Architecture Recently, Homomorphic Encryption (HE) has been used to implement Privacy-Preserving Neural Networks (PPNNs) that perform inference directly on encrypted data without decryption. Prior PPNNs adopt mobile network architectures such as SqueezeNet for smaller computing overhead, but we find that naïvely using mobile network architectures for a PPNN does not necessarily achieve shorter inference latency. Despite having fewer parameters, a mobile network architecture typically introduces more layers and increases the HE multiplicative depth of a PPNN, thereby prolonging its inference latency. In this paper, we propose an HE-friendly privacy-preserving Mobile neural nETwork architecture, HEMET.
AutoPrivacy: Automated Layer-wise Parameter Selection for Secure Neural Network Inference In this paper, for fast and accurate secure neural network inference, we propose an automated layer-wise parameter selector, AutoPrivacy, that leverages deep reinforcement learning to automatically determine a set of HE parameters for each linear layer in an HPPNN. The learning-based HE parameter selection policy outperforms conventional rule-based policies.
Falcon: Fast Spectral Inference on Encrypted Data In this paper, we propose Falcon, a frequency-domain deep neural network for fast inference on encrypted data. Falcon includes a fast Homomorphic Discrete Fourier Transform (HDFT) that uses block-circulant matrices to homomorphically support spectral operations. We also propose several efficient methods to reduce inference latency, including Homomorphic Spectral Convolution and Homomorphic Spectral Fully Connected operations that combine batched HE with block-circulant matrices.
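Why circulant structure makes spectral operations cheap can be seen in plaintext with a minimal sketch (this is standard DFT diagonalization of circulant matrices, not Falcon's homomorphic implementation; the values are illustrative):

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply a circulant matrix (defined by its first column c) with x.

    A circulant matrix is diagonalized by the DFT, so the matrix-vector
    product reduces to element-wise multiplication in the frequency
    domain -- the structural property that keeps spectral operations
    cheap when lifted into the encrypted domain.
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

c = np.array([1.0, 2.0, 3.0, 4.0])
x = np.array([0.5, -1.0, 2.0, 0.0])
y = circulant_matvec(c, x)

# Explicit circulant matrix for comparison: C[i, j] = c[(i - j) % n].
n = len(c)
C = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])
```

An n×n circulant matvec thus costs O(n log n) via transforms instead of O(n²) via the dense product `C @ x`.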
Glyph: Fast and Accurately Training Deep Neural Networks on Encrypted Data In this paper, we propose Glyph, an FHE-based technique to quickly and accurately train DNNs on encrypted data by switching between the TFHE (Fast Fully Homomorphic Encryption over the Torus) and BGV cryptosystems. Glyph uses the logic-operation-friendly TFHE to implement nonlinear activations, while adopting the vectorial-arithmetic-friendly BGV to perform multiply-accumulations (MACs). Glyph further applies transfer learning to DNN training to improve test accuracy and reduce the number of ciphertext-ciphertext MACs in convolutional layers.
AutoQ: Automated Kernel-Wise Neural Network Quantization It is difficult even for deep-reinforcement-learning (DRL) agents based on the Deep Deterministic Policy Gradient (DDPG) to find a kernel-wise QBN configuration that achieves reasonable inference accuracy. In this paper, we propose a hierarchical-DRL-based kernel-wise network quantization technique, AutoQ, to automatically search a QBN for each weight kernel and choose another QBN for each activation layer.
Helix: Algorithm/Architecture Co-design for Accelerating Nanopore Genome Base-calling We propose a novel algorithm/architecture co-designed PIM, Helix, to power-efficiently and accurately accelerate nanopore base-calling. From the algorithm perspective, we present systematic-error-aware training to minimize the number of systematic errors in a quantized base-caller. From the architecture perspective, we propose a low-power SOT-MRAM-based ADC array to process analog-to-digital conversion operations and improve the power efficiency of prior DNN PIMs. Moreover, we revise a traditional NVM-based dot-product engine to accelerate CTC decoding operations, and create a SOT-MRAM binary comparator array to process read voting.
SHE: A Fast and Accurate Deep Neural Network for Encrypted Data We propose a Shift-accumulation-based LHE-enabled deep neural network (SHE) for fast and accurate inference on encrypted data. We use the binary-operation-friendly Leveled Fast Homomorphic Encryption over the Torus (LTFHE) encryption scheme to implement ReLU activations and max poolings. We also adopt logarithmic quantization to accelerate inference by replacing expensive LTFHE multiplications with cheap LTFHE shifts, and we propose a mixed-bitwidth accumulator to accelerate accumulations. Since the LTFHE ReLU activations, max poolings, shifts, and accumulations have small multiplicative-depth overhead, SHE can implement much deeper network architectures with more convolutional and activation layers.
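The logarithmic-quantization trick can be shown in plaintext with a small sketch. Plain integers stand in for ciphertexts here, and the function names are hypothetical; the point is only that a power-of-two weight turns a multiplication into a shift:

```python
import numpy as np

def log_quantize(w):
    """Quantize weights to signed powers of two (logarithmic quantization)."""
    sign = np.sign(w)
    # Round each magnitude to the nearest power-of-two exponent.
    exp = np.round(np.log2(np.abs(w) + 1e-12)).astype(int)
    return sign, exp

def shift_multiply(x_int, sign, exp):
    """Multiply an integer input by power-of-two weights using shifts only.

    In the encrypted setting this is what lets an expensive ciphertext
    multiplication be replaced by a cheap shift.
    """
    shifted = np.where(exp >= 0,
                       x_int << exp.clip(min=0),
                       x_int >> (-exp).clip(min=0))
    return sign.astype(int) * shifted

sign, exp = log_quantize(np.array([4.0, -2.0, 1.0]))
y = shift_multiply(3, sign, exp)  # 3 * [4, -2, 1] via shifts
```

Each weight is stored as just a sign bit and a small exponent, and the multiply-accumulate degenerates to shift-accumulate.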
3DICT: A Reliable and QoS-Capable Mobile Process-in-Memory Architecture for Lookup-Based CNNs in 3D XPoint ReRAMs In this paper, we propose a 3D XPoint ReRAM-based process-in-memory architecture, 3DICT, to provide various test accuracies to applications with different priorities by lookup-based CNN tests that dynamically exploit the trade-off between test accuracy and latency.
Numerical Optimizations for Weighted Low-rank Estimation on Language Model
coxHE: A Software-Hardware Co-design Framework for FPGA Acceleration of Homomorphic Computation
MATCHA: A Fast and Energy-Efficient Accelerator for Fully Homomorphic Encryption over the Torus
Language Model Compression with Weighted Low-rank Factorization
CryptoGRU: Low Latency Privacy-Preserving Text Analysis With GRU
Automatic Mixed-Precision Quantization Search of BERT
LightBulb: A Photonic-Nonvolatile-Memory-based Accelerator for Binarized Convolutional Neural Networks
MindReading: An Ultra-Low-Power Photonic Accelerator for EEG-based Human Intention Recognition
HolyLight: A Nanophotonic Accelerator for Deep Learning in Data Centers
BRAWL: A Spintronics-Based Portable Basecalling-in-Memory Architecture for Nanopore Genome Sequencing
Runtime and Reconfiguration Dual-Aware Placement for SRAM-NVM Hybrid FPGAs