Hye Won Chung, KAIST

Research

Overview

The goal of my research is to provide a theoretical and algorithmic framework for information science that can lead to efficient strategies for assessing, gathering, extracting, and exploiting information. In the era of data deluge, we want to fully utilize the volume and richness of large data sets to efficiently infer the real-world phenomena behind the data. Information-theoretic concepts and tools are useful in data science, especially for establishing fundamental limits and exploring trade-offs in extracting information from data sets. To deal with new challenges that originate from practical concerns in engineering information processors for big data, we also need new techniques and concepts beyond the classical information-theoretic solutions.

My research focuses on developing a theoretical framework for data science that copes with practical concerns such as timeliness in decision making, efficient use of limited sensing resources, and computational efficiency in data processing. More specifically, I study questions such as: How can we design sensing strategies to acquire the most relevant observations for estimating an unknown target variable at the lowest cost? How can we quantify the value of information and develop strategies to extract the most valuable information given limited sensing resources? How can we design efficient information-recovery procedures from large amounts of noisy observations? How can we design distributed querying over a crowd of workers with unknown reliabilities to efficiently collect useful observations? I develop algorithms for these data acquisition and information recovery problems and provide performance guarantees for these algorithms by using tools from probability theory, information theory, and stochastic analysis.

Selected recent papers

Algorithms and Theory for Data Science and Machine Learning

Rethinking Self-Distillation: Label Averaging and Enhanced Soft Label Refinement with Partial Labels, ICLR 2025
Exact Matching in Correlated Networks with Node Attributes for Improved Community Recovery, IEEE Trans. Information Theory 2025
Rank-1 Matrix Completion with Gradient Descent and Small Random Initialization, NeurIPS 2023
Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing, ICML 2023
Efficient Algorithms for Exact Graph Matching on Correlated Stochastic Block Models with Constant Correlation, ICML 2023
A Worker-Task Specialization Model for Crowdsourcing: Efficient Inference and Fundamental Limits, IEEE Trans. Information Theory 2024
Binary Classification with XOR Queries: Fundamental Limits and An Efficient Algorithm, IEEE Trans. Information Theory 2021
Detection of Signal in the Spiked Rectangular Models, ICML 2021

Efficient Deep Learning, Robust and Trustworthy AI