publications
publications by categories in reversed chronological order. generated by jekyll-scholar.
2025
- Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs. International Conference on Machine Learning (ICML), 2025
- Reinforcement Learning for Quantum Control under Physical Constraints. International Conference on Machine Learning (ICML), 2025
2024
- HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits. Association for Computational Linguistics (ACL), 2024
- Select to Perfect: Imitating desired behavior from large multi-agent data. In International Conference on Learning Representations (ICLR), 2024
- Illusory Attacks: Information-theoretic detectability matters in adversarial attacks. In International Conference on Learning Representations (ICLR), 2024
- Rethinking out-of-distribution detection for reinforcement learning: Advancing methods for evaluation and detection. arXiv preprint arXiv:2404.07099, 2024
2023
- Extracting Reward Functions from Diffusion Models. In Advances in Neural Information Processing Systems (NeurIPS), 2023
2022
- Learn what matters: cross-domain imitation learning with task-relevant embeddings. In Advances in Neural Information Processing Systems (NeurIPS), 2022
- Learning Altruistic Behaviours in Reinforcement Learning without External Rewards. International Conference on Learning Representations (ICLR), 2022