Information-gathering in latent bandits
Galozy, Alexander. Halmstad University, School of Information Technology. ORCID iD: 0000-0002-7453-9186
Nowaczyk, Sławomir. Halmstad University, School of Information Technology. ORCID iD: 0000-0002-7796-5201
2023 (English). In: Knowledge-Based Systems, ISSN 0950-7051, E-ISSN 1872-7409, Vol. 260, article id 110099. Article in journal (Refereed). Published.
Abstract [en]

In the latent bandit problem, the learner has access to reward distributions and, for the non-stationary variant, transition models of the environment. The reward distributions are conditioned on the arm and unknown latent states. The goal is to use the reward history to identify the latent state, allowing for the optimal choice of arms in the future. The latent bandit setting lends itself to many practical applications, such as recommender and decision support systems, where rich data allows the offline estimation of environment models while online learning remains a critical component. Previous solutions in this setting always choose the highest-reward arm according to the agent's beliefs about the state, without explicitly considering the value of information-gathering arms. Such arms do not necessarily provide the highest reward and may therefore never be chosen by an agent that always selects the highest-reward arm.

In this paper, we present a method for information-gathering in latent bandits. Given particular reward structures and transition matrices, we show that choosing the best arm given the agent's beliefs about the states incurs higher regret. Furthermore, we show that by choosing arms carefully, we obtain an improved estimation of the state distribution, and thus lower the cumulative regret through better arm choices in the future. Through theoretical analysis, we show that the proposed method retains the sub-linear regret rate of previous methods while having much better problem-dependent constants. We evaluate our method on both synthetic and real-world data sets, showing significant improvement in regret over state-of-the-art methods. © 2022 The Author(s).
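To make the setting concrete, the belief-tracking loop the abstract describes can be sketched as a Bayesian update over latent states. This is a minimal illustration under assumed Gaussian rewards with a hypothetical mean-reward table, not the paper's actual algorithm; it shows only the "always pick the highest-reward arm under the current belief" baseline that the paper improves upon.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 latent states, 4 arms. The mean-reward table
# mu[s, a] is assumed known (offline-estimated); rewards are Gaussian.
mu = np.array([
    [1.0, 0.2, 0.5, 0.1],
    [0.2, 1.0, 0.5, 0.1],
    [0.3, 0.3, 0.5, 0.9],
])
sigma = 0.5  # assumed known reward noise
n_states, n_arms = mu.shape

def update_belief(belief, arm, reward):
    """Posterior over the latent state after one observed reward."""
    lik = np.exp(-0.5 * ((reward - mu[:, arm]) / sigma) ** 2)
    post = belief * lik
    return post / post.sum()

def greedy_arm(belief):
    """Baseline policy: arm with highest expected reward under the belief."""
    return int(np.argmax(belief @ mu))

# Simulate: the true latent state is 1; start from a uniform belief.
true_state = 1
belief = np.full(n_states, 1.0 / n_states)
for _ in range(50):
    arm = greedy_arm(belief)
    reward = mu[true_state, arm] + sigma * rng.standard_normal()
    belief = update_belief(belief, arm, reward)

print(belief)  # the belief should concentrate on the true state
```

An information-gathering policy, as proposed in the paper, would sometimes deviate from `greedy_arm` to pull arms whose reward distributions best separate the candidate latent states, sharpening the belief faster at a small short-term cost.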

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2023. Vol. 260, article id 110099
Keywords [en]
Latent bandits, Information gathering, Non-stationary, Information directed sampling
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hh:diva-49833
DOI: 10.1016/j.knosys.2022.110099
Scopus ID: 2-s2.0-85143522327
OAI: oai:DiVA.org:hh-49833
DiVA, id: diva2:1727614
Funder
Vinnova, 2017-04617
Available from: 2023-01-16 Created: 2023-01-16 Last updated: 2023-11-29
Bibliographically approved
In thesis
1. Mobile Health Interventions through Reinforcement Learning
2023 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

This thesis presents work conducted in the domain of sequential decision-making in general and bandit problems in particular, tackling challenges from both a practical and a theoretical perspective, framed in the context of mobile health. The early stages of this work were conducted within the project "improving Medication Adherence through Person-Centred Care and Adaptive Interventions" (iMedA), which aims to provide personalized adaptive interventions to hypertensive patients, supporting them in managing their medication regimen. The focus lies on inadequate medication adherence (MA), a pervasive issue where patients do not take their medication as instructed by their physician. The selection of individuals for intervention through secondary database analysis on Electronic Health Records (EHRs) was a key challenge and is addressed through in-depth analysis of common adherence measures, development of prediction models for MA, and discussions of the limitations of such approaches for analyzing MA. Providing personalized adaptive interventions is framed in several bandit settings and addresses the challenge of delivering relevant interventions in environments where contextual information is unreliable and noisy. Furthermore, the need for good initial policies is explored in the latent bandit setting, utilizing previously collected data to optimally select the best intervention at every decision point. As the concluding work, this thesis elaborates on the need for privacy and explores different privatization techniques in the form of noise-additive strategies, using a realistic recommendation scenario.

The contributions of the thesis can be summarised as follows: (1) highlighting the issues encountered in measuring MA through secondary database analysis and providing recommendations to address them; (2) investigating machine learning models developed on EHRs for MA prediction and extracting common refilling patterns from EHRs; (3) formally defining a novel contextual bandit setting with context uncertainty, commonly encountered in mobile health, and developing an algorithm designed for such environments; (4) equipping the agent with information-gathering capabilities for active action selection in the latent bandit setting; and (5) exploring important privacy aspects using a realistic recommender scenario.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2023. p. 56
Series
Halmstad University Dissertations ; 102
National Category
Computer Sciences
Research subject
Health Innovation, IDC
Identifiers
urn:nbn:se:hh:diva-52139 (URN)
978-91-89587-17-5 (ISBN)
978-91-89587-16-8 (ISBN)
Public defence
2023-12-15, S1002, Kristian IV:s väg 3, Halmstad, 13:00 (English)
Available from: 2023-11-29 Created: 2023-11-29 Last updated: 2024-01-03
Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Galozy, Alexander; Nowaczyk, Sławomir
