We tested the following hyperparameter space:

Parameter	From	To	Steps
Loss Function	BPR-MAX	TOP1-MAX	-
Final Activation Function	ELU-0.5	Linear	-
Learning Rate	0.1	0.01	10
Learning Rate	0.5	0.1	5
Momentum	0.00	0.90	0.10
Drop-Out	0.00	0.90	0.10
Constrained Embedding	True	False	-
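
One way to read the From/To/Steps rows is as evenly spaced candidate values. The sketch below makes two assumptions that the tables do not state: a whole-number Steps entry is taken as a count of values (Learning Rate, read as two ranges: 0.1 to 0.01 in 10 steps and 0.5 to 0.1 in 5 steps), and a fractional Steps entry is taken as a step size (Momentum, Drop-Out). The helper names `by_count` and `by_step_size` are hypothetical and do not come from the evaluated implementation.

```python
def by_count(start, stop, n):
    """Return n evenly spaced values from start to stop, both inclusive."""
    if n == 1:
        return [start]
    step = (stop - start) / (n - 1)
    # Round to suppress floating-point noise such as 0.30000000000000004.
    return [round(start + i * step, 10) for i in range(n)]


def by_step_size(start, stop, step):
    """Return values from start to stop (inclusive) spaced by a fixed step."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 10) for i in range(n)]


# Assumed reading: two learning-rate ranges, merged and de-duplicated.
learning_rates = sorted(set(by_count(0.1, 0.01, 10) + by_count(0.5, 0.1, 5)))

# Assumed reading: Steps = 0.10 is a step size for Momentum and Drop-Out.
momentum = by_step_size(0.0, 0.9, 0.1)
dropout = by_step_size(0.0, 0.9, 0.1)

print(learning_rates)
print(momentum)
```

Under this reading, the selected learning rates reported for the datasets (0.09, 0.05, 0.02) all fall on the generated grid, which supports the interpretation but does not confirm it.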

Dataset	Loss Function	Final Activation Function	Learning Rate	Momentum	Drop-Out	Constrained Embedding
RSC15/4	BPR-MAX	Linear	0.09	0.0	0.1	True
RSC15/64	BPR-MAX	Linear	0.05	0.3	0.1	False
DIGINETICA	BPR-MAX	ELU-0.5	0.02	0.5	0.1	True
DIGINETICA (STAMP)	BPR-MAX	ELU-0.5	0.05	0.1	0.2	True

We tested the following hyperparameter space:

Parameter	From	To	Steps
Number of Epochs	10	30	10
Decay Rate	0.0	0.9	10
Initial Learning Rate	0.001	0.01	10
Initial Learning Rate	0.0001	0.001	10

Dataset	Number of Epochs	Decay Rate	Initial Learning Rate
RSC15/4	20	0.2	0.0002
RSC15/64	30	0.4	0.0004
DIGINETICA	10	0.3	0.0007
DIGINETICA (STAMP)	30	0.9	0.0004

We tested the following hyperparameter space:

Parameter	From	To	Steps
Learning Rate	0.1	0.01	10
Learning Rate	0.5	0.1	5

We tested the following hyperparameter space:

Parameter	From	To	Steps
Learning Rate	0.01	0.001	10
Learning Rate	0.001	0.0001	5
Iterations	10	30	10
Negative Sampling	True	False	-

Dataset	Learning Rate	Iterations	Negative Sampling
RSC15/64	0.001	10	False

We tested the following hyperparameter space:

Parameter	From	To	Steps	Options
Steps	1	20	1	-
Weighting	-	-	-	Div, Linear, Quadratic, Log, Same

We tested the following hyperparameter space:

Dataset	Number of Neighbors	Sample Size	Similarity
RSC15/4	500	500	Jaccard
RSC15/64	500	1000	Cosine
DIGINETICA	50	500	Cosine
DIGINETICA (STAMP)	100	500	Cosine
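
The Similarity column distinguishes Jaccard from Cosine. As a minimal sketch of how the two measures differ, the code below assumes that each session is reduced to a binary set of item IDs; the tables do not specify the representation, and the function names are illustrative only.

```python
from math import sqrt


def jaccard(a, b):
    """Shared items divided by the size of the union of both sessions."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(a, b):
    """Cosine similarity of two binary item vectors, written with sets."""
    a, b = set(a), set(b)
    denom = sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


current = [1, 2, 3]    # hypothetical item IDs of the current session
neighbor = [2, 3, 4]   # hypothetical item IDs of a past session

print(jaccard(current, neighbor))  # 0.5
print(cosine(current, neighbor))
```

For the same overlap, Jaccard penalizes items that occur in only one of the two sessions more strongly than Cosine does, which may be one reason the preferred measure differs between datasets.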

We tested the following hyperparameter space:

Parameter	Options
Number of Neighbors	50, 100, 500, 1000, 1500
Sample Size	500, 1000, 2500, 5000, 10000
Weighting	Same, Div, Linear, Quadratic, Log
Weighting Score	Same, Div, Linear, Quadratic, Log
IDF Weighting	False, 1, 2, 5, 10
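
To give a sense of the size of this option space, the sketch below enumerates every combination of the listed values. Whether the search was exhaustive or sampled is not stated in the source, and the dictionary keys are hypothetical names chosen for illustration.

```python
from itertools import product

# Option lists copied from the table above; key names are illustrative.
space = {
    "neighbors": [50, 100, 500, 1000, 1500],
    "sample_size": [500, 1000, 2500, 5000, 10000],
    "weighting": ["same", "div", "linear", "quadratic", "log"],
    "weighting_score": ["same", "div", "linear", "quadratic", "log"],
    "idf_weighting": [False, 1, 2, 5, 10],
}

# One dict per configuration: the Cartesian product of all option lists.
configs = [dict(zip(space, values)) for values in product(*space.values())]

print(len(configs))  # 3125 (five parameters with five options each)
print(configs[0])
```

A full grid of this size would be expensive to evaluate per dataset, so a random subset of `configs` is a common alternative.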

Dataset	Number of Neighbors	Sample Size	Weighting	Weighting Score	IDF Weighting
RSC15/4	1000	1000	Log	Quadratic	5
RSC15/64	1000	5000	Log	Quadratic	2
DIGINETICA	500	5000	Quadratic	Div	10
DIGINETICA (STAMP)	500	10000	Quadratic	Quadratic	10

We tested the following hyperparameter space:

Parameter	Options
Expert	StdExpert, DirichletExpert
Max Considered Context Length	5, 10, 20, 30, 40, 50, 75
Number of Recent Candidates (Only for Adaptive Configuration)	5, 10, 20, 30, 40, 50, 75

Dataset	Expert	Max Considered Context Length	Number of Recent Candidates
RSC15/4, RSC15/64, DIGINETICA, DIGINETICA (STAMP)	StdExpert	50	1000

We tested the following hyperparameter space:

Parameter	From	To	Steps
Learning Rate	0.01	0.001	10
Learning Rate	0.001	0.0001	10
L2 Penalty	0.0001	0.00001	10
L2 Penalty	0.00001	0.000001	10
Decay Rate	0.1	0.9	10
Decay Rate Step	3	7	2

Dataset	Learning Rate	L2 Penalty	Decay Rate	Decay Rate Step
RSC15/4
RSC15/64	0.008	0.0001	0.45	3
DIGINETICA	0.0002	0.00007	0.1	5
DIGINETICA (STAMP)