Exam Preparation

Expected Exam Questions

  1. one logistic regression question

  2. one k-means clustering question

  3. one hierarchical clustering question

  4. one silhouette index calculation question

Important things:

for logistic regression:

  1. how to calculate and interpret the odds

  2. how to compare and interpret the probability
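A small sketch to practice both points. The coefficients `b0` and `b1` and the x values are made up for illustration, not taken from any lecture example. It assumes the usual model form log-odds = b0 + b1 * x.

```python
import math

def probability(z):
    """Logistic function: turns log-odds z into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds = probability of the event / probability of no event."""
    return p / (1.0 - p)

# Hypothetical fitted coefficients (assumption, for illustration only)
b0, b1 = -1.0, 0.5

p_low = probability(b0 + b1 * 2)   # x = 2 gives log-odds 0, so p = 0.5
p_high = probability(b0 + b1 * 3)  # x = 3 gives log-odds 0.5

# Increasing x by one unit multiplies the odds by exp(b1)
odds_ratio = odds(p_high) / odds(p_low)

print(p_low, p_high)
print(odds(p_low), odds(p_high))
print(odds_ratio, math.exp(b1))
```

Interpretation: odds above 1 mean the event is more likely than not. exp(b1) is the odds ratio for a one-unit increase in x, while the change in probability depends on where you start.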

for clustering:

ways of calculating the distances:

  1. squared Euclidean

  2. Manhattan

  3. Euclidean

He said that the exam will ask for either 1 or 2.
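A minimal sketch of the three distances, to check hand calculations. The function names and the two sample points are my own choice.

```python
def squared_euclidean(p, q):
    """Sum of squared coordinate differences (no square root)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    """Square root of the squared Euclidean distance."""
    return squared_euclidean(p, q) ** 0.5

p, q = (1, 2), (4, 6)   # made-up sample points
print(squared_euclidean(p, q))  # 25
print(manhattan(p, q))          # 7
print(euclidean(p, q))          # 5.0
```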

ways of calculating linkage:

  1. single (take the minimum distance)

  2. complete (take the maximum distance)

  3. average (take the average distance)

  4. centroid (take the distance between the cluster centers)

I guess it will be either 1 or 2, since the others are more computationally heavy.
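The four linkage rules can be sketched as below, for the distance between two clusters A and B. Manhattan distance is used only as an example, and the clusters are made-up points.

```python
from itertools import product

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def single_linkage(A, B, dist):
    # minimum over all pairs (one point from A, one from B)
    return min(dist(p, q) for p, q in product(A, B))

def complete_linkage(A, B, dist):
    # maximum over all pairs
    return max(dist(p, q) for p, q in product(A, B))

def average_linkage(A, B, dist):
    # mean over all pairs
    return sum(dist(p, q) for p, q in product(A, B)) / (len(A) * len(B))

def centroid_linkage(A, B, dist):
    # distance between the two cluster centers
    return dist(centroid(A), centroid(B))

A = [(0, 0), (0, 2)]   # made-up clusters
B = [(3, 0), (5, 2)]
print(single_linkage(A, B, manhattan))    # 3
print(complete_linkage(A, B, manhattan))  # 7
print(average_linkage(A, B, manhattan))   # 5.0
print(centroid_linkage(A, B, manhattan))  # 4.0
```

In each merge step of hierarchical clustering, the two clusters with the smallest linkage distance are joined.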

process of calculating the silhouette:

  1. calculate a1 (average distance from point 1 to the other points in its own cluster)

  2. calculate b1 (average distance from point 1 to the points of each other cluster, then pick the minimum of these averages)

  3. calculate s1 = (b1 - a1) / max(a1, b1)

  4. calculate s_k (mean of all s_i in cluster k)

  5. calculate the index (mean of s_k over all clusters)

An s value close to 1 is better (s ranges from -1 to 1).
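The silhouette steps can be sketched as below, to verify a hand calculation. Assumptions of mine: Euclidean distance, at least two clusters, no duplicate points, and s = 0 for a point that is alone in its cluster (a common convention, not something from the lecture). The sample data is made up.

```python
def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def silhouette_index(clusters, dist=euclidean):
    """Return (index, list of s_k) for a list of clusters of points."""
    cluster_scores = []
    for k, cluster in enumerate(clusters):
        point_scores = []
        for i, p in enumerate(cluster):
            own = [q for j, q in enumerate(cluster) if j != i]
            if not own:
                point_scores.append(0.0)  # assumed convention for a singleton
                continue
            # a: mean distance to the other points of the same cluster
            a = mean(dist(p, q) for q in own)
            # b: smallest mean distance to the points of any other cluster
            b = min(mean(dist(p, q) for q in other)
                    for m, other in enumerate(clusters) if m != k)
            point_scores.append((b - a) / max(a, b))
        cluster_scores.append(mean(point_scores))  # s_k
    return mean(cluster_scores), cluster_scores

clusters = [[(0,), (2,)], [(10,), (12,)]]  # made-up 1-D data
index, s_k = silhouette_index(clusters)
print(index)  # about 0.798
print(s_k)
```

For the point (0,): a = 2, b = (10 + 12) / 2 = 11, so s = 9/11. For (2,): a = 2, b = 9, so s = 7/9. Both clusters are symmetric here, so the index is (9/11 + 7/9) / 2 = 79/99.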