NOTES / IMPORTANT INSIGHTS

1] Exploding gradients (solved by gradient clipping)
2] Dying ReLU: no learning when the activation is stuck at 0 (solved by Leaky/Parametric ReLU)
3] Mean and variance of the activations are not 0 and 1 (partially solved by subtracting around 0.5 from the ReLU activation; better explained in the fastai videos)
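
A minimal PyTorch-style sketch of these three fixes (the clipping threshold, the PReLU unit, and the 0.5 shift are illustrative choices, not prescriptions from these notes):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ShiftedReLU(nn.Module):
    # ReLU shifted down by ~0.5 so the post-activation mean sits nearer 0 (fix 3).
    def forward(self, x):
        return F.relu(x) - 0.5

model = nn.Sequential(
    nn.Linear(10, 64), nn.PReLU(),      # learned negative slope avoids dead units (fix 2)
    nn.Linear(64, 64), ShiftedReLU(),   # re-centred activations (fix 3)
    nn.Linear(64, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = F.mse_loss(model(x), y)
loss.backward()
# Clip the global gradient norm before the optimizer step (fix 1).
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()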

5-23-2020
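
# Benchmark: train an XGBoost binary classifier on the UCI HIGGS dataset with
# tree_method='gpu_hist' vs 'hist' and compare wall-clock training time / test error.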

import csv
import numpy as np
import os.path
import pandas
import time
import xgboost as xgb
import sys
if sys.version_info[0] >= 3:
    from urllib.request import urlretrieve
else:
    from urllib import urlretrieve

data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz"
dmatrix_train_filename = "higgs_train.dmatrix"
dmatrix_test_filename = "higgs_test.dmatrix"
csv_filename = "HIGGS.csv.gz"
train_rows = 10500000
test_rows = 500000
num_round = 1000

plot = True

# return xgboost dmatrix
def load_higgs():
    if os.path.isfile(dmatrix_train_filename) \
            and os.path.isfile(dmatrix_test_filename):
        dtrain = xgb.DMatrix(dmatrix_train_filename)
        dtest = xgb.DMatrix(dmatrix_test_filename)
        if dtrain.num_row() == train_rows and dtest.num_row() == test_rows:
            print("Loading cached dmatrix...")
            return dtrain, dtest

    if not os.path.isfile(csv_filename):
        print("Downloading higgs file...")
        urlretrieve(data_url, csv_filename)

    df_higgs_train = pandas.read_csv(csv_filename, dtype=np.float32, 
                                     nrows=train_rows, header=None)
    dtrain = xgb.DMatrix(df_higgs_train.iloc[:, 1:29], df_higgs_train[0])
    dtrain.save_binary(dmatrix_train_filename)
    df_higgs_test = pandas.read_csv(csv_filename, dtype=np.float32, 
                                    skiprows=train_rows, nrows=test_rows, 
                                    header=None)
    dtest = xgb.DMatrix(df_higgs_test.iloc[:, 1:29], df_higgs_test[0])
    dtest.save_binary(dmatrix_test_filename)

    return dtrain, dtest


dtrain, dtest = load_higgs()
param = {}
param['objective'] = 'binary:logitraw'
param['eval_metric'] = 'error'
param['tree_method'] = 'gpu_hist'
param['silent'] = 1

print("Training with GPU ...")
tmp = time.time()
gpu_res = {}
xgb.train(param, dtrain, num_round, evals=[(dtest, "test")], 
          evals_result=gpu_res)
gpu_time = time.time() - tmp
print("GPU Training Time: %s seconds" % (str(gpu_time)))

print("Training with CPU ...")
param['tree_method'] = 'hist'
tmp = time.time()
cpu_res = {}
xgb.train(param, dtrain, num_round, evals=[(dtest, "test")], 
          evals_result=cpu_res)
cpu_time = time.time() - tmp
print("CPU Training Time: %s seconds" % (str(cpu_time)))

if plot:
    import matplotlib.pyplot as plt
    min_error = min(min(gpu_res["test"][param['eval_metric']]), 
                    min(cpu_res["test"][param['eval_metric']]))
    gpu_iteration_time = [x / (num_round * 1.0) * gpu_time
                          for x in range(0, num_round)]
    cpu_iteration_time = [x / (num_round * 1.0) * cpu_time
                          for x in range(0, num_round)]
    plt.plot(gpu_iteration_time, gpu_res['test'][param['eval_metric']], 
             label='Tesla P100')
    plt.plot(cpu_iteration_time, cpu_res['test'][param['eval_metric']], 
             label='2x Haswell E5-2698 v3 (32 cores)')
    plt.legend()
    plt.xlabel('Time (s)')
    plt.ylabel('Test error')
    plt.axhline(y=min_error, color='r', linestyle='dashed')
    plt.margins(x=0)
    plt.ylim((0.23, 0.35))
    plt.show()

5-22-2020

5-16-2020

4-22-2020

4-4-2020

3-21-2020

Basic recap: k-NN, Naive Bayes, SVM, Decision Forests, Data Mining, Clustering, and Classification

3-20-2020

3-17-2020

Flow-GANs

Flow-GAN: Combining Maximum Likelihood and Adversarial Learning

Variational Autoencoders

- Tutorial on Variational Autoencoders

Read up on BERT

Bidirectional Encoder Representations from Transformers - NLP Pre-training.

XGBoost

eXtreme Gradient Boosting - has recently given some of the best results on problems involving structured data.

Gradient Boosting

Miscellany

Reinforcement Learning

The agent learns from the environment and receives rewards or penalties as the result of its actions. Its objective is to devise a policy that maximizes cumulative reward, which makes it different from both supervised and unsupervised learning. Classical formulations are based on Markov Decision Processes, but model-free methods such as Q-Learning often perform better, especially on complex tasks.
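
A minimal tabular Q-learning sketch (the environment object env with reset()/step() methods, and all hyperparameter values, are hypothetical placeholders):

import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    # Q[s, a] estimates the cumulative discounted reward of taking action a in state s.
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy: mostly exploit the current estimates, sometimes explore.
            if np.random.rand() < epsilon:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = env.step(action)
            # Model-free update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
            Q[state, action] += alpha * (
                reward + gamma * np.max(Q[next_state]) - Q[state, action])
            state = next_state
    return Q  # greedy policy: pi(s) = argmax_a Q[s, a]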

Transfer Learning

Use a model trained on one problem to do predictive modelling on another problem. For instance, say you have an image classification task: you can start from a pretrained VGG model, conveniently provided by Oxford's Visual Geometry Group. You would typically need to replace the last few layers for your task; further changes require hypothesis testing / domain knowledge.

Transfer learning is especially valuable for supervised learning tasks that would otherwise require a significantly large labeled dataset to tackle successfully.
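
A minimal Keras sketch of the frozen-base / new-head pattern (the input size, head width, and 10-class output are placeholder assumptions):

import tensorflow as tf

# Pretrained VGG16 convolutional base (ImageNet weights), without its classifier head.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False  # freeze the transferred layers

# Replace the "last few layers" with a new head for the target task.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # placeholder: 10 target classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])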

Visualization

Regularization Techniques

Regularization is used to reduce overfitting, typically by penalizing model complexity (e.g., L1/L2 weight penalties, dropout, early stopping).
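
For instance, XGBoost (used above) exposes L1/L2 penalties on leaf weights; the specific values below are illustrative only:

import xgboost as xgb

# 'lambda' is an L2 penalty and 'alpha' an L1 penalty on leaf weights;
# limiting tree depth and child weight also acts as regularization.
reg_param = {
    'objective': 'binary:logitraw',
    'lambda': 1.0,            # L2 regularization on leaf weights
    'alpha': 0.1,             # L1 regularization on leaf weights
    'max_depth': 6,           # limit tree depth
    'min_child_weight': 1,    # minimum sum of instance weights in a leaf
}
# bst = xgb.train(reg_param, dtrain, num_boost_round=100)  # dtrain as defined above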

Probabilistic Graphical Models

Stochastic Gradient Descent

CNNs

Convergence

Expectation Maximization