To be more explicit: I get different best estimators on different runs over the same data set. I vary C over 2^-5, 2^-3, ..., 2^15, as suggested in the libSVM guide. The best accuracy across all runs is at C = 2^5, but why should the selected value of C vary between runs?
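
As a rough sketch, the C grid described above can be written out in plain Python as follows. The exponent range (-5 to 15 in steps of 2) is the one stated above; the names `c_exponents` and `c_grid` are made up for illustration and are not taken from the script quoted below.

```python
# Exponents -5, -3, ..., 15 (step of 2), as described in the text.
c_exponents = list(range(-5, 16, 2))

# Corresponding candidate values of C: 2^-5, 2^-3, ..., 2^15.
c_grid = [2.0 ** e for e in c_exponents]

print(len(c_grid))            # number of candidate values
print(c_grid[0], c_grid[-1])  # smallest and largest C
```

The value C = 2^5 = 32 is one of the eleven candidates in this grid.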

One thing that immediately comes to mind is the highlighted line. Does it create a different train/test split on each run, and is that the cause of the difference? I may be wrong; I'm just thinking out loud here.
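
One way to test that hypothesis is to build the split twice and compare the resulting index lists. The sketch below is a stand-in written in plain Python, not the scikits.learn `StratifiedKFold` implementation; the helper `stratified_two_way_split` and its `seed` argument are hypothetical and exist only to illustrate the check.

```python
import random


def stratified_two_way_split(labels, seed=None):
    """Split indices into two halves with roughly equal class proportions.

    With seed=None the indices keep their original order, so the split is
    deterministic. With an integer seed they are shuffled within each class
    first, so the split depends on the seed.
    """
    rng = random.Random(seed) if seed is not None else None

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for label in sorted(by_class):
        indices = list(by_class[label])
        if rng is not None:
            rng.shuffle(indices)
        # Alternate the indices of each class between the two halves.
        train.extend(indices[0::2])
        test.extend(indices[1::2])
    return sorted(train), sorted(test)


labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

# Build the split twice, as two separate runs would.
first = stratified_two_way_split(labels)
second = stratified_two_way_split(labels)
print(first == second)  # True: no shuffling, so both runs agree
```

The same comparison can be applied to the `train` and `test` arrays in the script below by saving them from two runs. If they turn out identical, the variation in C has to come from somewhere other than the split.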

On Wed, May 18, 2011 at 9:19 AM, Denzil Correa wrote:
Is it okay to get different values of C on different grid searches?

X = sparse_feats
Y = target_labels

folds = StratifiedKFold(Y, cross_fold, indices=True)
train, test = iter(StratifiedKFold(Y, 2, indices = True)).next()

# Generate grid search values for C, gamma
C_val = 2. ** np.arange(C_start, C_end + C_step, C_step)
gamma_val = 2. ** np.arange(gamma_start, gamma_end + gamma_step, gamma_step)

print C_val
print gamma_val

grid_clf = svm.sparse.LinearSVC()

print grid_clf

linear_SVC_params = {'C': C_val}

grid_search = GridSearchCV(grid_clf , linear_SVC_params, n_jobs = 10, iid = False, score_func = f1_score)

grid_search.fit(X[train], Y[train], cv=StratifiedKFold(Y[train], 10))
y_true, y_pred = Y[test], grid_search.predict(X[test])

print grid_search.best_estimator
print "Best score: %0.3f" % grid_search.best_score

print "Best parameters set:"
best_parameters = grid_search.best_estimator._get_params()
for param_name in sorted(linear_SVC_params.keys()):
    print "\t%s: %r" % (param_name, best_parameters[param_name])

clf = svm.sparse.LinearSVC(C = best_parameters['C'])

I get a different C on each grid search. Is this normal?

--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
