I've encountered problems with Level2 not accurately scoring bags of different lengths. I'm including a patch which solved the problem for me.
The problem presented itself when Level2.score(s, t) was called with size(s) < size(t). For example:
s = {'Frances', 'Fyfe'}
t = {'Mary', 'Frances', 'Fyfe'}
Level2.score(s, t) -> 1.0
Level2.score(t, s) -> 0.83
I believe the cause is that the algorithm always iterates over s, so a token that appears only in t (here, 'Mary') never lowers the score. In my opinion, the algorithm should instead iterate over the larger of the two bags.
I'm including a patch which does just that. I hope this is helpful.
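For anyone who wants to see the asymmetry without applying the patch, here is a minimal Python sketch of the idea. It is not the library's code: the function names are hypothetical, and difflib's SequenceMatcher ratio stands in for whatever inner token similarity Level2 is actually configured with, so the absolute numbers will differ from the 0.83 above.

```python
from difflib import SequenceMatcher


def token_sim(a, b):
    """Stand-in token similarity in [0, 1] (assumption, not the real inner metric)."""
    return SequenceMatcher(None, a, b).ratio()


def level2_one_sided(s, t, sim=token_sim):
    """Average, over the tokens of s only, of each token's best match in t."""
    if not s or not t:
        return 0.0
    return sum(max(sim(a, b) for b in t) for a in s) / len(s)


def level2_larger_first(s, t, sim=token_sim):
    """Same average, but always taken over the larger bag,
    so extra unmatched tokens pull the score down in either call order."""
    if len(s) < len(t):
        s, t = t, s
    return level2_one_sided(s, t, sim)


s = ["Frances", "Fyfe"]
t = ["Mary", "Frances", "Fyfe"]

print(level2_one_sided(s, t))     # 'Mary' is never visited
print(level2_one_sided(t, s))     # 'Mary' is visited and lowers the average
print(level2_larger_first(s, t))  # same value as the call below
print(level2_larger_first(t, s))
```

With the one-sided version, the result depends on argument order; with the larger-first version, both call orders give the same score.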
patch to Level2