[Memcacheddotnet-devel] RE: Memcached 1.0

Yes, I noticed the exceptions as well.  I've roughly narrowed it down
to the TCP connections timing out.  I would really like to squash that if I
could.

As time frees up, here are the three development tasks that I would like to
tackle:

-Clean up the API (the longer it goes on like this the more of a pain it
will be to change later)

-Fix those exceptions that occur when the sockets time out

-Make an App.config section handler so we can configure all of the options
for the client API in the App or Web.config file (I just thought of this one
today).
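
A minimal sketch of how such a config section might be parsed. Everything named here is hypothetical and not part of the client's current API: the `memcached` and `server` elements, the `initConnections`, `maxIdle` and `address` attributes, and the `MemcachedSettings` type. The parsing is written as a plain method over an `XmlNode` because that is what `IConfigurationSectionHandler.Create` receives as its `section` argument, so a handler could simply delegate to it.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical settings holder; none of these names come from the client.
public class MemcachedSettings
{
    public List<string> Servers = new List<string>();
    public int InitConnections = 3;          // illustrative default
    public int MaxIdleMilliseconds = 60000;  // illustrative default
}

public static class MemcachedSectionParser
{
    // Parses a section shaped like (assumed layout):
    //   <memcached initConnections="5" maxIdle="30000">
    //     <server address="10.0.0.1:11211" />
    //   </memcached>
    public static MemcachedSettings Parse(XmlNode section)
    {
        MemcachedSettings settings = new MemcachedSettings();

        // Optional attributes override the defaults when present.
        XmlAttribute init = section.Attributes["initConnections"];
        if (init != null) settings.InitConnections = int.Parse(init.Value);

        XmlAttribute maxIdle = section.Attributes["maxIdle"];
        if (maxIdle != null) settings.MaxIdleMilliseconds = int.Parse(maxIdle.Value);

        // Each <server> child contributes one "host:port" entry.
        foreach (XmlNode child in section.SelectNodes("server"))
        {
            XmlAttribute address = child.Attributes["address"];
            if (address != null) settings.Servers.Add(address.Value);
        }
        return settings;
    }
}
```

Keeping the parsing separate from the handler class would also make it easy to unit test without touching a real App.config or Web.config file.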

And wow that's quite a setup you have there.  Ours isn't quite as big as
that right now, but there is a good possibility that we'll need to scale out
that big so we're planning ahead :-).

As for an explanation on how the client handles failover:

The socket pool keeps persistent connections to each of the memcached
servers.  When the client loses a connection to one it tries to reconnect
after the reconnect timeout.  If it doesn't connect, it doubles the timeout
and tries again.  It keeps doubling the timeout every time it fails to
connect.  If it ever reconnects, the timeout length is reset.  This works
really well because if there is a slight network "hiccup" you'll reconnect
very quickly, but if the node goes down for a very long time the client
quickly ignores it.  The failover code could use a little more work and it's
all related to those exceptions.  There isn't really any "redundancy", only
failover.  But the failover process works pretty well and it's how the other
clients I looked at handle it.  
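
The doubling-and-reset behaviour described above can be sketched roughly as follows. This is an illustration, not the client's actual code: the `ReconnectBackoff` class and its member names are made up for the example, and the upper cap on the delay is an added assumption (the description above does not mention one).

```csharp
using System;

// Hypothetical helper that tracks the reconnect delay for one server.
public class ReconnectBackoff
{
    private readonly int initialDelayMs;
    private readonly int maxDelayMs;   // assumption: cap so the delay cannot grow forever
    private int currentDelayMs;

    public ReconnectBackoff(int initialDelayMs, int maxDelayMs)
    {
        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.currentDelayMs = initialDelayMs;
    }

    // How long to wait before the next reconnect attempt.
    public int CurrentDelayMs
    {
        get { return currentDelayMs; }
    }

    // Call after a failed attempt: double the delay, up to the cap.
    public void RecordFailure()
    {
        // Widen to long before doubling so the multiplication cannot overflow.
        currentDelayMs = (int)Math.Min((long)currentDelayMs * 2, (long)maxDelayMs);
    }

    // Call after a successful reconnect: go back to the initial delay.
    public void RecordSuccess()
    {
        currentDelayMs = initialDelayMs;
    }
}
```

With an initial delay of 1000 ms, three consecutive failures give waits of 2000, 4000 and 8000 ms, and a single successful reconnect drops the wait back to 1000 ms.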

Are you familiar with how memcached works?  It's really just a hashtable of
hashtables.  The first level of hashing decides "what server does this go
on?" then the next level of hashing happens on the server and it says "where
does this item go in my hashtable?"  Since all the clients use the same
hashing algorithm they all end up coming up with the same values.  That's
one thing I had to explain to some of the devs on my team.  It's only a
cache, it's not a persistent store.  If one of the nodes goes down, there
will be a small hiccup in our web application because most of the stuff will
have to be re-cached.
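
The first level of hashing ("what server does this go on?") might look something like the sketch below. The thread does not say which hash function the clients actually use, so FNV-1a is a stand-in here, and `ServerSelector` is a hypothetical name. The point is only that the hash must be deterministic everywhere; `string.GetHashCode()` would be a poor choice because its value is not guaranteed to be stable across runtimes or processes.

```csharp
using System;
using System.Text;

// Hypothetical helper: maps a cache key to one of the configured servers.
public static class ServerSelector
{
    // FNV-1a 32-bit hash (a stand-in, not necessarily what the client uses).
    // It gives the same value for the same key on every machine.
    public static uint Hash(string key)
    {
        unchecked
        {
            uint hash = 2166136261u;              // FNV offset basis
            foreach (byte b in Encoding.UTF8.GetBytes(key))
            {
                hash ^= b;
                hash *= 16777619u;                // FNV prime
            }
            return hash;
        }
    }

    // Every client with the same server list picks the same server for a key.
    public static string PickServer(string key, string[] servers)
    {
        int index = (int)(Hash(key) % (uint)servers.Length);
        return servers[index];
    }
}
```

The second level of hashing happens inside the memcached server itself, so the client never sees it.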

Unfortunately I haven't had time to tune and performance test our cache yet,
so we're just using the default values right now for our limited beta test
of our system.  We're a pretty small development shop and I'm pretty pressed
for time just trying to add features and squash bugs in the application.
Unfortunately the users can't see how cool memcached is, they only see how
cool the interface for our application looks and that gets priority.

I'm copying this email to the developer mailing list.  I'll IM you my AIM
screenname.

Have a great weekend and drink up!

-Tim

This sounds great. It's 'alpha' but it's working great for us so far. We've
been stable with our patched client for a few days now running on 25+ web
servers with 10 memcache boxes with just a small handful of unhandled
exceptions.  

I've recently reimaged our machines from a 2.4 kernel to an SMP 2.6 kernel.
Load on each box went from ~85% to less than 5%!  This was extremely
encouraging and paves the way to more extensive use of memcache throughout
our site. 

Go ahead and post the bugs and emails.  Anything that gets more people to
download the library and gets more eyes on the code would be great.

I'd like to gain a bit more insight into some aspects of the client. One
question I get frequently is about redundancy. From looking at the code, it
seems like if a node goes down then it's marked as down and the next server
on the list is used instead. Since every client performs the same check and
goes on to the same next server, there's barely a performance hit. Then the
server is checked periodically (at increasing time intervals?) until it
comes back up.  Is this about right? Can you describe this process any
further? Do you have any ideas for improving it?
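
Taking the reading above at face value (a down node is skipped and the next server on the list is used), the selection step could be sketched like this. It is an illustration of that description only, not a claim about what the client's code does; `FailoverSelector` and `PickLive` are hypothetical names.

```csharp
using System;

// Hypothetical helper: skips servers that are currently marked as down.
public static class FailoverSelector
{
    // Returns the index of the first live server at or after startIndex,
    // wrapping around the list; returns -1 if every server is down.
    public static int PickLive(int startIndex, bool[] isAlive)
    {
        for (int offset = 0; offset < isAlive.Length; offset++)
        {
            int candidate = (startIndex + offset) % isAlive.Length;
            if (isAlive[candidate])
            {
                return candidate;
            }
        }
        return -1;
    }
}
```

Because every client starts from the same hashed index and walks the same list, they all land on the same fallback server, which is why the cached items stay shared while a node is down.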

What do you think is a good value to use for max idle connections? I
currently have roughly 2000 simultaneous connections to each memcache node
and I'm pretty sure that most of these are idle sockets in client
pools, though I'm not certain how many are actually used. It'd be cool to be
able to see various client stats to get more transparency into things. I
think the Perl client does something like it already. I could take a look at
this closer next week and get some more concrete ideas together. 

This email is getting long... are you on aim or any im? i'm **** on aim or
msn: ****

gonna go out drinking.. ttyl! :)

Max

Max,

I made the changes to the library and re-uploaded the binaries.  I also have
two sets of project/solutions, one for VS.NET 2003 and .NET 1.1 and one for
VS.NET 2005 and .NET 2.0.  They both run off the exact same source files
(although I think I may make the 2.0 version work with the native GZIP stuff
new in the 2.0 framework at some point).

Keep in mind though that technically the project is still in alpha.  Mainly
because the API isn't very clean.  I would like to clean it up to adhere
more to the .NET coding standards.  It would be pretty easy to update any
code that uses it (mostly just going from lowercase stuff to uppercase and
small stuff like that), but I'll make sure to put any changes in the changelog.

-Tim

p.s.  Would you mind if I copied this message to the development mailing
list for the project so that it looks like there is some activity?  It might
help us out if some other people have an indication that something is going
on with this project.

I'll see if I can send diffs later on when I have a chance, but for the
first bug I changed the sleeps to this:
Thread.Sleep((int)interval);
and
Thread.Sleep((int)interval * 10) in the catch

and changed the while loop to this:

while ((count = gzi.Read(tmp, 0, 2048)) != 0)
Please confirm that -1 really is never returned, though (or why it would be),
but we've had no probs so far.
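
On the -1 question: the .NET `Stream.Read` contract is to return 0 at the end of the stream, whereas -1 is the convention of Java's `InputStream.read` (and of .NET's single-byte `ReadByte`). A minimal sketch of the corrected loop follows. The thread does not say what type `gzi` is, so the framework's own `GZipStream` from .NET 2.0 is used here as a stand-in, and `GzipHelper` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper showing a read loop that ends when Read returns 0.
public static class GzipHelper
{
    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gzi = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] tmp = new byte[2048];
            int count;
            // Stream.Read returns 0 at end of stream, so "> 0" ends the loop
            // there and would also end it if an implementation returned -1.
            while ((count = gzi.Read(tmp, 0, tmp.Length)) > 0)
            {
                output.Write(tmp, 0, count);
            }
            return output.ToArray();
        }
    }

    // Companion method so the sketch can be exercised end to end.
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gz = new GZipStream(output, CompressionMode.Compress))
            {
                gz.Write(data, 0, data.Length);
            }
            // ToArray is still valid after the GZipStream has closed the stream.
            return output.ToArray();
        }
    }
}
```

Testing for `> 0` rather than `!= 0` costs nothing and keeps the loop safe with stream classes ported from Java that might still follow the -1 convention.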

Regarding 2.0, we are using it for some projects but not for those using
this client so it'd be great if you can maintain two branches if you start
using 2.0-only stuff.

We're using this pretty heavily at this point and I'll be sure to let you
know as soon as we find new issues.

Max

Hi Max,

Hey, thanks for the extra set of eyes.  Yes, the nanoseconds stuff caused a
lot of small errors when I was porting it from Java (which uses
milliseconds).  

If you made any changes, would it be possible to send me a diff of your
project?  I could incorporate them (and give you credit) and repost the
project.

On another note, are you using .NET 1.1 or 2.0?  One thing I would like to
do is move the project to .NET 2.0 because the library performs so much
faster (I think serialization in .NET 2.0 has been much improved).  If not,
I'll make sure to keep both project files around and build it for both
frameworks.

-Tim

Hi Tim, great work with the memcached c# client -- been using it over here
in production with a lot of good results. I've had to modify the client so
far in a couple ways to make it work and wanted to let you know so you could
consider fixing it. 

In SockIOPool.Maintain(): when you pass a TimeSpan into Thread.Sleep, you
should instead just be passing in the number of milliseconds (5000).  The
TimeSpan constructor takes ticks, and a tick is 100 nanoseconds, so 5000
ticks is only half a millisecond.  This causes 100% CPU time since it's
polling way faster than it should.
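
To make the unit mismatch concrete, here is a small sketch. A .NET tick is 100 nanoseconds, so there are 10,000 ticks per millisecond, and the single-argument `TimeSpan` constructor interprets its argument as ticks. The `SleepIntervals` class and its method names are hypothetical, used only for the illustration.

```csharp
using System;

// Hypothetical helper contrasting the two ways of building the interval.
public static class SleepIntervals
{
    // What the buggy call effectively built: the value is read as ticks.
    public static TimeSpan FromTickConstructor(long value)
    {
        return new TimeSpan(value);
    }

    // What was intended: the value is read as milliseconds.
    public static TimeSpan FromMilliseconds(int milliseconds)
    {
        return TimeSpan.FromMilliseconds(milliseconds);
    }
}
```

`SleepIntervals.FromTickConstructor(5000)` is a 0.5 ms interval, while `SleepIntervals.FromMilliseconds(5000)` is the intended 5 seconds. Passing the millisecond count straight to `Thread.Sleep(int)`, as in the fix, avoids the conversion altogether.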

In Memcacheclient.LoadItems you have a while loop that reads until gzi.Read
returns -1. I'm not sure if it ever returns -1 but looking at the zip code,
it does return 0. This again was causing 100% cpu time as it never left this
tight loop.

Please keep up the awesome work you're doing and I will let you know if I
find other issues (I suspect the maintenance thread has a bug somewhere
causing exceptions, but I haven't nailed it yet).

Max