I wanted to update everybody on the status of http://www.py2exe.org. Things seem
to be in really good shape now, pending final DNS change propagation (I'm
not seeing any non-bot traffic on the old IP address, but it's possible
there will be some for the next day or two). All writes have been disabled
on the old IP to make sure we don't lose things. If you see further
problems, please let me know. There is a lot more detail below for the
curious, and so I can remember my thought processes later...
The problems we were having with the hosting provider were in fact caused
by py2exe.org. We were receiving about 30K hits/day (many from bots, more
on that later) and their rule of thumb is that performance problems arise
on shared hosting beyond about 10K hits/day. So the only option there was
to move to one of their
VPS <http://en.wikipedia.org/wiki/Virtual_private_server> hosts. I tried
that and their cheapest VPS could handle all but the largest
spikes pretty well. Even the large spikes only resulted in modest
slowdowns, not the crazy multi-minute response times we were seeing a few
weeks ago (we were actually repeatedly causing Apache workers to get killed
by watchdogs before the move to the VPS). The main problem with the VPS was
price. I've been paying about $10-15 per month for shared hosting for
py2exe, but the move to VPS would up that to close to $30 per month. They
were gracious enough to offer a $10 per month discount to help me and
py2exe out, but $20 per month is still more than I'd like to pay and it
would increase quickly if we had to move up from their smallest VPS. So I
started looking for alternatives.
I looked at a number of competing VPS providers, but those that were
significantly cheaper had restrictions that didn't fit well with py2exe.org.
On a lark I decided to see how well I could do with AWS/EC2, thinking it
would be far too expensive but at least I'd have full control. I think I
have it down to the point where it will be about $5 per month and we have a
lot of headroom for growth before that goes up! The downside is that it
took a lot of planning and experimentation, but that's pretty much done now.
The trick was to optimize py2exe.org enough that it could run on a
t1.micro instance and make it robust enough that it could run as a Spot
Instance <http://aws.amazon.com/ec2/spot-instances/>. Most of the time it's
costing 0.4 cents per hour (yes, less than a penny per hour) for the
machine instance and another 50 cents per month for storage and bandwidth.
The instance price will fluctuate, but over the last week it hasn't changed.
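
A quick back-of-the-envelope check of those numbers, assuming the instance
runs around the clock at the quoted 0.4 cents per hour and that an average
month has 730 hours (both are simplifications, since the spot price
fluctuates; the function name is just for illustration):

```python
# Rough monthly cost estimate from the figures quoted above.
HOURS_PER_MONTH = 24 * 365 / 12   # average month: 730 hours


def monthly_cost_dollars(cents_per_hour, fixed_dollars_per_month):
    """Cost of an always-on instance plus fixed monthly charges, in dollars."""
    instance_dollars = cents_per_hour * HOURS_PER_MONTH / 100.0
    return instance_dollars + fixed_dollars_per_month


# 0.4 cents/hour for the instance, 50 cents/month for storage and bandwidth.
estimate = monthly_cost_dollars(0.4, 0.50)
print("about $%.2f per month" % estimate)
```

That lands comfortably under the "about $5 per month" figure, which leaves
some room for price spikes and extra bandwidth.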
I started with the robustness. I've set a persistent bid above the regular
fixed price. The market prices do go above that occasionally, but it's rare
(every few months) and short lived (it appears to be minutes). When it
happens, py2exe.org will be killed with no notice. As soon as prices drop
to normal again, py2exe.org will automatically relaunch with everything on
the site intact. That involved a lot of learning and tweaking, but after
getting the details nailed down, I've simulated this several times and it
has worked great. I'll also do regular backups outside of AWS just to be
safe, just as I always have.
The other part was the optimization. I can't recommend
uWSGI <http://uwsgi-docs.readthedocs.org/en/latest/> enough - they are much
easier to work with than Apache, other than getting through the learning
curve, consume almost no memory and are lightning fast. I switched most
traffic over about 12 hours ago, and under loads that caused the VPS to
sustain 80-90% CPU usage, we're now sustaining 3-5%. But there's one more
potential issue. We get a lot of traffic from bots trying to place spam on
the wiki. Usually those bots spread out their hits to avoid detection, but
now and then they go crazy. If that causes a CPU spike that is sustained
for more than a few seconds, then AWS will start to throttle CPU
availability (this is why t1.micro instances are so cheap; this doesn't
happen on more expensive instance types). We've had rate limiting by IP for
quite a while, but some bots are now distributed across several IPs. We
also have text captchas (e.g., What does [0, 1, 2, 3][-1] evaluate to in
Python?). Changing them used to eliminate spam for several months, but when
I change them now it takes the bots less than a month to get an answer from
a human. And even when the questions stop them, they keep trying anyway
which keeps the load high.
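
To illustrate why per-IP rate limiting stops working once a bot spreads
itself across several addresses, here is a minimal sliding-window sketch.
It is not the limiter actually running on py2exe.org; the class name, the
limit of 3 hits per 60 seconds, and the example addresses are all made up
for illustration:

```python
import time
from collections import defaultdict, deque


class PerIPRateLimiter:
    """Allow at most max_hits requests per IP in any `window` seconds."""

    def __init__(self, max_hits, window, clock=time.monotonic):
        self.max_hits = max_hits
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)   # ip -> timestamps of recent requests

    def allow(self, ip):
        now = self.clock()
        recent = self.hits[ip]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False
        recent.append(now)
        return True


limiter = PerIPRateLimiter(max_hits=3, window=60)

# A bot hammering from one address is cut off after three hits...
single = [limiter.allow("192.0.2.1") for _ in range(5)]

# ...but the same five hits spread over five addresses all get through.
spread = [limiter.allow("198.51.100.%d" % i) for i in range(5)]
```

Each address is counted separately, so a distributed bot never trips the
limit even though the total load on the server is the same.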
So I've taken another step up in fighting the bots. All HTTP POSTs, PUTs,
and DELETEs now cause py2exe.org to check Project Honey Pot before
allowing the write request to pass through to MoinMoin (wsgi makes it super
easy to wrap this around MoinMoin). For real users, this means you have to
wait for an additional DNS lookup when saving edits. There is caching going
on so we don't hammer Project Honey Pot. If your IP is above a certain
level of suspicion then you get a very fast "403 Forbidden" back. If you
get a 403 on a write request, then subsequent read requests from the same
IP also get 403s until you fall out of the cache - the idea is to
discourage the bots from hanging around and to eliminate their load on us
(processing a 403 for an IP in the cache takes < 5 ms). If you don't make
any write requests, then your IP will never get a 403 on a read request -
hopefully this all but eliminates issues with false positives. A login is a
write request, so that's all you have to do to see if you'll be blocked
(bots are trying to login several times per minute). This all went live
about 10 hours ago and so far I'm seeing a couple of sequences of 403s per
hour, but far more 200s. The bots I've seen so far are giving up very
quickly after a couple of 403s. I'll likely add a Honey Pot to
py2exe.org to help Project Honey Pot with their efforts.
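
For anyone wanting to try something similar, the scheme described above
can be sketched as a small WSGI wrapper. This is a rough illustration and
not the code running on py2exe.org: the class and function names, the
threshold of 25, and the one-hour cache are assumptions, and the query
format follows Project Honey Pot's http:BL DNS interface as I understand
it (you need your own access key, and it only covers IPv4).

```python
import socket
import time

WRITE_METHODS = {"POST", "PUT", "DELETE"}


def parse_httpbl_answer(answer):
    """Return the threat score from an http:BL answer, 127.<days>.<threat>.<type>."""
    octets = answer.split(".")
    if len(octets) != 4 or octets[0] != "127":
        return 0
    if octets[3] == "0":      # type 0 is a known search engine, not a threat
        return 0
    return int(octets[2])


def httpbl_threat_score(ip, access_key):
    """Look up an IPv4 address via DNS; unlisted or failed lookups score 0."""
    if ip.count(".") != 3:    # not IPv4, so http:BL cannot answer
        return 0
    reversed_ip = ".".join(reversed(ip.split(".")))
    query = "%s.%s.dnsbl.httpbl.org" % (access_key, reversed_ip)
    try:
        return parse_httpbl_answer(socket.gethostbyname(query))
    except OSError:           # NXDOMAIN (not listed) or a DNS failure
        return 0


class BlockSuspiciousWrites:
    """WSGI middleware: vet write requests, then 403 cached offenders."""

    def __init__(self, app, score_for_ip, threshold=25,
                 cache_seconds=3600, clock=time.time):
        self.app = app
        self.score_for_ip = score_for_ip
        self.threshold = threshold
        self.cache_seconds = cache_seconds
        self.clock = clock
        self.cache = {}       # ip -> (expires_at, blocked)

    def _cached_verdict(self, ip):
        entry = self.cache.get(ip)
        if entry is not None and entry[0] > self.clock():
            return entry[1]
        self.cache.pop(ip, None)   # expired or never seen
        return None

    def __call__(self, environ, start_response):
        ip = environ.get("REMOTE_ADDR", "")
        blocked = self._cached_verdict(ip)
        # Only a write request triggers a lookup; plain reads from an
        # IP that has never tried to write are never checked or blocked.
        if blocked is None and environ.get("REQUEST_METHOD") in WRITE_METHODS:
            blocked = self.score_for_ip(ip) >= self.threshold
            self.cache[ip] = (self.clock() + self.cache_seconds, blocked)
        if blocked:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)
```

Wrapping an existing application would then look something like
application = BlockSuspiciousWrites(moin_app, lambda ip:
httpbl_threat_score(ip, "youraccesskey")), where moin_app and the key are
placeholders. Passing the lookup in as a function keeps the DNS call out
of the middleware itself, which makes it easy to test with a fake.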
Finally, I cleaned up the user accounts. Any user who had never made an
edit that survived de-spam actions was removed. More than 3000 accounts
were deleted, leaving ~65. Before doing the final delete I made sure that
all the usual suspects (me, Thomas, Werner, Grant, Aahz, ...) made the cut.
I scanned the deletes and every one I saw had an obviously fake/disposable
Please let me know if you see any further issues.
On Sat, Aug 3, 2013 at 11:20 AM, Jimmy Retzlaff <jimmy@...> wrote:
> Yep, py2exe.org has been having intermittent problems for about a week
> now (sometimes normal, sometimes it takes several minutes to load, and
> sometimes it errors out). While py2exe.org is somewhat dynamic, I have
> another site on the same account that is merely ~300 bytes of HTML plus an
> image and that site is having the exact same issues. The hosting provider
> moved everything to another machine about 10 days ago which may or may not
> be related. I've been going back and forth with them all week and they just
> moved everything to yet another machine last night, but I'm still seeing
> problems today. Things had been rock solid for years up until this week.
> I'll stay after them...
> On Fri, Aug 2, 2013 at 4:01 PM, David Goldsmith <d.l.goldsmith@...:
>> Yup, thanks, I figured out I needed to include the leading http://www.
>> On Fri, Aug 2, 2013 at 2:45 PM, Werner F. Bruhin <werner.bruhin@...:
>>> Here it is a bit slow but it shows.
>>> Py2exe-users mailing list
>> From "A Letter From The Future" in "Peak Everything" by Richard Heinberg:
>> "By the time I was an older teenager, a certain...attitude was developing
>> among the young people...a feeling of utter contempt for anyone over a
>> certain age--maybe 30 or 40. The adults had consumed so many resources,
>> and now there were none left for their own children...when those adults
>> were younger, they [were] just doing what everybody else was doing...they
>> figured it was normal to cut down ancient forests for...phone books, pump
>> every last gallon of oil to power their SUV's...[but] for...my generation
>> all that was just a dim memory...We [grew up] living in darkness, with
>> shortages of food and water, with riots in the streets, with people begging
>> on street corners...for us, the adults were the enemy."
>> Want to *really* understand what's *really* going on? Read "Peak