#53 Denyhosts default sync leaves >90% unprotected; Serverdesign

open
nobody
None
5
2013-12-15
2011-04-13
Anonymous
No

(Warning: long report)

First of all: THANKS for this very useful piece of software!

Having run denyhosts in cron mode for a couple of years now, I was happy and never bothered to check its inner workings. But just now I have installed a few new boxes with denyhosts in daemon mode (Debian package), which gives very nice /var/log/denyhosts output.

So I noticed that denyhosts will only receive max 50 new hosts from the central database during any sync run. I'd guess that's a hardcoded limit to prevent overloading the central server. The problem is that I get 50 new blocked hosts _every_ sync run, which made me think that I'm not receiving quite enough.

Indeed, the denyhosts stats page shows an average of 14624 new hosts per day (though that page hasn't been updated for more than a month now). Default sync interval is 1 hour, so any denyhosts client will by default receive at most 24*50=1200 new hosts per day. That is 1200/14624 = just 8.2% of all new active crackers of that day. In other words, any given client will _never_ know of 91.8% of all active crackers.

That's bad, since it renders the whole sync stuff mostly useless.

So, I tried to tweak the sync settings a bit. If I can receive only a limited maximum number of new hosts per day, I might well opt to receive the hosts that are most likely to attack me, i.e. the hosts that have already attacked the most other clients. Let's say SYNC_DOWNLOAD_THRESHOLD=10. And quick botnet cracks would be handy to catch early, so SYNC_DOWNLOAD_RESILIENCY=3h. And to keep up with it all, SYNC_INTERVAL=8m. This seems a quite more useful setting, with the client mostly receiving some 30-40 new crackers during each sync run, but sometimes still the maximum amount of 50 (once every few hours).

Conclusion 1: Please adjust the default sync settings which are _way_ wrong.

Yet, there's still another problem. During any given sync run with the above settings, only 5-10 hosts get added to hosts.deny. Indeed, most of the hosts returned by the sync are _already_ in the hosts.deny file. They have been put in at previous sync runs. Quick checking shows that the central server will happily re-send crackers at least 10-20 times (once 37 repeats).

This means the sync run is not working as advertised.

Okay, syncing is hard. A syncing client defines a set of "qualifying" crackers, namely those that have been reported at least N times over a resiliency period of at least T hours. From that set, any given sync run wants exactly those crackers that _started_ to qualify between time S1 and S2 (S1=previous sync, S2=now).

[Sidenote: S1 = WORK_DIR/sync-timestamp should be _server_ time, since client clocks can be way off. I didn't check if this file is indeed filled with a server-provided value.]

I'd expect that the "_started_ to qualify" idea is the big problem.

But the denyhosts central database design is closed-source (Boohoo!), so I can't check what is going wrong. That's the problem with closed-source.

Yet I couldn't stop myself thinking about this problem, and ended up designing a denyhosts central-server idea myself. This design "should" work as intended, but I leave the implementation and verification as an excercise to the reader.

(WARNING: This message is licensed under GNU GPL>=3. Other licensing available on request & payment.)

It's probably easiest to use simple C types to make my point. Actually, it could be a C implementation, given enough RAM (currently ~10M crackers, max ~3k reports per cracker).

cracker_t:
addr_t ipaddress
time_t firsttime
time_t latesttime
time_t resiliency (= .latesttime - .firsttime)
int totalreports (= all reports including cleaned-up ones)
int currentreports (= length of current reports list)
report_t *reports

report_t:
addr_t reportingipaddress
time_t firstreporttime
time_t latestreporttime
time_t latesttotalresiliency
(=report.latestreporttime - cracker.firsttime)
report_t *next (linked list, or somesuch)

Sync upload, reporter sends in some cracker:
make sure reporter itself isn't "reliably" listed as cracker
(if so, then drop the report entirely)
if reported cracker already registered:
if reportingipaddress already in reports list (*Note 1*):
update report.latestreporttime and .latesttotalresiliency
update cracker.latesttime and .resiliency
else
add new report at end of reports list
(must stay sorted for condition (c) below)
update cracker.* stuff
else
add new cracker at end of crackers list
(should stay sorted for easy most-recently-first access)
add new report
set cracker.* stuff

*Note 1* Merging repeated reports from the same reportingipaddres is a design tradeoff. Ideal would be to add all reports separately even if repeatedly from same reporter, but that could possibly use too much database memory. It also possibly messes with the meaning of the (c) condition below, depending on whether/how you want to count NAT-ed hosts.

However, a simplistic always-merging solution would allow the (d) condition below to be manipulated: a small group of malicious clients may keep sending repeated reports for the same bogus cracker every 30 minutes. Due to the merging, and the fact that no honest client will report the bogus cracker, the (d) condition will always trigger and this bogus cracker will certainly be included in all following sync downloads. Do this for a slowly-changing list of 50 bogus crackers, and no real crackers will ever come through again.

So Alternative 2: just drop repeated reports from same reportingipaddress if .latestreporttime is less than 24 hours ago. After 24 hours, merge once. Then again wait 24 hours, and repeat.

Alternative 2b: wait 12 hours if it is report[0], 18 hours if it is report[1], 24 hours if report[>=2].

Both of these variants still have some probability to force the (d) condition to pass from time to time.

Alternative 3: After 24 hours, keep old report untouched and add new 2nd report at end of reports list. Wait another 24 hours, dropping new duplicate reports. Then add a 3rd report for this reportingipaddress at end of reports list. New duplicates may be merged to this 3rd report indefinitely, since the second test of (d) will keep failing at the 2nd report (assuming all sensible clients have T less than 24 hours).

Downside of this variant is that the meaning of the (c) condition is changed (which may not necessarily be bad in this case).

Sync download: client provides N, T, S1; server knows (and sends) S2.
limit S1 to reasonable value, for example 1 day ago.
_bottom-up_ check for qualifying crackers (most-recently-first, to
have most useful results within max-50 limit)
(a) cracker qualifies if
cracker.currentreports >= N && cracker.resiliency >= T
(b) cracker can only have _started_ to qualify during S1--S2 if it was
actually updated during that period, so if
cracker.latesttime >= S1 (always <= S2=now)
we want to send as soon as both N and T are reached = (a), i.e. as
soon as the _last_ of them is reached:
(c) cracker _started_ to qualify for N during S1--S2 if
reports[N-1].firstreporttime >= S1
(d) cracker _started_ to qualify for T during S1--S2 if
any report has .latestreporttime >= S1 &&
.latesttotalresiliency >= T
but no report has .latestreporttime <= S1 &&
.latesttotalresiliency >= T
cracker will be sent to syncing client if a && b && (c || d)

(a) and (b) are very fast checks; (c) may be slow(er) depending on implementation, but constant-time per cracker, (d) can be slowest but second check will often fail early. Depending on implementation, the first few (d) checks can be combined with (c).

"Use case" for (c):
Client sets N=5 reporting hosts, T=5 hours resiliency, S1 = 1 hour ago.
New cracker gets 1 report each day.
Cracker will qualify for T from second day onward, but not yet for N.
On the fifth day, the fifth report is added, which means the cracker _starts_ to qualify.
Cracker will be sent when client connects the first time after the fifth report is received. All following client connects will have S1 after report[4].firstreporttime, and the cracker will not be sent again any more.

"Use case" for (d):
Client sets N=5 reporting hosts, T=5 hours resiliency, S1 = 1 hour ago.
New cracker gets 10 reports within 1 hour, then it takes a week vacation, then after that week suddenly one new report, and three hours later another report.
Cracker will not qualify during that week, and will only _start_ to qualify when the first new report arrives. At that point, (a) becomes true for the first time ever, the new report passes the first check of (d), and no reports fail the second check of (d).
The next client connects will have an increased S1, making the first check of (d) fail.
The second new report ("three hours later") is of no consequence, since the second new report indeed passes the first check of (d), but the earlier report already fails the second check of (d).

This will cause repeated sync-download if the final "three hours later" report actually was a duplicate from the same reportingipaddress as the earlier report and these reports were merged. See *Note 1* above.

Expiry/maintenance every hour/day:
remove reports with .latestreporttime older than (for example) 1 month
and only update cracker.currentreports
remove reports that were reported by what we now "reliably" know to
be crackers themselves
remove crackers that have no reports left

Well that's about it. I hope this is clear enough.

Conclusion 2: Please adjust the central server design to start working correctly, possibly by implementing above ideas.

Conclusion 2.5 in case you're not interested in doing that: Anyone else willing to try it?

Which brings me to the last related issue: preventing database poisoning by botnets.

Any black-hat botnet operator wanting to keep a bunch of real crackers off the radar, will need to make sure that 50+ bogus crackers are sent to all clients all the time. (Because as long as a total of less than 50 crackers are sent to a client, all qualifying _real_ crackers will certainly be included.) So, many repeated 50-new-cracker sync downloads are highly suspicious.

... which seems to be the case right now. Oops.

Some of this might well be caused by the semi-broken central server issues as described above (10-20 times repeated downloads for any given cracker), some of this might not...

Next to solving the underlying central server issues, increasing the maximum of 50 crackers per sync would be a _very_ good idea to counter this.

But pending that, and indeed supplemental to that, the straightforward client-based solution actually works: simply increase SYNC_DOWNLOAD_THRESHOLD (called N above). Why does that work? Well, if a real cracker is forcibly kept off of the sync downloads, very few hosts will receive the real cracker and block it proactively. Which means the cracker will be seen - and _reactively_ blocked - by many hosts. All these many hosts will report the cracker to the central database. Which means this real cracker will easily overcome a very high threshold.

As long as a client keeps the threshold low, it will receive the newest 50 qualifying crackers, which might well be bogus, and it will never see the real cracker which quickly will be (much) older and gets drowned in the noise. As soon as a client sets a higher threshold, the real cracker will still qualify easily, but there are much less bogus crackers to fill the 50 maximum.

This keeps working when all clients raise thresholds simultaneously. Actually, it will work better, since more clients will receive more real crackers and proactively block them. In the "completely swamped" situation, many clients will report a cracker since they do not _know_ that it has been reported very many times already. In the "controlled" situation, any cracker will be reported at most average(thresholds)+1 times, and then anyone else will have proactively blocked it.

And this will probably continue working until a black-hat botnet operator generates more traffic to the denyhosts central server than to all really brute-forced ssh hosts combined. After all, he'll need to make sure there are always at least 50 new (very-)high-threshold bogus crackers in the database. At that point, it'll be very easy to single out who the "heavy denyhosts contributers" are, and to simply block them from contributing. Additionally, this will reveal a major part of the botnet IP addresses, which is usually not what botnet operators want... (Indeed, this would effectively amount to DOS'ing the denyhosts central server, which would probably be a lot easier to do _outside_ the denyhosts sync protocol.)

So I'd propose an automatically-adaptive threshold for all clients, as follows:

1. The central server will report some maximum allowed number of crackers, and additionally include an explicit "too many" notice if more crackers were available. (This is to allow easy server-based adjustment of the maximum.)

2. If a client sees the "too many" notice for, say, 10 consecutive sync cycles, it will increase its threshold by 1. Then it starts counting 10 sync cycles again, etc.

3. If a client does not see the "too many" notice for, say, 30 consecutive sync cycles, it will decrease its threshold by 1, but no further down than the configfile value (which becomes a minimum, effectively).

Some more contrived control algorithms would be possible, but this should already work quite nicely.

This client-side adjustment idea has the significant advantage that it will automatically find the optimal SYNC_DOWNLOAD_THRESHOLD value for any given combination of SYNC_DOWNLOAD_RESILIENCY and SYNC_INTERVAL. (As opposed to other, completely server-based ideas that I had been considering.)

To cater for manual or cron-based setups, the threshold and sync-cycle counters should be saved in the WORK_DIR.

Pending an update of the central server, items 2 and 3 can already be implemented in clients, with the "too many" check replaced by ">=50 newly received crackers". But note that, due to the current server issues outline above, most of the received "real" crackers will actually be duplicates.

Conclusion 3: Please implement something like the outlined automatic threshold adjustment system to make poisoning the denyhosts central database a useless affair.

Best regards, and keep up the good work,

Anne Bezemer

J.A.BezemerXopensourcepartners.nl | s/X/@/

This message and all ideas expressed herein are Copyright (C) 2011 by me, and licensed under GNU General Public License version 3 or higher. Other licensing options are available on request & payment. No warranty whatsoever.

Discussion

  • K Yoder

    K Yoder - 2011-06-09

    I've noticed the same problem. I can't see anything in the sourceforge bug tracker that allows me to "up-vote" this bug, so this comment serves as my unofficial "up-vote" instead.

    One of the primary reasons I chose denyhosts over other solutions (eg fail2ban) was because of the collaboritive brute-forcer reports. I'm noticing an increase in distributed brute-force attacks (eg a botnet?), so I hope this issue can be fixed.

     
  • Kyle Willmon

    Kyle Willmon - 2011-06-09

    I would like to add that I am also interested in this problem. I hope to see the source code for the server released as open source software so that we can assist in creating a better system.

    This bug is equivalent to Debian bug #622697

    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=622697

     
  • Wolfgang M.

    Wolfgang M. - 2013-12-15

    Any news regarding this issue (which is still valid)?
    I thought the same (well, a bit less detailed) when I parsed the denyhosts log.

     

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:





No, thanks