I'm also interested in creating an open source WoW combat log parser (client 2.4 or higher). I haven't seen any activity here yet, so I'm assuming you're still in the early stages of design. What features are you considering? For my project, I'm looking to create a Java-based client that generates XML representing the aggregated data. I like the idea of using a well-documented, easy-to-understand XML intermediate representation so that any number of utilities (possibly including yours) can generate web pages, populate tables, etc. to suit the user's specific needs.
My project is still early in the design stage, as I'm still compiling a list of requirements and features. Perhaps we can share some ideas to make both of our projects better, or collaborate.
I think what I have now would be considered beta. I'm trying to get it to the point where I can import a night's raid, say a 75 MB log file, in under an hour. The server I have at home is a P2, so not the fastest at processing. All the data gets stored in MySQL, and I have reports displaying the information as simple web pages. I have been thinking of doing something with the GD library to draw bar graphs and pie charts.
I'm trying to avoid having any kind of client to run this. The server is capable of processing the combat log itself, and MySQL is more than able to deal with the reports. The only issue is that you will have to FTP the combat log up to the server, because PHP's file upload limit is 2 MB unless you change it in php.ini, which I don't think everyone is going to have access to.
Reports done:
1. total raid damage, dps, healing
2. player breakdown. (by spell)
a. damage done.
b. healing done.
c. damage taken
d. healing taken.
3. player summary. Basically just totals of everything: dps and the like.
4. buffs and debuffs. Any buffs put on the player. You can see down to the number of pots this person took.
Works but needs testing:
1. over healing.
2. merged pet data.
Issues that need fixing:
1. _ENERGIZE: mana returned from mana totems and shadow priests. Need to decide where to save this.
2. Deciding when combat has ended and starting new combat.
3. logging player deaths.
Think that's all for now.
The file size restriction on uploads that you mention is one of the biggest initial reasons why I decided to aggregate the data on the client side. After more consideration, client side also makes more sense for my project in the interest of keeping it as modular and robust as possible for people who want to create their own interfaces to the aggregated data.
Something I've been wondering is how are you starting the parser after you upload the combat log via FTP? I can only think of three ways to do that:
1) Invoke PHP through its Apache module (or IIS if you prefer) and provide the path to the file on the server as an argument. But that would run into timeouts on both the client and server side, as well as PHP's default max execution time of 30 seconds, which often can't be changed on production servers since most run in safe mode.
2) Run the script through the CLI, which is admittedly enabled by default since 4.3 but requires the user to have access to the command line, which is not typical of hosting companies in my experience.
3) Run the script through the CLI by making an exec() or similar call in another script, but this is usually restricted on production servers and would probably also at least cause the appearance of timeouts, since I think those calls all wait for the child to return.
Am I missing something or do you have a clever solution?
The file is uploaded to a queue dir. I have a job set up in crontab on the server that checks for files in that dir. If it finds one, it processes it.
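The queue-directory approach could be sketched roughly like this. It is only an illustration in Python (the parser discussed here is PHP): `process_queue`, the `done_dir` folder and the `handler` callback are hypothetical names, and the sketch does not guard against files that are still being uploaded when the job runs.

```python
import os


def process_queue(queue_dir, done_dir, handler):
    """Run handler on every file waiting in queue_dir, then move it to done_dir.

    Meant to be called from a scheduled job (e.g. a crontab entry).
    Returns the names of the files that were processed.
    """
    processed = []
    for name in sorted(os.listdir(queue_dir)):
        src = os.path.join(queue_dir, name)
        if not os.path.isfile(src):
            continue  # skip sub-directories
        handler(src)  # hypothetical import step for one combat log
        # Move the file out of the queue so the next run does not see it again.
        os.replace(src, os.path.join(done_dir, name))
        processed.append(name)
    return processed
```

Moving each file out of the queue once it is handled keeps repeated runs from importing the same log twice.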
You really can't run it via Apache. Even if you up the timeout, there's no way a web browser would be able to take it. It has to be run as a batch job on the server.
I haven't completely rejected the idea of a client to process it on your personal PC. I just want to get a working version up first. That, and it's going to require that I reinstall VS.NET. I would rather not have a client, mainly because, being open source, I feel the risk is too great that someone will slip a key logger into the code.
Well, that is a clever solution that would work for people who have complete control over the server.
The thought of a key logger being slipped into the client is a threat I had not considered; but I don't think it's likely, at least not in the official version since I'll personally be scrutinizing any code submitted before it's applied to the source tree. Users would need to be informed of the risks involved in using unofficial versions.
Here are some thoughts from my design that may or may not help you in determining when combat starts and ends. My design is heavily focused on partitioning the raid into boss attempts and a catch-all for trash. To that end, I've been working on ways to determine when encounters begin and end (either with a boss kill or a wipe). My solution requires maintaining a list of boss names and providing a way for users to add names to the list as new raid zones come out or if they want parsing for something more trivial like a 5-man.
For setting the start of boss partitions, in each event where damage is dealt (and the raid is not currently engaged in a boss encounter, just a boolean), I check the name of the hostile entity (determined from the flags describing relationship to the person creating the log) against the boss list and set a partition barrier at that point if a match is found.
For setting the end of boss partitions, starting from the new barrier every event that contains a player (again determined by the entity flags) will look up that player in a player list. If the player is found, do nothing; if the player is not found, insert them into the list and increase the player count. A second variable tracks the number of dead players and increases whenever there is an event for player death and decreases whenever a player reincarnates, resurrects with a soulstone, or is resurrected by a Druid. Combat is over when the size of the player list equals the count of dead players or there is an event for the death of the boss.
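A minimal sketch of the start/end logic just described, in Python for brevity. Everything about the event shape is an assumption: events are taken to be already-parsed dicts with `type`, `src`, `dst`, `src_is_player` and `dst_is_player` keys, `PLAYER_REVIVED` is a made-up stand-in for whatever signals a resurrection, and the sketch takes for granted that player deaths show up as `UNIT_DIED`. Sets stand in for the player list plus the two counters.

```python
DAMAGE_EVENTS = {"SPELL_DAMAGE", "SWING_DAMAGE", "SPELL_PERIODIC_DAMAGE"}


class EncounterTracker:
    """Tracks one boss partition at a time from a stream of parsed events."""

    def __init__(self, boss_names):
        self.boss_names = set(boss_names)
        self.current_boss = None  # None means "not engaged with a boss"
        self.players = set()      # players seen since the partition barrier
        self.dead = set()         # players currently dead

    def _note_players(self, event):
        for name, is_player in ((event["src"], event["src_is_player"]),
                                (event["dst"], event["dst_is_player"])):
            if is_player:
                self.players.add(name)

    def _end(self, result):
        self.current_boss = None
        self.players = set()
        self.dead = set()
        return result

    def feed(self, event):
        """Return 'start', 'kill', 'wipe', or None for a single event."""
        kind = event["type"]
        if self.current_boss is None:
            # Only damage events can open a boss partition.
            if kind in DAMAGE_EVENTS:
                for name in (event["src"], event["dst"]):
                    if name in self.boss_names:
                        self.current_boss = name
                        self._note_players(event)
                        return "start"
            return None

        self._note_players(event)
        if kind == "UNIT_DIED":
            if event["dst"] == self.current_boss:
                return self._end("kill")
            if event["dst_is_player"]:
                self.dead.add(event["dst"])
        elif kind == "PLAYER_REVIVED":  # hypothetical event name
            self.dead.discard(event["dst"])
        # Wipe: every player seen in this partition is dead.
        if self.players and self.dead == self.players:
            return self._end("wipe")
        return None
```

Feeding events one at a time keeps memory use flat, since only the current partition's player sets are held.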
Some problems I have identified include:
1) It's easy to identify a player being resurrected by a Druid because it will show up as a SPELL_CAST_SUCCESS event. I'm not sure if the use of Reincarnation or soulstones generates events (simple to check, I'm just not sure).
2) If a Paladin uses Divine Intervention to save somebody to help recover from the wipe, the player death count will never equal the player count. I think this can be remedied by increasing the death count when a player receives this buff and decreasing it when the buff is removed.
3) There is overhead involved in searching through a list every time damage is dealt, especially the boss list if it contains every boss for every raid zone. The lists would need to be sorted alphabetically to enable binary searches, keeping lookups at O(log n) rather than O(n). Additionally, the client could ask the user for the raid zone(s) prior to parsing to keep the list limited to just the bosses that might be encountered. Another optimization comes from the fact that large numbers of events are generated around a single mob: a short list containing the names of the most recent trash mobs (three seems like a good number) could be searched first to rule a name out based on a match there, before searching the long list of bosses to rule it out based on the absence of a match.
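The lookup optimizations in item 3 might look something like the following Python sketch. `BossLookup` and `recent_size` are hypothetical names; the binary search uses the standard `bisect` module on a sorted list, and the short list of recent trash mobs is a bounded `deque`.

```python
from bisect import bisect_left
from collections import deque


class BossLookup:
    """Sorted boss list with a small cache of recently seen non-boss names."""

    def __init__(self, boss_names, recent_size=3):
        self.bosses = sorted(boss_names)              # sorted for binary search
        self.recent_trash = deque(maxlen=recent_size)  # oldest entry drops off

    def is_boss(self, name):
        # Cheap check first: most events involve the same few trash mobs.
        if name in self.recent_trash:
            return False
        # O(log n) binary search in the sorted boss list.
        i = bisect_left(self.bosses, name)
        if i < len(self.bosses) and self.bosses[i] == name:
            return True
        self.recent_trash.append(name)
        return False
```

In a language with built-in hash sets, a plain set of boss names gives average constant-time lookups and may make both optimizations unnecessary.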
I think you need to look through the combat log and see how the data is stored there. Some of the things you have mentioned aren't even possible. Then I recommend checking WoWWiki so that you can see how the data is formatted. Checking the Lua code of a few of the more popular DPS addons that have been converted to 2.4 is also a good idea. Only then can you begin to figure out how you want to read the data.
How do you want to use this data? I'm on the third rework of the database. Trust me, you can't just import 75 MB of data. A 75 MB combat log has around 4 million rows, and you can't just dump 4 million rows into the database. MySQL wasn't happy. Even if you do make a client, you are going to have the same problem of how to import the data. The file would probably still have to be FTP'ed up, unless you think you can convert a 75 MB file to under 2 MB of XML. exec() and shell_exec() are two very scary PHP functions that should be avoided at all costs.
I don't think having access to crontab is too much of a problem. Anyone who would run this would probably also have phpBB, EQdkp, TeamSpeak and Ventrilo, so they will have shell access. If not, then it will be hosting companies that may some day offer this as well, and they will have the job in crontab. I think I can live with the first few versions not having a client and requiring a crontab entry. It is beta after all, and I think it's probably better that someone knows what they are doing before trying to run this.
If you know .NET we could probably work on a client together. As I have never even seen Java code, I don't think writing the client in Java is something I would attempt.
I do have an excellent understanding of how the combat logs look, how they are formatted, the parameters to each event and their meaning, and the large file sizes from a night of raiding. I'm curious to know what you think is not possible. I also think it's possible to condense all the data down to <2MB of XML simply because it's aggregated data.
I'm not interested in individual events, just the aggregated data. I don't know if you used WWS before 2.4, but there was an option there to generate XML as well as the HTML files when self-hosting. A night's worth of raiding resulted in log files in the 10 to 15 MB range, but the XML generated that kept track of everything was only ~0.5 MB. My XML files will likely be bigger than this, but only because I intend to track data WWS did not. The combat log file sizes are much, much bigger now, but mainly because each line representing an event is much longer and includes more useful information, such as the flags describing the entity in terms of its relationship to the person who generated the log (self, party, raid, outside), its behavior towards the player (hostile, friendly, neutral), how it's controlled, whether it has a raid icon, etc.
There's nothing in my design where I intend to use exec() or a similar function; I only listed it while trying to guess how you're getting the combat log file as input to your PHP parser.
As for how I intend to use this data, I'm generating XML that is divided into partitions, each representing a slice of time from the raid. In each partition (boss attempt/kill) I'm keeping track of each player and the damage they deal with each ability, along with statistics about that ability such as averages, max, total, crit count/percentage, max crit, resist percentages, up-front damage, periodic damage, etc. Basically, for any SPELL_DAMAGE, SWING_DAMAGE, or SPELL_PERIODIC_DAMAGE line from the combat log, I look at the srcName and attribute the data to that person's use of that ability.
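As a rough illustration of that per-player, per-ability aggregation, here is a small Python sketch. It assumes the log lines have already been parsed into `(event_type, src_name, ability, amount, critical)` tuples, which is a simplification and not the real line layout; `AbilityStats`, `aggregate_damage` and the `"Melee"` label for swings are hypothetical, and only a few of the statistics listed above are covered.

```python
from collections import defaultdict
from dataclasses import dataclass

DAMAGE_EVENTS = {"SPELL_DAMAGE", "SWING_DAMAGE", "SPELL_PERIODIC_DAMAGE"}


@dataclass
class AbilityStats:
    hits: int = 0
    total: int = 0
    max_hit: int = 0
    crits: int = 0
    max_crit: int = 0
    periodic_total: int = 0

    def add(self, amount, critical, periodic):
        self.hits += 1
        self.total += amount
        self.max_hit = max(self.max_hit, amount)
        if critical:
            self.crits += 1
            self.max_crit = max(self.max_crit, amount)
        if periodic:
            self.periodic_total += amount

    @property
    def average(self):
        return self.total / self.hits if self.hits else 0.0

    @property
    def crit_pct(self):
        return 100.0 * self.crits / self.hits if self.hits else 0.0


def aggregate_damage(events):
    """Return {src_name: {ability: AbilityStats}} for the damage events."""
    stats = defaultdict(lambda: defaultdict(AbilityStats))
    for kind, src_name, ability, amount, critical in events:
        if kind not in DAMAGE_EVENTS:
            continue  # heals, buffs, etc. are handled elsewhere
        # Swings carry no spell name, so group them under one label.
        name = "Melee" if kind == "SWING_DAMAGE" else ability
        stats[src_name][name].add(amount, critical,
                                  kind == "SPELL_PERIODIC_DAMAGE")
    return stats
```

Because only running totals are kept, the memory needed grows with the number of players and abilities, not with the number of log lines.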
Similar aggregate data for each player's damage received broken down by who dealt the damage and the ability used (although I may limit details to only damage dealt during boss encounters because that is a bit excessive). This is one of the areas I was talking about when I said I intend to track data WWS did not.
Similar stuff with healing and energy gains though not all of the things mentioned will apply (like crits for energy gains).
Whenever I determine a boss attempt to be over, I end the partition, write it to disk, free up any data structures used for that partition, and add any future events to the catch-all partition until I determine combat with another boss has begun.
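Writing one finished partition out could be as simple as the sketch below, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`partition`, `player`, `ability`, `total`) are invented for illustration and are not the project's actual schema; the input is assumed to be a plain `{player: {ability: total_damage}}` mapping.

```python
import xml.etree.ElementTree as ET


def partition_to_xml(name, totals):
    """Serialize one partition's {player: {ability: total}} data to an XML string."""
    part = ET.Element("partition", {"name": name})
    for player in sorted(totals):
        player_el = ET.SubElement(part, "player", {"name": player})
        for ability in sorted(totals[player]):
            ET.SubElement(player_el, "ability", {
                "name": ability,
                "total": str(totals[player][ability]),
            })
    return ET.tostring(part, encoding="unicode")
```

Once the string has been written to disk, the in-memory totals for that partition can be dropped before the next one starts.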
As for what I'm doing with the XML, I'll probably create two separate tools to start. One that generates static XHTML for self-hosting (much like the old WWS client did though with some improvements) and another that does something similar to what you're doing - populates tables in a database. I'm interested in tracking performance over time for the entire raid and individual players and I'll need to be able to pull all relevant data for a player or a boss or combination of the two. It's something I had started using the self-hosting option of WWS but the removal of self-hosting in WWS2 prevents it now.
OK, here's how I am doing it.
The easiest way I have found to tell when combat stops is that the log file stops for 2+ seconds. Even if they start rezzing and buffing it won't matter: they aren't in combat, so the combat log has stopped.
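A gap-based split like this can be sketched in a few lines of Python. The sketch assumes the event timestamps have already been converted to seconds and are sorted; `split_on_gaps` and `min_gap` are hypothetical names.

```python
def split_on_gaps(timestamps, min_gap=2.0):
    """Group sorted timestamps (seconds) into (start, end) combat segments.

    A new segment starts whenever the log is silent for min_gap seconds or more.
    """
    segments = []
    start = prev = None
    for t in timestamps:
        if start is None:
            start = prev = t
        elif t - prev >= min_gap:
            segments.append((start, prev))  # close the previous segment
            start = prev = t
        else:
            prev = t
    if start is not None:
        segments.append((start, prev))
    return segments
```

For example, `split_on_gaps([0.0, 0.5, 1.9, 5.0, 5.5])` yields two segments, because the only silence of two seconds or more is between 1.9 and 5.0.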
During combat, every NPC, PC and pet that does something has one row in the table for each spell they used and who they used it on.
Player X does spell Y to NPC Z
Player X does spell N to NPC Z
I have counters that track the number of times they did this and whether it was a crit or not. So in the end you can calculate DPS, or you can see down to each NPC what they did.
Detecting over healing wasn't easy. Most addons can just check the max HP of the player and/or NPC. We can't do that. So what I did was say each player starts with 0 HP. Then as combat goes on I subtract damage and add healing. But I always check first: if the healing would put the player over 0 HP, then the rest is counted as over healing.
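The over-healing trick can be written down compactly. This Python sketch assumes simplified `(kind, player, amount)` tuples where `kind` is `"damage"` or `"heal"`; the function name is hypothetical.

```python
from collections import defaultdict


def estimate_overhealing(events):
    """Estimate over-healing per player without knowing anyone's max HP.

    Each player's health is tracked relative to an unknown maximum:
    it starts at 0, goes negative with damage, and is never allowed above 0.
    """
    health = defaultdict(int)
    overheal = defaultdict(int)
    for kind, player, amount in events:
        if kind == "damage":
            health[player] -= amount
        elif kind == "heal":
            room = -health[player]          # how much healing can still land
            effective = min(amount, room)
            overheal[player] += amount - effective
            health[player] += effective
    return dict(overheal)
```

The estimate treats everyone as being at full health when the log starts, so healing on a player who was already hurt at that point is counted as over-healing.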
How much PHP do you know? Would it help if I uploaded an alpha version where not everything works or has been tested? If you want to see some output, try <a href=http://phprs.wowportal.dk/index.php?show=do>phprs Beta output</a>
So you compare the event's time stamp to the one before it and if it's greater than two seconds you consider that to signal the end of combat? Which mod do you use to start and stop /combatlog ? From your post it sounds like you have a mod that enables and disables it only when in combat, which would make your 2+ second scenario effective. I'm still using AutoCL and it doesn't start or stop except when changing into specific zones so, if I understand your approach correctly, if they started resurrecting players within 2 seconds after combat stops, there would not exist a 2 second gap between events; at least, not in a log moderated by AutoCL.
I like your solution for over-healing.
I've been using PHP for a few years so feel free to post the code and I can help test, debug, and implement. It would probably also help if you posted a MySQL script to set up the database to match what you currently have.
I just type /combatlog, which starts it, and I leave it running. My Lua is very limited, only enough to maybe fix a bug in a mod I use, not more than that.
The only way I could think of to test whether it was a wipe or not was to check if the gap is more than 2 sec. A second check could be whether more than, say, 5 sec have gone by without damage being done to the mob or a player. Then they are probably rezzing and rebuffing. There isn't a way to test if a player dies, only if the mob dies.
I won't have time to upload a version for you until late next week. I haven't used SourceForge in years and need to try to remember how to upload the CVS tree. I will give you a copy of everything you need to get it running. I try to comment the code, so you should be able to follow it. The import module is just one big class. There really does need to be a way to import the data without sending the processor to 80%+ on the server for 1+ hours :).
Well, you are right, UNIT_DIED only reports mob deaths; I had it in my head that it reported all deaths. Probably because the pre 2.4 logs did explicitly have lines for player deaths.
The problem I see with your method is mainly one of fine-tuning. I can think of scenarios where a two second gap could exist without the fight actually being over. For example, it might report a wipe on Ragnaros when Ragnaros submerges and the raid is waiting those 15 or so seconds for the sons to spawn. The same thing could happen with Nightbane in Karazhan; after the skeletons are dead, it takes Nightbane about 7-10 seconds to fly back down.
There is a chance that there will be some healing going on, preventing a two second lapse; but it's not guaranteed, especially if people are worried about pulling aggro when he lands since it resets the threat table.
Obviously too low of a threshold for inactivity results in incorrectly saying a wipe occurred and too high of a threshold might let wipes slip by unnoticed. It's this fine tuning dependent upon boss mechanics that was motivating me to find a more absolute method for determining end of combat; but that's out the window until Blizzard adds player deaths back to the combat log.
I would lean towards a slightly higher threshold than two seconds, maybe as high as five seconds. The second check looking for damage in a ~20 second stretch after a short period of inactivity would probably indicate a wipe with high probability of accuracy.
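One reading of this two-stage check, as a Python sketch: a silence of at least `idle_gap` seconds only counts as a wipe if no damage event follows within `quiet_window` seconds of the last event before the silence. The `(timestamp_seconds, is_damage)` tuples, the function name and both parameter names are assumptions, and the default values are just the numbers floated above, not tuned thresholds.

```python
def find_wipe_time(events, idle_gap=5.0, quiet_window=20.0):
    """Return the timestamp of the last event before a suspected wipe, or None.

    events: list of (timestamp_seconds, is_damage) tuples sorted by time.
    """
    for i in range(len(events) - 1):
        t = events[i][0]
        next_t = events[i + 1][0]
        if next_t - t < idle_gap:
            continue  # first check: no period of inactivity here
        # Second check: does any damage land within quiet_window of t?
        damage_follows = any(
            is_damage and ts <= t + quiet_window
            for ts, is_damage in events[i + 1:]
        )
        if not damage_follows:
            return t
    return None
```

With these defaults, a 15 second pause followed by renewed damage (the Ragnaros submerge case) is not flagged, while a pause followed only by non-damage events for 20 seconds is. The end of the log itself is not treated as a wipe.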
I'll start getting as many logs as I can from friends in different guilds since it seems like fine-tuning the thresholds will be an empirical process that needs to be tested as much as possible on every boss in the game.
My guild clears MH and we are on ROS in BT, so I can get combat logs for those. I can probably get into a Kara or ZA raid over the next few weeks. I don't think I have any way of getting combat logs for, say, SSC, TK, Gruul or Mag. The guild master might have some old combat logs lying around from when we were doing them, but I'm not sure. He used to post stuff to WWS.
Note: that's the only thing I haven't added to my import script yet. Right now I'm only running an import on one boss attempt. For the reasons you stated above, I don't want to have to recode it until I come up with the best way of doing it.
Now what happened to "HTML tags will display in your post"? BTW, I'm going to be mostly away for the next week, but I will try to check my email on and off. Oh, and I can almost always be found on WoWWiki's IRC channel if you want to chat.
I'm also interested in creating an open source WoW combat log parser (client 2.4 or higher). I haven't seen any activity here yet so I'm assuming you're still in the early stages of design. What features are you considering? For my project, I'm looking to create a Java based client that generates XML representing the aggregated data. I like the idea of using a well documented and easy to understand XML intermediate representation to allow any number of utilities (including possibly your's) to generate web pages, populate tables, etc that suit the user's specific needs.
This project of mine is still early in the design stage as I'm still trying to compile a list of requirements and features. Perhaps we can share some ideas to make both of our projects better or collaborate together.
Think what i have now would be considered Beta testing. Im trying to get it to the point that i can import a nights raid say 75mb log file in under an hour. The server i have at home is a P2 so not the fastest at processing. All the data gets stored into mysql i have reports displaying information in the form of simple web pages. I have been thinking of doing something with gd lib to do bar graphs and pie charts.
Im trying to avoid having any kind of client to run this. The server is capable of processing the combat log its self. Mysql is more then able to deal with the reports. The only issue is that you will have to FTP the combat log up to the server because php file upload is 2mb unless you change that in php.ini which i don't think everyone is going to have access to.
Reports done:
1. total raid damage, dps, healing
2. player breakdown. (by spell)
a. damage done.
b. healing done.
c. damage taken
d. healing taken.
3. player summary. basically just total's of everything dps and the like.
4. buffs and debuffs. Any buffs put on the player. you can see down to the number of pots this person took.
works but needs testing.
1. over healing.
2. merged pet data.
Issues and needs fixing.
1. _ENERGIZE mana returned from mana totems and shadow priests. Need to decide where to save this.
2. Deciding when combat has ended and starting new combat.
3. logging player deaths.
Think thats all for now.
The issue you mention of file size restrictions on uploads is one of the biggest initial reasons why I decided to aggregate the data on the client side. Although, after more consideration in the interest of making this as modular and robust as possible for people to create their own interfaces representing the aggregated data, client side makes more sense for my project.
Something I've been wondering is how are you starting the parser after you upload the combat log via FTP? I can only think of three ways to do that:
1) Invoke PHP through it's apache module (or IIS if you prefer) and provide the path to the file on the server as an argument. But that would run into problems with timeouts on both the client and server side as well as PHP's default max execution time of 30 seconds which often can't be changed on production servers since most run in safe mode.
2) Run the script through the CLI which is admittedly enabled by default since 4.3 but requires the user to have access to the command line which is not typical of hosting companies in my experience.
3) Run the script through the CLI by making an exec() or similar call in another script but this is usually restricted on production servers and would probably also at least cause the appearance of time outs since I think those calls all wait for the child to return.
Am I missing something or do you have a clever solution?
file is uploaded to a que dir. i have a job set up in cron tab on the server that checks for files in that dir. If it finds one it process it.
You really cant run it via apache even if you up the time out theres no way a web browser would be able to take it. It has to be run as a batch job on the server.
I haven't completely rejected the idea of a client to process it on your personal pc. I just want to get a working version of it up. That and its going to require i reinstall VS.net again. I would rather not have a client Mainly because being open source i feel the risk is going to be to grate that someone will slip a key logger into the code.
Well, that is a clever solution that would work for people who have complete control over the server.
The thought of a key logger being slipped into the client is a threat I had not considered; but I don't think it's likely, at least not in the official version since I'll personally be scrutinizing any code submitted before it's applied to the source tree. Users would need to be informed of the risks involved in using unofficial versions.
Here are some thoughts from my design that may or may not help you in determining when combat starts and ends. My design is heavily focused on partitioning the raid into boss attempts and a catch-all for trash. To that end, I've been working on ways to determine when encounters begin and end (either with a boss kill or a wipe). My solution requires maintaining a list of boss names and also provide a way for users to add names to the list as new raid zones come out or if they want parsing for something more trivial like a 5 man.
For setting the start of boss partitions, in each event where damage is dealt (and the raid is not currently engaged in a boss encounter, just a boolean), I check the name of the hostile entity (determined from the flags describing relationship to the person creating the log) against the boss list and set a partition barrier at that point if a match is found.
For setting the end of boss partitions, starting from the new barrier every event that contains a player (again determined by the entity flags) will look up that player in a player list. If the player is found, do nothing; if the player is not found, insert them into the list and increase the player count. A second variable tracks the number of dead players and increases whenever there is an event for player death and decreases whenever a player reincarnates, resurrects with a soulstone, or is resurrected by a Druid. Combat is over when the size of the player list equals the count of dead players or there is an event for the death of the boss.
Some problems I have identified include:
1) It's possible to easily identify a player being resurrected by a Druid because it will show up as a SPELL_CAST_SUCCESS event. I'm not sure if the use of reincarnation or soul stones generate events (simple to check, I'm just not sure).
2) If a Paladin uses divine intervention to save somebody to help recover from the wipe, the player death count will never equal the player count. I think this can be remedied by increasing the death count when a player receives this buff and decreasing it when the buff is removed.
3) The overhead involved in searching through a list every time damage is dealt, especially the boss list if it contains every boss for every raid zone. The lists would need to be alphabetical to enable binary searches keeping it O(log n) down from O(n). Additionally, the client could ask the user for the raid zone(s) prior to parsing to keep the list limited to just the bosses that might be encountered. Another optimization comes from the fact that large numbers of events are generated around a single mob and a short list containing the names of the most recent trash mobs (three seems like a good number) could be searched first to rule out based on existing matches there before searching the long list of bosses to rule out based on the absence of a match.
I think you need to look though the combat log and see how the data is stored there. Some of the things you have mentioned arn't even possible. Then i recommend checking wowwiki so that you can see how the data is formated. checking the lua code for a few of the more popular dps addons that have been converted to 2.4 is also a good idea. Only then can you begin to figure out how you want to read the data.
How do you want to use this data? I'm on the third rework of the database. Trust me cant just import 75 mb of data. A 75 mb combat log has around 4 million rows. You cant just dump 4 million rows into the database trust me. Mysql wasn't happy. Even if you do make a client you are going to have the same problems with how to import the data. the file would probably still have to be ftp'ed up. unless you think you can convert a 75 mb file to under 2mb of xml. using exec() or shell() are two very scary php functions that should be avoided at all costs.
I don't think having access to cron tab is to much of a problem. Anyone that would run this would probably also have phpbb, eqdkp, teamspeak and ventrillo so they will have shell access. if not then it will be hosting company's that may some day offer this as well and they will have the job in cron tab. I think i can live with the first few versions not having a client and requiring it be in cron tab. It is beta after all i think its probably better that someone know what they are doing before trying to run this.
If you know .net we could probably work on a client together. As i have never even seen a java code i don't think doing the client for me in java is something i would attempt.
I do have an excellent understanding of how the combat logs look, how they are formatted, the parameters to each event and their meaning, and the large file sizes from a night of raiding. I'm curious to know what you think is not possible. I also think it's possible to condense all the data down to <2MB of XML simply because it's aggregated data.
I'm not interested in individual events, just the aggregated data. I don't know if you used WWS before 2.4 but there was an option there to generate XML as well as the HTML files when self hosting. A night's worth of raiding resulted in log files in the 10 to 15 mb range but the XML generated that kept track of everything was only ~0.5 mb. My XML files will likely be bigger than this, but only because I intend to track data WWS did not. The combat log file sizes are much much bigger now, but mainly because each line representing an event is much longer now and includes more useful information such as the flags describing the entity in terms of relationship to the person who generated the log (self, party, raid, outside), it's behavior towards the player (hostile, friendly, neutral), how it's controlled, if it has a raid icon, etc.
There's nothing in my design where I intend to use exec or similar function, I was just listing that trying to guess how you're getting the combat log file as input to your PHP parser.
As for how I intend to use this data, I'm generating XML that is divided into partitions that represent a slice of time from the raid. In each partition (boss attempt/kill) I'm keeping track of each player and the damage they deal with each ability and statistics about that ability such as averages, max, total, crit count/percentage, max crit, resist percentages, up front damage, periodic damage, etc. Basically, any SPELL_DAMAGE, SWING_DAMAGE, or SPELL_PERIODIC_DAMAGE lines from the combat log, look at the srcName and attribute the data to that person's use of that ability.
Similar aggregate data for each player's damage received broken down by who dealt the damage and the ability used (although I may limit details to only damage dealt during boss encounters because that is a bit excessive). This is one of the areas I was talking about when I said I intend to track data WWS did not.
Similar stuff with healing and energy gains though not all of the things mentioned will apply (like crits for energy gains).
Whenever I determine a boss attempt to be over, I end the partition, write it to disk, free up any data structures used for that partition, and add any future events to the catch-all partition until I determine combat with another boss has begun.
As for what I'm doing with the XML, I'll probably create two separate tools to start. One that generates static XHTML for self-hosting (much like the old WWS client did though with some improvements) and another that does something similar to what you're doing - populates tables in a database. I'm interested in tracking performance over time for the entire raid and individual players and I'll need to be able to pull all relevant data for a player or a boss or combination of the two. It's something I had started using the self-hosting option of WWS but the removal of self-hosting in WWS2 prevents it now.
Ok heres how i am doing it.
The easiest way i have found to tell when combat stops is the fact that the log file stops for 2+ seconds. Even if they start rezing and buffing it wont matter they arnt in combat so the combat log has stopped.
During combat, every NPC, PC, and pet that does something has one row in the table for each spell they used and who they used it on.
Player X does spell Y to NPC Z
Player X does spell N to NPC Z
I have counters that track the number of times they did this and whether it was a crit or not. So in the end you can calculate DPS, or you can see what they did down to each NPC.
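A small Python sketch of the one-row-per-(source, spell, target) idea, with an in-memory dict standing in for the MySQL table. The tuple layout, the column names `count`, `crits`, and `damage`, and the functions `build_rows` and `dps` are assumptions for illustration, not the actual schema.

```python
from collections import defaultdict


def build_rows(events):
    """events: iterable of (source, spell, target, amount, is_crit) tuples."""
    rows = defaultdict(lambda: {"count": 0, "crits": 0, "damage": 0})
    for source, spell, target, amount, is_crit in events:
        row = rows[(source, spell, target)]
        row["count"] += 1
        row["crits"] += 1 if is_crit else 0
        row["damage"] += amount
    return rows


def dps(rows, source, duration_seconds):
    """Total damage by `source` across all spells and targets, per second."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    total = sum(
        row["damage"] for (src, _spell, _target), row in rows.items() if src == source
    )
    return total / duration_seconds
```

Because the target is part of the key, the same rows answer both the overall DPS question and the per-NPC breakdown.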
Detecting overhealing wasn't easy. Most addons can just check the max HP of the player and/or NPC. We can't do that. So what I did was say, OK, each player starts with 0 HP. Then as combat goes on I subtract damage and add healing. But I always check first: if the healing would put the player over 0 HP, then the rest is counted as overhealing.
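Here is that bookkeeping as a short Python sketch. The class and method names are invented, and overhealing is tallied per healed player here, which is an assumption; the same idea works if it is credited to the healer instead.

```python
class OverhealTracker:
    """Estimate overhealing without knowing anyone's max HP.

    Every player starts at a relative HP of 0 (treated as full health).
    Damage pushes the value below 0; healing can only bring it back up to 0.
    """

    def __init__(self):
        self.hp = {}        # player -> relative HP, always <= 0
        self.overheal = {}  # player -> total overhealing received

    def damage(self, player, amount):
        self.hp[player] = self.hp.get(player, 0) - amount

    def heal(self, player, amount):
        current = self.hp.get(player, 0)
        effective = min(amount, -current)  # only the missing HP can be healed
        over = amount - effective
        self.hp[player] = current + effective
        self.overheal[player] = self.overheal.get(player, 0) + over
        return effective, over
```

The estimate is only as good as the assumption that everyone is at full health when tracking starts; a player who enters the fight already hurt will have some real healing counted as overhealing.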
How much PHP do you know? Would it help if I uploaded an alpha version (not everything works or has been tested)? If you want to see some output, try <a href=http://phprs.wowportal.dk/index.php?show=do>phprs Beta output</a>.
So you compare the event's time stamp to the one before it and if it's greater than two seconds you consider that to signal the end of combat? Which mod do you use to start and stop /combatlog ? From your post it sounds like you have a mod that enables and disables it only when in combat, which would make your 2+ second scenario effective. I'm still using AutoCL and it doesn't start or stop except when changing into specific zones so, if I understand your approach correctly, if they started resurrecting players within 2 seconds after combat stops, there would not exist a 2 second gap between events; at least, not in a log moderated by AutoCL.
I like your solution for over-healing.
I've been using PHP for a few years so feel free to post the code and I can help test, debug, and implement. It would probably also help if you posted a MySQL script to set up the database to match what you currently have.
I just use /combatlog to start it and leave it running. My Lua is very limited, only enough to maybe fix a bug in a mod I use, not more than that.
The only way I could think of to test whether it was a wipe or not was to check if the gap is more than 2 sec. A second check could be whether more than, say, 5 sec have gone by without damage being done to the mob or a player. Then they are probably rezzing and rebuffing. There isn't a way to test if a player dies, only if the mob dies.
I won't have time to upload a version for you until late next week. I haven't used SourceForge in years and need to try and remember how to upload the CVS tree. I will give you a copy of everything you need to get it running. I try to comment the code, so you should be able to follow it. The import module is just one big class. There really does need to be a way to import the data without sending the processor to 80%+ on the server for 1+ hours :).
Well, you are right, UNIT_DIED only reports mob deaths; I had it in my head that it reported all deaths. Probably because the pre 2.4 logs did explicitly have lines for player deaths.
The problem I see with your method is mainly one of fine-tuning. I can think of scenarios where a two second gap could exist without the fight actually being over. For example, it might report a wipe on Ragnaros when Ragnaros submerged and the raid is waiting that 15 or so seconds for the sons to spawn. The same thing could happen with Nightbane in Karazhan; after the skeletons are dead, it takes Nightbane about 7-10 seconds to fly back down.
There is a chance that there will be some healing going on, preventing a two second lapse; but it's not guaranteed, especially if people are worried about pulling aggro when he lands since it resets the threat table.
Obviously too low of a threshold for inactivity results in incorrectly saying a wipe occurred and too high of a threshold might let wipes slip by unnoticed. It's this fine tuning dependent upon boss mechanics that was motivating me to find a more absolute method for determining end of combat; but that's out the window until Blizzard adds player deaths back to the combat log.
I would lean towards a slightly higher threshold than two seconds, maybe as high as five seconds. The second check, looking for damage in a ~20 second stretch after a short period of inactivity, would probably identify a wipe with high accuracy when no damage is found.
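To make the two-stage idea concrete, here is a rough Python sketch. Events are assumed to be `(timestamp_in_seconds, is_damage)` tuples sorted by time; `find_combat_ends`, `gap`, and `quiet_window` are hypothetical names, and the 5 and 20 second defaults are just the ballpark figures from this post, not tuned values.

```python
def find_combat_ends(events, gap=5.0, quiet_window=20.0):
    """Return timestamps after which combat is judged to have ended.

    Stage 1: the log is silent for at least `gap` seconds.
    Stage 2: no damage event occurs within `quiet_window` seconds of the
             last event before that silence.
    """
    ends = []
    for i in range(len(events) - 1):
        ts = events[i][0]
        if events[i + 1][0] - ts < gap:
            continue  # stage 1 not met: no period of inactivity here
        damage_follows = False
        for later_ts, is_damage in events[i + 1:]:
            if later_ts - ts > quiet_window:
                break  # past the window, stop looking
            if is_damage:
                damage_follows = True
                break
        if not damage_follows:
            ends.append(ts)
    return ends
```

A Ragnaros-style pause, where damage resumes a few seconds after the silence, passes stage 1 but fails stage 2, so it is not reported as the end of the fight. The end of the log itself is not treated as a combat end in this sketch.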
I'll start getting as many logs as I can from friends in different guilds since it seems like fine-tuning the thresholds will be an empirical process that needs to be tested as much as possible on every boss in the game.
My guild clears MH and we are on RoS in BT, so I can get combat logs for those. I can probably get into a Kara or ZA raid over the next few weeks. I don't think I have any way of getting combat logs for, say, SSC, TK, Gruul, or Mag. The Guild Master might have some old combat logs lying around from when we were doing them, but I'm not sure; he used to post stuff to WWS.
Note: that's the only thing I haven't added to my import script yet. Right now I'm only running an import on one boss attempt. For the reasons you stated above, I don't want to have to recode it until I come up with the best way of doing it.
Now, what happened to "HTML tags will display in your post"? BTW, I'm going to be mostly away for the next week, but I will try to check my email on and off. Oh, and I can almost always be found on WoWWiki's IRC channel if you want to chat.