From: Phil R. <pr...@en...> - 2015-10-26 18:46:09
Hi all,

This past July, I gave a talk about using Python to examine malware:

http://www.slideshare.net/mrphilroth/examining-malware-with-python
https://www.youtube.com/watch?v=2gyAemhbxnE

In it, I talk about using machine learning techniques to classify malware. Specifically, I compare the performance of classification models based on instructions generated by IDA Pro and instructions I generated myself with Capstone. Someone with this project made a comment about the talk on Twitter:

https://twitter.com/capstone_engine/status/624580597650862080

Next month, I’m going to be giving a talk to a Meetup group in San Francisco where I’m going to include some of the same material. I wanted to check here before I give the talk so that I don’t misrepresent what Capstone is and is not. I don’t feel like I yet totally understand the issues behind that tweet. My message is going to be:

“Disassembled instructions are a great feature to use when using machine learning models to classify malware. Results can vary based on what disassembler is used. I’ve found that a model based on features from a single pass disassembler like Capstone will produce slightly worse results than one based on IDA Pro disassembly. But the ease of use and repeatability of the results make it a better choice.”

Is the error in those statements referring to Capstone Engine as the disassembler? Should I be referring to LLVM MC as the disassembler and Capstone as the framework through which I used it? Is there some other problem that I don’t yet understand?

Thanks for any feedback.

Phil Roth
Data Scientist
pr...@en...
C: 240-997-8251
www.endgame.com <http://www.endgame.com/>
ENDGAME