From: Phil R. <pr...@en...> - 2015-10-27 15:38:47
Thanks for the responses. The Meetup is advertised here:
http://www.meetup.com/Data-Mining-for-Cyber-Security/events/225431456/

My work with disassembly from Capstone will be a small part of just one of the talks, but please come out if you’re interested.

“what do you mean by "single pass disassembler"? this is how all the disassemblers work, not only Capstone.”

I’m probably confusing reversing frameworks with disassemblers. It’s my understanding that Capstone disassembles binary data as the data is encountered, whereas a reversing framework analyzes the entire input, works out the control flow, and decides what is a function, what is a subroutine, and what is code versus data. In the SciPy talk, I refer to all of this work as “doing an analysis pass” before getting disassembly.

“also, can you elaborate where IDA produces better result?”

In this case, “better result” means that models trained on the IDA disassembly score higher on the machine learning competition’s metric. I ended the story there in the summer, but I’ve since been convinced that the better competition score, while great, is not really necessary in practice.

“keep in mind that IDA is a complicated tool which does a lot more than just disassembling, why Capstone is designed to do just one simple thing: disassemble the binary you feed it. more complicated process must be done by your programs.”

Definitely. I will stress this during the talk. My current view is actually that the complications of running a full analysis tool like IDA are not worth the small performance gain. So my message is now edited to something like this:

“Disassembled instructions are a great feature to use when classifying malware with machine learning models. I used Capstone, a simple and easy-to-use disassembler, through its Python interface.
If more complicated code analysis and reversing tools are used to generate the disassembled instructions, your classification models will provide slightly better results. I’ve found that the benefits of using the simpler tools outweigh the slightly degraded classification power.”

Phil Roth
Data Scientist
pr...@en...
C: 240-997-8251
www.endgame.com <http://www.endgame.com/>
ENDGAME

From: Jay Oster <ja...@ko...>
Date: Tuesday, October 27, 2015 at 1:56 AM
To: "Capstone disassembly framework (www.capstone-engine.org)" <cap...@li...>
Cc: Phil Roth <pr...@en...>
Subject: Re: [Capstone-users] Capstone Engine is a Framework?

Hi Phil,

Which meetup will you be speaking at? I'll try to attend! (SF local here.) Also, I'd be willing to chat a bit in regard to Capstone, and share some thoughts and ideas.

Cheers,
Jay

On Mon, Oct 26, 2015 at 7:49 PM, Nguyen Anh Quynh <aq...@gm...> wrote:
>
>
> On Tue, Oct 27, 2015 at 2:30 AM, Phil Roth <pr...@en...> wrote:
>> Hi all,
>>
>> This past July, I gave a talk about using Python to examine malware:
>> http://www.slideshare.net/mrphilroth/examining-malware-with-python
>> https://www.youtube.com/watch?v=2gyAemhbxnE
>
> thanks for sharing this. it looks like a nice work, congrats!
>
>>
>> In it, I talk about using machine learning techniques to classify malware.
>> Specifically, I compare the performance of classification models based on
>> instructions generated by IDA Pro and instructions I generated myself with
>> Capstone. Someone with this project made a comment about the talk on Twitter:
>> https://twitter.com/capstone_engine/status/624580597650862080
>>
>> Next month, I’m going to be giving a talk to a Meetup group in San Francisco
>> where I’m going to include some of the same material. I wanted to check here
>> before I give the talk so that I don’t misrepresent what Capstone is and is
>> not. I don’t feel like I yet totally understand the issues behind that tweet.
>>
>> My message is going to be: “Disassembled instructions are a great feature to
>> use when using machine learning models to classify malware. Results can vary
>> based on what disassembler is used. I’ve found that a model based on features
>> from a single pass disassembler like Capstone will produce slightly worse
>> results than one based on IDA Pro disassembly. But the ease of use and
>> repeatability of the results make it a better choice.”
>
> what do you mean by "single pass disassembler"? this is how all the
> disassemblers work, not only Capstone.
>
> also, can you elaborate where IDA produces better result?
>
> keep in mind that IDA is a complicated tool which does a lot more than just
> disassembling, why Capstone is designed to do just one simple thing:
> disassemble the binary you feed it. more complicated process must be done by
> your programs.
>
>
>>
>> Is the error in those statements referring to Capstone Engine as the
>> disassembler? Should I be referring to LLVM MC as the disassembler and
>> Capstone as the framework through which I used it? Is there some other
>> problem that I don’t yet understand?
>
> Capstone is based on LLVM MC, but we go far beyond that:
> http://www.capstone-engine.org/beyond_llvm.html
>
> let me know if you have more questions, thanks.
>
> Quynh
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Capstone-users mailing list
> Cap...@li...
> https://lists.sourceforge.net/lists/listinfo/capstone-users
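
The distinction discussed in this thread, between decoding bytes in the order they are encountered and first working out control flow to separate code from data, can be sketched with a toy example. Everything in it is an assumption made for illustration: the four-opcode instruction set, the names `TOY_ISA`, `decode`, `linear_sweep` and `follow_control_flow`, and the sample bytes are all invented. It does not use Capstone and is not how Capstone or IDA is implemented.

```python
# Toy instruction set, invented for this sketch: opcode -> (mnemonic, length).
TOY_ISA = {
    0x01: ("nop", 1),
    0x02: ("jmp", 2),   # second byte is an absolute target offset
    0x03: ("ret", 1),
    0x04: ("push", 2),  # second byte is an immediate value
}


def decode(code, offset):
    """Decode one instruction at offset; unknown or truncated bytes become 'db'."""
    mnemonic, length = TOY_ISA.get(code[offset], ("db", 1))
    if offset + length > len(code):
        return "db", 1
    return mnemonic, length


def linear_sweep(code):
    """Decode bytes strictly in order, treating every byte as code."""
    out, offset = [], 0
    while offset < len(code):
        mnemonic, length = decode(code, offset)
        out.append((offset, mnemonic))
        offset += length
    return out


def follow_control_flow(code, entry=0):
    """Decode only the bytes reachable from entry by following jumps."""
    out, seen, work = [], set(), [entry]
    while work:
        offset = work.pop()
        while 0 <= offset < len(code) and offset not in seen:
            seen.add(offset)
            mnemonic, length = decode(code, offset)
            out.append((offset, mnemonic))
            if mnemonic == "jmp":
                work.append(code[offset + 1])  # continue at the jump target
                break
            if mnemonic in ("ret", "db"):
                break  # this path ends here
            offset += length
    return sorted(out)


# push 7; jmp 6; two data bytes (0x04, 0x03); nop; ret
blob = bytes([0x04, 0x07, 0x02, 0x06, 0x04, 0x03, 0x01, 0x03])
sweep_result = linear_sweep(blob)
flow_result = follow_control_flow(blob)
print(sweep_result)
print(flow_result)
```

In this made-up input, the in-order pass reports the two data bytes at offset 4 as a `push` instruction, while the control-flow pass skips them because nothing jumps there. A difference of that kind in the resulting instruction stream is one plausible reason the features fed to a classifier could differ between the two approaches.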