Tag Archives: adobe

Adobe announces Source Code font

Today the Adobe Type team announced the release of their second open source type family, Source Code. This monospaced typeface is an adaptation of the open source type family released last month, Source Sans, which to date has been downloaded over 67,000 times from SourceForge. All six weights of Source Code and the source files can be downloaded from the Open@Adobe portal on SourceForge.

To learn more about the inspiration behind Source Code and how the design was adapted from Source Sans, refer to Paul Hunt’s Typblography post.

The Anvil Podcast: Malware Classifier

Rich: Adobe hosts a number of Open Source projects on SourceForge, in their Open@Adobe site. These projects are developed by Adobe employees, and I recently spoke with Karthik Raman, who has worked on a project called Malware Classifier. The Malware Classifier is a set of machine learning algorithms for identifying malicious vs. clean binaries for Win32 operating systems.

If you’d like to have your project featured on the SourceForge podcast, just drop me a note and we’ll schedule something.

If the embedded audio player below doesn’t work for you, you can download the audio in mp3 or ogg formats.

You can subscribe to this, and future podcasts, in iTunes or elsewhere, at http://feeds.feedburner.com/sourceforge/podcasts, and it’s also listed in the iTunes store.

Here’s my conversation with Karthik.

Rich: We’re talking about the Malware Classifier.

Karthik: Right. It’s a project up on the Adobe Open Source site, it’s called Malware Classifier.

Rich: Tell me something about the Malware Classifier. What is it trying to accomplish?

Karthik: It’s a tool that uses machine learning to try to quickly determine whether a binary under analysis – a Win32 binary – is possibly malware or a clean file. It uses four machine learning algorithms that were generated by running certain classifiers against a data set of about 100,000 malicious programs and 16,000 clean programs. This is part of some research I did when I was a grad student at U.C. Irvine, and something I continued to do when I started working at Adobe about a year and a half ago. The tool released on SourceForge is the culmination of the research I did, and it incorporates the distilled versions of four of the six classifiers that I used in my research.

Rich: When you say it uses a machine learning algorithm, does it have a feedback loop where you tell it whether it’s correct or not, and then it learns further from that, or it’s just based on the data that it’s already got?

Karthik: It’s more the latter. There’s no learning happening within the source code itself. It’s a result of training that happened in advance, by training the classifiers against the data set that we discussed a minute ago. The results of that training were incorporated into the script. If you think about it, it really is simple. I’ve labelled the four algorithms – the classifiers – that I’m using. If you look at the Python source code, there’s a bunch of decision trees that incorporate the learning that the algorithms experienced when they were training with the data set. My hope is really that people will look at this stuff, and if they’re interested in machine learning and malware classification, either use the tool themselves, or extend it by running their own machine learning algorithm and extending the current set of four classifiers, or by writing their own classifiers.
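To make the idea of a pre-trained, hard-coded decision tree concrete, here is a minimal Python sketch. The feature names, thresholds, and tree shape are hypothetical, invented purely for illustration; they are not taken from the Malware Classifier source. The point is only that no learning happens at run time: the function compares a few numeric values against fixed cut-offs that an earlier training step would have produced.

```python
from typing import Mapping


def classify(features: Mapping[str, int]) -> str:
    """Walk a tiny hard-coded decision tree and return a label.

    All feature names and thresholds below are hypothetical placeholders.
    In a real tool they would come from an offline training run.
    """
    if features["debug_size"] <= 0:
        # No debug information: look at a second feature.
        if features["export_size"] <= 0:
            return "malicious"
        return "clean"
    # Debug information present: check a third feature.
    if features["number_of_sections"] > 8:
        return "malicious"
    return "clean"


# A made-up sample, not data from the research.
sample = {"debug_size": 0, "export_size": 0, "number_of_sections": 4}
print(classify(sample))
```

Extending a tool of this kind, as described above, would mean retraining on new data and replacing the fixed thresholds, or adding further functions of the same shape for additional classifiers.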

Rich: I’m curious if your research gave you any idea of how the classification of “malicious” might evolve over time. Would a tool like this run on today’s software work on software from six years ago, or six years in the future, do you think? Do the “malicious” techniques tend to persist over time?

Karthik: I have to take a couple of steps back and talk about what it is that these classifiers use to make a determination of whether something is possibly malicious, or something is clean. I used a technique called “feature reduction”. The end product of my research was that I identified seven features within the file format these binaries are compiled in – the file format is called the P.E., or portable executable, format. Essentially, the values of these seven features are compared in a large decision tree in each of the four classifiers. The second aspect of your question is, would the classifiers be relevant to data or files from six years ago, or files that are compiled in the future? I think in general, the problem in machine learning is that there is a possibility of training our models to fit the data at hand – sort of over-fitting the problem – and I accept that’s a valid concern here as well because, for example, I only had 16,000 clean programs to train with, and I had 100,000 malicious programs. So you could argue that the algorithm learned that malicious programs all share the characteristics of those 100,000 files, and clean programs all share only the characteristics of the 16,000 files. So there’s the disparity in the size of the data set, and also the specificity of the files that are used in this data set. The overall purpose of this project was to evangelize the idea that one could use machine learning, with a limited number of features, to solve a problem within given parameters, with an established false positive and true positive rate. I don’t expect this program to be used commercially; it’s just the idea that I’m trying to spread by the use of this tool.
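The disparity Karthik mentions can be put in numbers with a short Python sketch. The data set sizes (100,000 malicious, 16,000 clean) come from the interview; the confusion-matrix counts further down are hypothetical and are not results from the research. The first function shows why raw accuracy is a weak yardstick on an imbalanced set: a model that labels everything "malicious" already scores about 86%. The second shows how the true positive and false positive rates are computed.

```python
def majority_baseline_accuracy(n_malicious: int, n_clean: int) -> float:
    """Accuracy of a trivial model that always predicts the larger class."""
    return max(n_malicious, n_clean) / (n_malicious + n_clean)


def rates(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (true positive rate, false positive rate)."""
    tpr = tp / (tp + fn)  # share of malicious files correctly flagged
    fpr = fp / (fp + tn)  # share of clean files wrongly flagged
    return tpr, fpr


# Data set sizes quoted in the interview.
baseline = majority_baseline_accuracy(100_000, 16_000)
print(f"always-malicious baseline accuracy: {baseline:.3f}")

# Hypothetical counts for illustration only.
tpr, fpr = rates(tp=98_000, fp=800, tn=15_200, fn=2_000)
print(f"TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Reporting both rates, rather than a single accuracy figure, is what lets a reader judge a classifier trained on unevenly sized classes.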

Rich: Are you looking for a community of people to become involved in this project to move it forward, or is it pretty much done?

Karthik: I’ve given talks at a few conferences on this topic, and I’m hoping the community will look at that research and, if they’re interested, pick it up. I’ve outlined the methods and techniques and given the background on how someone could be introduced to machine learning and follow the train of research that I did myself. So, yes, I’m hopeful that other people look at this and build on it themselves for their own environment. One example that comes to mind readily is that a lot of people work in research or analysis for I.T. companies. The application of research like this is that they could look at unknown binaries that their environments receive, and if their antivirus programs are lagging, they could train their models over time with the binaries in their environment, and then extend the script so that it grows for their particular environment. I am hopeful that the community picks up on this idea and goes to town with it.

There’s one aspect that I covered when I was speaking about this research at conferences – it’s that you need to be a domain expert in whatever domain you’re trying to apply machine learning in. You have to understand what you’re doing when you’re trying to apply machine learning to that domain. So, I have some experience being a malware analyst, and that helped me along the way as I was determining which features to use in machine learning. I think in the end it comes back to looking at the research. There are technical papers, there are hundreds of slides that have been published, and there’s the source code that’s available. Machine learning is, in my opinion, an underutilized technique in computer security in general, so I’m really hopeful that this research is at the vanguard of what people in the community are interested in doing, and that people who are keen on machine learning look at the paper, the slides, and the tools, build on them, and help make security better for everyone.

Rich: That’s really an interesting point. I think that people not involved with this field of programming tend to assume that you just sic the computer on things and it figures stuff out. It’s interesting that you point that out.

Karthik: This isn’t a panacea. You can make it fit to your problem, but you have to have some knowledge about the problem so that the solution can be brought to bear correctly.

Rich: Thanks so much for taking a few minutes to talk to me.

Karthik: My pleasure, Rich.

Adobe releases their first Open Source typefaces!

The SourceForge Anvil Podcast

Rich: Today Adobe released a family of typefaces called Source Sans Pro. These typefaces were designed for user interfaces. These fonts are free. They’re released as Open Source, and they are released via the Open@Adobe website, which is hosted at SourceForge, along with many other Open Source projects that Adobe produces.

Paul Hunt, who is the designer of these fonts, is actually on vacation this week. But he generously accepted a phone call from me, and we talked a little bit about the fonts, and this Open Source project. Here’s that conversation.

Paul: What we’ve just released today was an Open Source typeface family that I’ve been working on for the past three years. And it’s a typeface family designed for user interfaces.

Rich: What font file formats will this be released in?

Paul: We released the fonts today in OpenType format with CFF outlines, and we also released them in TrueType format.

Rich: What does it actually mean that a font is Open Source? What is the source that you’d be talking about in this case?

Paul: In a lot of cases I think the sources are simply the fonts. Usually, in the case of the Google web fonts, they also have some VFB files available. Adobe has a set of tools that we use for producing typefaces, and that set of tools is called the Adobe Font Development Kit for OpenType. It uses a set of files to compile the fonts. In our case, we have made all of the files that we used in the production of the Source Sans typefaces Open Source. So if somebody is interested in following a similar workflow to what we used, they can do so because … I’m just going to refer to our toolkit as the SDK for the remainder of the interview … if people want to use our SDK tools, they can, because we make them freely available. They’re not Open Source, but anybody can download them and use them to produce fonts.

Rich: If somebody were to develop their own set of fonts using your tools, what kinds of distribution mechanisms are available?

Paul: I think it’s quite popular for people to distribute Open Source fonts through Google’s “Google Web Fonts” directory. At Adobe, we have a partnership with SourceForge, so it made sense for us to go ahead and offer our fonts through that channel, although the fonts have also been put on Google Web Fonts. The fonts went live everywhere today. They went live on SourceForge, they went live on Google Web Fonts, and they went live on our own Typekit service, as well as on other web font servers. A couple of other places where the fonts will be available shortly are Google Docs and Google Presentations.

We did do a blog post on our Adobe Typblography blog, so if people are interested in more information about why we decided to make the fonts, and those types of things, they can visit our blog and get some more details that way.

Rich: Thanks so much for your time, and enjoy your vacation!