Still a work in progress, but I've begun writing a script to auto-detect the language of a file when Notepad++ can't do it by default (extension-less files, etc.). At the moment it has a few simple detections, all of which are configurable in the config file generated at config\py_autolang.cfg. I'll start putting together some documentation for the different options, but for now you can learn more about each option by checking the method LanguageAutoDetector.__default_config. So far, it does the following detections:
Filename matching - Files named configure, for instance, are set to the bash language by default.
Partial filename matching - By default, files matching Makefile\..+ (Makefile.win32, for instance) have their language set to makefile.
Shebang - Checks the first line for the familiar #!/usr/bin/env or #!/usr/bin/ prefix. It matches Windows paths, Linux paths, or even shebangs with no path at all, such as #!python.exe.
Try_Shebang - If no pre-mapped shebang matches, try using the program name as the language.
Contains_String - If a file's contents match a configured regular expression, set the language accordingly.
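The detections run in a fixed order, and the first one that returns a language wins. Here's a simplified, standalone sketch of that chain in plain Python, without the Notepad++ API. The tables are a hypothetical subset of the defaults, the path prefix is a simplified version of the script's shebang prefix, and the function names (by_filename, detect, etc.) are made up for illustration.

```python
import re

# Hypothetical subset of the default tables; the real script reads these
# from py_autolang.cfg.
FILENAMES = {'bash': ['configure'], 'makefile': ['Makefile']}
PARTIAL_FILENAMES = {'makefile': [r'Makefile\..+']}
SHEBANGS = {'bash': [r'(?:[czk]|ba)?sh'], 'python': [r'(?:xt|un|i)?pythonw?']}
CONTAINS = {'bash': [r'^mk_add_options', r'^ac_add_options']}

# Optional path before the program: "/usr/bin/env ", "/usr/bin/" or "C:\...\"
PATH_PREFIX = r'(?:/\S+/env |/\S+/|[A-Z]:\S*\\)?'


def by_filename(name, text):
    for lang, names in FILENAMES.items():
        if name in names:
            return lang
    return None


def by_partial_filename(name, text):
    for lang, patterns in PARTIAL_FILENAMES.items():
        for pattern in patterns:
            if re.match('^%s$' % pattern, name, re.IGNORECASE):
                return lang
    return None


def by_shebang(name, text):
    first_line = text.split('\n', 1)[0].rstrip()
    if not first_line.startswith('#!'):
        return None
    for lang, patterns in SHEBANGS.items():
        for pattern in patterns:
            # program name, optional version ("2.7"), optional extension(s) (".exe")
            rgx = r'^#!%s%s[\d._-]*(?:\.[0-9a-z_-]+)*$' % (PATH_PREFIX, pattern)
            if re.match(rgx, first_line, re.IGNORECASE):
                return lang
    return None


def by_contents(name, text):
    for lang, patterns in CONTAINS.items():
        for pattern in patterns:
            if re.search(pattern, text, re.IGNORECASE | re.MULTILINE):
                return lang
    return None


# Order matters: the first detection that returns a language wins.
DETECTIONS = [by_filename, by_partial_filename, by_shebang, by_contents]


def detect(name, text):
    for detection in DETECTIONS:
        lang = detection(name, text)
        if lang is not None:
            return lang
    return None
```

For example, detect('Makefile.win32', '') falls through the exact filename table and is caught by the partial filename pattern, while detect('tool', '#!python.exe\n') is caught by the shebang step.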
I still need to write the code for some of the configuration values, such as the file_load_event options. (Right now, you'll need to run the script to add the callback; after that it will try to detect the language on every file load. I'll change that soon so it respects the configured options.) I'm also going to integrate some caching so that load times end up a lot quicker.
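One way to do the planned caching, in the spirit of the script's __load_cache/__save_cache methods, is to keep one pickled dict per detection (mapping a filename or shebang line to a language) and only rewrite the files that changed. This is a minimal sketch; DetectionCache and its methods are hypothetical and not part of the script. Since unpickling can execute code, the cache folder should only ever hold files the script wrote itself.

```python
import os
import pickle
import tempfile


class DetectionCache(object):
    """Hypothetical per-detection cache: one pickle file per detection name,
    each holding a dict that maps a key (filename, shebang line) to a language."""

    def __init__(self, folder):
        self.folder = folder
        self.entries = {}  # detection -> {'changed': bool, 'value': dict}

    def _path(self, detection):
        return os.path.join(self.folder, detection)

    def load(self, detection):
        path = self._path(detection)
        value = {}
        if os.path.isfile(path):
            with open(path, 'rb') as f:
                value = pickle.load(f)
        self.entries[detection] = {'changed': False, 'value': value}

    def lookup(self, detection, key):
        if detection not in self.entries:
            self.load(detection)
        return self.entries[detection]['value'].get(key)

    def remember(self, detection, key, lang):
        if detection not in self.entries:
            self.load(detection)
        entry = self.entries[detection]
        if entry['value'].get(key) != lang:
            entry['value'][key] = lang
            entry['changed'] = True

    def save(self):
        # Only rewrite the files whose contents actually changed.
        for detection, entry in self.entries.items():
            if entry['changed']:
                with open(self._path(detection), 'wb') as f:
                    pickle.dump(entry['value'], f)
                entry['changed'] = False


folder = tempfile.mkdtemp()
first = DetectionCache(folder)
first.remember('filename', 'Makefile.win32', 'makefile')
first.save()

# A new instance (e.g. after restarting Notepad++) reads the pickle back.
second = DetectionCache(folder)
print(second.lookup('filename', 'Makefile.win32'))  # prints: makefile
```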
```python
import pickle, re
from os import path

__author__ = 'Charles Grunwald (Juntalis) <cgrunwald@gmail.com>'

def __setup_configobj__(deps, info):
    """ Setup configobj either in the lib folder or the local folder """
    from os import unlink, mkdir
    import shutil

    # Download and unzip
    tmpfile = deps.download(info['url'], info['filename'])
    tmpfolder = path.splitext(tmpfile)[0]
    deps.unzip(tmpfile, tmpfolder)
    unlink(tmpfile)
    modfolder = path.join(tmpfolder, 'configobj-4.7.2')
    modpath = [path.join(modfolder, 'configobj.py'), path.join(modfolder, 'validate.py')]
    (modpath, modconfigobj) = deps.install(modpath, 'configobj')
    shutil.rmtree(tmpfolder, True)
    return (modconfigobj, modpath)

""" Our dependencies """
__dependencies__ = {
    'configobj': (__setup_configobj__, {
        'filename': 'configobj.zip',
        'url': 'http://www.voidspace.org.uk/downloads/configobj-4.7.2.zip'
    })
}

class ScriptDeps:
    """ Simple class to install any script dependencies we don't have at startup. """
    __modules__ = {}

    def __init__(self):
        deps = __dependencies__
        for dep in deps.keys():
            (setup, info) = deps[dep]
            self.__modules__[dep] = setup(self, info)

    def get(self, name):
        if self.__modules__.has_key(name):
            return self.__modules__[name]
        return None

    def unzip(self, filename, dir):
        """ Extract zip file: filename to folder: dir. """
        import zipfile, os
        from cStringIO import StringIO
        zf = zipfile.ZipFile(filename)
        namelist = zf.namelist()
        dirlist = filter(lambda x: x.endswith('/'), namelist)
        filelist = filter(lambda x: not x.endswith('/'), namelist)
        # make base
        pushd = os.getcwd()
        if not path.isdir(dir):
            os.mkdir(dir)
        os.chdir(dir)
        # create directory structure
        dirlist.sort()
        for dirs in dirlist:
            dirs = dirs.split('/')
            prefix = ''
            for dir in dirs:
                dirname = path.join(prefix, dir)
                if dir and not path.isdir(dirname):
                    os.mkdir(dirname)
                prefix = dirname
        # extract files
        for fn in filelist:
            try:
                out = open(fn, 'wb')
                buffer = StringIO(zf.read(fn))
                buflen = 2 ** 20
                datum = buffer.read(buflen)
                while datum:
                    out.write(datum)
                    datum = buffer.read(buflen)
                out.close()
            except:
                import sys
                sys.stderr.write('Error while unzipping %s..\n' % filename)
        os.chdir(pushd)

    def download(self, url, filename):
        """ Download dependencies specified by dictionary __dependencies__"""
        from urllib import urlretrieve as download
        from tempfile import gettempdir as tempdir
        # Download the archive into the system temp folder.
        result = path.join(tempdir(), filename)
        download(url, result)
        return result

    def install(self, modpath, modname):
        import shutil
        from os import mkdir
        libdir = path.join(notepad.getNppDir(), 'plugins', 'PythonScript', 'lib')
        modconfigobj = None
        try:
            # install to pythonscript lib folder.
            for modfile in modpath:
                shutil.move(modfile, libdir)
        except:
            # install to scripts/lib
            libdir = path.join(path.abspath(path.dirname(__file__)), 'lib')
            if not path.exists(libdir) or not path.isdir(libdir):
                mkdir(libdir)
            for modfile in modpath:
                shutil.move(modfile, libdir)
            initfile = path.join(libdir, '__init__.py')
            if not path.exists(initfile):
                initfile = open(initfile, 'w')
                initfile.write('# Stub')
                initfile.close()
            # fromlist makes __import__ return lib.<modname> itself, not the lib package.
            modconfigobj = __import__('lib.%s' % modname, fromlist=[modname])
        else:
            modconfigobj = __import__(modname)
        finally:
            modpath = path.join(libdir, '%s.py' % modname)
        return (modpath, modconfigobj)

class SimpleCache(dict):
    """ Simple local cache.
    It saves local data in singleton dictionary with convenient interface

    Downloaded from http://code.activestate.com/recipes/577492-simple-local-cache-and-cache-decorator/
    Author: Andrey Nikishaev
    License: GPL
    Copyright 2010, http://creotiv.in.ua
    """
    def __new__(cls, *args):
        if not hasattr(cls, '_instance'):
            cls._instance = dict.__new__(cls)
        else:
            raise Exception('SimpleCache already initialized')
        return cls._instance

    @classmethod
    def getInstance(cls):
        if not hasattr(cls, '_instance'):
            cls._instance = dict.__new__(cls)
        return cls._instance

    def get(self, name, default=None):
        """Multilevel get function.
        Code:
        Config().get('opt.opt_level2.key','default_value')
        """
        if not name:
            return default
        levels = name.split('.')
        data = self
        for level in levels:
            try:
                data = data[level]
            except:
                return default
        return data

    def set(self, name, value):
        """Multilevel set function
        Code:
        Config().set('opt.opt_level2.key','default_value')
        """
        levels = name.split('.')
        arr = self
        for name in levels[:-1]:
            if not arr.has_key(name):
                arr[name] = {}
            arr = arr[name]
        arr[levels[-1]] = value

    def getset(self, name, value):
        """Get cache, if not exists set it and return set value
        Code:
        Config().getset('opt.opt_level2.key','default_value')
        """
        g = self.get(name)
        if not g:
            g = value
            self.set(name, g)
        return g

def scache(func):
    def wrapper(*args, **kwargs):
        cache = SimpleCache.getInstance()
        fn = "scache." + func.__module__ + func.__class__.__name__ + \
             func.__name__ + str(args) + str(kwargs)
        val = cache.get(fn)
        if not val:
            res = func(*args, **kwargs)
            cache.set(fn, res)
            return res
        return val
    return wrapper

# Try to import configobj module. If we can't, download it and set it up.
try:
    import configobj
    has_configobj = True
except ImportError:
    notepad.messageBox('Could not find module "configobj". Downloading and setting it up now..', 'Dependencies')
    deps = ScriptDeps()
    (configobj, configobj_path) = deps.get('configobj')
    has_configobj = configobj is not None
    if has_configobj:
        notepad.messageBox('Module "configobj" setup successfully. You can find it at:\n\n%s' % configobj_path, 'Setup Successful')
    else:
        notepad.messageBox('Error: Could not import configobj.\nDownload at: http://www.voidspace.org.uk/python/configobj.html', 'Import Error')
        exit()

class LanguageAutoDetector:
    """ Main class """
    __log = None
    __config = None
    __config_path = None
    __cache = None
    __detections = [
        'filename',
        'partial_filename',
        'xml',
        'shebang',
        'try_shebang',
        'contains_string'
        #'regex'
    ]
    __cache_ignore = {
        'contains_string': None,
        'try_shebang': 'shebang',
        'partial_filename': 'filename',
        'regex': None
    }

    def __init__(self, config_file=None):
        config = self.__load_config(config_file)
        if config['cache']['enabled']:
            self.__load_cache()
        #self.__log = logger.FileLogger()

    def __new__(cls, *args):
        if not hasattr(cls, '_instance'):
            cls._instance = dict.__init__(cls, args)
        else:
            raise Exception('LanguageAutoDetector already initialized')
        return cls._instance

    @classmethod
    def getInstance(cls):
        if not hasattr(cls, '_instance'):
            cls._instance = dict.__new__(cls)
        return cls._instance

    @scache
    def config(self, key=None, default=None):
        if self.__config is None:
            config = self.__load_config()
        else:
            config = self.__config
        if key is None:
            return config
        return self.__getdict(config, key, default)

    @scache
    def cache(self, key=None, default=None):
        if self.__cache is None:
            cache = self.__load_cache()
        else:
            cache = self.__cache
        if key is None:
            return cache
        return self.__getdict(cache, key, default)

    def set_lang(self, result, bufferID):
        ret = False
        lang = self.__test_language(result)
        if lang is None:
            if self.config('errors.invalid_lang'):
                notepad.messageBox('Error: Specified language %s invalid.' % result, 'Config Error')
        else:
            notepad.setLangType(lang, bufferID)
            ret = True
        return ret

    def detect(self, args):
        detections = self.config('detections.order')
        bufferID = args['bufferID']
        args['filename'] = path.basename(notepad.getBufferFilename(bufferID))
        notepad.activateBufferID(bufferID)
        for detection in detections:
            func = getattr(self, 'detection_%s' % detection.lower())
            result = func(args)
            console.write('Detection %s: %s\n' % (detection.lower(), result))
            if result is not None:
                if self.set_lang(result, bufferID):
                    break

    # filename
    # partial_filename
    # xml
    # shebang
    # try_shebang
    # contains_string
    def detection_filename(self, args):
        filename = args['filename']
        config = self.config('detections.filename')
        for lang in config.keys():
            if filename in config[lang]:
                return lang
        return None

    @scache
    def detection_partial_filename(self, args):
        filename = args['filename']
        config = self.config('detections.partial_filename')
        for lang in config.keys():
            for pattern in config[lang]:
                rgx = re.compile("^%s$" % pattern, re.IGNORECASE)
                if rgx.match(filename):
                    # TODO: Cache here filename -> lang
                    return lang
        return None

    def detection_xml(self, args):
        # Check for xml stuff.
        xml_config = self.config('detections.xml')
        xml_filename = args['filename']
        self.xml_lang = None
        def check(contents, lineNumber, totalLines):
            val = contents.strip().lower()
            if len(val) == 0:
                # Blank line: move on to the next one. forEachLine adds the
                # return value to the line number, so returning 0 would loop forever.
                return 1
            else:
                for pattern in xml_config:
                    pattern = pattern.lower()
                    if val.startswith(pattern):
                        self.xml_lang = 'xml'
                        ext = path.splitext(xml_filename)[1]
                        if len(ext) > 0:
                            xml_cache = {ext: 'xml'}
                            # TODO: Cache here
                # Only the first non-blank line is checked; skip the rest of the file.
                return totalLines - lineNumber
        editor.forEachLine(check)
        return self.xml_lang

    def detection_shebang(self, args):
        shebang = self.__getshebang()
        if shebang is None:
            return None
        config = self.config('detections.shebang')
        for lang in config.keys():
            for pattern in config[lang]:
                rgx = re.compile(r"(?:^#!((?:/[^\s]+/env(?:\.[a-z]+) |/[^\s]+/|[A-Z]:[^\s]*\\|[A-Z]:[^\s]*\\env\.[a-z]+ ?)?%s[\d._-]*(?:\.[0-9a-z_-]+)*)\b)\Z" % pattern, re.IGNORECASE)
                if rgx.match(shebang):
                    # TODO: Cache here shebang -> lang
                    return lang
        return None

    def detection_try_shebang(self, args):
        shebang = self.__getshebang()
        if shebang is None:
            return None
        rgx = re.compile(r"^#!(?:/[^\s]+/env(?:\.[a-z]+) |/[^\s]+/|[A-Z]:[^\s]*\\|[A-Z]:[^\s]*\\env\.[a-z]+ ?)?([^\s]+)[\d.-_]*(?:\.[0-9a-z-_.])*\b", re.IGNORECASE)
        match = rgx.search(shebang)
        if match:
            result = match.group(1)
            lang = self.__test_language(result)
            if lang is None:
                return None
            return result
        return None

    def detection_contains_string(self, args):
        text = editor.getText()
        config = self.config('detections.contains_string')
        for lang in config.keys():
            for pattern in config[lang]:
                rgx = re.compile("%s" % pattern, re.IGNORECASE | re.MULTILINE)
                if rgx.search(text):
                    # TODO: Cache here shebang -> lang
                    return lang
        return None

    def detection_regex(self, args):
        pass

    def __getshebang(self):
        # getLine() returns the line with its EOL characters; strip them so
        # the \Z anchor in detection_shebang can match.
        line = editor.getLine(0).rstrip()
        if line[0:2] == '#!':
            return line
        return None

    def __getdict(self, dict, name, default=None):
        """Multilevel get function.
        Code:
        Config().get('opt.opt_level2.key','default_value')
        """
        if not name:
            return default
        levels = name.split('.')
        data = dict
        for level in levels:
            try:
                data = data[level]
            except:
                return default
        return data

    def __load_config(self, cfg=None):
        """ Load configuration for script. If it doesn't exist, write the default configuration to file. """
        # Figure out config path.
        if cfg is None:
            cfg = path.join(notepad.getPluginConfigDir(), 'py_autolang.cfg')
        self.__config_path = cfg
        if path.exists(cfg):
            self.__config = configobj.ConfigObj(cfg)
        else:
            self.__config = self.__default_config()
            self.__save_config()
        return self.__config

    def __default_config(self, cfg=None):
        """ Default configuration for script. """
        # Figure out config path.
        if cfg is None and self.__config_path is None:
            cfg = path.join(notepad.getPluginConfigDir(), 'py_autolang.cfg')
        elif cfg is None and self.__config_path is not None:
            cfg = self.__config_path
        config = configobj.ConfigObj()
        config.filename = cfg

        # Main config
        config['script'] = {}
        config['script']['enabled'] = True
        config['script']['autoload'] = True

        # Errors
        config['errors'] = {}
        config['errors']['invalid_lang'] = True   # Message box on invalid lang specified.
        config['errors']['invalid_lexer'] = True  # Message box on invalid lexer specified.

        # Script cache
        config['cache'] = {}
        config['cache']['enabled'] = True
        cache_folder = path.abspath(path.join(path.dirname(cfg), 'cache'))
        valid_folder = False
        while not valid_folder:
            if path.exists(cache_folder):
                if not path.isdir(cache_folder):
                    import string, random
                    cache_folder += '-' + ''.join(random.choice(string.ascii_uppercase + string.digits) for x in range(3))
                else:
                    valid_folder = True
            else:
                from os import mkdir
                mkdir(cache_folder)
        config['cache']['folder'] = cache_folder

        # Logging
        config['logging'] = {}
        config['logging']['console'] = False
        config['logging']['console_auto_open'] = False
        config['logging']['file'] = False

        # Loading event
        config['file_load_event'] = {}
        config['file_load_event']['no_extension'] = True
        config['file_load_event']['default_lexer'] = True
        config['file_load_event']['always'] = False

        """
        Detection methods order - This tells what order to use detection methods in. So if
        filename is executed before shebang, and filename matches a detection, the shebang
        detection won't be run. To disable a detection method, remove it from the list.

        Possible detections:
            shebang
            try_shebang
            xml
            filename
            partial_filename
            contains_string
            regex - Not yet implemented
        """
        order = ['filename', 'partial_filename', 'xml', 'shebang', 'try_shebang', 'contains_string']

        # Default Detections
        ## Shebangs
        ## These all match the following regex pattern:
        ### r"^#!(?:/[^\s]+/env(?:\.[a-z]+) |/[^\s]+/|[A-Z]:[^\s]*\\|[A-Z]:[^\s]*\\env\.[a-z]+ ?)?%s[\d._-]*(?:\.[0-9a-z_-]+)*\b" % key
        shebang = {
            # Shell
            ## Bash/Korn Shell/C Shell/Z Shell/etc
            'bash': ['(?:[czk]|ba)?sh'],

            # Python
            ## CPython - http://python.org/
            ## Cross Twine Linker (xtpython) - http://crosstwine.com/linker/python.html
            ## Unpython Python to C compiler (unpython) - http://code.google.com/p/unpython/
            ## IPython (ipython) - http://ipython.org/
            ## PyPy - http://pypy.org/
            ## Iron Python - http://ironpython.net/
            ## Mozilla Embedded Python Console - http://www.thomas-schilz.de/MozPython/
            ## TinyPy - http://www.tinypy.org/
            ## Snipy - Personal project, you can remove if you want.
            ## Enthought SciPy distribution - http://www.enthought.com/
            ## Jython (jython) - http://www.jython.org/
            ## Cython (Optimizing Python to C Compiler) - http://cython.org/
            ## Typhon (typhon) - https://github.com/vic/typhon
            ## Mython (mython) - http://mython.org/
            # Languages Close Enough to Python
            ## Nimrod - http://force7.de/nimrod/download.html
            ## Serpent - http://sourceforge.net/projects/serpent/
            ## Boo - http://boo.codehaus.org/
            'python': [
                '(?:xt|un|i)?pythonw?',
                '(?:py|i|moz|tiny|sni)pyw?(?:-c)?',
                'epdw?',
                '[jctm]ythonw?',
                '(?:nimrod|ser?pent)',
                'boo(?:c|i|ish)?'
            ],

            # Perl
            ## Perl - http://www.perl.org/
            ## Parrot - http://parrot.org/
            ### Hm, this one might be hard. Leaving parrot as perl for now
            'perl': ['w?perl', 'parrot'],

            # Ruby
            ## Ruby - http://www.ruby-lang.org/en/
            ## Iron Ruby (ir, etc) - http://ironruby.net/
            ## JRuby - http://jruby.org/
            ## Ruby on Rails - http://rubyonrails.org/
            'ruby': ['[ji]?[ie]r[wbi]{0,2}?(?:_swing)?', '[ej]?ruby[wc]?', 'rake'],

            # Javascript
            ## Node.Js - http://nodejs.org/
            ## Narwhal - https://github.com/tlrobinson/narwhal
            ## JSDB - http://www.jsdb.org/
            ## Ringo Javascript - http://ringojs.org/
            ## GlueScript - http://gluescript.sourceforge.net/
            ## Rhino - http://www.mozilla.org/rhino/
            'javascript': ['(?:node|npm)', '(?:narwhal|tusk)', '(?:jsdb|ringo|gluew?)', '(?:rhino|js)'],

            # PHP
            'php': ['(?:i?php(?:-cgi|-cli|-win)?|pharc?)']
        }
        filename = {
            'bash': ['configure'],
            'makefile': ['Makefile']
        }
        partial_filename = {
            'makefile': [r'Makefile\..+']
        }
        contains_string = {
            'bash': ['^mk_add_options', '^ac_add_options']
        }
        config['detections'] = {
            'order': order,
            'filename': filename,
            'partial_filename': partial_filename,
            'xml': ['<?xml ', '<!DOCTYPE'],
            'shebang': shebang,
            'contains_string': contains_string
        }
        return config

    def __save_config(self):
        self.__config.write()

    def __load_cache(self):
        config = self.config()
        folder = config['cache']['folder']
        cache = {}
        for detection in config['detections']['order']:
            if detection in self.__cache_ignore:
                if self.__cache_ignore[detection] is not None:
                    detection = self.__cache_ignore[detection]
                    if cache.has_key(detection):
                        continue
                else:
                    continue
            f = path.join(folder, detection)
            if path.exists(f) and path.isfile(f):
                input = open(f, 'rb')
                cache[detection] = {'changed': False, 'value': pickle.load(input)}
                input.close()
            else:
                cache[detection] = {'changed': False, 'value': None}
        self.__cache = cache
        return self.__cache

    def __save_cache(self):
        config = self.config()
        folder = config['cache']['folder']
        saved = []
        for detection in config['detections']['order']:
            if detection in self.__cache_ignore:
                if self.__cache_ignore[detection] is not None:
                    detection = self.__cache_ignore[detection]
                    if detection in saved:
                        continue
                else:
                    continue
            f = path.join(folder, detection)
            cache = self.cache(detection)
            if cache['changed'] and cache['value'] is not None:
                output = open(f, 'wb')
                pickle.dump(cache['value'], output)
                output.close()
                self.__cache[detection]['changed'] = False
            saved.append(detection)

    def __test_lexer(self, lexer):
        result = True
        old_lexer = editor.getLexerLanguage()
        editor.setLexerLanguage(lexer)
        if editor.getLexerLanguage() == 'null':
            result = False
        editor.setLexerLanguage(old_lexer)
        return result

    def __test_language(self, lang):
        import Npp
        lang = lang.upper()
        result = None
        try:
            result = getattr(Npp.LANGTYPE, lang)
        except AttributeError:
            result = None
        return result

import sys
sys.stdout = console

def detect_test(args):
    console.write('Language: %s\n' % notepad.getLangType(args["bufferID"]).__str__().lower())

detector = LanguageAutoDetector()
notepad.clearCallbacks([NOTIFICATION.FILEOPENED])
notepad.callback(detector.detect, [NOTIFICATION.FILEOPENED])
```
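Two regular-expression details are easy to trip over when writing shebang patterns like these, and both can be checked in a plain Python shell. Inside a character class, .-_ is a range rather than three characters, and a pattern anchored with \Z fails on a line that still ends in its EOL characters; PythonScript's editor.getLine should wrap Scintilla's SCI_GETLINE, which includes the line-end characters. The patterns below are simplified, hypothetical examples rather than the script's own.

```python
import re

# Inside a character class, ".-_" is a range from "." (0x2E) to "_" (0x5F),
# so this class also accepts ":", "@" and the uppercase letters A-Z.
ranged = re.compile(r'^[\d.-_]+$')

# Putting "-" last makes it a literal hyphen: only digits, ".", "_" and "-".
literal = re.compile(r'^[\d._-]+$')

# Hypothetical simplified shebang pattern anchored with \Z, which (unlike $)
# does not tolerate even a single trailing "\n".
shebang = re.compile(r'^#!/\S+/python[\d._-]*\Z')


def matches_shebang(line):
    """Match a raw first line after removing the CR/LF it still carries."""
    return shebang.match(line.rstrip('\r\n')) is not None
```

So ranged.match('ABC') succeeds while literal.match('ABC') does not, and shebang.match('#!/usr/bin/python2.7\r\n') fails until the line ending is stripped.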
Anyways, any feedback would be appreciated.
Eh, I can't seem to edit my post. I just realized I left a few lines in. You may want to get rid of the lines:
I also noticed that copying the script and pasting it into a file seems to mess up the newlines, so I put it up on pastebin with the above fix:
http://pastebin.com/SWG3JuAT
I also made a plugin for automatic language selection based on shebang, mode line ("vi" style), filename/filepath, file content (XML header…). It supports user defined languages, setting tabstops and indentation modes, etc. You can find it here https://sourceforge.net/projects/npppythonplugsq/files/Modeline%20Parser/.
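A mode line is the vim convention of putting editor settings in a comment near the top or bottom of a file, such as # vim: set ft=python ts=4 sw=4 et:. The plugin linked above has its own parser; the sketch below is not its code, just a minimal illustration of turning such a line into options. It assumes vim's two modeline forms ("vim: opts" and "vim: set opts:") and scans the first and last five lines, which is vim's default for the 'modelines' option. parse_modeline and ALIASES are hypothetical names, and mapping the resulting filetype to a Notepad++ language is left out.

```python
import re

# Both vim forms: "vi: ts=8 noet" and "vim: set ft=python ts=4 sw=4 et:"
MODELINE = re.compile(r'(?:^|\s)(?:vi|vim|ex):\s*(?:set?\s+(?P<set>[^:]*):|(?P<opts>.*))')

# Short option names mapped to their long forms (hypothetical subset).
ALIASES = {'ft': 'filetype', 'ts': 'tabstop', 'sw': 'shiftwidth', 'et': 'expandtab'}


def parse_modeline(text, scan=5):
    """Return the options of the first modeline found in the first or last
    `scan` lines, or an empty dict if there is none."""
    lines = text.splitlines()
    for line in lines[:scan] + lines[-scan:]:
        m = MODELINE.search(line)
        if not m:
            continue
        raw = m.group('set') if m.group('set') is not None else m.group('opts')
        options = {}
        for item in re.split(r'[\s:]+', raw.strip()):
            if not item:
                continue
            if '=' in item:
                # "ts=4" -> tabstop = 4, "ft=sh" -> filetype = "sh"
                name, value = item.split('=', 1)
                options[ALIASES.get(name, name)] = int(value) if value.isdigit() else value
            elif item.startswith('no'):
                # "noet" -> expandtab = False
                options[ALIASES.get(item[2:], item[2:])] = False
            else:
                # "et" -> expandtab = True
                options[ALIASES.get(item, item)] = True
        return options
    return {}
```

For example, a shell script whose second line is # vim: set ft=sh ts=4 sw=4 et: yields filetype sh, tabstop and shiftwidth 4, and expandtab on. The "set ...:" form is the safer one to use in C-style comments, because everything after the closing colon (such as a trailing */) is ignored.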