Dear Günter,
On Wed, Dec 07, 2011 at 10:49:49AM +0000, Guenter Milde wrote:
> On 2011-12-06, Kirill Smelkov wrote:
> > On Tue, Dec 06, 2011 at 01:33:56PM +0000, Guenter Milde wrote:
>
> >> Actually, the issue is even more complex:
>
> >> test_dependencies carries this comment:
>
> >> # docutils.utils.DependencyList records relative URLs, not platform paths,
> >> # so use "/" as a path separator even on Windows (not os.path.join).
>
> >> while config.txt says:
>
> >> Path to a file where Docutils will write a list of files that the
> >> input and output depend on [#dependencies]_, e.g. due to file
> >> inclusion. [#pwd]_ The format is one filename per line. This
> >> option is particularly useful in conjunction with programs like
> >> ``make``.
>
> >> The mentioning of ``make`` suggests to me that "filename" in config.txt
> >> implies platform paths.
>
> > I could only second "filename" to be platform paths - only in such form
> > they are useful for ``make`` as you've said, and I've actually hit the bug
> > because --record-dependencies is used in my doc build system.
>
> > If you decide to change --record-dependencies into URLs, could you
> > please also continue to provide a way to still record dependencies as
> > native file paths.
>
> `Platform paths` seems the suitable format of the entries: According to
> the specification in config.txt as well as actual behaviour in the case
> of HTML export, DependencyList records files that were touched during the
> document conversion, i.e. "files to watch for updates".
>
> However:
>
> The comment in the test reflects current behaviour: some dependencies
> *are* stored in the List and written to the file as relative URLs
> because this is the format required by the "image" directive and the
> "stylesheet" configuration setting. This is masked by the fact that on
> Unix, a URL without "scheme" part and a platform-path use the same syntax
> for simple cases (no spaces or special chars).
>
> This means that if we agree on "use platform paths", we need to change
> the code in directives/images.py (and the test).
>
> Also, it would make things clearer, if record_dependencies.add() is called
> *after* reading the respective file. Then, no dependency is recorded in
> case of, e.g., IO errors (this requires a change of the test, too).
>
> The encoding used in the "record" file should be chosen so that ``make``
> works wherever it is available. (How do you put (or reference) the
> dependencies in the Makefile?)

First of all, sorry for the looong delay in replying, and thanks for
choosing the utf8/make approach together with Martin
(http://repo.or.cz/w/docutils.git/commitdiff/5fe99c434c93fb9e2fe7950b6db587df85e08e8d).

Regarding your question on how to integrate dependency tracking into a
Makefile, here is how I do it.

First, some defines::

    # this is where we keep our aux files
    __aux := $(top_objdir).aux/

    # target .dep file location
    @.dep = $(__aux)$@.dep

    # this is where shared settings / include file / etc... reside
    __inc := $(top_srcdir)/include/

    # conf-for <file>    # abc.tex -> abc.conf
    conf-for = $(wildcard $(basename $1).conf)

Then, there is a rule to run rst2something::

    # rst .conf files for target
    @.rst-confs = $(__inc)docutils.conf \
                  $(call conf-for,$@)

    # run-rst2any <tool> ...    ; use like $(call run-rst2any,rst2xetex)
    run-rst2any = $1 \
        $(@.rst-confs:%=--config %) \
        --record-dependencies=$(@.dep).in \
        $< $@ \
        $(foreach conf,$(@.rst-confs),&& echo $(conf) >>$(@.dep).in) \
        && \
        $(RSTDEPS) $@ < $(@.dep).in > $(@.dep) \
        && \
        rm $(@.dep).in

Here the dependencies are first dumped with --record-dependencies and
then filtered by my own RSTDEPS=$(top_srcdir)/tools/navy-rst-deps, which
is shown below:

---- 8< ---- (tools/navy-rst-deps)
#!/usr/bin/env python
"""generate dependencies for docutils/rst targets"""

# Theory of operations:
#
# 1. rst2<smth> processes input file and generates (--record-dependencies
#    option) another file which tracks all read files
# 2. We rework this list so that make can read it

# Usage: navy-rst-deps target < dep-list > file.deps

import sys


def die(msg):
    print >>sys.stderr, msg
    sys.exit(1)


def main():
    try:
        target = sys.argv[1]
    except IndexError:
        die('E: target not specified')

    print 'deps_%s := \\' % target

    for line in sys.stdin.readlines():
        # each line of dep-lists consists of a single file target depends on, e.g.
        #
        # ecu/ecu-intro.txt
        # ecu/ecu-intro-pic.txt
        # ecu/ns-out.txt
        # ecu/ns-out-pkt.txt
        dep = line.strip()
        print '\t%s\t\\' % dep

    print
    print '%s: $(deps_%s)' % (target, target)
    print
    print '$(deps_%s):' % target


if __name__ == '__main__':
    main()
---- 8< ----

The resulting .dep file can be included directly by make. Besides the
target's dependencies, it also contains an empty phony rule for each
listed dependency, so that when a file is removed or renamed, make won't
say it can't do the build because it doesn't know how to rebuild the
missing dependency. (Read more about it here:
http://oreilly.com/catalog/make3/book/ch08.pdf, "Tromey’s Way")
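
To make the generated format concrete, here is a minimal sketch of the same output logic. It is written for Python 3 (the script above is Python 2), and the function name `render_depfile` as well as the file names are made up for illustration only.

```python
def render_depfile(target, deps):
    """Return the text of a make-includable .dep file for `target`.

    Hypothetical helper: mirrors the print statements of navy-rst-deps,
    but builds a string instead of writing to stdout.
    """
    lines = ['deps_%s := \\' % target]
    for dep in deps:
        # one dependency per continuation line
        lines.append('\t%s\t\\' % dep)
    lines.append('')                                   # ends the continued variable
    lines.append('%s: $(deps_%s)' % (target, target))  # target depends on the list
    lines.append('')
    lines.append('$(deps_%s):' % target)               # empty rule for every dependency
    return '\n'.join(lines) + '\n'


if __name__ == '__main__':
    # illustrative names, not taken from a real build
    print(render_depfile('ecu/ecu.html', ['ecu/ecu-intro.txt', 'ecu/ns-out.txt']), end='')
```

The last emitted line is the part that keeps make from failing when a recorded dependency disappears: every dependency becomes a target with no prerequisites and no recipe.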

Then, near the end of the Makefile, we load all the generated
dependencies::

    # list of dirs involved
    # XXX uniq instead of sort (sort deduplicates, but changes order)
    modules := $(sort $(dir $(docs)))

    # ensure modules hierarchy always exists under .aux/
    # we do it in one go, so this should be pretty fast...
    $(shell mkdir -p $(addprefix $(__aux),$(modules)))

    # load automatically generated dependencies
    include $(wildcard $(addprefix $(__aux),$(addsuffix *.dep,$(modules))))

That's how it works.
~~~~
I know you use TeX, so maybe the info below will be a bit useful too
(think of it as my "thank you" for the whole latex/xetex stuff). If not,
sorry...

---- 8< ---- (Makefile snippets)
XELATEX := xelatex
TEXFLAGS:= -halt-on-error -interaction nonstopmode
TEXSTDERR:= $(top_srcdir)/tools/navy-tex-stderr --output-encoding=auto
TEXDEPS := $(top_srcdir)/tools/navy-tex-deps

# texjob for target
@.texjob = $(basename $@)

# run-anytex <tool> ...
# XXX no TEXSTDERR on verbose builds?
run-anytex = set -o pipefail && \
    TEXINPUTS="$(__inc):$(dir $<):$$TEXINPUTS" \
    $1 $(TEXFLAGS) -jobname $(@.texjob) -recorder $< \
    | $(TEXSTDERR) \
    && \
    $(TEXDEPS) $@ < $(@.texjob).fls > $(@.dep)

# show [n] or ' '
__show-tex-pass := $${__texpass:+[}$${__texpass:- }$${__texpass:+]}
quiet_cmd_xelatex = XELATEX $(__show-tex-pass) $@
cmd_xelatex = $(call run-anytex,$(XELATEX))

# run-anytex-tilldone <tool-name>
run-anytex-tilldone = \
    @$(call __cmd,$1) && \
    __texpass=1 && \
    while true; do \
        if ! grep -q "Rerun" $(@.texjob).log ; then \
            break; \
        fi && \
        __texpass=`expr $$__texpass + 1` && \
        $(call __cmd,$1); \
    done

%.pdf : %.tex
	$(call run-anytex-tilldone,xelatex)
---- 8< ----
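
The run-anytex-tilldone loop above is the usual "rerun while the log asks for it" idea. Below is a rough sketch of that control flow in Python 3, only to show the logic outside of shell quoting. The names `run_until_stable`, `run_once` and `read_log` are hypothetical, and the `max_passes` cap is an addition of this sketch (the Makefile loop has no such limit).

```python
def run_until_stable(run_once, read_log, max_passes=5):
    """Run a TeX pass, then repeat while the log still contains "Rerun".

    run_once  -- callable that performs one TeX pass (hypothetical)
    read_log  -- callable returning the current .log text (hypothetical)
    Returns the number of passes that were needed.
    """
    passes = 1
    run_once()
    while 'Rerun' in read_log():
        if passes >= max_passes:
            # safety net that the shell version does not have
            raise RuntimeError('still asked to rerun after %d passes' % passes)
        run_once()
        passes += 1
    return passes
```

In a real setup `run_once` would wrap something like `subprocess.run([...], check=True)` and `read_log` would read the job's .log file; both are left abstract here.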
---- 8< ---- (tools/navy-tex-deps)
#!/usr/bin/env python
"""generate dependencies for .tex files"""

# Theory of operation:
#
# 1. TeX processes input file and generates (-recorder option) .fls file which
#    tracks all opened/created files for input/output/etc...
# 2. We filter out this list and rework it so that make can read it

# Usage: navy-tex-deps target-name < file.fls > file.deps

import re
import sys

# files we should always ignore
rex_ignore = re.compile(r'.*\.(log|aux|out|toc|lof|lot)$')


def die(msg):
    print >>sys.stderr, msg
    sys.exit(1)


def main():
    if len(sys.argv) != 2:
        die('E: please provide target-name')

    target_argv = sys.argv[1]   # job target (as passed on cmdline)
    target = None               # job target (as extracted from .fls, if present)
    deps = []                   # job target dependencies

    for line in sys.stdin.readlines():
        # each line of .fls file consists of action and argument, e.g.
        #
        # PWD /path/to/dir
        # INPUT /path/to/file1.fmt
        # INPUT job.tex
        # OUTPUT job.log
        # OUTPUT job.pdf
        #
        # split only once, so that paths containing spaces stay intact
        what, arg = line.rstrip('\r\n').split(None, 1)

        if what == 'INPUT':
            if rex_ignore.match(arg):
                continue
            # filter out duplicates
            if arg in deps:     # XXX O(n)
                continue
            # TODO optionally skip system wide deps
            deps.append(arg)
        elif what == 'OUTPUT':
            if rex_ignore.match(arg):
                continue
            if target is not None:
                die('?: multiple targets -- %s and %s' % (target, arg))
            target = arg
        elif what == 'PWD':
            # ignore PWD
            pass    # XXX is it ok?
        else:
            die('E: unknown action in fls file: %r' % what)

    # XeTeX omits OUTPUT target in .log file
    if target is None:
        target = target_argv

    # now it's time to emit dependency list
    if target is not None:
        if target != target_argv:
            die('E: target mismatch (%s vs %s)' % (target, target_argv))

        print 'deps_%s := \\' % target
        for dep in deps:
            print '\t%s\t\\' % dep
        print
        print '%s: $(deps_%s)' % (target, target)
        print
        print '$(deps_%s):' % target


if __name__ == '__main__':
    main()
---- 8< ----
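
For anyone who wants to play with the .fls filtering separately from the make output, here is a small Python 3 sketch of just the parsing step. The function name `parse_fls` and the sample paths are invented for this example. Unlike the script above, it collects all non-ignored OUTPUT entries instead of insisting on a single one.

```python
import re

# same extensions as rex_ignore in navy-tex-deps
IGNORED = re.compile(r'.*\.(log|aux|out|toc|lof|lot)$')


def parse_fls(lines):
    """Split .fls records into (inputs, outputs), dropping ignored files.

    Each record is "ACTION argument"; the argument may contain spaces,
    so the line is split only at the first space.
    """
    inputs, outputs = [], []
    seen = set()
    for line in lines:
        line = line.rstrip('\r\n')
        if not line:
            continue
        what, _, arg = line.partition(' ')
        if what == 'PWD':
            continue
        if what not in ('INPUT', 'OUTPUT'):
            raise ValueError('unknown action in fls file: %r' % what)
        if IGNORED.match(arg):
            continue
        if what == 'INPUT':
            if arg not in seen:          # keep first occurrence, preserve order
                seen.add(arg)
                inputs.append(arg)
        elif arg not in outputs:
            outputs.append(arg)
    return inputs, outputs
```

Using a set for the duplicate check avoids the O(n) list lookup marked with XXX in the script, while the list keeps the original order.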
---- 8< ---- (tools/navy-tex-stderr)
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Peek TeX warnings and errors to stderr (and omit other 'plain' output)"""
# Theory of operation:
#
# 1. TeX writes errors, warnings and overall progress on stdout
# 2. We parse it and output parts which are similar to errors and
# warnings to stderr, and optionally all the rest to stdout.
#
# see also:
#
# - http://www.tug.org/pipermail/pdftex/2009-March/007960.html
# - http://www9-old.in.tum.de/people/stulp/scripts/convertlatexwarnings.php
# - http://en.wikibooks.org/wiki/LaTeX/Errors_and_Warnings
# Usage: tex ... | navy-tex-stderr [-v]

import re
import sys

import tex.encodings
tex.encodings.activate()

# what kind of TeX is running
rex_texid = re.compile(r'^This is ([^\s]+), Version')

# errors & warnings start like this
rex_error = re.compile('^!')
rex_warning = re.compile('.*warning', re.I)
# XXX do we need 'LaTeX Font Info: No file T2Aptm.fd' on stderr?

# overfull/underfull something starts like this
rex_overfull= re.compile('^(Overfull|Underfull)')

# TeX output font identifier format,
# UC: \T2A/ptm/m/n/17.28
# UC: \OML/cmm/m/it/12
# UC: \T2A/cmtt/m/n/12
# UC: \EU1/LinuxLibertineO(1)/m/n/12
rex_font_id = re.compile(r'\\(\w{2,4})/[\w()]{3,}/\w{1,2}/\w{1,2}/[\d\.]+')

# {} TeX encoding -> corresponding system encoding
tex_sys_encodings = {
    'T2A': 'cp1251',    # T2A is almost like CP1251 (at least it is readable)
    'OML': 'tex/oml',   # TeX math italic
    'EU1': 'utf8'       # for XeTeX, XXX ok? see euenc package
}

# TeX uses these symbols for special node display
tex_symb_special = '$| []'


def print_err(line):
    print >>sys.stderr, line,


class EOF(Exception):
    pass


def read_next():
    line = sys.stdin.readline()
    if not line:
        raise EOF
    return line


# TeX prints messages with max_print_line=79 characters. So we detect
# end-of-message by last line being non-79 characters in length
max_print_line = 79

def read_msg_text():
    res = []
    while 1:
        try:
            line = read_next()
        except EOF:
            # if we've already read some input, it's better to give caller a
            # chance to process it first
            if res:
                break
            else:
                raise

        assert len(line) <= max_print_line+1
        assert line[-1] == '\n'
        line = line[:-1]    # without newline
        res.append(line)

        if len(line) != max_print_line:
            break

    return ''.join(res)


def format_msg_text(text):
    """format text back after read_msg_text"""
    res = []
    while text:
        res.append( text[:max_print_line] )
        text = text[max_print_line:]
    return res


keep_stdout = 0     # whether to keep TeX progress on stdout
out_encoding= None  # output encoding for TeX text

def main2():
    # default TeX encoding. ASCII for TeX, UTF-8 for XeTeX
    default_tex_encoding = 'ascii'

    while 1:
        line = read_next()

        # ! Undefined control sequence.
        # l.816 \foobar
        # {Blah-blah-blah
        if rex_error.match(line):
            # `print_err` in TeX always prints two-line help message through `help2`
            print_err(line)
            for i in range(2):
                line = read_next()
                print_err(line)

        # Package Fancyhdr Warning: \headheight is too small (12.0pt):
        # Make it at least 26.95602pt.
        # We now make it that large for the rest of the document.
        # This may cause the page layout to be inconsistent, however.
        #
        elif rex_warning.match(line):
            # warnings always last till an empty line
            print_err(line)
            while 1:
                line = read_next()
                print_err(line)
                if line.isspace():
                    # actually want to detect '\n', but windows...
                    break

        # Overfull \hbox (3.76617pt too wide) in paragraph at lines 110--114
        # \T2A/ptm/m/n/12 ния суд-ном (ко-раб-лем), сни-жа-ет массо-габаритные ха-рак-те-
        # ри-сти-ки обо-ру-до-ва-ния, обес-пе-чи-ва-ет улуч-
        elif rex_overfull.match(line):
            # it seems there is no reliable way to detect how many lines
            # follow the first 'Overfull ...' line -- we just assume the
            # end-of-message line is a line with < 79 characters
            # (max_print_line in tex.web) length
            print_err(line)

            msg = read_msg_text()
            xmsg= msg

            if out_encoding:
                # TeX tells us its current internal encoding as something like
                # '\T2A/ptm/m/n/12'
                tail = msg
                xmsg = u''

                # current system encoding of TeX output
                tex_encoding = default_tex_encoding

                while 1:
                    m = rex_font_id.search(tail)
                    if m:
                        chunk = tail[:m.start()]
                    else:
                        chunk = tail

                    # handle previous chunk
                    # "normal" encoding
                    if not tex_encoding.startswith('tex/'):
                        chunk = chunk.decode(tex_encoding)
                    # for TeX encodings we have to be careful
                    else:
                        s = u''
                        for c in chunk:
                            if c in tex_symb_special:
                                # keep special TeX symbols (e.g. $ markers) in ASCII
                                s += c
                            else:
                                s += c.decode(tex_encoding)
                        chunk = s

                    xmsg += chunk

                    # time to give up?
                    if not m:
                        break

                    # put font markers as is
                    xmsg += tail[m.start():m.end()]

                    # remember new tex encoding
                    tex_encoding_ = m.group(1)  # e.g. 'T2A' or 'EU1'
                    tex_encoding = tex_sys_encodings.get(tex_encoding_) # e.g. 'cp1251'
                    if tex_encoding is None:
                        tex_encoding = default_tex_encoding
                        print_err('W: unknown TeX encoding \'%s\' mapped to \'%s\'\n' %
                                  (tex_encoding_, tex_encoding))

                    # and proceed
                    tail = tail[m.end():]

            # output resulting message
            for line_out in format_msg_text(xmsg):
                if isinstance(line_out, unicode):
                    line_out = line_out.encode(out_encoding)
                print_err(line_out)
                print_err('\n')

            print_err('\n')

        # This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009/Debian)
        # This is XeTeX, Version 3.1415926-2.2-0.9995.2 (TeX Live 2009/Debian)
        elif rex_texid.match(line):
            # XeTeX uses UTF-8 as default encoding
            m = rex_texid.match(line)
            texid = m.group(1)
            if texid.lower() == 'xetex':
                default_tex_encoding = 'utf-8'

        # TeX reports its progress. Lots of stuff
        else:
            if keep_stdout:
                print line,


import os
import locale
from getopt import getopt, GetoptError

progname= os.path.basename(sys.argv[0])


def exithelp(msg):
    print >>sys.stderr, msg
    print "Try `%s --help' for more information." % progname
    sys.exit(2)


def usage():
    print 'Usage: tex ... | %s [OPTIONS]' % progname
    print '\n'+__doc__
    print \
"""
  -e, --output-encoding=    output encoding for TeX text (default as-is).
                            `auto' means autodetect from system defaults.
  -v, --verbose             also keep TeX progress on stdout.
  -h, --help                display this help and exit.
"""


def main():
    global keep_stdout, out_encoding

    try:
        opts, args = getopt(sys.argv[1:], 'e:vh', ['output-encoding=', 'verbose', 'help'])
    except GetoptError, e:
        exithelp(e.msg)

    for o, a in opts:
        if o in ['-e', '--output-encoding']:
            if a == 'auto':
                a = locale.getdefaultlocale()[1]
            out_encoding = a
        if o in ['-v', '--verbose']:
            keep_stdout = 1
        if o in ['-h', '--help']:
            usage()
            sys.exit(0)

    try:
        main2()
    except EOF:
        pass


if __name__ == '__main__':
    main()
---- 8< ----
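
The read_msg_text / format_msg_text pair in navy-tex-stderr relies on TeX hard-wrapping its terminal output at max_print_line=79 characters. A small Python 3 sketch of that round trip, on plain `str` input and with made-up function names `join_wrapped` and `rewrap`, might look like this:

```python
MAX_PRINT_LINE = 79  # TeX's default max_print_line, as assumed by navy-tex-stderr


def join_wrapped(lines):
    """Glue hard-wrapped lines back into one message.

    Consumes lines until one is not exactly MAX_PRINT_LINE characters long;
    that line is taken to be the end of the message.
    """
    parts = []
    for line in lines:
        line = line.rstrip('\n')
        parts.append(line)
        if len(line) != MAX_PRINT_LINE:
            break
    return ''.join(parts)


def rewrap(text):
    """Cut a message back into MAX_PRINT_LINE-wide pieces for display."""
    return [text[i:i + MAX_PRINT_LINE]
            for i in range(0, len(text), MAX_PRINT_LINE)]
```

The heuristic has one known weak spot: a message whose last line happens to be exactly 79 characters long will swallow the following line as well.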
---- 8< ---- (tools/tex/encodings/__init__.py)
import codecs
from tex.encodings import oml

tex_codecs_registry = {
    oml.__codec_name__ : oml.getregentry
}


def tex_search_codec(name):
    try:
        getregentry = tex_codecs_registry[name]
    except KeyError:
        return None

    return getregentry()


def activate():
    """activate support for all TeX encodings"""
    codecs.register (tex_search_codec)
---- 8< ----
---- 8< ---- (tools/tex/encodings/oml.py)
"""OML encoding map"""
import codecs
from unicodedata import lookup as U
__codec_name__ = 'tex/oml'
def MIC(name): return U('MATHEMATICAL ITALIC CAPITAL '+name)
def MIS(name): return U('MATHEMATICAL ITALIC SMALL '+name)
def MI (name): return U('MATHEMATICAL ITALIC '+name)
def TODO(name): return u'?'
# OML -> unicode
decoding_table = u''.join((
MIC('GAMMA'), # 0x00
MIC('DELTA'),
MIC('THETA'),
MIC('LAMDA'),
MIC('XI'),
MIC('PI'),
MIC('SIGMA'),
MIC('UPSILON'), # XXX UPSILON1 ?
MIC('PHI'), # 0x08
MIC('PSI'),
MIC('OMEGA'),
MIS('ALPHA'),
MIS('BETA'),
MIS('GAMMA'),
MIS('DELTA'),
MI ('EPSILON SYMBOL'),
MIS('ZETA'), # 0x10
MIS('ETA'),
MIS('THETA'),
MIS('IOTA'),
MIS('KAPPA'),
MIS('LAMDA'),
MIS('MU'),
MIS('NU'),
MIS('XI'), # 0x18
MIS('PI'),
MIS('RHO'),
MIS('SIGMA'),
MIS('TAU'),
MIS('UPSILON'), # EPSILON?
MI ('PHI SYMBOL'),
MIS('CHI'),
MIS('PSI'), # 0x20
MIS('OMEGA'),
MIS('EPSILON'),
MI ('THETA SYMBOL'),
MI ('PI SYMBOL'),
MI ('RHO SYMBOL'),
MIS('FINAL SIGMA'),
MIS('PHI'),
TODO('harpoonleftup'), # 0x28
TODO('harpoonleftdown'),
TODO('harpoonrightup'),
TODO('harpoonrightdown'),
TODO('hookrightchar'),
TODO('hookleftchar'),
TODO('triangleright'),
TODO('triangleleft'),
TODO('zerooldstyle'), # 0x30
TODO('oneoldstyle'),
TODO('twooldstyle'),
TODO('threeoldstyle'),
TODO('fouroldstyle'),
TODO('fiveoldstyle'),
TODO('sixoldstyle'),
TODO('sevenoldstyle'),
TODO('eightoldstyle'), # 0x38
TODO('nineoldstyle'),
TODO('period'),
TODO('comma'),
TODO('less'),
TODO('slash'),
TODO('greater'),
TODO('star'),
MI ('PARTIAL DIFFERENTIAL'), # 0x40
MIC('A'),
MIC('B'),
MIC('C'),
MIC('D'),
MIC('E'),
MIC('F'),
MIC('G'),
MIC('H'), # 0x48
MIC('I'),
MIC('J'),
MIC('K'),
MIC('L'),
MIC('M'),
MIC('N'),
MIC('O'),
MIC('P'), # 0x50
MIC('Q'),
MIC('R'),
MIC('S'),
MIC('T'),
MIC('U'),
MIC('V'),
MIC('W'),
MIC('X'), # 0x58
MIC('Y'),
MIC('Z'),
TODO('flat'),
TODO('natural'),
TODO('sharp'),
TODO('slurbelow'),
TODO('slurabove'),
TODO('lscript'), # 0x60
MIS('A'),
MIS('B'),
MIS('C'),
MIS('D'),
MIS('E'),
MIS('F'),
MIS('G'),
# There is no 'MATHEMATICAL ITALIC SMALL H' in unicode
# http://unicode.org/mail-arch/unicode-ml/y2006-m04/0187.html
U('PLANCK CONSTANT'), # 0x68
MIS('I'),
MIS('J'),
MIS('K'),
MIS('L'),
MIS('M'),
MIS('N'),
MIS('O'),
MIS('P'), # 0x70
MIS('Q'),
MIS('R'),
MIS('S'),
MIS('T'),
MIS('U'),
MIS('V'),
MIS('W'),
MIS('X'), # 0x78
MIS('Y'),
MIS('Z'),
MIS('DOTLESS I'),
MIS('DOTLESS J'),
TODO('weierstrass'),
TODO('vector'),
TODO('tie'),
))
# unicode -> OML
# charmap_build does support tables with 0x100 entries only
encoding_table = dict( (ord(v),k) for k,v in enumerate(decoding_table) )
# copy-paste of what gencodec.py produces

class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)


class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]


class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]


class StreamWriter(Codec,codecs.StreamWriter):
    pass


class StreamReader(Codec,codecs.StreamReader):
    pass


### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name=__codec_name__,
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
---- 8< ----
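
In case the codec plumbing looks opaque, here is a stripped-down Python 3 sketch of the same mechanism: a charmap table plus a search function registered with `codecs.register`. The two-entry table and the codec name `texdemo` are invented for the example. I deliberately picked a plain lowercase name, because the codec registry normalizes names before calling the search function and I have not checked how a name like 'tex/oml' is treated under Python 3.

```python
import codecs
from unicodedata import lookup

# Hypothetical two-entry table: byte 0x00 -> italic Gamma, 0x01 -> italic Delta
DECODING_TABLE = (lookup('MATHEMATICAL ITALIC CAPITAL GAMMA')
                  + lookup('MATHEMATICAL ITALIC CAPITAL DELTA'))
# reverse mapping: unicode ordinal -> byte value
ENCODING_TABLE = {ord(ch): byte for byte, ch in enumerate(DECODING_TABLE)}

CODEC_NAME = 'texdemo'  # made-up name, not a real TeX encoding


def _encode(text, errors='strict'):
    return codecs.charmap_encode(text, errors, ENCODING_TABLE)


def _decode(data, errors='strict'):
    return codecs.charmap_decode(data, errors, DECODING_TABLE)


def _search(name):
    """Codec search function: answer only for our own codec name."""
    if name != CODEC_NAME:
        return None
    return codecs.CodecInfo(name=CODEC_NAME, encode=_encode, decode=_decode)


codecs.register(_search)

decoded = b'\x00\x01'.decode(CODEC_NAME)   # bytes -> two math italic letters
encoded = decoded.encode(CODEC_NAME)       # and back to the original bytes
```

Bytes outside the table raise a UnicodeDecodeError under the default 'strict' error handling, which is why the full OML table above fills unknown slots with a placeholder instead of leaving gaps.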
Thanks,
Kirill