You can subscribe to this list here.
| 2001 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
(1) |
Aug
|
Sep
(4) |
Oct
|
Nov
|
Dec
|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2002 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
(1) |
Oct
|
Nov
|
Dec
|
| 2003 |
Jan
|
Feb
|
Mar
|
Apr
(6) |
May
(4) |
Jun
(2) |
Jul
|
Aug
|
Sep
|
Oct
(3) |
Nov
|
Dec
|
| 2004 |
Jan
|
Feb
(1) |
Mar
(14) |
Apr
(71) |
May
(93) |
Jun
(17) |
Jul
(62) |
Aug
(25) |
Sep
(42) |
Oct
(43) |
Nov
(51) |
Dec
(70) |
| 2005 |
Jan
(15) |
Feb
(20) |
Mar
(1) |
Apr
(29) |
May
(18) |
Jun
(27) |
Jul
(71) |
Aug
(114) |
Sep
(89) |
Oct
(177) |
Nov
(107) |
Dec
(54) |
| 2006 |
Jan
(165) |
Feb
(111) |
Mar
(138) |
Apr
(99) |
May
(78) |
Jun
(85) |
Jul
(104) |
Aug
(97) |
Sep
(276) |
Oct
(313) |
Nov
(65) |
Dec
(35) |
| 2007 |
Jan
(21) |
Feb
(143) |
Mar
(136) |
Apr
(174) |
May
(99) |
Jun
(11) |
Jul
(225) |
Aug
(54) |
Sep
(118) |
Oct
(44) |
Nov
(31) |
Dec
(9) |
| 2008 |
Jan
(1) |
Feb
(1) |
Mar
(21) |
Apr
|
May
(23) |
Jun
|
Jul
(3) |
Aug
(4) |
Sep
(5) |
Oct
|
Nov
(1) |
Dec
|
| 2009 |
Jan
(1) |
Feb
(7) |
Mar
(38) |
Apr
(75) |
May
(12) |
Jun
(34) |
Jul
|
Aug
(6) |
Sep
(20) |
Oct
|
Nov
(13) |
Dec
(1) |
| 2010 |
Jan
(26) |
Feb
|
Mar
|
Apr
(2) |
May
|
Jun
|
Jul
|
Aug
(1) |
Sep
(3) |
Oct
(140) |
Nov
(88) |
Dec
(63) |
| 2011 |
Jan
(41) |
Feb
(35) |
Mar
(31) |
Apr
(20) |
May
(4) |
Jun
|
Jul
|
Aug
|
Sep
|
Oct
(11) |
Nov
(30) |
Dec
(8) |
| 2012 |
Jan
(28) |
Feb
(35) |
Mar
(27) |
Apr
(55) |
May
(57) |
Jun
(23) |
Jul
(18) |
Aug
(86) |
Sep
(20) |
Oct
(16) |
Nov
(65) |
Dec
(59) |
| 2013 |
Jan
(65) |
Feb
(77) |
Mar
(51) |
Apr
(16) |
May
(46) |
Jun
(19) |
Jul
(18) |
Aug
(4) |
Sep
(18) |
Oct
(18) |
Nov
(25) |
Dec
(38) |
| 2014 |
Jan
(71) |
Feb
(48) |
Mar
(32) |
Apr
(6) |
May
(17) |
Jun
(2) |
Jul
(1) |
Aug
(82) |
Sep
(40) |
Oct
(2) |
Nov
(8) |
Dec
(5) |
| 2015 |
Jan
(4) |
Feb
(12) |
Mar
(23) |
Apr
(12) |
May
|
Jun
(4) |
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
|
From: john s. <sk...@us...> - 2014-09-16 09:27:15
|
On 16/09/2014, at 6:34 PM, srean wrote: > You can keep using std:strings but avoid creating intermediate C style char[ ] strings. One could use ifstream or ofstream to get a std:string directly from the OS buffer (their rdbuff interface one can go pretty low level), but mixing that with FILE* handle would be messy but do able. Console I/O would work too if one swaps the rdbuff of cin or cout with that of ifstream or ofstream But you're only talking about one function (reading a line). There are other functions that require interfacing: RE2, SDL, etc etc. -- john skaller sk...@us... http://felix-lang.org |
|
From: john s. <sk...@us...> - 2014-09-16 08:25:02
|
On 16/09/2014, at 5:14 PM, srean wrote: > Oh I was aiming much lower, like eliminating copies of C strings to std::string. Well, varray[char] could be a good base. I could implement SubArray (in general). Varrays can't move or be resized. They're variable length up to a bound. Pass by address, mutable. SubArray would just be varray with start and end pos. Or start and length. There is a cost, getting either a C string in a stack buffer, or, a C++ string onto the heap. However c_str() already pays that price for C++ strings and already returns a varray[char] . The way arrays and pointers work in Felix is still too confusing. Even I can't figure it out :) -- john skaller sk...@us... http://felix-lang.org |
|
From: john s. <sk...@us...> - 2014-09-16 07:03:44
|
On 16/09/2014, at 3:24 PM, srean wrote:
>
> I think there are enough extra copies and other overheads that can be removed to beat Python at it. Note the C++ code itself is twice or more faster than the Python code, so passing the buck to C++ wont help.
I haven't seen the C++ version of it.
> One neednt allocate that list upfront although that turned out to be faster.
> There's got to be a way to make Felix yield competitive.
Of course there is, but it doesn't involve copying strings about.
Since Felix is a pass by value language, that seems inevitable,
even with some inlining, if one is using higher level functional
operations like split.
RE2's StringPiece would help if the base char array can be made
to persist. Its basically a struct { length, char const* } thing.
But that raises another deficiency in Felix. When you have a Felix native
structure containing pointers, you can nest it in another Felix native
structure. The compiler traverses the tree when building the array
of pointer offsets for the top level type.
There's no way to do this for any C data type *except* a type that
is already a pointer, you can label that like:
_gc_pointer type fred = "fred*";
Now, there is a way to model a *complete* C data type without
knowing the offsets:
type fat = "fat" scanner "fred_scanner";
The scanner is a C function that finds all the offsets.
This is how Judy Arrays are integrated into Felix,
with a custom scanner.
However, the scanner has to be applied to *pointer* onto
the heap. You cannot actually put one of these objects
into another Felix object because the compiler doesn't
know how to find the offsets. The run time system does,
via the scanner function, but that's no use.
The compiler generates a single offset array for all data types
unless there's a custom scanner. In fact, the compiler calls
a "standard custom scanner" and passes it the offset array.
I think the function is called "scan_by_offsets" :)
So now the point is for a StringPiece, if implemented
in C++ (as the RE2 one is), the fact I know where the
contained char* is doesn't help. I can make a custom scanner
for it, but then all the StringPiece have to be whole objects.
Technically: either "on the machine stack" or "whole objects
on the heap". Conservative scan takes care of the stack case.
There are some ways to fix this: one is to represent a data structure
type at run time not with a single flat RTTI object but recursively.
In other words, a "struct" with three fields would be represented
by an array of three pointers to the field types. At run time a recursive
descent can find all the offsets. This is obviously better because it
makes run time type construction a breeze. However the downside
is that the scanning for offsets would be slower.
The bottom line is that if I want string pieces .. I have to implement
them in Felix.
--
john skaller
sk...@us...
http://felix-lang.org
|
|
From: john s. <sk...@us...> - 2014-09-16 05:15:56
|
On 16/09/2014, at 1:57 PM, srean wrote:
> Now I am totally lost, sorry, could you do a redux (to a 5 yr old).
Just see the next post. It's already implemented, I just didn't realise :)
> I have a vague suspicion that this has been provoked by Ryan's code,
Actually, I was looking at a rope made using RAList (purely functional
random access list, already coded in the library).
The problem is you can't pattern match on it like an ordinary list.
I'm so used to using pattern matching now I can't abide using a language
without it. For example Scheme is used to make the parser, it has no pattern
matching and it sucks doing anything in it which involves translating
S-expressions: I would need an arcane list of comparators applied
like (if (cond) (if (cond) .... to match a term and that is beyond my small
brain.
So I want to bed worrying about how to make a rope and dreamed
up the general requirements for pattern matching. I think it's
a categorical adjunction but I'm not sure: you basically have to have
a partial inverse function, eg:
Uncons (Cons (a,b)) = (a,b)
Uncons isn't an inverse for lists: it only "inverts" a list made by Cons.
So if your data type has N constructors you need N matching
destructors to take them apart and get the constructor argument
back.
So fine, but you need to know which constructor was applied,
for a standard variant type its in the value as a integer tag,
and so a match just does a test on the tag to see which
destructor to apply, and how to cast the resulting argument
to the right type.
So basically for a constructor
Ctor: A -> U
you need two functions for union type U:
_match_ctor_Ctor: U -> bool // match checker
_ctor_arg_Ctor : U -> A // extractor
where the understanding is it is only safe to apply the extractor if the
checker returns true. SO basically a match:
match u with
| Ctor ?a => ...
is implemented by:
if _match_ctor_Ctor (u) then
var a :A = _ctor_arg Ctor (u)'
....
elif ...
This is actually implemented already!
For an actual Felix union type
the only difference is the compiler doesn't actually put the
_match_ctor and _ctor_arg functions into the symbol table, it just
generates their definition bodies "on the fly". In fact it USED to put
them into the symbol table (which made the symbol table very big).
Since union is generally:
struct _uctor_ { int variant; void *data; }
the match checker just compares variant with an integer
constant corresponding to the index of the constructor
in the union definition. Eg in a list, Empty=0, Cons=1.
And the argument extractor is just a cast and deref:
*(A*)u.data
where T is the argument type. The argument of Cons is a tuple
T * list[T]
> then its good, because i would rather have that tied up neatly before moving to something else. That Python is 3X faster is an unpardonable affront :)
Not really. Python has very good string handling, it evolved over a long time,
and being an actual scripting language, if the string operations are fast
enough the cost of interpreting bytecode is small compared to the string ops,
even with dynamic typing.
Felix can do it faster than Python, but the simple idioms, stuff like
iterators, for example, is coded with the constraint of using C++
underneath, string class in particular, and wanting to be easy
to code. The easy to code stuff isn't necessarily fast and its hard
to optimise because the compiler doesn't know what a string is.
Python doesn't either! So its no excuse. However Python
is object based and strings are immutable so passing them
around has a low cost. C++ strings aren't so the cost is quite high.
Gcc uses COW but that is banned now and not allowed by the
Standard, although now we have move constructors which should
copy rvalue strings at zero cost (i.e. constant time).
So without some work, things like split are going to be expensive
because they return 2 strings from one, so you not only get two new
strings, if you're not careful they get copied in the process of returning
them. And a string will get copied when passing it as an argument too.
This is a really good reason NOT to use eager evaluation!!
It's likely to say:
parameter = argument
which will copy argument to parameter (even if the argument is
already a single variable). Of course in Felix the parameter is then
stored into the functions class as a variable .. that's ANOTHER copy.
You can see why lazy evaluation and inlining are vital for performance:
it elides a lot of copying.
Still, Felix "style" more or less dictates pass by value and don't worry about it.
When the underlying types are expensive to copy this is going to rub off
if they can't do it smartly: C++ rvalue binders and thus move constructors
should help to fix this to some extent. And they had better, since now
COW strings aren't allowed.
--
john skaller
sk...@us...
http://felix-lang.org
|
|
From: john s. <sk...@us...> - 2014-09-16 00:58:20
|
In case you don't realise what this can mean here's an example: Consider this: const stl_npos: size = "::std::string::npos"; fun stl_find: string * string -> int = "(int)$1.find($2)" is cast; fun find (s:string, e:string) : opt[int] => match stl_find (s, e) with | ?i when i.size == stl_npos => None[int] | ?i => Some i endmatch ; The first function is fast (in that it is a direct call of the C++) but it is unsafe, because you might forget to check for stl_npos (std::string::npos in C++). The second function is safe: you can't forget because the union "opt" will not let you. But it is slow because it has to heap allocate its argument, because the compiler doesn't know what an "int" is. Here is code for a version of this which is safe AND fast: type pos = "int"; fun _match_ctor_Npos: pos -> bool = "$1==npos"; fun _match_ctor_Pos : pos -> bool = "$1!=npos"; fun _ctor_arg_Pos: pos -> int = "(int)$1"; fun stl_find: string * string -> Pos = "(int)$1.find($2)" is cast; The only trick is to ensure we don't get lazy evaluation here! We don't want you evaluate the find function 3 times! But the compiler has already taken care of that as it happens: a match argument is assigned to variable before doing the big switch. -- john skaller sk...@us... http://felix-lang.org |
|
From: john s. <sk...@us...> - 2014-09-16 00:38:59
|
On 16/09/2014, at 5:53 AM, john skaller wrote:
> One of the things which has been puzzling me is how to
> pattern match abstract data types.
OMG!
I should document Felix better!!
It's already implemented! Has been for years.
For primitives only, but still ...
/////////////////
ctor A : int = "$1";
ctor int : A = "$1";
var a : A = A(42);
println$ a.int;
fun _match_ctor_Even : A -> bool = "$1%2==0";
fun _match_ctor_Odd : A -> bool = "$1%2==1";
fun _ctor_arg_Even : A -> int = "$1/2";
fun _ctor_arg_Odd : A -> int = "$1/2";
proc decode (a:A) {
match a with
| Even ?x => println$ "Even " + x.str;
| Odd ?x => println$ "Odd " + x.str;
endmatch;
}
decode 42.A;
decode 43.A;
/////////////////
~/felix>flx --test=build/release ta
42
Even 21
Odd 21
////////////////////////////
I don't know if my code handles polymorphic types. But still .. instead of
using isNULL on a pointer .. which is risky ..
match p with
| NULL =>
| nonNULL ?q =>
endmatch
can be implemented!
All I have to do is allow this mechanism for a wider range of types;
at the moment it has to be a nominal type, and an primitive.
[Pattern matching on unions, tuples, records, etc is handled by
the compiler]
In fact, you can do it now, as above, for ANY type. by just giving
it a primitive model (eg A is a primitive model for int, which just
happens to be primitive also but that's irrelevant).
The only problem with the modelling method is it won't nest
so I really do have to expand the allowable range of types
to at least include abstract types (ones created with
type fred = new joe)
--
john skaller
sk...@us...
http://felix-lang.org
|
|
From: john s. <sk...@us...> - 2014-09-15 19:54:51
|
One of the things which has been puzzling me is how to pattern match abstract data types. This puzzle reawakened by considering RAList. An RAList, like a list, supports functions: empty () cons (elt, list) and it supports isempty (list) head (list) when not isempty tail (list) and also uncons = head,tail. Now the secret of list pattern matching is that empty () => 0, () cons (elt,list) => 1, (elt,list) in other words, the constructors Empty = 0 Cons = 1, and the terms constructed are just tuples: constructor, argument so a pattern match says: if run time constructor tag = 0, then empty() was called, argument is () = 1, then cons was called, so argument is (elt, list) and given the argument is at a known offset just after the tag, we can just cast it, and we have got the function's argument out. Clearly, nested patterns are handled by successive application of the above mechanism. So carefully note that the constructor tag is basically the selector for a particular projection function to extract the arguments from the expresion tuple (i.e. its the second projection, which depends on the result of applying the first projection). For a tuple (rather than a variant), the projections are statically known. However, Felix doesn't implement it quite like this. Instead it has two functions: a match comparator, and a match handler. The match handler is user code but includes the match extractor, the bit that, once you know the type, applies the projection chains needed to set the match variables used by the match handler. Now, my critical observation is this: we can rewrite this to if isempty(arg) then () else if not isempty(arg) then var elt,list = uncons (arg) Let's be more precise: suppose for some ABSTRACT data type T, we have constructor FUNCTIONS C0 : A0 -> T C1: A1 -> T ,,, and we have corresponding match checker functions IS0 : T -> bool IS1 : T -> bool ... and we have also corresponding extractors X0 : T -> A0 X1: T -> A1 ... then we can do a "general" decode of the abstract data type like: if IS0 a then H0 (X0 a) elif IS1 a then H1 (X1 a) ... where H0, H1, .. is the user handler code in the match branch, which is a function accepting a tuple of the type of the argument of the constructor C0, C1, etc. So the critical thing is we must list for the abstract data type these NAMED triples: N0: C0, IS0, X0 N1: C1, IS1, X1 The types must be as above: C0: A0 -> T, X0: T -> A0 (precondition IS0 T), however there's more: clearly: a = X0 (C0 a) In other words, X0 has to "undo" the effect of applying C0 to get back the argument to C0. So any old functions, even of the right types, aren't good enough. I have a gut feeling this is a categorical adjunction. Why the names? Obviously so we can now write: match expr with | N0 (?a0) => ... | N1 (?a1) => ... ... The match checker applied is ISi for name i, and the extractor Xi is used if the checker succeeds. Both the checker and extractors are ordinary functions so composition, that is, nested patterns, will work with the existing algorithm. In fact for pattern matching, we do not require the C0 component, however it could be useful as follows: define N0 (a) by C0 (a) In other words, make Ni a synonym for the function Ci so it looks like a other union types: we have a constructor name Ni which can be pattern matched. Note arbitrary abstract types might not admit this representation. Note that if you have an object with accessor functions, one could always provide the extractor (and checker for the precondition), and the name, but no constructor function. Conversely, one could provide a constructor with no checker or extractor. For example we might provide match 1.0 + 2.0 i with | Imag ?y => // imaginary part .. Here there's no constructor because we don't know the real part: the extractor loses it (assuming the checker was universally true, rather than checking for a pure imaginary!) On the other hand this looks cool: match z with | Polar (?angle, ?length) => ... The constructors don't have to be disjoint, nor the destructors. SO: I think I have found a good way to generalise pattern matching. -- john skaller sk...@us... http://felix-lang.org |
|
From: john s. <sk...@us...> - 2014-09-15 08:11:15
|
On 15/09/2014, at 3:31 PM, srean wrote:
> I think rope would be an overkill for simple cases of concatenation.
But not for complex ones: I changed this from concatenation to
using += and reserve .. but the reserve is a guess and I haven't done
this everywhere.
method fun make_frame (out:string) :string = {
var o = "";
reserve(&o,10000+out.len.int);
var h = #(d.heading.get_headings);
o+=menu_js;
o+=menu_style;
o+=#(d.heading.emit-js);
o+=#(d.button-factory.get-jscript);
o+=#(d.fileseq.get-jscript);
// Whole page defaults and background
o+='<div style="background-color:'+page_colour+'; font-family:sans-serif; color:'+text_colour+'">';
// LEFT MARGIN
// fixed top nav bar
o+='<div style="position:fixed; top:10px; left:10px; width:'+
(toc_width - 10).str+'px; height:30px;' +
' background-color:'+frame_colour+'; padding:4px;'+
' border-top-left-radius:10px; border-top-right-radius:10px">';
o+='</div>\n';
// left border
o+='<div style="position:fixed; top:40px; bottom:40px; left:10px; width:4px;' +
' background-color:'+ frame_colour+ ';">';
o+='</div>\n';
// Contents body
o+='<div style="position:fixed; top:48px; bottom:48px; left:14px; '+
' width:'+(toc_width - 10).str+'px;'+
' border:4px;overflow:auto;'+
' font-family:sans-serif; color:#404040; background-color:#70E0E0;">\n';
proc head1(s:string,i:int) {
o+= """<div class=m1 onclick="mshow('menu"""+ counter.str+"""','#"""+s+"""_h')" """;
o+=" onmouseover='hilite(this)' onmouseout='leave1(this)' >";
o+= s+"</div>\n";
o+= """<div class=sm id=menu"""+counter.str+""">\n""";
}
proc foot1() { o+="</div>\n"; }
proc break1(s:string,i:int) {foot1(); ++counter; head1(s,i); }
var counter = 0;
iter proc (i:int,s:string)
{
//println$ i,s;
if i == 1 do
if counter == 0 do ++counter; head1 (s,i);
else break1 (s,i);
done
elif i == 2 do
o+="<div class=m2 onmouseover='hilite(this)'";
o+=" onmouseout='leave2(this)'";
o+=""" onclick='window.location.replace("#"""+s+"""_h")'>"""+s+"""</div>\n""";
done
}
h
;
if counter > 1 do foot1; done;
o+="<script>counter_max="+counter.str+";</script>\n";
// right border
o+='<div style="position:fixed; top:40px; bottom:40px; left:'+(toc_width + 4).str+'px; width:4px;' +
' background-color:'+ frame_colour+ ';">';
o+='</div>';
// fixed bottom nav bar
o+='<div style="position:fixed; bottom:10px; left:10' +
'px; width:'+(toc_width - 10).str+'px; height:30px;' +
' background-color:'+frame_colour+';padding:4px;'+
' border-bottom-left-radius:10px; border-bottom-right-radius:10px"\n>';
o+='</div>\n';
// MAIN CONTENT
// fixed top nav bar
o+='<div style="position:fixed; top:10px; left:'+(toc_width+10).str+'px; right:10px; height:30px;' +
' background-color:'+frame_colour+'; padding:4px;'+
' border-top-left-radius:10px; border-top-right-radius:10px">';
o+=#(d.heading.emit-buttons);
o+=#(d.fileseq.shownav);
o+='</div>\n';
// left border
o+='<div style="position:fixed; top:40px; bottom:40px; left:'+(toc_width+10).str+'px; width:4px;' +
' background-color:'+ frame_colour+ ';">';
o+='</div>';
// body
o+='<div style="position:fixed; top:48px; bottom:48px; left:'+(toc_width+14).str+
'px; right:14px; padding:8px;'+
' border:4px;overflow:auto; font-family:sans-serif;'+
' color:'+text_colour+'; background-color:'+page_colour+';">';
o+=out;
o+='</div>\n';
// right border
o+='<div style="position:fixed; top:40px; bottom:40px; right:10px; width:4px;' +
' background-color:'+ frame_colour+ ';">';
o+='</div>';
// fixed bottom nav bar
o+='<div style="position:fixed; bottom:10px; left:'+(toc_width+10).str+
'px; right:10px; height:30px;' +
' background-color:'+frame_colour+';padding:4px;'+
' border-bottom-left-radius:10px; border-bottom-right-radius:10px"\n>';
o+=#(d.fileseq.shownav);
o+='</div>\n';
o+='</div>'; // whole page end
return o;
}
> Concatenation using the "grow by 1.x times and copy" would be half decent, but only half decent. Isn't there a "+= "operator for C++ strings or an "append" that does that already. One can go the expression templates route, but again that is more complicated.
Actually it isn't:
reduce concat3 (x:string, y:string, z:string) : x + y + z => cat3 (x,y);
That will reduce
x + y + z + a + b => cat3 (cat3 (x,y,z),a,b);
which is a lot better than binary. Slows down the compiler though.
> fgets interface leads to copying _at_least_ twice
>
> (i) copying from the buffer in C's stdlib to the buffer provided to fgets as an arg
> ii) copying the contents of the arg for concatenation
No, (ii) should be "making a C++ string from a C string".
Any concatenation or whatever is on top of that again.
>
> If one uses raw read then one can reduce one copy,
How? You still have to copy stuff out of the raw buffer into the
line buffer. And if the line buffer isn't big enough you have
to reallocate it. I'm assuming the C library fgets can do this
better than I can.
The bottom line is that immutability helps.
If you have that, functional strings seem a good idea too.
Python uses the short string optimisation too (the first so many characters
are stored in the object. If the string gets longer, they're replace by a pointer
to a heap allocated string).
I don't know how it does substrings though.
--
john skaller
sk...@us...
http://felix-lang.org
|
|
From: john s. <sk...@us...> - 2014-09-15 05:02:41
|
On 15/09/2014, at 2:35 PM, srean wrote:
> It is creating a a full list whereas the Python code doesnt, so using a generator style might recover even more ground. But I dont know the cost model of Felix GC at all, so you certainly would know better.
I think you mean iterator, which is a generator closure: readln itself is a generator,
that's just a function with side-effects.
readln would be faster because it just reads lines using fgets.
So there's no need to do a split.
Er ....
for file in split(file_file,"\n") do
println$ file;
var f = fopen_input file;
reading: while true do
var line = readln f;
if line == "" do break reading; done
index.add line (index.get_dflt (line,0)+1);
++counter;
done
f.fclose;
done
Time elapsed: 1.077088s, 56397 trx, or 52361 trx/sec
which is SLOWER.
(the outer split doesn't count).
Why is it slower? I have no idea. Probably because its faster to read
a whole file at once than with multiple fgets()?
[I still get an unknown exception for the larger file .. very annoying,
however at least I now get an error code :]
Seriously, with immutable garbage collected strings, stuff like split
is trivia: if a string consists of
(count, start)
then there's no copying involved at all. And its perfectly safe,
since the strings are immutable, and the array is garbage collected.
Concatenation remains slow, unless we use a rope.
However ropes are slower to scan. And a rope of individual
characters would be a disaster (as it is in Haskell where the standard
string is a list of chars .. :)
So the problem with ropes is quite nasty. One needs to know when
to compact the rope into a single fragment. Of course you can only
do this creating a rope, otherwise the functional behaviour is lost.
--
john skaller
sk...@us...
http://felix-lang.org
|
|
From: john s. <sk...@us...> - 2014-09-15 04:19:55
|
On 15/09/2014, at 1:56 PM, john skaller wrote:
> Not surprised at all. I could speed this up heaps and heaps!!
>
> fun rev_split (x:string, d:char): List::list[string] = {
> fun aux (x:string,y:List::list[string]) =>
> match find (x, d) with
> | None => Cons (x, y)
> | Some ?n => aux$ x.[n+1 to], List::Cons (x.[to n],y)
> endmatch
> ;
> return aux$ x, List::Empty[string];
> }
So using instead:
fun rev_split (x:string, d:string): List::list[string] = {
fun aux (pos:int,y:List::list[string]) =>
match stl_find_first_of (x, d, pos) with
| $(stl_npos.int) => y
| ?n => aux$ (n+1), List::Cons (x.[pos to n],y)
endmatch
;
return aux$ 0, List::Empty[string];
}
I get this:
Time elapsed: 0.785302s, 56424 trx, or 71850 trx/sec
which is almost twice as fast as this:
Felix: Time elapsed: 1.438417s, 56723 trx, or 39434 trx/sec
--
john skaller
sk...@us...
http://felix-lang.org
|
|
From: john s. <sk...@us...> - 2014-09-15 03:57:35
|
On 15/09/2014, at 1:34 PM, srean wrote:
> Two questions:
>
> i) Does Felix de-synchronize C++ I/O from cstdio ? That speed up things quite a bit
No .. but then Felix doesn't use C++ I/O streams. Its all C FILE*.
In any case this program doesn't do much I/O to the console.
>
> ii) Rather than using the function split, would it be better to use a generator that yields the lines. Felix has readln for that, correct ?
Dunno. I'm looking at the split function .. and I'm not surprised its horrendously slow.
Not surprised at all. I could speed this up heaps and heaps!!
fun rev_split (x:string, d:char): List::list[string] = {
fun aux (x:string,y:List::list[string]) =>
match find (x, d) with
| None => Cons (x, y)
| Some ?n => aux$ x.[n+1 to], List::Cons (x.[to n],y)
endmatch
;
return aux$ x, List::Empty[string];
}
So, ok, this is tail recursive. But it works by chopping off the head substring
and pushing it onto a list, then calls itself on the tail substring.
Which means .. copying the tail substring. It would be much faster to use
RE2::StringPiece here. That's a pointer and length count into an *existing*
string. No copying. At the end you'd have to copy the list of string pieces
into a list of actual strings.
Alternatively, one could just use the unsafe_cstr function and scan
the underlying array, making the new strings as we go. Or just pass in
the starting location for the rest of the scan (assuming there's a find_from
function that starts looking at a particular location).
So thanks for bringing my attention to this.
Unfortunately this is unlikely to speed up Ryan's original test case much.
Felix readln calls:
gen readln: ifile -> string ="::flx::rtl::ioutil::readln($1)";
string readln (FILE *f) {
bool doecho = flx_isstdin(f) && !flx_isatty (f);
if (doecho)
return echo_readln(f);
else
return raw_readln (f);
}
// includes newline if present
// null string indicates end of file
string raw_readln (FILE *fi)
{
if(fi)
{
string x = "";
char buffer[MYBUFSIZ+1];
buffer[MYBUFSIZ]='\0';
next:
bool eof = fgets(buffer, MYBUFSIZ, fi) == 0;
if(eof) return x;
x = x + string(buffer);
if(x[x.size()-1]=='\n') return x;
goto next;
}
else return "";
}
--
john skaller
sk...@us...
http://felix-lang.org
|
|
From: john s. <sk...@us...> - 2014-09-15 02:28:00
|
Python:
#!/usr/bin/env python
import sys, time
def read_file(index,path):
count = 0
with open(path, 'r') as f:
for line in f:
if line not in index:
index[line] = 0
index[line] += 1
count = count + 1
return count
def main():
files_file = sys.argv[1]
index = {}
master_count = 0
st = time.time()
with open (files_file) as f:
for file in f:
path = file[:-1]
print (path)
count = read_file(index,path)
master_count = master_count + count
x = time.time()
elapsed = x - st
rate = int ( float(master_count) / elapsed)
print('Time elapsed: %fs, trx %d rate= %d' % ((x-st), master_count, rate))
if __name__ == '__main__':
main()
Felix:
var index = strdict[int]();
var counter = 0;
var st = time();
var file_file = load (System::argv 1);
for file in split(file_file,"\n") do
println$ file;
for line in split (file.load, "\n") do
index.add line (index.get_dflt (line,0)+1);
++counter;
done
done
var et = time();
var elapsed = et - st;
var trxrate = counter.double / elapsed;
println $ f"Time elapsed: %fs, %d trx, or %.0f trx/sec" (elapsed, counter, trxrate);
Result: on a small list of files:
Python: Time elapsed: 0.218490s, trx 56426 rate= 258254
Felix: Time elapsed: 1.438417s, 56723 trx, or 39434 trx/sec
Python is about 6 times faster than Felix.
On a larger set of files, Python did well, Felix crashed with
an unknown exception.
Data produced by:
build/release/host/bin/flx_ls . 'src/.*' > files.txt
The unknown exception annoys me greatly.
Especially as:
src/codemirror/theme/3024-night.css
src/codemirror/theme/ambiance-mobile.css
src/codemirror/theme/ambiance.css
Unknown exception in thread!
Program exited normally.
(gdb)
Unknown exceptions are impossible.
Its either a Felix exception (all of which are known) or a C++ standard
library exception (all of which have diagnostic info Felix can decode).
[There was a bug, now fixed, that Felix code wasn't properly wrapped
by exception handler .. clearly it is now. And yes, this code is statically
linked so it isn't the clang dynamic_cast bug. The program also returns
0 which is also wrong]
--
john skaller
sk...@us...
http://felix-lang.org
|
|
From: john s. <sk...@us...> - 2014-09-13 10:21:59
|
I'm playing with C++11 move constructors but they don't seem
to be working with clang 3.3svn.
With this:
struct X {
char *data;
X(char const *p) { int n = strlen (p); data = (char*)malloc(n+1); strcpy (data,p); }
X(X &&a) : data(a.data) { printf("Move\n"); }
X(X const &a) { int n = strlen (a.data); data = (char*)malloc(n+1); strcpy (data, a.data); printf("Copy\n"); }
X operator + (X a) {
int n = strlen (data);
int m = strlen (a.data);
char *out = (char*)malloc (n + m + 1);
strcpy (out,data);
strcpy (out+n, a.data);
return X(out);
}
};
int main() {
X x = X("Hello World");
printf ("Result = %s\n",x.data);
X b = x + X(" what ")+ x + x + x + x;
printf ("Result = %s\n",b.data);
return 0;
}
I get a whole heap of copies.
If I use const & in operator + I get none.
It seems wrong. Pass by value should be invoking
the move constructor when the argument is an rvalue.
--
john skaller
sk...@us...
http://felix-lang.org
|
|
From: john s. <sk...@us...> - 2014-09-13 07:54:57
|
On 13/09/2014, at 3:53 PM, srean wrote: > Having visual cues to warn the programmer the he/she is using modulo arithmetic would be good. I am more concerned with + meaning: add numbers add string to string or character add list to list or element because making a list of strings you better get the parens in exactly the right place of you get elements in the list you didn't expect in one place and concatenate elements into string where you didn't want in others. I've screwed that up many times. Its just that we've run out of short Ascii operators, and the TeXy ones should be reserved for less frequently used things. -- john skaller sk...@us... http://felix-lang.org |
|
From: john s. <sk...@us...> - 2014-09-13 07:49:56
|
On 13/09/2014, at 4:06 PM, srean wrote:
> Always meant to ask: Felix library used to have a lot of modules, now everything seems to be a class,
Well yes, thats
1,$ s/module/class/g
in Vim.
> is that to minimize providing the instance type for every function in the module ?
No it was to get rid of two almost identical constructions.
You could always write
module X[T] { ...
but it was just sugar for adding the [T] to all the contained definitions.
So modules were always non-polymorphic.
--
john skaller
sk...@us...
http://felix-lang.org
|
|
From: john s. <sk...@us...> - 2014-09-13 07:36:50
|
On 13/09/2014, at 3:43 PM, srean wrote: > > In any case i am more excited about the content. The language ref really helped clear up quite a few things. Hopefully the indexing of the library files will make the library more accessible to practitioners .. considering most of Felix is defined in the library. > On a different topic, your rewrite of Ryan's code would have some pedagogic value too. It is always instructive to know how the language designer would go about writing the same code balancing performance and elegance. His code was pretty much the same as I would have written. What I changed was where the timings were done, to see what made it slow. If I wanted to make this fast I'd probably mmap the file instead of reading lines, then scan the whole file with a pointer, decoding numbers as found, and replacing the "newline" with a null byte, and calling Judy directly. This eliminates ALL string copying. In other words I'd do it in Felix "as if it were C". -- john skaller sk...@us... http://felix-lang.org |
|
From: john s. <sk...@us...> - 2014-09-13 05:41:58
|
On 13/09/2014, at 12:50 PM, srean wrote:
> A slightly peripheral operator related gripe (about C and C++) :
>
> unsigned_int - unsigned_in -> unsigned_int
>
> is just wrong.
Yes it is, but if you want a proper solution to this you need a very powerful
dependent type system. I'm not sure even ATS can handle it.
> There should not be a polymorphic subtract that takes two unsigned ints and returns and unsigned int. The value returned should be an int.
Those statements are contradictory :-)
It depends what you think an unsigned int is. The C model is
Z (i) + Z (j) => Z_{n^2) (i + j)
which is perfectly consistent, practical. I.e. do the calculation
then modulo back to the starting base.
If you want a signed difference, well, int isn't big enough anyhow.
> If one really wants wrap around semantics they should cast the result back to unsigned explicitly, or the language should offer some other operator, just not "-" for modulo subtraction.
Why not? It uses + for modular addition.
Really, however, the operation is as above. Division, for example, isn't
defined on base 2^n for any value of n other than 1. So there is no
modular division (except on bool :)
>
> Same holds for unsigned and signed. Mixed sign arithmetic is just fraught with peril in particular subtraction. Of course this is just band aid because even signed subtraction is not closed either but I have seen enough bugs resulting from unsigned_int - unsigned_in -> unsigned_int, not enough from their signed cousin.
Felix does not do any implicit conversions (well not that kind anyhow).
It also doesn't define bitwise operators &, ^, | or ~ on signed integers.
Haskell's solution is correct: infinite precision integers. But then, it is
also very slow .. :)
Ultimately computer programs are local approximations: they work
within certain limits. For (most) integer operations you can say
the result is correct if the arguments are small enough.
Unfortunately keeping track of what is small enough is very hard
and not practical by any means other than a run time check.
--
john skaller
sk...@us...
http://felix-lang.org
|
|
From: john s. <sk...@us...> - 2014-09-13 05:27:32
|
On 13/09/2014, at 12:53 PM, srean wrote: > Oh awesome! Not a fan of the color scheme, bit too garish for my eyes, but still is a great feature to have. Feel free to suggest a better one, please use #XXXXXX specification. Of just change it, test, and commit. Remember I use bright colours when designing so I can actually see boundaries, so the current colours were just meant to be visible, not final. -- john skaller sk...@us... http://felix-lang.org |
|
From: john s. <sk...@us...> - 2014-09-13 01:34:05
|
There is a problem with operators like +, - and * which are both
infix and prefix. When you write
a - b
the parser knows you mean infix. But what about:
- (a,b)
Is this the prefix operator applied to pair, or is it the infix
operator?
Normally you wouldn't define an single argument operator on a tuple,
so if you write:
fun - (a:int) => .. // negative of a
fun - (a:int, b:int) => .. // subtraction
it's all clear. We actually have two functions here named "-"
which overload on a single argument, of types:
int
and
int * int
So everything works .. until you get to type classes.
Then it all falls apart. Consider:
class X[T] {
fun - (a:T) => ..
fun - (a:T,b:T) => ...
}
open[K] X[K];
var x = - (1,1); // WOOPS!
The open statement makes
fun -[T] : T -> ... // T = int * int
fun -[T] : T * T .. // T = int
visible and so the call is ambiguous. Even if you write:
- of (int * int) (1,1)
it doesn't help because both specialisations lead to that signature.
This problem cannot happen in a type class itself, because T inside
the class is not a type variable: it is existentially quantified not
universally quantified internally, which means it is just an unknown
abstract type, not a variable.
That's the difference between classes and modules.
In particular:
class X {
fun -[T] (a:T) => ..
fun -[T] (a:T,b:T) => ..
is NOT the same as the original class X I showed. This class is
a module, it isn't polymorphic, the individual functions are
universally quantified, that is, they're polymorphic.
In the former case the whole class is universally quantified
and polymorphic but the individual functions are monomorphic.
For them T is just a single unknown type, the same T everywhere.
However when you do:
open[K] X[K];
you're making all classes X with parameter K available, which means
all specialisations of the functions. Which ensures all calls to
operator - with a pair argument are ambiguous.
You may never have seen such an open statement .. but this
one is syntactic sugar for it, and is equivalent:
open X;
The moral of the story is: do NOT use the same operator name to define
a prefix and infix function.
Now, this does NOT mean we cannot write
- 1 and 1 - 1
and have it work. The parser knows which function it is.
But it generates
-1 and - (1,1)
at the moment.
In the old days, operator symbols couldn't be function names.
Instead we had
class X [T] {
fun neg (a:T) => ...
fun add (a:T,b:T) => ..
that is, the names were distinguished. The problem there was you had
to KNOW the mapping made by the parser. You had to know
prefix * mapped to a function named "deref".
So I allowed operator names as identifiers and changed the parser
and thus dug myself into a hole.
I could got back to "neg". Another solution is:
prefix - (a:int) => ....
infix - (a:int, b:int) =>
Another tricker solution:
fun (a:int) - (b:int) => ...
although I can't see how this fits with the grammar for lambdas or
C++ bindings.
It's almost possible to do this:
fun (a:int) - (b:int) / (c:int) => ...
i.e. provide a pattern. This is, after all, exactly what reduce does.
The only trick is that reductions aren't guaranteed to be provided.
Still .. it looks really cool!
--
john skaller
sk...@us...
http://felix-lang.org
|
|
From: john s. <sk...@us...> - 2014-09-12 06:24:15
|
I have upgraded the webserver to show a left margin menu for *.flx files now. Tell me this isn't nice: http://felix-lang.org/share/lib/std/algebraic.flx [Remember to reload if you looked at the page recently] -- john skaller sk...@us... http://felix-lang.org |
|
From: john s. <sk...@us...> - 2014-09-12 05:30:08
|
So here's another goal: the webserver plugins, and the webserver itself, take quite a bit of time to build. There are also some tools not essential to the build. So these should be packages. -- john skaller sk...@us... http://felix-lang.org |
|
From: john s. <sk...@us...> - 2014-09-11 11:41:16
|
On 11/09/2014, at 8:15 PM, srean wrote: > Hi John > > you are spending so much time and effort behind the layout. I wouldnt worry too much about it, if the content is there even the existing website looked great, in fact i like the old color scheme and all. Appearances matter. At least my brain is strong on visual pattern matching. I once almost got fired because as a contractor working on some C++. I spent the first day editing every single file to remove gratuitous blank lines. The code was so spaced out I couldn't get enough on the screen to see the patterns. I don't want to write documents on every dang library file, there are too many. If I can index all the functions in a file with an analysis tool, it may help get a grasp on the code. -- john skaller sk...@us... http://felix-lang.org |
|
From: john s. <sk...@us...> - 2014-09-11 09:39:20
|
Following the introduction of left margin menu for fdocs. I'm moving to add left margins to plain flx files too. What will happen is the menu will contain a list of things defined in the file so you can jump to them easily. Not all definitions will be listed, and not all layouts of a definition will be correctly detected: we're NOT parsing the code with the Felix parser, we're using crude regexps. Nevertheless it should be effective for library code which can be written to ensure it is recognised, and should work reasonably well for user code written in reasonable style too. [The "parser" is already written, it's used to generate the contents and index pages for the library] IN addition, since there's a "language" spec being written, the keyword mouseovers which give a hint what "proc" or "println" mean, might be made clickable and jump into the language definition (provided I can write it based on keywords somehow). -- john skaller sk...@us... http://felix-lang.org |
|
From: john s. <sk...@us...> - 2014-09-09 00:26:43
|
First, relatively good news: flx_web is sitting on 3% memory. [I have a cron job that regularly deletes the log file so we don't get disk full] Next .. since everyone complained .. the webserver now has a new format for fdoc files. This will be up shortly as a demo for your criticism ... http://felix-lang.org/share/src/web/ref/spec_01.fdoc http://felix-lang.org/share/src/lib/std/strings/string.fdoc http://felix-lang.org/share/src/lib/std/datatype/list.fdoc Basically there's a left margin menu similar to ReadTheDocs (RTD) [I'm pretty good at stealing other people's ideas, integrating them, and making them .. er .. well .. 'better' .. :] Please excuse the colour scheme (I like high contrast bright colours when testing so I can see the HTML boxes easily .. makes things easier to line up). Also don't forget Felix serves docs with a long expiry period so you can browse from you local cache, so if you've been browsing recently you may have to force a reload. ==================================================== Advantages of RTD: * produces PDF * HTML works on any webserver * currently looks a bit sexier * maintained by someone else :) * active project with many users * docs live on a separate webserver * fast automatic rebuild of docs (based on commits to Github) Advantages of fdoc * no third party software required * layout actually works (RTD is broken) * works with all existing fdoc files * is under our control * HTML generated on the fly * fdoc is a better format the RST * integrates with Felix code better hyperlinks, colouring, tooltips: in fact remember Felix can *execute* fdocs, and some library code (eg string) is in fdoc format already * can be modified to do what we want BUGS: browser BACK goes to previous URL including #tag, not previous document. Disadvantage of both systems * when clicking on a level 1 heading, the tree redisplay causes the mouse to now be over some unrelated section. This is bad UI design really. ========================================================= Improvement, issue, ideas etc. Contribute! Both RTD and my new code have a big disadvantage: they can't work properly on a phone. The left margin menu is fixed width. RTD makes the menu go away if the window is too small. Making a user controlled menu width is simple enough in principle, however it means all the style stuff has to be calculated in Javascript. Then the boundary can be made draggable, we can set a variable, and have the widths controlled by that variable. Many fdocs only have one level 1 heading. This really defeats the utility of the menu. Partly this is because there was no @title feature in the old days so the h1 heading served as the title. The fdocs could be edited but another idea is to make the menu code a bit smarter, e.g. if there is only one h1 heading, provide the tree based on h2 headings instead. The menu could also be expanded to 3 levels. Another option is to not do the collapsing stuff if the whole menu fits (i.e. view it as a space optimisation). The behaviour would then be a bit inconsistent. ======================================================= It would be useful to produce a translator for RST for the webserver though. It makes sense for tool --help output to be RST. Restructured text was designed to be readable as plain text, as well as being typeset. It is a lot better at that than, say, fdoc. With an RST translator, the tool documentation "quick guide" can be produced by simply running the tools with --help option. Any volunteers? Its an interesting project to do in Felix. I can provide the infrastructure for the webserver but a standalone program to do the translation would be interesting. Although of course you could just use sphinx. :) ====================================================== A more useful project might be an fdoc to RST translator. That's a much simpler program than RST to HTML translator. A pretty good tutorial task for a beginner Felix programmer. -- john skaller sk...@us... http://felix-lang.org |
|
From: john s. <sk...@us...> - 2014-09-08 10:25:15
|
Just an alert: the dependency checking in the rebuild operation is not working correctly. A modification to a webserver plugin did not propagate into the executable. Felix currently doesn't not check linkage dependencies, it just relinks every time. However a source code change, in this case including a change to a plugin interface, should have rebuilt the object files and shared libs of the plugins, and subsequently the static linked web server should have been relinked. This didn't seem to happen. Plugins are not type safe (because they use extern "C" linkage). If the plugin supplies an object with a type different from that expected by a client, all hell breaks lose, in my case a segfault. This should NOT happen in the build process. Correctness with plugins is ensured the same way with C: a shared header is type checked in both supplier and client against the code in each. The actual interface code is not checked, you have to get that right, but this is relatively simple when using an object to encapsulate a set of methods. An actual change to the interface requires rebuilding the supplier plugin and all clients. That should be automatic in the build process because the interface has to be included with an include directive, and "flx" checks for changes in all include files. So I have no idea what has failed. However there is an issue to note which might be related: there was a previous bug associate with this probem: The x86_64 is a very badly designed processor. Unfortunately it has very poor support for relocatable code. Consequently when using shared libraries, special tables have to be set up by the linker and very unfortunately accessing stuff through these tables is not transparent. In particular special sequences of machine instructions are required. This is not required for static linkage, so statically linked code can be faster than that required for dynamic linkage. In other words, there are two kinds of object files you can generate: ordinary ones and ones with -fPIC. Felix can generate both. The fast ones are *o files. The position independent ones use *.os [On both OSX and LInux] This problem doesn't exist on WIndows. It also doesn't exist on new OSX versions, because Apple has banned non PIC code. Unfortunately "flx" cannot generate BOTH object layouts at once. We first generate C++ and compile to say *.os layout then a shared lib. If we then try again to make a static link object, flxg is not run because it is the same C++ code. The C++ compiler should be rerun though, since the output file is different. My gut feeling is the dependency files: *.o.d *.os.d are not working right: perhaps the location of the files is wrong or something. Not sure. I can confirm my new code actually works after a complete rebuild. -- john skaller sk...@us... http://felix-lang.org |