From: Alexander <ale...@ne...> - 2003-01-20 09:45:10

> stat(2):
>
>     Not all of the Linux filesystems implement all of the time
>     fields. Some file system types allow mounting in such a
>     way that file accesses do not cause an update of the
>     st_atime field. (See `noatime' in mount(8).)
>
> mount(8):
>     noatime
>         Do not update inode access times on this
>         file system (e.g., for faster access on the
>         news spool to speed up news servers).
>
> *shrugs*
>
> The root on kahn is mounted with noatime.

noatime I can understand: it means the filesystem does not have to be updated each time a file is _accessed_. But not updating the filesystem when the file is _changed_ is another matter. Or rather, the filesystem is updated, but fstat() on the fd that was opened before the change returns an old mtime.

//Eel

From: Thomas N. <th...@xm...> - 2003-01-20 09:03:29

On Mon, 20 Jan 2003, Alexander Haväng wrote:
>Hi,
>
>The server opens (fopen) users.txt before it does setuid() and fork()
>stuff. Then for each user request, it fstat()s the fd and checks if the
>file has changed, and if so, rewinds and reads the file.
>This works on all systems I've tested it on, except for kahn, which runs
>Gentoo Linux.
>
>Anyone got a clue why this is happening?
>
>//Eel

stat(2):

    Not all of the Linux filesystems implement all of the time
    fields. Some file system types allow mounting in such a
    way that file accesses do not cause an update of the
    st_atime field. (See `noatime' in mount(8).)

mount(8):
    noatime
        Do not update inode access times on this
        file system (e.g., for faster access on the
        news spool to speed up news servers).

*shrugs*

The root on kahn is mounted with noatime.

Best regards,                ############################
Thomas Nilsson               #   X MultiMedia System    #
                             # webmaster/design/support #
                             #       www.xmms.org       #
                             ############################

From: Alexander <ale...@ne...> - 2003-01-20 08:50:37

Hi,

The server opens (fopen) users.txt before it does setuid() and fork() stuff. Then for each user request, it fstat()s the fd and checks if the file has changed, and if so, rewinds and reads the file. This works on all systems I've tested it on, except for kahn, which runs Gentoo Linux.

Anyone got a clue why this is happening?

//Eel

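The check described above (fstat() the already-open descriptor, compare st_mtime, rewind and re-read on change) can be sketched like this. The helper names are hypothetical and counting lines stands in for whatever parsing the real server does with users.txt.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() under strict C modes */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: returns 1 if the file behind fp reports a different
 * st_mtime than *last_mtime (and records the new value), 0 if unchanged,
 * -1 if fstat() fails. */
static int users_file_changed(FILE *fp, time_t *last_mtime)
{
    struct stat st;

    if (fstat(fileno(fp), &st) != 0)
        return -1;
    if (st.st_mtime != *last_mtime) {
        *last_mtime = st.st_mtime;
        return 1;
    }
    return 0;
}

/* Hypothetical helper: rewinds and re-reads the whole file. Counting
 * fgets() calls (lines up to 255 bytes) stands in for real parsing. */
static int reload_users(FILE *fp)
{
    char line[256];
    int count = 0;

    rewind(fp);               /* also clears the EOF indicator from the last read */
    while (fgets(line, sizeof line, fp) != NULL)
        count++;
    return count;
}
```

One case worth ruling out, although it was not raised in the thread: if users.txt is replaced rather than rewritten in place (for example by an editor that writes a new file and renames it over the old one), the descriptor opened before setuid() still refers to the old inode, so fstat() keeps reporting the old mtime. Comparing st_ino and st_dev from a stat() on the path with the fstat() result would detect that.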
From: Tobias <to...@to...> - 2003-01-13 17:58:27

The client, the library and the server all compile on Mac OS X with the libraries from Fink (fink.sourceforge.net). Go tell a friend.

//Tobias

From: Alexander <ale...@ne...> - 2003-01-07 08:27:11

> Have you made any promises that gestapo-0.1 should be compatible with the final
> gstp 1.0 spec?

Nah.

> Who is Thomas?

Thomas is on the XMMS devel team. He did a great roadmap for XMMS at http://staff.xmms.org/priv/xmms_roadmap.png :)

//Alexander

From: Daniel R. <no...@me...> - 2003-01-06 17:24:18

After having read an article by one of the people who originally created SourceForge (http://www.osdir.org/modules.php?op=modload&name=News&file=article&sid=102&mode=thread&order=0&thold=0) about VA Software management, I must say that I don't feel confident that the service will be around in the long term. Therefore I think it would be a good idea to register a domain name that the gstp project can use in the long term. That domain can point to the sf.net servers for the time being, and when the day comes that we want to redirect it elsewhere, that should be easy.

So, do we have any suggestions for domain names?

Domains that cost money:
gstp.nu - seems available, if you're into .nu names
gstp.se - is nice, but a little difficult to register (but I've heard that will change in a few months)
gstp-protocol.org - kind of ugly imho
gstp-project.org - also kind of ugly (follows the pattern of other software projects)

Domains for free:
gstp.netintact.se - perhaps good PR for NetIntact? Will you stay around forever?
gstp.resare.com - will definitely stay around for as long as I live and stay sane :)

Any more suggestions?

-- 
begin:vcard
fn:Daniel Resare
tel;cell:+46739442044
tel;work:+468332040
adr;work:Scheelegatan 36; 112 28; Stockholm; Sweden
end:vcard
pgp fingerprint: 8D97 F297 CA0D 8751 D8EB 12B6 6EA6 727F 9B8D EC2A

From: Daniel R. <no...@me...> - 2003-01-06 17:07:39

On Mon, Jan 06, 2003 at 12:06:57AM +0100, Alexander Haväng wrote:
> Gestapo 0.1 was released on freshmeat (as GSTP).
>
> This is confusing.
>
> GSTP is the protocol, Gestapo is the reference implementation.
> We need to straighten this out :)

IMHO gestapo is a great name. Congratulations on the release! (Of course I think it's important to separate the reference implementation from the specification.) Have you made any promises that gestapo-0.1 should be compatible with the final gstp 1.0 spec?

> Before the next release, some administrivia needs to be taken care of.
>
> The webpage gstp.sourceforge.net needs to be created.

I vote for something minimalistic that strictly follows XHTML 1.1.

> The protocol specification needs to be cleaned up.

I'd like to volunteer for that, but I will need all the help I can get.

> And we need some kind of TODO-list and/or roadmap. I'll ask Thomas for the
> latter ;)

Who is Thomas?

/noa

From: Tobias <to...@to...> - 2003-01-06 14:31:21

On Mon, Jan 06, 2003 at 12:06:57AM +0100, Alexander Haväng wrote:
> Gestapo 0.1 was released on freshmeat (as GSTP).
>
> This is confusing.
>
> GSTP is the protocol, Gestapo is the reference implementation.
> We need to straighten this out :)

Then we should rename the client and server to gestapo-client and gestapo-server, and the lib to libgestapo, maybe?

> Before the next release, some administrivia needs to be taken care of.
>
> The webpage gstp.sourceforge.net needs to be created.

I can do this. I am thinking no fancy graphics page, more text :)

> The protocol specification needs to be cleaned up.

Noa?

> And we need some kind of TODO-list and/or roadmap. I'll ask Thomas for the
> latter ;)

;-)

From: <ale...@ne...> - 2003-01-05 23:07:12

Gestapo 0.1 was released on freshmeat (as GSTP).

This is confusing.

GSTP is the protocol, Gestapo is the reference implementation. We need to straighten this out :)

Before the next release, some administrivia needs to be taken care of.

The webpage gstp.sourceforge.net needs to be created.
The protocol specification needs to be cleaned up.
And we need some kind of TODO-list and/or roadmap. I'll ask Thomas for the latter ;)

//Alexander

From: Daniel R. <no...@me...> - 2002-11-14 12:33:34

On Sat, 2002-11-09 at 18:41, Tobias Rundström wrote:
> On Fri, Nov 08, 2002 at 04:09:21 +0100, Daniel Resare wrote:
> > 1) Do nothing. This is the FTP way, and non-ASCII paths will not work reliably.
>
> Nah.
>
> > 2) Do all charset conversion on the client, and have the server announce
> > the charset it uses, perhaps in the response to the hello command. If we
> > opt for this method, it would be a good idea to say that the encoding of
> > the server must be ASCII-compatible, so that communicating with the
> > server in pure ASCII will always work.
>
> This is much harder than UTF-8 IMHO.
>
> > 3) Mandate UTF-8. No charset info would need to be exchanged on session
> > setup.
>
> Yes, this is the best way from my point of view. It might take some work in
> the server to get this to work smoothly, but then we don't need to answer 1000
> support questions about incorrect charsets and such.
>
> GLib contains good functions for UTF-8 conversion. We should take a closer
> look at them and see how much we can get for "free".

What you're looking for is g_filename_to_utf8():
http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.html#g-filename-to-utf8

/noa

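g_filename_to_utf8() converts a filename from the filesystem's encoding to UTF-8. For readers without GLib at hand, the same kind of conversion can be sketched with POSIX iconv(3). This is not GLib's code: the function name is hypothetical, and the caller has to name the source charset (for example "ISO-8859-1"), whereas GLib works out the filesystem encoding itself.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts a NUL-terminated filename from
 * from_charset to UTF-8. Returns a malloc()ed string, or NULL on error
 * with errno set (EILSEQ means the input was invalid in from_charset). */
static char *filename_to_utf8(const char *name, const char *from_charset)
{
    iconv_t cd = iconv_open("UTF-8", from_charset);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(name);
    size_t out_size = in_left * 4 + 1;   /* at most 4 UTF-8 bytes per input byte */
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_p = (char *)name;           /* iconv() takes char **, not const */
    char *out_p = out;
    size_t out_left = out_size - 1;      /* keep one byte for the terminator */

    if (iconv(cd, &in_p, &in_left, &out_p, &out_left) == (size_t)-1) {
        int saved = errno;
        free(out);
        iconv_close(cd);
        errno = saved;
        return NULL;
    }
    *out_p = '\0';                       /* UTF-8 output is stateless: no reset needed */
    iconv_close(cd);
    return out;
}
```

On a server following option 3, a call such as filename_to_utf8(name, "ISO-8859-1") would sit between readdir() and the reply sent to the client, with the reverse conversion applied to incoming paths.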
From: <ale...@ne...> - 2002-11-12 06:05:41

>> Will stay 16 bit; an 8 bit field would misalign the header.
>
> Oki. What are the alignment issues? (4 byte align? On what platforms
> does that matter?) If there are more "unnecessary" bits in the header,
> perhaps we should save them in a "padding/reserved for future expansion"
> field?

I had a quick look at the protocol, and we're not really well aligned anywhere, so I think we should just forget about alignment. Alignment is silly anyway; a protocol should not have to care about CPU boundaries. It's easy to work around with memcpy() anyway.

I've modified the create command, changed all fds to 16 bit, and modified the command and reply opcodes to 8 bit, and will commit this to CVS any time now :)

//Alexander

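The memcpy() workaround mentioned above can be sketched as follows. The header layout (a 32-bit message size, an 8-bit opcode and a 16-bit fd, all in network byte order) is hypothetical and only loosely based on the field sizes discussed in this thread; it is not the actual GSTP wire format.

```c
#include <arpa/inet.h>   /* ntohl, ntohs */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header on the wire: size (4 bytes) | opcode (1) | fd (2). */
struct header {
    uint32_t size;
    uint8_t  opcode;
    uint16_t fd;
};

#define HEADER_WIRE_LEN 7

/* Parses a header from a possibly unaligned buffer. Copying into aligned
 * locals with memcpy() avoids casting buf + 5 to uint16_t *, which could
 * fault on strict-alignment CPUs. Returns 0 on success, -1 if too short. */
static int parse_header(const unsigned char *buf, size_t len, struct header *h)
{
    uint32_t size;
    uint16_t fd;

    if (len < HEADER_WIRE_LEN)
        return -1;
    memcpy(&size, buf, sizeof size);
    h->opcode = buf[4];
    memcpy(&fd, buf + 5, sizeof fd);     /* offset 5 is not 2-byte aligned */
    h->size = ntohl(size);
    h->fd = ntohs(fd);
    return 0;
}
```

The cost is one small copy per field, which is why ignoring alignment in the protocol itself is a reasonable trade-off.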
From: Daniel R. <no...@me...> - 2002-11-11 23:12:36

On Mon, 2002-11-11 at 18:43, Alexander Haväng wrote:
> Will stay 16 bit; an 8 bit field would misalign the header.

Oki. What are the alignment issues? (4 byte align? On what platforms does that matter?) If there are more "unnecessary" bits in the header, perhaps we should save them in a "padding/reserved for future expansion" field?

/noa

From: <ale...@ne...> - 2002-11-11 17:45:57

Will stay 16 bit; an 8 bit field would misalign the header.

From: Tobias <to...@to...> - 2002-11-09 17:41:44

On Fri, Nov 08, 2002 at 04:09:21 +0100, Daniel Resare wrote:
> 1) Do nothing. This is the FTP way, and non-ASCII paths will not work reliably.

Nah.

> 2) Do all charset conversion on the client, and have the server announce
> the charset it uses, perhaps in the response to the hello command. If we
> opt for this method, it would be a good idea to say that the encoding of
> the server must be ASCII-compatible, so that communicating with the
> server in pure ASCII will always work.

This is much harder than UTF-8 IMHO.

> 3) Mandate UTF-8. No charset info would need to be exchanged on session
> setup.

Yes, this is the best way from my point of view. It might take some work in the server to get this to work smoothly, but then we don't need to answer 1000 support questions about incorrect charsets and such.

GLib contains good functions for UTF-8 conversion. We should take a closer look at them and see how much we can get for "free".

//Tobias

From: Daniel R. <no...@me...> - 2002-11-08 15:08:39

Today I was on the train again, and wrote this about charsets:

Charsets from a GSTP point of view

This text tries to explain everything you need to know about charsets to decide how to implement multilingual file naming in GSTP. My conclusion is the following: mandate the use of UTF-8 in all paths sent over the wire in the GSTP protocol.

The long story:

In the beginning there was ASCII. It is a very simple idea: use a standardized table of characters, encode one character per byte and use 7 bits per byte. It could encode all characters used in normal English, without strange foreign accents, dots and rings. Since the 8th bit in every octet wasn't used, there was room for extension in the future.

The idea of encoding one character per byte is nice, but it limits the total number of encodable characters to 2**8 minus the code points reserved for control characters. So, when people wanted to add other national characters to ASCII using the 8th bit, a number of charsets were developed. Those were the ISO-8859-xx family, including for example ISO-8859-1, used in Western Europe, ISO-8859-5, which contains Cyrillic characters, and ISO-8859-6, which contains Arabic characters.

Having multiple charsets has its difficulties. There is no simple way of determining which charset is used just by looking at a file, so that information must be kept somewhere in order to decode the file properly and display its contents. There is also the problem of mixing different languages in the same text: there is simply no good way of having, for example, Latvian characters in a Swedish text encoded in ISO-8859-1.

Asian encodings

Asian writing systems, for example Chinese, Japanese and Korean, use thousands of glyphs, and people want to encode them. ASCII extensions simply wouldn't do. So someone invented a mechanism called ISO-2022. Evil it was, very evil. It had different modes, and control sequences to switch between those modes. That means, for example, that there is no way of determining whether 0x2f really is an ASCII '/' or part of some other character in a different mode without a full ISO-2022 parser. Do I need to say it's complicated? It's complicated.

Unicode and its encodings

In parallel with this development, an international standardization group called the Unicode Consortium sat down and defined a mapping between characters and code points (integer numbers) for all known scripts in the world.

With Unicode you can use several different encodings. The simple ones are UCS-2 and UCS-4 (UCS-2 was later superseded by UCS-4). They simply use two (or four) bytes to encode each character. UCS-2 has the drawback of not being able to encode all of Unicode (it has grown past 2**16 code points). These encodings have problems: they are, for example, not suitable for standard C string operations, since they are full of null bytes; they are quite space-inefficient for storing text with lots of ASCII; and they don't define a byte order, so different implementations use different byte orderings.

A more complicated encoding is UTF-8. It is ASCII-compatible in the sense that an ASCII string is always expressed as the exact same bytes in UTF-8. To encode all the code points above ASCII, it uses self-delimiting sequences of bytes with the high bit set. This means that the number of bytes is not necessarily the same as the number of characters in a UTF-8 string. But UTF-8 also has some nice properties: it stores ASCII and 'simple' languages (read: non-Far-East) efficiently. A multibyte sequence never contains a null byte, so C string functions keep working. 0x2f is always '/', so to 'walk a path' you don't need to parse UTF-8 at all. It also has very strict rules about what multibyte sequences must look like, which makes it very easy to determine the length of a string and to find malformed UTF-8 sequences. If you try to parse a non-UTF-8 string as UTF-8, you will fail quickly.
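
The UTF-8 properties described above (ASCII bytes unchanged, everything else encoded as multibyte sequences whose bytes all have the high bit set) show up directly in a minimal encoder for a single code point. This is a textbook sketch with a hypothetical function name, not code from any GSTP implementation; it rejects UTF-16 surrogates and values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode code point as UTF-8 into out[0..3].
 * Returns the number of bytes written, or 0 for surrogates
 * (U+D800..U+DFFF) and values above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                     /* ASCII: the very same single byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                    /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)    /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                  /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                /* 11110xxx + three continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Every byte produced for a code point above U+007F lies between 0x80 and 0xF4, so neither 0x00 nor 0x2F ('/') can occur inside a multibyte sequence. That is why a server can split a UTF-8 path on '/' bytes without decoding it.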
So where does this leave us when deciding on a charset for the GSTP specification? We have a couple of options:

1) Do nothing. This is the FTP way, and non-ASCII paths will not work reliably.

2) Do all charset conversion on the client, and have the server announce the charset it uses, perhaps in the response to the hello command. If we opt for this method, it would be a good idea to say that the encoding of the server must be ASCII-compatible, so that communicating with the server in pure ASCII will always work.

Pro:
- It is simple to implement on the server side, as the server can use the encoding used in the filesystem.
- If the encoding on both client and server is the same, no conversion will take place.

Con:
- It puts a greater burden on the client implementations, and if we don't define a list of allowed encodings in the spec, new server implementations using strange encodings can show up, and the clients may need to be updated to handle them. This is not a big problem on systems using a recent glibc, but on other systems considerable amounts of code may be needed just to handle charset conversion.
- If the spec mandates UTF-8, the server can verify that the data sent actually is UTF-8. This is impossible if the server uses, for example, ISO-8859-1 as its charset.
- Lazy server implementors might not think about users who would like to store files with names outside of, for example, Latin-1. If the spec mandates UTF-8, server implementations will at least need to think about the issues and make sure that users cannot store files on the server with filenames that cannot be correctly stored on the filesystem.

3) Mandate UTF-8. No charset info would need to be exchanged on session setup.

Pro:
- Simple client implementations. Almost every system can convert from the local charset to UTF-8.
- The guarantee that all conforming servers will be able to handle all filenames, or at least fail gracefully with an error message.
- The server can verify that clients send well-formed UTF-8.
- Server implementors are encouraged to think about non-ASCII users. The probability is higher that most servers will handle, for example, Far-East filenames without problems.

Con:
- More complicated server implementation. The server needs to convert incoming and outgoing pathnames between UTF-8 and the charset of the filesystem. On the other hand, this conversion is between two well-known and well-supported charsets, which is probably simpler than demanding that the client implement conversion between the local charset and several different possible server charsets.
- If the server and client use the same filesystem charset, and that charset is not UTF-8, an "unneeded" conversion must be performed.

Perhaps it is clear by now that I would like to recommend the third alternative. Alternative 2 is also OK, if the spec is clear that UTF-8 is the recommended charset and explicitly forbids ISO-2022-type encodings.

One can also note that UTF-8 is the only encoding allowed in, for example, Ogg Vorbis comments, and that Red Hat 8.0 now has UTF-8 as the default character encoding in all standard locales. If I remember correctly, Win32 systems use UTF-16 as the filename encoding, and therefore need to convert filenames when talking to the outside world anyway, as UTF-16 is not ASCII-compatible.

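Under option 3 the server can check that a client really sent well-formed UTF-8 before touching the filesystem. A minimal validator following the strict rules mentioned in the writeup (no stray continuation bytes, no overlong forms, no surrogates, nothing above U+10FFFF, no truncated sequences) might look like this; the function name is hypothetical and the code is not taken from any GSTP implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if buf[0..len) is well-formed UTF-8 (RFC 3629 rules). */
static bool utf8_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;                          /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of the 2nd byte */

        if (c < 0x80) {                       /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1;                         /* 0xC0/0xC1 would be overlong */
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2;
            if (c == 0xE0) lo = 0xA0;         /* reject overlong 3-byte forms */
            if (c == 0xED) hi = 0x9F;         /* reject UTF-16 surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3;
            if (c == 0xF0) lo = 0x90;         /* reject overlong 4-byte forms */
            if (c == 0xF4) hi = 0x8F;         /* reject values above U+10FFFF */
        } else {
            return false;                     /* cannot start a sequence */
        }

        if (len - i - 1 < need)
            return false;                     /* truncated sequence */
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= need; k++)
            if (buf[i + k] < 0x80 || buf[i + k] > 0xBF)
                return false;
        i += need + 1;
    }
    return true;
}
```

Latin-1 text with accented letters almost always fails this check on the first non-ASCII byte, which is the "fail quickly" property the writeup relies on.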
From: Daniel R. <no...@me...> - 2002-11-07 16:59:11

On Thu, 2002-11-07 at 17:06, Alexander Haväng wrote:
> As for cleartext: I will run a fair comparison test between NULL-cipher
> and raw TCP data, and if the difference in time for transferring 10000
> 5MB files is more than 1-2%, then I'll fold.

A shootout! That's cool.

/noa

From: Daniel R. <no...@me...> - 2002-11-07 16:50:30

On Thu, 2002-11-07 at 17:18, Alexander Haväng wrote:
> > If GSTP ever becomes a widely used protocol, it will have many more or
> > less messed-up client implementations, some of them very widespread. To
> > handle those clients, workarounds in server behaviour may be needed on
> > the server side, like what is done today in HTTP (for example, many
> > versions of Internet Explorer don't handle keep-alive correctly).
> > Unfortunately the HTTP protocol doesn't specify how client software
> > should identify itself, other than that it should send a string to the
> > server. This leads to advanced logic in the server for pattern matching
> > on the client version string to try to determine the exact version of
> > the client software.
>
> We haven't figured out how to use the CAPABILITIES command yet.. but I
> think that command should work out most of the problems I can think of.
> But.. you do have a point, and if we can figure out a good way of
> distributing nice client software IDs, then I think we should add your
> fields to the hello or capabilities command.

Capabilities have the reverse functionality: they help when a client needs to work around peculiarities in the server. When it is the other way around, you need good information in the HELLO command that identifies the client implementation. I think that a byte string that identifies the client plus a numeric version number (8 + 8 bits) should do it. A 16 bit vendor ID plus an official registry where you can register your implementation is perhaps also something to think about.

> > 2**32 char path lengths?
> > 32 bit file name length in the open command? 2**10 should be enough for
> > anyone; rounded up to whole bytes, that is 16 bits. This applies to all
> > fields that indicate the length of paths or globs of any kind.
>
> We have 2 choices.. 16 bit or 32 bit. 16 bit will limit paths to 65k.
> I personally think that is too small, and just the idea that it might be
> too small should be enough incentive to keep the 32 bits.
> So.. no, not granted :)

Are you serious? 65k paths are EXTREMELY long: like 650 nested directories with 100-character names each. I'd definitely say that if you need paths even remotely close to 65k long (say 1k), you are doing lots of stuff terribly wrong and deserve breakage. Just out of curiosity, do you have an example of an application where paths longer than 65k would be thinkable?

One thing to think about when defining the specification is that if the standard allows 2**32-character paths, implementors need to be able to handle them too, or at least fail in some predictable way. That is an additional burden.

> > 2**32 filedescriptors?
>
> 65k open files on one server is not enough. While this is not
> exactly the same thing as the protocol filedescriptor field, it's
> easiest to implement that way, and 2 bytes isn't that much overhead
> anyway. I'm fairly open on this though..

I thought the obvious way to implement a server was to fork() off one process per connection, so each client connection would have its own private set of open files. If you're going the threaded way, a mapping table between protocol filedescriptors and server filedescriptors is trivial, and wouldn't hurt performance much.

You say "2 bytes isn't that much overhead anyway", and that is correct. However, two bytes here and two bytes there add up quickly, and the protocol becomes more bloated than it needs to be (not much, but the perfectionist in me doesn't like it).

> > The response message for a read command has a field for "read message
> > length". This field seems redundant, as the information can easily be
> > calculated from the message size field (the first field in the message).
> > This also applies to the write command.
>
> It does.. ehm.. haven't got time to look at the source right now.. there
> might be a reason for this.. and then again, there might not :) I'll
> check.
>
> > How does a response indicate that there are more responses to come?
>
> A response that is not SUCCESS or FAILURE will _always_ have more
> responses to come. A command->response chain always ends with a SUCCESS
> or FAILURE response from the server, unless the client requests that it
> doesn't want a SUCCESS reply (this is because you don't want a SUCCESS
> response for each write command).

Oh, that's the way you do it. I'll document that then :)

> > How do you determine what a symlink points to when using the create
> > command?
>
> Ehm.. dunno.. I'll modify the create command.
>
> > When writing a client that operates over a slow link, it would be useful
> > to know the total size of the reply message(s) that correspond to, for
> > example, a "list directory" command. If this information is sent,
> > progress could be displayed to the user in a reliable way. The client
> > could estimate the total size of the replies by estimating the size of a
> > single file property and multiplying it by the file count, but that
> > seems like a hack to me.
>
> This is true.. I'll split some of the responses into different types,
> one for the "first" response that includes the total length (if known).

Good.

> > To enable file transfer between systems with different file name
> > encodings, I propose that the specification dictate that all filenames
> > be encoded in the UTF-8 charset. The other way to do this would be to
> > have a mechanism for the client to query the server for the filename
> > encoding charset and then do any encoding conversion on the client side.
>
> This.. is something I really hate.. charsets suck. Just tell me the
> right way to do it and I'll do it if the implementation doesn't suffer
> too much.

Yes, it's a mess, especially when you mix in the Far-East scripts. I'll do a writeup of the different ways to do it in a separate email.

I'll also create a TODO list for protocol changes and clarifications.

/noa

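Daniel's argument for a 16-bit length field, and his point that implementors should "at least fail in some predictable way", can be illustrated with an encoder for a length-prefixed path. The framing (a 2-byte big-endian length followed by the raw UTF-8 bytes) and the function name are assumptions for illustration, not the GSTP wire format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoder: writes a 16-bit big-endian length followed by the
 * path bytes into out. Returns the number of bytes written, or 0 if the
 * path does not fit in a 16-bit length field or in the caller's buffer,
 * so an over-long path is refused instead of being silently truncated. */
static size_t put_path(unsigned char *out, size_t out_cap,
                       const char *path, size_t path_len)
{
    if (path_len > UINT16_MAX || out_cap < 2 + path_len)
        return 0;
    out[0] = (unsigned char)(path_len >> 8);
    out[1] = (unsigned char)(path_len & 0xFF);
    memcpy(out + 2, path, path_len);
    return 2 + path_len;
}
```

With a 32-bit field the same check still has to exist somewhere (no server will accept a 4 GB path), which is the "additional burden" Daniel mentions; a 16-bit field simply makes the limit part of the format.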
From: Alexander <ale...@ne...> - 2002-11-07 15:59:03

> If GSTP ever becomes a widely used protocol, it will have many more or
> less messed-up client implementations, some of them very widespread. To
> handle those clients, workarounds in server behaviour may be needed on
> the server side, like what is done today in HTTP (for example, many
> versions of Internet Explorer don't handle keep-alive correctly).
> Unfortunately the HTTP protocol doesn't specify how client software
> should identify itself, other than that it should send a string to the
> server. This leads to advanced logic in the server for pattern matching
> on the client version string to try to determine the exact version of
> the client software.

We haven't figured out how to use the CAPABILITIES command yet.. but I think that command should work out most of the problems I can think of. But.. you do have a point, and if we can figure out a good way of distributing nice client software IDs, then I think we should add your fields to the hello or capabilities command.

> 2**32 char path lengths?
> 32 bit file name length in the open command? 2**10 should be enough for
> anyone; rounded up to whole bytes, that is 16 bits. This applies to all
> fields that indicate the length of paths or globs of any kind.

We have 2 choices.. 16 bit or 32 bit. 16 bit will limit paths to 65k. I personally think that is too small, and just the idea that it might be too small should be enough incentive to keep the 32 bits. So.. no, not granted :)

> 2**32 filedescriptors?

65k open files on one server is not enough. While this is not exactly the same thing as the protocol filedescriptor field, it's easiest to implement that way, and 2 bytes isn't that much overhead anyway. I'm fairly open on this though..

> The response message for a read command has a field for "read message
> length". This field seems redundant, as the information can easily be
> calculated from the message size field (the first field in the message).
> This also applies to the write command.

It does.. ehm.. haven't got time to look at the source right now.. there might be a reason for this.. and then again, there might not :) I'll check.

> How does a response indicate that there are more responses to come?

A response that is not SUCCESS or FAILURE will _always_ have more responses to come. A command->response chain always ends with a SUCCESS or FAILURE response from the server, unless the client requests that it doesn't want a SUCCESS reply (this is because you don't want a SUCCESS response for each write command).

> How do you determine what a symlink points to when using the create
> command?

Ehm.. dunno.. I'll modify the create command.

> When writing a client that operates over a slow link, it would be useful
> to know the total size of the reply message(s) that correspond to, for
> example, a "list directory" command. If this information is sent,
> progress could be displayed to the user in a reliable way. The client
> could estimate the total size of the replies by estimating the size of a
> single file property and multiplying it by the file count, but that
> seems like a hack to me.

This is true.. I'll split some of the responses into different types, one for the "first" response that includes the total length (if known).

> To enable file transfer between systems with different file name
> encodings, I propose that the specification dictate that all filenames
> be encoded in the UTF-8 charset. The other way to do this would be to
> have a mechanism for the client to query the server for the filename
> encoding charset and then do any encoding conversion on the client side.

This.. is something I really hate.. charsets suck. Just tell me the right way to do it and I'll do it if the implementation doesn't suffer too much.

//Eel

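The rule Alexander describes (every response other than SUCCESS or FAILURE is followed by more responses, and each chain ends with SUCCESS or FAILURE) gives a client a simple loop condition. The response type names below are hypothetical stand-ins, not the GSTP opcode table, an array stands in for the socket, and the sketch ignores the case where the client has asked the server to suppress the SUCCESS reply.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical response types; the real opcode values are not given here. */
enum resp_type { RESP_SUCCESS, RESP_FAILURE, RESP_DATA, RESP_FILEINFO };

static bool is_terminal(enum resp_type t)
{
    return t == RESP_SUCCESS || t == RESP_FAILURE;
}

/* Consumes the responses belonging to one command. Returns the index of
 * the terminating SUCCESS/FAILURE response, or -1 if the stream ended
 * first, meaning the chain is incomplete and a real client would still be
 * waiting for more data. */
static long end_of_chain(const enum resp_type *stream, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (is_terminal(stream[i]))
            return (long)i;
        /* non-terminal: handle the DATA/FILEINFO payload, keep reading */
    }
    return -1;
}
```

Because the chain always closes with SUCCESS or FAILURE, a file truncated on the server during a read should still end in a terminal response rather than leave the client waiting, which addresses the hang Daniel raised in his list of questions.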
From: Alexander <ale...@ne...> - 2002-11-07 15:47:17

> More opcodes or options to open, that doesn't matter to me. I think
> that from a documentation point of view, an option that changes
> semantics so drastically would be less well structured than a separate
> opcode, but that's just personal taste.

True. I'll fold :)

> oh. that's good :) I would suggest that an option in the auth command
> packet indicate that the client is interested in anonymous login,
> instead of using a special "magic" username that means "this is not
> really a username". Using no AUTH command would make error messages more
> difficult to understand. open(fil.txt) would fail with "anonymous access
> denied / permission denied" or something like that.

Granted.

As for cleartext: I will run a fair comparison test between NULL-cipher and raw TCP data, and if the difference in time for transferring 10000 5MB files is more than 1-2%, then I'll fold.

//Eel

From: Alexander <ale...@ne...> - 2002-11-07 15:43:31

> Suggestion: shrink the opcode field from 16 to 8 bits.

Granted.

//Eel

From: Daniel R. <no...@me...> - 2002-11-07 13:10:58

I had some time left on the train to Stockholm. Here are some random thoughts that I wrote down while reading the spec.

Machine-readable client version number

If GSTP ever becomes a widely used protocol, it will have many more or less messed-up client implementations, some of them very widespread. To handle those clients, workarounds in server behaviour may be needed on the server side, like what is done today in HTTP (for example, many versions of Internet Explorer don't handle keep-alive correctly). Unfortunately the HTTP protocol doesn't specify how client software should identify itself, other than that it should send a string to the server. This leads to advanced logic in the server for pattern matching on the client version string to try to determine the exact version of the client software. To do this in a better way while we have the chance, I suggest that two fields be added to the HELLO command: a major and a minor client software version number. 8 bits each should be enough for these values.

2**32 char path lengths?

32 bit file name length in the open command? 2**10 should be enough for anyone; rounded up to whole bytes, that is 16 bits. This applies to all fields that indicate the length of paths or globs of any kind.

2**32 filedescriptors?

Is it reasonable to allocate space for 2**32 open filedescriptors per session? If anyone uses more than 2**8, then we have a problem, so allocating 16 bits for fd values should be more than enough.

Redundant length information in read and write?

The response message for a read command has a field for "read message length". This field seems redundant, as the information can easily be calculated from the message size field (the first field in the message). This also applies to the write command.

How does a response indicate that there are more responses to come?

Is there a way for a response to indicate that more responses belonging to a specific command are coming? Or the other way around: is there a flag or something in the last response to a command that indicates that it is the last response? This is especially important, for example, when a file is truncated on the server while the client is reading it. If there is no way to determine that end of file has been reached, the client might "hang" waiting for a response message that never shows up.

How do you determine what a symlink points to when using the create command?

Total size of metadata return values?

When writing a client that operates over a slow link, it would be useful to know the total size of the reply message(s) that correspond to, for example, a "list directory" command. If this information is sent, progress could be displayed to the user in a reliable way. The client could estimate the total size of the replies by estimating the size of a single file property and multiplying it by the file count, but that seems like a hack to me.

Specified filename charset

To enable file transfer between systems with different file name encodings, I propose that the specification dictate that all filenames be encoded in the UTF-8 charset. The other way to do this would be to have a mechanism for the client to query the server for the filename encoding charset and then do any encoding conversion on the client side.

That's all for now.

/noa

From: Daniel R. <no...@me...> - 2002-11-04 15:24:49
|
On Mon 2002-11-04 at 10.38, Alexander Haväng wrote:
> > Of course sendfile() isn't portable, but the general idea of
> > zero-copy networking is. I find it hard to believe that any
> > implementation of zero-copy networking on any platform will include the
> > ability to send files and then return the control to userspace every 32k, but
> > you never know.
>
> Why not?
> ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

I think I missed "efficiently", but it doesn't matter. I'm no big expert on low-level performance; I just mentioned it as something to think about.

> I don't think we'll be seeing any TLS code in the kernels any time soon
> though :)

True.

> > Perhaps one could add an operation "get" that would be the one-roundtrip
> > equivalent of "open; read; close". After having worked a bit with
> > networking in GPRS environments (read: lousy latencies), I know that
> > having the ability to cut down on roundtrips is a good thing.
>
> I see your point.. and we'll keep this open. I don't want to introduce
> any new commands for doing the same thing, but I will consider adding a
> new option for OPEN that tells the server to start sending data right
> away, as well as closing it on EOF.

More opcodes or options to OPEN, it doesn't matter to me. From a documentation point of view, I think an option that changes the semantics so drastically would be less well structured than a separate opcode, but that's just personal taste.

> > I believe that mixing in personal tastes like this one in fundamental
> > protocol decisions will possibly prevent the protocol from being widely
> > used.
>
> One of the problems with, say, the FTP protocol is that it is too
> overdeveloped. Only god knows all the features in FTP, and there are
> millions of ways of doing things.
> Just look at all the different PORT and PASV commands used today.
> Some use PORT, others EPORT, or LPORT, or ZPORT..
> I guess what I'm saying is that there should be one way of doing things,
> and only one.

True. The problem with FTP is that it is badly engineered, and that is a different thing from saying "we should have the overhead of TLS everywhere because I dislike unencrypted network traffic in my network monitoring program".

> Nothing stands in the way of changing the underlying transport layer of
> GSTP from TLS to plain TCP data, other than me thinking it will fork()
> things where no forking is necessary.

I don't see what this has to do with fork(2). Perhaps you mean two different code paths when only one is necessary.

> After all, when downloading a file that takes 5 minutes to download,
> what does it matter that it takes another second or so at worst to
> negotiate a TLS connection?
>
> When downloading 500 files, it's still only done once, compared to 500
> TCP handshakes with FTP.

I was thinking more from the server's point of view. A scenario: Microsoft switches to GSTP for distributing security updates for their operating system in 2005, and ~100 million Internet-connected Windows computers automatically download a 28k file from their servers at approximately the same time. For one 28k file, I would guess that perhaps 10% of the CPU time for sending the file goes into TLS negotiation, and perhaps 50% into the encryption itself (if a weak cipher is chosen). If they could instead use an unencrypted transport for this service, they could send exactly the same bits over the wire to all the different clients and save a considerable share of the CPU time on the servers. This of course assumes a highly efficient GSTP implementation, but that is no reason to limit the theoretical maximum performance at the protocol level (by mandating TLS).
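Taking Daniel's rough guesses at face value (10% of the per-file CPU time for TLS negotiation, 50% for encryption), a plain transport would need only the remaining 40% of the CPU per download. Assuming the servers are CPU-bound, the same hardware could then serve about 1 / (1 - 0.6) = 2.5 times as many downloads. The helper below just makes that arithmetic explicit; the function name is illustrative and the percentages are the email's estimates, not measurements.

```c
/* Throughput gain from dropping TLS, given the fraction of per-file
   CPU time spent on the handshake and on bulk encryption.
   Returns how many times more files the same CPU could send,
   or -1.0 if the fractions are out of range. */
double plain_transport_speedup(double handshake_frac, double crypto_frac)
{
    double tls_frac = handshake_frac + crypto_frac;
    if (handshake_frac < 0.0 || crypto_frac < 0.0 || tls_frac >= 1.0)
        return -1.0;
    /* Plain transport costs (1 - tls_frac) of the TLS cost per file. */
    return 1.0 / (1.0 - tls_frac);
}
```

With `plain_transport_speedup(0.10, 0.50)` the result is 2.5 (up to floating-point rounding).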
> If the need should arise for an even more lightweight GSTP protocol for
> handheld devices or slow links, then we'll call it "diet GSTP" and stick
> a green keyhole on it.

Developing a different protocol "variant" just because we didn't bother to do it efficiently the first time seems like the FTP way of developing standards :P

> > Don't you see anonymous downloads (when no user authentication whatsoever
> > takes place) as a significant use case of GSTP? I find that
> > strange.
>
> Of course I do, in due time.
> Anonymous logins either don't issue an AUTH command, or use the standard
> FTP way with "anonymous/ftp" + "emailaddress".
> This should be in the protocol description, but it's not :)

Oh, that's good :) I would suggest an option in the AUTH command packet indicating that the client wants an anonymous login, instead of a special "magic" username that means "this is not really a username". Sending no AUTH command at all would make error messages more difficult to understand: open(fil.txt) would fail with "anonymous access denied / permission denied" or something like that.

> Also.. the Internet is a fast moving but extremely slow changing place..
> FTP will not die quietly :)
> GSTP is a slow moving project.. so we have time to make it right :)
>
> The server is getting more and more stable each day, I'll run some evil
> tests with 10000 users across gigabit ethernet or loopback and see if we
> can handle the load. I don't think it will be a problem.

Sounds interesting :)

/noa |
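Daniel's suggestion of an explicit anonymous option in AUTH, instead of a magic username, might look like the sketch below. The body layout (a flags byte, then 16-bit big-endian length-prefixed username and password) and the `GSTP_AUTH_ANONYMOUS` name are hypothetical, not from the GSTP spec; the point is that the server branches on a flag bit rather than comparing the username against "anonymous" or "ftp".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GSTP_AUTH_ANONYMOUS 0x01  /* hypothetical flag bit */

/* Encode a hypothetical AUTH body. Anonymous logins carry only the
   flag. Normal logins carry flags = 0, then a 16-bit big-endian length
   + username, then the same for the password.
   Returns bytes written, or 0 if buf is too small or a field is
   longer than 65535 bytes. */
size_t gstp_encode_auth(uint8_t *buf, size_t cap, int anonymous,
                        const char *user, const char *pass)
{
    if (cap < 1)
        return 0;
    if (anonymous) {
        buf[0] = GSTP_AUTH_ANONYMOUS;
        return 1;
    }
    buf[0] = 0;
    size_t pos = 1;
    const char *fields[2] = { user, pass };
    for (int i = 0; i < 2; i++) {
        size_t len = strlen(fields[i]);
        if (len > 0xFFFF || cap - pos < 2 + len)
            return 0;
        buf[pos++] = (uint8_t)(len >> 8);
        buf[pos++] = (uint8_t)(len & 0xFF);
        memcpy(buf + pos, fields[i], len);
        pos += len;
    }
    return pos;
}

/* Server side: the anonymous case is a flag test, not a string match. */
int gstp_auth_is_anonymous(const uint8_t *body, size_t len)
{
    return len >= 1 && (body[0] & GSTP_AUTH_ANONYMOUS) != 0;
}
```

Because the client always sends an AUTH, the server can report an explicit "anonymous access denied" at login time instead of a vague permission error on the first open.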
|
From: Daniel R. <no...@me...> - 2002-11-04 14:48:22
|
I can think of several situations where being bit-efficient would be an advantage for a protocol like GSTP. Here is one small improvement that might be worth considering:

I think that if we get more than 2**7 opcodes in the protocol, we've got something wrong. Therefore 8 bits for opcodes should be more than enough. In the worst case, if we ever must handle more than 2**8 opcodes, we could have a special extended opcode that carries the additional information in the command message body.

Suggestion: shrink the opcode field from 16 to 8 bits. |
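The suggested 8-bit opcode with an escape for the worst case can be sketched as below. Assumptions, not from the GSTP spec: opcode value 0xFF is reserved as the hypothetical `GSTP_OP_EXTENDED`, and the real opcode then follows as a 16-bit big-endian value at the start of the message body.

```c
#include <stddef.h>
#include <stdint.h>

#define GSTP_OP_EXTENDED 0xFF  /* hypothetical escape value */

/* Write an opcode: one byte for values below 0xFF, otherwise the escape
   byte followed by a 16-bit big-endian extended opcode.
   Returns bytes written, or 0 if cap is too small. */
size_t gstp_put_opcode(uint8_t *buf, size_t cap, uint16_t op)
{
    if (op < GSTP_OP_EXTENDED) {
        if (cap < 1)
            return 0;
        buf[0] = (uint8_t)op;
        return 1;
    }
    if (cap < 3)
        return 0;
    buf[0] = GSTP_OP_EXTENDED;
    buf[1] = (uint8_t)(op >> 8);
    buf[2] = (uint8_t)(op & 0xFF);
    return 3;
}

/* Read an opcode written by gstp_put_opcode(). Stores it in *op and
   returns bytes consumed, or 0 if the buffer is truncated. */
size_t gstp_get_opcode(const uint8_t *buf, size_t len, uint16_t *op)
{
    if (len < 1)
        return 0;
    if (buf[0] != GSTP_OP_EXTENDED) {
        *op = buf[0];
        return 1;
    }
    if (len < 3)
        return 0;
    *op = (uint16_t)((buf[1] << 8) | buf[2]);
    return 3;
}
```

Common commands cost a single byte, and only the rare extended ones pay two extra bytes.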
|
From: Alexander <ale...@ne...> - 2002-11-04 09:26:21
|
> 1) Convert the current documentation text file to RFC2629 format.
> RFC2629 specifies an XML DTD for Internet Drafts and RFC documents.
> There are tools to convert from the format to correctly formatted text
> as well as HTML at http://xml.resource.org/

Yup, sounds like the way to go. I suspect there are good tools to convert this XML to txt?

> 2) Ask pesky questions to this list about everything in the
> specification that I can think of being non-optimal or unclear.

:)

> 3) Extend the documentation so that it can be used as a guide for
> protocol implementors, with information from the creators, of course

I see two things that really need documentation:
1) The protocol and implementation techniques.
2) The gstplib implementation.

> 4) Write a very simple command-line GSTP file client that works a bit
> like GNU wget, without even looking at the reference implementation. If I
> succeed, we will know that the specification serves its purpose. Having a
> second, independent implementation of the protocol is also good when
> we want to promote the protocol as an Internet standard (RFC).

gstpget(1) seems like a very suitable project. In theory you could use gstplib, but from an RFC-suitability standpoint, this would not be the best way to go.

> What do you fellows think about this plan? I must once again warn you
> that I don't have much time to put in, but I guess something is better
> than nothing.

GSTP is a slow moving project anyway.. there's no rush :)

//Alexander |
|
From: Alexander <ale...@ne...> - 2002-11-04 09:20:09
|
> Of course sendfile() isn't portable, but the general idea of
> zero-copy networking is. I find it hard to believe that any
> implementation of zero-copy networking on any platform will include the
> ability to send files and then return the control to userspace every 32k, but
> you never know.

Why not?
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

I don't think we'll be seeing any TLS code in the kernels any time soon though :)

> Perhaps one could add an operation "get" that would be the one-roundtrip
> equivalent of "open; read; close". After having worked a bit with
> networking in GPRS environments (read: lousy latencies), I know that
> having the ability to cut down on roundtrips is a good thing.

I see your point.. and we'll keep this open. I don't want to introduce any new commands for doing the same thing, but I will consider adding a new option for OPEN that tells the server to start sending data right away, as well as closing it on EOF.

> I believe that mixing in personal tastes like this one in fundamental
> protocol decisions will possibly prevent the protocol from being widely
> used.

One of the problems with, say, the FTP protocol is that it is too overdeveloped. Only god knows all the features in FTP, and there are millions of ways of doing things.
Just look at all the different PORT and PASV commands used today.
Some use PORT, others EPORT, or LPORT, or ZPORT..
I guess what I'm saying is that there should be one way of doing things, and only one.

Nothing stands in the way of changing the underlying transport layer of GSTP from TLS to plain TCP data, other than me thinking it will fork() things where no forking is necessary.

After all, when downloading a file that takes 5 minutes to download, what does it matter that it takes another second or so at worst to negotiate a TLS connection?

When downloading 500 files, it's still only done once, compared to 500 TCP handshakes with FTP.
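Alexander's "Why not?" about sendfile() can be illustrated with the Linux sendfile(2) call he quotes: the kernel copies the file data, but the caller chooses how much to send per call, so control returns to userspace after each 32k chunk (for example, to write a message header before the next chunk). This is a Linux-specific sketch; `send_in_chunks` and the 32768-byte chunk size are illustrative, and it assumes a plain (non-TLS) destination, since sendfile() copies the file bytes unmodified.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/sendfile.h>

#define CHUNK_SIZE 32768  /* 32k per call, as in the discussion */

/* Send the first len bytes of in_fd (a regular file) to out_fd using
   zero-copy sendfile(), at most CHUNK_SIZE bytes per call. After each
   call, control is back in userspace.
   Returns the number of bytes sent, or -1 on error. */
long long send_in_chunks(int out_fd, int in_fd, off_t len)
{
    off_t offset = 0;   /* advanced by sendfile(); in_fd's own offset is unchanged */
    long long total = 0;

    while (offset < len) {
        off_t left = len - offset;
        size_t want = left < CHUNK_SIZE ? (size_t)left : CHUNK_SIZE;
        ssize_t n = sendfile(out_fd, in_fd, &offset, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* end of file reached before len bytes */
        total += n;
        /* Per-chunk userspace work (framing, cancellation checks) goes here. */
    }
    return total;
}
```

The loop shows that chunking and zero-copy are compatible; the limitation Alexander points to is that encryption in userspace would force the data back through userspace buffers.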
If the need should arise for an even more lightweight GSTP protocol for handheld devices or slow links, then we'll call it "diet GSTP" and stick a green keyhole on it.

> Don't you see anonymous downloads (when no user authentication whatsoever
> takes place) as a significant use case of GSTP? I find that
> strange.

Of course I do, in due time.
Anonymous logins either don't issue an AUTH command, or use the standard FTP way with "anonymous/ftp" + "emailaddress".
This should be in the protocol description, but it's not :)

Also.. the Internet is a fast moving but extremely slow changing place..
FTP will not die quietly :)
GSTP is a slow moving project.. so we have time to make it right :)

The server is getting more and more stable each day. I'll run some evil tests with 10000 users across gigabit ethernet or loopback and see if we can handle the load. I don't think it will be a problem.

//Alexander |
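The OPEN option Alexander proposes in this message, which tells the server to start sending data right away and to close the file on EOF, could be encoded roughly as below. The opcode value, flag names and layout (opcode byte, flags byte, 16-bit big-endian path length, path bytes) are hypothetical; the 16-bit length follows Daniel's suggestion that paths never need more than 2**10 characters. One such request would give the one-roundtrip "get" instead of separate open, read and close roundtrips.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GSTP_OP_OPEN           0x02  /* hypothetical opcode value */
#define GSTP_OPEN_SEND_NOW     0x01  /* hypothetical: start sending data at once */
#define GSTP_OPEN_CLOSE_ON_EOF 0x02  /* hypothetical: close the fd after EOF */

/* Encode an OPEN request: opcode, flags, 16-bit big-endian path length,
   then the path bytes (no terminating NUL).
   Returns bytes written, or 0 if the path is too long or buf too small. */
size_t gstp_encode_open(uint8_t *buf, size_t cap, const char *path,
                        uint8_t flags)
{
    size_t len = strlen(path);
    if (len > 0xFFFF || cap < 4 + len)
        return 0;
    buf[0] = GSTP_OP_OPEN;
    buf[1] = flags;
    buf[2] = (uint8_t)(len >> 8);
    buf[3] = (uint8_t)(len & 0xFF);
    memcpy(buf + 4, path, len);
    return 4 + len;
}
```

A "get" is then an OPEN with `GSTP_OPEN_SEND_NOW | GSTP_OPEN_CLOSE_ON_EOF` set; a plain OPEN passes 0 and keeps today's open/read/close flow.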