Thread: [asio-users] asio and good error handling
From: Jeff A. <je...@p2...> - 2018-12-27 20:59:34
|
I've used asio to implement some simple TCP and UDP messaging. All's well until there's an error. The asio example code mostly just quits on error, which simplifies the examples but isn't great for real life: a single service bounces, and everything else decides to bounce too when it sees a broken connection.

I've not found any good examples or guidelines for how to handle different errors, perhaps along the lines of "if you get this error while reading data, then this is a good action to take". I know this isn't one size fits all, but I also don't want to wait around for less common errors just to figure out what triggered them and how to handle them. Even relatively straightforward things like connection_refused (ECONNREFUSED) can be a bit sticky: do I always want to close and shut down the socket, or are there cases where that's excessive? Maybe that's right if I'm the client, but if I'm the server I can't reconnect: I'm accepting and talking to an ephemeral port on the client.

Thanks for any pointers.

ps: After getting over some initial hurdles, I've been quite impressed with asio. Many thanks to the developers.

-- Jeff Abrahamson +33 6 24 40 01 57 +44 7920 594 255 https://www.p27.eu/jeff/ https://www.transport-nantes.com/ |
From: Dmitrij V <dm...@gm...> - 2018-12-27 21:32:42
|
There are two ways to handle errors.

1) Exceptions (the throwing overloads throw asio::system_error, which carries the error_code in code()):

    try { /* ... */ }
    catch (const asio::system_error &e) {
        std::cerr << "error: " << e.code().message() << std::endl;
    }

2) In place:

    asio::error_code ec;
    asio::read(sock, buff, ec);
    if (ec) std::cerr << "error: " << ec.message() << std::endl;

If you want detailed error reporting, see class error_code (header <asio/error_code.hpp>) and its methods such as value(), category() ...

PS: For my purposes, the string from error_code::message() is enough (for logs)...

-- the best regards |
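The two styles above correspond to Asio's paired overloads: one throws, the other fills in an error_code. Note that the throwing overloads throw asio::system_error (a typedef for std::system_error in standalone Asio, boost::system::system_error in Boost.Asio), not an error_code, so that is the type to catch. A minimal sketch of the same convention using only the standard library; `parse_port` is a hypothetical function standing in for an Asio operation such as asio::read.

```cpp
#include <iostream>
#include <string>
#include <system_error>

// Hypothetical operation following Asio's convention. This overload reports
// failure through `ec` and never throws.
unsigned short parse_port(const std::string& text, std::error_code& ec)
{
    ec.clear();
    if (text.empty()) {
        ec = std::make_error_code(std::errc::invalid_argument);
        return 0;
    }
    unsigned long value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') {
            ec = std::make_error_code(std::errc::invalid_argument);
            return 0;
        }
        value = value * 10 + static_cast<unsigned long>(c - '0');
        if (value > 65535) {
            ec = std::make_error_code(std::errc::result_out_of_range);
            return 0;
        }
    }
    return static_cast<unsigned short>(value);
}

// Throwing overload: delegates to the one above and turns a failure into
// std::system_error, whose code() holds the original error_code.
unsigned short parse_port(const std::string& text)
{
    std::error_code ec;
    unsigned short port = parse_port(text, ec);
    if (ec)
        throw std::system_error(ec, "parse_port");
    return port;
}

// Style 1: exceptions. Catch std::system_error, not std::error_code.
void report_with_exceptions(const std::string& text)
{
    try {
        std::cout << "port " << parse_port(text) << '\n';
    } catch (const std::system_error& e) {
        std::cerr << "error: " << e.code().message() << '\n';
    }
}

// Style 2: check the error_code in place.
void report_in_place(const std::string& text)
{
    std::error_code ec;
    unsigned short port = parse_port(text, ec);
    if (ec)
        std::cerr << "error: " << ec.message() << '\n';
    else
        std::cout << "port " << port << '\n';
}
```

In both styles the caller ends up with the same std::error_code; the exception simply carries it in code().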
From: Jeff A. <je...@p2...> - 2018-12-27 21:46:36
|
Thanks, but I think I wasn't clear. This code will run in production. Seeing in the logs which errors have occurred is important, and I do that. But just as important is for the binary to recover. If it got an error because a peer it communicates with failed, it should try to reconnect, if it can, but not too fast. If it's a network issue, maybe it doesn't need to reconnect the socket at all, just recognize that the operation timed out and retransmit.

I have enough programs running that something will always be having an issue with something. But all the examples I've found are, well, examples: they don't help much in understanding the set of recovery strategies for each of the errors I might receive. I've got enough of my own bugs to keep me going: I'd be very happy not to have to get this working by thinking through every last case.

Jeff |
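Retrying "but not too fast" is commonly done with capped exponential backoff. A minimal sketch of the delay arithmetic; `backoff_delay` and `ReconnectPolicy` are hypothetical names, and the base and cap values are up to the application.

```cpp
#include <algorithm>
#include <chrono>

// Capped exponential backoff: base, 2*base, 4*base, ... but never above cap.
// `attempt` counts consecutive failures, starting at 0.
std::chrono::milliseconds backoff_delay(unsigned attempt,
                                        std::chrono::milliseconds base,
                                        std::chrono::milliseconds cap)
{
    std::chrono::milliseconds delay = base;
    for (unsigned i = 0; i < attempt && delay < cap; ++i)
        delay *= 2;  // doubling stops once the cap has been reached
    return std::min(delay, cap);
}

// Hypothetical per-peer policy: each failure lengthens the wait, and a
// successful connection starts the sequence over.
class ReconnectPolicy {
public:
    ReconnectPolicy(std::chrono::milliseconds base, std::chrono::milliseconds cap)
        : base_(base), cap_(cap) {}

    std::chrono::milliseconds next_delay() { return backoff_delay(attempt_++, base_, cap_); }
    void on_connected() { attempt_ = 0; }

private:
    std::chrono::milliseconds base_;
    std::chrono::milliseconds cap_;
    unsigned attempt_ = 0;
};
```

In an Asio program the returned delay would typically be passed to a steady_timer's expires_after(), with the timer's async_wait handler starting the next connection attempt; on_connected() resets the sequence once a connection succeeds.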
From: Vinícius d. S. O. <vin...@gm...> - 2018-12-28 01:59:00
|
On Thu, 27 Dec 2018 at 19:01, Jeff Abrahamson <je...@p2...> wrote:
> I've not found any good examples or guidelines for how to handle
> different errors, [...]
>
> Thanks for any pointers.

Boost.Asio's core error-handling transport is boost::system::error_code. There is nothing magical to unwrap in this abstraction. It is a 2-tuple of a numerical error value (an int, typically produced from an enum) and a domain. The first component is like errno. If you and I both try to extend the error types, we may clash by using the same numerical value for different errors. That's why there is the error category, which acts as an error domain.

However, I found it hard to grasp the difference between error code and error condition. The “portable” label sometimes given as the difference gave me no hints as to when to use each one. The following posts have enlightening information and examples on this other matter (so do pay attention):

- http://blog.think-async.com/2010/04/system-error-support-in-c0x-part-1.html
- http://blog.think-async.com/2010/04/system-error-support-in-c0x-part-2.html
- http://breese.github.io/2017/05/12/customizing-error-codes.html

As to the main point of your question, which is figuring out which errors can arise from which operations: that's a harder question to answer.
That's a question that can only be answered if you narrow it to a specific platform. Once a platform has been chosen, you can read the platform's manual (e.g. the Linux Programmer's Manual man pages if you're on Linux) to find the answer. Many errors only happen in specific circumstances (e.g. EINTR from Linux's read() only happens if a signal is delivered while the call is blocked, ...).

Also, I/O errors are open-ended (unbounded), and new error types can arise in the future. For instance, modern file systems are known to employ compression, and you could get an insufficient-space error just by flipping one byte that changes the compression ratio, so that suddenly there is no more space even though you're adding no new bytes to the filesystem (this error would never happen on FAT32). For this reason, the Rust developers behind io::ErrorKind gave it special treatment (you can never perform an exhaustive match on it without adding a wildcard arm): https://github.com/rust-lang/rust/blob/fb86d604bf65c3becd16180b56267a329cf268d5/src/libstd/io/error.rs#L90

But I've seen some very intelligent Rust developers, with not so much real-world programming experience and far too much mathematical background[1] biasing their judgement, dislike this decision. They miss the point. I/O is the closest we get to the border between our closed abstractions and an open world, and it is the region most affected by incomplete modelling.

“The means whereby to identify dead forms is Mathematical Law. The means whereby to understand living forms is Analogy.” — O. Spengler

“Mathematizing represents a very simple and easy human activity, because it deals with fictitious entities with all particulars included, and we proceed by remembering. [...] Physical or daily-life abstractions differ considerably from mathematical abstractions. [...] In general, physical abstractions, including daily-life abstractions are such that particulars are left out — we proceed by a process of forgetting.
In other words, no description or ‘definition’ will ever include all particulars.” — Korzybski

“This fact has significant ramifications when considering the availability vs. consistency tradeoff that was purported by the CAP theorem. It is not the case that if we guarantee consistency, we have to give up the guarantee of availability. We never had a guarantee of availability in the first place! Rather, guaranteeing consistency causes a reduction to our already imperfect availability.” — http://dbmsmusings.blogspot.com/2018/09/newsql-database-systems-are-failing-to.html

(The last quote/example describes a trap that I myself have fallen into.)

And last, as for your request for “general strategies” to adopt for error handling, I feel this request is more appropriate for a “design patterns” book than for the Boost.Asio documentation. It really depends a lot on your use case, and I don't feel it is necessary for the Boost.Asio examples to show any error-handling strategy beyond printing an error message and aborting the operation.

[1] Other programmers with “far too much mathematical background” didn't fall into this trap, so “too much mathematics” is not the problem here at all; the problem is something else (maybe a confusion between orders of abstraction).

-- Vinícius dos Santos Oliveira https://vinipsmaker.github.io/ |
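Because the set of I/O errors is open-ended, handling code can recognise the few conditions it understands and give everything else a deliberate default, much like the wildcard arm Rust forces on io::ErrorKind. A minimal sketch using only the standard library; `Reaction`, `classify` and `describe` are hypothetical names. The std::errc comparisons assume a POSIX platform, where Asio reports socket errors as errno values in the system category, which map onto the portable std::errc conditions. (Standalone Asio uses std::error_code directly; Boost.Asio's boost::system::error_code works the same way with boost::system::errc.)

```cpp
#include <string>
#include <system_error>

// Hypothetical coarse reactions for one connection.
enum class Reaction { ignore, retry_later, drop_connection };

// Recognise a few well-understood conditions; everything else gets a
// deliberate default, because the set of possible I/O errors is open-ended.
Reaction classify(const std::error_code& ec)
{
    if (!ec || ec == std::errc::operation_canceled)
        return Reaction::ignore;            // success, or our own cancel()/close()
    if (ec == std::errc::connection_refused || ec == std::errc::timed_out ||
        ec == std::errc::network_unreachable || ec == std::errc::host_unreachable)
        return Reaction::retry_later;       // the peer or network may come back
    return Reaction::drop_connection;       // unknown or fatal: reset and log
}

// Log the category together with the value: the number alone is ambiguous,
// because different categories may reuse the same value.
std::string describe(const std::error_code& ec)
{
    return std::string(ec.category().name()) + ":" + std::to_string(ec.value()) +
           " (" + ec.message() + ")";
}
```

For cancellation in particular, comparing directly against asio::error::operation_aborted is the more robust test in real Asio code, since it does not depend on the platform's mapping to std::errc.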
From: Jeff A. <je...@p2...> - 2018-12-28 10:52:33
|
On 28/12/18 02:58, Vinícius dos Santos Oliveira wrote:
> Boost.Asio's core error handling transport is boost::system::error_code.
> [...]
> As to the main point of your question, which is trying to figure it
> out which errors can arise out of which operations. That's a harder
> question to answer. That's a question that can only be answered if you
> narrow it to a specific platform. [...]

Thanks very much, Vinícius.
Your explanation, and the links to Chris' explanations of the intentions behind the error support mechanism, make things quite a bit clearer. I also understand, through this discussion (thanks Vinnie and others), that even knowing my operating environment (Linux), my desire for generalisation and clarity is probably excessive. I'll deal with the errors I find and re-read my old network programming texts to remind myself of others that I may have forgotten.

Indeed, one of the things that led me to asio was the realisation that the higher-level libraries (ZMQ, RabbitMQ, others), which did more, still required me to keep most of the networking fundamentals in my head, but then often made it harder to respond. I'm finding that asio is a good level of abstraction for me and my tasks.

> Also, IO errors are open-ended/unbounded and new error types can arise
> in the future. For instance, modern file systems are known to employ
> compression and you could get an insufficient space error just by
> flipping one byte that changes the compression ratio and suddenly
> there is no more space even thou you're adding no new bytes to the
> filesystem (this error would never happen on FAT32). [...]

In passing, that's a very nice example, thanks.

> [...]
> And last, as for your request for a “general strategies” to adopt on
> error handling, I feel this request is more appropriate to a “design
> patterns book” than to Boost.Asio documentation. It'll really depend a
> lot on your use case and I don't feel it is necessary for Boost.Asio
> examples to show any error handling strategy besides showing an error
> message and aborting operation.

I don't fully agree with you on that point. I understand the desire of library developers to define the scope of their project, and talking about design patterns for using a library leads inextricably into opinion (even if well-founded and interesting opinion when coming from the main developers).
It also leads to perceived feature creep since, hey, we want to write a good library, and here's this stuff that goes well beyond documentation. At the same time, every serious developer who uses the library (asio in this case) will face the question of what to do in the face of errors or exceptions, and shutting down and giving up, as in life, is usually only appropriate for tasks we don't care about. So there's real value to be added by explaining at a high level how one might use the library effectively and develop really useful and robust software with it. It's unfortunate if each developer has to go through that learning process with the same poor tools and explanations.

For example, Chris' blog posts that you linked (http://blog.think-async.com/2010/04/system-error-support-in-c0x-part-2.html) would make some very nice example code on how to handle asio errors, even if only to show how to handle a disconnect or something trivial and common (with the comment that other response strategies are possible, but here's a good way to check for the errors and initiate the response).

https://www.boost.org/doc/libs/1_69_0/doc/html/boost_asio/examples/cpp11_examples.html

Asio is hardly unique in this preference for limiting scope to the point of forcing each user to learn on their own how to use it reliably.

-- Jeff Abrahamson |
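As a small example of handling a disconnect, here is a sketch of the client/server distinction raised at the start of the thread: a client that loses its peer schedules a reconnect, while a server only closes the affected connection and keeps accepting. `Role`, `Action` and `on_socket_error` are hypothetical names; which errors count as "the peer went away" is a judgement call; and the std::errc comparisons assume POSIX, where Asio reports socket errors as errno values. Asio's orderly-close error, asio::error::eof, has no std::errc equivalent, so real code would test for it separately.

```cpp
#include <system_error>

// Hypothetical: which side of the connection this code is on.
enum class Role { client, server };

// Hypothetical actions for the one socket that failed.
enum class Action {
    none,                   // success, or an operation we cancelled ourselves
    reconnect_after_delay,  // client: close, then reconnect with backoff
    close_connection,       // server: close this socket, keep the acceptor running
    close_and_report        // unrecognised: close this socket and log the error
};

// Errors that usually mean the peer (or the path to it) went away.
bool peer_went_away(const std::error_code& ec)
{
    return ec == std::errc::connection_refused ||
           ec == std::errc::connection_reset ||
           ec == std::errc::connection_aborted ||
           ec == std::errc::broken_pipe ||
           ec == std::errc::timed_out;
}

// Decide what to do after an asynchronous connect/read/write on one socket failed.
Action on_socket_error(Role role, const std::error_code& ec)
{
    if (!ec || ec == std::errc::operation_canceled)
        return Action::none;
    if (peer_went_away(ec))
        return role == Role::client ? Action::reconnect_after_delay
                                    : Action::close_connection;
    return Action::close_and_report;
}
```

Keeping the decision in one pure function like this makes the policy easy to test and to extend as new errors turn up in the logs.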
From: Vinnie F. <vin...@gm...> - 2018-12-28 04:17:50
|
On Thu, Dec 27, 2018 at 5:59 PM Vinícius dos Santos Oliveira <vin...@gm...> wrote:
> However I found hard to grasp the difference between error code and error
> condition. The “portable” name sometimes associated as the difference has
> given me no hints as to whenever to use each one.

This was my experience as well. And it is a shame, because error_code and error_condition are incredibly well designed (now that I finally understand them). The blog posts and tutorials floating around the web can sometimes be a little cryptic, with references to "system" errors and that sort of nonsense.

I find it easiest to explain error_condition as the equivalent of an "error group." That is, multiple distinct values of error_code can map to the same error_condition. This allows code that can fail to report the exact cause of failure concisely (error_code), while still allowing the caller to determine the general category of the error.

For example, in a WebSocket implementation, there are many ways that network input can cause a failure. Examples:

* A WebSocket Upgrade request specified HTTP/1.0 (instead of the required HTTP/1.1)
* A WebSocket Upgrade request is missing the Host field
* A frame was received with an illegal opcode
* A control frame had the "fragment" bit set

Note that the first two error codes above are handshake failures, while the last two are protocol errors. If the caller wants to distinguish between a handshake error and a protocol error (a common use case), then without error conditions they would need to manually check the error code against the list of all known error codes, and know which group each one belongs to. Beast reports these four cases as distinct errors: <https://www.boost.org/doc/libs/1_69_0/libs/beast/doc/html/beast/ref/boost__beast__websocket__error.html>

The error condition system allows specific error codes to be mapped to general error conditions. In the list above there are two conditions.
The first is the "handshake failed" condition, while the second is the "protocol error" condition. In Beast this is represented using an error condition enum: <https://www.boost.org/doc/libs/1_69_0/libs/beast/doc/html/beast/ref/boost__beast__websocket__condition.html>

Thanks to the magic (or curse) of argument-dependent lookup in the design of error_code, comparisons of error codes against error conditions Just Work. Believe it or not, this compiles and works:

    boost::system::error_code ec = websocket::error::bad_http_version;
    BOOST_ASSERT(ec == websocket::condition::handshake_failed);

We are actually comparing an error_code initialized from the specific, low-level error against an error condition that represents a high-level, generic grouping of WebSocket failure modes. That's amazing!

The beauty of this wonderfully designed error system is that I can add more error codes that existing users don't know about and map them to the known conditions; any user code that compares Beast errors against the predefined error conditions will then Just Work when one of the new error codes is returned.

C++ really deserves better documentation that explains in clear and simple terms how error codes and error conditions work. It took me quite a while to figure out, and I hear from other people that they have had the same experience. error_code is a little C++ gem waiting to be discovered by the general public.

Regards |
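The code-to-condition mechanism described above can be reproduced with the standard library alone. The sketch below is not Beast's implementation: the namespace `ws`, the enums `error` and `condition`, and the `get_*_category` functions are hypothetical stand-ins modelled on the four WebSocket cases listed earlier. The mapping lives in the error category's default_error_condition override.

```cpp
#include <string>
#include <system_error>
#include <type_traits>

namespace ws {
// Hypothetical specific failures (error codes).
enum class error { bad_http_version = 1, no_host, bad_opcode, bad_control_fragment };
// Hypothetical generic groupings (error conditions).
enum class condition { handshake_failed = 1, protocol_violation };
} // namespace ws

// Register the enums so they convert implicitly to error_code / error_condition.
namespace std {
template <> struct is_error_code_enum<ws::error> : true_type {};
template <> struct is_error_condition_enum<ws::condition> : true_type {};
} // namespace std

namespace ws {

class condition_category_impl : public std::error_category {
public:
    const char* name() const noexcept override { return "ws.condition"; }
    std::string message(int ev) const override
    {
        switch (static_cast<condition>(ev)) {
        case condition::handshake_failed: return "WebSocket handshake failed";
        case condition::protocol_violation: return "WebSocket protocol violation";
        }
        return "unknown WebSocket condition";
    }
};

inline const std::error_category& get_condition_category()
{
    static const condition_category_impl instance{};
    return instance;
}

// Found by argument-dependent lookup when a ws::condition converts.
inline std::error_condition make_error_condition(condition c)
{
    return {static_cast<int>(c), get_condition_category()};
}

class error_category_impl : public std::error_category {
public:
    const char* name() const noexcept override { return "ws"; }
    std::string message(int ev) const override
    {
        switch (static_cast<error>(ev)) {
        case error::bad_http_version: return "upgrade request is not HTTP/1.1";
        case error::no_host: return "upgrade request has no Host field";
        case error::bad_opcode: return "illegal frame opcode";
        case error::bad_control_fragment: return "fragmented control frame";
        }
        return "unknown WebSocket error";
    }
    // The code -> condition mapping: new codes can be added here later, and
    // callers comparing against ws::condition keep working unchanged.
    std::error_condition default_error_condition(int ev) const noexcept override
    {
        switch (static_cast<error>(ev)) {
        case error::bad_http_version:
        case error::no_host:
            return make_error_condition(condition::handshake_failed);
        case error::bad_opcode:
        case error::bad_control_fragment:
            return make_error_condition(condition::protocol_violation);
        }
        return std::error_condition(ev, *this);
    }
};

inline const std::error_category& get_error_category()
{
    static const error_category_impl instance{};
    return instance;
}

// Found by argument-dependent lookup when a ws::error converts.
inline std::error_code make_error_code(error e)
{
    return {static_cast<int>(e), get_error_category()};
}

} // namespace ws
```

With this in place, `std::error_code ec = ws::error::bad_http_version;` converts implicitly, and `ec == ws::condition::handshake_failed` is true because operator== asks the code's category for its default condition.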
From: Dmitrij V <dm...@gm...> - 2018-12-28 08:43:49
|
Vinícius dos Santos Oliveira wrote:
> ... http://dbmsmusings.blogspot.com/2018/09/newsql-database-systems-are-failing-to.html

Thanks for the link! It's an interesting post about consistency and availability. At the moment I am writing my own database. Availability is always my priority, guaranteed by indexing; consistency is guaranteed by a shared mutex (at the moment I have no joins, internal functions like COUNT, foreign keys, ...). Sorry for not linking to the sources; it is a closed project...

-- the best regards |
From: Matt G. <mat...@jc...> - 2018-12-28 11:04:32
|
My approach is usually to add special-case handling for specific errors, as needed. Everything else is simply lumped together in a generic case. Generally speaking, I handle errors by resetting the connection state machine. Depending on what protocol you're implementing, there might be other actions you can take to recover.

In case it helps to talk about specifics, here's a snippet of a state machine I wrote to drive libdbus. In this case, the state machine stops if no further op is scheduled. Furthermore, calling OpState::processError() eventually results in the appropriate cleanup and potentially a restart. Unfortunately, I can't post the whole thing. Perhaps it could be improved through the use of error conditions.

    void Fd::handleOp( OpType type, const boost::system::error_code &error )
    {
        OpState &state = m_state[type];
        state.isScheduled = false;

        if (!error)
        {
            // process the waiting data
            state.processWatches( type );

            // queue another op, if needed
            if (!state.watches.empty()) this->scheduleOp( type );
        }
        else if (error != asio::error::make_error_code( asio::error::operation_aborted )
              && error != asio::error::make_error_code( asio::error::shut_down ))
        {
            // Fwd connection loss to DBus
            if (error == asio::error::make_error_code( asio::error::connection_aborted )
             || error == asio::error::make_error_code( asio::error::connection_reset )
             || error == asio::error::make_error_code( asio::error::broken_pipe ))
            {
                state.processError( DBUS_WATCH_HANGUP );
            }
            else state.processError( DBUS_WATCH_ERROR );
        }
    }

Matt |
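One way to apply error conditions to a handler like this, sketched under assumptions: define a hypothetical condition `net::condition::peer_hangup` whose category overrides `equivalent(const std::error_code&, int)`, so the three separate comparisons collapse into one. Membership is decided against std::errc values, which match Asio's connection_aborted, connection_reset and broken_pipe on POSIX systems (where they are errno values in the system category); on other platforms that mapping should be checked.

```cpp
#include <string>
#include <system_error>
#include <type_traits>

namespace net {
// Hypothetical condition grouping "the peer went away" errors.
enum class condition { peer_hangup = 1 };
} // namespace net

namespace std {
template <> struct is_error_condition_enum<net::condition> : true_type {};
} // namespace std

namespace net {

class condition_category_impl : public std::error_category {
public:
    const char* name() const noexcept override { return "net.condition"; }
    std::string message(int) const override { return "peer hung up"; }

    using std::error_category::equivalent;  // keep the other overload visible

    // Consulted for `ec == net::condition::peer_hangup` when ec comes from
    // another category; membership is decided via portable std::errc values.
    bool equivalent(const std::error_code& ec, int cond) const noexcept override
    {
        if (cond != static_cast<int>(condition::peer_hangup))
            return false;
        return ec == std::errc::connection_aborted ||
               ec == std::errc::connection_reset ||
               ec == std::errc::broken_pipe;
    }
};

inline const std::error_category& get_condition_category()
{
    static const condition_category_impl instance{};
    return instance;
}

// Found by argument-dependent lookup when net::condition converts to
// std::error_condition.
inline std::error_condition make_error_condition(condition c)
{
    return {static_cast<int>(c), get_condition_category()};
}

} // namespace net
```

The hangup branch could then become `state.processError( error == net::condition::peer_hangup ? DBUS_WATCH_HANGUP : DBUS_WATCH_ERROR );`, and extending the group later means touching only equivalent().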
From: Vinnie F. <vin...@gm...> - 2018-12-28 13:50:14
|
On Fri, Dec 28, 2018 at 4:13 AM Matt Gruenke <mat...@jc...> wrote:
> else if (error != asio::error::make_error_code( asio::error::operation_aborted ) &&

Umm........errrrmm.... how do I put this...

...there's no need to call make_error_code, in fact calls to make_error_code should be rare. This function is called for you by the implementation to implicitly convert an error enumeration value to error_code. Your code above could be written as:

    else if (error != asio::error::operation_aborted &&

On the bright side, this will save you some typing going forward :) :) :)

Regards |
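The implicit conversion works because error_code has a converting constructor for any enum with an `is_error_code_enum` specialization; that constructor calls the `make_error_code` overload found by argument-dependent lookup. Asio registers its error enums this way. The sketch below shows the same mechanism with std::io_errc, a standard enum registered in the same way, so it runs without Asio; the function names are hypothetical.

```cpp
#include <ios>
#include <system_error>

// std::io_errc is registered as an error-code enum, the same way Asio
// registers enums such as asio::error::basic_errors.
static_assert(std::is_error_code_enum<std::io_errc>::value,
              "io_errc converts implicitly to std::error_code");

// Spells out the conversion explicitly.
bool is_stream_error_verbose(const std::error_code& ec)
{
    return ec == std::make_error_code(std::io_errc::stream);
}

// Equivalent: the enum converts implicitly, calling make_error_code via ADL.
bool is_stream_error(const std::error_code& ec)
{
    return ec == std::io_errc::stream;
}
```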
From: Bjorn R. <br...@ma...> - 2018-12-28 16:32:20
|
On 12/28/18 5:17 AM, Vinnie Falco wrote:
> understand them). The blog posts and tutorials floating around the web
> can sometimes be a little cryptic with references to "system" errors
> and that sort of nonsense.
>
> I find it easiest to explain error_condition as the equivalent of an
> "error group." That is, multiple distinct values of error_code can map
> to the same error_condition. This allows code which can fail to
> concisely provide the exact cause of failure (error_code) while also
> allowing the code to determine the general category of error.

As one of said blog posts succinctly states:

<quote>
Use std::error_code for error propagation.
Use std::error_code for comparison within a category.
Use std::error_condition for comparison between categories.
</quote> |
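The three guidelines can be shown side by side. A sketch, assuming a POSIX platform, where an errno value in the system category maps to the same condition as the generic category; `same_code` and `means_timeout` are hypothetical helper names.

```cpp
#include <system_error>

// Propagation: pass std::error_code along unchanged, so the exact cause survives.

// Within a category: compare error_codes. Category and value must both match.
bool same_code(const std::error_code& a, const std::error_code& b)
{
    return a == b;
}

// Between categories: compare against an error_condition (here std::errc),
// onto which each category maps its own values.
bool means_timeout(const std::error_code& ec)
{
    return ec == std::errc::timed_out;
}
```

Two codes carrying the same number in the system and generic categories compare unequal as error_codes, yet both match the portable timed_out condition.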
From: Vinnie F. <vin...@gm...> - 2018-12-28 16:40:59
|
On Fri, Dec 28, 2018 at 8:33 AM Bjorn Reese <br...@ma...> wrote:
> > understand them). The blog posts and tutorials floating around the web
> > can sometimes be a little cryptic with references to "system" errors
> > and that sort of nonsense.
>
> As one of said blog posts succinctly states:
>
> <quote>
> Use std::error_code for error propagation.
> Use std::error_code for comparison within a category.
> Use std::error_condition for comparison between categories.
> </quote>

I'm not sure if you are quoting this to support my statement or to provide a counter-example...

I'll assume you are supporting my statement that most of the blog posts are cryptic :) |
From: Bjorn R. <br...@ma...> - 2018-12-28 18:25:54
|
On 12/28/18 5:40 PM, Vinnie Falco wrote:
> On Fri, Dec 28, 2018 at 8:33 AM Bjorn Reese <br...@ma...> wrote:
>> As one of said blog posts succinctly states:
>>
>> <quote>
>> Use std::error_code for error propagation.
>> Use std::error_code for comparison within a category.
>> Use std::error_condition for comparison between categories.
>> </quote>
>
> I'm not sure if you are quoting this to support my statement or to
> provide a counter-example...
>
> I'll assume you are supporting my statement that most of the blog
> posts are cryptic :)

Do you find my quote above cryptic? |
From: Vinnie F. <vin...@gm...> - 2018-12-28 18:30:25
|
On Fri, Dec 28, 2018 at 10:27 AM Bjorn Reese <br...@ma...> wrote:
> >> <quote>
> >> Use std::error_code for error propagation.
> >> Use std::error_code for comparison within a category.
> >> Use std::error_condition for comparison between categories.
> >> </quote>
> >
> Do you find my quote above cryptic?

Yes! This is exactly the problem! I mean, I understand the quote NOW that I have finally figured out how error codes and categories work. But I suspect the quote above is practically meaningless for an average user who has never encountered it before, including myself.

Regards |
From: Bjorn R. <br...@ma...> - 2018-12-29 11:43:56
|
On 12/28/18 7:30 PM, Vinnie Falco wrote:
> Yes! This is exactly the problem! I mean, I understand the quote NOW,
> that I have finally figured out how error codes and categories work.
> But I suspect that quote above is practically meaningless for an
> average user who has never encountered it before, including myself.

You would obviously need to read the rest of the blog post that leads to the quoted guidelines, but the guidelines can be applied even without a thorough understanding of error codes and categories. At the very least, they tell you not to worry about error_condition unless you write your own error category. |