Hi,
First, I'd like to thank the cURL team and Jeffrey for releasing this library. It is a life saver at times.
And, here is the 'but' bit :-)
When I enable Threaded for TCurlMulti, I am getting a lot of AVs (access violations).
This happens when multiple TCurl items try to supply feedback information to the GUI.
Are the events (that each TCurl item fires) somehow synchronized for GUI purposes or am I supposed to handle that?
Cheers,
Adem
Hello, Adem...
The threading feature is fairly new and hasn't seen a lot
of testing and/or peer review, so it is certainly possible
that it could contain some bugs.
A couple of things that come to mind:
If you set Threaded to TRUE for the TCurlMulti, you should
leave Threaded set to FALSE for the individual TCurl objects.
(Otherwise, you'll get a lot more threads than you need.)
If possible, try to avoid any lengthy, CPU-intensive
processing from within your event handlers.
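As a rough illustration of keeping handlers short, here is a minimal Free Pascal sketch in which the handler only appends a message to a locked list, and the main thread does the printing afterwards. TFakeTransfer, HandleProgress, Pending, Lock and Delivered are made-up names for this example and are not part of curlpas; the worker threads merely stand in for transfers.

```pascal
program HandlerQueueSketch;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX} cthreads, {$ENDIF}
  Classes, SysUtils, SyncObjs;

type
  { Hypothetical stand-in for a transfer running on a worker thread. }
  TFakeTransfer = class(TThread)
  private
    FName: string;
  protected
    procedure Execute; override;
  public
    constructor Create(const AName: string);
  end;

var
  Lock: TCriticalSection;    // guards Pending
  Pending: TStringList;      // messages waiting for the main thread
  Delivered: Integer = 0;    // messages the main thread has printed

{ Plays the role of an event handler: record the message and return. }
procedure HandleProgress(const Msg: string);
begin
  Lock.Acquire;
  try
    Pending.Add(Msg);
  finally
    Lock.Release;
  end;
end;

constructor TFakeTransfer.Create(const AName: string);
begin
  FName := AName;
  inherited Create(False);   // not suspended: the thread runs right away
end;

procedure TFakeTransfer.Execute;
var
  I: Integer;
begin
  for I := 1 to 3 do
    HandleProgress(Format('%s: chunk %d', [FName, I]));
end;

var
  Workers: array[0..1] of TFakeTransfer;
  I: Integer;
begin
  Lock := TCriticalSection.Create;
  Pending := TStringList.Create;
  try
    Workers[0] := TFakeTransfer.Create('transfer-a');
    Workers[1] := TFakeTransfer.Create('transfer-b');
    for I := 0 to High(Workers) do
    begin
      Workers[I].WaitFor;
      Workers[I].Free;
    end;
    // Only the main thread touches the "GUI" (here: standard output).
    for I := 0 to Pending.Count - 1 do
    begin
      WriteLn(Pending[I]);
      Inc(Delivered);
    end;
    WriteLn('messages delivered: ', Delivered);
  finally
    Pending.Free;
    Lock.Free;
  end;
end.
```

In a GUI program the draining loop would more likely run from a timer, or the hand-off would go through TThread.Synchronize, so that only the main thread touches the controls.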
I will try to dig a little deeper into this in the
next few days; maybe there are some things I can do
inside the curlpas code to mitigate this problem.
Thanks for your input!
- Jeff
Hi Jeff,
[warning: long text]
I am testing/using cURLPas where HTTrack and others have failed me: sending query URLs to dictionary sites and saving the retrieved answers for further analysis. I need this for my linguistics project.
While neither the URLs nor the retrieved (ripped?) pages are anything complex, the sheer number of them can be overwhelming. Here is an example of such a query URL: http://dictionary.reference.com/search?q=abacus and since the page sizes are quite small (as I don't need any pictures etc.), it is imperative that the app be threaded to get as high throughput as possible.
Anyway, that is where I am at. Now, the questions/comments :-)
I have been over the curlpas sources several times, and I'd like to know your opinion about a few things, mostly suggestions about altering the unit structure.
Threading?
----------
It is confusing, to me at least, to have threading for TCurl items, and it seems it is not available in Windows anyhow; so wouldn't it be better to have the Threaded property in TCurlMulti only?
And route all communication between the GUI and the TCurl items through one single TCurlMulti component in order to handle synchronizing (through mutexes or critical sections)?
TCurl as TComponent, or TPersistent or TCollectionItem?
--------------------------------------------------------
IMHO, there doesn't really seem to be a need for TCurl to be a descendant of TComponent; it would be better if it were a TPersistent (or a TCollectionItem, see further below).
Keeping in mind that TCurlMulti is simply a container class for TCurl items, having TCurl as component is both overkill and unnecessary. Turning TCurl into a TPersistent does not impair the functionality at all, yet makes it a lightweight class.
TCurl = class(TCollectionItem) is even better, in the sense that every TCurl item becomes available as an item of a TCollection; this is quite useful at design time, as the items show up in the component editor, which is arguably more developer-friendly.
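To make the TCollectionItem idea concrete, here is a minimal Free Pascal sketch. TTransferItem and its URL property are hypothetical stand-ins (the real TCurl has far more properties), and the second URL is just a placeholder.

```pascal
program CollectionSketch;
{$mode objfpc}{$H+}

uses
  Classes;

type
  { Hypothetical item class; it stands in for a TCurl-like object. }
  TTransferItem = class(TCollectionItem)
  private
    FURL: string;
  published
    property URL: string read FURL write FURL;
  end;

var
  Transfers: TCollection;
  Item: TTransferItem;
  I: Integer;
begin
  // The collection is told which item class it should create.
  Transfers := TCollection.Create(TTransferItem);
  try
    Item := Transfers.Add as TTransferItem;
    Item.URL := 'http://dictionary.reference.com/search?q=abacus';
    Item := Transfers.Add as TTransferItem;
    Item.URL := 'http://example.com/';  // placeholder
    for I := 0 to Transfers.Count - 1 do
      WriteLn(I, ': ', (Transfers.Items[I] as TTransferItem).URL);
  finally
    Transfers.Free;  // the collection owns its items and frees them
  end;
end.
```

Published properties on the item class are what a design-time collection editor would list for each entry.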
Summary
----------
I have made some of these changes. I have combined the various *.inc files into one single unit. The resultant unit isn't terribly big, only about 140 kB; so unless you object for some reason, I'd like to keep it a monolithic unit.
I have also removed TCurlBase altogether as it is not, IMO, needed.
Now, TCurl is a descendant of TPersistent; I haven't gone as far as TCollectionItem yet, but it is easy.
TCurl (as item) does not have a Threaded property anymore; TCurlMulti does. TCurlMulti also has a MaxActiveThreadCount property.
Closing
----------
These are my suggestions/questions. I'd like to know your opinion. I could send you my work if you're interested.
Cheers,
Adem
> it is imperative that the app be threaded to
> get as high throughput as possible.
The Threaded property of TCurl(Multi) has nothing to do with improving
throughput. Its only purpose is to allow the GUI to "breathe" during
blocking system calls. The underlying curl_multi_perform library call
works its magic by polling through an array of sockets. Running each
CURL handle on a separate thread won't give you any better performance.
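To picture why a single thread can service many transfers, here is a toy Free Pascal model of such a polling loop. It does no networking and does not call libcurl; TFakeTransfer and Step are invented for the illustration, and each "chunk" stands in for whatever data a socket had ready on that pass.

```pascal
program PollLoopSketch;
{$mode objfpc}{$H+}

type
  { Hypothetical transfer: just a name and a count of chunks left. }
  TFakeTransfer = record
    Name: string;
    Remaining: Integer;
  end;

var
  Transfers: array[0..2] of TFakeTransfer;
  Active, Passes, I: Integer;

{ Advance one transfer by one chunk; True while it still has work left. }
function Step(var T: TFakeTransfer): Boolean;
begin
  if T.Remaining > 0 then
    Dec(T.Remaining);
  Result := T.Remaining > 0;
end;

begin
  Transfers[0].Name := 'first';   Transfers[0].Remaining := 2;
  Transfers[1].Name := 'second';  Transfers[1].Remaining := 3;
  Transfers[2].Name := 'third';   Transfers[2].Remaining := 1;

  Passes := 0;
  repeat
    // One pass gives every transfer a turn, all on the same thread.
    Active := 0;
    for I := 0 to High(Transfers) do
      if Step(Transfers[I]) then
        Inc(Active);
    Inc(Passes);
    WriteLn('pass ', Passes, ': ', Active, ' still running');
  until Active = 0;
end.
```

The loop finishes after three passes, the length of the longest transfer, because the work of the three transfers is interleaved rather than run one after another.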
> It is confusing, to me at least, to have threading for TCurl
> items, and it seems it is not available in Windows anyhow;
Unless I missed something, it certainly should be available;
the thing that is missing on Windows is mutex locking.
Admittedly, I don't know enough about Windows threading to know
if EnterCriticalSection and friends are compatible with msvcrt's
threading model, and I haven't found anything on MSDN that tells
me what to use instead. I'm not even sure if or where or when
a mutex is needed.
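For what it's worth, a minimal Free Pascal sketch of guarding shared data with the RTL's critical-section routines might look like the following. It assumes FPC, whose System unit provides InitCriticalSection, EnterCriticalSection, LeaveCriticalSection and DoneCriticalSection on all platforms; the counter and the thread class are made up for the example. Whether curlpas itself needs such a lock, and where, is a separate question that this sketch does not answer.

```pascal
program SharedCounterSketch;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX} cthreads, {$ENDIF}
  Classes;

const
  ThreadCount = 4;
  IncrementsPerThread = 10000;

var
  Lock: TRTLCriticalSection;  // protects Counter
  Counter: Integer = 0;

type
  { Hypothetical worker that bumps the shared counter. }
  TCounterThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TCounterThread.Execute;
var
  I: Integer;
begin
  for I := 1 to IncrementsPerThread do
  begin
    EnterCriticalSection(Lock);
    try
      Inc(Counter);  // only one thread at a time gets here
    finally
      LeaveCriticalSection(Lock);
    end;
  end;
end;

var
  Threads: array[0..ThreadCount - 1] of TCounterThread;
  I: Integer;
begin
  InitCriticalSection(Lock);
  try
    for I := 0 to High(Threads) do
      Threads[I] := TCounterThread.Create(False);
    for I := 0 to High(Threads) do
    begin
      Threads[I].WaitFor;
      Threads[I].Free;
    end;
    WriteLn('counter = ', Counter);
  finally
    DoneCriticalSection(Lock);
  end;
end.
```

With the lock in place the final count is always ThreadCount * IncrementsPerThread; without it, increments from different threads could overwrite each other.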
> so wouldn't it be better to have the Threaded property
> in TCurlMulti only?
Applications that only use a single TCurl might also need threading.
> there doesn't really seem to be a need for TCurl to be a
> descendant of TComponent.
The main advantage of using TComponent is that it allows the IDE
to generate event handlers. Other than that, I see no reason for
TCurl to be a component. The main reason for TCurlMulti to inherit
from TCurlBase is to allow it to share the threading code.
Indeed, that does add some overhead - maybe there needs to be
three classes, something like :
"TCurlComponent", "TCurlMultiSession" and "TCurlSingleItem"
- but that certainly doesn't make things any simpler :-)
> having TCurl as component is both overkill and unnecessary
Actually, the FreePascal console version inherits from TObject,
which does save a lot of bloat on FPC/Linux. Since that is my
primary development target, inheriting from TPersistent or
TCollection would add to the overhead, rather than reducing it.
FWIW, if you can find a way to $undef the {$DEFINE CURL_COMPONENT}
and still manage to compile on windows, that eliminates the
TComponent ancestory, and uses the "psuedo-component" instead.
( which is nothing more than a TObject with added "Tag" and "Owner"
properties. )
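Roughly, the idea of switching the ancestor with a compiler symbol can be sketched like this in Free Pascal. TSketchBase is a made-up name, and the actual declarations in curlpas may well look different; only the CURL_COMPONENT symbol is taken from the real code.

```pascal
program AncestorSketch;
{$mode objfpc}{$H+}

{ Remove the next line to build the TObject-based variant instead. }
{$DEFINE CURL_COMPONENT}

uses
  Classes;

type
{$IFDEF CURL_COMPONENT}
  { Full component: the IDE can generate event handlers for it. }
  TSketchBase = class(TComponent)
  end;
{$ELSE}
  { "Pseudo-component": a plain TObject with Tag and Owner added. }
  TSketchBase = class(TObject)
  private
    FOwner: TObject;
    FTag: PtrInt;
  public
    constructor Create(AOwner: TObject);
    property Owner: TObject read FOwner;
    property Tag: PtrInt read FTag write FTag;
  end;

constructor TSketchBase.Create(AOwner: TObject);
begin
  inherited Create;
  FOwner := AOwner;
end;
{$ENDIF}

var
  Obj: TSketchBase;
begin
  // The same calling code compiles against either variant.
  Obj := TSketchBase.Create(nil);
  try
    Obj.Tag := 42;
    WriteLn('ancestor: ', Obj.ClassParent.ClassName, ', tag: ', Obj.Tag);
  finally
    Obj.Free;
  end;
end.
```

As written, with the symbol defined, the program reports TComponent as the ancestor; with the {$DEFINE} line removed it reports TObject.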
> I'd like to keep it a monolithic unit.
That's purely a matter of personal preference, and I prefer
to keep the separate include files, at least in my copy.
> These are my suggestions/questions.
> I'd like to know of your opinion.
Well, you have definitely raised some valid concerns.
> I could send you my work if you're interested
I appreciate your effort on this. If you make changes
that fix bugs, improve performance, reduce bloat, etc.
then I would certainly be interested!