From: Jonathan C. R. <jon...@ba...> - 2004-04-20 07:46:19
|
Hi Bernard et al.,

As mentioned before on the mailing list, I am working on a play queue/multi song buffer, and incorporating mp3 playback into podzilla. In the process, I have run into a problem with memory management. After coding and testing on the host (works great with libmad!) I cross-compiled and installed the app on my iPod. Upon initialisation I allocate 16M for the buffer, but this fails on the iPod. (Okay, I probably need to allocate several blocks instead of one chunk, because there is definitely more than 20M free, albeit not contiguous.) I then wrote a simple-minded programme to probe the maximum amount of contiguous memory:

int main()
{
    int bytes = 1;
    char* buffer;

    while (buffer = (char*) malloc(bytes * sizeof(char))) {
        free(buffer);
        printf ("success with %d bytes\n", bytes);
        bytes <<=1;
    }

    return 1;
}

The app hangs after 64-512 bytes, with endless errors scrolling by on the lcd display, too fast to read. I am invoking the app via telnet with the 2.4.24-ipod0 kernel. I have not tried it standalone. As there are examples of malloc in existing code, I can only presume that there is a problem with free(), or that I am missing the point entirely... (I haven't coded plain C in a couple of years).

Anybody else seen this behaviour? If not, any pointers as to how I can debug it? (Slowing down the console messages would be a start!)

A quick update on my progress: I have built a standalone libmad app with console input (play/pause/quit) and with a modest (1 song) buffer (using mmap). This works fine on the iPod, except of course that it is not 100% real time... (a couple of tweaks in the configuration of libmad did improve matters drastically though! I still have hope...). Otherwise, I have built a circular buffer for multiple-track caching into Podzilla. Playback is incorporated into the main loop by decoding a single frame in between each successive poll of the event loop.

As soon as I have sorted out the allocation problems and tidied up the code a little, I will post my patches.

Thanks,
Jonathan.

P.S. since I upgraded to the ipod0 kernel, I have not been able to ftp -p 192.10.1.1 anymore. Anyone know what's going on there? |
From: Matthew J. S. <ge...@do...> - 2004-04-20 08:26:56
|
Forgive me if I'm mistaken about what you're trying to do, but would it not be possible simply to get the songs chosen for play, and attempt to allocate memory per song, and just retain a list of where these songs are in memory and then point to those for playing? It seems sort of strange that you're trying to re-implement something that's technically built into the memory manager. If there is no contiguous block large enough to hold the mp3 then you could attempt to split, and store half in one part half in another, and continue to split until it does fit... just an idea. |
From: Benjamin H. <be...@ke...> - 2004-04-21 00:23:35
|
On Wed, 2004-04-21 at 07:36, Matthew J. Sahagian wrote:
> Forgive me if I'm mistaken about what you're trying to do, but would it not
> be possible simply to get the songs chosen for play, and attempt to allocate
> memory per song, and just retain a list of where these songs are in memory
> and then point to those for playing? It seems sort of strange that you're
> trying to re-implement something that's technically built into the memory
> manager. If there is no contiguous block large enough to hold the mp3 then
> you could attempt to split, and store half in one part half in another,
> and continue to split until it does fit... just an idea.

It's a lot easier imho to just have a ring buffer of MP3 data chunks. One chunk can be a complete song, or a partial one (buffer too full to store a complete song). The iPod can typically "wake up" when it's about 20 seconds from buffer exhaustion to refill with the next songs in the list (provided the refilling can be done without interrupting the playback; dunno if the dual CPU architecture allows you that. I doubt your IDE can do DMA, can it?)

-- Benjamin Herrenschmidt <be...@ke...> |
From: Jonathan C. R. <jon...@ba...> - 2004-04-20 10:26:47
|
Hi,

I think you're overlooking something here... What if the track exceeds physical memory in size? (I don't know whether you've tried playing a long movement of a symphony or an audio book with the patched mp3example, but believe me, it won't work ;-) ).

It's not like I'm trying to re-implement the functionality of the memory manager anyway - I'm just trying to use it. My problem was encountered when invoking free(). (In the worst case, there is actually something physically wrong with my iPod ;). )

The way I see it, we can copy Apple's caching mechanism - have 'up to 4 songs cached' (i.e. around 16M), and even improve on it. The improvement would be in caching the first couple of frames of a larger number of songs - say, the next 12 songs, and the last 12. The reasoning is that, in my experience, one tends to skip tracks very frequently when playing large playlists. When you skip forward two or three tracks you get a cache miss, and the unit hangs while the disk spins up. By caching 'heads' of the songs, we'll have a much more responsive play queue.

Cheers,
Jonathan. |
From: Matthew J. S. <ge...@do...> - 2004-04-20 18:16:32
|
Fair enough, but the search for the contiguous memory still seems like reimplementation. Even if you want a couple of frames of 24 songs, you could easily split off those frames and then use their sizes for your malloc call. Either way your ultimate goal is just to keep track of them with an index, so to speak. Then once someone flips to that song it rips those initial frames from the memory location and starts to read the rest off the disk. Either way, my statement about reimplementing had to do with finding contiguous blocks of memory, which seems, for the most part, unnecessary, assuming you have a good enough method for tracking where the initial frames are loaded.

One idea I had originally was a RAMDISK, which would also seem to solve your problem rather easily (at least with whole songs). I was looking forward to a RAMDISK for various parts of podzilla -- including graphics that get displayed as part of the software. Maybe the RAMDISK could simply be pushed forward for what you're going to use it for... not sure. |
From: Benjamin H. <be...@ke...> - 2004-04-21 00:24:33
|
On Wed, 2004-04-21 at 17:26, Matthew J. Sahagian wrote:
> Fair enough, but the search for the contiguous memory still seems like
> reimplementation. Even if you want a couple of frames of 24 songs, you could
> easily split off those frames and then use their sizes for your malloc call.
> Either way your ultimate goal is just to keep track of them with an index,
> so to speak. Then once someone flips to that song it rips those initial
> frames from the memory location and starts to read the rest off the disk.
> Either way, my statement about reimplementing had to do with finding
> contiguous blocks of memory, which seems, for the most part, unnecessary,
> assuming you have a good enough method for tracking where the initial
> frames are loaded. One idea I had originally was a RAMDISK, which would also
> seem to solve your problem rather easily (at least with whole songs). I was
> looking forward to a RAMDISK for various parts of podzilla -- including
> graphics that get displayed as part of the software. Maybe the RAMDISK could
> simply be pushed forward for what you're going to use it for... not sure.

I still wonder why you bother... use a ring buffer :)

Ben. |
From: Bernard L. <le...@bo...> - 2004-04-20 08:43:13
|
Hi Jonathan,

sounds like you're making some really good progress there! It would be really neat if we could get libmad decoding at full speed. The big win there will probably be if we can move some of the data that is accessed frequently into the on-chip ram. The only question is which data to move. Currently the on-chip ram is used as a circular buffer for buffering data that is to be written to the DAC (the CPU is a producer and the COP a consumer on this buffer). At the moment it is using the entire 96K for that purpose so if that were reduced then some could be used by the user process. (E.g. just create a pointer to 0x40000000 somewhere ;)).

Regarding the big malloc problem, that is a little strange. The patches in the mp3example player malloc up a chunk large enough for the entire file so you can definitely malloc more than 512 bytes. The uclinux 2.4 has a special slab allocator (non power of 2 version) which we use to allow big mallocs. (So far 2.6 doesn't have one so I'm not sure what the plan there is.)

In /proc are some files which can give you some status on the memory subsystem. I can't remember their exact names but they should be fairly obvious.

As for the -ipod0 and ftp problems, the default ip address changed to 192.168.222.2, maybe that is causing you some problems?

Anyhow looking forward to seeing some of your patches ;)

cheers
bern. |
From: Jonathan C. R. <jon...@ba...> - 2004-04-20 11:36:32
|
Hi again, Bernard,

On Tue, 2004-04-20 at 10:43, Bernard Leach wrote:
> sounds like you're making some really good progress there! It would be really
> neat if we could get libmad decoding at full speed. The big win there will
> probably be if we can move some of the data that is accessed frequently into
> the on-chip ram. The only question is which data to move. Currently the
> on-chip ram is used as a circular buffer for buffering data that is to be
> written to the DAC (the CPU is a producer and the COP a consumer on this
> buffer). At the moment it is using the entire 96K for that purpose so if that
> were reduced then some could be used by the user process. (E.g. just create a
> pointer to 0x40000000 somewhere ;)).

Whoa! The entire decoder should work directly in that memory! Isn't ~16-32K enough for the DAC buffer? Should I just reduce DMA_BASE (in audio.c) and then 'allocate' a decoder circular buffer somewhere after that? I reckon having the decoder struct and the bit stream buffer in the on-chip ram will be good enough, but perhaps we'll have to get the Huffman tables and some of the code on there too... We'll see.

Doing it in user space definitely makes sense for now - but when we've gotten that working, I reckon it would be smart to actually move the decoder to the kernel. We could use the AFMT_MPEG for the SNDCTL_DSP_SETFMT call, looks quite clean to me... ( Why isn't there an AFMT_OGG value defined in <linux/soundcard.h> ? ;) ) And then we could migrate it to the COP so it could do something more than feeding the DAC....

Before I start work, which kernel/cvs-branch would you rather I worked in? I haven't actually migrated to 2.6 yet; should I?

Next question: how is the work on a mixer device coming? It would be neat to have a prototype 'now playing' window with volume control within the near future. Or is there an ioctl for accessing d2a_set_vol directly?

> Regarding the big malloc problem that is a little strange. The patches in the
> mp3example player malloc up a chunk large enough for the entire file so you
> can definitely malloc more than 512 bytes. The uclinux 2.4 has a special slab
> allocator (non power of 2 version) which we use to allow big mallocs. (So far
> 2.6 doesn't have one so I'm not sure what the plan there is).

Yeah, I saw the malloc() in the patch... I also didn't have any problems with an mmap of ~7M in my prototype libmad code. My problem seems to simply be related to successive malloc/free calls. It would be good to know if anybody can reproduce this with the code I posted earlier.

Actually I'm not so sure I was using the 2.4.24-ipod kernel any more - I remember installing it, hearing that the mp3example playback was worse than in the rc2 kernel and then rolling back (I probably should have let you know this when you did the release...) I'll try reproducing the memory problem with various permutations of stand-alone, telnet and the various kernels if I have the time, and tell you how I get on.

> In /proc are some files which can give you some status on the memory
> subsystem. I can't remember their exact names but they should be fairly obvious.

Anything other than /proc/meminfo I should look for? ;)

> As for the -ipod0 and ftp problems the default ip address changed to
> 192.168.222.2, maybe that is causing you some problems?

Well, I can connect, run ls/cd etc., so no, I don't think the IP address is the problem. All works fine except for the actual transfer. Again, this is actually with the rc2 kernel, not 2.4.24-ipod.

Catch you soon,
Jonathan. |
From: Bernard L. <le...@bo...> - 2004-04-20 12:01:09
|
"Jonathan C. Ross" <jon...@ba...> said:
> Hi again, Bernard,
>
> Whoa! The entire decoder should work directly in that memory! Isn't
> ~16-32K enough for the DAC buffer? Should I just reduce DMA_BASE (in
> audio.c) and then 'allocate' a decoder circular buffer somewhere after
> that? I reckon having the decoder struct and the bit stream buffer in
> the on-chip ram will be good enough, but perhaps we'll have to get the
> Huffman tables and some of the code on there too... We'll see.

I'm not sure of what good values would be. Some profiling/experimentation is needed. One reason the DAC buffer is even there in the first place is that the COP/CPU have separate data caches (non-snooped) so they can see inconsistent views of SDRAM; the chip ram isn't cached so there is no problem there. This could be fixed by cache flushing (I have the basics of that sorted but not implemented) and then that buffer could be moved to sdram. Apple's firmware definitely sticks code in there (that looks like audio stuff) but I'm not sure. If you define a special linker segment for 0x40000000 and then relocate the decoder there that might be a good speedup.

What we really need though is some kind of profiling on memory usage for the decoder. Perhaps something like the Armulator does this (or could be modified). If we knew a little more about the memory usage pattern then it would be easier to optimise usage of the fast ram.

> Doing it in user space definitely makes sense for now - but when we've
> gotten that working, I reckon it would be smart to actually move the
> decoder to the kernel. We could use the AFMT_MPEG for the
> SNDCTL_DSP_SETFMT call, looks quite clean to me... ( Why isn't there an
> AFMT_OGG value defined in <linux/soundcard.h> ? ;) ) And then we could
> migrate it to the COP so it could do something more than feeding the
> DAC....

Sure, that's the long term plan.

> Before I start work, which kernel/cvs-branch would you rather I worked in?
> I haven't actually migrated to 2.6 yet; should I?

2.4. 2.6 is great except that it corrupts any disk you mount read-write ;) Most of the drivers are 1-1 copies so far and will stay that way until the IDE is sorted (at which point 2.6 will become my dev platform).

> Next question: how is the work on a mixer device coming? It would be
> neat to have a prototype 'now playing' window with volume control within
> the near future. Or is there an ioctl for accessing d2a_set_vol
> directly?

Last night I was working on audio recording; once that is working I'll add mixer support. It should be pretty easy as the only part missing is the Linux device plumbing that connects userspace to d2a_set_vol.

> > Regarding the big malloc problem that is a little strange. The patches in the
> > mp3example player malloc up a chunk large enough for the entire file so you
> > can definitely malloc more than 512 bytes. The uclinux 2.4 has a special slab
> > allocator (non power of 2 version) which we use to allow big mallocs. (So far
> > 2.6 doesn't have one so I'm not sure what the plan there is).
>
> Yeah, I saw the malloc() in the patch... I also didn't have any
> problems with an mmap of ~7M in my prototype libmad code. My problem
> seems to simply be related to successive malloc/free calls. It would be
> good to know if anybody can reproduce this with the code I posted
> earlier.

Ok, that is strange. I'll give it a go next chance I get. Are you using a particular version of uclibc? Or just the one in the toolchain? Which toolchain version? It could be worth checking the uclibc and uclinux mailing lists.

> Actually I'm not so sure I was using the 2.4.24-ipod kernel any more - I
> remember installing it, hearing that the mp3example playback was worse
> than in the rc2 kernel and then rolling back (I probably should have let
> you know this when you did the release...) I'll try reproducing the
> memory problem with various permutations of stand-alone, telnet and the
> various kernels if I have the time, and tell you how I get on.

What was the mp3example playback problem? I didn't realise there was any difference between rc2 and the final version there... It's possible that I upgraded my toolchain for the final version but I'm fairly sure I didn't. Anyhow I realise what a PITA that type of testing is but any info you come up with would be great.

> > In /proc are some files which can give you some status on the memory
> > subsystem. I can't remember their exact names but they should be fairly obvious.
>
> anything other than /proc/meminfo I should look for? ;)

/proc/slabinfo and there is a new uclinux one that shows the memory map.

> > As for the -ipod0 and ftp problems the default ip address changed to
> > 192.168.222.2, maybe that is causing you some problems?
>
> Well, as I can connect, run ls/cd etc.. No, I don't think the IP
> address is the problem. All works fine except for the actual transfer.
> Again, this is actually with the rc2 kernel, not 2.4.24-ipod.

Oh, strange again. I normally use wget from the iPod to fetch stuff from the PC with http. I'll try with ftp...

cheers,
bern. |
From: Benjamin H. <be...@ke...> - 2004-04-21 00:13:12
|
On Tue, 2004-04-20 at 22:00, Bernard Leach wrote: > What we really need though is some kind of profiling on memory usage for the > decoder. Perhaps something like the Armulator does this (or could be > modified). If we knew a little more about the memory usage pattern then it > would be easier to optimise usage of the fast ram. Also, the ARM ABI isn't very stack intensive (at least not as much as m68k is) but it may still be a good optimisation to put the decoder stack in the fastest memory type you have. |
From: Jonathan C. R. <jon...@ba...> - 2004-04-25 20:19:59
|
On Wed, 2004-04-21 at 02:12, Benjamin Herrenschmidt wrote: > On Tue, 2004-04-20 at 22:00, Bernard Leach wrote: > > > What we really need though is some kind of profiling on memory usage for the > > decoder. Perhaps something like the Armulator does this (or could be > > modified). If we knew a little more about the memory usage pattern then it > > would be easier to optimise usage of the fast ram. > > Also, the ARM ABI isn't very stack intensive (at least not as much as > m68k is) but it may still be a good optimisation to put the decoder > stack in the fastest memory type you have. > Hi Benjamin, I've been looking at the libmad code, and an awful lot of the calculation seems to take place on the stack... Your idea seems very good to me. How does one go about putting the stack somewhere else? I tried hacking an ld script but I must confess I'm a bit stuck... Either I don't understand the scripts at all, or arm-elf-ld doesn't like being told what to do. The same goes for moving some code: after profiling on an i386, I tried using __attribute__ to put some code on the on-board memory, to no avail. On the up-side, I've managed to reduce the audio buffer in my kernel to 16k, and have noticed a bit of speed-up from 'allocating' the libmad data structures after 0x4000800C. I guess playback is around 95% speed now. Cheers, Jonathan |
From: Benjamin H. <be...@ke...> - 2004-04-26 01:13:35
|
> I've been looking at the libmad code, and an awful lot of the
> calculation seems to take place on the stack... Your idea seems very
> good to me. How does one go about putting the stack somewhere else? I
> tried hacking an ld script but I must confess I'm a bit stuck... Either
> I don't understand the scripts at all, or arm-elf-ld doesn't like being
> told what to do.
>
> The same goes for moving some code: after profiling on an i386, I tried
> using __attribute__ to put some code on the on-board memory, to no
> avail.
>
> On the up-side, I've managed to reduce the audio buffer in my kernel
> to 16k, and have noticed a bit of speed-up from 'allocating' the
> libmad data structures after 0x4000800C. I guess playback is around 95%
> speed now.

Well, there are 2 things. One is to allocate the memory in the SRAM, and the other is to call routines with the stack there. Normally, Linux has one stack per process/thread, so you need to choose carefully where/when to switch stack there. Then, what I'd do is to create a small asm trampoline that switches the stack, stores the old stack pointer on the new stack (for the exit path), and jumps to the routine that has to be executed on the internal stack. You could do that on a thread entry point for a whole thread to use that SRAM stack, for example.

Ben |
From: Jonathan C. R. <jon...@ba...> - 2004-04-27 15:38:42
|
On Apr 26, 2004, at 3:09 AM, Benjamin Herrenschmidt wrote: > > Well, there are 2 things. One is to allocate the memory in the > SRAM, and the other is to call routines with the stack there. Normally, > Linux has one stack per process/thread, so you need to choose carefully > where/when to switch stack there. Then, what I'd do is to create a > small asm trampoline that switches the stack, stores the old stack > pointer > on the new stack (for the exit path), and jumps to the routine that has > to be executed on the internal stack. You could do that on a thread > entry point for a whole thread to use that SRAM stack for example. > > Ben > Hi, I've given myself a crash-course in ARM assembly and performed a 'stack trampoline' pretty much the way you suggest. I _think_ there is some performance improvement, but it's very difficult to tell without a proper profiler. In summary, I have performed the following optimisations in libmad: 1. I have moved all dynamically allocated structures to the SRAM; 2. I have moved some static data from the sampling routines too (the 'D' structure from D.dat); 3. I have moved the stack to 0x40017ffc. Guessing from audio quality and tempi, this has given me a 10-20% performance boost. I will attempt two more things now. First, I'll try _moving_ the sampling code (judging by profiling on an i386, this is the bottleneck) to the SRAM, along with some of the Huffman and layer 3 static data. If that fails, I'll try a hack where I write directly to the COP's audio buffer (instead of copying first to slow memory, converting to little-endian DSP, then writing back to SRAM). (We'll be doing this in the kernel version anyway.) As a last resort I will turn to hand-coding the dct/sample code - looking at the assembly output on sample.c, there is a lot of work to be done there... Have you got any pointers for relocating code? As I said previously, arm-elf-ld didn't like me trying to tell it where to put stuff. Should I try a kernel-style relocation? How would you guys do it? As Bern pointed out, we really need profiling... has anybody gotten -pg to work with the arm-elf toolchain? Cheers, Jonathan. |
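Moving dynamically allocated structures into a fixed memory window, as in the first item of the list above, is often done with a simple bump allocator. The sketch below is not the code from the patch; it is a minimal host-side illustration in which a static array stands in for the SRAM window, and the names `fast_ram_words`, `fast_alloc` and `FAST_RAM_SIZE` (and the size itself) are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host stand-in for the SRAM window. On the device the region would be a
   fixed address range; the size here is arbitrary (assumption). */
#define FAST_RAM_SIZE 4096
static uint64_t fast_ram_words[FAST_RAM_SIZE / 8];  /* 8-byte aligned storage */
static size_t fast_ram_used;                        /* bytes handed out so far */

/* Hand out the next `size` bytes of the region, 8-byte aligned.
   Returns NULL when the region is exhausted; nothing is ever freed. */
void *fast_alloc(size_t size)
{
    unsigned char *base = (unsigned char *)fast_ram_words;
    size_t start = (fast_ram_used + 7u) & ~(size_t)7u;  /* round up to 8 */

    assert(fast_ram_used <= FAST_RAM_SIZE);
    if (size > FAST_RAM_SIZE || start > FAST_RAM_SIZE - size)
        return NULL;
    fast_ram_used = start + size;
    return base + start;
}
```

A decoder would call `fast_alloc()` in place of `malloc()` for its few long-lived structures and fall back to `malloc()` when it returns NULL. Since nothing is freed, this only suits allocations that live for the whole playback session.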
From: Benjamin H. <be...@ke...> - 2004-04-27 22:51:39
|
On Wed, 2004-04-28 at 01:38, Jonathan C. Ross wrote: > Have you got any pointers for relocating code? As I said previously, > arm-elf-ld didn't like me trying to tell it where to put stuff. Should > I try > a kernel-style relocation? How would you guys do it? > > As Bern pointed out, we really need profiling... has anybody gotten > -pg to work with the arm-elf toolchain? For moving things around, I usually put them in separate ELF sections (I define __sram and __sramdata macros, similar to the kernel equivalent section macros) and then use the ld script to link those sections at different addresses than where they are loaded. Then, at boot, you need a bit of code that copies those to their final address. Ben. |
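The C side of the approach described above might look roughly like the following. This is a sketch under assumptions, not the macros actually used: the section names `.sram.text` and `.sram.data`, the tagged `synth_window` and `synth_sum`, and the helper `copy_section` are all made up for illustration, and the macros fall back to nothing on non-ELF hosts so the file still compiles there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Section macros in the spirit of the ones named above. The section names
   are invented for this sketch; non-ELF hosts get empty macros. */
#if defined(__GNUC__) && defined(__ELF__)
#define __sram     __attribute__((section(".sram.text")))
#define __sramdata __attribute__((section(".sram.data")))
#else
#define __sram
#define __sramdata
#endif

/* Hypothetical hot-path data and code tagged for fast memory. */
__sramdata int synth_window[4] = { 1, 2, 3, 4 };

__sram int synth_sum(void)
{
    int i, sum = 0;
    for (i = 0; i < 4; i++)
        sum += synth_window[i];
    return sum;
}

/* Boot-time copy: move a section image from where it was loaded to where it
   was linked to run. On the target, the three arguments would come from
   symbols defined in the ld script (their names are up to the script). */
void copy_section(void *run_addr, const void *load_addr, size_t len)
{
    assert(run_addr != NULL && load_addr != NULL);
    memcpy(run_addr, load_addr, len);
}
```

On the linker side, GNU ld lets an output section be given a load address different from its run address with the `AT()` keyword. The script would also export start, end and load-address symbols for the startup code to pass to the copy routine, which must run before any tagged function is called.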