I want to dig much more into this asynchronous memory manager mechanism.
Now I have several questions:
1. Following Thomas's suggestion, each memory manager will have a fence object attached to it, which is delivered by the driver's *move* function. My question is: what is the relationship between the memory manager's fence object and each BO's fence object?
2. What is the difference between the HW engine used for moves and the GPU? If we use the GPU to do the move, can we treat the two cases the same way? I mean, can the BO's synchronization be achieved through the memory manager's fence object?
3. Currently BO synchronization is done by ttm_bo_wait, which is used in evict_bo, swapout_bo, cpu_access, and bo_cleanup. cpu_access and bo_cleanup need ttm_bo_wait in any case, while evict_bo and swapout_bo (assuming the GPU supports this feature) need not call ttm_bo_wait since they use the same engine, i.e. the GPU. Am I right?
Any comments will be appreciated.
On Wed, Dec 16, 2009 at 12:12:13AM +0800, Donnie Fang wrote:
> Hi Thomas,
> I conclude your meaning as below:
> a. When the CPU joins in, it must wait for the sync object to really free the
> device address space.
> b. When the CPU is absent, but there are two independent HW engines relevant
> to the space, the one must wait for the sync object.
> c. Fully pipelined bo move support when only one HW engine is related to the
> space.
> Am I right?
> About *b*, let's say,
> 1) Schedule a copy of the bo from VRAM using the HW DMA engine.
> 2) Put a corresponding sync object on the manager.
> 3) Free the vram region.
> 4) Region gets allocated.
> 5) GPU 2D renders to this region.
> Since the GPU 2D and HW DMA engines are totally independent from each other,
> the sync object still needs to be signaled in this situation.
Some hw has a way to synchronize between different parts of the GPU, so for
instance you can tell the 2D pipeline to wait on the hw dma engine before
doing any work. If the hw doesn't have such synchronization capabilities,
I believe it's better to use only one pipeline of the hw (so forget about
the hw dma engine and do the bo move using the 2d or 3d engine); otherwise
you will have to put the CPU in the loop, and that would mean stalling the
GPU (which will more than likely end up in suboptimal use of the GPU).
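
To make the three cases concrete, here is a rough sketch in C of the decision described above. It is not TTM code: the names (enum engine, struct mgr_fence, choose_sync) are made up for illustration, and it assumes that a single engine executes its commands in order, so work queued behind the move on the same engine needs no extra wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical engine identifiers; these are not TTM or driver names. */
enum engine { ENGINE_DMA, ENGINE_2D, ENGINE_3D, ENGINE_CPU };

/* How the next user of a freed region synchronizes with the pending move. */
enum sync_action {
    SYNC_NONE,     /* nothing to do: fence signaled, or same in-order engine */
    SYNC_HW_WAIT,  /* queue a hardware wait on the other engine's fence */
    SYNC_CPU_WAIT  /* the CPU must block until the fence signals */
};

/* Hypothetical sync object left on the manager by an asynchronous move. */
struct mgr_fence {
    enum engine engine;  /* engine that executes the move */
    unsigned int seqno;  /* sequence number the move will signal */
    bool signaled;       /* true once the move has completed */
};

/*
 * Decide what the next user of the region has to do before touching it.
 * hw_can_sync says whether the hardware lets one engine wait on another
 * without CPU involvement (an assumption about the device, not a TTM flag).
 */
enum sync_action choose_sync(const struct mgr_fence *fence,
                             enum engine next_user, bool hw_can_sync)
{
    assert(fence != NULL);

    if (fence->signaled)
        return SYNC_NONE;       /* move already finished */
    if (next_user == ENGINE_CPU)
        return SYNC_CPU_WAIT;   /* case a: CPU access must wait */
    if (next_user == fence->engine)
        return SYNC_NONE;       /* case c: fully pipelined on one engine */

    /* case b: two independent engines share the region */
    return hw_can_sync ? SYNC_HW_WAIT : SYNC_CPU_WAIT;
}
```

With a pending move on ENGINE_DMA, a 2D user gets SYNC_HW_WAIT when hw_can_sync is true and falls back to SYNC_CPU_WAIT when it is false, which is the stall mentioned above. Doing the move on the 2d or 3d engine itself lands in the SYNC_NONE branch.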