From: Alan G I. <ai...@am...> - 2006-09-22 13:52:32
|
On Fri, 22 Sep 2006, Bill Baxter apparently wrote:
> Ok, here's my best shot at a generalized repmat:

Sorry to always repeat this suggestion when it comes to repmat, but I think the whole approach is wrong. A repmat should be a separate object type, which behaves like the described matrix but has only one copy of the repetitive data.

Cheers,
Alan Isaac |
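A minimal sketch, not from the thread, of the kind of object Alan describes: one stored copy of the data, with the repetition handled at lookup time via modulo arithmetic. The class name and interface are hypothetical.

import numpy as np

class RepeatedView:
    """Hypothetical lazy 'repmat': keeps a single copy of the base array
    and maps indices of the virtual tiled array back onto it."""
    def __init__(self, base, reps):
        self.base = np.asarray(base)
        self.reps = tuple(reps)
        self.shape = tuple(s * r for s, r in zip(self.base.shape, self.reps))

    def __getitem__(self, idx):
        # Only full integer index tuples are handled in this sketch.
        return self.base[tuple(i % s for i, s in zip(idx, self.base.shape))]

v = RepeatedView(np.array([[1, 2], [3, 4]]), (2, 3))
print(v.shape)   # (4, 6), but only the 2x2 base is stored
print(v[3, 5])   # 4 -- reads base[1, 1]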
From: Bill B. <wb...@gm...> - 2006-09-22 13:31:28
|
Ok, here's my best shot at a generalized repmat:

def reparray(A, tup):
    if numpy.isscalar(tup):
        tup = (tup,)
    d = len(tup)
    A = numpy.array(A, copy=False, subok=True, ndmin=d)
    for i, k in enumerate(tup):
        A = numpy.concatenate([A]*k, axis=i)
    return A

Can anyone improve on this? The only way I could think to seriously improve it would be if there were an N-way concatenate() function in the C-API. Each call to concatenate() involves a potentially big allocation and memcpy of the data.

And here's a docstring for it:

"""Repeat an array the number of times given in the integer tuple, tup.

Similar to repmat, but works for arrays of any dimension.
reparray(A,(m,n)) is equivalent to repmat(A,m,n)

If tup has length d, the result will have dimension of max(d, A.ndim).
If tup is scalar it is treated as a 1-tuple.

If A.ndim < d, A is promoted to be d-dimensional by prepending new axes.
So a shape (3,) array is promoted to (1,3) for 2-D replication, or
shape (1,1,3) for 3-D replication. If this is not the desired behavior,
promote A to d-dimensions manually before calling this function.

If d < A.ndim, ndim is effectively promoted to A.ndim by appending 1's to tup.

Examples:
>>> a = array([0,1,2])
>>> reparray(a,2)
array([0, 1, 2, 0, 1, 2])
>>> reparray(a,(1,2))
array([[0, 1, 2, 0, 1, 2]])
>>> reparray(a,(2,2))
array([[0, 1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1, 2]])
>>> reparray(a,(2,1,2))
array([[[0, 1, 2, 0, 1, 2]],
       [[0, 1, 2, 0, 1, 2]]])

See Also: repmat, repeat
""" |
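For what it's worth, numpy.tile covers essentially the same ground as the reparray above: it repeats an array according to a reps tuple and, like reparray, promotes lower-dimensional inputs by prepending axes. A quick check, assuming a NumPy version that ships tile:

import numpy as np

a = np.array([0, 1, 2])
print(np.tile(a, 2))        # [0 1 2 0 1 2]
print(np.tile(a, (2, 2)))   # [[0 1 2 0 1 2]
                            #  [0 1 2 0 1 2]]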
From: Tim H. <tim...@ie...> - 2006-09-22 13:09:05
|
Tim Hochberg wrote: > David M. Cooke wrote: > >> On Thu, 21 Sep 2006 11:34:42 -0700 >> Tim Hochberg <tim...@ie...> wrote: >> >> >> >>> Tim Hochberg wrote: >>> >>> >>>> Robert Kern wrote: >>>> >>>> >>>> >>>>> David M. Cooke wrote: >>>>> >>>>> >>>>> >>>>> >>>>>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Let me offer a third path: the algorithms used for .mean() and .var() >>>>>>> are substandard. There are much better incremental algorithms that >>>>>>> entirely avoid the need to accumulate such large (and therefore >>>>>>> precision-losing) intermediate values. The algorithms look like the >>>>>>> following for 1D arrays in Python: >>>>>>> >>>>>>> def mean(a): >>>>>>> m = a[0] >>>>>>> for i in range(1, len(a)): >>>>>>> m += (a[i] - m) / (i + 1) >>>>>>> return m >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> This isn't really going to be any better than using a simple sum. >>>>>> It'll also be slower (a division per iteration). >>>>>> >>>>>> >>>>>> >>>>>> >>>>> With one exception, every test that I've thrown at it shows that it's >>>>> better for float32. That exception is uniformly spaced arrays, like >>>>> linspace(). >>>>> >>>>> > You do avoid >>>>> > accumulating large sums, but then doing the division a[i]/len(a) and >>>>> > adding that will do the same. >>>>> >>>>> Okay, this is true. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> Now, if you want to avoid losing precision, you want to use a better >>>>>> summation technique, like compensated (or Kahan) summation: >>>>>> >>>>>> def mean(a): >>>>>> s = e = a.dtype.type(0) >>>>>> for i in range(0, len(a)): >>>>>> temp = s >>>>>> y = a[i] + e >>>>>> s = temp + y >>>>>> e = (temp - s) + y >>>>>> return s / len(a) >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>>>> def var(a): >>>>>>> m = a[0] >>>>>>> t = a.dtype.type(0) >>>>>>> for i in range(1, len(a)): >>>>>>> q = a[i] - m >>>>>>> r = q / (i+1) >>>>>>> m += r >>>>>>> t += i * q * r >>>>>>> t /= len(a) >>>>>>> return t >>>>>>> >>>>>>> Alternatively, from Knuth: >>>>>>> >>>>>>> def var_knuth(a): >>>>>>> m = a.dtype.type(0) >>>>>>> variance = a.dtype.type(0) >>>>>>> for i in range(len(a)): >>>>>>> delta = a[i] - m >>>>>>> m += delta / (i+1) >>>>>>> variance += delta * (a[i] - m) >>>>>>> variance /= len(a) >>>>>>> return variance >>>>>>> >>>>>>> >>> I'm going to go ahead and attach a module containing the versions of >>> mean, var, etc that I've been playing with in case someone wants to mess >>> with them. Some were stolen from traffic on this list, for others I >>> grabbed the algorithms from wikipedia or equivalent. >>> >>> >> I looked into this a bit more. I checked float32 (single precision) and >> float64 (double precision), using long doubles (float96) for the "exact" >> results. This is based on your code. Results are compared using >> abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values in the >> range of [-100, 900] >> >> First, the mean. In float32, the Kahan summation in single precision is >> better by about 2 orders of magnitude than simple summation. However, >> accumulating the sum in double precision is better by about 9 orders of >> magnitude than simple summation (7 orders more than Kahan). >> >> In float64, Kahan summation is the way to go, by 2 orders of magnitude. >> >> For the variance, in float32, Knuth's method is *no better* than the two-pass >> method. Tim's code does an implicit conversion of intermediate results to >> float64, which is why he saw a much better result. >> > Doh! 
And I fixed that same problem in the mean implementation earlier > too. I was astounded by how good knuth was doing, but not astounded > enough apparently. > > Does it seem weird to anyone else that in: > numpy_scalar <op> python_scalar > the precision ends up being controlled by the python scalar? I would > expect the numpy_scalar to control the resulting precision just like > numpy arrays do in similar circumstances. Perhaps the egg on my face is > just clouding my vision though. > > >> The two-pass method using >> Kahan summation (again, in single precision), is better by about 2 orders of >> magnitude. There is practically no difference when using a double-precision >> accumulator amongst the techniques: they're all about 9 orders of magnitude >> better than single-precision two-pass. >> >> In float64, Kahan summation is again better than the rest, by about 2 orders >> of magnitude. >> >> I've put my adaptation of Tim's code, and box-and-whisker plots of the >> results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/ >> >> Conclusions: >> >> - If you're going to calculate everything in single precision, use Kahan >> summation. Using it in double-precision also helps. >> - If you can use a double-precision accumulator, it's much better than any of >> the techniques in single-precision only. >> >> - for speed+precision in the variance, either use Kahan summation in single >> precision with the two-pass method, or use double precision with simple >> summation with the two-pass method. Knuth buys you nothing, except slower >> code :-) >> >> > The two pass methods are definitely more accurate. I won't be convinced > on the speed front till I see comparable C implementations slug it out. > That may well mean never in practice. However, I expect that somewhere > around 10,000 items, the cache will overflow and memory bandwidth will > become the bottleneck. At that point the extra operations of Knuth won't > matter as much as making two passes through the array and Knuth will win > on speed. OK, let me take this back. I looked at this speed effect a little more and the effect is smaller than I remember. For example, "a+=a" slows down by about a factor of 2.5 on my box between 10,000 and 100,0000 elements. That's not insignificant, but assuming (naively) that this means that memory effects account to the equivalent of 1.5 add operations, this doesn't come close to closing to gap between Knuth and the two pass approach. The other thing is that while a+=a and similar operations show this behaviour, a.sum() and add.reduce(a) do not, hinting that perhaps it's only writing to memory that is a bottleneck. Or perhaps hinting that my mental model of what's going on here is badly flawed. So, +1 for me on Kahan summation for computing means and two pass with Kahan summation for variances. It would probably be nice to expose the Kahan sum and maybe even the raw_kahan_sum somewhere. I can see use for the latter in summing a bunch of disparate matrices with high precision. I'm on the fence on using the array dtype for the accumulator dtype versus always using at least double precision for the accumulator. The former is easier to explain and is probably faster, but the latter is a lot more accuracy for basically free. It depends on how the relative speeds shake out I suppose. If the speed of using float64 is comparable to that of using float32, we might as well. One thing I'm not on the fence about is the return type: it should always match the input type, or dtype if that is specified. 
Otherwise one gets mysterious, gratuitous upcasting. On the subject of upcasting, I'm still skeptical of the behavior of mixed numpy-scalar and python-scalar ops. Since numpy-scalars are generally the results of indexing operations, not literals, I think that they should behave like arrays for purposes of determining the resulting precision, not like python-scalars. Otherwise situations arise where the rank of the input array determine the kind of output (float32 versus float64 for example) since if the array is 1D indexing into the array yields numpy-scalars which then get promoted when they interact with python-scalars.However if the array is 2D, indexing yields a vector, which is not promoted when interacting with python scalars and the precision remains fixed. -tim > Of course the accuracy is pretty bad at single precision, so > the possible, theoretical speed advantage at large sizes probably > doesn't matter. > > -tim > > >> After 1.0 is out, we should look at doing one of the above. >> >> > > +1 > > > > > ------------------------------------------------------------------------- > Take Surveys. Earn Cash. Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the chance to share your > opinions on IT & business topics through brief surveys -- and earn cash > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV > _______________________________________________ > Numpy-discussion mailing list > Num...@li... > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > m bnb |
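A self-contained sketch, not the code from the thread, of the two ingredients being compared above: a compensated (Kahan) sum for the mean, and a two-pass variance built on it, both accumulating in the array's own dtype so the float32 precision effects can be reproduced.

import numpy as np

def kahan_mean(a):
    """Mean via compensated (Kahan) summation in the array's own dtype."""
    s = e = a.dtype.type(0)
    for x in a:
        y = x + e          # add back the compensation carried so far
        t = s + y
        e = (s - t) + y    # recover what was rounded away in s + y
        s = t
    return s / a.dtype.type(len(a))

def two_pass_var(a):
    """Two-pass (population) variance using the Kahan mean."""
    m = kahan_mean(a)
    d = a - m
    return kahan_mean(d * d)

rng = np.random.RandomState(0)
vals = rng.uniform(-100, 900, 10000).astype(np.float32)
exact = vals.astype(np.float64).var()
print("two-pass Kahan error:", abs(float(two_pass_var(vals)) - exact))
print("ndarray.var() error: ", abs(float(vals.var()) - exact))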
From: Bill B. <wb...@gm...> - 2006-09-22 12:47:35
|
26 weeks, 4 days, 2 hours and 9 minutes ago, Zdeněk Hurák asked why
atleast_3d acts the way it does:
http://article.gmane.org/gmane.comp.python.numeric.general/4382/match=atleast+3d

He doesn't seem to have gotten any answers. And now I'm wondering the
same thing. Anyone have any idea?

--bb |
From: Bill B. <wb...@gm...> - 2006-09-22 12:06:19
|
Do you also have a 64-bit processor? Just checking since you didn't mention it. --bb On 9/22/06, Sve...@ff... <Sve...@ff...> wrote: > > > > > I would like to read files >2Gbyte. From earlier posting I believed it > should be possible with python 2.5. > > I am getting MemoryError when trying to read a file larger than 2Gb. > > I am using python2.5 and numpy1.0rc1. > > Any solution to the problem? > > > > SEH > ------------------------------------------------------------------------- > Take Surveys. Earn Cash. Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the chance to share your > opinions on IT & business topics through brief surveys -- and earn cash > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV > > _______________________________________________ > Numpy-discussion mailing list > Num...@li... > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > |
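A side note, not from the thread: reading a file of more than 2 GB into memory in one piece generally requires a 64-bit build of both Python and NumPy, since a 32-bit process cannot address that much data. If the file is a flat binary array, memory-mapping it is a hedged alternative to reading it outright; the file name and dtype below are placeholders.

import numpy as np

data = np.memmap("big_file.dat", dtype=np.float32, mode="r")
print(data.shape)         # element count inferred from the file size
chunk = data[:1000000]    # slices are paged in lazily rather than read up front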
From: <Sve...@ff...> - 2006-09-22 09:07:06
|
I would like to read files >2Gbyte. From an earlier posting I believed it should be possible with python 2.5. I am getting MemoryError when trying to read a file larger than 2Gb. I am using python2.5 and numpy1.0rc1. Any solution to the problem? SEH |
From: Stefan v. d. W. <st...@su...> - 2006-09-22 08:22:26
|
On Fri, Sep 22, 2006 at 02:17:57AM -0500, Robert Kern wrote:
> Stefan van der Walt wrote:
> > Hi P.,
> >
> > On Thu, Sep 21, 2006 at 07:40:39PM -0400, PGM wrote:
> >
> >> I'm running into the following problem with putmask on take.
> >>
> >>>>> import numpy
> >>>>> x = N.arange(12.)
> >>>>> m = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
> >>>>> i = N.nonzero(m)[0]
> >>>>> w = N.array([-1, -2, -3, -4.])
> >>>>> x.putmask(w,m)
> >>>>> x.take(i)
> >>>>> N.allclose(x.take(i),w)
> >
> > According to the putmask docstring:
> >
> > a.putmask(values, mask) sets a.flat[n] = v[n] for each n where
> > mask.flat[n] is true. v can be scalar.
> >
> > This would mean that 'w' is not of the right length.
>
> There are 4 true values in m and 4 values in w. What's the wrong length?

The way I read the docstring, you use putmask like this:

In [4]: x = N.array([1,2,3,4])
In [5]: x.putmask([4,3,2,1],[1,0,0,1])
In [6]: x
Out[6]: array([4, 2, 3, 1])

> For the sake of clarity:
>
> In [1]: from numpy import *
> In [3]: m = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
> In [4]: i = nonzero(m)[0]
> In [5]: i
> Out[5]: array([ 0,  6,  9, 11])
> In [6]: w = array([-1, -2, -3, -4.])
> In [7]: x = arange(12.)
> In [8]: x.putmask(w, m)
> In [9]: x
> Out[9]:
> array([ -1.,  1.,  2.,  3.,  4.,  5.,  -3.,  7.,  8.,  -2.,  10.,  -4.])
> In [17]: x[array(m, dtype=bool)] = w
> In [18]: x
> Out[18]:
> array([ -1.,  1.,  2.,  3.,  4.,  5.,  -2.,  7.,  8.,  -3.,  10.,  -4.])
>
> Out[9] and Out[18] should have been the same, but elements 6 and 9 are flipped.
> It's pretty clear that this is a bug in .putmask().

Based purely on what I read in the docstring, I would expect the above to do

x[0] = w[0]
x[6] = w[6]
x[9] = w[9]
x[11] = w[11]

Since w is of length 4, you'll probably get indices modulo 4:

w[6] == w[2] == -3
w[9] == w[1] == -2
w[11] == w[3] == -4

Which seems to explain what you are seeing.

Regards
Stéfan |
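A small sketch, not from the thread, that spells out the "modulo" reading Stéfan gives above, next to plain boolean assignment. It reproduces the two outputs Robert reported without relying on putmask itself, so treat it as a description of the reported behaviour rather than a specification.

import numpy as np

x = np.arange(12.)
m = np.array([1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1])
w = np.array([-1, -2, -3, -4.])

# Reading the docstring literally: each True position n receives w[n % len(w)].
modulo = x.copy()
for n in np.nonzero(m)[0]:
    modulo[n] = w[n % len(w)]

# Boolean assignment instead consumes w in order, one value per True position.
by_bool = x.copy()
by_bool[m.astype(bool)] = w

print(modulo)    # [-1.  1.  2.  3.  4.  5. -3.  7.  8. -2. 10. -4.]  (matches Out[9])
print(by_bool)   # [-1.  1.  2.  3.  4.  5. -2.  7.  8. -3. 10. -4.]  (matches Out[18])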
From: Robert K. <rob...@gm...> - 2006-09-22 08:07:07
|
Lionel Roubeyrie wrote: > good news, and thanks for your last comment. However, using nans give some > errors with scipy.stats: > lionel52>t=array([1,2,nan,4]) > > lionel53>stats.nanmean(t) > --------------------------------------------------------------------------- > exceptions.NameError Traceback (most recent > call last) > > /home/lionel/<ipython console> > > /usr/lib/python2.4/site-packages/scipy/stats/stats.py in nanmean(x, axis) > 258 > 259 # XXX: this line is quite clearly wrong > --> 260 n = N-sum(isnan(x),axis) > 261 putmask(x,isnan(x),0) > 262 return stats.mean(x,axis)/factor > > NameError: global name 'N' is not defined It's a bug in nanmean() as the comment immediately preceding it mentions. I don't know who put it in, but I noticed it and couldn't figure out what it intended to do (or didn't have to time to try). <Looks at svn blame and svn log> Ah, it's Travis's fault. So he can fix it. :-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco |
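For reference, a hedged sketch of what the broken line in stats.py presumably intends: count the valid (non-NaN) entries along the axis and divide the NaN-zeroed sum by that count. This is a standalone approximation, not the actual scipy fix.

import numpy as np

def nanmean_sketch(x, axis=None):
    """Mean ignoring NaNs: zero them out, then divide by the non-NaN count."""
    x = np.array(x, dtype=float)       # work on a float copy
    mask = np.isnan(x)
    n = np.sum(~mask, axis=axis)       # number of valid entries
    x[mask] = 0.0
    return np.sum(x, axis=axis) / n

print(nanmean_sketch([1.0, 2.0, np.nan, 4.0]))   # 2.333...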
From: Lionel R. <lro...@li...> - 2006-09-22 07:35:17
|
On Thursday 21 September 2006 19:01, Travis Oliphant wrote:
> Lionel Roubeyrie wrote:
> > find any solution for that. I have tried with arrays of dtype=object, but
> > I have problems when I want to compute min, max, ... with an error like:
> > TypeError: function not supported for these types, and can't coerce
> > safely to supported types.
>
> I just added support for min and max methods of object arrays, by adding
> support for Object arrays to the minimum and maximum functions.
>
> -Travis

Hello Travis,
good news, and thanks for your last comment. However, using nans gives some errors with scipy.stats:

lionel52>t=array([1,2,nan,4])

lionel53>stats.nanmean(t)
---------------------------------------------------------------------------
exceptions.NameError                     Traceback (most recent call last)

/home/lionel/<ipython console>

/usr/lib/python2.4/site-packages/scipy/stats/stats.py in nanmean(x, axis)
    258
    259     # XXX: this line is quite clearly wrong
--> 260     n = N-sum(isnan(x),axis)
    261     putmask(x,isnan(x),0)
    262     return stats.mean(x,axis)/factor

NameError: global name 'N' is not defined

> -------------------------------------------------------------------------
> Take Surveys. Earn Cash. Influence the Future of IT
> Join SourceForge.net's Techsay panel and you'll get the chance to share
> your opinions on IT & business topics through brief surveys -- and earn
> cash
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> _______________________________________________
> Numpy-discussion mailing list
> Num...@li...
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion

--
Lionel Roubeyrie - lro...@li...
LIMAIR
http://www.limair.asso.fr |
From: Robert K. <rob...@gm...> - 2006-09-22 07:18:32
|
Stefan van der Walt wrote: > Hi P., > > On Thu, Sep 21, 2006 at 07:40:39PM -0400, PGM wrote: > >> I'm running into the following problem with putmask on take. >> >>>>> import numpy >>>>> x = N.arange(12.) >>>>> m = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1] >>>>> i = N.nonzero(m)[0] >>>>> w = N.array([-1, -2, -3, -4.]) >>>>> x.putmask(w,m) >>>>> x.take(i) >>>>> N.allclose(x.take(i),w) > > According to the putmask docstring: > > a.putmask(values, mask) sets a.flat[n] = v[n] for each n where > mask.flat[n] is true. v can be scalar. > > This would mean that 'w' is not of the right length. There are 4 true values in m and 4 values in w. What's the wrong length? For the sake of clarity: In [1]: from numpy import * In [3]: m = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1] In [4]: i = nonzero(m)[0] In [5]: i Out[5]: array([ 0, 6, 9, 11]) In [6]: w = array([-1, -2, -3, -4.]) In [7]: x = arange(12.) In [8]: x.putmask(w, m) In [9]: x Out[9]: array([ -1., 1., 2., 3., 4., 5., -3., 7., 8., -2., 10., -4.]) In [17]: x[array(m, dtype=bool)] = w In [18]: x Out[18]: array([ -1., 1., 2., 3., 4., 5., -2., 7., 8., -3., 10., -4.]) Out[9] and Out[18] should have been the same, but elements 6 and 9 are flipped. It's pretty clear that this is a bug in .putmask(). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco |
From: Stefan v. d. W. <st...@su...> - 2006-09-22 07:13:39
|
On Thu, Sep 21, 2006 at 08:35:02PM -0400, P GM wrote:
> Folks,
> I'm running into the following problem with putmask on take.
>
> >>> import numpy
> >>> x = N.arange(12.)
> >>> m = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
> >>> i = N.nonzero (m)[0]
> >>> w = N.array([-1, -2, -3, -4.])
> >>> x.putmask(w,m)

You can also use N.place:

Similar to putmask arr[mask] = vals but the 1D array vals has the same
number of elements as the non-zero values of mask. Inverse of extract.

> >>> x.take(i)
> >>> N.allclose(x.take(i),w)
> False

Regards
Stéfan |
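A quick hedged illustration of the suggestion, using place (and boolean indexing for the check) on the arrays from the original post:

import numpy as np

x = np.arange(12.)
m = np.array([1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1], dtype=bool)
w = np.array([-1, -2, -3, -4.])

np.place(x, m, w)              # writes the values of w into the True positions, in order
print(np.allclose(x[m], w))    # True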
From: Stefan v. d. W. <st...@su...> - 2006-09-22 07:03:26
|
Hi P.,

On Thu, Sep 21, 2006 at 07:40:39PM -0400, PGM wrote:
> I'm running into the following problem with putmask on take.
>
> >>> import numpy
> >>> x = N.arange(12.)
> >>> m = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
> >>> i = N.nonzero(m)[0]
> >>> w = N.array([-1, -2, -3, -4.])
> >>> x.putmask(w,m)
> >>> x.take(i)
> >>> N.allclose(x.take(i),w)

According to the putmask docstring:

a.putmask(values, mask) sets a.flat[n] = v[n] for each n where
mask.flat[n] is true. v can be scalar.

This would mean that 'w' is not of the right length. Would the following do what you want?

import numpy as N
m = N.array([1,0,0,0,0,0,1,0,0,1,0,1],dtype=bool)
w = N.array([-1,-2,-3,-4])
x[m] = w

Regards
Stéfan |
From: Peter B. <Pet...@ug...> - 2006-09-22 04:57:11
|
Cleaning out and rebuilding did the trick! Thanks, Peter

On Thursday 21 September 2006 18:33, num...@li... wrote:
> Subject: Re: [Numpy-discussion] 1.0rc1 doesn't seem to work on AMD64
> <snip>
>
> I don't see this running the latest from svn on AMD64 here. Not sayin' there
> might not be a problem with rc1, I just don't see it with my sources.
>
> Python 2.4.3 (#1, Jun 13 2006, 11:46:22)
> [GCC 4.1.1 20060525 (Red Hat 4.1.1-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>
> >>> import numpy
> >>> numpy.version.version
>
> '1.0.dev3202'
>
> >>> numpy.version.os.uname()
>
> ('Linux', 'tethys', '2.6.17-1.2187_FC5', '#1 SMP Mon Sep 11 01:16:59 EDT
> 2006', 'x86_64')
>
> If you are building on Gentoo maybe you could delete the build directory
> (and maybe the numpy site package) and rebuild.
>
> Chuck. |
From: PGM <pgm...@gm...> - 2006-09-22 04:26:58
|
Folks,
I'm running into the following problem with putmask on take.

>>> import numpy
>>> x = N.arange(12.)
>>> m = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
>>> i = N.nonzero(m)[0]
>>> w = N.array([-1, -2, -3, -4.])
>>> x.putmask(w,m)
>>> x.take(i)
>>> N.allclose(x.take(i),w)
False

I'm wondering if it is intentional, or if it's a problem on my build (1.0b5), or if somebody experienced it as well. Thanks a lot for your input.
P. |
From: Christian K. <ck...@ho...> - 2006-09-22 03:46:02
|
Bill Baxter <wbaxter <at> gmail.com> writes:
> Yep, check the release notes:
> http://www.scipy.org/ReleaseNotes/NumPy_1.0
> search for 'take' on that page to find out what others have changed as well.
> --bb

Ok. Does axis=None then mean that take(a, ind) operates on the flattened array? That is at least what it seems to do. I noticed that the ufunc behaves differently: a.take(ind) and a.take(ind, axis=0) behave the same, so the default argument to axis is 0 rather than None.

Christian |
From: Bill B. <wb...@gm...> - 2006-09-22 02:37:24
|
Yep, check the release notes: http://www.scipy.org/ReleaseNotes/NumPy_1.0 search for 'take' on that page to find out what others have changed as well. --bb On 9/22/06, Christian Kristukat <ck...@ho...> wrote: > Hi, > from 1.0b1 to 1.0rc1 the default behaviour of take seems to have changed when > omitting the axis argument: > > In [13]: a = reshape(arange(12),(3,4)) > > In [14]: take(a,[2,3]) > Out[14]: array([2, 3]) > > In [15]: take(a,[2,3],1) > Out[15]: > array([[ 2, 3], > [ 6, 7], > [10, 11]]) > > Is this intended? > > Christian > > > > ------------------------------------------------------------------------- > Take Surveys. Earn Cash. Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the chance to share your > opinions on IT & business topics through brief surveys -- and earn cash > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV > _______________________________________________ > Numpy-discussion mailing list > Num...@li... > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > |
From: Bill B. <wb...@gm...> - 2006-09-22 02:34:52
|
Is there some way to get the equivalent of repmat() for ndim == 1 and ndim > 2? For ndim == 1, repmat always returns a 2-d array, instead of remaining 1-d. For ndim > 2, repmat just doesn't work. Maybe we could add a 'reparray', with the signature: reparray(A, repeats, axis=None) where repeats is a scalar or a sequence. If 'repeats' is a scalar then the matrix is duplicated along 'axis' that many times. If 'repeats' is a sequence of length N, then A is duplicated repeats[i] times along axis[i]. If axis is None then it is assumed to be (0,1,2...N). Er, that's not quite complete, because it doesn't specify what happens when you reparray an array to a higher dimension, like a 1-d to a 3-d. Like reparray([1,2], (2,2,2)). I guess the axis parameter could have some 'newaxis' entries to accommodate that. --bb |
From: Christian K. <ck...@ho...> - 2006-09-22 02:14:54
|
Hi,
from 1.0b1 to 1.0rc1 the default behaviour of take seems to have changed when omitting the axis argument:

In [13]: a = reshape(arange(12),(3,4))

In [14]: take(a,[2,3])
Out[14]: array([2, 3])

In [15]: take(a,[2,3],1)
Out[15]:
array([[ 2,  3],
       [ 6,  7],
       [10, 11]])

Is this intended?

Christian |
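A hedged restatement of what the rc1 session above corresponds to: with axis=None, take operates on the flattened array, so take(a, ind) is the same as indexing a.ravel().

import numpy as np

a = np.arange(12).reshape(3, 4)
print(np.take(a, [2, 3]))           # [2 3] -- same as a.ravel()[[2, 3]]
print(np.take(a, [2, 3], axis=1))   # columns 2 and 3 of each row, shape (3, 2)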
From: Bill B. <wb...@gm...> - 2006-09-22 01:20:35
|
Apparently numpy.matrixmultiply got moved into numpy.oldnumeric.matrixmultiply at some point (or rather ceased to be imported into the numpy namespace). Is there any list of all such methods that got banished? This would be nice to have in the release notes. --bb |
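For anyone hitting the same rename: numpy.dot covers what Numeric's matrixmultiply did, so old code can usually be ported with a one-line change. A minimal check with plain 2-D arrays:

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(np.dot(a, b))   # [[19 22]
                      #  [43 50]]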
From: P G. <pgm...@gm...> - 2006-09-22 00:35:04
|
Folks,
I'm running into the following problem with putmask on take.

>>> import numpy
>>> x = N.arange(12.)
>>> m = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
>>> i = N.nonzero(m)[0]
>>> w = N.array([-1, -2, -3, -4.])
>>> x.putmask(w,m)
>>> x.take(i)
>>> N.allclose(x.take(i),w)
False

I'm wondering if it is intentional, or if it's a problem on my build (1.0b5), or if somebody experiences that as well. Thanks a lot for your input.
P. |
From: Alan G I. <ai...@am...> - 2006-09-21 23:57:32
|
On Thu, 21 Sep 2006, Charles R Harris apparently wrote: > As to the oddness of \param or @param, here is an example from > Epydoc using Epytext > @type m: number > @param m: The slope of the line. > @type b: number > @param b: The y intercept of the line. Compare to definition list style for consolidated field lists in section 5.1 of http://epydoc.sourceforge.net/fields.html#rst which is much more elegant, IMO. Cheers, Alan Isaac |
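For comparison, roughly what the consolidated-field style Alan points to looks like for the same two parameters, sketched from the epydoc reStructuredText conventions rather than from this thread; the function itself is made up for illustration.

def line(m, b):
    """Return a function evaluating the line y = m*x + b.

    :Parameters:
      m : number
        The slope of the line.
      b : number
        The y intercept of the line.
    """
    return lambda x: m * x + b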
From: Alan G I. <ai...@am...> - 2006-09-21 23:57:32
|
On Thu, 21 Sep 2006, "David M. Cooke" apparently wrote: > Foremost for Python doc strings, I think, is that it look > ok when using pydoc or similar (ipython's ?, for > instance). That means a minimal amount of markup. IMO reStructuredText is very natural for documentation, and it is nicely handled by epydoc. fwiw, Alan Isaac |
From: David G. <dav...@gm...> - 2006-09-21 23:23:57
|
On 9/20/06, Bill Baxter <wb...@gm...> wrote: > > Hey Andrew, point taken, but I think it would be better if someone who > actually knows the full extent of the change made the edit. I know > zeros and ones changed. Did anything else? > > Anyway, I'm surprised the release notes page is publicly editable. I'm glad that it is editable. I hate wikis that are only editable by a select few. Defeats the purpose (or at least does not maximize the capability of a wiki). -- David Grant http://www.davidgrant.ca |
From: Sebastian H. <ha...@ms...> - 2006-09-21 23:22:11
|
On Thursday 21 September 2006 15:28, Tim Hochberg wrote: > David M. Cooke wrote: > > On Thu, 21 Sep 2006 11:34:42 -0700 > > > > Tim Hochberg <tim...@ie...> wrote: > >> Tim Hochberg wrote: > >>> Robert Kern wrote: > >>>> David M. Cooke wrote: > >>>>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote: > >>>>>> Let me offer a third path: the algorithms used for .mean() and > >>>>>> .var() are substandard. There are much better incremental algorithms > >>>>>> that entirely avoid the need to accumulate such large (and therefore > >>>>>> precision-losing) intermediate values. The algorithms look like the > >>>>>> following for 1D arrays in Python: > >>>>>> > >>>>>> def mean(a): > >>>>>> m = a[0] > >>>>>> for i in range(1, len(a)): > >>>>>> m += (a[i] - m) / (i + 1) > >>>>>> return m > >>>>> > >>>>> This isn't really going to be any better than using a simple sum. > >>>>> It'll also be slower (a division per iteration). > >>>> > >>>> With one exception, every test that I've thrown at it shows that it's > >>>> better for float32. That exception is uniformly spaced arrays, like > >>>> linspace(). > >>>> > >>>> > You do avoid > >>>> > accumulating large sums, but then doing the division a[i]/len(a) > >>>> > and adding that will do the same. > >>>> > >>>> Okay, this is true. > >>>> > >>>>> Now, if you want to avoid losing precision, you want to use a better > >>>>> summation technique, like compensated (or Kahan) summation: > >>>>> > >>>>> def mean(a): > >>>>> s = e = a.dtype.type(0) > >>>>> for i in range(0, len(a)): > >>>>> temp = s > >>>>> y = a[i] + e > >>>>> s = temp + y > >>>>> e = (temp - s) + y > >>>>> return s / len(a) > >>>>> > >>>>>> def var(a): > >>>>>> m = a[0] > >>>>>> t = a.dtype.type(0) > >>>>>> for i in range(1, len(a)): > >>>>>> q = a[i] - m > >>>>>> r = q / (i+1) > >>>>>> m += r > >>>>>> t += i * q * r > >>>>>> t /= len(a) > >>>>>> return t > >>>>>> > >>>>>> Alternatively, from Knuth: > >>>>>> > >>>>>> def var_knuth(a): > >>>>>> m = a.dtype.type(0) > >>>>>> variance = a.dtype.type(0) > >>>>>> for i in range(len(a)): > >>>>>> delta = a[i] - m > >>>>>> m += delta / (i+1) > >>>>>> variance += delta * (a[i] - m) > >>>>>> variance /= len(a) > >>>>>> return variance > >> > >> I'm going to go ahead and attach a module containing the versions of > >> mean, var, etc that I've been playing with in case someone wants to mess > >> with them. Some were stolen from traffic on this list, for others I > >> grabbed the algorithms from wikipedia or equivalent. > > > > I looked into this a bit more. I checked float32 (single precision) and > > float64 (double precision), using long doubles (float96) for the "exact" > > results. This is based on your code. Results are compared using > > abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values in > > the range of [-100, 900] > > > > First, the mean. In float32, the Kahan summation in single precision is > > better by about 2 orders of magnitude than simple summation. However, > > accumulating the sum in double precision is better by about 9 orders of > > magnitude than simple summation (7 orders more than Kahan). > > > > In float64, Kahan summation is the way to go, by 2 orders of magnitude. > > > > For the variance, in float32, Knuth's method is *no better* than the > > two-pass method. Tim's code does an implicit conversion of intermediate > > results to float64, which is why he saw a much better result. > > Doh! And I fixed that same problem in the mean implementation earlier > too. 
I was astounded by how good knuth was doing, but not astounded > enough apparently. > > Does it seem weird to anyone else that in: > numpy_scalar <op> python_scalar > the precision ends up being controlled by the python scalar? I would > expect the numpy_scalar to control the resulting precision just like > numpy arrays do in similar circumstances. Perhaps the egg on my face is > just clouding my vision though. > > > The two-pass method using > > Kahan summation (again, in single precision), is better by about 2 orders > > of magnitude. There is practically no difference when using a > > double-precision accumulator amongst the techniques: they're all about 9 > > orders of magnitude better than single-precision two-pass. > > > > In float64, Kahan summation is again better than the rest, by about 2 > > orders of magnitude. > > > > I've put my adaptation of Tim's code, and box-and-whisker plots of the > > results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/ > > > > Conclusions: > > > > - If you're going to calculate everything in single precision, use Kahan > > summation. Using it in double-precision also helps. > > - If you can use a double-precision accumulator, it's much better than > > any of the techniques in single-precision only. > > > > - for speed+precision in the variance, either use Kahan summation in > > single precision with the two-pass method, or use double precision with > > simple summation with the two-pass method. Knuth buys you nothing, except > > slower code :-) > > The two pass methods are definitely more accurate. I won't be convinced > on the speed front till I see comparable C implementations slug it out. > That may well mean never in practice. However, I expect that somewhere > around 10,000 items, the cache will overflow and memory bandwidth will > become the bottleneck. At that point the extra operations of Knuth won't > matter as much as making two passes through the array and Knuth will win > on speed. Of course the accuracy is pretty bad at single precision, so > the possible, theoretical speed advantage at large sizes probably > doesn't matter. > > -tim > > > After 1.0 is out, we should look at doing one of the above. > > +1 > Hi, I don't understand much of these "fancy algorithms", but does the above mean that after 1.0 is released the float32 functions will catch up in terms of precision precision ("to some extend") with using dtype=float64 ? - Sebastian Haase |