mozo is a mozo

Monday, May 4, 2015

Go Ancestry

https://news.ycombinator.com/item?id=8715971

The revision history of the Go language begins with a number of interesting commits:

commit 18c5b488a3b2e218c0e0cf2a7d4820d9da93a554
Author: Robert Griesemer 
Date:   Sun Mar 2 20:47:34 2008 -0800

    Go spec starting point.

    SVN=111041

commit d82b11e4a46307f1f1415024f33263e819c222b8
Author: Brian Kernighan 
Date:   Fri Apr 1 02:03:04 1988 -0500

    last-minute fix: convert to ANSI C

    R=dmr
    DELTA=3  (2 added, 0 deleted, 1 changed)

commit 0744ac969119db8a0ad3253951d375eb77cfce9e
Author: Brian Kernighan 
Date:   Fri Apr 1 02:02:04 1988 -0500

    convert to Draft-Proposed ANSI C

    R=dmr
    DELTA=5  (2 added, 0 deleted, 3 changed)

commit 0bb0b61d6a85b2a1a33dcbc418089656f2754d32
Author: Brian Kernighan 
Date:   Sun Jan 20 01:02:03 1974 -0400

    convert to C

    R=dmr
    DELTA=6  (0 added, 3 deleted, 3 changed)

commit 7d7c6a97f815e9279d08cfaea7d5efb5e90695a8
Author: Brian Kernighan 
Date:   Tue Jul 18 19:05:45 1972 -0500

    hello, world

    R=ken

The very first commit consists of a single file named hello.b, which seems to be a "hello world" program in the B programming language. According to the "Hello, World! program" article on Wikipedia, it was actually written by Kernighan for A Tutorial Introduction to the Language B in 1972, though I doubt the commit date reflects the real date when the document was written and/or published. (It would be quite a surprise if it did!)

The source code is as follows (as you might expect, no syntax highlighting is available for B):

main( ) {
 extrn a, b, c;
 putchar(a); putchar(b); putchar(c); putchar('!*n');
}
a 'hell';
b 'o, w';
c 'orld';

The commit is followed by the C version, which appeared in Kernighan's 1974 Programming in C: A Tutorial. "dmr" is the username of Dennis Ritchie, the inventor of C.

main() {
       printf("hello, world");
}
1988 is the year the second edition of K&R was published, while C was in the final stages of ANSI standardization (the standard was ratified in 1989). The change in "convert to Draft-Proposed ANSI C" reflects the updates in the draft standard, namely prototype declarations and standard headers.
#include <stdio.h>

main()
{
       printf("hello, world");
}
The "last-minute fix" seems to reflect the (presumed) addition of 2.1.2.2 Hosted Environment:
A return from the initial call to the main function is equivalent to calling the exit function with the value returned by the main function as its argument. If the main function executes a return that specifies no value, the termination status returned to the host environment is undefined.
The code is as follows:
#include <stdio.h>

int
main(void)
{
 printf("hello, world\n");
 return 0;
}
And those "reconstructed" changes are eventually followed by this: the initial language specification of Go. Doesn't it read like an epic?

Sunday, June 1, 2014

Synchronization patterns in Go

When I write code for ik, I often find it necessary to synchronize access to some resource from a number of goroutines. While using channels almost looks like the way to "go", simply applying mutexes also has some clear benefits. I think it is a good time to do a comparison of the two, so here it goes:

Good-old mutexes

Procedures that manipulate the shared resource are surrounded by mutex calls. The control flow is easy to follow because the call graph from the initiator of an action to the critical sections is pretty much straightforward, and function-level testing is just as easy: everything is serialized in a single flow, and all you need to do is call the target function.

But it generally turns out to be a nightmare to find the cause of a deadlock when three or more locks are involved.

Daemon pattern

Access to the resource is restricted to just one goroutine, and every initiator of an action is required to send a message to that goroutine to have it manipulate the resource. The result of the manipulation is retrieved through a channel too. I call this the "daemon pattern", as the manipulating goroutine behaves analogously to Unix's service daemons. I don't know the exact name of the pattern, but there should be a good name for it.

One notable benefit over mutexes is that you can identify what causes a deadlock much more easily, because synchronization happens only when communication takes place. In addition, it enables you to implement a timeout with another channel without much difficulty.

But you end up managing the lifetime of the daemon goroutine, as it needs to be restarted when it aborts for some reason and stopped when it is no longer necessary. Besides, it is harder to follow the control flow because of the separated call graphs and the extra code for communication, so function-level testing becomes more complicated.

Summary

  • Mutexes

    • Pros

      • Straightforward, top-down call graph
      • Easily testable in a single thread
    • Cons

      • Terribly hard to find the deadlock site
  • Daemon pattern

    • Pros

      • Comparatively easier to find deadlocks because synchronization points are far more obvious than with mutexes
      • Timeouts are easy to implement with an extra channel
    • Cons

      • Need to manage the lifetime of daemon goroutines
      • Testing is a little more difficult
P.S. The Go wiki has a dedicated page on this topic: Mutex or Channel

Monday, May 18, 2009

Performance consideration of numpy.array and tuples

While I was writing a Python module in C that calculates the Euclidean distance between two spatial points, I came across a question: numpy.array or tuples, which is faster in this particular case? The answer is not obvious, because numpy.array() is slower at creating a new instance, while it gives access to the raw pointer to the underlying data, which should enable faster element retrieval.

So I wrote several versions of a function that calculates a dot product in C, and benchmarked them against each other.

Here is the code I used in the comparison:

mytestmodule.c:

#include <Python.h>
#include <numpy/arrayobject.h>

PyObject* dotproductSeq(PyObject* self, PyObject* args)
{
    PyObject *arg1;
    PyObject *arg2;
    int i;
    double result = 0;

    if (!PyArg_ParseTuple(args, "OO", &arg1, &arg2)) {
        return NULL;
    }

    if (!PySequence_Check(arg1)) {
        PyErr_SetString(PyExc_TypeError, "argument 1 must be a sequence");
        return NULL;
    }

    if (!PySequence_Check(arg2)) {
        PyErr_SetString(PyExc_TypeError, "argument 2 must be a sequence");
        return NULL;
    }

    Py_ssize_t sz = PySequence_Size(arg1);

    if (sz != PySequence_Size(arg2)) {
        PyErr_SetString(PyExc_ValueError, "argument 1 and 2 must be sequences of the same length");
        return NULL;
    }

    for (i = 0; i < sz; ++i) {
        /* note: PySequence_GetItem returns a new reference, which
           this micro-benchmark leaks for simplicity */
        result +=
            PyFloat_AsDouble(PySequence_GetItem(arg1, i)) *
            PyFloat_AsDouble(PySequence_GetItem(arg2, i));
    }

    return PyFloat_FromDouble(result);
}

PyObject* dotproductSeqUnsafe(PyObject* self, PyObject* args)
{
    PyObject *arg1;
    PyObject *arg2;
    int i;
    double result = 0;

    if (!PyArg_ParseTuple(args, "OO", &arg1, &arg2)) {
        return NULL;
    }

    if (!PySequence_Check(arg1)) {
        PyErr_SetString(PyExc_TypeError, "argument 1 must be a sequence");
        return NULL;
    }

    if (!PySequence_Check(arg2)) {
        PyErr_SetString(PyExc_TypeError, "argument 2 must be a sequence");
        return NULL;
    }

    /* the Fast_GET_* macros assume the arguments are lists or tuples,
       hence "unsafe" */
    Py_ssize_t sz = PySequence_Fast_GET_SIZE(arg1);

    if (sz != PySequence_Fast_GET_SIZE(arg2)) {
        PyErr_SetString(PyExc_ValueError, "argument 1 and 2 must be sequences of the same length");
        return NULL;
    }

    for (i = 0; i < sz; ++i) {
        result +=
            PyFloat_AS_DOUBLE(PySequence_Fast_GET_ITEM(arg1, i)) *
            PyFloat_AS_DOUBLE(PySequence_Fast_GET_ITEM(arg2, i));
    }

    return PyFloat_FromDouble(result);
}

PyObject* dotproductTuple(PyObject* self, PyObject* args)
{
    PyObject *arg1;
    PyObject *arg2;
    int i;
    double result = 0;

    if (!PyArg_ParseTuple(args, "OO", &arg1, &arg2)) {
        return NULL;
    }

    if (!PyTuple_Check(arg1)) {
        PyErr_SetString(PyExc_TypeError, "argument 1 must be a tuple");
        return NULL;
    }

    if (!PyTuple_Check(arg2)) {
        PyErr_SetString(PyExc_TypeError, "argument 2 must be a tuple");
        return NULL;
    }

    Py_ssize_t sz = PyTuple_GET_SIZE(arg1);

    if (sz != PyTuple_GET_SIZE(arg2)) {
        PyErr_SetString(PyExc_ValueError, "argument 1 and 2 must be tuples of the same length");
        return NULL;
    }

    for (i = 0; i < sz; ++i) {
        result +=
            PyFloat_AsDouble(PyTuple_GET_ITEM(arg1, i)) *
            PyFloat_AsDouble(PyTuple_GET_ITEM(arg2, i));
    }

    return PyFloat_FromDouble(result);
}

PyObject* dotproductTupleUnsafe(PyObject* self, PyObject* args)
{
    PyObject *arg1;
    PyObject *arg2;
    int i;
    double result = 0;

    if (!PyArg_ParseTuple(args, "OO", &arg1, &arg2)) {
        return NULL;
    }

    if (!PyTuple_Check(arg1)) {
        PyErr_SetString(PyExc_TypeError, "argument 1 must be a tuple");
        return NULL;
    }

    if (!PyTuple_Check(arg2)) {
        PyErr_SetString(PyExc_TypeError, "argument 2 must be a tuple");
        return NULL;
    }

    Py_ssize_t sz = PyTuple_GET_SIZE(arg1);

    if (sz != PyTuple_GET_SIZE(arg2)) {
        PyErr_SetString(PyExc_ValueError, "argument 1 and 2 must be tuples of the same length");
        return NULL;
    }

    for (i = 0; i < sz; ++i) {
        result +=
            PyFloat_AS_DOUBLE(PyTuple_GET_ITEM(arg1, i)) *
            PyFloat_AS_DOUBLE(PyTuple_GET_ITEM(arg2, i));
    }

    return PyFloat_FromDouble(result);
}


PyObject* dotproductNDArray(PyObject* self, PyObject* args)
{
    PyObject *arg1;
    PyObject *arg2;
    npy_byte const* data1;
    npy_byte const* data2;
    npy_intp arg1_stride;
    npy_intp arg2_stride;
    npy_double result = 0;
    PyArray_Descr* arg1_descr;
    PyArray_Descr* arg2_descr;
    int i;

    if (!PyArg_ParseTuple(args, "OO", &arg1, &arg2)) {
        return NULL;
    }

    if (!PyArray_Check(arg1) || PyArray_NDIM(arg1) != 1) {
        PyErr_SetString(PyExc_TypeError, "argument 1 must be a 1-dimensional ndarray");
        return NULL;
    }

    if (!PyArray_Check(arg2) || PyArray_NDIM(arg2) != 1) {
        PyErr_SetString(PyExc_TypeError, "argument 2 must be a 1-dimensional ndarray");
        return NULL;
    }

    npy_intp sz = PyArray_DIMS(arg1)[0];

    if (sz != PyArray_DIMS(arg2)[0]) {
        PyErr_SetString(PyExc_ValueError, "argument 1 and 2 must be ndarrays of the same length");
        return NULL;
    }

    arg1_descr = PyArray_DESCR(arg1);
    arg2_descr = PyArray_DESCR(arg2);

    if (NPY_DOUBLE != arg1_descr->type_num) {
        PyErr_SetString(PyExc_ValueError, "elements of argument 1 must be float");
        return NULL;
    }

    if (NPY_DOUBLE != arg2_descr->type_num) {
        PyErr_SetString(PyExc_ValueError, "elements of argument 2 must be float");
        return NULL;
    }

    /* walk the raw buffer directly, honoring the stride */
    data1 = (npy_byte*)PyArray_DATA(arg1);
    data2 = (npy_byte*)PyArray_DATA(arg2);
    arg1_stride = PyArray_STRIDES(arg1)[0];
    arg2_stride = PyArray_STRIDES(arg2)[0];
    result = 0;

    for (i = 0; i < sz; ++i) {
        result +=
            *(npy_double const*)(data1 + arg1_stride * i) *
            *(npy_double const*)(data2 + arg2_stride * i);
    }

    return PyFloat_FromDouble(result);
}

static struct PyMethodDef methods[] = {
    { "dotproductSeq", (PyCFunction)dotproductSeq, METH_VARARGS, "" },
    { "dotproductSeqUnsafe", (PyCFunction)dotproductSeqUnsafe, METH_VARARGS, "" },
    { "dotproductTuple", (PyCFunction)dotproductTuple, METH_VARARGS, "" },
    { "dotproductTupleUnsafe", (PyCFunction)dotproductTupleUnsafe, METH_VARARGS, "" },
    { "dotproductNDArray", (PyCFunction)dotproductNDArray, METH_VARARGS, "" },
    { NULL, NULL, 0, NULL } /* sentinel */
};

PyMODINIT_FUNC initmytest(void)
{
    import_array();
    Py_InitModule3("mytest", methods, "");
}


bench.py:

import numpy
from mytest import *
import cProfile as profile
import pstats
import os

def case1():
    d = 0.
    while d < 10000.:
        dotproductNDArray(numpy.array((d, d, d)), numpy.array((d, d, d)))
        d += .1

def case2():
    d = 0.
    while d < 10000.:
        dotproductSeq((d, d, d), (d, d, d))
        d += .1

def case3():
    d = 0.
    while d < 10000.:
        dotproductSeq(numpy.array((d, d, d)), numpy.array((d, d, d)))
        d += .1

def case4():
    d = 0.
    while d < 10000.:
        dotproductSeqUnsafe((d, d, d), (d, d, d))
        d += .1

def case5():
    d = 0.
    while d < 10000.:
        dotproductTuple((d, d, d), (d, d, d))
        d += .1

def case6():
    d = 0.
    while d < 10000.:
        dotproductTupleUnsafe((d, d, d), (d, d, d))
        d += .1

def case7():
    d = 0.
    while d < 10000.:
        numpy.dot(numpy.array((d, d, d)), numpy.array((d, d, d)))
        d += .1

def case8():
    d = 0.
    while d < 10000.:
        numpy.dot((d, d, d), (d, d, d))
        d += .1

def runProf(fun):
    tmp = os.tmpnam() # XXX: racy
    profile.run('fun()', tmp)
    return tmp

stats = pstats.Stats(*[runProf(fun) for fun in (case1, case2, case3, case4, case5, case6, case7, case8)])
stats.sort_stats('cumulative')
stats.print_stats(__file__)


And the results:


Ordered by: cumulative time
List reduced from 17 to 8 due to restriction <'bench.py'>

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.164 0.164 1.437 1.437 bench.py:43(case7)
1 0.151 0.151 1.377 1.377 bench.py:19(case3)
1 0.145 0.145 1.268 1.268 bench.py:7(case1)
1 0.090 0.090 1.232 1.232 bench.py:49(case8)
1 0.049 0.049 0.089 0.089 bench.py:13(case2)
1 0.048 0.048 0.084 0.084 bench.py:25(case4)
1 0.048 0.048 0.065 0.065 bench.py:31(case5)
1 0.048 0.048 0.063 0.063 bench.py:37(case6)


Summary

  • If you are sure that a given argument is a tuple, use the PyTuple_XXX() calls; that is the fastest.

  • Constructing an ndarray object is roughly 15 times as slow as constructing a tuple.

  • numpy.dot() is remarkably slow.

  • PyFloat_AsDouble() is not that slow compared to PyFloat_AS_DOUBLE().
