A post on GDML that caught my attention

It is apparently an easy way to access data laid out in swizzled order, but I didn't know what the term "Morton order" meant.
It sounds like a computing term, though.
Googling "Morton order" turned up this:
http://www.cs.indiana.edu/pub/techreports/TR533.pdf
I see.

Quote follows.
Incidentally, the author of this email is Christer Ericson, SCEA's celebrated super programmer.

Ben Garney wrote:
> >There are also some clever bitshift things you can do to make it
> >fast(er) [for parting the bits of a byte/word/etc by one bit].

Here's my version of it:

// Parting 8 bits into 16 (15) bits
int Part1By1(int n)
{
    n = (n ^ (n << 4)) & 0x0f0f;
    n = (n ^ (n << 2)) & 0x3333;
    n = (n ^ (n << 1)) & 0x5555;
    return n;
}
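
A side note that is not part of the quoted email: to see what each line of the 8-bit version does, the steps can be recorded one at a time. The sketch below assumes the input fits in 8 bits; Part1By1Trace and PrintPart1By1Trace are hypothetical helper names introduced only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: the same three steps as the 8-bit Part1By1,
// but each intermediate value is stored in steps[] for inspection.
void Part1By1Trace(uint32_t n, uint32_t steps[3])
{
    n = (n ^ (n << 4)) & 0x0f0f;  // two 4-bit groups, 4 zero bits apart
    steps[0] = n;
    n = (n ^ (n << 2)) & 0x3333;  // four 2-bit groups, 2 zero bits apart
    steps[1] = n;
    n = (n ^ (n << 1)) & 0x5555;  // one zero bit between every input bit
    steps[2] = n;
}

// Hypothetical helper: print the trace for one input value.
void PrintPart1By1Trace(uint32_t n)
{
    uint32_t steps[3];
    Part1By1Trace(n, steps);
    printf("in 0x%02x -> 0x%04x -> 0x%04x -> 0x%04x\n",
           (unsigned)n, (unsigned)steps[0],
           (unsigned)steps[1], (unsigned)steps[2]);
}
```

For the input 0xb5 the recorded values are 0x0b05, 0x2311 and 0x4511. At every step the bits where n and its shifted copy overlap are exactly the bits the mask clears, which is why XOR gives the same result as OR here.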

// Parting 16 bits into 32 (31) bits
int Part1By1(int n)
{
    n = (n ^ (n << 8)) & 0x00ff00ff;
    n = (n ^ (n << 4)) & 0x0f0f0f0f;
    n = (n ^ (n << 2)) & 0x33333333;
    n = (n ^ (n << 1)) & 0x55555555;
    return n;
}

This approach generalizes to larger words as well as to parting
by more than one bit.
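
A side note that is not part of the quoted email: as a rough sketch of "parting by more than one bit", the same shift-and-mask pattern can leave two zero bits between input bits, which is what a 3D Morton code needs. The names Part1By2 and Morton3D, the 10-bit input limit and the use of uint32_t are assumptions made for this illustration. The mask constants are a commonly used set for this case, not something given in the email.

```c
#include <stdint.h>

// Parting 10 bits into 30 (28) bits, two zero bits between input bits.
// Hypothetical name; only the low 10 bits of x are used.
uint32_t Part1By2(uint32_t x)
{
    x &= 0x000003ff;
    x = (x ^ (x << 16)) & 0xff0000ff;
    x = (x ^ (x <<  8)) & 0x0300f00f;
    x = (x ^ (x <<  4)) & 0x030c30c3;
    x = (x ^ (x <<  2)) & 0x09249249;
    return x;
}

// Hypothetical 3D counterpart of Morton2D: x lands in bits 0, 3, 6, ...,
// y in bits 1, 4, 7, ... and z in bits 2, 5, 8, ...
uint32_t Morton3D(uint32_t x, uint32_t y, uint32_t z)
{
    return (Part1By2(z) << 2) + (Part1By2(y) << 1) + Part1By2(x);
}
```

With this layout Morton3D(1, 0, 0) is 1, Morton3D(0, 1, 0) is 2 and Morton3D(0, 0, 1) is 4.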

To compute a 2D Morton code using the above code you do:

int Morton2D(int a, int b)
{
    return (Part1By1(a) << 1) + Part1By1(b);
}
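
A side note that is not part of the quoted email: to see the order this produces, the codes for a small grid can be printed. The sketch below repeats the 8-bit Part1By1 and Morton2D from the email so that it is self-contained; PrintMortonGrid is a hypothetical helper, and treating a as the row and b as the column is only a choice made for this illustration.

```c
#include <stdio.h>

// 8-bit Part1By1, as given in the email.
int Part1By1(int n)
{
    n = (n ^ (n << 4)) & 0x0f0f;
    n = (n ^ (n << 2)) & 0x3333;
    n = (n ^ (n << 1)) & 0x5555;
    return n;
}

// Morton2D, as given in the email: a fills the odd bits, b the even bits.
int Morton2D(int a, int b)
{
    return (Part1By1(a) << 1) + Part1By1(b);
}

// Hypothetical helper: print the Morton code of every cell of a
// size x size grid, with a selecting the row and b the column.
void PrintMortonGrid(int size)
{
    for (int a = 0; a < size; ++a) {
        for (int b = 0; b < size; ++b)
            printf("%3d", Morton2D(a, b));
        printf("\n");
    }
}
```

PrintMortonGrid(4) prints the rows 0 1 4 5, then 2 3 6 7, then 8 9 12 13, then 10 11 14 15. Following the numbers in increasing order traces the Z-shaped path that gives Morton order its other name, Z-order.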

If you have native support (and perhaps even if not) for words
twice as long as the Morton code you need to produce, you can roll
the two Part1By1 calls together into one by writing the code like
so (here for 8-bit bytes being interleaved):

// Part1By1() here must be the 16-bit-input version above.
int Morton2D(int a, int b)
{
    int t = Part1By1((a << 8) + b);
    return (t >> 15) + (t & 0x0ffff);
}
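
A side note that is not part of the quoted email: the two Morton2D variants can be checked against each other exhaustively for 8-bit inputs. Both Part1By1 versions in the email share one name, so in this sketch they are renamed Part1By1_8 and Part1By1_16 to coexist in a single file. Those names, along with Morton2DSeparate, Morton2DCombined and CountMismatches, are hypothetical, and unsigned is used instead of int only to keep the shifts free of sign concerns.

```c
// 8-bit-input version from the email, renamed for this sketch.
unsigned Part1By1_8(unsigned n)
{
    n = (n ^ (n << 4)) & 0x0f0f;
    n = (n ^ (n << 2)) & 0x3333;
    n = (n ^ (n << 1)) & 0x5555;
    return n;
}

// 16-bit-input version from the email, renamed for this sketch.
unsigned Part1By1_16(unsigned n)
{
    n = (n ^ (n << 8)) & 0x00ff00ff;
    n = (n ^ (n << 4)) & 0x0f0f0f0f;
    n = (n ^ (n << 2)) & 0x33333333;
    n = (n ^ (n << 1)) & 0x55555555;
    return n;
}

// Two separate parting calls.
unsigned Morton2DSeparate(unsigned a, unsigned b)
{
    return (Part1By1_8(a) << 1) + Part1By1_8(b);
}

// One parting call on the concatenated 16-bit word. After parting, the
// bits of b sit in the low 16 bits and the bits of a in the high 16 bits;
// shifting by 15 rather than 16 moves a's bits onto the odd positions.
unsigned Morton2DCombined(unsigned a, unsigned b)
{
    unsigned t = Part1By1_16((a << 8) + b);
    return (t >> 15) + (t & 0x0ffff);
}

// Hypothetical check: count the 8-bit input pairs where the two differ.
int CountMismatches(void)
{
    int mismatches = 0;
    for (unsigned a = 0; a < 256; ++a)
        for (unsigned b = 0; b < 256; ++b)
            if (Morton2DSeparate(a, b) != Morton2DCombined(a, b))
                ++mismatches;
    return mismatches;
}
```

CountMismatches() returns 0, so the two variants agree for every pair of 8-bit inputs. Which one is faster is a separate question that this check does not answer.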

This is better in the sense you're doing the two Part1By1() calls
in parallel.  However, it's also worse in that the Part1By1() call
now must operate on more bits which has the drawback of requiring
larger mask constants that take longer to construct, etc.  Whether
it's worth doing depends entirely on the target architecture.