BIG DISCLAIMER!!

I am a rookie in Odin.

I might have a lot of faults and misinterpretations of things in this work. Please take this article with a BIG grain of salt, cz tbh, I am learning.

I wanna know how things work.

I also wanna be corrected if I am wrong.

There will be a bunch of sentences that might be confusing to You due to grammar. I’m so sorry, I am trying, yk, being born in a non-English-speaking country sometimes sucks.

AND PLEASE DO NOT USE THIS IN ANY OF YOUR PRODUCTION CODE! duh?!

I already did decode the .npy files in my repo python-numpy-npy-in-odin. So, if You are more interested in digging and finding things out yourself, You are welcome to check that repo ;D

This whole thing in a nutshell

  1. I’ll inform You of the tools I used
  2. I’ll inform You of my motivations
  3. I’ll inform You of my sources
  4. I’ll inform You WHAT & HOW I learned from the sources
  5. I’ll inform You how I helped myself extract information
  6. I’ll inform You how I read .npy files in Odin, i.e. how to use the final product
  7. And finally, You are free to choose to read what I learned from this small project, or go, see You later 🤘😁
  8. The references

Tools I used when working on this:

  1. Python v 3.10
  2. Numpy v 1.26.4
  3. Odin v dev-2025-03

Motivation(s)

I’ve been working with Numpy in Python since day 1 of putting my fingers on a keyboard to code in Python. Most of the time, it involves saving and loading data using Numpy.

As a Geophysics student, multi-dimensional arrays/matrices are objects I very often cannot avoid while writing programs, like, dead ahh unavoidable.

Okay, now that I’m embracing coding in a lower-level language (relative to Python), in this case Odin, I want to make use of what I have been producing with Numpy in Python inside Odin. But there is no such thing as numpy.load(the_file) in Odin, and as far as my ability to surf and dig through the internet goes, I haven’t found a single person doing such a thing, you know, to avoid re-inventing the wheel. So, I might need to do it myself, manually.

Small research

The “What” and “Why”

First things first, I have to read. I need to find the “What”, “How”, and “Why” to be able to do the work. But wait, what exactly are the things we are dealing with, again?

  • It’s .npy files.
  • Who made it? Numpy.

So logically, Numpy is the correct source for finding out “why they made it” and “how they made it”. Here’s what I found in Numpy Enhancement Proposal (NEP)-1:

We propose a standard binary file format (NPY) for persisting a single arbitrary NumPy array on disk. The format stores all of the shape and dtype information necessary to reconstruct the array correctly even on another machine with a different architecture. The format is designed to be as simple as possible while achieving its limited goals. The implementation is intended to be pure Python and distributed as part of the main numpy package.

Kern, R. (2007)

What I can take away, based solely on the above:

  • The format stores all the shape and dtype (data type) information.
  • That information is necessary to reconstruct the array/data correctly no matter what kinda device we use.
  • The format is designed to be as simple as possible while achieving its limited goals.
  • The implementation is intended to be pure Python, as part of Numpy.

Note: I took that as the “Why” it existed.

The “How”, but first, I need me some examples.

Now, I need some examples of how to work with non-native data types and files in Odin. I found the article “Reverse Engineering Alembic” by Ginger Bill. Alembic is an interchange file format, written in C++. Bill needed to work with Alembic as part of his work at JangaFX, which requires him to read and write the Alembic format in Odin.

From that article, Bill basically explains the following points;

  1. The Header

    • what it is.
    • how the header is laid out in memory.
    • how to read it the correct way.
  2. The Data

    • what the data are.
    • how Alembic structures its data.
    • how the pieces of data relate to each other.
    • how they are laid out in memory, and how to correctly extract them in the correct format.

Bill also recommends using a hex viewer to see the bytes in a readable way, since we are working with binary formats. Throughout this work, I use Hex Editor Neo.

I also came across Rickard Andersson’s YouTube videos. I learned A LOT from them. In one of Rickard’s videos, he talked about bitwise operations, which basically, at least how I perceived it, tells me how to work with bytes and hex. Rickard then gave an example of how a single byte can be used to represent a set of flags (in that case, boolean values) that hold certain information. Rickard gave the example using LZ4, which is a file compression format.
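To make that flags idea concrete, here’s a tiny sketch of my own in Odin; the flag names are made up for illustration, this is NOT LZ4’s actual frame descriptor.

// this is Odin, my own sketch; flag names are made up, not LZ4's real frame descriptor
package flags_example

import "core:fmt"

HAS_CHECKSUM  : byte : 1 << 0 // bit 0
HAS_DICT_ID   : byte : 1 << 1 // bit 1
IS_COMPRESSED : byte : 1 << 2 // bit 2
IS_LAST_BLOCK : byte : 1 << 3 // bit 3

main :: proc() {
    // a single byte holding four boolean flags; here two of them are set
    descriptor: byte = HAS_CHECKSUM | IS_LAST_BLOCK

    fmt.println("has checksum:", (descriptor & HAS_CHECKSUM)  != 0) // true
    fmt.println("has dict id: ", (descriptor & HAS_DICT_ID)   != 0) // false
    fmt.println("compressed:  ", (descriptor & IS_COMPRESSED) != 0) // false
    fmt.println("last block:  ", (descriptor & IS_LAST_BLOCK) != 0) // true
}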

I specifically watched these playlists, which are where I found that bitwise operations video:

  1. The Odin programming language with Rickard
  2. Odin in Practice

I also learned about the importance of Byte Order, a.k.a. Endianness, which I also picked up from Rickard’s videos. Endianness refers to the order in which the bytes of a value are stored. The orders are:

  1. Largest (most significant) byte first, a.k.a. big endian
  2. Smallest (least significant) byte first, a.k.a. little endian

So endianness tells us which end of the value, the most or the least significant byte, comes first in memory or in a file.

I think endianness is like setting up the base of our assumptions when we communicate, but instead of humans, the subjects are computers.

It matters because one tends to assume things. The value stays the same, but one’s perception of it can vary. So endianness gives us a way to specify and unify the perception of things; it’s like agreeing on a top-down/bottom-up approach to a problem.
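Just to see it in actual bytes, here’s a small Odin example of my own using core:encoding/endian: the exact same two bytes give a different number depending on which order we assume.

// this is Odin, a small example of my own
package endian_example

import "core:fmt"
import "core:encoding/endian"

main :: proc() {
    data := []byte{0x01, 0x00} // the exact same two bytes

    le, _ := endian.get_u16(data, .Little) // least significant byte first -> 0x0001
    be, _ := endian.get_u16(data, .Big)    // most significant byte first  -> 0x0100

    fmt.println("little endian:", le) // 1
    fmt.println("big endian:   ", be) // 256
}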

Another thing I found on this journey is the byte stream. A byte stream refers to a sequence of bytes. I came across a very nice visualization of a byte stream by Overcoded, in a Medium article by West, Z. (2020), “What’s A Byte Stream, Anyway”.

Note by West, Z. (2020)

In the illustration above, the solid black boxes separating 8-bit groups are purely for visualization purposes only. Such a gap in information would not be present in practice.

OK, now I am going to talk about “How” the examples implement things in Odin.

How Bill, G. (2022) deals with Alembic

Under the What is Alembic? section, Bill, G. (2022) says that Alembic consists of 2 file formats with different memory layouts, which can be determined based on the magic file signature. Those 2 file formats are;

  1. HDF5, and
  2. Ogawa.

HDF5 is commonly used to store data in a hierarchical kind of way. Ogawa also stores the data in a hierarchical fashion, but uncompressed. And it seems like the majority of Alembic files are not in the HDF5 format, but in Ogawa (Bill, G. 2022). The Ogawa format is a little-endian binary format.

Here’s how Bill, G. defined a very simple Ogawa header in Odin.

// this is Odin lang
// source: https://www.gingerbill.org/article/2022/07/11/reverse-engineering-alembic/#ogawa
 
MAGIC :: "Ogawa"
 
File_Header :: struct {
	magic:   [5]byte, // "Ogawa"
	wflag:   enum byte { writing = 0x00, closed = 0xff },
	version: [2]byte, // {0, 1}
	root_group_offset: u64le,
}

That is an Odin struct. How did he know it? Well, I am assuming he just read the Ogawa docs. Notice that Bill, G. used [5]byte as the type for the magic field, which is written when an Ogawa file is created and holds the string “Ogawa” as the magic.
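For example, once those 5 bytes have been read into the struct, checking the signature is just a comparison. This little proc is my own sketch on top of the struct above, not something from Bill’s article.

// this is Odin, my own sketch on top of Bill's File_Header above
is_ogawa :: proc(h: ^File_Header) -> bool {
	// compare the 5 magic bytes against the expected "Ogawa" signature
	return string(h.magic[:]) == MAGIC
}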

I ain’t going through all the fields of the Ogawa File_Header struct above, since they are not my concern. The major thing I took from Bill’s work is that we have to know certain things;

  1. What information the header holds.
  2. How that information is laid out in memory.
  3. What the data type of each piece of information is.
  4. What the endianness is, which is important for working with the byte stream.

How do we know those things?

  • Read the official docs
  • Messin’ around and hoping for the best.

Well, I guess reading is the best way before messing around. So, that brought me to the next example, which came from Rickard Andersson’s YouTube video about bitwise operations, where Rickard talked through the example of LZ4, specifically the General Structure of the LZ4 Frame Format and the Frame Descriptor.

Now we are talking. What does a Numpy .npy file actually look like?

  1. I read Numpy’s (v1.26.4) documentation about data types
  2. I looked into Numpy’s Github repository “_format_impl.py”

Now that I have at least the information about the types Numpy implements, I need to know what they would look like in .npy files, like, ‘em binaries.

Here’s the flow diagram of how I did it.

graph LR
start((start))
make_array[/ make array /]
make_another_array[/ make another array /]
save[/ save /]
open[/ open the array /]
inspection[/ inspection /]
loop{ is it enough? }
stop((stop))

start --> make_array --> save --> open --> inspection --> loop
loop -- yes --> stop
loop -- no --> make_another_array --> save

Now we’re playin’ with Numpy

  1. Making the arrays. For simplicity, I’ll make 2 arrays: the 1st one is a 1D array and the other is a 2D array.
  2. Loading the arrays back in.

With those simple examples, we can see that for each .npy file we get those alien strings. For one of them (a 5x5 array of unsigned 8-bit integers) we got this

"\x93NUMPY\x01\x00v\x00{'descr': '|u1', 'fortran_order': False, 'shape': (5, 5), }                \n"

followed by these

'\x01\x02\x03\x04\x05\x02\x03\x04\x05\x06\x03\x04\x05\x06\x07\x04\x05\x06\x07\x08\x05\x06\x07\x08\t'

Okay, to reconstruct the data, we definitely need information, right? We now use that strange lookin’ header to find out what information we can get.

In the Numpy .npy format specification version 1.0, Numpy mentions that;

  1. The first 6 bytes are a magic string, exactly \x93NUMPY.
  2. The next 1 byte is an unsigned byte, which tells us the major version number of the file format.
  3. The next 1 byte is also an unsigned byte, which tells us the minor version number of the file format.
  4. The next 2 bytes form a little-endian unsigned short int (whoah, that’s a LOT), which is the length of the header; Numpy refers to it as HEADER_LEN.
  5. The next HEADER_LEN bytes are the header data describing the array’s format. It is an ASCII string which contains a Python literal expression of a dictionary.

How would that look in Odin, then?

The header and everythin’

Now, I am going to test my amateur Odin skills by representing those things above.

The NumpyHeader struct is a #packed struct; #packed is a struct directive. Different struct directives give us different memory layouts and alignment requirements. In this case, NumpyHeader will not have any padding between its fields. The following code block shows the struct directives in Odin.

// this is also Odin
// source: odin-lang.org/docs/overview/
 
struct #align(4) {...} // align to 4 bytes
struct #packed {...} // remove padding between fields
struct #raw_union {...} // all fields share the same offset (0). This is the same as C's union
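
To make the spec above concrete, the fixed 10-byte preamble could be described with a #packed struct roughly like this. This is just my sketch for illustration; the actual NumpyHeader in my repo also carries the parsed dictionary fields (descr, fortran_order, shape), so it looks different.

// this is Odin, my own sketch of the fixed .npy preamble; the real NumpyHeader in the repo differs
NpyPreamble :: struct #packed {
	magic:      [6]byte, // exactly "\x93NUMPY"
	major:      u8,      // file format major version
	minor:      u8,      // file format minor version
	header_len: u16le,   // HEADER_LEN, a little-endian unsigned short
}
// thanks to #packed, size_of(NpyPreamble) == 10: exactly the first 10 bytes of a v1.0 .npy file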

Hol’ up, lemme go back to creating more arrays, but this time, I’ll do more.

Now, I wanna go back and create Numpy arrays with all the data types that Numpy v1.26.4 has (see the Data types table). The full script is in my Github repository in generate_array.py. I created 1D arrays (suffixed with _5) and 2D arrays (suffixed with _5x5), and now I got these bad boys.

After that, I needed to make sure of/check several things by taking Numpy’s format implementation and modifying it a lil bit to kinda see what is going on inside all of those.

The modification was only the addition of print functions in all the necessary places.

def pp(func: str, *args) -> None:
    print(f"inside {func=}: {args=}")
    return None

The modified script is also in the repo python-numpy-npy-in-odin. An example of how I used it to see, yk, how the data flow and transformations happen is depicted as follows.

I also wrote a little script that can take a single .npy file, or a single directory containing a bunch of .npy files, and print a bunch of information; it’s called dirty.py and it’s in the repo as well.

And here is the result for a 2D array with shape 5x5 and type np.bool_.

Input of a single numpy file

Note: I am using the Python library rich to print out colorful strings as a matter of personal preference. If You don’t have it, it’s okay, the script is going to be just fine.

I then ran that dirty.py script over all of the .npy files I created and awk-ed my way through the output to collect the following tables, just to filter out unnecessary information. The tables consist of 2 columns: the column Numpy Type is the exact data type I named the .npy files after, and the column Type in npy File Header shows the native data type prefixed with the endianness character.

The endianness characters are described as follows;

  1. | = endianness is not applicable (single-byte types), so it does not matter which machine reads it.
  2. < = little endian.
  3. > = big endian.
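
Peeking ahead to the Odin side for a second: that single character is enough to decide which byte order to use later when decoding. Here’s a minimal sketch of my own (the real parsing in my repo is done inside parse_npy_header and looks different).

// this is Odin, my own sketch; the real parsing in the repo is more involved
package descr_sketch

import "core:fmt"
import "core:encoding/endian"

// map the descr prefix character to a byte order;
// for '|' (single-byte types) the order is irrelevant, so we just report that
descr_byte_order :: proc(prefix: byte) -> (order: endian.Byte_Order, matters: bool) {
    order = .Little
    matters = true
    switch prefix {
    case '<': order = .Little
    case '>': order = .Big
    case:     matters = false // '|' and anything else
    }
    return
}

main :: proc() {
    order, matters := descr_byte_order('<')
    fmt.println(order, matters) // Little true
}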

Now we are speaking in Odin again.

So, from my experience and the couple of previous examples, we can open .npy files not with np.load() but with Python’s basic open(), and we get alien strings. Now, I am going to do that in Odin. I was heavily influenced by Rickard’s YouTube video “Stream, Reader and data pointers” when opening files in Odin.

In my understanding, the point Rickard was making is about extending the behaviour of a set of Odin procedures. The extension is about adding a logger to the main process of reading a JSON file. I followed what Rickard was doing in that video, but instead of reading a JSON file, I open a .npy file. Anyway, I did take notes on the steps Rickard goes through in the video.

  1. opening the file in Odin, which returns a handle and a potential error
  2. creating a stream from that handle
  3. turning the stream into a reader object (also with a potential error)
  4. initializing a bufio.Reader from that reader
  5. making use of that bufio.Reader however we want (see the sketch right after this list)
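
Here is a simplified sketch of my own of those five steps, with only crude error handling; the real load_npy in the repo defines proper error unions and a header parser, and the file name below is just a hypothetical example.

// this is Odin, a simplified sketch of the open -> stream -> reader -> bufio.Reader steps
package reader_sketch

import "core:fmt"
import "core:os"
import "core:io"
import "core:bufio"

main :: proc() {
    // 1. open the file, getting a handle and a potential error
    handle, open_err := os.open("float64_5x5.npy") // hypothetical file name
    if open_err != nil {
        fmt.printfln("could not open file: %v", open_err)
        return
    }
    defer os.close(handle)

    // 2. create a stream from that handle
    stream := os.stream_from_handle(handle)

    // 3. turn the stream into a reader (also with a potential error)
    reader, ok := io.to_reader(stream)
    if !ok {
        fmt.println("could not get a reader from the stream")
        return
    }

    // 4. initialize a bufio.Reader from that reader
    br: bufio.Reader
    bufio.reader_init(&br, reader, 1024)
    defer bufio.reader_destroy(&br)

    // 5. use the bufio.Reader however we want, e.g. read the first 6 magic bytes
    magic: [6]byte
    n, read_err := bufio.reader_read(&br, magic[:])
    fmt.printfln("read %v bytes: %v (err: %v)", n, magic, read_err)
}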

I then tried to replicate and modify the above workflow to open .npy files. Here’s a simplified flow diagram of how I did it; keep in mind that I show no error handling in the diagram, the error unions are defined in the complete script in the repo.

Now, the Odin procedure. I call it load_npy, it takes 3 arguments

  1. file_name: string
  2. bufreader_size: int
  3. allocator := context.allocator

and will give back 3 objects

  1. npy_header: NumpyHeader
  2. lines: NDArray
  3. (potential) error: ReadFileError

Before the code, I first defined a union of the array types that the NDArray struct is built from.
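
I won’t paste the exact definitions here (they are in the repo), but the rough shape is something like the sketch below. The length and size fields show up later in the switch-case, while the data field name is just my own placeholder here.

// this is Odin, a rough sketch; the exact ArrayTypes and NDArray definitions live in the repo
ArrayTypes :: union {
    []i8,  []u8,
    []i16, []u16,
    []i32, []u32, []f32,
    []i64, []u64, []f64,
    []complex64, []complex128,
    []bool,
}

NDArray :: struct {
    length: u64,        // number of elements (product of the shape)
    size:   int,        // size of a single element in bytes
    data:   ArrayTypes, // the reconstructed flat array itself
}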

The parse_npy_header proc is not that interesting, I would say; tbh, I asked Deep-seek to do the string parsing and cleaning. You can see the implementation in the repo, tho. I want to draw your attention to the recreate_array procedure instead. This bad girl is my favourite, yet the most frustrating thing in this work 😂 cz of the switch-case.

The recreate_array procedure takes 4 arguments, which consist of 3 pointers and an allocator. It returns an ArrayTypes value or nil.

  1. np_header: ^NumpyHeader
  2. reader: ^bufio.Reader
  3. ndarray: ^NDArray
  4. allocator: context.allocator

Note: there is duplication of code in the switch-case, but I really don’t mind tho, since I need it to be specific and to work the way I want, so You will encounter a long-long-long switch-case.

Before things get messy, let’s take a look at the simplified flow diagram for a single data type.

Now, the code block itself.

in switch-case "c16"

For this complex datatype I really had to set size := 16 to make it work. After a lot of debugging and printing, I found that the length of the raw bytes read earlier, divided by the number of elements in header.shape, was not 8 but 16, since each complex number carries both a real and an imaginary f64.

 
   // other cases ...

        case "c16" :

            // number of elements, taken from the shape in the header
            n_data_from_shape : int = 1
            for shp in np_header.header.shape {
                n_data_from_shape *= shp
            }

            // a c16 element is 16 bytes: an f64 real part followed by an f64 imaginary part
            size := 16
            ndarray.length = cast(u64)n_data_from_shape
            ndarray.size = size

            count_elems := 0
            _lines := make([dynamic]complex64)
            // data and n_elem come from earlier in the proc: the raw data bytes and their byte count.
            // step through them 16 bytes at a time; endian.get_f64 decodes only the first 8 bytes
            // of each slice, i.e. the real part (the imaginary part is dropped here)
            for i := 0; i < n_elem-(size/2); i += size {
                casted_data, _ := endian.get_f64(data[i:i+size], np_header.header.endianess)
                count_elems += 1
                append(&_lines, cast(complex64)casted_data)
            }
            return _lines[:]


   // other cases ...
 

Finally, remember that recreate_array is called inside the load_npy proc, right? Now we go back there and assign the return value of recreate_array to the NDArray instance.
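Roughly, the call site looks like the line below. This is my paraphrase of it, and the field I assign into is the placeholder data field from my earlier sketch; the real code in the repo may name it differently.

// this is Odin, my paraphrase of the call site inside load_npy; field name `data` is a guess
ndarray.data = recreate_array(&np_header, &reader, &ndarray, allocator)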

end.

The Final part

main.odin script

// this is an Odin script
package main
 
import "base:runtime"
import "core:fmt"
import "core:os"
import npyload "npyodin"
 
default_context : runtime.Context
 
main :: proc() {
 
    default_context = context
 
    // expect exactly one argument: the path to a .npy file
    if len(os.args) < 2 {
        fmt.printfln("usage: %v <some_file.npy>", os.args[0])
        return
    }

    file_name : string = os.args[1]
    defer delete(file_name)
 
    np_header, ndarray, load_err := npyload.load_npy(file_name, 1024, allocator = default_context.allocator)
    if load_err != nil {
        // bail out early if the file could not be read or parsed
        fmt.printfln("failed to load %v: %v", file_name, load_err)
        return
    }
 
    defer npyload.delete_ndarray(&ndarray)
    defer npyload.delete_header(&np_header)
 
    fmt.printfln("file: %v", file_name)
    fmt.printfln("Header: \n| %v", np_header)
 
    fmt.printfln("Data: %v\n| size_of that thing: %v bytes\n| with length of: %v elements\n", ndarray, size_of(ndarray), ndarray.length)
 
}

What I learned from this small project

  1. I sometimes find myself crashing out a lil bit when writing programs in Odin, while assuming I can do things the way I do in Python, like letting Python collect the garbage for me, etc.
  2. I learned to be more patient, watching and reading stuff more carefully and taking notes on the things that matter.
  3. Writing in a lower-level language is for real harder and needs a wider attention span while writing.
  4. I learned about bytes and a little bit about how to decode/encode them in Odin.
  5. I now know how Numpy’s internal implementation creates and writes data to disk.

With all of that being said, I now have the potential to extrapolate and learn more about interchange file formats:

  1. I know what I should know/read/look for before actually encoding a format programmatically.
  2. I can reverse the encoding process to write anything in a specific format that can be read on another machine.
  3. I can decide (have more options and be mindful about it) whether I should program in Python, Odin or another tool.

References

  1. Numpy Enhancement Proposal (NEP)-1
  2. “Reverse Engineering Alembic”
  3. Odin-Overview
  4. Numpy’s (v1.26.4) documentation about data types
  5. Numpy’s Github repository “_format_impl.py”
  6. magic file signature
  7. “What’s A Byte Stream, Anyway”
  8. Rickard Andersson’s YouTube