Thread

Zero-JS Hypermedia Browser

Replies (17)

that's because you never read a textbook on compiler design and construction. i did, when i was 13 or so (i should try to remember exactly); i borrowed the book using my mother's university library card.
i haven't gotten around to it yet, but it's on my list to build a slimmed-down version of Go, working title "moxie" (as in the gentian-flavored soda sold in the USA). i've specced out the revised grammar and mostly modified the yaegi Go interpreter to run it. i tried to build an actual binary compiler first, but that proved to be very complex and Claude couldn't do it. the interpreter was relatively easy, though, and once it's implemented it lets me write the compiler in moxie itself.
the standard technique is to write a bootstrap compiler that implements only the parts of the language the compiler itself needs. then, with that compiler, you refactor the source a bit so that it uses the full language where you previously used clumsier constructions (to avoid making the bootstrap 2 compiler difficult to build), and then you have the final compiler. i'm using the interpreter for bootstrap 1, and because moxie is only a small set of changes to Go, mostly removals, i will effectively already be at stage two, the second bootstrap. with that i'll write the full compiler and add the execution environment and ABI targets to the code generator.
Go has an extensive toolkit for building tools that edit Go code, and it makes up a large chunk of what a compiler actually contains. first there is the tokeniser, which identifies all the reserved words, operators, etc., and generates a linear list of tokens. then there is the AST (abstract syntax tree) generator, which produces the more complex, tree-structured representation carrying the syntax and grammar information, types, comment blocks, etc. finally, the compiler feeds the AST into the code generator, which can either generate another programming language (less common) or directly generate binary code for the processor.
i hope you enjoyed that explanation of why compilers are usually written in the language they compile. once you have one, you can use it to build updated versions with bugfixes and to add or remove language features.
2025-12-05 21:49:42 from 1 relay(s)
Yes, the Rust compiler is written in Rust too. The thing is, at the beginning you need to write the compiler for your language in a language that already exists. That compiler takes your source files and turns them into machine code. But say you then write, in C, a program that can compile C code (read C source files and convert them into machine code), and you compile it with your existing compiler written in assembly or whatever. Then you can run the C compiler you wrote in C to compile new C code, including its own source.
2025-12-05 21:58:45 from 1 relay(s)
Because a self-hosting compiler could keep reproducing a trojan planted by the first iteration of the compiler, without it being detectable in the source. You run your sha sum to check that it matches the signed version and it looks fine, but that's only because no one knew the previous compiler had put malicious code in it. I'm probably saying this wrong.
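A sketch of that checksum step in Go, assuming the release publishes a hex-encoded SHA-256 digest (the tool name `verify` and the function `matchesPublished` are hypothetical). It also shows why the check can't catch what the post describes, the scenario from Ken Thompson's "Reflections on Trusting Trust": a match proves only that the bytes are identical to the published build, not that the compiler which produced that build was clean.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"strings"
)

// matchesPublished reports whether data hashes to the published SHA-256 digest
// (hex, case-insensitive). It proves byte-for-byte identity with the release,
// nothing more: a trojan inserted by an earlier compiler is inside those bytes too.
func matchesPublished(data []byte, publishedHex string) bool {
	sum := sha256.Sum256(data)
	return strings.EqualFold(hex.EncodeToString(sum[:]), strings.TrimSpace(publishedHex))
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: verify FILE SHA256HEX")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if !matchesPublished(data, os.Args[2]) {
		fmt.Println("MISMATCH: file differs from the published build")
		os.Exit(1)
	}
	fmt.Println("OK: identical to the published build (which says nothing about how it was built)")
}
```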
2025-12-05 23:18:54 from 1 relay(s)
it's an interpreter, and i could probably read the whole codebase within a couple of hours. a trojan in a programming language isn't something i take very seriously, and definitely not if i can read the source. such things would have clear red flags, like uncommented, strange hexadecimal constants that look like they could be binary code. those don't even belong in a compiler; there is no purpose for them. where such constants do legitimately appear, as in elliptic curve functions, they are always well-documented elements of the arithmetic group, along with all the endomorphisms and symmetries and so on.
yes, bootstrapping can be very simple. core C syntax is an example of a language that is already small enough to be simple to build in itself. but C has some terrible elements: integer bit lengths that are unclear and often platform-dependent (this was the hardest part when i ported a hamming code error correction algorithm from C to Go some time back), the union type, which is an abomination, and the confusing split between the arrow operator (->) and the dot operator for struct fields. all of these complexities lead to slower compilation and make it harder to validate that the translation into machine (or other) language is correct.
the list of things you can leave out of Go that a compiler doesn't need is pretty extensive: channels, mutexes, atomics, goroutines, interfaces, and you can probably get away with no maps by using inverted indexes or by implementing a key/value index map purely with slices. you could leave out slices too, but then you'd have a lot of fiddling to create them by hand, and they are extremely handy for parsing streams of bytes. you could leave out strings as well, because they are immutable and weird (and that's one of the things i'm removing anyway), but of course you would still need string literals; they would just map to a slice of bytes (ASCII/UTF-8).
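A minimal sketch of the "key/value index map purely with slices" idea, in Go, assuming a slice kept sorted by key with binary search for lookups (one of several possible designs; an unsorted slice with linear search is simpler still). Keys are `[]byte` rather than `string`, in line with mapping string literals to byte slices, and it uses no built-in maps or interfaces. The names `sliceMap`, `Set` and `Get` are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// entry is one key/value pair; byte-slice keys stand in for strings.
type entry struct {
	key   []byte
	value int
}

// sliceMap is a key/value index built only from a slice kept sorted by key.
// Lookups are binary searches (O(log n)); inserts shift the tail (O(n)).
type sliceMap struct {
	entries []entry
}

// search returns the index of key, or the index where it would be inserted.
func (m *sliceMap) search(key []byte) int {
	lo, hi := 0, len(m.entries)
	for lo < hi {
		mid := lo + (hi-lo)/2
		if bytes.Compare(m.entries[mid].key, key) < 0 {
			lo = mid + 1
		} else {
			hi = mid
		}
	}
	return lo
}

// Get returns the value stored under key and whether it was present.
func (m *sliceMap) Get(key []byte) (int, bool) {
	i := m.search(key)
	if i < len(m.entries) && bytes.Equal(m.entries[i].key, key) {
		return m.entries[i].value, true
	}
	return 0, false
}

// Set stores value under key, overwriting any existing value.
func (m *sliceMap) Set(key []byte, value int) {
	i := m.search(key)
	if i < len(m.entries) && bytes.Equal(m.entries[i].key, key) {
		m.entries[i].value = value
		return
	}
	m.entries = append(m.entries, entry{}) // grow by one slot
	copy(m.entries[i+1:], m.entries[i:])   // shift the tail right; copy handles overlap
	// Copy the key so the caller can safely reuse its buffer.
	m.entries[i] = entry{key: append([]byte(nil), key...), value: value}
}

func main() {
	// Hypothetical use: a keyword table for a tokeniser, mapping reserved words to token IDs.
	var keywords sliceMap
	for id, kw := range [][]byte{[]byte("for"), []byte("if"), []byte("func"), []byte("return")} {
		keywords.Set(kw, id+1)
	}
	if id, ok := keywords.Get([]byte("func")); ok {
		fmt.Println("func ->", id)
	}
	_, ok := keywords.Get([]byte("go"))
	fmt.Println("go present:", ok)
}
```

For small, mostly read-only tables such as a keyword list, the O(n) insert cost hardly matters, and keeping the slice sorted also gives deterministic iteration order for free.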
2025-12-05 23:23:04 from 1 relay(s)
nah, it's a reasonable question, but i am very familiar with how all the parts work. over many years i have written tokenizers, and even a simple recursive, tree-structured lexical analyser that i played with to build a novel command-line syntax with tree properties. at the time i was essentially building an abstraction for a simple language. there is no way to hide malware in a syntax, only in a lexicon. i'm taking a language with one of the smallest lexicons and razoring it down even further. there's probably quite a swathe of the standard library i'm not even going to use, and i'm only partially hand-translating the rest, mostly filling the gaps where i've removed features that the Go compiler relied on. so, yeah, it's not likely: it's a small search space to check, and there are no common ways in which code like this can be backdoored. LLMs, on the other hand, can hide reams and reams of malware in them, but given the scope of the subject matter here, there's nowhere for it to be carried forward.
2025-12-05 23:28:07 from 1 relay(s)
Whoosh! Sooooo many words I don't know. But it sounds like C can be small. Could you segment off different things and kinda make it modular, so that you add in pieces for additional capability after you build it? Like those channels, mutexes, interfaces and maps? That way you could always return to the initial compiler and redo the entire bootstrap.
2025-12-05 23:30:17 from 1 relay(s)
yeah, C is a small language. that's why it's so often used in kernels and drivers: it actually has a smaller lexicon than common assembly languages/instruction sets, trading lexicon for structure. that's also an example of why i love Go's loop syntax: it's one reserved word (for), with like 12 or more different grammars.
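To make the "one reserved word, many grammars" point concrete, here is a sketch of several forms Go's `for` takes, one small hypothetical function per form. It doesn't try to confirm the exact count; newer Go releases also accept ranging over an integer (Go 1.22) and over iterator functions (Go 1.23), which are left out so the sketch builds on older toolchains. The channel form is shown as current Go has it, even though moxie drops channels.

```go
package main

import "fmt"

// threeClause uses the C-style form: for init; condition; post { }.
func threeClause(n int) int {
	sum := 0
	for i := 1; i <= n; i++ {
		sum += i
	}
	return sum
}

// conditionOnly uses for condition { }, Go's equivalent of while.
func conditionOnly(limit int) int {
	p := 1
	for p < limit {
		p *= 2
	}
	return p
}

// bitLength uses a bare for { }, which loops until a break (or return).
func bitLength(n uint) int {
	bits := 0
	for {
		if n == 0 {
			break
		}
		n >>= 1
		bits++
	}
	return bits
}

// indexOf ranges over a slice, receiving both index and element.
func indexOf(xs []int, target int) int {
	for i, v := range xs {
		if v == target {
			return i
		}
	}
	return -1
}

// nonASCIIOffsets ranges over a string: each step yields a byte offset and a
// decoded rune, so multi-byte UTF-8 characters advance the offset by more than 1.
func nonASCIIOffsets(s string) []int {
	var offs []int
	for off, r := range s {
		if r > 127 {
			offs = append(offs, off)
		}
	}
	return offs
}

// keyBytes ranges over a map taking keys only (iteration order is unspecified).
func keyBytes(m map[string]int) int {
	total := 0
	for k := range m {
		total += len(k)
	}
	return total
}

// drain ranges over a channel, receiving values until it is closed.
func drain(ch <-chan int) int {
	sum := 0
	for v := range ch {
		sum += v
	}
	return sum
}

// count uses for range with no variables at all: one iteration per element.
func count(xs []string) int {
	n := 0
	for range xs {
		n++
	}
	return n
}

func main() {
	ch := make(chan int, 3)
	for i := 1; i <= 3; i++ {
		ch <- i
	}
	close(ch)
	fmt.Println(threeClause(4), conditionOnly(50), bitLength(8), indexOf([]int{7, 9, 11}, 9))
	fmt.Println(nonASCIIOffsets("héllo"), keyBytes(map[string]int{"for": 1, "if": 2}), drain(ch), count([]string{"a", "b"}))
}
```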
2025-12-05 23:38:35 from 1 relay(s)