Cosmopolitan Libc: build-once run-anywhere C library

Cosmopolitan makes C a build-once run-anywhere language, similar to
Java, except it doesn't require interpreters or virtual machines be
installed beforehand. Cosmo provides the same portability benefits as
high-level languages like Go and Rust, but it doesn't invent a new
language and you won’t need to configure a CI system to build separate
binaries for each operating system. What Cosmopolitan focuses on is
fixing C by decoupling it from platforms, so it can be pleasant to use
for writing small unix programs that are easily distributed to a much
broader audience.

Getting Started

Assuming you have GCC on Linux, then all you need are the five
additional files which are linked below:

# create simple c program on command line
echo '
  main() {
    printf("hello world\n");
  }
' >hello.c

# run gcc compiler in freestanding mode
gcc -g -Os -static -fno-pie -mno-red-zone -nostdlib -nostdinc -o hello.com hello.c \
  -Wl,--oformat=binary -Wl,--gc-sections -Wl,-z,max-page-size=0x1000 \
  -Wl,-T,ape.lds -include cosmopolitan.h crt.o ape.o cosmopolitan.a

# ~40kb static binary (can be ~16kb w/ MODE=tiny)

The above command fixes GCC so it outputs portable binaries that will
run on every Linux distro in addition to Mac OS X, Windows NT,
FreeBSD, and OpenBSD too. For details on how this works, please read
the αcτµαlly pδrταblε εxεcµταblε blog post. This
new binary format is also optional: conventional ELF binaries can be
compiled too by removing the -Wl,--oformat=binary flag.

Your program will also boot on bare metal. In other words, you've
written a normal textbook C program, and thanks to Cosmopolitan’s
low-level linker magic, you've effectively created your own operating
system which happens to run on all the existing ones as well. Now
that’s something no one’s done before.

Mailing List

Please join the Cosmopolitan Google Group!


Cosmopolitan has been optimized by hand for excellent performance on
modern desktops and servers. Compared with glibc, you should expect
Cosmopolitan to be almost as fast, but with an order of magnitude
tinier code size. Compared with Musl or Newlib, you can expect that
Cosmopolitan will generally go much faster, while having roughly the
same code size, if not tinier.

In the case of the most important libc function, memcpy(),
Cosmopolitan outperformed every other open source library tested. The
chart below shows how quickly memory is transferred depending on the
size of the copy. Since it's log scale, each grid square represents a
2x difference in performance. What makes Cosmopolitan so fast here is
that it uses several different memory copying strategies. For small
sizes it uses an indirect branch with overlapping moves; for medium
sizes it uses simd vectors, and for large copies it uses nontemporal
hints which stop cache thrash. Other libraries generally fall short
because they use a one-size-fits-all strategy. For example, Newlib
goes 10x slower for the optimal block size (half L1 cache) because it
always does nontemporal moves.

memcpy() performance for varying n values
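
The "overlapping moves" idea for small sizes can be illustrated with a minimal sketch. This is not Cosmopolitan's implementation: the names copy_8_to_16 and sketch_memcpy are hypothetical, only the 8 to 16 byte size class is shown, and the standard memcpy stands in for the byte, SIMD, and nontemporal paths. It assumes a compiler that lowers fixed-size 8-byte memcpy calls to single moves, which mainstream optimizing compilers typically do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not Cosmopolitan's code): copies 8..16 bytes with two
   8-byte moves. The second move is anchored at the end of the buffer, so the
   two moves overlap in the middle whenever n < 16 and no byte loop is needed. */
static void copy_8_to_16(void *dst, const void *src, size_t n) {
  uint64_t head, tail;
  assert(n >= 8 && n <= 16);
  memcpy(&head, src, 8);                        /* first 8 bytes */
  memcpy(&tail, (const char *)src + n - 8, 8);  /* last 8 bytes  */
  memcpy(dst, &head, 8);
  memcpy((char *)dst + n - 8, &tail, 8);
}

/* Hypothetical dispatcher: picks a strategy by size class. */
static void *sketch_memcpy(void *dst, const void *src, size_t n) {
  if (n >= 8 && n <= 16) {
    copy_8_to_16(dst, src, n);
  } else {
    memcpy(dst, src, n); /* stand-in for the other size classes */
  }
  return dst;
}
```

Both loads happen before either store, so the helper reads the source completely before writing. A real implementation would add similar branches for the other size classes rather than fall back to a generic copy.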

Trickle-Down Performance

Performing the best on benchmarks isn't enough. Cosmopolitan also uses
a second strategy that the above benchmark doesn't measure, which we
call “trickle-down performance”. For an example of how that works,
consider the following common fact about C which is often overlooked.
External function calls such as the following:

memcpy(foo, bar, n);

Are roughly equivalent to the following assembly, which leads
compilers to assume that most cpu state is clobbered:

asm volatile("call memcpy"
             : "=a"(rax), "=D"(rdi), "=S"(rsi), "=d"(rdx)
             : "1"(foo), "2"(bar), "3"(n)
             : "rcx", "r8", "r9", "r10", "r11", "memory", "cc",
               "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6");

In other words the compiler assumes that, in calling the function,
fifteen separate registers and all memory will be overwritten. See
the System V ABI
for further details. This can be problematic for
frequently-called functions such as memcpy, since it inhibits many
optimizations and it tosses a wrench in the compiler register
allocation algorithm, thus causing stack spillage which further
degrades performance while bloating the output binary size.

So what Cosmopolitan does for memcpy() and many other
frequently-called core library leaf functions is define a simple
macro wrapper, which tells the compiler the correct subset of the abi
that’s actually needed, e.g.

#define memcpy(DEST, SRC, N) ({             \
  void *Dest = (DEST);                      \
  void *Src = (SRC);                        \
  size_t Size = (N);                        \
  asm("call memcpy"                         \
      : "=m"(*(char(*)[Size])(Dest))        \
      : "D"(Dest), "S"(Src), "d"(Size),     \
        "m"(*(const char(*)[Size])(Src))    \
      : "rcx", "xmm3", "xmm4", "cc");       \
  Dest;                                     \
})

What this means is that Cosmopolitan memcpy() is not simply fast, it
also makes unrelated code in the functions that call it faster too as
a side-effect. When this technique was first implemented for memcpy()
alone, many of the functions in the Cosmopolitan codebase had their
generated code size reduced by a third.

For an example of one such function, consider strlcpy,
which is the BSD way of saying strcpy:

/**
 * Copies string, the BSD way.
 *
 * @param d is buffer which needn't be initialized
 * @param s is a NUL-terminated string
 * @param n is byte capacity of d
 * @return strlen(s)
 * @note d and s can't overlap
 * @note we prefer memccpy()
 */
size_t strlcpy(char *d, const char *s, size_t n) {
  size_t slen, actual;
  slen = strlen(s);
  if (n) {
    actual = MIN(n - 1, slen);
    memcpy(d, s, actual);
    d[actual] = '\0';
  }
  return slen;
}
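
Because the function returns strlen(s) rather than the number of bytes copied, a caller can detect truncation by comparing the result against the buffer capacity. The following is a minimal sketch of that check, assuming only the standard C library. The names sketch_strlcpy and copy_fits are hypothetical and exist only so the example is self-contained; they are not part of Cosmopolitan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in following the contract documented above:
   copies at most n-1 bytes, always NUL-terminates when n > 0,
   and returns strlen(s). */
static size_t sketch_strlcpy(char *d, const char *s, size_t n) {
  size_t slen;
  assert(s != NULL);
  slen = strlen(s);
  if (n) {
    size_t actual = slen < n - 1 ? slen : n - 1;
    memcpy(d, s, actual);
    d[actual] = '\0';
  }
  return slen;
}

/* Hypothetical helper: returns 1 if s fit in d, 0 if it was truncated. */
static int copy_fits(char *d, const char *s, size_t n) {
  return sketch_strlcpy(d, s, n) < n;
}
```

With a four-byte buffer, copying "hello" leaves "hel" in the buffer and reports truncation, while copying "hi" fits.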

If we compile our strlcpy function, then here’s the
assembly code that the compiler outputs:

/ compiled with conventional libc
	push	%rbp
	mov	%rsp,%rbp
	push	%r14
	mov	%rsi,%r14
	push	%r13
	mov	%rdi,%r13
	mov	%rsi,%rdi
	push	%r12
	push	%rbx
	mov	%rdx,%rbx
	call	strlen
	mov	%rax,%r12
	test	%rbx,%rbx
	jne	1f
	pop	%rbx
	mov	%r12,%rax
	pop	%r12
	pop	%r13
	pop	%r14
	pop	%rbp
	ret
1:	cmp	%rbx,%rax
	mov	%r14,%rsi
	mov	%r13,%rdi
	cmovbe	%rax,%rbx
	mov	%rbx,%rdx
	call	memcpy
	movb	$0,0(%r13,%rbx)
	mov	%r12,%rax
	pop	%rbx
	pop	%r12
	pop	%r13
	pop	%r14
	pop	%rbp
	ret
	.endfn	strlcpy,globl
/ compiled with cosmopolitan libc
	mov	%rdx,%r8
	mov	%rdi,%r9
	mov	%rsi,%rdi
	call	strlen
	test	%r8,%r8
	je	1f
	cmp	%r8,%rax
	lea	-1(%r8),%rdx
	mov	%r9,%rdi
	cmova	%rax,%rdx
	call	MemCpy
	movb	$0,(%r9,%rdx)
1:	ret
	.endfn	strlcpy,globl

That is a huge improvement in generated code size. The above two
compiles used the same gcc flags and no changes to the code needed to
be made. All that changed was we used cosmopolitan.h (instead of the
platform c library string.h) which contains ABI specialization macros
for memcpy and strlen. It's a great example
of how merely choosing a better C library can systemically eliminate
bloat throughout your entire codebase.
