Josef “Jeff” Sipek

Exclusive Or Character

A couple of years ago I blogged about the CCS instruction in the Apollo Guidance Computer. Today I want to tell you about the XC instruction from the System/360 ISA.

Many ISAs have some sort of xor instruction. The 360 is no different. It offers several different xor instructions which differ in the type of operands that they operate on. In all cases, the operation they perform could be summarized as (using C syntax):

A ^= B;

That is one of the operands is used as both a source and a destination.

There are the boring X (reg ^= memory), XR (reg ^= reg), and XI (reg ^= immediate). Then there is XC which is what inspired this post. XC, or Exclusive Or Character, takes two memory locations and a length and performs what appears as byte by byte xor of the two buffers. (The hardware is smart enough to operate on bigger chunks of memory but the effect is as if it was done byte at a time.) In assembly XC looks like:

XC d1(l,b1),d2(b2)

The d are 12-bit unsigned displacements while the b specify the registers with the base address. For each of the operands the actual address is dX plus the value of the bX register. The l is a length field which encodes a length between 1 and 256.

To use more C pseudocode, XC does:

void XC(unsigned char *op1, size_t len, unsigned char *op2)
	while (len--) {
		*op1 ^= *op2;

(This pseudo code ignores the condition code calculation and exception generation which are not relevant to the discussion.)

This by itself is neat but not every exciting…until you remember that xor can be used to zero out a register. You can use XC to zero out up to 256 bytes of memory. It turns out this idiom is used pretty often in handwritten assembly, and compilers such as gcc even produce such instructions without any special effort on the programmer’s behalf.

For example, in HVF I have this line:

memset(&psw, 0, sizeof(struct psw));

Which GCC helpfully turns into (struct psw is 16 bytes in size):

xc      160(16,%r15),160(%r15)

When I first saw that line in the disassembly of HVF years ago, it blew my mind. It is elegant, fast thanks to the microarchitecture optimizations, and once you are used to the idiom it is clear about what it does. I hope your mind was as blown as mine. Till next time!


Radium-Schokolade (1925) — Radium-laced Chocolate. Sold as something to rejuvenate your organs by eating or drinking it.

Untraceable communication — guaranteed — New untraceable text-messaging system comes with statistical guarantees.

Normalization of Deviance in Software

Michigan Terminal System Archive — It is good to see Wikipedia article: MTS live on as a historical curiosity and hobbyist OS.

Researchers uncover JavaScript-based ransomware-as-service — Ransomware-as-a-service…sigh.

imap4 partial fetch request — Sadly, mutt still doesn’t have it. I really don’t enjoy waiting for a large attachment to get downloaded over a slow link just because I want to read the email body.

Mathematicians invent new way to slice pizza into exotic shapes — I am not sure how some of those new shapes can possibly work in the real life without the notches essentially splitting the slice into a pile of mush that cannot be held.

dis(1): support for System/370, System/390, and z/Architecture ELF bins

A few months ago, I came to the conclusion that it would be both fun and educational to add a new disassembler backend to libdisasm—the disassembler library in Illumos. Being a mainframe fan, I decided that implementing a System/390 and z/Architecture disassembler would be fun (I’ve done it before in HVF).

At first, I was targetting only the 390 and z/Architecture, but given that the System/370 is a trivial (almost) subset of the 390 (and there is a spec for 370 ELF files!), I ended up including the 370 support as well.

It took a while to get the code written (z/Architecture has so many instructions!) and reviewed, but it finally happened… the commit just landed in the repository.

If you get the latest Illumos bits, you’ll be able to disassemble 370, 390, and z/Architecture binaries with style. For example:

$ dis -F strcmp hvf             
disassembly for hvf

    strcmp:      a7 19 00 00        lghi    %r1,0
    strcmp+0x4:  a7 f4 00 08        j       0x111aec
    strcmp+0x8:  a7 1b 00 01        aghi    %r1,1
    strcmp+0xc:  b9 02 00 55        ltgr    %r5,%r5
    strcmp+0x10: a7 84 00 17        je      0x111b16
    strcmp+0x14: e3 51 20 00 00 90  llgc    %r5,0(%r1,%r2)
    strcmp+0x1a: e3 41 30 00 00 90  llgc    %r4,0(%r1,%r3)
    strcmp+0x20: 18 05              lr      %r0,%r5
    strcmp+0x22: 1b 04              sr      %r0,%r4
    strcmp+0x24: 18 40              lr      %r4,%r0
    strcmp+0x26: a7 41 00 ff        tmll    %r4,255
    strcmp+0x2a: a7 84 ff ef        je      0x111ae0
    strcmp+0x2e: 18 20              lr      %r2,%r0
    strcmp+0x30: 89 20 00 18        sll     %r2,%r0,24(%r0)
    strcmp+0x34: 8a 20 00 18        sra     %r2,%r0,24(%r0)
    strcmp+0x38: b9 14 00 22        lgfr    %r2,%r2
    strcmp+0x3c: 07 fe              br      %r14
    strcmp+0x3e: a7 28 00 00        lhi     %r2,0
    strcmp+0x42: b9 14 00 22        lgfr    %r2,%r2
    strcmp+0x46: 07 fe              br      %r14

I am hoping that this will help document all the places needed to change when adding support for a new ISA to libdisasm.

Happy disassembling!


Wikipedia article: Interchange — This definitely reminds me of xkcd: Highway Engineer Pranks. At the same time, it is fascinating how there is a whole set of standard interchanges.

DxOMark — Very in-depth reviews of SLR lenses and bodies.

Lisp as the Maxwell’s equations of software — Reading this has rekindled my interest in Lisp and Scheme.

This Man Has Been Trying to Live Life as a Goat

What’s going on with a Python assignment puzzle — As a C programmer, this is totally counter-intuitive to me.

Internet Mainframes Project — Screenshots of Wikipedia article: 3270 login screens of tons of internet facing mainframes.

The Case for Teaching Ignorance


I’m going to try something new. Instead of sharing individual links per post as I come across them, I’m going to try to dump them whenever I have enough of them. It does mean that some of these links aren’t as “hot off the press”. Here’s the first batch.

How We’re Predicting AI — or Failing To

How Typography Shapes Our Perception Of Truth

Bitcoin mining on a 55 year old IBM 1401 mainframe: 80 seconds per hash

What is the difference between an “aggregate” and other kinds of “modified versions”?

SourceForge grabs GIMP for Windows’ account, wraps installer in bundle-pushing adware

Wikipedia article: Speed tape looks like duct tape but isn’t.

Working with Wide Characters

Two weekends ago, I happened to stumble into a situation where I had a use for wide characters. Since I’ve never dealt with them before, it was an interesting experience. I’m hoping to document some of my thoughts and discoveries in this post.

As you may have guessed, I am using OpenIndiana for development so excuse me if I happen to stray from straight up POSIX in favor of Illumos-flavored POSIX.

The program I was working with happens to read a bunch of strings. It then does some mangling on these strings — specifically, it (1) converts these strings between Unicode and EBCDIC, and (2) at times it needs to uppercase a Unicode character. (Yes, technically the Unicode to EBCDIC conversion is lossy since EBCDIC doesn’t have all possible Unicode characters. Practically, the program only cares about a subset of Unicode characters and those all appear in EBCDIC.)

In the past, most of the code I wrote dealt with Unicode by just assuming the world was ASCII. This approach allows UTF-8 to just work in most cases. Assuming you don’t want to mangle the strings in any major way, you’ll be just fine. Concatenation (strcat), ASCII character search (strchr), and substring search (strstr) all work perfectly fine. While other functions will do the wrong thing (e.g., strlen will return number of bytes, not number of characters).

Converting an ASCII string to EBCDIC is pretty easy. For each input character (aka. each input byte), do a lookup in a 256-element array. The output is just a concatenation of all the looked up values.

This simple approach falls apart if the input is UTF-8. There, some characters (e.g., ö) take up multiple bytes (e.g., c3 b6). Iterating over the input bytes won’t work. One way to deal with this is to process as many bytes as necessary to get a full character (1 for ASCII characters, 2–6 for “non-ASCII” Unicode characters), and then covert/uppercase/whatever it instead of the raw bytes. This sort of hoop-jumping is necessary whenever one wants to process characters instead of bytes.


Another way to deal with this is to store the string as something other than UTF-8. I took this approach. When the program reads in a (UTF-8) string, it promptly converts it into a wide character string. In other words, instead of my strings being char *, they are wchar_t *. On my system, wchar_t is a 32-bit unsigned integer. This trivially makes all Unicode characters the same length — 32 bits. I can go back to assuming that one element of my string corresponds to a single character. I just need to keep in mind that a single character is not one byte. In practice, this means remembering to malloc more memory than before. In other words:

wchar_t *a, *b;

a = malloc(MAX_LEN);                   /* WRONG */
b = malloc(sizeof(wchar_t) * MAX_LEN); /* CORRECT */

Uppercasing a character becomes just as easy as it was with plain ol’ ASCII. For example, to uppercase the $i^{th}$ letter in a string:

void uppercase_nth(wchar_t *str, int i)
	str[i] = toupper(str[i]);

There are however some downsides. First and foremost, if you are dealing mostly with ASCII, then your memory footprint may have just quadrupled. (In my case, the program is so small that I don’t care about the memory footprint increase.) Second, you have to deal with a couple of “silly” syntax to make the (C99) compiler realize what it is you are attempting to do.

const wchar_t *msg = L"Lorem ipsum";
const wchar_t *letter = L'x';

“str” functions

Arguably, the most visible change involves the “str” functions. With plain old ASCII strings, you use functions like strlen, strcpy, and strcat to, respectively, get the length, copy a string, and concatenate two strings. These functions assume that each byte is a character and that the string is terminated by a null (8-bit 0) so they do not work in the world of wide characters. (Keep in mind that since ASCII consists of characters with values less than 128, a 32-bit integer with that value will have three null bytes in most characters (assuming ASCII text). On big endian systems, you’ll end up with the empty string, while on little endian systems you’ll end up with a string consisting of just the first character.) Thankfully, there are alternatives to the “str” functions that know how to deal with wide character strings — the “ws” functions. Instead of using strlen, strcpy, and strcat, you want to call wslen, wscpy, and wscat. There are of course more. On Illumos, you can look at the wcstring(3c) manpage for many (but not all!) of them.

printf & friends

Manipulating strings solely with the “str” functions is tedious. Often enough, it is so much simpler to reach for the venerable printf. This is where things get really interesting. The printf family of functions knows how to convert between char * strings and wchar_t * strings. First of all, let’s take a look at snprintf (the same applies to printf and sprintf). Here’s a simple code snippet that dumps a string into a char array. The output is char *, the format string is char *, and the string input is also char *.

char output[1024];
char *s = "abc";

snprintf(output, sizeof(output), "foo %s bar\n", s);

One can use %ls to let snprintf know that the corresponding input string is a wide character string. snprintf will do everything the same, except it transparently converts the wide character string into a regular string before outputting it. For example:

char output[1024];
wchar_t *s = L"abc";

snprintf(output, sizeof(output), "foo %ls bar\n", s);

Will produce the same output as the previous code snippet.

Now, what if you want the output to be a wide character string? Simple, use the wprintf functions! There are fwprintf, wprintf, and swprintf which correspond to fprintf, printf, and snprintf. Do note that the wide-character versions want the format string to be a wide character string. As far as the format string is concerned, the same rules apply as before — %s for char * input and %ls for wchar_t * input:

wchar_t output[1024];
wchar_t *s1 = L"abc";
char *s2 = "abc";

swprintf(output, sizeof(output), L"foo %ls %s bar\n", s1, s2);

Caution! In addition to swprintf there is also wsprintf. This one takes the format string in char * but outputs into a wchar_t * buffer.

Here’s the same information, in a tabular form. The input string type is always determined by the format string contents — %s for char * input and %ls for wchar_t * input:

Function Output Format string
printf, sprintf, snprintf, fprintf char * char *
wprintf, swprintf, fwprintf wchar_t * wchar_t *
wsprintf wchar_t * char *

setlocale and Summary

Oh, I almost forgot! You should call setlocale before you start using all these features.

So, to conclude, it is pretty easy to use wide character strings.

  • #include <wchar.h>
  • #include <widec.h>
  • call setlocale in your main
  • use wchar_t instead of char
  • use %ls in format strings instead of %s
  • use L string literal prefix
  • beware of wsprintf and swprintf

I wouldn’t want to deal with this sort of code on daily basis, but for a random side project it isn’t so bad. I do like the ability to not worry about the encoding — the 1:1 mapping of characters to array elements is really convenient.

Happy 50th, System/360

It’s been a while since I blahged about mainframes. Rest assured, I’m still a huge fan, I’m just preoccupied with other things to continuously extoll their virtues.

The reason I’m writing today is because it is the 50th anniversary of the System/360 announcement. Aside from the “50 years already?” sentiment, I have a couple of images to share. (I found these several years ago on someone’s GeoCities site. It’s a good thing I made a mirror :) )

I also came across this video from 1964:

zEnterprise 196

I’m rather late with this post, but here it goes anyway. In August 2010, IBM announced a new mainframe — the zEnterprise 196.

At first, I wasn’t sure if that was supposed to be a z10 replacement, or if it was an entry level system like the Wikipedia article: Multiprise was back in the ESA/390 days. It turns out that it is a z10 replacement.

The system looks like a z10 from the outside, on the inside it’s different of course.

The specs for a fully configured system are:

  • 80 customer cores @ 5.2GHz (96 cores total, the 16 extra ones take care of system stuff as well as provide spare cores for failover)
  • 3TB of memory (z10 has 1.5TB limit)

The big new thing is the addition of the zBX. Now, it is possible to have your mainframe hooked up to several POWER (and soon x86) BladeCenters working as one system. I don’t really know much about how this part of the system works, but it sure looks interesting.

z/VOS - running x86 code on z

Earlier this year, I heard of a company that tried to make a product out of dynamic binary translation of x86 code to Wikipedia article: z/Architecture. Recently, I decided to look at what they do.

The company is called Mantissa Corporation, and their binary translation product is called z/VOS.

Much like VMWARE, they cache the translated code, in z/VOS’s case it’s really a must otherwise I’d guess the cost of traslation would make the result unusable. I like how they used VNC (see the demo mentioned below) to give the virtual x86 box a display.

There is an official blog that has some interesting bits of information. For example, they hint at how they use multiple address spaces to give a the x86 code the illusion of virtual memory. I am not quite sure why they list Wikipedia article: Decimal Floating Point facility as a requirement. Unfortunately, it has been a few months since the last update.

Their website also happens to have a demo of a small x86 assembly operating system starting up and running under z/VOS. I find this fascinating.


Few days ago, a new company was created: TurboHercules.

As the name implies, they package up Hercules (an IBM mainframe emulator), and provide support for it. They are targetting the platform as a disaster recovery solution.

It shouldn’t directly affect the open source project in a negative way (just like Red Hat cannot prevent people from continuing their work on the Linux Kernel). At the same time, it’ll change the way people look at Hercules.

Powered by blahgd