Just arranging bits

By Paul Gobée

My job is arranging bits. Just: arranging bits. No more. You know: those 1's and 0's that computer stuff is made of. That's what my boss pays me for: organising 1's and 0's. Not that I go to my boss and deliver a pile of papers written all over with 11001010 10100100 10110101 00101010 01010010 10011000 10010110 01001001, etc. Nor do I organise all these bits by hand. I use code editors, image editors and text editors, and what I deliver are web pages, images, texts, etc. To the superficial observer it might seem that I develop web-based medical e-learning apps. But basically, what I deliver is just a certain organisation of bits (on a hard disk, on a flash drive, etc.). Not only my job, but billions of people's jobs consist of arranging bits. Yours too, probably. Office workers, writers, photographers, computer artists, movie stars, journalists, IT workers, recording musicians: we all share the same job, arranging bits. Doesn't seem too fancy or 'wow', does it?

Some work is far more impressive. My father is a civil engineer. He built harbours around the world. Big stuff, really big stuff. Huge slabs of rock-solid concrete. You can see his work from great heights on Google Earth, decades after it was built. That is real stuff. Not just some volatile arrangement of magnetic particles invisible to the eye.

Nevertheless, arrangements of little units are more powerful than you might think. All written texts and books are, in fact, merely arrangements of little units called letters. To take it a level deeper: all of life is based on an arrangement of little units. Not bits this time, but molecules called 'bases', abbreviated as the letters A, T, C and G. These are arranged in long strings like ATTCTCTCGACCAGCTCGTTACGTACCGTACGTT... (imagine this continued for some thousands of pages). A thrilling read? You bet! These strings are DNA, which determines how every living thing, from the bacterium that gives you a stomach ache (couldn't they have skipped that one?) up to us humans, looks, functions and, to a certain extent, behaves.

Bits, bases and letters

It's fascinating to see the parallels between these 'languages' of texts, nature and computers, and to see how their worlds are even coming closer together. The basic units of their 'alphabets' are letters, bases and bits, respectively. The alphabet we write in uses 26 different letters; DNA has only the four bases mentioned, A, T, C and G; and the digital world cuts the variation down to what is probably the absolute minimum: the two states of a bit, 0 and 1. Interestingly, both bits and bases are abbreviated to "b", so DNA lengths are expressed in units that look much like the ones you know from your computer: kb, Mb, Gb. A gene may be said to be so many kb long. Human DNA is 'written' in some 3 billion bases, so its size is about 3 Gb. To put that in context, that is as many bases as there are letters in the pile of telephone books in the photo. Put differently, the same number of bytes fits on an average 2009 USB flash drive. Yet we carry DNA with that many bases in nearly every one of our cells! Nature still wins by a long way at miniaturisation...
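Because DNA has only four base variants, each base can in principle be stored in just 2 bits, which ties the bases-and-bits comparison together. The minimal Python sketch below packs a DNA string four bases to a byte. The A=00, C=01, G=10, T=11 mapping and the helper names `pack` and `unpack` are illustrative assumptions, not an established file format.

```python
# A minimal sketch: four base variants fit in 2 bits each.
# The mapping below is an arbitrary choice for this example only.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {v: k for k, v in BASE_TO_BITS.items()}


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk so unpack reads it from the top bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])  # drop padding from the last byte


seq = "ATTCTCTCGACCAGCTCGTTACGTACCGTACGTT"
packed = pack(seq)
assert unpack(packed, len(seq)) == seq
print(len(seq), "bases ->", len(packed), "bytes")  # 34 bases -> 9 bytes

# Scale up: 3 billion bases at 2 bits each, expressed in megabytes.
genome_bases = 3_000_000_000
print(genome_bases * 2 // 8 / 1e6, "MB")  # 750.0 MB
```

At 2 bits per base, the 3 billion bases come to about 750 MB, a quarter of the 3 GB needed when each base takes one byte like a letter. Real genomics tools use their own conventions, so treat this mapping as a toy.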


This pile of telephone books contains 3 billion letters, as many as there are bases in each human cell's DNA
Courtesy of the Museum of Communication, The Hague, for letting me use its collection of telephone books to make this photo.


The power of arrangements

The power of arrangements of letters or bits lies, of course, not in the literal letters, but in the meaning their arrangement conveys in real life. Let's again consider the three forms mentioned: texts, DNA and computer bits. Texts bring us thoughts and knowledge. They may even influence entire societies; think of holy books such as the Bible or the Qur'an. Definitely powerful. Next, DNA. It steers all life; need more be said? And computer bits may be transformed into meaningful things like pictures, music, video, prints and websites. Half of our present world 'runs' on bits.

Looking to the future, computer bits may literally gain a new dimension. When we think of prints, we usually think of 2D prints, but 3D prints are coming. Most 3D printers are still very expensive, but cheaper prototypes are already being built for a few hundred dollars. 3D printing means you really print things. It might, for instance, allow you to finally get a replacement for that broken lid of your beloved but long out-of-production teapot. Go on the web, download the 'drawing' and print it! To take it even further: experiments are under way with printers that print human tissue. Imagine the future. Need a new heart valve? Your doctor will print one for you! Okay, we're not there yet; a couple of techniques, such as stem cell techniques and tissue engineering, still need to be worked out further, but it's not unimaginable. The 'recipes' for the 3D prints and for that printed human tissue are, once again, stored as some arrangement of bits. See how powerful arrangements are?

It is interesting to see how these types of arrangements may mix. The 'recipe' for the printed human tissue in the previous paragraph (at least for its structural organisation) is written in bits. Until now, the 'recipes' for human tissues were written only in DNA. DNA and bits are coming close...

Of the three types of carriers of arrangements (letters, bases, bits), bits are probably the most powerful: they are the most abstract, reduced to only two variants (1 and 0); they can be stored on rewritable media; and they can be sent across the world in a split second. That is why the other arrangements, such as texts, are 'translated' into bits.
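The 'translation' of text into bits can be sketched in a few lines of Python. This minimal example assumes UTF-8 as the agreed encoding (for plain English letters it produces the same bytes as ASCII) and simply writes out each byte as eight 0/1 digits; `text_to_bits` is a hypothetical helper name.

```python
def text_to_bits(text: str) -> str:
    """Encode text as UTF-8 and write each byte as eight 0/1 digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


print(text_to_bits("bits"))  # 01100010 01101001 01110100 01110011

# A letter outside plain ASCII needs more than one byte in UTF-8.
print(text_to_bits("é"))     # 11000011 10101001
```

Once text is in this form, it can be stored, copied and transmitted like any other arrangement of bits.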

Does your meaning mean the same to me?

So it's clear how powerful arrangements of bits are. They can store, transport and distribute ideas, knowledge, ideologies, pictures, sound and film; in the longer term, material objects; and eventually perhaps even scraps of living matter. Impressive! But beware: there are threats and pitfalls along the way. To see them, we have to go back to basics for a moment. Ones and zeros in themselves don't mean anything. We must agree upon meanings for certain combinations of ones and zeros. Should 01100100 01101111 01100111 mean 'dog' or 'cat'? (The answer is 'dog', at least in the widely used ASCII encoding.) It works just as in English, where we have agreed that the letters 't, r, a, i, n' together mean 'a machine that comes late but brings you somewhere'. A whole set of such agreements gives you a language. The computer-world equivalent is an encoding or a computer language.

Ah! Here comes problem number one. In practice, we don't all give the same combination of ones and zeros (or of larger clusters of them) the same meaning. Sometimes because we think we can do it smarter than someone else. Sometimes because we don't want someone else to read our secrets. Or just because we didn't know someone else had already made a similar language. But if you speak English and I speak Chinese, we have a problem... In computer terms: I've got my great Mac application, but we happen to have only Windows machines here. Oops... we don't speak each other's languages!

Extinct languages are the next problem: if all my books are written in, let's say, Kwasanopi, but at a certain moment nobody knows Kwasanopi any more, the content of my books is gone. A digital analogue: the original recordings of the Apollo spaceflights reportedly can no longer be read, because the machinery that could handle the encoding in which they were stored has been lost.
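The 'dog' question can be checked directly. The short Python sketch below reads the quoted bit string as three bytes and then applies different 'agreements' to those same bytes: ASCII text, one big-endian integer, and a mismatch between UTF-8 and Latin-1. These particular encodings are chosen only for illustration.

```python
# The bit string quoted in the text, as three groups of eight bits.
bits = "01100100 01101111 01100111"
data = bytes(int(group, 2) for group in bits.split())

# Agreement 1: read each byte as an ASCII character code.
print(data.decode("ascii"))          # dog

# Agreement 2: read the same three bytes as one big-endian integer.
print(int.from_bytes(data, "big"))   # 6582119

# Two agreements that overlap but differ: text written in UTF-8
# and read back as Latin-1 comes out garbled.
print("café".encode("utf-8").decode("latin-1"))  # cafÃ©
```

The bits never change; only the agreement used to read them does.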

Let's speak the same language!

To prevent these problems, the solution is to agree upon standards for encodings. Or, simply put, to agree: 'To say this, use that word.' The web would not have become the success it is now without the standards that underlie it and are used across the world: HTTP, TCP/IP and HTML. Imagine the mess if there had been five 'languages' for each of these three standards. Browsing the web would then require 5 × 5 × 5 = 125 browsers, each of which could access only about 1/125th of all websites (assuming sites were spread evenly over the combinations). For each site you would have to figure out which browser to use, and you would be switching browsers all the time. Not an easy ride... No thank you: let's speak the same language, and standardise it.
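The browser count is simply the number of combinations of three independent choices. As a small illustration, assuming five hypothetical competing variants of each standard (the variant names below are made up), Python's `itertools.product` enumerates every combination a browser would have to target:

```python
from itertools import product

# Hypothetical: five competing variants for each of the three standards.
protocols = [f"http-{i}" for i in range(1, 6)]
networks = [f"tcpip-{i}" for i in range(1, 6)]
markups = [f"html-{i}" for i in range(1, 6)]

# Each distinct combination would need its own browser.
combinations = list(product(protocols, networks, markups))
print(len(combinations))               # 125

# If sites were spread evenly over the combinations,
# one browser would reach only this fraction of them.
print(f"{1 / len(combinations):.3%}")  # 0.800%
```

Every extra competing variant multiplies the count rather than adding to it, which is why a single shared standard pays off so strongly.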

In daily life, a language may be regarded as a standard either because everyone has agreed upon it or because everyone uses it. In the category 'agreed upon' are standards officially released by international standardisation organisations, like the web standards mentioned above. In the category 'used by everyone' are encodings that have grown to 'semi-standard' status through widespread use, for instance the file formats of your Word documents or PowerPoint presentations. Does the difference matter? As long as there's one widely used encoding, everything works, so we're happy. Sure?

Encoding is power

Realise how powerful the party that owns such a widely used encoding or standard is. Everyone depends on its programs to write (encode) and read (decode) materials. It can charge what it likes. If it declares, "Sorry, your file can't be read with the old program anymore; the old program has become unsafe, etc.; you'll need to buy a new program", then you'll have no choice but to get the new program, because years of your work are stored in its encoding. Competitors hardly stand a chance: who would want a program that can't read the majority of documents around because their encoding is secret? These dependencies are just the start. Control of code may ultimately allow control over access to information and goods, and over the ability to create content, to publish and to communicate. Furthermore, a party owning a standard may determine how the standard (and its possibilities) evolves. As long as it's only office documents, well... But if it's encodings of human tissues or medicines, matters of life and death... And things would become outright dangerous if a malign regime gained control over a ubiquitous encoding. Hence, what matters is not only that we speak the same language, but also who owns that language. Here lies the flaw in the analogy with natural languages, which wrongly leads us to think that an encoding is fine as long as it is widely used: natural languages are open and owned by nobody. Computer encodings are not necessarily. In the digital age, the long-standing credo 'knowledge is power' has thus turned into 'encoding is power'.

Open Standards and Open Source

Hence, probably even more important than having standards is that they are open. At least for vital things in life and society, standards for encodings shouldn't be secret and shouldn't be owned by either a specific company or a government. They should be owned by nobody or, put differently, by everybody. Just like the alphabet. Anyone may learn the alphabet, anyone may use it (to write something), and no one has to pay anybody for doing so. Some great things come for free without us ever realising... Bet you never realised how great it is that huge investments (education) are made to teach everybody the alphabet, and that we may nevertheless use that vital tool to communicate for free. Hmmm, it would be great business, though, if everybody owed me a cent for each written or spoken word...

Closely related to the encoding of stored data is the code of the programs that make these data do something useful: the so-called source code of software. If that code is secret, you're still at the mercy of the holder of the key. If, however, you can see what happens under the hood and are also allowed to adapt it to your needs, you have Open Source Software. Open Standards and Open Source may be regarded as brothers. In the digital age, code is the major carrier of value. Open Standards and Open Source ensure access to this value and are hence vital for a fair sharing of opportunities, power and wealth. Let's take good care to foster them.

My job is arranging bits. They're mighty wonderful stuff.


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license; see: http://creativecommons.org/licenses/by-nc-sa/3.0/