| |
ChristopherJam
Registered: Aug 2004 Posts: 1427 |
On the relative pros and cons of various RLE schemes
So over in Native crunch/decrunch code, I just suggested encoding
abbcdddeffffg as
abb{0}cdd{1}eff{2}g
Raistlin kindly liked the idea, Krill quite sensibly didn't want us to hijack the post and suggested a new post for the new topic. so... Have at it! |
|
| |
ChristopherJam
Registered: Aug 2004 Posts: 1427 |
(oh, addendum - whenever the count maxes out, there's no literal following, just another count - which could be zero. This also makes it quite easy to play with different count widths (e.g. nybbles or bitpairs) without going down the Huffman or arbitrary-precision-integer routes) |
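A rough Python sketch of the scheme as described above (purely illustrative, not anyone's actual packer; it assumes byte-wide counts, with a saturated count meaning "maximum run, another count follows" as per the addendum):

def rle_encode(data, maxcount=255):
    # After every pair of identical bytes, a count of additional repeats follows.
    out = bytearray()
    i = 0
    while i < len(data):
        out.append(data[i])
        if i + 1 < len(data) and data[i + 1] == data[i]:
            out.append(data[i])
            run = 2
            while i + run < len(data) and data[i + run] == data[i]:
                run += 1
            extra = run - 2
            while extra >= maxcount:        # saturated count: another count follows
                out.append(maxcount)
                extra -= maxcount
            out.append(extra)
            i += run
        else:
            i += 1
    return bytes(out)

def rle_decode(src, maxcount=255):
    out = bytearray()
    i = 0
    prev = None
    while i < len(src):
        b = src[i]; i += 1
        out.append(b)
        if b == prev:                       # second byte of a pair: count(s) follow
            while True:
                c = src[i]; i += 1
                out.extend([b] * c)
                if c != maxcount:
                    break
            prev = None                     # don't re-trigger on the run just emitted
        else:
            prev = b
    return bytes(out)

rle_encode(b"abbcdddeffffg") gives a b b 0 c d d 1 e f f 2 g, exactly as in the example above.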
| |
Krill
Registered: Apr 2002 Posts: 3103 |
As RLE isn't very crunchy to begin with, i'd suggest focusing on other things affecting performance,
such as whether the scheme allows for stream encoding (entire input not known beforehand) for a local optimum within this scheme,
whether or not it's a bit- rather than bytestream,
and how big a 6502 decompressor would be. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1427 |
I must admit, I came up with the scheme above partly because I got most of the way through implementing a cruncher for the usual escape byte based scheme and thought there must be a simpler way - so if anything I was (vaguely) optimising for code complexity - which presumably maps to code size to some extent. |
| |
tlr
Registered: Sep 2003 Posts: 1822 |
Quoting ChristopherJam: So over in Native crunch/decrunch code, I just suggested encoding
abbcdddeffffg as
abb{0}cdd{1}eff{2}g
Isn't this going to be expensive for certain inputs containing lots of byte pairs? |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1427 |
That depends how expensive your encoding for a zero additional repeat count is. But sure, every compression scheme is vulnerable to certain inputs getting bigger instead of smaller; otherwise you could just keep recrunching to get arbitrarily small outputs. |
| |
tlr
Registered: Sep 2003 Posts: 1822 |
But here you'll be vulnerable to something not too uncommon. I think for a run count represented with a regular byte, the escape code method is going to be smaller than both this and the lit/run chunk with count method for almost all normal C64 input. |
| |
CyberBrain Administrator
Posts: 392 |
Quote: So over in Native crunch/decrunch code, I just suggested encoding
abbcdddeffffg as
abb{0}cdd{1}eff{2}g
Raistlin kindly liked the idea, Krill quite sensibly didn't want us to hijack the post and suggested a new post for the new topic. so... Have at it!
That looks like the encoding scheme from the codebase64 article? (https://codebase64.org/doku.php?id=base:rle_pack_unpack)
Yes, that way of encoding it wastes one byte for each 2-byte repeat (starts saving bytes for 4-byte repeats and longer), so could be bad if there are too many of those in the input data
(I actually used that one in my latest demo to get room for more pics in memory, and this ultra simple RLE-based packing, together with delta-encoding, worked surprisingly (to me) well - it saved multiple $100s of bytes off some pics (they did have large background areas, but still), so the 2-repeats doesn't have to be a problem)
It could be interesting to hear about the other ways of representing the bytes of RLE-encoding, which are used on the C64.
For example, what is this "the usual escape byte based scheme" you guys have mentioned in this and the other thread? And the counter representation? |
| |
tlr
Registered: Sep 2003 Posts: 1822 |
Quoting CyberBrain: It could be interesting to hear about the other ways of representing the bytes of RLE-encoding, which are used on the C64.
For example, what is this "the usual escape byte based scheme" you guys have mentioned in this and the other thread? And the counter representation?
escape byte (typical example):
<byte> - emit byte
<esc> <count> <byte> - emit count * byte
<esc> $00 - emit 'esc' byte
The esc byte is selected as the least used byte in the data (with some optimizations possible).
lit/run chunk (typical example):
$01-$7f, 1-127 * <byte> - emit 1-127 literal bytes
$81-$ff, <byte> - emit 1-127 * byte
$00 - end of stream |
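For reference, minimal Python decoders for the two representations above (illustrative sketches of the typical examples, not any specific packer's exact format):

def decode_escape(src, esc):
    out = bytearray()
    i = 0
    while i < len(src):
        b = src[i]; i += 1
        if b != esc:
            out.append(b)                        # <byte> - emit byte
        elif src[i] == 0:
            out.append(esc); i += 1              # <esc> $00 - emit the esc byte itself
        else:
            count, byte = src[i], src[i + 1]     # <esc> <count> <byte>
            out.extend([byte] * count); i += 2
    return bytes(out)

def decode_litrun(src):
    out = bytearray()
    i = 0
    while True:
        c = src[i]; i += 1
        if c == 0:                               # $00 - end of stream
            return bytes(out)
        if c < 0x80:                             # $01-$7f: c literal bytes follow
            out.extend(src[i:i + c]); i += c
        else:                                    # $81-$ff: repeat next byte (c - $80) times
            out.extend([src[i]] * (c - 0x80)); i += 1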
| |
Krill
Registered: Apr 2002 Posts: 3103 |
Quoting tlr: But here you'll be vulnerable to something not too uncommon. I think for a run count represented with a regular byte, the escape code method is going to be smaller than both this and the lit/run chunk with count method for almost all normal C64 input.
As RLE isn't that crunchy anyways, performance should be the main consideration (where crunchiness plays a rather weak role, but it still has some impact).
Some test corpus wouldn't hurt, ofc. Maybe the good old Bitfire benchmark files? |
| |
CyberBrain Administrator
Posts: 392 |
<Post edited by CyberBrain on 22/4-2024 23:52>
Cool, thank you, tlr!
Ah, ok, i can see how the "Escape byte" representation doesn't have to waste anything for 2-byte repeats (but requires an extra pass over the input data to find the least used byte when packing).
If I understand the "lit/run chunk" representation correctly, it looks like it wastes a byte on each literal chunk (so at least 2 bytes per 256 packed bytes, as far as i can tell). Looks like this could struggle a bit if there are many short repeats mixed in between non-repeating data. Edit: No, it couldn't - fake news.
Those are the 3 standard representations on C64? |
| |
tlr
Registered: Sep 2003 Posts: 1822 |
Quoting CyberBrain: Those are the 3 standard representations on C64?
I'd say escape byte is by far the most common. I've seldom seen the lit/run chunk, but it's used in G-Packer V1.0 and also in Amiga IFF ILBM files IIRC. I've never seen the one cjam spoke of in the wild. |
| |
Bansai
Registered: Feb 2023 Posts: 54 |
In the absence of an escape byte that needs to be consulted for every input byte to drive decoding decisions, you could still have a *mode* escape of sorts, where a reserved repeat-count value or a signaling bit in the length indicates something like "directly copy [arbitrary number of] bytes from source to dest", or some form of encoded RLE skip/copy count follows the reserved value. This puts more complexity in the compressor, but the decode is still fairly straightforward.
In the input stream, where XX denotes a repeated byte value:
XX XX RC
if RC is positive, emit f(RC) more repeats of XX (for example, RC+1 as the repeat count); this could grow to include larger repeat counts to handle long runs of zeros, etc.;
otherwise, directly copy a number of bytes that is a function of -RC. |
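A small Python sketch of one way to read that (my interpretation only: RC as a signed byte, a doubled byte always followed by RC, and the exact f(RC) mapping left as a design choice):

def decode_pairmode(src):
    out = bytearray()
    i = 0
    while i < len(src):
        b = src[i]; i += 1
        out.append(b)
        if i < len(src) and src[i] == b:    # XX XX: a repeat/copy count RC follows
            out.append(b); i += 1
            rc = src[i]; i += 1
            if rc < 0x80:                   # RC non-negative: emit f(RC) more repeats of XX
                out.extend([b] * (rc + 1))  # here f(RC) = RC + 1, per the example above
            else:                           # RC negative: copy -RC bytes straight through
                n = 0x100 - rc
                out.extend(src[i:i + n]); i += n
    return bytes(out)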
| |
Raistlin
Registered: Mar 2007 Posts: 785 |
Isn’t it more common to use a bit to define “constant or not”?
$00-7F .. count of non-repeats ($01-$80)
$80-FF .. count of repeats ($04-$83)
There’s no point of course in having 1-2 repeats. I’m not sure there’s much point in 3 either, definitely not if breaking from a non-repeat segment to present them?
This also makes the depack code a little simpler as you simply check with BPL/BMI. |
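A quick Python model of that layout (illustrative; literal count = value + 1 for $00-$7f, run length = value - $80 + 4 for $80-$ff, matching the ranges above - on the 6502 the type test is indeed just a BPL/BMI):

def decode_topbit(src):
    out = bytearray()
    i = 0
    while i < len(src):
        c = src[i]; i += 1
        if c < 0x80:                         # $00-$7f: (c + 1) literal bytes follow
            n = c + 1
            out.extend(src[i:i + n]); i += n
        else:                                # $80-$ff: run of (c - $80 + 4) copies of the next byte
            n = (c & 0x7f) + 4
            out.extend([src[i]] * n); i += 1
    return bytes(out)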
| |
Fungus
Registered: Sep 2002 Posts: 756 |
I did a lot of RLE work in the past, and I found that output that gave the most equal sequences was best for crunching afterwards. Of course there are many different packer implementations but almost all of them either use one or the other method.
For my own I chose to use scan codes, using the least used byte in the file as the code to have minimum expansion. Some packers used more than one code to represent hard coded lengths which were scanned for in order to shorten the encoding.
Examples like
(code)(byte) for runs of 3
or
(code)(byte) where the code was a lookup table for various lengths
or simply multiple codes, but that always added more expansion.
I chose to limit runs to no more than 256 bytes, so for a length of say 1023 bytes the output would be
(code)(byte)(00)(code)(byte)(00)(code)(byte)(00)(code)(byte)(ff)
This gave an output that crunched better than other packers which would give
(code)(01=16bit)(lengthlow)(lengthhi)(byte)
or similar...
depends on use case of course.
I never did look into super zipper v8 or similar to see what they were doing though. |
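A rough Python sketch of that capped-run encoding (my reading of the description above: runs of 4 or more, or any occurrence of the escape value, become (code)(byte)(length) triples, with a length byte of $00 standing for 256 so long runs chain into several triples):

def encode_capped(data, esc):
    out = bytearray()
    i = 0
    while i < len(data):
        val = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == val:
            run += 1
        if run >= 4 or val == esc:          # worth escaping (or forced, if the value is the esc code)
            remaining = run
            while remaining > 0:
                chunk = min(remaining, 256)
                out += bytes([esc, val, chunk & 0xff])   # $00 length byte = 256
                remaining -= chunk
        else:
            out.extend([val] * run)         # short runs stay literal
        i += run
    return bytes(out)

A 1023-byte run then comes out as (code)(byte)(00)(code)(byte)(00)(code)(byte)(00)(code)(byte)(ff), as above.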
| |
ChristopherJam
Registered: Aug 2004 Posts: 1427 |
Quoting CyberBrain: That looks like the encoding scheme from the codebase64 article? (https://codebase64.org/doku.php?id=base:rle_pack_unpack)
Oh! Yes, it's almost exactly that, except I didn't think to reserve one count value as an end-of-data marker.
Quote: Yes, that way of encoding it wastes one byte for each 2-byte repeat (starts saving bytes for 4-byte repeats and longer), so could be bad if there are too many of those in the input data
Yeah it's a size loss if there are more 2-byte repeats than occurrences of a potential escape byte value (assuming counts are stored as whole bytes of course). But the implementation is nice and simple.
Quote:(I actually used that one in my latest demo to get room for more pics in memory, and this ultra simple RLE-based packing, together with delta-encoding, worked surprisingly (to me) well - it saved multiple $100s of bytes off some pics (they did have large background areas, but still), so the 2-repeats doesn't have to be a problem)
Nice!
Quote:It could be interesting to hear about the other ways of representing the bytes of RLE-encoding, which are used on the C64.
For example, what is this "the usual escape byte based scheme" you guys have mentioned in this and the other thread? And the counter representation?
Oh, things like (esc)(byte)(repeat count), where (esc) is the least commonly occurring byte in the source data. It does mean that any occurrences of the escape byte are inflated.
As for counter representation, most RLE implementations use a whole byte for the repeat count, but you can do things like read a smaller number of bits to make the repeat counts more compact, much as that slows decrunch a bit (you still keep the literals byte aligned of course). I do like that idea Fungus mentioned of having a few different escape codes for common repeat counts. |
| |
Fungus
Registered: Sep 2002 Posts: 756 |
In the case of my packer I used esc,esc to represent a literal. It still caused expansion, but it also was a common sequence that would be crunched by whatever lz compressor I was using at the time, either byteboiler/level ab2, or level crusher.
The esc,esc worked for the other encoding way too.
Lightemizer and Oneway's packers would be interesting to look into these days. Back then I was trying to understand algos and make my own rather than using other people's code. But you don't get accused of ripping anymore so looking at stuff is OK these days. |
| |
Krill
Registered: Apr 2002 Posts: 3103 |
Quoting Fungus: But you don't get accused of ripping anymore so looking at stuff is OK these days.
Ripping and looking at other people's code are different things, though. =) |
| |
The Human Code Machine
Registered: Sep 2005 Posts: 114 |
Here's an example of a byte packer using different pack codes: https://codebase64.org/doku.php?id=base:mdg_bytesmasher_0.04&s[.. |
| |
JackAsser
Registered: Jun 2002 Posts: 2038 |
Quote: That depends how expensive your encoding for a zero additional repeat count is. But sure, every compression scheme is vulnerable to certain inputs getting bigger instead of smaller; otherwise you could just keep recrunching to get arbitrarily small outputs.
Like Dinosours? |
| |
Krill
Registered: Apr 2002 Posts: 3103 |
Quoting JackAsser: Like Dinosours?
Dinasours? Which magical infinite cruncher are you referring to? =) |
| |
chatGPZ
Registered: Dec 2001 Posts: 11540 |
did someone mention packers.txt yet? |
| |
tlr
Registered: Sep 2003 Posts: 1822 |
Quoting RaistlinIsn’t it more common to use a bit to define “constant or not”?
$00-7F .. count of non-repeats ($01-$80)
$80-FF .. count of repeats ($04-$83) Do you have any other examples of packers implementing this? I haven't seen that many.
I think the reason it's uncommon (on the C64) because it doesn't pack as well, and size was probably the main driver for the development of packers/crunchers. Decompression speed came much later. |
| |
Krill
Registered: Apr 2002 Posts: 3103 |
Quoting tlr: I think the reason it's uncommon (on the C64) is because it doesn't pack as well, and size was probably the main driver for the development of packers/crunchers. Decompression speed came much later.
I think it's uncommon because everybody just copied everybody else's approach. :)
RLE used to be LZ-crunched anyways, so RLE compression ratio wasn't much of a concern.
How are you so sure this approach (generally?) packs worse than the escape-byte approach? |
| |
tlr
Registered: Sep 2003 Posts: 1822 |
Quoting Krill: RLE used to be LZ-crunched anyways, so RLE compression ratio wasn't much of a concern.
Initially, RLE seemed to have been used by itself.
Quoting Krill: How are you so sure this approach (generally?) packs worse than the escape-byte approach?
I'm thinking because only 128-ish-byte runs are allowed + only 128-ish-byte literal blocks. For a typical C64 program (demo?) there would be longish runs of empty space, interleaved with longish runs of code and data.
But maybe I'm assuming too much? Would be interesting to see some practical measurements. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11540 |
Quote: For a typical C64 program (demo?) there would be longish runs of empty space, interleaved with longish runs of code and data.
That's why people used "linkers" before packing :) But even if not, the difference shouldn't be more than a couple bytes on a typical program - and those bytes would form a nice sequence that can be crunched later. |
| |
Fungus
Registered: Sep 2002 Posts: 756 |
Probably the most interesting is Super-Zipper V8 |
| |
WVL
Registered: Mar 2002 Posts: 924 |
Quote: Quoting RaistlinIsn’t it more common to use a bit to define “constant or not”?
$00-7F .. count of non-repeats ($01-$80)
$80-FF .. count of repeats ($04-$83) Do you have any other examples of packers implementing this? I haven't seen that many.
I think the reason it's uncommon (on the C64) because it doesn't pack as well, and size was probably the main driver for the development of packers/crunchers. Decompression speed came much later.
Here is an example : https://csdb.dk/release/?id=34685. |
| |
JackAsser
Registered: Jun 2002 Posts: 2038 |
Quote: Quoting JackAsser: Like Dinosours?
Dinasours? Which magical infinite cruncher are you referring to? =)
Check any Dinosours release and you'll get the point. :D |
| |
Bitbreaker
Registered: Oct 2002 Posts: 510 |
Quoting Krill
RLE used to be LZ-crunched anyways, so RLE compression ratio wasn't much of a concern.
Which is kind of hard to understand, as LZ includes RLE by design (a match with offset 1) |
| |
Krill
Registered: Apr 2002 Posts: 3103 |
Quoting JackAsser: Check any Dinosours release and you'll get the point. :D
I was aware that Dinasours pack everything 10 times for comedic effect,
but i thought maybe they've released the Airwolf Fixer of crunchers or something and i've missed that. :) |
| |
Krill
Registered: Apr 2002 Posts: 3103 |
Quoting Bitbreaker: Quoting Krill:
RLE used to be LZ-crunched anyways, so RLE compression ratio wasn't much of a concern.
Which is kind of hard to understand, as LZ includes RLE by design (a match with offset 1)
Native LZ crunchers used to require quite a bit of working memory, so the maximum size of input programs was often smaller than for entirely uncompressed programs.
Thus RLE first, then LZ. Also this two-pass approach can make for better overall compression ratio. |
| |
Jetboy
Registered: Jul 2006 Posts: 376 |
Quoting Krill: Thus RLE first, then LZ. Also this two-pass approach can make for better overall compression ratio.
Plus, LZ ones were rather slow, so prepacking with RLE first could save substantial amounts of time, as a shorter file tends to pack quicker. |
| |
enthusi
Registered: May 2004 Posts: 679 |
Just to add an alternative which is a bit special but cute:
Maniac Mansion and ZMK use some RLE depacker that resides in ZP and encodes most-common bytes, which I felt works rather well for graphics. I wrote something about that here:
https://www.pagetable.com/?p=603
A byte of < $40 = count of raw bytes following
$3f to < $80 represents the length of the run of the byte to follow (+$3f of course).
$7f to < $a0 is the number of runs of most common byte 1,
$9f to < $c0 and $bf to <$e0 and $df-$ff for common byte 2, 3 and 4 respectively. |
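Read literally, those ranges suggest a decoder roughly like this in Python (an interpretation only; common holds the four most common bytes taken from the header, and the exact length offsets may differ from the original ZP code):

def decode_mm(src, common):                # common = the 4 most common bytes
    out = bytearray()
    i = 0
    while i < len(src):
        c = src[i]; i += 1
        if c < 0x40:                       # < $40: that many raw bytes follow
            out.extend(src[i:i + c]); i += c
        elif c < 0x80:                     # $40-$7f: run of the next byte, length c - $3f
            out.extend([src[i]] * (c - 0x3f)); i += 1
        else:                              # $80-$ff: run of one of the 4 common bytes
            idx = (c - 0x80) >> 5          # which common byte (0..3)
            run = ((c - 0x80) & 0x1f) + 1  # run length 1..32
            out.extend([common[idx]] * run)
    return bytes(out)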
| |
ChristopherJam
Registered: Aug 2004 Posts: 1427 |
ooh, I like that one enthusi.
Really interesting, the historical divide between packers and crunchers, especially the way intros tended to be displayed between one and the other. It does indeed reduce the impact of run-length limits considerably. Not sure why I ever bothered trying to improve the ratio of my higher ratio crunchers for empty spaces now :P |
| |
Martin Piper
Registered: Nov 2007 Posts: 740 |
In the old days, dictionary based compression would be quite slow and memory hungry. The usual way of speeding it up would be to limit the distance searched and limit the dictionary.
Using a RLE compression pass before a dictionary compression stage would reduce the total input data, it would reduce the strain on the dictionary, and would pass some benefit along to the dictionary based compression.
As dictionary algorithms improve, and as time and memory become less of an issue, the dictionary size can be increased, and this means there is less benefit to shrinking the input data with an RLE pre-pass. |
| |
Fungus
Registered: Sep 2002 Posts: 756 |
Main reason for packing first was that before crunchers had REU versions with 2MHz or 8MHz support, they were slow as hell and took many hours. So smaller input files crunched way faster. Also crunching large files tended to bug out. They generally were smaller too, so it became the norm. |
| |
Starfox
Registered: Jul 2014 Posts: 45 |
I researched a lot about RLE a year ago.
Forgot most of it, lol!
But you can preprocess the data before run-length encoding it, to make the RLE more successful.
There were a couple of schemes used, which were probably mostly optimized for text. Burrows-Wheeler Transform for instance.
I have my notes at home, let me see if there's anything useful. |
| |
Martin Piper
Registered: Nov 2007 Posts: 740 |
Quote: I researched a lot about RLE a year ago.
Forgot most of it, lol!
But you can preprocess the data before run-length encoding it, to make the RLE more successful.
There were a couple of schemes used, which were probably mostly optimized for text. Burrows-Wheeler Transform for instance.
I have my notes at home, let me see if there's anything useful.
If I recall, reversing BWT encoded data needs a few sorts for each stored string. Which makes it quite expensive? Creating the BWT data also needs a sort, but I don't care about that so much. |
| |
Bansai
Registered: Feb 2023 Posts: 54 |
Quoting Martin Piper: If I recall, reversing BWT encoded data needs a few sorts for each stored string. Which makes it quite expensive? Creating the BWT data also needs a sort, but I don't care about that so much.
https://marknelson.us/posts/1996/09/01/bwt.html
...BWT is off topic from RLE, but Nelson breaks compression and decompression up into phases that can be piped, so it's easy to move them around, omit them entirely, etc., to see what they do.
His "The Data Compression Book" was a decent reference at the time it was published: https://marknelson.us/pages/tdcb.html. It's a good intro text for those who want to understand various forms of data compression. |
| |
Starfox
Registered: Jul 2014 Posts: 45 |
I found some links in my notes. Take them as they are.
It's mostly BWT though:
https://www.baeldung.com/cs/bwt-rle-compression-algorithm-for-s..
https://www.fileformat.info/mirror/egff/ch09_03.htm
https://www.youtube.com/watch?v=4n7NPk5lwbI
I had some other pdf's/videos with some other methods. If only I could find them... (same youtube channel)
Here's one "FM index": https://www.youtube.com/watch?v=kvVGj5V65io (It's also based on BWT).
I can't find the description I read of a scheme for encoding the number of bytes in a row, where you stored the number of equal bytes in a bit representation so it could be any number. I'll post the link if I find it.
I think it's the unary coding / Elias coding he explains in this video:
https://www.youtube.com/watch?v=3jKLjmV1bL8
ok last link about text transforms before encoding:
https://www.youtube.com/watch?v=Q2pinaj3i9Y
p.s sorry for messy post :) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1427 |
Cheers for the links, Starfox!
This is getting a bit off topic (ironic given that this page was created to shift all the RLE talk away from another discussion) but yeah, I've been looking at FM-Index and BWT for implementing a native cruncher that doesn't require an REU just to get compression time down to something workable. There's been a lot of interesting research the past 20 years!
Have you done any work on implementing BWT in a c64 context? |
| |
Burglar
Registered: Dec 2004 Posts: 1139 |
Quoting enthusi: Just to add an alternative which is a bit special but cute:
Maniac Mansion and ZMK use some RLE depacker that resides in ZP and encodes most-common bytes, which I felt works rather well for graphics. I wrote something about that here:
https://www.pagetable.com/?p=603
A byte of < $40 = count of raw bytes following
$3f to < $80 represents the length of the run of the byte to follow (+$3f of course).
$7f to < $a0 is the number of runs of most common byte 1,
$9f to < $c0 and $bf to <$e0 and $df-$ff for common byte 2, 3 and 4 respectively. that is interesting indeed, also lovely article, even Ron Gilbert himself replied :)
not having looked at the unpacker, i'm assuming lotsa CMPs, so possibly custom ranges could generate great results for various inputs. |
| |
ws
Registered: Apr 2012 Posts: 251 |
Hej,
i know, thin ice here, please excuse, but this is a real question.
ABSTRACT:
So if you have a longer sequence of bytes like
04 0f 02 0a 08 09 01 02 06 05 (->low nibbles)
or
80 c0 90 10 20 a0 f0 70 d0 b0 (->high nibbles)
where you could compress by just saying
crunch mode identifier byte $f1 for combining the low nibbles,
number of "stacked bytes" (10) $0a,
followed by conjoined nibbles, resulting in:
f1 0a 4f 2a 89 12 65
or
crunch mode identifier byte $f2 for combining the high nibbles,
number of "stacked bytes" (10) $0a
followed by conjoined nibbles, resulting in
f2 0a 8c 91 2a f7 db
QUESTION:
is there a name for this compression method?
would there be a more efficient simple way to compress these (usually very random) numbers (preferably to implement as a cruncher on a C64)?
my initial thought stems from the practice of crunching a petscii color map by 50% like this.
thanks in advance for helpful answers, and really sorry, i had no clue how to search for this certain usecase. |
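For what it's worth, a Python sketch of the nibble-packing idea described above (illustrative only, with $f1/$f2 as the mode bytes from the example):

def pack_nibbles(block, high=False):
    # block: bytes whose unused nibble (high or low) is always zero
    nibs = [(b >> 4) if high else (b & 0x0f) for b in block]
    if len(nibs) & 1:
        nibs.append(0)                     # pad odd lengths
    out = bytearray([0xf2 if high else 0xf1, len(block)])
    for a, b in zip(nibs[0::2], nibs[1::2]):
        out.append((a << 4) | b)           # two nibbles per output byte
    return bytes(out)

pack_nibbles(bytes([0x04,0x0f,0x02,0x0a,0x08,0x09,0x01,0x02,0x06,0x05])) gives f1 0a 4f 2a 89 12 65, as in the example.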
| |
tlr
Registered: Sep 2003 Posts: 1822 |
Quoting ws: QUESTION:
is there a name for this compression method?
shift + or?
Quoting ws: would there be a more efficient simple way to compress these (usually very random) numbers (preferably to implement as a cruncher on a C64)?
If the numbers are indeed random, then your suggested method should be the most effective. |
| |
ws
Registered: Apr 2012 Posts: 251 |
@tlr thanks! |
| |
Martin Piper
Registered: Nov 2007 Posts: 740 |
Quote: Hej,
i know, thin ice here, please excuse, but this is a real question.
ABSTRACT:
So if you have a longer sequence of bytes like
04 0f 02 0a 08 09 01 02 06 05 (->low nibbles)
or
80 c0 90 10 20 a0 f0 70 d0 b0 (->high nibbles)
where you could compress by just saying
crunch mode identifier byte $f1 for combining the low nibbles,
number of "stacked bytes" (10) $0a,
followed by conjoined nibbles, resulting in:
f1 0a 4f 2a 89 12 65
or
crunch mode identifier byte $f2 for combining the high nibbles,
number of "stacked bytes" (10) $0a
followed by conjoined nibbles, resulting in
f2 0a 8c 91 2a f7 db
QUESTION:
is there a name for this compression method?
would there be a more efficient simple way to compress these (usually very random) numbers (preferably to implement as a cruncher on a C64)?
my initial thought stems from the practice of crunching a petscii color map by 50% like this.
thanks in advance for helpful answers, and really sorry, i had no clue how to search for this certain usecase.
Some freezers did this for the colour RAM. |
| |
tlr
Registered: Sep 2003 Posts: 1822 |
Quote: Some freezers did this for the colour RAM.
You mean just packed nybbles? |
| |
Fungus
Registered: Sep 2002 Posts: 756 |
That's what he means, and it's a very, very common technique. |