About the disk format: the Amiga native disk format was just one huge sector per track. That is how the Amiga fitted 880K on the same disks MS-DOS filled with 720K and MacOS with 800K.
One of the custom chips on the Amiga had hardware for the track decoding. So the Amiga could read the disk while performing other operations; the data would come in via the co-processor's DMA (direct memory access).
(MacOS used a variable-speed drive; that is how it fitted more sectors on the outer tracks and achieved 800K.)
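Just to put numbers on that, the difference falls straight out of the sector counts per track. A quick back-of-the-envelope sketch in Python (assuming the standard 80-cylinder, double-sided, 512-byte-sector geometry for both formats):

    # capacity = cylinders x heads x sectors/track x bytes/sector
    def capacity_kib(sectors_per_track, cylinders=80, heads=2, sector_bytes=512):
        return cylinders * heads * sectors_per_track * sector_bytes // 1024

    print(capacity_kib(11))  # Amiga trackdisk, 11 sectors per track -> 880
    print(capacity_kib(9))   # MS-DOS 720K,      9 sectors per track -> 720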
> One of the custom chips on the Amiga had hardware for the track decoding
If memory serves correctly, the raw data was pulled in from disk to memory DMA-style, but the actual decoding could be done using either the blitter or the 68k.
True about performing operations whilst loading from disk though; I coded a trackloader demo that was loading the next part of the demo whilst the current part was ongoing, using the 68k to decode so that the blitter could be hammered for graphic effects!
The MFM encoding used 2 raw bits for each bit of data to be stored. As such, people figured out how to use the Blitter to do that decoding. I'm not sure whether the OS did it; it probably didn't in the 1.x days.
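The rule itself is simple enough to show in a few lines. A rough Python sketch of plain MFM encoding (the Amiga's odd/even bit split that makes it Blitter-friendly is left out):

    def mfm_encode(byte, prev_data_bit=0):
        # each data bit is preceded by a clock bit, so 1 byte becomes 16 raw bits;
        # the clock bit is 1 only when both neighbouring data bits are 0
        out = 0
        for i in range(7, -1, -1):                 # MSB first
            d = (byte >> i) & 1
            c = 1 if (prev_data_bit == 0 and d == 0) else 0
            out = (out << 2) | (c << 1) | d
            prev_data_bit = d
        return out

    print(hex(mfm_encode(0xA1)))   # 0x44a9 -- 8 data bits became 16 raw bits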
I don't know either when it did what, but IIRC when Amigas got faster CPUs there were patches (or newer Kickstarts?) to move the decoding to the CPU, because by then the blitter was slower than the CPU.
>One of the custom chips on the Amiga had hardware for the track decoding.
Sadly no, it just deserialized the head's magnetic flux data (encoded as pulses) into a raw bytestream; you still had to compute the actual data out of it. What's worse, Commodore shipped the same Double Density chip, already outdated in 1987, for the whole life of the Amiga.
Yeah, constant angular velocity (hub motor RPM is fixed) vs. constant linear velocity (linear speed over the disk is fixed). If memory serves, the 800K Mac and Commodore 1541 disks were actually Z-CLV or Zoned-CLV -- where the drive speed changes every "n" tracks. The reason you do ZCLV is that it reduces the complexity of the controller hardware (fewer taps on the clock divider).
The fun trick is, with a drive controller which works based on "time between pulses" (DiscFerret, Catweasel, etc) you can do all of the above with a CAV disk drive just by tweaking the read/write clock. Same trick probably works for discrete drive controllers like the 765 and 1771 (I say probably because I haven't checked the datasheet!).
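For the Mac side, the zoning is where the extra capacity comes from. A quick sketch using the usually quoted figures for the 800K GCR format (five zones of 16 tracks, 12 down to 8 sectors per track, 512-byte sectors, two sides) -- treat the exact zone boundaries as from memory:

    # 800K Mac GCR layout: 5 speed zones of 16 tracks each; outer zones
    # hold more 512-byte sectors per track than inner ones
    sectors_per_track_by_zone = [12, 11, 10, 9, 8]   # outermost zone first
    tracks_per_zone = 16
    sides = 2

    total_sectors = sum(s * tracks_per_zone for s in sectors_per_track_by_zone) * sides
    print(total_sectors * 512 // 1024)               # -> 800 (KiB)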
It read full tracks at a time, but as you can see from the doc there were 11/22 sectors with no inter-sector gaps; they are still separate 512-byte sectors, which are addressed in the filesystem structures.
That's just a clash of nomenclature between the logical and physical layout. It had logical sectors, but on disk, it was written as one contiguous sector.
Yes, it read the whole track, starting from wherever the head was, then the floppy device handler figured out where each logical block was, based on the magic sync word $4489 (which is not supposed to be output by the default MFM encoding) and the track/sector IDs embedded in the track's bitstream.
Not even "Not supposed to be" -- it can't be. $4489 decoded is a valid byte, but with one of the clock transitions 'blanked'.
If you decode $4489 via MFM then re-encode that byte, you'll get a 1-bit difference. This is why it works as a sync marker: even if you wrote that byte in the data area of the sector, it wouldn't encode the same way because of the missing clock :)
It's a common trick on radio systems too -- a short burst of data which can't be obtained through normal encoding processes (invalid FEC bits, flipped parity, etc).
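Here's a small Python sketch of that 1-bit difference, assuming the normal MFM rule (clock bit = 1 only when both neighbouring data bits are 0):

    def mfm_decode(raw16):
        # the data bits are every other bit of the 16-bit raw word
        byte = 0
        for i in range(14, -2, -2):
            byte = (byte << 1) | ((raw16 >> i) & 1)
        return byte

    def mfm_encode(byte, prev_data_bit=0):
        out = 0
        for i in range(7, -1, -1):
            d = (byte >> i) & 1
            c = 1 if (prev_data_bit | d) == 0 else 0   # clock bit
            out = (out << 2) | (c << 1) | d
            prev_data_bit = d
        return out

    sync = 0x4489
    data = mfm_decode(sync)       # 0xa1, a perfectly ordinary byte
    legal = mfm_encode(data)      # 0x44a9, the legal encoding of 0xa1
    print(hex(data), hex(legal), hex(sync ^ legal))   # differ by one clock bit (0x20)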
I owned an Amiga 500 and at a certain point I bought an HDD for it (20 MB, if I remember correctly). Later I switched to an IBM-compatible (a 386SX with an HDD) and, subjectively, I thought the HDD of the IBM-compatible was slower than the one I had previously used on the Amiga. Is there any HW reason why the Amiga HDD might have been faster than the IBM HDD, or did I just imagine it?
It might have been a SCSI drive/controller, but it might have also been that you used FastFileSystem, which was pretty efficient.
One of the HD manufacturers of the time, either Conner or Maxtor, was known to use Amiga 3000 machines to test their SCSI disks, because of the CPU/chipset/OS/FS combination. Not many systems were able to extract 90%+ of theoretical throughput like the A3000 could.
The 590 had a SCSI controller with DMA, but only for the external SCSI connector. The internal drive was 8-bit IDE and dog slow. So if it was faster, it must be chalked up to the better OS, the better filesystem, and I would assume a 68000 at 7 MHz being faster than a 4.77 MHz 8088 at shuffling data around. Especially since the memory data bus was 16 bits wide on the Amiga but only 8 bits wide on the PC. (Right?!)
If they sold 'em with XT-IDE inside, I was probably unfortunate enough to have one of those. Still beat swapping floppies. Thanks to the HDD I could compile C programs without switching floppies all the time. IIRC I had the compiler and the headers on one floppy and the linker etc. on another. Eventually I made DICE (Matt Dillon yo!) fit on one floppy. Or was it an early Lattice C?? Can't remember. :-P By that time I had bought a 486SX for uni, so I reluctantly used Turbo C instead. Turbo C was just too good to pass up...
Part of it may also have been that even expansions that used non-SCSI drives tended to offload most of the processing from the CPU. My A2000 SCSI card had a Z80 on it, for example.
ORG directives were really just hints. There was no operating system and no linker/loader on the Apple II: the developer had to "somehow" manage to transfer the instructions from floppy disc to the intended location.
Well, no. .ORG directives had nothing to do with any of that. They were (and are) used to tell the assembler how to fix up absolute references such as JMPs, JSRs, and loads/stores to variables defined in the code.
As he points out, when building complex programs on the Apple II, you had to keep track of where to load modules yourself, precisely because there was no metadata in the object file. Nothing prevented you from loading a module compiled at one ORG at a different address... well, nothing except the fact that it would probably crash.
I think that you are misinterpreting that statement. The .ORG directive is indeed somewhat of a hint to the assembler (or rather, obviously, a directive) telling it which memory location it should assemble the code for. It is a hint in the sense that it doesn't necessarily reflect the actual location of the code at run-time, because it is separate from linking.
Typically you'd patch the relocatable routine before inserting it at the desired memory location, simply overwriting the location-dependent instructions like jsr and absolute jmp with new operands. This isn't very complex, especially if the relocator doesn't need to be position independent itself.
There are some factors that can simplify relocation: only allowing relocation to page boundaries (i.e. $xx00 addresses) and possibly keeping the routines shorter than a page.
For example, here's a single-page, page-boundary-constrained relocator for a routine with two patch points:
    ; x = destination page
relocate_routine
    stx rel_sta+2       ; patch the store below so it writes into the new page
    ; patch the page-dependent operands in the template
    stx patch1+2
    stx patch2+2
    ldy #0
copy
    lda routine_template,y
rel_sta
    sta $0000,y         ; high byte of the address overwritten above
    iny
    bne copy
    rts

    ; some routine assembled for any origin $xx00
routine_template
    bne continue
patch1
    jmp done
continue
    ...
patch2
    jsr print
    ...
done
    rts
print
    ...
    rts
If the relocator is not position independent then the solution is not completely relocatable. I don’t even know how you would get access to the program counter without trashing some memory.
Not completely relocatable, but in practice this just means that you have some memory reserved at a specific location for the relocator and the template.
The relocator can be made position independent by using indirect stores. You never need direct access to the program counter for this, but it gets a bit more complex. If you did need to read the program counter, I think the simplest way is the jsr instruction, which puts the PC (plus some fixed offset) on the stack.
The idea is that instead of using a "template" and referring directly to that, you load the relocatable code where you want it, load that location in some zero page vector and in the relocator you do indirect+relative stores based off of that vector:
    lda page            ; the page that the routine was loaded to
    sta $ff
    lda #0
    sta $fe
    jsr relocate        ; wherever you've loaded the relocator
    ...

    ; vector $fe-$ff = routine location, $fe must be 0
relocate
    lda $ff
    ; load the lsbyte of the offset of the first patch in the routine into y
    ; (in reality this would be calculated at the point of assembling the routine template)
    ldy #<(patch1-routine_start+2)
    sta ($fe),y
    ldy #<(patch2-routine_start+2)
    sta ($fe),y
    rts
As you can see, the relocate routine is now completely position independent. The relocated routine has the same constraints (has to be loaded to the beginning of a page). We use the y register to load the offset within a page and $ff as the page counter. It's also not much more complex to write the relocator for routines spanning multiple pages, but it comes with a bunch more considerations that I think would take much more code to demonstrate (e.g. cross page jumps and calls).
The problem is that the only way to read the program counter is through jsr or an interrupt and these both can’t do relative jumps.
If you start with that information and you allow patching you can of course relocate the code on the fly, but I wouldn’t really call that position independent code generation by a compiler. It’s more like you’re implementing your own linker.
> The problem is that the only way to read the program counter is through jsr or an interrupt and these both can’t do relative jumps.
No, that's not a problem in this case since you don't need to know the value of the program counter to relocate the routine. You only need to know the relative offsets of the patch points from the start of the routine, which are static. The relocator itself is position independent, though, since it never uses absolute or absolute indexed addressing. No jumps or subroutine calls, just zero page loads and vectored writes.
That said, if you want to know the program counter, you can use a tiny, position independent subroutine to save it:
savepc
    ; copy return address to $fe-$ff
    pla
    sta $fe
    pla
    sta $ff
    ; push it back so the rts still works
    pha
    lda $fe
    pha
    ; increment $fe-$ff by 1 so that it contains the
    ; address of the next instruction after the jsr
    inc $fe
    bne done
    inc $ff
done
    rts
...
jsr savepc
lda #0 ; $fe-$ff contains address of this instruction after savepc call
That has no use in this relocation technique, however.
> If you start with that information and you allow patching you can of course relocate the code on the fly, but I wouldn’t really call that position independent code generation by a compiler.
Neither would I, since it's assembled, but that was never the question, as far as I understood. Quoting the post you first responded to: "You could write 6502 code to be completely relocatable, but at a cost of space and a more complex assembler." I demonstrated that with an example of a position independent relocator that patches an otherwise position dependent routine so that it can be relocated to the start of any page.
And yes, if you've built the module that you need to relocate, you have the information required to patch it. You could even join the position independent relocator with the relocatable routine and have it patch itself out with a jump to the routine after relocation.
> It’s more like you’re implementing your own linker.
Maybe a small portion, but specifically, it's a position independent code relocator, exactly what was on topic.
EDIT: I think we may be confusing two different problems here. One is relocating a routine and producing position independent code. That is what I've demonstrated. The other is the problem of knowing the location of the routine at the call site.
In the simple case, the call site of the relocator typically knows what portion of memory needs to be position adjusted. That is what I have assumed so far.
In a more complex case, a position independent/relocated routine might need to call another position independent/relocated routine without knowing its location. To do that, you can put the address of each subroutine in a table and use the table as a trampoline. The routines calling into each other now only need to know the index of the routine they want to call in the table. The table itself can be located anywhere and be moved around freely.
call                    ; $aa-$ab = table location, y = 2 * routine number (2-byte entries)
    lda ($aa),y
    sta jump+1
    iny
    lda ($aa),y
    sta jump+2
    dey
jump
    jmp $ffff           ; operand patched above
    ...
    ldy #2*11           ; entry for the PRINT routine
    jsr call
This can also be made position independent via stack trickery or indirect jump. The relocator for each routine could be responsible for populating this table.
I don't think the ability to address 64K means the CPU is 16-bit. The 6502 is a very 8-bit processor AFAIK. The main registers are 8-bit, and most operations work on 8-bit registers.
I think it's best to go with the "external data bus size", which is 8-bit on the processors typically known as "8-bit processors" (6502, Z80, i8080, 6809 etc...). Those processors can read one 8-bit byte in one memory access. For instance, even though the Z80 has 16-bit registers, can load 16-bit values from memory (and even has some 16-bit arithmetic operations), a 16-bit memory access takes 2 separate memory accesses through the 8-bit data bus.
By that definition the 68000 and 8086 are 16-bit processors (which makes sense); however, the stripped-down variants 68008 and 8088 would be considered 8-bit processors... and this is where this simple model breaks down ;)
"bitness" is defined by the natural data size of common operations.
The Z80 is 8-bit as most of its internal operations work over this data size and its registers are 8-bit. It can perform some 16-bit operations over two registers, but this is less natural to the design (based upon the 8-bit 8080).
The 8088 is 16-bit as its registers are naturally 16-bit, even though there are also ways to address them as 8-bit halves. The distinction between registers that are 8-bit but can be paired (Z80) and registers that are 16-bit but can be halved (8086/8088) may seem arbitrary on its own, but looking into the instructions available and how they perform makes it more obvious which side a chip sits on.
How much can be read at once is not the right differentiator, as it is essentially an interface issue. The internals of the 8088 are the same as those of an 8086, apart from those at the interface which break the 16-bit requests into 8-bit ones, they just have to wait longer for the signal that data has arrived/left.
Another common one that can confuse is the 80386SX: these are 32-bit chips (essentially the same design as the original 80386, latterly referred to as the 80386DX) with a 16-bit data bus (so 32-bit requests are split in two, as 16-bit requests are on the 8088) and a 24-bit address bus (limiting them to 16 MB of physical memory, like 80286 chips). These limited bus sizes allowed them to be used on motherboard designs originally intended for older 80286 units.
Other confusions come from specialist instructions: floating point operations using larger (40- & 80-bit) special purpose registers, SIMD instructions (such as those in the MMX set, introduced on 32-bit Pentium lines, operated over 64-bit collections of smaller values), etc.
Sometimes it works the other way around: the original Pentium lines (up to the Pentium 4) were 32-bit chips even though they had a 64-bit data bus (this made loading data into the L1 cache faster, rather than marking the chip out as 64-bit overall) and some 64-bit instructions.
I think when the Atari Jaguar was released they used the same kind of logic to market it as a 64-bit system since technically some part of it was using 64-bit processing, but the CPUs were still 32-bit.
It's just nitpicking though, I thoroughly enjoy Fabien's writing and I can highly recommend the couple of books he's written about reverse engineering Id Software games. On HN people seem to confuse "hacking" with "growth hacking" a lot of times, and Fabien's writing is thankfully on the right side of that fence.
The 68000 CPU has a 32-bit address space [1], 32-bit internal data bus, and 32-bit registers, but a 16-bit external data bus and 16-bit ALU, so it's sometimes seen as a bit of a hybrid and sometimes referred to as a 16/32 bit CPU.
[1] but only 24 address lines, which led to "interesting" bugs when people had written code that used the top byte of addresses for other purposes and their code was then run on a 68020 or higher.
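To illustrate the failure mode, a hypothetical sketch (the tag value and addresses here are made up): on a 68000 only 24 address lines leave the chip, so a byte stashed in a pointer's top 8 bits is simply ignored; decode all 32 bits, as later CPUs do, and the same pointer lands somewhere else.

    TAG = 0x42                      # hypothetical tag stuffed into the pointer's top byte

    def effective_address(ptr, address_lines):
        # keep only the bits that actually reach the address bus
        return ptr & ((1 << address_lines) - 1)

    tagged = (TAG << 24) | 0x00123456
    print(hex(effective_address(tagged, 24)))   # 0x123456   -- fine on a 68000
    print(hex(effective_address(tagged, 32)))   # 0x42123456 -- wrong address on a 68020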