FPGAs: Programming Hardware with Code (Yes, Really)

What Is an FPGA, Actually?

You know how a CPU runs your code by executing instructions one at a time? An FPGA does something completely different — it becomes the hardware your code describes.

An FPGA (Field-Programmable Gate Array) is a chip full of tiny configurable logic blocks connected by a programmable routing network. When you "program" an FPGA, you're not writing a program that runs on hardware — you're configuring the hardware itself.

The difference is profound:

CPU                                  | FPGA
-------------------------------------+------------------------------------------
Runs instructions sequentially       | Everything runs in parallel
Fixed hardware, flexible software    | The hardware IS your design
Great for general purpose            | Great for dedicated, high-speed tasks
3 GHz clock, but one thing at a time | 100 MHz clock, but 10,000 things at once

Think of a CPU as a Swiss Army knife. An FPGA is a factory where you design your own tools.

Real uses: high-frequency trading (microsecond decisions), 5G base stations, video processing, Bitcoin mining, AI inference, radar systems. If it needs to be fast AND flexible, it's probably an FPGA.


The Mental Model That Changes Everything

Before writing a single line of code, internalize this:

HDL code describes hardware, not behavior.

When you write this in Verilog:

assign out = a & b;

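You're not telling a processor to "AND a and b and store the result." You're describing a circuit: out is permanently driven by the AND of a and b. On an FPGA that AND is typically implemented by a small lookup table (LUT) configured to behave like the gate rather than by a dedicated gate, but the effect is the same. From the moment your design is loaded onto the FPGA, that logic exists in hardware. Always. Forever (until you reprogram).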

This is why FPGA programming feels weird at first — you're not writing an algorithm, you're drawing a circuit with text.
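One way to feel the "circuit, not algorithm" idea is to notice that the order of continuous assignments doesn't matter. Here's a minimal sketch (the module name order_demo and its signals are made up for illustration):

```verilog
// Hypothetical example: two continuous assignments listed "out of order".
// y depends on t, yet t is defined on a later line. That is fine, because
// both lines describe gates that exist at the same time.
module order_demo (
    input  wire a,
    input  wire b,
    input  wire c,
    output wire y
);
    wire t;

    assign y = t | c;   // OR gate fed by t
    assign t = a & b;   // AND gate that produces t
endmodule
```

Swap the two assign lines and you've described exactly the same circuit. Try that with two lines of software.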


HDL: The Language of Hardware

There are two main Hardware Description Languages:

  • VHDL — verbose, strongly typed, favored in Europe and aerospace
  • Verilog — concise, C-like, favored in Silicon Valley

We'll use Verilog because it's more readable for beginners. Everything here also applies conceptually to VHDL.


Setup: Free Tools

You don't need expensive hardware to start. Here's a completely free setup:

Simulation (no hardware needed):

# Install Icarus Verilog (simulator) + GTKWave (waveform viewer)
# macOS
brew install icarus-verilog gtkwave

# Ubuntu
sudo apt install iverilog gtkwave

Online alternative: EDA Playground — write and simulate Verilog in your browser, zero installation.

Cheap real hardware: The Basys 3 (~$150) or the iCEstick (~$25) are great starter boards.


Your First Module: Hello, AND Gate

In Verilog, the basic unit is a module — think of it as a component with input and output pins.

module and_gate (
    input  wire a,
    input  wire b,
    output wire out
);
    assign out = a & b;
endmodule

That's it. A two-input AND gate. Let's break it down:

  • module and_gate — name your component
  • input wire a, b — two input pins
  • output wire out — one output pin
  • assign out = a & b — the logic: output is a AND b
  • endmodule — done

assign creates a continuous assignment: it isn't executed once, it's a permanent connection. Whenever a or b changes, out follows (instantly in a zero-delay simulation, after a small propagation delay in real hardware).


Simulation: See It Work Without Hardware

Write a testbench — a Verilog file that drives your module with test signals:

// testbench.v
module testbench;
    // Declare test signals
    reg a, b;        // reg = we drive these
    wire out;        // wire = module drives this

    // Instantiate the module under test
    and_gate uut (
        .a(a),
        .b(b),
        .out(out)
    );

    // Apply test stimuli
    initial begin
        $dumpfile("waves.vcd");   // save waveforms
        $dumpvars(0, testbench);

        // Test all combinations
        a = 0; b = 0; #10;  // wait 10 time units
        a = 0; b = 1; #10;
        a = 1; b = 0; #10;
        a = 1; b = 1; #10;

        $display("Simulation complete!");
        $finish;
    end

    // Print whenever output changes
    initial begin
        $monitor("a=%b b=%b | out=%b", a, b, out);
    end
endmodule

Run it:

iverilog -o sim testbench.v and_gate.v
vvp sim

Output:

a=0 b=0 | out=0
a=0 b=1 | out=0
a=1 b=0 | out=0
a=1 b=1 | out=1
Simulation complete!

Open the waveform:

gtkwave waves.vcd

You'll see a visual timeline of all your signals. This is how hardware engineers debug — not with print statements, but with waveforms showing exactly how signals change over time.
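Waveforms are great for debugging, but for regression checks you can also let the testbench compare results itself. A minimal sketch, assuming the same and_gate module as above (repeated here so the file stands alone; the testbench name and the errors counter are made up for illustration):

```verilog
// and_gate is repeated here so the sketch is self-contained.
module and_gate (
    input  wire a,
    input  wire b,
    output wire out
);
    assign out = a & b;
endmodule

// Hypothetical self-checking testbench: compares out against the expected
// value for every input combination and counts mismatches.
module and_gate_check_tb;
    reg     a, b;
    reg     expected;
    wire    out;
    integer i;
    integer errors;

    and_gate uut (.a(a), .b(b), .out(out));

    initial begin
        errors = 0;
        for (i = 0; i < 4; i = i + 1) begin
            {a, b}   = i[1:0];     // walk through 00, 01, 10, 11
            expected = (i == 3);   // AND is 1 only when both inputs are 1
            #10;                   // let the output settle
            if (out !== expected) begin
                $display("FAIL: a=%b b=%b out=%b", a, b, out);
                errors = errors + 1;
            end
        end
        if (errors == 0)
            $display("All checks passed");
        $finish;
    end
endmodule
```

The !== operator also catches x and z values, which a plain != comparison would let slip through.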


The Clock: Heartbeat of Digital Logic

Almost everything interesting in FPGAs is synchronous — it happens on the edge of a clock signal. The clock is a square wave that alternates between 0 and 1 at a fixed frequency. On each rising edge (0→1), flip-flops capture their inputs.

This is the fundamental building block of sequential logic:

module d_flip_flop (
    input  wire clk,
    input  wire d,
    output reg  q
);
    always @(posedge clk) begin
        q <= d;
    end
endmodule

  • always @(posedge clk): "whenever there's a rising clock edge, do this"
  • q <= d: non-blocking assignment; sample d and store it in q
  • reg q: a register (has memory, unlike wire)

This flip-flop remembers the value of d at each clock edge. It's the Verilog equivalent of a variable — but it only updates once per clock cycle.

Critical rule: In always @(posedge clk) blocks, always use <= (non-blocking). In combinational always @(*) blocks, use = (blocking). Mix them up and you'll get subtle bugs that take days to find.
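To see why the rule matters, here's a small sketch of a two-stage shift register written both ways. The module names shift_good and shift_bad are invented for this comparison:

```verilog
// Hypothetical two-stage shift register, written both ways.
module shift_good (
    input  wire clk,
    input  wire d,
    output reg  q1,
    output reg  q2
);
    // Non-blocking: both right-hand sides are sampled first, then both
    // registers update. q2 receives the OLD q1, so d takes two clock
    // edges to reach q2.
    always @(posedge clk) begin
        q1 <= d;
        q2 <= q1;
    end
endmodule

module shift_bad (
    input  wire clk,
    input  wire d,
    output reg  q1,
    output reg  q2
);
    // Blocking: q1 is overwritten before the second line reads it, so
    // q2 receives the NEW q1 and d reaches q2 after a single clock edge.
    always @(posedge clk) begin
        q1 = d;
        q2 = q1;
    end
endmodule
```

Same two lines, one character of difference, and the second version quietly loses a pipeline stage.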


Building a Counter

Let's build something actually useful — a 4-bit counter that counts from 0 to 15 and wraps around:

module counter (
    input  wire       clk,
    input  wire       reset,
    output reg  [3:0] count   // 4-bit output
);
    always @(posedge clk) begin
        if (reset) begin
            count <= 4'b0000;  // reset to 0
        end else begin
            count <= count + 1;  // increment
        end
    end
endmodule

New syntax:

  • [3:0] — a 4-bit bus (bits 3 down to 0)
  • 4'b0000 — a 4-bit binary literal (4 bits, binary, value 0000)
  • 4'd15 would be decimal 15, 4'hF would be hex F

Testbench for the counter:

module counter_tb;
    reg clk, reset;
    wire [3:0] count;

    counter uut (.clk(clk), .reset(reset), .count(count));

    // Generate clock: toggle every 5 time units
    always #5 clk = ~clk;

    initial begin
        clk = 0; reset = 1;
        #12 reset = 0;  // release reset between clock edges (t=15 is a rising edge, so releasing there would race)
        #200 $finish;
    end

    initial begin
        $monitor("time=%0t count=%d", $time, count);
    end
endmodule

The always #5 clk = ~clk line generates a clock with 10-unit period (5 high, 5 low) — 100 MHz if each unit is 1 ns.
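That "if each unit is 1 ns" part can be made explicit with a `timescale directive. A minimal sketch (the clock_tb module and its edges counter are made up for illustration):

```verilog
`timescale 1ns / 1ps   // time unit = 1 ns, precision = 1 ps

// Hypothetical clock-only testbench: with the directive above, #5 means
// 5 ns, so the clock period is 10 ns (100 MHz).
module clock_tb;
    reg     clk;
    integer edges;

    initial begin
        clk   = 0;
        edges = 0;
    end

    always #5 clk = ~clk;                      // toggle every 5 ns
    always @(posedge clk) edges = edges + 1;   // count rising edges

    initial begin
        #100;                                  // run for 100 ns
        $display("%0d rising edges in 100 ns", edges);
        $finish;
    end
endmodule
```

Rising edges land at 5, 15, ... 95 ns, so this should report 10 of them.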


Combinational vs Sequential: The Core Distinction

Everything in digital logic falls into one of two categories:

Combinational logic — output depends only on current inputs. No memory. No clock.

// Combinational: output is always inputs OR'd together
assign out = a | b | c;

// Or use always @(*) for more complex combinational logic
always @(*) begin
    case (sel)
        2'b00: out = a;
        2'b01: out = b;
        2'b10: out = c;
        default: out = 0;
    endcase
end

Sequential logic — output depends on inputs AND past state. Has memory. Uses clock.

// Sequential: output only changes on clock edge
always @(posedge clk) begin
    if (enable)
        stored_value <= new_value;
end

Most real designs are a mix: combinational logic computes new values, sequential logic stores them on each clock edge.
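Here's a sketch of that mix in one small module: an accumulator where the adder is combinational and the running total is stored in a register. The module and signal names (accumulator, din, total, next_total) are placeholders chosen for this example:

```verilog
// Hypothetical accumulator: combinational adder + sequential register.
module accumulator (
    input  wire       clk,
    input  wire       reset,
    input  wire       enable,
    input  wire [7:0] din,
    output reg  [7:0] total
);
    wire [7:0] next_total;

    // Combinational: compute the next value from current state and input
    assign next_total = total + din;

    // Sequential: store the computed value on the rising clock edge
    always @(posedge clk) begin
        if (reset)
            total <= 8'd0;
        else if (enable)
            total <= next_total;
    end
endmodule
```

The adder is "always on"; the register decides when its result actually counts.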


A Real Design: PWM Generator

Let's build something you'd actually use: a PWM (Pulse Width Modulation) generator. PWM is used to control LED brightness, motor speed, and servo position. The key idea: an LED driven by a signal that's ON 75% of the time emits about 75% of its full light output, because the switching is far too fast for your eye to follow.

module pwm (
    input  wire       clk,
    input  wire [7:0] duty,    // 0-255, duty cycle
    output reg        pwm_out
);
    reg [7:0] counter = 8'd0;  // initial value, so simulation starts from a known state instead of x

    always @(posedge clk) begin
        counter <= counter + 1;  // 8-bit counter wraps 0→255→0

        if (counter < duty)
            pwm_out <= 1;
        else
            pwm_out <= 0;
    end
endmodule

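If duty = 128, the output is high for 128/256 = 50% of cycles. duty = 0? Always low. duty = 255? High for 255 of every 256 cycles (about 99.6%): because the comparison is counter < duty and counter itself reaches 255, this design never produces a fully constant high.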

Connect this to an LED on an FPGA board and you've got a dimmable light. Change duty dynamically and you can fade in/out. This is running at clock speed — potentially 100 million times per second — with zero CPU involvement.
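If you'd like to check the duty cycle in simulation before touching a board, one approach is to count how many cycles the output is high. A sketch, assuming the pwm module above with its counter given an initial value of 0 (repeated here so the file stands alone; the testbench and the high_cycles counter are made up for illustration):

```verilog
// pwm repeated so the sketch is self-contained. The "= 8'd0" initial
// value is an assumption added for simulation.
module pwm (
    input  wire       clk,
    input  wire [7:0] duty,
    output reg        pwm_out
);
    reg [7:0] counter = 8'd0;

    always @(posedge clk) begin
        counter <= counter + 1;
        if (counter < duty)
            pwm_out <= 1;
        else
            pwm_out <= 0;
    end
endmodule

// Hypothetical testbench: counts high cycles over one full PWM period.
module pwm_duty_tb;
    reg        clk  = 0;
    reg  [7:0] duty = 8'd64;       // expect 64/256 = 25%
    wire       pwm_out;
    integer    high_cycles = 0;
    integer    n;

    pwm uut (.clk(clk), .duty(duty), .pwm_out(pwm_out));

    always #5 clk = ~clk;

    initial begin
        repeat (2) @(posedge clk);     // let pwm_out leave its unknown state
        for (n = 0; n < 256; n = n + 1) begin
            @(negedge clk);            // sample mid-cycle, away from the edge
            if (pwm_out) high_cycles = high_cycles + 1;
        end
        $display("high for %0d of 256 cycles", high_cycles);
        $finish;
    end
endmodule
```

With duty = 64 this should report 64 of 256 cycles. Change duty and rerun to convince yourself the ratio follows.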


Finite State Machines: Giving Your Hardware a Brain

Real hardware logic often needs to remember what it's doing — it has states. This is where FSMs (Finite State Machines) come in.

Let's build a simple traffic light controller:

module traffic_light (
    input  wire       clk,
    input  wire       reset,
    output reg  [2:0] lights   // [2]=red, [1]=yellow, [0]=green
);
    // State encoding
    localparam RED    = 2'd0;
    localparam GREEN  = 2'd1;
    localparam YELLOW = 2'd2;

    reg [1:0]  state;
    reg [31:0] timer;

    // State durations (in clock cycles)
    localparam RED_TIME    = 100;
    localparam GREEN_TIME  = 80;
    localparam YELLOW_TIME = 20;

    always @(posedge clk) begin
        if (reset) begin
            state <= RED;
            timer <= 0;
            lights <= 3'b100;  // red on
        end else begin
            timer <= timer + 1;

            case (state)
                RED: begin
                    lights <= 3'b100;
                    // timer counts 0..RED_TIME-1, so RED lasts RED_TIME cycles
                    if (timer >= RED_TIME - 1) begin
                        state <= GREEN;
                        timer <= 0;
                    end
                end

                GREEN: begin
                    lights <= 3'b001;
                    if (timer >= GREEN_TIME - 1) begin
                        state <= YELLOW;
                        timer <= 0;
                    end
                end

                YELLOW: begin
                    lights <= 3'b010;
                    if (timer >= YELLOW_TIME - 1) begin
                        state <= RED;
                        timer <= 0;
                    end
                end

                default: state <= RED;
            endcase
        end
    end
endmodule

This is the standard FSM pattern in Verilog:

  1. Define states as localparam constants
  2. Use a case statement to handle each state
  3. Transition to the next state when conditions are met
  4. Update outputs based on current state

FSMs appear everywhere in hardware: UART receivers, SPI controllers, memory arbiters, protocol handlers.
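You'll also run into a different layout for the same idea: the "two-block" style, where one combinational block decides the next state and one clocked block stores it. This is a different arrangement from the single-block controller above, not a replacement for it. A minimal sketch with an invented example, a detector that flags two consecutive 1s on a serial input:

```verilog
// Hypothetical FSM: raises found when bit_in was 1 on two
// consecutive clock edges. All names are placeholders.
module two_ones (
    input  wire clk,
    input  wire reset,
    input  wire bit_in,
    output wire found
);
    localparam IDLE    = 2'd0;  // no 1 seen yet
    localparam GOT_ONE = 2'd1;  // one 1 seen
    localparam GOT_TWO = 2'd2;  // two or more 1s in a row

    reg [1:0] state;
    reg [1:0] next_state;

    // Combinational: decide the next state
    always @(*) begin
        case (state)
            IDLE:    next_state = bit_in ? GOT_ONE : IDLE;
            GOT_ONE: next_state = bit_in ? GOT_TWO : IDLE;
            GOT_TWO: next_state = bit_in ? GOT_TWO : IDLE;
            default: next_state = IDLE;
        endcase
    end

    // Sequential: store the state on each rising edge
    always @(posedge clk) begin
        if (reset) state <= IDLE;
        else       state <= next_state;
    end

    // Output depends only on the current state
    assign found = (state == GOT_TWO);
endmodule
```

Note the blocking = in the combinational block and the non-blocking <= in the clocked one, exactly as the critical rule says.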


Timing: The Thing That Bites Everyone

Here's something that surprises every beginner: your logic has physical delay.

A signal takes a few nanoseconds to propagate through gates and wires. If your clock is faster than that propagation delay, your flip-flops will capture wrong values — and your design will fail in mysterious ways that only show up on real hardware, not simulation.

This is called a timing violation, and the tool that catches it is called static timing analysis.

The key concept: setup time — how much before the clock edge does data need to be stable?

                Data must be stable here
                           ↓
                     |←setup time→|
─────────────────────────────────────────────→ time
                                  ↑
                             Clock edge

Modern FPGA synthesis tools (Vivado, Quartus) automatically analyze timing and tell you if your design meets timing. If it doesn't, you either:

  • Slow down your clock
  • Add pipeline registers (break long paths into shorter stages)
  • Restructure your logic

For beginners: start with a slow clock (1-10 MHz) and don't worry about timing until you're hitting performance limits.
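When you do get there, "add pipeline registers" looks something like the sketch below: a sum of four inputs split into two stages, so each clock cycle only has to fit one adder's worth of delay. The module name and signals are invented for illustration:

```verilog
// Hypothetical example: sum of four 8-bit inputs, split into two stages.
module sum4_pipelined (
    input  wire       clk,
    input  wire [7:0] a,
    input  wire [7:0] b,
    input  wire [7:0] c,
    input  wire [7:0] d,
    output reg  [9:0] sum
);
    reg [8:0] ab;   // stage 1 partial sum (9 bits keeps the carry)
    reg [8:0] cd;   // stage 1 partial sum

    always @(posedge clk) begin
        // Stage 1: two additions side by side
        ab <= a + b;
        cd <= c + d;
        // Stage 2: combine the partial sums registered on the previous edge
        sum <= ab + cd;
    end
endmodule
```

The price is latency: a result appears two clock edges after its inputs instead of one. In exchange, the longest path between registers gets shorter, which is what lets the clock run faster.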


Synthesis: From Code to Silicon

The journey from Verilog to running hardware:

Verilog/VHDL
     ↓
  Synthesis          (maps your code to LUTs and flip-flops)
     ↓
 Place & Route       (physically places logic blocks on the chip)
     ↓
Timing Analysis      (checks if everything is fast enough)
     ↓
Bitstream Generation (creates the binary file to program the FPGA)
     ↓
 Programming         (load the bitstream onto the chip)

The main tools:

Tool               | Vendor     | FPGAs
-------------------+------------+----------------------------
Vivado             | Xilinx/AMD | Artix, Kintex, Virtex, Zynq
Quartus            | Intel      | Cyclone, Arria, Stratix
iCEcube2 / nextpnr | Lattice    | iCE40, ECP5

For open source toolchains (great for learning), the iCE40 family is fully supported by the IceStorm project.


The Big Gotchas

Gotcha #1: Everything is concurrent

In software, this runs top to bottom:

x = 5
x = x + 1  # x is now 6

In Verilog, these are independent concurrent assignments:

assign x = 5;        // one wire always = 5
assign x = x + 1;   // ILLEGAL: two drivers on same wire!

Gotcha #2: Non-blocking vs blocking assignments

Use <= in clocked blocks, = in combinational blocks. Always. No exceptions until you understand why.

Gotcha #3: Latches

If you write a combinational always block that doesn't assign a value in every case, you accidentally create a latch (memory that's not a flip-flop). Latches are timing nightmares.

// BAD: creates a latch (no else clause)
always @(*) begin
    if (enable)
        out = in;
    // What is out when enable=0? Latch!
end

// GOOD: explicit default
always @(*) begin
    if (enable)
        out = in;
    else
        out = 0;  // explicit
end
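A common way to make this mistake hard to commit is to assign a default at the very top of the block and then override it where needed. A minimal sketch (the decoder and its names are made up for illustration):

```verilog
// Hypothetical decoder: the output gets a default before any branching,
// so no path through the block can leave it unassigned.
module decode_no_latch (
    input  wire [1:0] sel,
    input  wire       enable,
    output reg  [3:0] onehot
);
    always @(*) begin
        onehot = 4'b0000;              // default covers every path
        if (enable) begin
            case (sel)
                2'b00: onehot = 4'b0001;
                2'b01: onehot = 4'b0010;
                2'b10: onehot = 4'b0100;
                2'b11: onehot = 4'b1000;
            endcase
        end
    end
endmodule
```

There's no else branch anywhere, and still no latch: the default line already answered "what is the output otherwise?"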

Gotcha #4: Simulation ≠ synthesis

Some Verilog is valid for simulation but can't be turned into actual hardware. #10 delay statements, for example, make no sense in synthesis. Stick to synthesizable constructs.
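So how do you "wait" in real hardware? You count clock cycles. A sketch of the synthesizable counterpart to a # delay, with an invented module name and a hypothetical CYCLES parameter:

```verilog
// Hypothetical synthesizable "delay": done goes high once the internal
// counter has reached CYCLES after reset is released.
module wait_cycles #(
    parameter CYCLES = 10          // delay length in clock cycles (assumed value)
) (
    input  wire clk,
    input  wire reset,
    output reg  done
);
    reg [31:0] count;

    always @(posedge clk) begin
        if (reset) begin
            count <= 0;
            done  <= 0;
        end else if (count < CYCLES) begin
            count <= count + 1;    // still waiting
        end else begin
            done <= 1;             // stays high once the wait is over
        end
    end
endmodule
```

The # delays stay in your testbench, where they belong; anything headed for the chip measures time in clock edges.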


What to Build Next

Now you know enough to be dangerous. Here's a learning path:

Beginner projects:

  • 4-bit adder
  • Shift register
  • Debounce circuit (clean up noisy button presses)
  • 7-segment display driver

Intermediate projects:

  • UART transmitter/receiver (serial communication)
  • SPI or I2C controller
  • VGA signal generator (put pixels on a monitor!)
  • Simple CPU (yes, you can build one)

Advanced:

  • Pipelined designs
  • DDR memory controller
  • Video processing pipeline
  • Soft-core processor (MicroBlaze, RISC-V)

The Real Magic

Here's what makes FPGAs genuinely exciting:

A CPU doing video processing works through the pixels a few at a time, frame after frame. An FPGA can stream them through a dedicated pipeline in which every stage is busy with a different pixel at once, with deterministic timing measured in nanoseconds.

A CPU executing a neural network performs a handful of multiplications per core per cycle. An FPGA can have 1,000 multipliers running in parallel, custom-fitted to your exact model dimensions.

This isn't magic — it's parallelism at the hardware level. You're not writing programs anymore. You're designing machines.

And now you know how.


Quick Reference Card

// Module skeleton
module name (input wire a, output reg b);
endmodule

// Continuous assignment (combinational)
assign out = a & b;

// Clocked block (sequential)
always @(posedge clk) begin
    q <= d;  // non-blocking!
end

// Combinational block
always @(*) begin
    case (sel)
        2'b00: out = a;
        default: out = 0;
    endcase
end

// Bus declarations
wire [7:0]  byte_wire;     // 8-bit wire
reg  [15:0] word_reg;      // 16-bit register
reg  [0:0]  bit_reg;       // 1-bit register (same as reg bit_reg)

// Literals
4'b1010    // 4-bit binary 1010
8'hFF      // 8-bit hex FF = 255
10'd100    // 10-bit decimal 100

// Constants
localparam MY_CONST = 42;

Now go build something. The hardware is waiting. ⚡
