FPGAs: Programming Hardware with Code (Yes, Really)
What Is an FPGA, Actually?
You know how a CPU runs your code by executing instructions one at a time? An FPGA does something completely different — it becomes the hardware your code describes.
An FPGA (Field-Programmable Gate Array) is a chip full of tiny configurable logic blocks connected by a programmable routing network. When you "program" an FPGA, you're not writing a program that runs on hardware — you're configuring the hardware itself.
The difference is profound:
| CPU | FPGA |
|---|---|
| Runs instructions sequentially | Everything runs in parallel |
| Fixed hardware, flexible software | The hardware IS your design |
| Great for general purpose | Great for dedicated, high-speed tasks |
| 3 GHz clock, a few instructions per cycle per core | 100 MHz clock, but thousands of operations every cycle |
Think of a CPU as a Swiss Army knife. An FPGA is a factory where you design your own tools.
Real uses: high-frequency trading (microsecond decisions), 5G base stations, video processing, early Bitcoin mining (before ASICs took over), AI inference, radar systems. If a task needs custom high-speed hardware but the design may still change, an FPGA is a strong candidate.
The Mental Model That Changes Everything
Before writing a single line of code, internalize this:
HDL code describes hardware, not behavior.
When you write this in Verilog:
assign out = a & b;
You're not telling a processor to "AND a and b and store the result." You're describing a wire that is permanently connected through an AND function. Once your design is loaded onto the FPGA, that logic physically exists on the chip, usually as a small lookup table (LUT) configured to behave like an AND gate. Always. Forever (until you reprogram).
This is why FPGA programming feels weird at first — you're not writing an algorithm, you're drawing a circuit with text.
HDL: The Language of Hardware
There are two main Hardware Description Languages:
- VHDL — verbose, strongly typed, favored in Europe and aerospace
- Verilog — concise, C-like, favored in Silicon Valley
We'll use Verilog because it's more readable for beginners. Everything here also applies conceptually to VHDL.
Setup: Free Tools
You don't need expensive hardware to start. Here's a completely free setup:
Simulation (no hardware needed):
# Install Icarus Verilog (simulator) + GTKWave (waveform viewer)
# macOS
brew install icarus-verilog gtkwave
# Ubuntu
sudo apt install iverilog gtkwave
Online alternative: EDA Playground — write and simulate Verilog in your browser, zero installation.
Cheap real hardware: The Basys 3 (~$150) or the iCEstick (~$25) are great starter boards.
Your First Module: Hello, AND Gate
In Verilog, the basic unit is a module — think of it as a component with input and output pins.
module and_gate (
input wire a,
input wire b,
output wire out
);
assign out = a & b;
endmodule
That's it. A two-input AND gate. Let's break it down:
- module and_gate: names your component
- input wire a, input wire b: two input pins
- output wire out: one output pin
- assign out = a & b: the logic (out is a AND b)
- endmodule: done
assign creates a continuous assignment: it isn't executed once, it's a permanent connection. Whenever a or b changes, out follows: instantly in simulation, and after a tiny propagation delay in real hardware.
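Because each assign is its own piece of wiring, several of them simply exist side by side, and their order in the file doesn't matter. A minimal sketch to make that concrete: a half adder (the module name half_adder is our own, not part of the tutorial so far) built from two continuous assignments that are both "active" at the same time.

```verilog
// Half adder: adds two 1-bit inputs.
// Two continuous assignments = two separate gates wired in parallel.
module half_adder (
    input  wire a,
    input  wire b,
    output wire sum,    // 1 when exactly one input is 1
    output wire carry   // 1 when both inputs are 1
);
    assign sum   = a ^ b;  // XOR gate
    assign carry = a & b;  // AND gate (same logic as and_gate)
endmodule
```

Swapping the two assign lines produces exactly the same circuit.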
Simulation: See It Work Without Hardware
Write a testbench — a Verilog file that drives your module with test signals:
// testbench.v
module testbench;
// Declare test signals
reg a, b; // reg = we drive these
wire out; // wire = module drives this
// Instantiate the module under test
and_gate uut (
.a(a),
.b(b),
.out(out)
);
// Apply test stimuli
initial begin
$dumpfile("waves.vcd"); // save waveforms
$dumpvars(0, testbench);
// Test all combinations
a = 0; b = 0; #10; // wait 10 time units
a = 0; b = 1; #10;
a = 1; b = 0; #10;
a = 1; b = 1; #10;
$display("Simulation complete!");
$finish;
end
// Print whenever output changes
initial begin
$monitor("a=%b b=%b | out=%b", a, b, out);
end
endmodule
Run it:
iverilog -o sim testbench.v and_gate.v
vvp sim
Output:
a=0 b=0 | out=0
a=0 b=1 | out=0
a=1 b=0 | out=0
a=1 b=1 | out=1
Simulation complete!
Open the waveform:
gtkwave waves.vcd
You'll see a visual timeline of all your signals. This is how hardware engineers debug: not just with print statements, but with waveforms showing exactly how each signal changes over time.
The Clock: Heartbeat of Digital Logic
Almost everything interesting in FPGAs is synchronous — it happens on the edge of a clock signal. The clock is a square wave that alternates between 0 and 1 at a fixed frequency. On each rising edge (0→1), flip-flops capture their inputs.
This is the fundamental building block of sequential logic:
module d_flip_flop (
input wire clk,
input wire d,
output reg q
);
always @(posedge clk) begin
q <= d;
end
endmodule
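- always @(posedge clk): "whenever there's a rising clock edge, do this"
- q <= d: non-blocking assignment; sample d and store it in q
- reg q: declares q as a variable that holds its value between assignments. Because it is assigned on a clock edge, synthesis builds a flip-flop for it. (Despite the name, reg doesn't always mean a register: a reg assigned in a combinational always @(*) block becomes plain logic.)
This flip-flop remembers the value of d at each clock edge. It's the hardware equivalent of a variable, but it only updates once per clock cycle.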
Critical rule: In always @(posedge clk) blocks, always use <= (non-blocking). In combinational always @(*) blocks, use = (blocking). Mix them up and you'll get subtle bugs that take days to find.
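The classic demonstration of why <= matters is a swap. Below is a minimal sketch (the module swap_regs and its load/init ports are hypothetical, added only so the registers can be given starting values). With non-blocking assignments, both right-hand sides are sampled before either register updates, so a and b really do exchange values on each edge, just like two cross-wired flip-flops. If you wrote a = b; b = a; with blocking =, the first line would overwrite a and the second would copy that new value back, leaving both registers equal to the old b.

```verilog
// Two registers that swap values on every rising clock edge.
// Non-blocking (<=): all right-hand sides are read first,
// then all registers update together.
module swap_regs (
    input  wire       clk,
    input  wire       load,     // when 1, load init_a/init_b instead of swapping
    input  wire [7:0] init_a,
    input  wire [7:0] init_b,
    output reg  [7:0] a,
    output reg  [7:0] b
);
    always @(posedge clk) begin
        if (load) begin
            a <= init_a;
            b <= init_b;
        end else begin
            a <= b;   // uses the OLD value of b
            b <= a;   // uses the OLD value of a
        end
    end
endmodule
```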
Building a Counter
Let's build something actually useful — a 4-bit counter that counts from 0 to 15 and wraps around:
module counter (
input wire clk,
input wire reset,
output reg [3:0] count // 4-bit output
);
always @(posedge clk) begin
if (reset) begin
count <= 4'b0000; // reset to 0
end else begin
count <= count + 1; // increment
end
end
endmodule
New syntax:
- [3:0]: a 4-bit bus (bits 3 down to 0)
- 4'b0000: a 4-bit binary literal (width 4, base binary, value 0000)
- 4'd15 would be decimal 15, and 4'hF would be hex F (the same value)
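The 4-bit counter wraps at 15 for free, because 4'hF + 1 overflows back to 0. To wrap at any other value, you compare against a sized literal yourself. Here is a sketch of a decade counter (0 to 9); the module name decade_counter and the wrap output are our own illustration, not from the text above.

```verilog
// Decade counter: counts 0..9, then wraps back to 0.
module decade_counter (
    input  wire       clk,
    input  wire       reset,
    output reg  [3:0] count,
    output wire       wrap     // 1 during the cycle when count is 9
);
    assign wrap = (count == 4'd9);   // 4-bit decimal literal

    always @(posedge clk) begin
        if (reset || wrap)
            count <= 4'b0000;        // same value as 4'd0 or 4'h0
        else
            count <= count + 4'd1;
    end
endmodule
```

The wrap signal is handy for chaining: feed it into the next digit's counter as an enable and you have the start of a multi-digit decimal counter.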
Testbench for the counter:
module counter_tb;
reg clk, reset;
wire [3:0] count;
counter uut (.clk(clk), .reset(reset), .count(count));
// Generate clock: toggle every 5 time units
always #5 clk = ~clk;
initial begin
clk = 0; reset = 1;
#12 reset = 0; // release reset between rising edges (t=5, 15, 25...) to avoid a race
#200 $finish;
end
initial begin
$monitor("time=%0t count=%d", $time, count);
end
endmodule
The always #5 clk = ~clk line generates a clock with a 10-unit period (5 high, 5 low): 100 MHz if each time unit is 1 ns, which you set with a `timescale 1ns/1ps directive at the top of the file.
Combinational vs Sequential: The Core Distinction
Everything in digital logic falls into one of two categories:
Combinational logic — output depends only on current inputs. No memory. No clock.
// Combinational: output is always inputs OR'd together
assign out = a | b | c;
// Or use always @(*) for more complex combinational logic
always @(*) begin
case (sel)
2'b00: out = a;
2'b01: out = b;
2'b10: out = c;
default: out = 0;
endcase
end
Sequential logic — output depends on inputs AND past state. Has memory. Uses clock.
// Sequential: output only changes on clock edge
always @(posedge clk) begin
if (enable)
stored_value <= new_value;
end
Most real designs are a mix: combinational logic computes new values, sequential logic stores them on each clock edge.
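A small sketch of that mix, assuming a hypothetical accumulator module (the name and ports are ours): an always @(*) block computes what the sum will become, and a clocked block stores it on each rising edge. Keeping the two apart makes it obvious which logic sits between the registers.

```verilog
// Accumulator: combinational logic computes the next value,
// a clocked block stores it on each rising edge.
module accumulator (
    input  wire        clk,
    input  wire        reset,
    input  wire        enable,
    input  wire [7:0]  in,
    output reg  [15:0] sum
);
    reg [15:0] next_sum;

    // Combinational: what sum WILL become on the next edge
    always @(*) begin
        if (enable)
            next_sum = sum + in;   // 'in' is zero-extended to 16 bits
        else
            next_sum = sum;        // every path assigns next_sum: no latch
    end

    // Sequential: capture next_sum on the clock edge
    always @(posedge clk) begin
        if (reset)
            sum <= 16'd0;
        else
            sum <= next_sum;
    end
endmodule
```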
A Real Design: PWM Generator
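Let's build something you'd actually use: a PWM (Pulse Width Modulation) generator. PWMs are used to control LED brightness, motor speed, and servo position. The key idea: a signal that's ON 75% of the time delivers 75% of the average power, so an LED driven that way looks correspondingly dimmer (only roughly "75% as bright," since the eye's response isn't linear).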
module pwm (
input wire clk,
input wire [7:0] duty, // 0-255, duty cycle
output reg pwm_out
);
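reg [7:0] counter = 8'd0; // start at 0: without this, simulation sees x forever; most FPGA tools use it as the power-up value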
always @(posedge clk) begin
counter <= counter + 1; // 8-bit counter wraps 0→255→0
if (counter < duty)
pwm_out <= 1;
else
pwm_out <= 0;
end
endmodule
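If duty = 128, the output is high for 128 of every 256 clock cycles, a 50% duty cycle. duty = 0 keeps it always low. duty = 255 gives 255/256 ≈ 99.6%, not quite always on, because counter < 255 is false when the counter reaches 255.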
Connect this to an LED on an FPGA board and you've got a dimmable light. Change duty while it runs and you can fade in and out. The counter advances every clock cycle (100 million times per second at 100 MHz, for a PWM frequency of 100 MHz / 256 ≈ 390 kHz), with zero CPU involvement.
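Changing duty over time is all a fade needs. Below is a minimal "breathing LED" sketch, assuming a free-running clock and a synchronous reset; the module name pwm_fade and the STEP_CYCLES parameter are hypothetical. It folds the same counter-versus-duty comparison as pwm into one module and nudges duty up or down by one every STEP_CYCLES clocks, so the LED brightens and dims in a loop.

```verilog
// Breathing LED: ramps an 8-bit duty value up and down and
// compares it against a free-running counter, like the pwm module.
module pwm_fade #(
    parameter STEP_CYCLES = 1000   // clock cycles between duty changes (assumed)
) (
    input  wire clk,
    input  wire reset,
    output reg  pwm_out
);
    reg [7:0]  counter;     // PWM carrier, wraps 0..255
    reg [7:0]  duty;        // current brightness
    reg        rising;      // 1 = fading in, 0 = fading out
    reg [31:0] step_timer;  // counts cycles until the next duty change

    always @(posedge clk) begin
        if (reset) begin
            counter    <= 8'd0;
            duty       <= 8'd0;
            rising     <= 1'b1;
            step_timer <= 32'd0;
            pwm_out    <= 1'b0;
        end else begin
            counter <= counter + 1;
            pwm_out <= (counter < duty);
            if (step_timer == STEP_CYCLES - 1) begin
                step_timer <= 32'd0;
                if (rising) begin
                    if (duty == 8'd255) rising <= 1'b0;  // top reached: turn around
                    else                duty   <= duty + 1;
                end else begin
                    if (duty == 8'd0)   rising <= 1'b1;  // bottom reached: turn around
                    else                duty   <= duty - 1;
                end
            end else begin
                step_timer <= step_timer + 1;
            end
        end
    end
endmodule
```

At 100 MHz, STEP_CYCLES = 200_000 makes each step 2 ms, so one full fade up and back down (about 512 steps) takes roughly a second.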
Finite State Machines: Giving Your Hardware a Brain
Real hardware logic often needs to remember what it's doing — it has states. This is where FSMs (Finite State Machines) come in.
Let's build a simple traffic light controller:
module traffic_light (
input wire clk,
input wire reset,
output reg [2:0] lights // [2]=red, [1]=yellow, [0]=green
);
// State encoding
localparam RED = 2'd0;
localparam GREEN = 2'd1;
localparam YELLOW = 2'd2;
reg [1:0] state;
reg [31:0] timer;
// State durations in clock cycles (kept small so simulation is quick; at 100 MHz, one second is 100_000_000 cycles)
localparam RED_TIME = 100;
localparam GREEN_TIME = 80;
localparam YELLOW_TIME = 20;
always @(posedge clk) begin
if (reset) begin
state <= RED;
timer <= 0;
lights <= 3'b100; // red on
end else begin
timer <= timer + 1;
case (state)
RED: begin
lights <= 3'b100;
if (timer >= RED_TIME) begin
state <= GREEN;
timer <= 0;
end
end
GREEN: begin
lights <= 3'b001;
if (timer >= GREEN_TIME) begin
state <= YELLOW;
timer <= 0;
end
end
YELLOW: begin
lights <= 3'b010;
if (timer >= YELLOW_TIME) begin
state <= RED;
timer <= 0;
end
end
default: state <= RED;
endcase
end
end
endmodule
This is the standard FSM pattern in Verilog:
- Define states as localparam constants
- Use a case statement to handle each state
- Transition to the next state when conditions are met
- Update outputs based on the current state
FSMs appear everywhere in hardware: UART receivers, SPI controllers, memory arbiters, protocol handlers.
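The traffic light keeps its state, timer, and outputs in one clocked block. Another widely used style splits an FSM into a clocked state register plus a combinational always @(*) block for the next-state logic. Here is a small sketch in that style: a hypothetical seq101_detector that raises found after seeing the input bits 1, 0, 1 in a row (overlapping matches allowed).

```verilog
// Two-process FSM: detects the serial bit pattern 1-0-1 on din.
module seq101_detector (
    input  wire clk,
    input  wire reset,
    input  wire din,
    output wire found      // high while in MATCH (one cycle per match)
);
    localparam IDLE  = 2'd0;  // nothing useful seen yet
    localparam GOT1  = 2'd1;  // last bit was 1
    localparam GOT10 = 2'd2;  // last two bits were 1,0
    localparam MATCH = 2'd3;  // last three bits were 1,0,1

    reg [1:0] state, next_state;

    // Sequential part: the state register
    always @(posedge clk) begin
        if (reset) state <= IDLE;
        else       state <= next_state;
    end

    // Combinational part: next-state logic (blocking =, default first)
    always @(*) begin
        next_state = state;
        case (state)
            IDLE:    next_state = din ? GOT1  : IDLE;
            GOT1:    next_state = din ? GOT1  : GOT10;
            GOT10:   next_state = din ? MATCH : IDLE;
            MATCH:   next_state = din ? GOT1  : GOT10;  // the final 1 can start a new match
            default: next_state = IDLE;
        endcase
    end

    assign found = (state == MATCH);  // Moore output: depends only on state
endmodule
```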
Timing: The Thing That Bites Everyone
Here's something that surprises every beginner: your logic has physical delay.
A signal takes time (often a few nanoseconds) to propagate through logic and wiring. If the clock period is shorter than the delay of the slowest path between two flip-flops (plus the capturing flip-flop's setup time), flip-flops can capture wrong values, and your design will fail in mysterious ways that show up only on real hardware, not in RTL simulation, which treats logic as having zero delay.
This is called a timing violation, and the analysis that catches it is called static timing analysis (STA).
The key concept: setup time — how much before the clock edge does data need to be stable?
Picture a timeline: the data arriving at a flip-flop must settle and then hold steady for at least the setup time before the rising clock edge.
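As a rough worked example of the setup requirement, ignoring clock skew and jitter and using made-up delays rather than figures from any real datasheet (0.5 ns clock-to-output for the launching flip-flop, 6 ns of logic and routing, 0.5 ns setup for the capturing flip-flop), the clock period has to cover the whole path:

$$
T_{\text{clk}} \ge t_{\text{clk}\to q} + t_{\text{logic+route}} + t_{\text{setup}} = 0.5\,\text{ns} + 6\,\text{ns} + 0.5\,\text{ns} = 7\,\text{ns}
$$

$$
f_{\max} = \frac{1}{7\,\text{ns}} \approx 143\,\text{MHz}
$$

So this path would fail at 200 MHz (5 ns period) but pass comfortably at 100 MHz (10 ns period).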
FPGA implementation tools (Vivado, Quartus, nextpnr) run static timing analysis after place and route and report whether your design meets timing at the clock frequency given in your constraints. If it doesn't, you can:
- Slow down your clock
- Add pipeline registers (break long paths into shorter stages)
- Restructure your logic
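To illustrate the pipelining option, here is a sketch built around a hypothetical mac_pipelined module that computes a*b + c. Written as one statement (y <= a*b + c), the multiplier and adder would form one long path between registers. Adding a register after the multiplier splits it into two shorter paths, so each clock period only has to cover one of them. The cost is latency: the result appears two clock cycles after the inputs, and c must be delayed by one cycle so it lines up with its matching product.

```verilog
// y = a*b + c, split into two pipeline stages.
module mac_pipelined (
    input  wire        clk,
    input  wire [7:0]  a,
    input  wire [7:0]  b,
    input  wire [15:0] c,
    output reg  [16:0] y        // valid 2 cycles after the inputs
);
    reg [15:0] product_r;  // stage 1 result: a*b (max 255*255 = 65025)
    reg [15:0] c_r;        // c delayed one cycle to match product_r

    always @(posedge clk) begin
        // Stage 1: only the multiplier sits before these registers
        product_r <= a * b;
        c_r       <= c;
        // Stage 2: only the adder sits before this register
        y <= product_r + c_r;  // 17-bit result keeps the carry
    end
endmodule
```

Once the pipeline is full, it still accepts new inputs and produces a result every cycle; only the first result arrives later.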
For beginners: start with a slow clock (1-10 MHz) and don't worry about timing until you're hitting performance limits.
Synthesis: From Code to Silicon
The journey from Verilog to running hardware:
Verilog/VHDL
↓
Synthesis (maps your code to LUTs and flip-flops)
↓
Place & Route (physically places logic blocks on the chip)
↓
Timing Analysis (checks if everything is fast enough)
↓
Bitstream Generation (creates the binary file to program the FPGA)
↓
Programming (load the bitstream onto the chip)
The main tools:
| Tool | Vendor | FPGAs |
|---|---|---|
| Vivado | Xilinx/AMD | Artix, Kintex, Virtex, Zynq |
| Quartus | Intel | Cyclone, Arria, Stratix |
| iCEcube2 or Radiant (iCE40), Diamond (ECP5); open source: Yosys + nextpnr | Lattice | iCE40, ECP5 |
For open-source toolchains (great for learning), the iCE40 family is well supported: Project IceStorm documents the bitstream format, Yosys handles synthesis, and nextpnr handles place and route.
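A typical first design to push through that flow is a blinking LED. The sketch below assumes a 12 MHz oscillator (the frequency on the iCEstick) and hypothetical port names clk and led; to build it you'd also need your board's pin-constraint file mapping those names to physical pins, which isn't shown here.

```verilog
// Blink an LED by dividing a 12 MHz clock with a free-running counter.
module blinky (
    input  wire clk,     // assumed 12 MHz board oscillator
    output wire led
);
    reg [23:0] counter = 24'd0;  // 2^24 cycles / 12 MHz ≈ 1.4 s per full wrap

    always @(posedge clk)
        counter <= counter + 1;

    // The top bit stays 0 or 1 for 2^23 cycles (≈ 0.7 s) at a time
    assign led = counter[23];
endmodule
```

Picking a different bit of counter speeds up or slows down the blink by powers of two, with no extra logic.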
The Big Gotchas
Gotcha #1: Everything is concurrent
In software, this runs top to bottom:
x = 5
x = x + 1 # x is now 6
In Verilog, these are independent concurrent assignments:
assign x = 5; // one wire always = 5
assign x = x + 1; // WRONG: a second driver on the same wire, and x feeds back into itself (a combinational loop)
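The hardware way to express that software sequence is to give each intermediate value its own wire, so every wire has exactly one driver and nothing feeds back into itself. A minimal sketch (the module chain_example and the names x0/x1 are illustrative):

```verilog
// Software-style "x = 5; x = x + 1;" expressed as hardware:
// each intermediate value gets its own wire.
module chain_example (
    output wire [7:0] x1
);
    wire [7:0] x0;

    assign x0 = 8'd5;        // "x = 5"
    assign x1 = x0 + 8'd1;   // "x = x + 1": an adder fed by x0, driving a new wire
endmodule
```

If you want x to keep increasing over time, that's a register updated on a clock edge, exactly like the counter earlier.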
Gotcha #2: Non-blocking vs blocking assignments
Use <= in clocked blocks, = in combinational blocks. Always. No exceptions until you understand why.
Gotcha #3: Latches
If a combinational always block doesn't assign a value to every output on every path, you accidentally create a latch: a level-sensitive storage element that holds the old value, rather than a clocked flip-flop. Latches are timing nightmares.
// BAD: creates a latch (no else clause)
always @(*) begin
if (enable)
out = in;
// What is out when enable=0? Latch!
end
// GOOD: explicit default
always @(*) begin
if (enable)
out = in;
else
out = 0; // explicit
end
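An alternative fix that scales better when a block has many outputs or branches is to assign a default at the very top of the always @(*) block and then override it only where needed. A sketch using a hypothetical 2-to-4 decoder: the case statement sets just one bit, but because out already has a value on every path, no latch is inferred.

```verilog
// 2-to-4 decoder with a default assignment at the top of the block.
module decoder_2to4 (
    input  wire [1:0] sel,
    input  wire       enable,
    output reg  [3:0] out
);
    always @(*) begin
        out = 4'b0000;            // default: covers every path below
        if (enable) begin
            case (sel)
                2'd0: out[0] = 1'b1;
                2'd1: out[1] = 1'b1;
                2'd2: out[2] = 1'b1;
                2'd3: out[3] = 1'b1;
            endcase
        end
    end
endmodule
```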
Gotcha #4: Simulation ≠ synthesis
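Some Verilog is valid for simulation but can't be turned into actual hardware. Delay statements like #10 are ignored by synthesis tools (usually with a warning), so a design that relies on them will behave differently on the chip; system tasks such as $display are ignored too. Stick to synthesizable constructs.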
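In real hardware, time is measured in clock cycles, so the synthesizable replacement for a #N delay is a counter. A minimal sketch, assuming a hypothetical cycle_delay module with a DELAY_CYCLES parameter: once start is sampled on a rising edge, done pulses high for one cycle DELAY_CYCLES clock edges later.

```verilog
// Synthesizable "wait": count clock cycles instead of using #N.
module cycle_delay #(
    parameter DELAY_CYCLES = 10    // hypothetical default; must be >= 1
) (
    input  wire clk,
    input  wire reset,
    input  wire start,
    output reg  done
);
    reg [31:0] remaining;   // cycles left to wait
    reg        busy;

    always @(posedge clk) begin
        done <= 1'b0;                        // default: done is a 1-cycle pulse
        if (reset) begin
            busy      <= 1'b0;
            remaining <= 32'd0;
        end else if (start && !busy) begin
            busy      <= 1'b1;
            remaining <= DELAY_CYCLES - 1;
        end else if (busy) begin
            if (remaining == 0) begin
                busy <= 1'b0;
                done <= 1'b1;
            end else begin
                remaining <= remaining - 1;
            end
        end
    end
endmodule
```

At 100 MHz, a 1 ms wait would be DELAY_CYCLES = 100_000.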
What to Build Next
Now you know enough to be dangerous. Here's a learning path:
Beginner projects:
- 4-bit adder
- Shift register
- Debounce circuit (clean up noisy button presses)
- 7-segment display driver
Intermediate projects:
- UART transmitter/receiver (serial communication)
- SPI or I2C controller
- VGA signal generator (put pixels on a monitor!)
- Simple CPU (yes, you can build one)
Advanced:
- Pipelined designs
- DDR memory controller
- Video processing pipeline
- Soft-core processor (MicroBlaze, RISC-V)
The Real Magic
Here's what makes FPGAs genuinely exciting:
A CPU core doing video processing works through pixels largely one after another (SIMD instructions handle only a few at once). An FPGA can run a deep pipeline in which every processing stage works on a different pixel in the same clock cycle, with deterministic, cycle-exact timing.
A CPU executing a neural network runs multiplications sequentially. An FPGA can have 1,000 multipliers running in parallel, custom-fitted to your exact model dimensions.
This isn't magic — it's parallelism at the hardware level. You're not writing programs anymore. You're designing machines.
And now you know how.
Quick Reference Card
// Module skeleton
module name (input wire a, output reg b);
endmodule
// Continuous assignment (combinational)
assign out = a & b;
// Clocked block (sequential)
always @(posedge clk) begin
q <= d; // non-blocking!
end
// Combinational block
always @(*) begin
case (sel)
2'b00: out = a;
default: out = 0;
endcase
end
// Bus declarations
wire [7:0] byte_wire; // 8-bit wire
reg [15:0] word_reg; // 16-bit register
reg [0:0] bit_reg; // 1-bit register (same as reg bit_reg)
// Literals
4'b1010 // 4-bit binary 1010
8'hFF // 8-bit hex FF = 255
10'd100 // 10-bit decimal 100
// Constants
localparam MY_CONST = 42;
Now go build something. The hardware is waiting. ⚡