Compare commits

10 Commits

Author SHA1 Message Date
Nikolay Puzanov
9283c009b4 Add iverilog results 2023-06-17 16:10:54 +03:00
Nikolay Puzanov
d8b140e939 Return error code if vvp break's with ctrl-c 2023-06-17 16:09:34 +03:00
Nikolay Puzanov
a68a7b4790 Fix default block size 2023-06-17 14:36:52 +03:00
Nikolay Puzanov
142fb46b2f Rewrite run.sh script 2023-06-17 10:56:00 +03:00
Nikolay Puzanov
0cb0d82998 Update results 2023-06-16 22:15:52 +03:00
Nikolay Puzanov
7afbe06799 Set block size to 1kB. Block size is set via plusarg +dlen=NNN 2023-06-15 23:08:50 +03:00
Nikolay Puzanov
7e96777b89 Add multiple bench directory selection 2023-06-15 17:47:42 +03:00
Nikolay Puzanov
923a08b9a8 Fix firmware 2023-06-15 17:44:45 +03:00
Nikolay Puzanov
047bd9c42b Change benchmark to calc MD5 on 1024 softcores 2023-06-15 17:40:46 +03:00
Nikolay Puzanov
89c82cb611 Cosmetic changes 2023-06-13 16:36:29 +03:00
36 changed files with 1688 additions and 979 deletions


@ -1,8 +1,10 @@
# Simple HDL simulator benchmark (very early version)
# Simple HDL simulator benchmark (alpha version)
To estimate simulation speed, the benchmark simulates the
[PicoRV32](https://github.com/YosysHQ/picorv32) soft processor running a program that
computes the first 200 digits of Pi.
To estimate simulation speed, the benchmark simulates 1024
[PicoRV32](https://github.com/YosysHQ/picorv32) soft processors, each running a program
that computes the MD5 hash of a 1 kB block. The data in each block is initialized with
different values. The block size defaults to 1 kB, but an arbitrary size can be set
with the `+dlen=NNN` parameter.
The `source` directory contains the RTL and program sources. The top module is
`testbench`, with a single input signal `clock`. The clock is generated in an external module for
@ -12,51 +14,48 @@
simulator. The scripts are named `__build.sh` (to build the project) and `__run.sh`
(to run the simulation).
The `run.sh` script runs the benchmark on all simulators and saves the execution time
to the `results.txt` file. To run the benchmark on a single simulator, pass the
benchmark directory as a parameter of the `run.sh` script.
The `run.sh` script runs the benchmark from the selected directory, or all tests. Its
parameters set the number of soft cores, the block size, the number of simulation
threads (so far only for Verilator), and the list of benchmarks:
## Results for 50 digits of Pi
```
$ ./run.sh -h
Usage: ./run.sh [OPTION]... [SIM...]
Run simulator benchmark. Calculates MD5 hash from a block data
on an array of soft-cores PicoRV32.
Options:
-c [COUNT] Soft CPU count in simulation. Default: 1024
-s [SIZE] Data block size in bytes. Default: 1024 bytes
-t [COUNT] Simulation threads count. Default: 1
(so far only for Verilator)
-l List of available benchmarks
-h This help
The SIM parameter is the name of the simulator from the list of
option -l. If the parameter is not specified, benchmarks for all
simulators will be performed. Be careful, some simulators take
a very long time to benchmark.
```
## Results for 1024 processors
- Xeon E5-2630v3 @ 2.40GHz
- Verilator 5.011 devel rev v5.010-98-g15f8ebc56
- Icarus Verilog 13.0 (devel) (s20221226-127-gdeeac2edf)
- ModelSim SE-64 2020.4 (Revision: 2020.10)
- QuestaSim 64 2021.1 (Revision: 2021.1)
- Vivado 2021.1
Time in milliseconds:
Benchmark execution time on a 1 kB block (hh:mm:ss):
```
test-iverilog: 210540
test-modelsim: 25555
test-verilator: 1289
| Simulator             | Build    | Run      |
+-----------------------+----------+----------+
| Icarus Verilog        | 00:00:27 | 19:04:37 |
| ModelSim              | 00:00:00 | 01:33:14 |
| QuestaSim             | 00:00:00 | 01:29:38 |
| Verilator (1 thread)  | 00:12:03 | 00:02:57 |
| Verilator (8 threads) | 00:18:45 | 00:01:33 |
| XSIM                  | 00:00:29 | 02:08:54 |
| Xcelium               | TBD      |          |
```
## Results for 200 digits of Pi
Computing 200 digits on Icarus Verilog takes an unreasonably long time, so before
running all benchmarks I recommend renaming the `test-iverilog` directory to
`notest-iverilog`.
Results for 200 digits on the same processor:
```
test-iverilog: 3257116
test-xsim: 938296
test-modelsim: 359562
test-verilator: 20816
```
## Preliminary results for the "Big 3" simulators
Colleagues ran the benchmark on Xcelium, VCS, and ModelSim. Rough estimates gave the
following results (normalized to the speed of Xcelium):
```
test-verilator: 0.35
test-xcelium: 1
test-vcs: 1.37
test-modelsim: 5.95
test-xsim: 15.5
test-iverilog: 58
```
Of course, keep in mind that Verilator is a cycle-accurate simulator and that it does
not support the X and Z states.

206
run.sh

@ -1,45 +1,181 @@
#!/usr/bin/env bash
set -e
BUILD=__build.sh
RUN=__run.sh
## Default values
CPU_COUNT=1024
BLOCK_SIZE=1024
THREADS=1
if [ -n "$1" ]
BLD_SCRIPT="__build.sh"
RUN_SCRIPT="__run.sh"
TEST_DIR_PREFIX="test-"
LOG_PREFIX="####"
function sim_dir_valid()
{
if [ -e "$1/$BLD_SCRIPT" ] && [ -e "$1/$RUN_SCRIPT" ]
then
return 0
else
return 1
fi
}
function sim_list()
{
for dir in "$TEST_DIR_PREFIX"*
do
if sim_dir_valid "$dir"
then
echo "${dir:5}"
fi
done
}
function print_help()
{
echo "Usage: $0 [OPTION]... [SIM...]"
echo "Run simulator benchmark. Calculates MD5 hash from a block data"
echo "on an array of soft-cores PicoRV32."
echo
echo "Options:"
echo " -c [COUNT] Soft CPU count in simulation. Default: 1024"
echo " -s [SIZE] Data block size in bytes. Default: 1024 bytes"
echo " -t [COUNT] Simulation threads count. Default: 1"
echo " (so far only for Verilator)"
echo " -l List of available benchmarks"
echo " -h This help"
echo
echo "The SIM parameter is the name of the simulator from the list of"
echo "option -l. If the parameter is not specified, benchmarks for all"
echo "simulators will be performed. Be careful, some simulators take "
echo "a very long time to benchmark."
echo
}
function check_arg(){
if [[ $2 == -* ]]
then
echo "Option $1 requires an argument" >&2
exit 1
fi
}
function parse_param()
{
while getopts ":c:s:t:lh" opt
do
case $opt in
c)
check_arg "-c" "$OPTARG"
CPU_COUNT=$OPTARG
;;
s)
check_arg "-s" "$OPTARG"
BLOCK_SIZE=$OPTARG
;;
t)
check_arg "-t" "$OPTARG"
THREADS=$OPTARG
;;
l)
sim_list
exit 0
;;
h)
print_help
exit 0
;;
\?)
echo "Invalid option: -$OPTARG" >&2
print_help
exit 1
;;
:)
echo "Option -$OPTARG requires an argument" >&2
exit 1
;;
esac
done
}
function log()
{
echo -n "$LOG_PREFIX "
echo "$@"
}
function run_benchmark()
{
benchmark=$1
dir=$TEST_DIR_PREFIX$benchmark
if sim_dir_valid "$dir"
then
local t0 t1 ms
if cd "$dir"
then
# Build
log "Build $benchmark"
t0=$(date +%s%N | cut -b1-13)
if ! ./$BLD_SCRIPT "$CPU_COUNT" "$BLOCK_SIZE" "$THREADS"
then
cd ..
log "Build $benchmark FAILED"
return 1
fi
t1=$(date +%s%N | cut -b1-13)
ms=$((t1 - t0))
log "Build $benchmark time (ms): $ms"
echo
# Run
log "Run $benchmark"
t0=$(date +%s%N | cut -b1-13)
if ! ./$RUN_SCRIPT "$CPU_COUNT" "$BLOCK_SIZE" "$THREADS"
then
cd ..
log "RUN $benchmark FAILED"
return 1
fi
t1=$(date +%s%N | cut -b1-13)
ms=$((t1 - t0))
log "Run $benchmark time (ms): $ms"
cd ..
return 0
else
log "Can't change dir to $dir"
return 1
fi
else
log "No run scripts found in $dir"
return 1
fi
}
parse_param "$@"
shift $((OPTIND - 1))
if [ $# -gt 0 ]
then
tests=$1
benches="$*"
else
tests=$(ls -1d test-*)
for b in $(sim_list); do benches="$benches$b "; done
fi
echo >> results.txt
echo "---------- Simulator's benchmark -----------" >> results.txt
echo $(date) >> results.txt
echo >> results.txt
log "Soft-cores count: $CPU_COUNT"
log "Block size: $BLOCK_SIZE"
log "Threads count: $THREADS"
log "Benchmarks: $benches"
for test_dir in $tests
for bench in $benches
do
if [ ! -d "$test_dir" ]
then
echo "Directory $test_dit is not exists. Break"
exit -1
fi
if [ -e $test_dir/$BUILD -a -e $test_dir/$RUN ]
then
echo "#### Run benchmark in $test_dir"
cd $test_dir
./$BUILD
start_ms=$(date +%s%N | cut -b1-13)
./$RUN
stop_ms=$(date +%s%N | cut -b1-13)
cd ..
ms=$(expr $stop_ms - $start_ms)
echo "#### $test_dir: $ms milliseconds"
echo
echo "$test_dir: $ms" >> results.txt
else
echo "Skip $test_dir directory"
fi
echo
run_benchmark "$bench"
done

11
scripts/sim_vars.sh Normal file

@ -0,0 +1,11 @@
if [ $# -lt 3 ]
then
echo "Usage: $0 <CPU_COUNT> <BLOCK_SIZE> <THREADS_COUNT>"
exit -1
fi
CPU_COUNT=$1
BLOCK_SIZE=$2
THREADS=$3
FFILE=../source/sources.f


@ -2,11 +2,13 @@
// Slaves address ranges:
// 0 - 0x00000000-0x0000ffff
// 1 - 0x01000000-0x01000fff
// 1 - 0x00010000-0x0001ffff
// 2 - 0x01000000-0x01000fff
// i_slave_rdata bits:
// 0: i_slave_rdata[31:0]
// 1: i_slave_rdata[63:32]
// 2: i_slave_rdata[95:64]
module bus_mux
(input wire clock,
@ -23,32 +25,38 @@ module bus_mux
output wire o_ready,
// Slaves interface
input wire [63:0] i_slave_rdata,
output wire [1:0] o_slave_valid,
input wire [1:0] i_slave_ready);
input wire [95:0] i_slave_rdata,
output wire [2:0] o_slave_valid,
input wire [2:0] i_slave_ready);
wire [1:0] selector;
reg [1:0] selector_reg;
wire [2:0] selector;
reg [2:0] selector_reg;
always @(posedge clock)
if (reset)
selector_reg <= 2'd0;
selector_reg <= 3'd0;
else
if (!i_valid)
selector_reg <= selector;
assign selector[0] =
i_la_addr[16] == 1'b0 &&
i_la_addr[24] == 1'b0;
assign selector[1] =
i_la_addr[16] == 1'b1 &&
i_la_addr[24] == 1'b0;
assign selector[2] =
i_la_addr[24] == 1'b1;
assign o_slave_valid = selector_reg & {2{i_valid}};
assign o_slave_valid = selector_reg & {3{i_valid}};
assign o_ready = |(i_slave_ready & selector_reg);
assign o_rdata =
(i_slave_rdata[31:0] & {32{selector_reg[0]}}) |
(i_slave_rdata[63:32] & {32{selector_reg[1]}});
(i_slave_rdata[63:32] & {32{selector_reg[1]}}) |
(i_slave_rdata[95:64] & {32{selector_reg[2]}});
`ifdef FORMAL
@ -57,7 +65,7 @@ module bus_mux
ones = 0;
// Check for selector is zero or one-hot value
for (n = 0; n < 2; n = n + 1)
for (n = 0; n < 3; n = n + 1)
if (selector[n] == 1'b1)
ones = ones + 1;
@ -66,27 +74,34 @@ module bus_mux
// Check for correct address ranges decode
if (i_la_addr >= 32'h0 && i_la_addr <= 32'hffff)
assert(selector[0] == 1'b1);
if (i_la_addr >= 32'h1000000 && i_la_addr <= 32'h1000fff)
if (i_la_addr >= 32'h10000 && i_la_addr <= 32'h1ffff)
assert(selector[1] == 1'b1);
if (i_la_addr >= 32'h1000000 && i_la_addr <= 32'h1000fff)
assert(selector[2] == 1'b1);
end
// Check multiplexer
always @(*) begin : formal_mux
case (selector_reg)
2'b01: begin
3'b001: begin
assert(o_rdata == i_slave_rdata[31:0]);
assert(o_ready == i_slave_ready[0]);
assert(o_slave_valid[0] == i_valid);
end
2'b10: begin
3'b010: begin
assert(o_rdata == i_slave_rdata[63:32]);
assert(o_ready == i_slave_ready[1]);
assert(o_slave_valid[1] == i_valid);
end
2'b00: begin
3'b100: begin
assert(o_rdata == i_slave_rdata[95:64]);
assert(o_ready == i_slave_ready[2]);
assert(o_slave_valid[2] == i_valid);
end
3'b000: begin
assert(o_rdata == 32'd0);
assert(o_ready == 1'b0);
assert(o_slave_valid == 2'd0);
assert(o_slave_valid == 3'd0);
end
endcase
end


@ -2,3 +2,4 @@
*.bin
*.map
*.asm
compile_flags.txt


@ -1,21 +1,12 @@
PROJECT := fw
SOURCES := crt0.s main.c uprintf.c
SOURCES := crt0.s main.c uprintf.c md5.c
CPU_RAM_REG := ram_reg
ARCH := riscv32-none-elf
CFLAGS := -O2 -Wall -march=rv32i -mabi=ilp32 -mstrict-align \
-nostartfiles \
-ffunction-sections -lgcc \
-nostartfiles -ffunction-sections -lgcc \
-Wl,-Tpicorv32-minimal.ld,-static,-Map,$(PROJECT).map
# CFLAGS := -O3 -Wall -march=rv32i -mabi=ilp32 -mstrict-align \
# -nostartfiles \
# -ffunction-sections \
# -ffreestanding -lgcc \
# -Wl,-T,picorv32-minimal.ld,-static,-Map,$(PROJECT).map
# -nostdlib
ELF = $(PROJECT).elf
BIN = $(PROJECT).bin
ASM = $(PROJECT).asm

File diff suppressed because it is too large


@ -1,4 +1,5 @@
#include "../io_reg.h"
#include "md5.h"
#include "uprintf.h"
#include <stdint.h>
@ -8,75 +9,21 @@ void put_char(char c)
IO_REG_CONSOLE = c | IO_REG_CONSOLE_SEND;
}
#define N 200
#define CHUNK 4
#define ARR_LEN (10 * N / 3 + 1)
static int arr[ARR_LEN];
void print_digit(int d)
int main(void)
{
static int cnt = 0;
uint8_t result[16];
uint8_t *daddr;
uint32_t dlen;
p("%d", d);
cnt++;
daddr = (uint8_t *)IO_REG_DATA_ADDR;
dlen = IO_REG_DATA_LEN;
if (cnt == CHUNK) {
p("\n");
cnt = 0;
}
}
md5Buf(daddr, dlen, result);
/* See: https://en.wikipedia.org/wiki/Spigot_algorithm */
int main()
{
p("\nComputation of %d first digits of PI\n", N);
for (int i = 0; i < ARR_LEN; i++)
arr[i] = 2;
int nines = 0;
int predigit = 0;
for (int j = 1; j < N + 1; j++) {
int q = 0;
for (int i = ARR_LEN; i > 0; i--) {
int x = 10 * arr[i - 1] + q * i;
arr[i - 1] = x % (2 * i - 1);
q = x / (2 * i - 1);
}
arr[0] = q % 10;
q = q / 10;
if (9 == q)
nines++;
else if (10 == q) {
print_digit(predigit + 1);
for (int k = 0; k < nines; k++)
print_digit(0);
predigit = 0;
nines = 0;
}
else {
print_digit(predigit);
predigit = q;
if (0 != nines) {
for (int k = 0; k < nines; k++)
print_digit(9);
nines = 0;
}
}
}
p("%d", predigit);
p("\nDONE\n");
IO_REG_MD5_OUT0 = *(uint32_t *)(result + 0);
IO_REG_MD5_OUT1 = *(uint32_t *)(result + 4);
IO_REG_MD5_OUT2 = *(uint32_t *)(result + 8);
IO_REG_MD5_OUT3 = *(uint32_t *)(result + 12);
/* Stop simulation */
IO_REG_CTRL = IO_REG_CTRL_STOP;

210
source/firmware/md5.c Normal file

@ -0,0 +1,210 @@
/*
* Derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm
* and modified slightly to be functionally identical but condensed into control
* structures.
*/
#include "md5.h"
#include <stdint.h>
/*
* Constants defined by the MD5 algorithm
*/
#define A 0x67452301
#define B 0xefcdab89
#define C 0x98badcfe
#define D 0x10325476
static uint32_t S[] = {7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7,
12, 17, 22, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9,
14, 20, 5, 9, 14, 20, 4, 11, 16, 23, 4, 11, 16,
23, 4, 11, 16, 23, 4, 11, 16, 23, 6, 10, 15, 21,
6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21};
static uint32_t K[] = {
0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee, 0xf57c0faf, 0x4787c62a,
0xa8304613, 0xfd469501, 0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,
0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821, 0xf61e2562, 0xc040b340,
0x265e5a51, 0xe9b6c7aa, 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed, 0xa9e3e905, 0xfcefa3f8,
0x676f02d9, 0x8d2a4c8a, 0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,
0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70, 0x289b7ec6, 0xeaa127fa,
0xd4ef3085, 0x04881d05, 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039, 0x655b59c3, 0x8f0ccc92,
0xffeff47d, 0x85845dd1, 0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1,
0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391};
/*
* Padding used to make the size (in bits) of the input congruent to 448 mod 512
*/
static uint8_t PADDING[] = {
0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
/*
* Bit-manipulation functions defined by the MD5 algorithm
*/
#define F(X, Y, Z) ((X & Y) | (~X & Z))
#define G(X, Y, Z) ((X & Z) | (Y & ~Z))
#define H(X, Y, Z) (X ^ Y ^ Z)
#define I(X, Y, Z) (Y ^ (X | ~Z))
/*
* Rotates a 32-bit word left by n bits
*/
uint32_t rotateLeft(uint32_t x, uint32_t n)
{
return (x << n) | (x >> (32 - n));
}
/*
* Initialize a context
*/
void md5Init(MD5Context *ctx)
{
ctx->size = (uint64_t)0;
ctx->buffer[0] = (uint32_t)A;
ctx->buffer[1] = (uint32_t)B;
ctx->buffer[2] = (uint32_t)C;
ctx->buffer[3] = (uint32_t)D;
}
/*
* Add some amount of input to the context
*
* If the input fills out a block of 512 bits, apply the algorithm (md5Step)
* and save the result in the buffer. Also updates the overall size.
*/
void md5Update(MD5Context *ctx, uint8_t *input_buffer, size_t input_len)
{
uint32_t input[16];
unsigned int offset = ctx->size % 64;
ctx->size += (uint64_t)input_len;
// Copy each byte in input_buffer into the next space in our context input
for (unsigned int i = 0; i < input_len; ++i) {
ctx->input[offset++] = (uint8_t) * (input_buffer + i);
// If we've filled our context input, copy it into our local array input
// then reset the offset to 0 and fill in a new buffer.
// Every time we fill out a chunk, we run it through the algorithm
// to enable some back and forth between cpu and i/o
if (offset % 64 == 0) {
for (unsigned int j = 0; j < 16; ++j) {
// Convert to little-endian
// The local variable `input` is our 512-bit chunk separated into
// 32-bit words we can use in calculations
input[j] = (uint32_t)(ctx->input[(j * 4) + 3]) << 24 |
(uint32_t)(ctx->input[(j * 4) + 2]) << 16 |
(uint32_t)(ctx->input[(j * 4) + 1]) << 8 |
(uint32_t)(ctx->input[(j * 4)]);
}
md5Step(ctx->buffer, input);
offset = 0;
}
}
}
/*
* Pad the current input so its length is congruent to 448 bits mod 512
* (56 bytes mod 64), append the size in bits to the very end, and save the
* result of the final iteration into digest.
*/
void md5Finalize(MD5Context *ctx)
{
uint32_t input[16];
unsigned int offset = ctx->size % 64;
unsigned int padding_length =
offset < 56 ? 56 - offset : (56 + 64) - offset;
// Fill in the padding and undo the changes to size that resulted from the
// update
md5Update(ctx, PADDING, padding_length);
ctx->size -= (uint64_t)padding_length;
// Do a final update (internal to this function)
// Last two 32-bit words are the two halves of the size (converted from
// bytes to bits)
for (unsigned int j = 0; j < 14; ++j) {
input[j] = (uint32_t)(ctx->input[(j * 4) + 3]) << 24 |
(uint32_t)(ctx->input[(j * 4) + 2]) << 16 |
(uint32_t)(ctx->input[(j * 4) + 1]) << 8 |
(uint32_t)(ctx->input[(j * 4)]);
}
input[14] = (uint32_t)(ctx->size * 8);
input[15] = (uint32_t)((ctx->size * 8) >> 32);
md5Step(ctx->buffer, input);
// Move the result into digest (convert from little-endian)
for (unsigned int i = 0; i < 4; ++i) {
ctx->digest[(i * 4) + 0] = (uint8_t)((ctx->buffer[i] & 0x000000FF));
ctx->digest[(i * 4) + 1] =
(uint8_t)((ctx->buffer[i] & 0x0000FF00) >> 8);
ctx->digest[(i * 4) + 2] =
(uint8_t)((ctx->buffer[i] & 0x00FF0000) >> 16);
ctx->digest[(i * 4) + 3] =
(uint8_t)((ctx->buffer[i] & 0xFF000000) >> 24);
}
}
/*
* Step on 512 bits of input with the main MD5 algorithm.
*/
void md5Step(uint32_t *buffer, uint32_t *input)
{
uint32_t AA = buffer[0];
uint32_t BB = buffer[1];
uint32_t CC = buffer[2];
uint32_t DD = buffer[3];
uint32_t E;
unsigned int j;
for (unsigned int i = 0; i < 64; ++i) {
switch (i / 16) {
case 0:
E = F(BB, CC, DD);
j = i;
break;
case 1:
E = G(BB, CC, DD);
j = ((i * 5) + 1) % 16;
break;
case 2:
E = H(BB, CC, DD);
j = ((i * 3) + 5) % 16;
break;
default:
E = I(BB, CC, DD);
j = (i * 7) % 16;
break;
}
uint32_t temp = DD;
DD = CC;
CC = BB;
BB = BB + rotateLeft(AA + E + K[i] + input[j], S[i]);
AA = temp;
}
buffer[0] += AA;
buffer[1] += BB;
buffer[2] += CC;
buffer[3] += DD;
}
void md5Buf(uint8_t *input, int len, uint8_t *result)
{
MD5Context ctx;
md5Init(&ctx);
md5Update(&ctx, input, len);
md5Finalize(&ctx);
memcpy(result, ctx.digest, 16);
}

24
source/firmware/md5.h Normal file

@ -0,0 +1,24 @@
#ifndef MD5_H
#define MD5_H
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct
{
uint64_t size; // Size of input in bytes
uint32_t buffer[4]; // Current accumulation of hash
uint8_t input[64]; // Input to be used in the next step
uint8_t digest[16]; // Result of algorithm
} MD5Context;
void md5Init(MD5Context *ctx);
void md5Update(MD5Context *ctx, uint8_t *input, size_t input_len);
void md5Finalize(MD5Context *ctx);
void md5Step(uint32_t *buffer, uint32_t *input);
void md5Buf(uint8_t *input, int len, uint8_t *result);
#endif


@ -7,10 +7,12 @@ let cross-rv5 = import <nixpkgs> {
libc = "newlib";
};
};
flags-file = "compile_flags.txt";
in
cross-rv5.mkShell {
nativeBuildInputs = [ nixpkgs.gnumake nixpkgs.guile_3_0 ];
shellHook = ''
export NIX_SHELL_NAME="riscv"
echo | riscv32-none-elf-gcc -E -Wp,-v - 2>&1 | grep "^ .*newlib" | sed 's/^ /-I/' > ${flags-file}
'';
}


@ -4,5 +4,5 @@ set -e
../scripts/register-gen.scm io.reg > io_reg.v
../scripts/register-gen.scm -c io.reg > io_reg.h
../scripts/register-gen.scm -t io.reg > io_reg.txt
../scripts/picorv32-bus-mux-gen.scm -s 0+0x10000 -s 0x01000000+0x1000 -m bus_mux > bus_mux.v
../scripts/picorv32-bus-mux-gen.scm -s 0+0x10000 -s 0x01000000+0x1000 -m bus_mux -f > bus_mux.sby
../scripts/picorv32-bus-mux-gen.scm -s 0+0x10000 -s 0x10000+0x10000 -s 0x01000000+0x1000 -m bus_mux > bus_mux.v
../scripts/picorv32-bus-mux-gen.scm -s 0+0x10000 -s 0x10000+0x10000 -s 0x01000000+0x1000 -m bus_mux -f > bus_mux.sby


@ -9,6 +9,30 @@
(info "Control register")
(bits 1 "stop" w (reset #b0)))
(reg "data_addr"
(info "Data block address")
(bits 32 "addr" r))
(reg "data_len"
(info "Data block length")
(bits 32 "len" r))
(reg "md5_out0"
(info "Bytes 0..3 of MD5 sum")
(bits 32 "data" w))
(reg "md5_out1"
(info "Bytes 4..7 of MD5 sum")
(bits 32 "data" w))
(reg "md5_out2"
(info "Bytes 8..11 of MD5 sum")
(bits 32 "data" w))
(reg "md5_out3"
(info "Bytes 12..15 of MD5 sum")
(bits 32 "data" w))
(reg "console" read-notify
(info "Virtual console port")
(bits 8 "data" rw (info "Read/write char from/to console"))


@ -7,8 +7,38 @@
#define IO_REG_CTRL (*(volatile uint32_t*)(IO_REG_BASE + 0x00000000))
#define IO_REG_CTRL_STOP (1 << 0)
/* -- Register 'DATA_ADDR' -- */
#define IO_REG_DATA_ADDR (*(volatile uint32_t*)(IO_REG_BASE + 0x00000004))
#define IO_REG_DATA_ADDR_ADDR__MASK 0xffffffff
#define IO_REG_DATA_ADDR_ADDR__SHIFT 0
/* -- Register 'DATA_LEN' -- */
#define IO_REG_DATA_LEN (*(volatile uint32_t*)(IO_REG_BASE + 0x00000008))
#define IO_REG_DATA_LEN_LEN__MASK 0xffffffff
#define IO_REG_DATA_LEN_LEN__SHIFT 0
/* -- Register 'MD5_OUT0' -- */
#define IO_REG_MD5_OUT0 (*(volatile uint32_t*)(IO_REG_BASE + 0x0000000c))
#define IO_REG_MD5_OUT0_DATA__MASK 0xffffffff
#define IO_REG_MD5_OUT0_DATA__SHIFT 0
/* -- Register 'MD5_OUT1' -- */
#define IO_REG_MD5_OUT1 (*(volatile uint32_t*)(IO_REG_BASE + 0x00000010))
#define IO_REG_MD5_OUT1_DATA__MASK 0xffffffff
#define IO_REG_MD5_OUT1_DATA__SHIFT 0
/* -- Register 'MD5_OUT2' -- */
#define IO_REG_MD5_OUT2 (*(volatile uint32_t*)(IO_REG_BASE + 0x00000014))
#define IO_REG_MD5_OUT2_DATA__MASK 0xffffffff
#define IO_REG_MD5_OUT2_DATA__SHIFT 0
/* -- Register 'MD5_OUT3' -- */
#define IO_REG_MD5_OUT3 (*(volatile uint32_t*)(IO_REG_BASE + 0x00000018))
#define IO_REG_MD5_OUT3_DATA__MASK 0xffffffff
#define IO_REG_MD5_OUT3_DATA__SHIFT 0
/* -- Register 'CONSOLE' -- */
#define IO_REG_CONSOLE (*(volatile uint32_t*)(IO_REG_BASE + 0x00000004))
#define IO_REG_CONSOLE (*(volatile uint32_t*)(IO_REG_BASE + 0x0000001c))
#define IO_REG_CONSOLE_DATA__MASK 0x000000ff
#define IO_REG_CONSOLE_DATA__SHIFT 0
#define IO_REG_CONSOLE_SEND (1 << 8)


@ -1,10 +1,16 @@
Register map of IO_REG (base: 0x1000000)
========================================
| Offset | Name | Description |
|------------+---------+----------------------|
| 0x00000000 | CTRL | Control register |
| 0x00000004 | CONSOLE | Virtual console port |
| Offset | Name | Description |
|------------+-----------+-------------------------|
| 0x00000000 | CTRL | Control register |
| 0x00000004 | DATA_ADDR | Data block address |
| 0x00000008 | DATA_LEN | Data block length |
| 0x0000000c | MD5_OUT0 | Bytes 0..3 of MD5 sum |
| 0x00000010 | MD5_OUT1 | Bytes 4..7 of MD5 sum |
| 0x00000014 | MD5_OUT2 | Bytes 8..11 of MD5 sum |
| 0x00000018 | MD5_OUT3 | Bytes 12..15 of MD5 sum |
| 0x0000001c | CONSOLE | Virtual console port |
CTRL Register (0x00000000)
@ -17,7 +23,67 @@ CTRL Register (0x00000000)
| 0 | STOP | WO | 0 | |
CONSOLE Register (0x00000004)
DATA_ADDR Register (0x00000004)
-------------------------------
Data block address
| Bits | Name | Mode | Reset | Description |
|------+------+------+-------+-------------|
| 31:0 | ADDR | RO | 0 | |
DATA_LEN Register (0x00000008)
------------------------------
Data block length
| Bits | Name | Mode | Reset | Description |
|------+------+------+-------+-------------|
| 31:0 | LEN | RO | 0 | |
MD5_OUT0 Register (0x0000000c)
------------------------------
Bytes 0..3 of MD5 sum
| Bits | Name | Mode | Reset | Description |
|------+------+------+-------+-------------|
| 31:0 | DATA | WO | 0 | |
MD5_OUT1 Register (0x00000010)
------------------------------
Bytes 4..7 of MD5 sum
| Bits | Name | Mode | Reset | Description |
|------+------+------+-------+-------------|
| 31:0 | DATA | WO | 0 | |
MD5_OUT2 Register (0x00000014)
------------------------------
Bytes 8..11 of MD5 sum
| Bits | Name | Mode | Reset | Description |
|------+------+------+-------+-------------|
| 31:0 | DATA | WO | 0 | |
MD5_OUT3 Register (0x00000018)
------------------------------
Bytes 12..15 of MD5 sum
| Bits | Name | Mode | Reset | Description |
|------+------+------+-------+-------------|
| 31:0 | DATA | WO | 0 | |
CONSOLE Register (0x0000001c)
-----------------------------
Virtual console port


@ -17,6 +17,24 @@ module io_reg
/* ---- 'ctrl' ---- */
output wire o_ctrl_stop,
/* ---- 'data_addr' ---- */
input wire [31:0] i_data_addr_addr,
/* ---- 'data_len' ---- */
input wire [31:0] i_data_len_len,
/* ---- 'md5_out0' ---- */
output wire [31:0] o_md5_out0_data,
/* ---- 'md5_out1' ---- */
output wire [31:0] o_md5_out1_data,
/* ---- 'md5_out2' ---- */
output wire [31:0] o_md5_out2_data,
/* ---- 'md5_out3' ---- */
output wire [31:0] o_md5_out3_data,
/* ---- 'console' ---- */
output wire o_console__rnotify,
input wire [7:0] i_console_data,
@ -28,13 +46,53 @@ module io_reg
/* ---- Address decoder ---- */
wire ctrl_select;
wire data_addr_select;
wire data_len_select;
wire md5_out0_select;
wire md5_out1_select;
wire md5_out2_select;
wire md5_out3_select;
wire console_select;
assign ctrl_select =
i_addr[2] == 1'b0;
i_addr[2] == 1'b0 &&
i_addr[3] == 1'b0 &&
i_addr[4] == 1'b0;
assign data_addr_select =
i_addr[2] == 1'b1 &&
i_addr[3] == 1'b0 &&
i_addr[4] == 1'b0;
assign data_len_select =
i_addr[2] == 1'b0 &&
i_addr[3] == 1'b1 &&
i_addr[4] == 1'b0;
assign md5_out0_select =
i_addr[2] == 1'b1 &&
i_addr[3] == 1'b1 &&
i_addr[4] == 1'b0;
assign md5_out1_select =
i_addr[2] == 1'b0 &&
i_addr[3] == 1'b0 &&
i_addr[4] == 1'b1;
assign md5_out2_select =
i_addr[2] == 1'b1 &&
i_addr[3] == 1'b0 &&
i_addr[4] == 1'b1;
assign md5_out3_select =
i_addr[2] == 1'b0 &&
i_addr[3] == 1'b1 &&
i_addr[4] == 1'b1;
assign console_select =
i_addr[2] == 1'b1;
i_addr[2] == 1'b1 &&
i_addr[3] == 1'b1 &&
i_addr[4] == 1'b1;
/* ---- 'ctrl' ---- */
@ -50,6 +108,70 @@ module io_reg
end
/* ---- 'md5_out0' ---- */
reg [31:0] md5_out0_data;
assign o_md5_out0_data = md5_out0_data;
always @(posedge clock)
if (reset)
md5_out0_data <= 32'b0;
else
if (md5_out0_select && i_write) begin
if (i_ben[0]) md5_out0_data[7:0] <= i_data[7:0];
if (i_ben[1]) md5_out0_data[15:8] <= i_data[15:8];
if (i_ben[2]) md5_out0_data[23:16] <= i_data[23:16];
if (i_ben[3]) md5_out0_data[31:24] <= i_data[31:24];
end
/* ---- 'md5_out1' ---- */
reg [31:0] md5_out1_data;
assign o_md5_out1_data = md5_out1_data;
always @(posedge clock)
if (reset)
md5_out1_data <= 32'b0;
else
if (md5_out1_select && i_write) begin
if (i_ben[0]) md5_out1_data[7:0] <= i_data[7:0];
if (i_ben[1]) md5_out1_data[15:8] <= i_data[15:8];
if (i_ben[2]) md5_out1_data[23:16] <= i_data[23:16];
if (i_ben[3]) md5_out1_data[31:24] <= i_data[31:24];
end
/* ---- 'md5_out2' ---- */
reg [31:0] md5_out2_data;
assign o_md5_out2_data = md5_out2_data;
always @(posedge clock)
if (reset)
md5_out2_data <= 32'b0;
else
if (md5_out2_select && i_write) begin
if (i_ben[0]) md5_out2_data[7:0] <= i_data[7:0];
if (i_ben[1]) md5_out2_data[15:8] <= i_data[15:8];
if (i_ben[2]) md5_out2_data[23:16] <= i_data[23:16];
if (i_ben[3]) md5_out2_data[31:24] <= i_data[31:24];
end
/* ---- 'md5_out3' ---- */
reg [31:0] md5_out3_data;
assign o_md5_out3_data = md5_out3_data;
always @(posedge clock)
if (reset)
md5_out3_data <= 32'b0;
else
if (md5_out3_select && i_write) begin
if (i_ben[0]) md5_out3_data[7:0] <= i_data[7:0];
if (i_ben[1]) md5_out3_data[15:8] <= i_data[15:8];
if (i_ben[2]) md5_out3_data[23:16] <= i_data[23:16];
if (i_ben[3]) md5_out3_data[31:24] <= i_data[31:24];
end
/* ---- 'console' ---- */
reg [7:0] console_data;
assign o_console_data = console_data;
@ -77,16 +199,42 @@ module io_reg
/* ---- Read multiplexer ---- */
reg [31:0] data_ctrl;
reg [31:0] data_data_addr;
reg [31:0] data_data_len;
reg [31:0] data_md5_out0;
reg [31:0] data_md5_out1;
reg [31:0] data_md5_out2;
reg [31:0] data_md5_out3;
reg [31:0] data_console;
assign o_data =
data_ctrl |
data_data_addr |
data_data_len |
data_md5_out0 |
data_md5_out1 |
data_md5_out2 |
data_md5_out3 |
data_console;
always @(*) begin
data_ctrl = 32'd0;
data_data_addr = 32'd0;
data_data_len = 32'd0;
data_md5_out0 = 32'd0;
data_md5_out1 = 32'd0;
data_md5_out2 = 32'd0;
data_md5_out3 = 32'd0;
data_console = 32'd0;
if (data_addr_select) begin
data_data_addr[31:0] = i_data_addr_addr;
end
if (data_len_select) begin
data_data_len[31:0] = i_data_len_len;
end
if (console_select) begin
data_console[7:0] = i_console_data;
data_console[8] = i_console_send;

228
source/md5calculator.sv Normal file

@ -0,0 +1,228 @@
`timescale 1ps/1ps
module md5calculator
(input clock,
input reset,
output done,
input [31:0] md5_data_addr,
input [31:0] md5_data_len,
output [127:0] md5);
parameter MEM_ADDR_WIDTH = 16;
parameter ROM_ADDR_WIDTH = 16;
/* verilator lint_off UNUSED */
logic cpu_mem_valid;
logic cpu_mem_instr;
logic cpu_mem_ready;
logic [31:0] cpu_mem_addr;
logic [31:0] cpu_mem_wdata;
logic [ 3:0] cpu_mem_wstrb;
logic [31:0] cpu_mem_rdata;
// Look-Ahead Interface
logic cpu_mem_la_read;
logic cpu_mem_la_write;
logic [31:0] cpu_mem_la_addr;
logic [31:0] cpu_mem_la_wdata;
logic [ 3:0] cpu_mem_la_wstrb;
/* verilator lint_on UNUSED */
// PicoRV32 // Defaults
picorv32 #(.ENABLE_COUNTERS(0), // = 1,
.ENABLE_COUNTERS64(0), // = 1,
.ENABLE_REGS_16_31(1), // = 1,
.ENABLE_REGS_DUALPORT(1), // = 1,
.LATCHED_MEM_RDATA(0), // = 0,
.TWO_STAGE_SHIFT(1), // = 1,
.BARREL_SHIFTER(0), // = 0,
.TWO_CYCLE_COMPARE(0), // = 0,
.TWO_CYCLE_ALU(0), // = 0,
.COMPRESSED_ISA(0), // = 0,
.CATCH_MISALIGN(1), // = 1,
.CATCH_ILLINSN(1), // = 1,
.ENABLE_PCPI(0), // = 0,
.ENABLE_MUL(0), // = 0,
.ENABLE_FAST_MUL(0), // = 0,
.ENABLE_DIV(0), // = 0,
.ENABLE_IRQ(0), // = 0,
.ENABLE_IRQ_QREGS(0), // = 1,
.ENABLE_IRQ_TIMER(0), // = 1,
.ENABLE_TRACE(0), // = 0,
.REGS_INIT_ZERO(0), // = 0,
.MASKED_IRQ(32'h 0000_0000), // = 32'h 0000_0000,
.LATCHED_IRQ(32'h ffff_ffff), // = 32'h ffff_ffff,
.PROGADDR_RESET(32'h 0000_0000), // = 32'h 0000_0000,
.PROGADDR_IRQ(32'h 0000_0010), // = 32'h 0000_0010,
.STACKADDR(32'h ffff_ffff)) // = 32'h ffff_ffff
picorv32
(.clk(clock),
.resetn(~reset),
.mem_valid(cpu_mem_valid), // output reg
.mem_instr(cpu_mem_instr), // output reg
.mem_ready(cpu_mem_ready), // input
.mem_addr(cpu_mem_addr), // output reg [31:0]
.mem_wdata(cpu_mem_wdata), // output reg [31:0]
.mem_wstrb(cpu_mem_wstrb), // output reg [ 3:0]
.mem_rdata(cpu_mem_rdata), // input [31:0]
// Look-Ahead Interface
.mem_la_read(cpu_mem_la_read), // output
.mem_la_write(cpu_mem_la_write), // output
.mem_la_addr(cpu_mem_la_addr), // output [31:0]
.mem_la_wdata(cpu_mem_la_wdata), // output reg [31:0]
.mem_la_wstrb(cpu_mem_la_wstrb), // output reg [ 3:0]
// Unused
/* verilator lint_off PINCONNECTEMPTY */
.pcpi_valid(), // output reg
.pcpi_insn(), // output reg [31:0]
.pcpi_rs1(), // output [31:0]
.pcpi_rs2(), // output [31:0]
.pcpi_wr(1'b0), // input
.pcpi_rd(32'd0), // input [31:0]
.pcpi_wait(1'b0), // input
.pcpi_ready(1'b0), // input
.irq(32'd0), // input [31:0]
.eoi(), // output reg [31:0]
.trap(), // output reg
.trace_valid(), // output reg
.trace_data() // output reg [35:0]
/* verilator lint_on PINCONNECTEMPTY */
);
// -- Bus multiplexer
// Slaves address ranges:
// 0 - 0x00000000-0x0000ffff
// 1 - 0x00010000-0x0001ffff
// 2 - 0x01000000-0x01000fff
// i_slave_rdata bits:
// 0: i_slave_rdata[31:0]
// 1: i_slave_rdata[63:32]
// 2: i_slave_rdata[95:64]
logic [31:0] rdata_ram;
logic [31:0] rdata_rom;
logic [31:0] rdata_reg;
logic valid_ram;
logic ready_ram;
logic valid_rom;
logic ready_rom;
logic valid_reg;
logic ready_reg;
bus_mux bus_mux
(.clock, .reset,
// CPU
.i_la_addr(cpu_mem_la_addr),
.o_rdata(cpu_mem_rdata),
.i_valid(cpu_mem_valid),
.o_ready(cpu_mem_ready),
// Slaves
.i_slave_rdata({rdata_reg, rdata_rom, rdata_ram}),
.o_slave_valid({valid_reg, valid_rom, valid_ram}),
.i_slave_ready({ready_reg, ready_rom, ready_ram}));
// -- CPU memory
picorv32_tcm #(.ADDR_WIDTH(MEM_ADDR_WIDTH),
.USE_LOOK_AHEAD(1),
.USE_ADDR_MUX(0),
.MEM_INIT_FILE("../source/firmware/fw.mem"))
main_tcm
(.clock, .reset,
/* PicoRV32 bus interface */
.mem_valid(valid_ram),
.mem_ready(ready_ram),
.mem_addr(cpu_mem_addr[MEM_ADDR_WIDTH-1:0]),
.mem_wdata(cpu_mem_wdata),
.mem_wstrb(cpu_mem_wstrb),
.mem_rdata(rdata_ram),
.mem_la_addr(cpu_mem_la_addr[MEM_ADDR_WIDTH-1:0]));
// -- DATA memory
picorv32_tcm #(.ADDR_WIDTH(ROM_ADDR_WIDTH),
.USE_LOOK_AHEAD(1),
.USE_ADDR_MUX(0))
rom
(.clock, .reset,
/* PicoRV32 bus interface */
.mem_valid(valid_rom),
.mem_ready(ready_rom),
.mem_addr(cpu_mem_addr[MEM_ADDR_WIDTH-1:0]),
.mem_wdata(cpu_mem_wdata),
.mem_wstrb(cpu_mem_wstrb),
.mem_rdata(rdata_rom),
.mem_la_addr(cpu_mem_la_addr[MEM_ADDR_WIDTH-1:0]));
// -- Registers
logic ctrl_stop;
logic [31:0] md5_out0;
logic [31:0] md5_out1;
logic [31:0] md5_out2;
logic [31:0] md5_out3;
logic [7:0] i_console_data;
logic [7:0] o_console_data;
logic console_send;
logic reg_write;
logic reg_read;
assign ready_reg = 1'b1;
assign reg_write = valid_reg & |(cpu_mem_wdata);
assign reg_read = valid_reg & &(~cpu_mem_wdata);
assign i_console_data = 8'ha5;
assign done = ctrl_stop;
assign md5 = {md5_out3, md5_out2, md5_out1, md5_out0};
io_reg io_reg
(.clock, .reset,
// CPU
.i_addr({16'd0, cpu_mem_addr[15:0]}),
.i_data(cpu_mem_wdata),
.o_data(rdata_reg),
.i_ben(cpu_mem_wstrb),
.i_write(reg_write),
.i_read(reg_read),
// Ctrl
.o_ctrl_stop(ctrl_stop),
// MD5
.i_data_addr_addr(md5_data_addr),
.i_data_len_len(md5_data_len),
.o_md5_out0_data(md5_out0),
.o_md5_out1_data(md5_out1),
.o_md5_out2_data(md5_out2),
.o_md5_out3_data(md5_out3),
// Console
.i_console_data(i_console_data),
.o_console_data(o_console_data),
.o_console_send_hsreq(console_send),
// Unused
/* verilator lint_off PINCONNECTEMPTY */
.o_console__rnotify(),
.i_console_send_hsack(1'b1),
.i_console_send(1'b0),
.i_console_valid(1'b1)
/* verilator lint_on PINCONNECTEMPTY */
);
// Print console output
initial
forever begin
@(posedge clock);
if (!reset && console_send) begin
$write("%c", o_console_data);
$fflush;
end
end
endmodule // md5calculator
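
The slave address ranges listed in the bus multiplexer comment above can be sanity-checked with a small decoder. This is only an illustrative bash sketch: `slave_for_addr` is a hypothetical helper that is not part of the repository, and it models the ranges exactly as the comment lists them, not the actual `bus_mux` logic.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): map a CPU address to the
# slave index from the "Slaves address ranges" comment.
#   0 - 0x00000000-0x0000ffff
#   1 - 0x00010000-0x0001ffff
#   2 - 0x01000000-0x01000fff
slave_for_addr() {
    local addr=$(( $1 ))   # accepts decimal or 0x-prefixed hex
    if   (( addr >= 0x00000000 && addr <= 0x0000ffff )); then echo 0
    elif (( addr >= 0x00010000 && addr <= 0x0001ffff )); then echo 1
    elif (( addr >= 0x01000000 && addr <= 0x01000fff )); then echo 2
    else echo none         # address is outside every listed range
    fi
}
```

For example, `slave_for_addr 0x00010000` selects slave 1, the range that `DATA_ADDR` in the testbench points to.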


@ -2,4 +2,5 @@
../source/io_reg.v
../source/picorv32_tcm.sv
../source/picorv32.v
../source/md5calculator.sv
../source/testbench.sv


@ -1,204 +1,57 @@
`timescale 1ps/1ps
module testbench (input clock);
parameter MEM_ADDR_WIDTH = 16;
module testbench #(parameter CPU_COUNT = 1024)
(input clock);
logic reset = 1'b1;
localparam DATA_ADDR = 32'h00010000;
localparam DATA_LEN = 1024;
/* verilator lint_off UNUSED */
logic cpu_mem_valid;
logic cpu_mem_instr;
logic cpu_mem_ready;
logic [31:0] cpu_mem_addr;
logic [31:0] cpu_mem_wdata;
logic [ 3:0] cpu_mem_wstrb;
logic [31:0] cpu_mem_rdata;
logic [31:0] data_len;
logic [CPU_COUNT-1:0] done_all;
// Look-Ahead Interface
logic cpu_mem_la_read;
logic cpu_mem_la_write;
logic [31:0] cpu_mem_la_addr;
logic [31:0] cpu_mem_la_wdata;
logic [ 3:0] cpu_mem_la_wstrb;
/* verilator lint_on UNUSED */
for (genvar ncpu = 0; ncpu < CPU_COUNT; ncpu = ncpu + 1) begin : cpus
localparam logic [31:0] MD5IN = ncpu;
// PicoRV32 // Defaults
picorv32 #(.ENABLE_COUNTERS(0), // = 1,
.ENABLE_COUNTERS64(0), // = 1,
.ENABLE_REGS_16_31(1), // = 1,
.ENABLE_REGS_DUALPORT(1), // = 1,
.LATCHED_MEM_RDATA(0), // = 0,
.TWO_STAGE_SHIFT(1), // = 1,
.BARREL_SHIFTER(0), // = 0,
.TWO_CYCLE_COMPARE(0), // = 0,
.TWO_CYCLE_ALU(0), // = 0,
.COMPRESSED_ISA(0), // = 0,
.CATCH_MISALIGN(1), // = 1,
.CATCH_ILLINSN(1), // = 1,
.ENABLE_PCPI(0), // = 0,
.ENABLE_MUL(0), // = 0,
.ENABLE_FAST_MUL(0), // = 0,
.ENABLE_DIV(0), // = 0,
.ENABLE_IRQ(0), // = 0,
.ENABLE_IRQ_QREGS(0), // = 1,
.ENABLE_IRQ_TIMER(0), // = 1,
.ENABLE_TRACE(0), // = 0,
.REGS_INIT_ZERO(0), // = 0,
.MASKED_IRQ(32'h 0000_0000), // = 32'h 0000_0000,
.LATCHED_IRQ(32'h ffff_ffff), // = 32'h ffff_ffff,
.PROGADDR_RESET(32'h 0000_0000), // = 32'h 0000_0000,
.PROGADDR_IRQ(32'h 0000_0010), // = 32'h 0000_0010,
.STACKADDR(32'h ffff_ffff)) // = 32'h ffff_ffff
picorv32
(.clk(clock),
.resetn(~reset),
logic done;
logic reset;
logic [127:0] md5;
.mem_valid(cpu_mem_valid), // output reg
.mem_instr(cpu_mem_instr), // output reg
.mem_ready(cpu_mem_ready), // input
.mem_addr(cpu_mem_addr), // output reg [31:0]
.mem_wdata(cpu_mem_wdata), // output reg [31:0]
.mem_wstrb(cpu_mem_wstrb), // output reg [ 3:0]
.mem_rdata(cpu_mem_rdata), // input [31:0]
assign done_all[ncpu] = done;
// Look-Ahead Interface
.mem_la_read(cpu_mem_la_read), // output
.mem_la_write(cpu_mem_la_write), // output
.mem_la_addr(cpu_mem_la_addr), // output [31:0]
.mem_la_wdata(cpu_mem_la_wdata), // output reg [31:0]
.mem_la_wstrb(cpu_mem_la_wstrb), // output reg [ 3:0]
md5calculator cpu
(.clock, .reset, .done,
.md5_data_addr(DATA_ADDR),
.md5_data_len(data_len),
.md5(md5));
// Unused
/* verilator lint_off PINCONNECTEMPTY */
.pcpi_valid(), // output reg
.pcpi_insn(), // output reg [31:0]
.pcpi_rs1(), // output [31:0]
.pcpi_rs2(), // output [31:0]
.pcpi_wr(1'b0), // input
.pcpi_rd(32'd0), // input [31:0]
.pcpi_wait(1'b0), // input
.pcpi_ready(1'b0), // input
.irq(32'd0), // input [31:0]
.eoi(), // output reg [31:0]
.trap(), // output reg
.trace_valid(), // output reg
.trace_data() // output reg [35:0]
/* verilator lint_on PINCONNECTEMPTY */
);
initial
for (int n = 0; n < (2 ** (cpu.rom.ADDR_WIDTH-2)); n += 1)
cpu.rom.ram[n] = ncpu;
// -- Bus multiplexer
// Slaves address ranges:
// 0 - 0x00000000-0x0000ffff
// 1 - 0x01000000-0x01000fff
initial
if(!$value$plusargs("dlen=%d", data_len))
data_len = DATA_LEN;
// i_slave_rdata bits:
// 0: i_slave_rdata[31:0]
// 1: i_slave_rdata[63:32]
initial begin
reset = 1'b1;
repeat($urandom % 5 + 2) @(posedge clock);
reset = 1'b0;
@(posedge clock);
logic [31:0] rdata_ram;
logic [31:0] rdata_reg;
logic valid_ram;
logic ready_ram;
logic valid_reg;
logic ready_reg;
bus_mux bus_mux
(.clock, .reset,
// CPU
.i_la_addr(cpu_mem_la_addr),
.o_rdata(cpu_mem_rdata),
.i_valid(cpu_mem_valid),
.o_ready(cpu_mem_ready),
// Slaves
.i_slave_rdata({rdata_reg, rdata_ram}),
.o_slave_valid({valid_reg, valid_ram}),
.i_slave_ready({ready_reg, ready_ram}));
// -- CPU memory
picorv32_tcm #(.ADDR_WIDTH(MEM_ADDR_WIDTH),
.USE_LOOK_AHEAD(1),
.USE_ADDR_MUX(0),
.MEM_INIT_FILE("../source/firmware/fw.mem"))
picorv32_tcm
(.clock, .reset,
/* PicoRV32 bus interface */
.mem_valid(valid_ram),
.mem_ready(ready_ram),
.mem_addr(cpu_mem_addr[MEM_ADDR_WIDTH-1:0]),
.mem_wdata(cpu_mem_wdata),
.mem_wstrb(cpu_mem_wstrb),
.mem_rdata(rdata_ram),
.mem_la_addr(cpu_mem_la_addr[MEM_ADDR_WIDTH-1:0]));
// -- Registers
// Reg 'ctrl'
logic ctrl_stop;
// Reg 'console'
logic [7:0] i_console_data;
logic [7:0] o_console_data;
logic console_send;
logic reg_write;
logic reg_read;
assign ready_reg = 1'b1;
assign reg_write = valid_reg & |(cpu_mem_wdata);
assign reg_read = valid_reg & &(~cpu_mem_wdata);
assign i_console_data = 8'ha5;
io_reg io_reg
(.clock, .reset,
// CPU
.i_addr({16'd0, cpu_mem_addr[15:0]}),
.i_data(cpu_mem_wdata),
.o_data(rdata_reg),
.i_ben(cpu_mem_wstrb),
.i_write(reg_write),
.i_read(reg_read),
// Reg 'ctrl'
.o_ctrl_stop(ctrl_stop),
// Reg 'console'
.i_console_data(i_console_data),
.o_console_data(o_console_data),
.o_console_send_hsreq(console_send),
// Unused
/* verilator lint_off PINCONNECTEMPTY */
.o_console__rnotify(),
.i_console_send_hsack(1'b1),
.i_console_send(1'b0),
.i_console_valid(1'b1)
/* verilator lint_on PINCONNECTEMPTY */
);
// Reset
localparam RESET_DURATION = 5;
initial begin
repeat(RESET_DURATION) @(posedge clock);
reset = 1'b0;
while(!done) @(posedge clock);
$display("MD5(0x%x) = %x", MD5IN, md5);
end
end
// Print console output
initial
forever begin
@(posedge clock);
if (!reset && console_send) begin
$write("%c", o_console_data);
$fflush;
end
end
// Wait for complete
initial begin
while (reset || ctrl_stop == 1'b0) @(posedge clock);
$display("--- BENCH BEGIN ---");
repeat(5) @(posedge clock);
while ((&done_all) == 1'b0) @(posedge clock);
@(posedge clock);
$display("--- BENCH DONE ---");
$finish;
end
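
Each generated CPU has its data memory filled with its own index (`cpu.rom.ram[n] = ncpu`), so the digest printed for a given `MD5IN` could in principle be cross-checked on the host. The bash sketch below only rebuilds the input block. `block_bytes` is a hypothetical helper, and it assumes that the firmware hashes the first `data_len` bytes starting at `DATA_ADDR` and that the 32-bit words are laid out least significant byte first. Neither assumption is confirmed by the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): emit LEN raw bytes made of
# the 32-bit word NCPU repeated, least significant byte first (assumed).
block_bytes() {
    local ncpu=$(( $1 )) len=$2 i word=""
    # Build a printf format such as \x04\x03\x02\x01 for one word.
    for i in 0 8 16 24; do
        word+=$(printf '\\x%02x' $(( (ncpu >> i) & 0xff )))
    done
    # Repeat the word, then trim to exactly LEN bytes.
    for (( i = 0; i < len; i += 4 )); do
        printf "$word"
    done | head -c "$len"
}
```

Piping the result into a host MD5 tool, for instance `block_bytes 5 1024 | md5sum`, gives a reference digest. The byte and word order of the value printed by `$display` may differ from the host tool's hex string, so a mismatch is not by itself proof of an error.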


@ -1,3 +1,5 @@
#!/usr/bin/env bash
iverilog -g2012 -o top -f ../source/sources.f top.sv
. ../scripts/sim_vars.sh
iverilog -g2012 -o top -Ptop.CPU_COUNT=$CPU_COUNT -f $FFILE top.sv
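
The build and run scripts in this change source `../scripts/sim_vars.sh`, which is not shown in this part of the diff. The sketch below is a guess at what such a file might contain. The variable names come from the scripts that use them; the defaults are assumptions that mirror the `CPU_COUNT` parameter default, `DATA_LEN`, the previously hard-coded sources file path, and the `THREADS` default in the Verilator Makefile.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ../scripts/sim_vars.sh (the real file is not shown).
# Each variable keeps a value already present in the environment,
# otherwise it takes the assumed default.
: "${CPU_COUNT:=1024}"             # number of softcores to elaborate
: "${BLOCK_SIZE:=1024}"            # bytes per MD5 block, passed as +dlen=
: "${FFILE:=../source/sources.f}"  # file list for the simulator
: "${THREADS:=1}"                  # Verilator --threads value
export CPU_COUNT BLOCK_SIZE FFILE THREADS
```

With defaults written this way, a call such as `CPU_COUNT=16 BLOCK_SIZE=64 ./__build.sh` would override the values for a single run, provided the real script honors the environment in the same manner.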


@ -1,3 +1,5 @@
#!/usr/bin/env bash
vvp -n ./top
. ../scripts/sim_vars.sh
vvp -N ./top +dlen=$BLOCK_SIZE
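
The switch from `-n` to `-N` matters to a calling script. As far as the vvp manual describes it, `-N` behaves like `-n` but makes vvp exit with code 1 when the run ends through `$stop` or Ctrl-C. A wrapper can then tell an interrupted run from a finished one. The sketch below shows the idea; `run_sim` is a hypothetical name and not the repository's `run.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not in the repository): run a simulator command
# and report whether it finished cleanly, based on its exit code.
run_sim() {
    if "$@"; then
        echo "ok"
    else
        # $? still holds the exit code of the command tested by `if`.
        echo "failed (exit code $?)"
    fi
}
```

A call such as `run_sim vvp -N ./top +dlen=1024` would print `ok` only when the simulation reaches `$finish`.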


@ -1,7 +1,7 @@
`timescale 1ps/1ps
module top;
module top #(parameter CPU_COUNT = 1024);
logic clock = 1'b0;
initial forever #(10ns/2) clock = ~clock;
testbench testbench (clock);
testbench #(CPU_COUNT) testbench (clock);
endmodule


@ -1,5 +1,7 @@
#!/usr/bin/env bash
set -e
. ../scripts/sim_vars.sh
rm -rf testbench
vlog -sv -work testbench -vopt -f ../source/sources.f top.sv
vlog -sv -work testbench -vopt $param -f $FFILE top.sv


@ -1,3 +1,5 @@
#!/usr/bin/env bash
vsim -c -batch -voptargs=+acc=npr -do "run -all" -quiet -lib testbench top
. ../scripts/sim_vars.sh
vsim -batch -voptargs=+acc=npr -do "run -all" -quiet +dlen=$BLOCK_SIZE -GCPU_COUNT=$CPU_COUNT -lib testbench top


@ -1,7 +1,7 @@
`timescale 1ps/1ps
module top;
module top #(parameter CPU_COUNT = 1024);
logic clock = 1'b0;
initial forever #(10ns/2) clock = ~clock;
testbench testbench (clock);
testbench #(CPU_COUNT) testbench (clock);
endmodule


@ -3,10 +3,12 @@ TOP_MODULE = testbench
SOURCES = top.cpp clock_generator.cpp
FLAGS_FILE = ../source/sources.f
INCLUDES =
PARAMS :=
THREADS := 1
FLAGS = -Wno-WIDTH -cc --top-module $(TOP_MODULE) +1800-2017ext+sv \
--timing --Mdir $(TOP_MODULE) -o $(TOP_MODULE) -f $(FLAGS_FILE) \
--timescale "1ps/1ps" --threads 1
$(PARAMS) --timescale "1ps/1ps" --threads $(THREADS) -j 0
# FLAGS += --trace


@ -1,5 +1,7 @@
#!/usr/bin/env bash
set -e
. ../scripts/sim_vars.sh
make clean
make
make OPT_FAST="-Os -march=native" VM_PARALLEL_BUILDS=0 PARAMS="-GCPU_COUNT=$CPU_COUNT" THREADS=$THREADS


@ -1,3 +1,5 @@
#!/usr/bin/env bash
./testbench/testbench
. ../scripts/sim_vars.sh
./testbench/testbench +dlen=$BLOCK_SIZE


@ -0,0 +1 @@
((verilog-mode . ((flycheck-verilator-include-path . ("../source")))))


@ -1,7 +1,11 @@
#!/usr/bin/env bash
set -e
. ../scripts/sim_vars.sh
rm -rf xcelium.d
xmvlog -sv -f ../source/sources.f top.sv
xmelab -timescale 1ps/1ps top
## WARNING: defparam is not tested
xmvlog -sv -f $FFILE top.sv
xmelab -timescale 1ps/1ps -defparam top.CPU_COUNT=$CPU_COUNT top


@ -1,3 +1,5 @@
#!/usr/bin/env bash
xmsim -status top
. ../scripts/sim_vars.sh
xmsim -status top +dlen=$BLOCK_SIZE


@ -1,7 +1,7 @@
`timescale 1ps/1ps
module top;
module top #(parameter CPU_COUNT = 1024);
logic clock = 1'b0;
initial forever #(10ns/2) clock = ~clock;
testbench testbench (clock);
testbench #(CPU_COUNT) testbench (clock);
endmodule


@ -1,4 +1,4 @@
webtalk*
xsim.*
xsim*
xelab.*
xvlog.*


@ -1,7 +1,8 @@
#!/usr/bin/env bash
set -e
FFILE=../source/sources.f
. ../scripts/sim_vars.sh
SOURCES=$(cat $FFILE | sed -ze 's/\n/ /g')
rm -rf xsim.dir
@ -10,4 +11,4 @@ rm -rf xvlog.* xelab.* xsim.*
rm -rf top.wdb
xvlog -work work --sv top.sv $SOURCES
xelab --O3 -L work top
xelab --O3 --generic_top "CPU_COUNT=$CPU_COUNT" -L work top


@ -1,4 +1,5 @@
#!/usr/bin/env bash
#vsim -c -batch -voptargs=+acc=npr -do "run -all" -quiet -lib testbench top
xsim top --runall
. ../scripts/sim_vars.sh
xsim top -testplusarg dlen=$BLOCK_SIZE --runall


@ -1,7 +1,7 @@
`timescale 1ps/1ps
module top;
module top #(parameter CPU_COUNT = 1024);
logic clock = 1'b0;
initial forever #(10ns/2) clock = ~clock;
testbench testbench (clock);
testbench #(CPU_COUNT) testbench (clock);
endmodule