The command is as below.

arm-elf-gcc -v -save-temps -O2 -ffreestanding -c -msoft-float -nostdinc test.c
Using built-in specs.
Target: arm-elf
Configured with: ./configure --target=arm-elf --prefix=/rdisk1/users/wangqiang/tool-chain --enable-interwork --enable-multilib --with-float=soft --enable-languages=c,c++ --with-newlib --with-headers=/rdisk1/users/wangqiang/software/src/newlib-1.14.0/newlib/libc/include
Thread model: single
gcc version 4.1.1
 /rdisk5/xgsoc/ExDB/ARM/ARM_GCC/tool-chain/bin/../libexec/gcc/arm-elf/4.1.1/cc1 -E -quiet -nostdinc -v -iprefix /rdisk5/xgsoc/ExDB/ARM/ARM_GCC/tool-chain/bin/../lib/gcc/arm-elf/4.1.1/ -D__USES_INITFINI__ test.c -msoft-float -ffreestanding -O2 -fpch-preprocess -o test.i
#include "..." search starts here:
#include <...> search starts here:
End of search list.
 /rdisk5/xgsoc/ExDB/ARM/ARM_GCC/tool-chain/bin/../libexec/gcc/arm-elf/4.1.1/cc1 -fpreprocessed test.i -quiet -dumpbase test.c -msoft-float -auxbase test -O2 -version -ffreestanding -o test.s
GNU C version 4.1.1 (arm-elf)
	compiled by GNU C version 2.96 20000731 (Red Hat Linux 7.3 2.96-113).
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 3cdb4d57ef011d2fa661601e3cf86274
 /rdisk5/xgsoc/ExDB/ARM/ARM_GCC/tool-chain/bin/../lib/gcc/arm-elf/4.1.1/../../../../arm-elf/bin/as -mfloat-abi=soft -o test.o test.s

The source code is as below.

test.c

#define burst_copy(dst,src,len) {\
    __asm__ __volatile__ ( \
        "1: \n\t" \
        "ldmia %1!,{r3-r6} \n\t" \
        "stmia %0!,{r3-r6} \n\t" \
        "subs %2, %2, #1 \n\t" \
        "bne 1b \n\t" \
        ::"r"(dst),"r"(src),"r"(len) \
        :"r3","r4","r5","r6"); \
}

int main()
{
    burst_copy(0xFFFF0000,0xC0000000,0x8);
    burst_copy(0xFFFF2000,0xFFFF0000,0x8);
}

Problem: the destination of the first burst_copy is the source of the second. However, the second burst_copy uses the register value left behind by the first (0xFFFF0080) as its source, and not the address (0xFFFF0000) I expected.
Created attachment 15341: C source code
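The asm updates the source and destination registers (and the length counter, via subs), but you declare all three only as inputs. That's invalid. Because GCC is told the registers are unchanged, it assumes the register that held 0xFFFF0000 before the first asm still holds it afterwards, and reuses that register as the source address of the second asm.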
I modified the source as below, but it still does not work. I am not sure whether this is a bug, or whether GCC simply cannot handle this situation.

// copy 128 bits at a time
#define burst_copy(dst,src,len) {\
    __asm__ __volatile__ ( \
        "1: \n\t" \
        "ldmia %1!,{r3-r6} \n\t" \
        "stmia %0!,{r3-r6} \n\t" \
        "subs %2, %2, #1 \n\t" \
        "bne 1b \n\t" \
        ::"r"(dst),"r"(src),"r"(len) \
        :"%0", "%1" , "%2","r3","r4","r5","r6" ); \
}
Not a bug. You need to write your macro like this:

#define burst_copy(dst,src,len) {\
    unsigned t1, t2, t3; \
    __asm__ __volatile__ ( \
        "1: \n\t" \
        "ldmia %1!,{r3-r6} \n\t" \
        "stmia %0!,{r3-r6} \n\t" \
        "subs %2, %2, #1 \n\t" \
        "bne 1b \n\t" \
        :"=r"(t1),"=r"(t2),"=r"(t3) \
        :"0"(dst),"1"(src),"2"(len) \
        :"r3","r4","r5","r6", "memory"); \
}

Note that the results are never used, but this informs the compiler that the input values have been destroyed by the operation. Also note the clobber of "memory" to indicate that values in memory have been updated by the operation.

It might be better to use an inline function for this rather than a macro; then you can use the input operands as your output operands and don't need to declare the temporaries. It would also give better error checking in some circumstances.
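A minimal sketch of the inline-function variant suggested above, not a definitive implementation. Several things here are assumptions that do not appear in the thread: the function signature, the use of "+r" read-write constraints to make each input operand double as an output, the extra "cc" clobber (added because subs changes the condition flags), the assert on the block count, and the memcpy fallback, which is there only so the sketch also builds on non-ARM hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy `blocks` 16-byte blocks from src to dst (hypothetical signature).
 * The regions are assumed not to overlap. */
static inline void burst_copy(void *dst, const void *src, unsigned blocks)
{
    /* The loop decrements before it tests, so a count of 0 would wrap. */
    assert(blocks > 0);

#if defined(__arm__) && !defined(__thumb__)
    /* "+r" marks each operand as read AND written, so GCC knows the
     * registers no longer hold the original addresses or count after
     * the asm. The parameters are local copies, so the caller's values
     * are unaffected. */
    __asm__ __volatile__ (
        "1:                  \n\t"
        "ldmia %1!, {r3-r6}  \n\t"
        "stmia %0!, {r3-r6}  \n\t"
        "subs  %2, %2, #1    \n\t"
        "bne   1b            \n\t"
        : "+r"(dst), "+r"(src), "+r"(blocks)
        :
        : "r3", "r4", "r5", "r6", "cc", "memory");
#else
    /* Assumption: portable fallback for hosts without ARM-mode asm. */
    memcpy(dst, src, (size_t)blocks * 16u);
#endif
}
```

With this form, the two calls from the original report would be written as burst_copy((void *)0xFFFF0000, (const void *)0xC0000000, 8) followed by burst_copy((void *)0xFFFF2000, (const void *)0xFFFF0000, 8). Since the pointers are declared as modified, the compiler has to reload 0xFFFF0000 for the second call rather than reuse the advanced register.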