Taiwan Computational Quantum Matter Software Foundry

  • Jifeng Yu and Ying-Jer Kao, Spin-1/2 J1-J2 Heisenberg antiferromagnet on a square lattice: A plaquette renormalized tensor network study, Phys. Rev. B 85, 094407 (2012).
  • J. F. Yu, S. C. Hsiao, Y.-J. Kao, GPU accelerated tensor contractions in the plaquette renormalization scheme, Comput. Fluids 45, 55
  • Ti-Yen Lan, Yun-Da Hsieh, Ying-Jer Kao, High-precision Monte Carlo study of the three-dimensional XY model on GPU, arXiv:1211.0780.


Development of Quantitative System for Risk Analysis in Finance

  • C.H. Han. Instantaneous Volatility Estimation by Fourier Transform Methods. To appear in Handbook of Financial Econometrics and Statistics (C.F. Lee, ed.), Springer-Verlag, New York. 2013.
  • C.H. Han. Importance Sampling Estimation of Joint Default Probability under Structural-Form Models with Stochastic Correlation. Monte Carlo and Quasi-Monte Carlo Methods. Editors Leszek Plaskota and Henryk Woźniakowski. Springer, 2012.


Solving Large-scale Numerical Problems on GPU

  • Chenhan D. Yu, Weichung Wang*, and Dan'l Pierce. (2011) A CPU-GPU Hybrid Approach for the Unsymmetric Multifrontal Method. Parallel Computing, 37:759-770.
  • Chenhan D. Yu and Weichung Wang. (Preprint) “Performance Models and Workload Distribution Algorithms for Optimizing a Hybrid CPU-GPU Multifrontal Solver.”
  • Yaohung M. Tsai, Ray-Bing Chen, and Weichung Wang (2012). “Tuning Block Size for QR Factorization on CPU-GPU Hybrid Systems.” Special Session: Auto-Tuning for Multicore and GPU (ATMG) in Conjunction with the IEEE 6th International Symposium on Embedded Multicore SoCs, Aizu-Wakamatsu, Japan.
  • Yukai Hung and Weichung Wang* (2012). Accelerating Parallel Particle Swarm Optimization via GPU. Optimization Methods and Software. 27(1):33-51.
  • Ray-Bing Chen, Dai-Ni Hsieh, Ying Hung, and Weichung Wang* (2013, Accepted). Optimizing Latin Hypercube Designs by Particle Swarm. Statistics and Computing.
  • Ray-Bing Chen, Yen-Wen Hsu, Ying Hung, and Weichung Wang. (Preprint) “Central Composite Discrepancy-Based Uniform Designs for Irregular Experimental Regions.”
  • Cheng-Ying Chou, Yi-Yan Chuo, Yukai Hung, and Weichung Wang*. (2011) A Fast Forward Projection Using Multithreads for Multirays on GPUs in Medical Image Reconstruction. Medical Physics, 38(7):4052-4065.
  • Cheng-Ying Chou, Yun Dong, Yukai Hung, Yu-Jiun Kao, Weichung Wang*, Chien-Min Kao, and Chin-Tu Chen. (2012). Accelerating Image Reconstruction in Dual-Head PET System by GPU and Symmetry Properties. PLOS ONE 7(12): e50540.
  • Quey-Liang Kao and Che-Rung Lee (2012, Dec). Design Fast Matrix Algorithms on High-Performance Cloud Platforms. IEEE CloudCom 2012


A Mixed OpenMP/MPI Programming Framework for Hybrid CPU/GPU Cluster Computing

  • Tyng-Yeu Liang, Fu-Chun Lu, Jun-Yao Chiu, “A Hybrid Resource Reservation Method for Workflows in Clouds”, International Journal of Grid and High Performance Computing (IJGHPC), volume 4, issue 4, pp. 1-21, December, 2012.
  • Tyng-Yeu Liang, Yu-Wei Chang, Hung-Fu Li, “A CUDA Programming Toolkit on Grids”, International Journal of Grid and Utility Computing, vol. 3, no. 2, pp. 97-111, June, 2012.
  • Tyng-Yeu Liang, Hung-Fu Li and Jun-Yao Chiu, “Enabling Mixed OpenMP/MPI Programming on Hybrid CPU/GPU Computing Architecture”, 2012 Multicore and GPU Programming Models, Languages and Compilers Workshop, collocated with 26th IEEE International Parallel & Distributed Processing Symposium, pp.2369-2377, Shanghai, China, May 21-25, 2012.
  • Che-Lun Hung, Chun-Yuan Lin, Hsiao-hsi Wang, Chin-Yuan Chang, Efficient Packet Pattern Matching for Gigabit Network Intrusion Detection using GPUs, accepted by The 2nd International Workshop on Embedded Multi-Core Computing and Applications (in conjunction with IEEE ICESS 2012), 2012. (EI)


Accelerating Pattern Matching Using a Novel Parallel Algorithm on GPUs

  • Cheng-Hung Lin, Chen-Hsiung Liu, Lung-Sheng Chien, and Shih-Chieh Chang, "Accelerating Pattern Matching Using a Novel Parallel Algorithm on GPUs," accepted to be published in IEEE Transactions on Computers. (SCI)
  • Cheng-Hung Lin, Chen-Hsiung Liu, Shih-Chieh Chang, and Wing-Kai Hon, "Memory-Efficient Pattern Matching Architectures Using Perfect Hashing on Graphic Processing Units," in Proc. of the 31st Annual IEEE International Conference on Computer Communications (INFOCOM 2012), Orlando, Florida, USA, March 25-30, 2012. (Top conference, acceptance rate: 18%, 278/1547)
  • Cheng-Hung Lin, Chen-Hsiung Liu, Lung-Sheng Chien, Shih-Chieh Chang, and Wing-Kai Hon, "PFAC Library: GPU-based string matching algorithm", accepted by GPU Technology Conference (GTC 2012), San Jose, California, May 14-17, 2012.
  • Cheng-Hung Lin and Jyh-Charn Liu, "M-DFA (multithreaded DFA): An Algorithm for Reduction of State Transitions and Acceleration of REGEXP Matching" in Proc. of ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS 2012), Austin, Texas, USA, Oct. 29-30, 2012.
  • Cheng-Hung Lin, Chen-Hsiung Liu, and Shih-Chieh Chang, "Accelerating Regular Expression Matching Using Hierarchical Parallel Machines on GPU", in Proc. of IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM 2011), Houston, Texas, USA, December 5-9, pp.1706-1710, 2011.
  • Cheng-Hung Lin, Sheng-Yu Tsai, Chen-Hsiung Liu, Shih-Chieh Chang, and Jyuo-Min Shyu, "Accelerating String Matching Using Multi-threaded Algorithm on GPU," in Proc. of IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM 2010), Miami, Florida, USA, December 6-10, 2010. (Google citations: 13)


  • Chun-Yuan Lin (*corresponding author) and Yu-Shiang Lin, Efficient Parallel Algorithm for Multiple Sequence Alignments with Regular Expression Constrains on Graphics Processing Units, to appear in International Journal of Computational Science and Engineering, 2012. (EI)
  • Chun-Yuan Lin, Sheng-Ta Li, and Che Lun Hung, Frequency-based RE-Sequencing tool for short reads on Graphics Processing Units, to appear in International Journal of Computational Science and Engineering, 2012. (EI)
  • Sheng-Ta Lee, Chun-Yuan Lin (*corresponding author), Che Lun Hung, Hsuan Ying Huang, Using Frequency Distance Filteration for Reducing Database Search Workload on GPU-Based Cloud Service, accepted by The 2012 International Workshop on Cloud Computing for Bioinformatics and Its Applications (in conjunction with IEEE CloudCom 2012). (EI)
  • Yu-Shiang Lin, Chun-Yuan Lin (*corresponding author), and Yeh-Ching Chung, GPU-Based Cloud Service for Multiple Sequence Alignments with Regular Expression Constrains, accepted by The 2012 International Workshop on Cloud Computing for Bioinformatics and Its Applications (in conjunction with IEEE CloudCom
  • Yu-Rong Chen, Che Lun Hung, Yu-Shiang Lin, Chun-Yuan Lin (*corresponding author), Tien-Lin Lee, Kual-Zheng Lee, Parallel UPGMA Algorithm on Graphics Processing Units Using CUDA, accepted by The Third International Workshop on Frontier of GPU Computing (in conjunction with IEEE HPCC 2012), 2012. (EI)
  • Yu-Shiang Lin, Chun-Yuan Lin (*corresponding author), and Der-Chyuan Lou, Efficient Parallel RSA Decryption Algorithm for Many-core GPUs with CUDA, accepted by 2012 International Conference on Telecommunication Systems Management
  • Kuan-Ju Lin, Yi-Hsuan Huang, and Chun-Yuan Lin (*corresponding author), Efficient Parallel Knuth-Morris-Pratt Algorithm for Multi-GPUs with CUDA, accepted by Workshop on Parallel, Peer-to-peer, Distributed, and Cloud Computing, International Computer Symposium 2012.
  • Chun Yuan Lin (*corresponding author), Jen-Cheng Huang, and Sheng-Ta Li, Accelerating Smith-Waterman Algorithm Using Frequency Distance Filtration on Graphics Processing Units, accepted by The 17th Mobile Computing Workshop, 2012.
  • Chun Yuan Lin, Sheng-Ta Li, Che-Lun Hung, Chuan Yi Tang, and Yaw-Ling Lin, CUDA-FRESCO: Frequency-based RE-Sequencing tool based on CO-clustering segmentation by GPU, 2011 IEEE 13th International Conference on High Performance Computing and Communications, 2011, pp. 857-862. (EI)
  • Chun-Yuan Lin, Yu-Shiang Lin, Jiayi Zhou, Chuan Yi Tang, GPU-REMuSiC: Efficient Constrained Multiple Sequence Alignment Algorithm on Graphics Processing Units, Proceedings of the 16th Workshop on Compiler Techniques for High-Performance and Embedded Computing, 2011.
  • Sheng-Ta Li, Chun-Yuan Lin (*corresponding author), Yu-Shiang Lin, Joy Lee and Chuan Yi Tang, CUDA-FRESCO: an efficient algorithm for mapping short reads on Graphics Processing Units with CUDA, Proceedings of the GPU Technology Conference 2010, p. 75.

Music Information Retrieval

  • Chung-Che Wang, Chieh-Hsing Chen, Chin-Yang Kuo, Li-Ting Chiu, and Jyh-Shing Roger Jang, "Accelerating Query by Singing/Humming on GPU: Optimization for Web Deployment", The 36th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan, March 2012.

Computer Graphics

  • Min Shih, Yung-Feng Chiu, Ying-Chieh Chen, and Chun-Fa Chang. Real-Time Ray Tracing with CUDA. International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP) 2009. (EI)

HPC and GPU Performance Optimization

  • Lung-Sheng Chien, “Hand-Tuned SGEMM on GT200 GPU”,
  • Che-Rung Lee, Shih-Hsiang Lo, Nan-Hsi Chen, Yeh-Ching Chung, I-Hsin Chung (2012, May). GPU Performance Enhancement via Communication Cost Reduction: Case Studies of Radix Sort and WSN Relay Node Placement Problem. IEEE/ACM CCGRID 2012, Ottawa, Canada.
  • Che-Rung Lee, Zhi-Hung Chen, Quey-Liang Kao (2012, May). Parallelizing the Hamiltonian Computation in DQMC Simulations: Checkerboard Method for Sparse Matrix Exponentials on Multicore and GPU. IEEE IPDPSW 2012, Shanghai, China.
  • Shih-Hsiang Lo, Che-Rung Lee, Yeh-Ching Chung, Optimizing Pairwise Box Intersection Checking on GPUs for Large-Scale Simulations, accepted by ACM Transactions on Modeling and Computer Simulation (TOMACS)
  • Shih-Hsiang Lo; Che-Rung Lee; Quey-Liang Kao; I-Hsin Chung; Yeh-Ching Chung, Improving GPU Memory Performance with Artificial Barrier Synchronization, submitted to IEEE TPDS, under revision.
  • Che-Rung Lee, Shih-Hsiang Lo, Nan-Hsi Chen, Quey-Liang Kao, Yeh-Ching Chung, I-Hsin Chung, (Preprint) Data Streaming and Data Compression for GPU Performance Enhancement: Communication Cost Reduction and Beyond, submitted to International Journal of Parallel Programming.
  • Chun-Yuan Lin (*corresponding author), Wei Sheng Lee, and Chuan Yi Tang, Parallel Shellsort Algorithm for Many-core GPUs with CUDA, to appear in International Journal of Grid and High Performance Computing, 2012. (EI)
  • Chi-Cheng Chuang, Yu-Sheng Chiu, Quey-Liang Kao, Zhi-Hung Chen and Che-Rung Lee (2012, Dec). Accelerating Block Checkerboard Method on GPU for Performance Enhancement of 2D and 3D Quantum Monte Carlo Simulations. IEEE CloudCom 2012, Taipei, Taiwan.