# lens

**Repository Path**: lsjr/lens

## Basic Information

- **Project Name**: lens
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-09-18
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

/*************************
 * Project: Video Compression based Anomaly Detection
 * Author: Yi-Chao Chen @ UT Austin
 *************************/

/*************************
 * Dataset
 *************************/

1. 4sq/
   - FourSquare dataset collected by Gene. The files include the check-ins of each city; we can retrieve info from these check-ins.
   - Format:
     1. In "Airport", file name: 4SQ_VENUE_DETAILS_Airport.gz
        VENUE_DATA
        - venues -| stat | tags | ts | tips | checkins
        - users - ['last', 'gender', 'userid', 'ts', 'home', 'first'] | mayor
     2. In other cities, file name: 4SQ_VENUE_TRENDS_.gz
        -| 'current_lat' | 'current_lng' | 'VENUE_INDEX'
        | 'VENUE_INFO'
          - venues - ['city', 'addr', 'zip', 'country', 'cate_name', 'hereNow', 'usersCount', 'state', 'contact', 'cate_id', 'ts', 'checkinsCount', 'lat', 'lng', 'id', 'name']
        | 'VENUE_DETAIL'
          - venues - ['stat', 'tags', 'ts', 'tips', 'checkins', 'mayor']

2. 4sq/city_info/4SQ__INFO
   - The detailed information of venues in the above dataset. Except "Airport", the files are generated by "subtask_process_4sq/generate_city_info.py".
   - Format:
     1. In "Airport", there are several possible formats:
        a) venues - ['grp_type', 'city', 'addr', 'checkinsCount', 'cate_name', 'ts', 'hereNow', 'state', 'contact', 'cate_id', 'lat', 'grp_name', 'lng', 'usersCount', 'id', 'name']
        b) venues - ['grp_type', 'name', 'addr', 'checkinsCount', 'cate_name', 'ts', 'hereNow', 'state', 'contact', 'cate_id', 'lat', 'grp_name', 'lng', 'usersCount', 'id', 'name']
     2.
        In other cities, our output format is:
        a) venues - ['lat', 'lng', 'id', 'name', 'checkinsCount']

3. video
   Get video samples from:
   - http://trace.eas.asu.edu/yuv/
   - http://media.xiph.org/video/derf/
   Samples:
   - stefan_cif.yuv      CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 90 frames
   - bus_cif.yuv         CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 150 frames
   - foreman_cif.yuv     CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 300 frames
   - coastguard_cif.yuv  CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 300 frames
   - highway_cif.yuv     CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 2000 frames
   The video files are large, so I put them on the valleyview local disk:
   /var/local/yichao/anomaly_compression/data/video/

4. huawei_cellular/BS_gps_hourly_traffic.txt
   TM sample produced from the 3G dataset.
   - rows: 3075; each row represents the traffic time series of one Base Station.
   - columns: 26; the first two are GPS values, the next 24 represent one-hour traffic volumes (in bytes).

5. Traffic Matrix
   - MAWI
     1. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top100.txt.86400.
        25 frames, 93 * 91
     2. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top150.txt.86400.
        25 frames, 138 * 138
     3. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top200.txt.86400.
        25 frames, 180 * 180
     4. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top500.txt.86400.
        25 frames, 93 * 91
   - 4SQ
   - SJTU WiFi
     1. processed_data/subtask_parse_sjtu_wifi/tm/tm_upload.sort_ips.ap.country.txt.3600.top400.
        19 frames, 250 * 193
        processed_data/subtask_parse_sjtu_wifi/tm/tm_download.sort_ips.ap.country.txt.3600.top400.
        19 frames, 193 * 250
     2. processed_data/subtask_parse_sjtu_wifi/tm/tm_upload.sort_ips.ap.bgp.sub_CN.txt.3600.top400.
        19 frames, 250 * 400
        processed_data/subtask_parse_sjtu_wifi/tm/tm_download.sort_ips.ap.bgp.sub_CN.txt.3600.top400.
        19 frames, 400 * 250
     3. processed_data/subtask_parse_sjtu_wifi/tm/tm_upload.sort_ips.ap.gps.5.txt.3600.top400.
        19 frames, 250 * 400
        processed_data/subtask_parse_sjtu_wifi/tm/tm_download.sort_ips.ap.gps.5.txt.3600.top400.
        19 frames, 400 * 250
     4. Group by the NUM top loaded APs:
        processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi.ap_load.all.bin600.top50.txt    114 (time) * 50 (APs)
        processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi.ap_load.dl.bin600.top50.txt     114 (time) * 50 (APs)
        processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi.ap_load.ul.bin600.top50.txt     114 (time) * 50 (APs)
        processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi2.ap_load.all.bin600.top100.txt  287 (time) * 100 (APs)
        processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi2.ap_load.dl.bin600.top100.txt   287 (time) * 100 (APs)
        processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi2.ap_load.ul.bin600.top100.txt   287 (time) * 100 (APs)
   - Huawei 3G
     1. Group by lat,lng of BS:
        processed_data/subtask_parse_huawei_3g/region_tm/tm_3g_region_all.res0.006.bin10.sub.
        146 frames, 26 * 21
     2. Group by BS at different areas:
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs0.all.bin10.txt   BS type: unknown             1074 * 145
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs1.all.bin10.txt   BS type: general urban area   458 * 145
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs2.all.bin10.txt   BS type: general urban area    48 * 145
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs3.all.bin10.txt   BS type: general urban area   472 * 145
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs3.all.bin60.txt   BS type: general urban area   472 * 24
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs4.all.bin10.txt   BS type: general urban area    24 * 145
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs5.all.bin10.txt   BS type: general urban area     1 * 145
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs6.all.bin10.txt   BS type: general urban area   240 * 145
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs7.all.bin10.txt   BS type: general urban area    14
* 145
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs8.all.bin10.txt   BS type: general urban area    19 * 145
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs9.all.bin10.txt   BS type: general urban area    24 * 145
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs10.all.bin10.txt  BS type: general urban area    82 * 145
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs11.all.bin10.txt  BS type: general urban area    13 * 145
     3. Group by BS (all BSs):
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.all.all.bin10.txt
        2469 * 145
     4. Group by BS (all BSs) and choose the top loaded BSs:
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.load.top200.all.bin10.txt
        200 * 145
     5. Group by BS (all BSs) and choose the most stable BSs:
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.stable.top200.all.bin10.txt
        200 * 145
     6. Group by RNC:
        processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.rnc.all.bin10.txt
        13 * 145
   - GEANT (Totem)
     1. processed_data/subtask_parse_totem/tm/tm_totem.
        10772 frames, 23 * 23, time bin = 15 minutes
   - Abilene
     1. data/abilene/X
        1008 (time) * 121 (OD pairs), time bin = 10 minutes
     2. processed_data/subtask_parse_abilene/tm/tm_abilene.od.
        Same as above, but in a 3D version: 1008 frames, 11 * 11, time bin = 10 minutes
   - CSI
     1. Static:
        /v/filer4b/v27q002/ut-wireless/swati/processed_traces/MonitorExp1/128.83.158.127_file.dat0_matrix.mat.txt
        9850 * 90
        /v/filer4b/v27q002/ut-wireless/swati/processed_traces/MonitorExp1/128.83.158.50_file.dat0_matrix.mat.txt
        9706 * 90
     2. Mobile:
        data/csi/mobile/Mob-Recv1run1.dat0_matrix.mat_dB.txt
        10000 * 90
        data/csi/mobile/Mob-Recv1run1.dat1_matrix.mat_dB.txt
        10000 * 90
   - Sensor
     1.
        IntelLab
        processed_data/subtask_parse_sensor/tm/tm_sensor.temp.bin600.txt
        processed_data/subtask_parse_sensor/tm/tm_sensor.humidity.bin600.txt
        processed_data/subtask_parse_sensor/tm/tm_sensor.light.bin600.txt
        processed_data/subtask_parse_sensor/tm/tm_sensor.voltage.bin600.txt
        4943 * 54
   - RON
     processed_data/subtask_parse_ron/tm/tm_ron1.latency.
     12 * 12 * 494
   - Cister RSSI: telos
     processed_data/subtask_parse_telos_rssi/tm/tm_telos_rssi.txt
     10000 * 16
   - CU RSSI: multi location
     processed_data/subtask_parse_multi_loc_rssi/tm/tm_multi_loc_rssi.txt
     500 * 895 (179 nodes * 5 monitors)
   - Channel CSI
     condor_data/subtask_parse_csi_channel/csi/static_trace13.ant1.mag.txt
     5000 * 270
   - UCSB Meshnet
     condor_data/subtask_parse_ucsb_meshnet/tm/tm_ucsb_meshnet.connected.txt
     1527 * 425
   - UMich RSS
     condor_data/subtask_parse_umich_rss/tm/tm_umich_rss.txt
     3127 * 182

/*************************
 * Subtasks
 *************************/

1. subtask_process_4sq
   a) generate_city_info.py
      - Goal: read 4sq check-ins and produce the information of all venues in the dataset.
      - Input:
        1. city: the name of the city, e.g. Airport, Manhattan, Austin, San_Francisco
      - Output:
        1. ../processed_data/subtask_process_4sq/combined_city_info/4SQ__INFO
           The information of the venues in the city. Will be linked to ../data/4sq/city_info/4SQ__INFO and used in generating the TM.
           Format: venues - ['lat', 'lng', 'id', 'name', 'checkinsCount']
      - Batch Run:
        1. batch_generate_city_info.sh
   b) generate_Human_TM.py
      - Goal: read 4sq check-ins and produce a human traffic matrix.
      - Input:
        1. period: generate a traffic matrix with "period" days of check-in data.
        2. city: the name of the city, e.g. Airport, Manhattan, Austin, San_Francisco
      - Output:
        1. ../processed_data/subtask_process_4sq/TM/_sorted.txt
           The order of airports in the TM.
           - Format:
        2. ../processed_data/subtask_process_4sq/TM/TM__period_.txt
           The Human Traffic Matrix using days of data.
      - Variables:
        1.
           user_hist: userid - ts - ['last', 'gender', 'userid', 'ts', 'home', 'first', 'lat', 'lng', 'venue', 'venue_id']
      - Batch Run:
        1. batch_generate_Human_TM.sh
   c) plot_TM.mother.plot
      - Goal: given the Human TM generated above, plot the heat map using Gnuplot.
      - Output:
        1. ../figures/subtask_process_4sq/TM/TM_period_.eps
      - Batch Run:
        1. batch_plot_TM.pl

2. subtask_psnr
   Compare the PSNR of videos compressed using MPEG and using PCA. These scripts also output the compressed video for anomaly detection.
   a) PCA_psnr.m
      - Goal: calculate the PSNR of a video which is compressed by PCA low-rank approximation.
        step 1: given a video with [frame, width, height, YUV] pixels.
        step 2: convert to a 2D matrix: [frame, width * height * YUV]
        step 3: fragment the 2D matrix into small ones: fragment i = [frame, x_i:y_i]
        step 4: apply PCA to each fragment with a given rank. The rank decides the compression ratio and quality.
        step 5: reconstruct the approximated matrix
        step 6: calculate PSNR
      - Input:
        1. num_PC: number of PCs to use (i.e. rank)
        2. video_name: the name of the raw video (assumed video format: YUV CIF 4:2:0)
        3. frames: number of frames to analyze
        4. width: the width of the video
        5. height: the height of the video
      - Output:
        1. PSNR: the PSNR (dB) of the PCA low-rank approximation.
        2. compressed size: the size of the PCA approximation, e.g. the size of the principal components and eigenvectors.
      - Batch Run:
        1. batch_PCA_psnr.m
   b) PCA_psnr_by_frame.m
      - Goal: calculate the PSNR of a video which is compressed by PCA low-rank approximation.
        step 1: given a video with [frame, width, height, YUV] pixels.
        step 2: make a GoP every [4 or 8 or 16] frames
        step 3: convert each GoP into a 2D matrix.
        step 4: apply DCT to the 2D matrix.
        step 5: apply PCA to the DCT output with a given rank.
        step 6: reconstruct the approximated matrix
        step 7: apply inverse DCT to the approximated matrix
        step 8: calculate PSNR
   c) DCT_psnr.m
      - Goal: calculate the PSNR of a video which is compressed using 3D DCT.
        step 1:
        given a video with [frame, width, height, YUV] pixels.
        step 2: make a GoP every [4 or 8 or 16] frames
        step 3: apply 3D DCT to each GoP.
        step 4: partition a GoP into small chunks (e.g. 44x35 pixels per chunk)
        step 5: for each chunk, calculate the error after iDCT if the chunk is removed
        step 6: remove chunks with small error
        step 7: apply inverse DCT to the matrix with only the remaining chunks
        step 8: calculate PSNR
      - DCT_psnr_combine_yuv.m
        The only difference in this code is that it applies DCT on a 4D array: [width, height, frame, YUV].
   d) DCT_psnr_combine_yuv.m
      - Goal: similar to "DCT_psnr.m", but instead of handling Y, U, and V separately, this one combines them into one matrix and applies 3D DCT to the combined matrix.
        Note: it is too slow (due to the larger matrix), so it is not used for now.
   e) compressive_sensing_psnr.m
      - Goal: calculate the PSNR of a video which is compressed using compressive sensing.
        step 1: given a video with [frame, width, height, YUV] pixels.
        step 2: make a GoP every [4 or 8 or 16] frames
        step 3: apply compressive sensing with spatial and temporal constraints to each GoP.
        step 4: reconstruct the GoP using U and V returned by compressive sensing
        step 5: calculate PSNR
   f) yuv_psnr.m
      - Goal: calculate the PSNR of two videos.
      - Input:
        1. video_name1: file name and path of the 1st video.
        2. video_name2: file name and path of the 2nd video.
        3. frames: number of frames to analyze
        4. width: the width of the video
        5. height: the height of the video

3. subtask_inject_error
   The objective of this task is to inject anomalies into a given matrix or video.
   a) inject_err.m
      - Goal: inject anomalies by adding some large numbers to the given matrix.

4. subtask_TM_to_video
   Convert the matrix to a YUV video. Because ffmpeg only works on video, before implementing our own MPEG encoding, we need to convert the TM to video in order to apply the MPEG-based anomaly detection method.
   a) TM_to_video.m
      - Goal: convert the given 3D matrix to a YUV video.
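The low-rank approximation at the heart of PCA_psnr.m (steps 4-6 above) and the PSNR metric shared by these scripts can be sketched in NumPy. This is an illustrative re-expression of the idea, not the repository's MATLAB code; it uses a plain truncated SVD rather than mean-centered PCA.

```python
import numpy as np

def low_rank_approx(X, rank):
    """Approximate X by keeping only its top `rank` singular components
    (a truncated SVD; classical PCA would subtract column means first)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

def psnr(orig, approx, peak=255.0):
    """PSNR in dB between the original signal and its reconstruction."""
    mse = np.mean((np.asarray(orig, float) - np.asarray(approx, float)) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(peak * peak / mse)
```

For a [frame, width * height * YUV] matrix as in step 2, the compressed representation costs roughly rank * (rows + cols + 1) values instead of rows * cols, which is where the compression-ratio/quality trade-off of num_PC comes from.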
        Since a pixel in the YUV video only has 1 byte, I put the 1st byte in V, the 2nd byte in U, and the 3rd byte in Y (assuming the values in the matrix have at most 3 bytes).
   b) sanity_check.m
      - Goal: check that my implementation is correct.

5. subtask_ffmpeg
   Use ffmpeg to convert raw YUV video to MPEG, and also to convert MPEG back to YUV video.
   a) batch_convert.sh
   b) batch_convert_TM.sh

6. subtask_detect_anomaly
   After getting the normal subspace (i.e. the compressed video) using the scripts in "subtask_psnr" and "subtask_ffmpeg", the scripts here calculate the abnormal subspace (i.e. the difference between the raw video and the compressed video) and detect anomalies.
   a) diff_orig_comp_video.m
      - Goal: calculate the difference between the raw video and the compressed video.
   b) detect_anomaly.m
      - Goal: given the difference time series from "diff_orig_comp_video.m", detect anomalies and report the detection performance.

/*************************
 * Helpers:
 * ./utils/
 *************************/

1. Matlab Lib: YUV2Image
   - YUV2Image/
   - http://www.mathworks.com/matlabcentral/fileexchange/6318-convert-yuv-cif-420-video-file-to-image-files/
2. Gene Lee's codes
   - data.py, googlemaps.py, utils.py
   - Used to process the FourSquare dataset.
3. PSNR
   - calculate_psnr.m
   - Code copied from somewhere online to calculate the PSNR of a video.
4. Matlab Lib: DCT/IDCT
   - mirt_dctn
   - http://www.mathworks.com/matlabcentral/fileexchange/24050-multidimensional-discrete-cosine-transform-dct
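The 3-byte packing that TM_to_video.m applies in subtask 4 above (1st byte to V, 2nd to U, 3rd to Y) is a base-256 split of each matrix value. A minimal sketch with its inverse follows; treating the "1st byte" as the most significant one is an assumption, since the README does not say which end comes first.

```python
def pack_value(v):
    """Split a non-negative value of at most 3 bytes into (Y, U, V) bytes.
    Byte order (most significant byte -> V) is an assumption; the README
    only says the 1st byte goes to V, the 2nd to U, and the 3rd to Y."""
    if not 0 <= v < 1 << 24:
        raise ValueError("value must fit in 3 bytes")
    v_byte = (v >> 16) & 0xFF  # 1st byte -> V plane
    u_byte = (v >> 8) & 0xFF   # 2nd byte -> U plane
    y_byte = v & 0xFF          # 3rd byte -> Y plane
    return y_byte, u_byte, v_byte

def unpack_value(y, u, v):
    """Recombine the three plane bytes into the original matrix value."""
    return (v << 16) | (u << 8) | y
```

Note that in a true 4:2:0 layout the U and V planes are subsampled 2x2 relative to Y, so a lossless per-pixel packing like this only holds before chroma subsampling is applied.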