

news 2025/10/17 7:21:13

Contents: 0 Preface, 1 Dataset, 2 Building the Network, 3 Model Training, 4 Model Performance Evaluation, 5 Text Prediction, 6 Final Notes

0 Preface

In this installment of the high-quality competition project series, I am sharing deep-learning Chinese character recognition. The project is fairly novel and well suited as a competition topic, and I highly recommend it. My overall rating for the topic (each item scored out of 5): difficulty 3, workload 3, innovation 4.

More materials and project sharing: https://gitee.com/dancheng-senior/postgraduate

1 Dataset

I have a dataset of printed-font images covering the 3,755 characters of the Level-1 Chinese character set, which we will use to build a recognition system for those 3,755 characters. Doing text recognition with deep learning naturally means using a CNN, but which classic architecture: VGG, ResNet, or something else? Deeper networks should in principle produce better models, but considering the difficulty of training and the prediction speed needed for later online deployment, I decided to start with a relatively shallow network (an improved LeNet) for basic character recognition, and then try other architectures as the project requires. The deep learning framework used for this task is TensorFlow.

2 Building the Network

The first step is to build the network and the computation graph. Character recognition is essentially a multi-class classification task: recognizing these 3,755 characters is a 3,755-class classification problem. The network we define is very simple, basically an improved version of LeNet; note that we added batch normalization. The loss function is sparse_softmax_cross_entropy_with_logits, and the optimizer is Adam with the learning rate set to 0.1.

```python
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.contrib)
import tensorflow.contrib.slim as slim
from tensorflow.python.ops import control_flow_ops

# FLAGS (charset_size, etc.) is defined elsewhere in the script via tf.app.flags (not shown).

# network: conv2d-max_pool2d-conv2d-max_pool2d-conv2d-max_pool2d-conv2d-conv2d-max_pool2d-fully_connected-fully_connected
def build_graph(top_k):
    keep_prob = tf.placeholder(dtype=tf.float32, shape=[], name='keep_prob')
    images = tf.placeholder(dtype=tf.float32, shape=[None, 64, 64, 1], name='image_batch')
    labels = tf.placeholder(dtype=tf.int64, shape=[None], name='label_batch')
    is_training = tf.placeholder(dtype=tf.bool, shape=[], name='train_flag')
    with tf.device('/gpu:5'):
        # Default arguments for slim.conv2d and slim.fully_connected: apply batch_norm
        with slim.arg_scope([slim.conv2d, slim.fully_connected],
                            normalizer_fn=slim.batch_norm,
                            normalizer_params={'is_training': is_training}):
            conv3_1 = slim.conv2d(images, 64, [3, 3], 1, padding='SAME', scope='conv3_1')
            max_pool_1 = slim.max_pool2d(conv3_1, [2, 2], [2, 2], padding='SAME', scope='pool1')
            conv3_2 = slim.conv2d(max_pool_1, 128, [3, 3], padding='SAME', scope='conv3_2')
            max_pool_2 = slim.max_pool2d(conv3_2, [2, 2], [2, 2], padding='SAME', scope='pool2')
            conv3_3 = slim.conv2d(max_pool_2, 256, [3, 3], padding='SAME', scope='conv3_3')
            max_pool_3 = slim.max_pool2d(conv3_3, [2, 2], [2, 2], padding='SAME', scope='pool3')
            conv3_4 = slim.conv2d(max_pool_3, 512, [3, 3], padding='SAME', scope='conv3_4')
            conv3_5 = slim.conv2d(conv3_4, 512, [3, 3], padding='SAME', scope='conv3_5')
            max_pool_4 = slim.max_pool2d(conv3_5, [2, 2], [2, 2], padding='SAME', scope='pool4')

            flatten = slim.flatten(max_pool_4)
            fc1 = slim.fully_connected(slim.dropout(flatten, keep_prob), 1024,
                                       activation_fn=tf.nn.relu, scope='fc1')
            logits = slim.fully_connected(slim.dropout(fc1, keep_prob), FLAGS.charset_size,
                                          activation_fn=None, scope='fc2')
        # The labels are not one-hot encoded, so use sparse_softmax_cross_entropy_with_logits
        loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels))
        accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), labels), tf.float32))

        # Make the loss depend on the batch-norm moving-average update ops
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        if update_ops:
            updates = tf.group(*update_ops)
            loss = control_flow_ops.with_dependencies([updates], loss)

        global_step = tf.get_variable("step", [], initializer=tf.constant_initializer(0.0), trainable=False)
        optimizer = tf.train.AdamOptimizer(learning_rate=0.1)
        train_op = slim.learning.create_train_op(loss, optimizer, global_step=global_step)
        probabilities = tf.nn.softmax(logits)

        # Summaries used to plot the loss and accuracy curves
        tf.summary.scalar('loss', loss)
        tf.summary.scalar('accuracy', accuracy)
        merged_summary_op = tf.summary.merge_all()
        # Return the top-k predictions with their probabilities, plus the top-k accuracy
        predicted_val_top_k, predicted_index_top_k = tf.nn.top_k(probabilities, k=top_k)
        accuracy_in_top_k = tf.reduce_mean(tf.cast(tf.nn.in_top_k(probabilities, labels, top_k), tf.float32))

    return {'images': images,
            'labels': labels,
            'keep_prob': keep_prob,
            'top_k': top_k,
            'global_step': global_step,
            'train_op': train_op,
            'loss': loss,
            'is_training': is_training,
            'accuracy': accuracy,
            'accuracy_top_k': accuracy_in_top_k,
            'merged_summary_op': merged_summary_op,
            'predicted_distribution': probabilities,
            'predicted_index_top_k': predicted_index_top_k,
            'predicted_val_top_k': predicted_val_top_k}
```
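To see why the sparse variant is used, here is a minimal pure-Python sketch (not TensorFlow code) showing that cross-entropy computed from an integer class index equals cross-entropy computed from the matching one-hot vector. `tf.nn.sparse_softmax_cross_entropy_with_logits` accepts the integer labels directly, so the 3,755-way labels never have to be expanded into one-hot vectors. The helper names and the toy logits are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_cross_entropy(logits, label):
    """Loss when the label is an integer class index (no one-hot encoding)."""
    return -math.log(softmax(logits)[label])


def one_hot(label, num_classes):
    """Expand an integer label into a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def dense_cross_entropy(logits, target):
    """Loss when the label is given as a one-hot (or probability) vector."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs))


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0, 3.0]  # toy scores for 4 classes (the real model has 3,755)
    label = 3
    sparse = sparse_cross_entropy(logits, label)
    dense = dense_cross_entropy(logits, one_hot(label, len(logits)))
    print(math.isclose(sparse, dense))  # True
```

Only the probability of the true class contributes to the loss, which is why storing a single integer per example is enough.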
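3 Model Training

Before training, we should design how to feed data to the network efficiently. First we create a dataflow graph made up of pipeline stages connected by queues. The first stage generates file names; we read these file names and put them into a filename queue. The second stage reads data from the files (using a Reader), produces examples, and puts them into an example queue. Depending on your settings, you can also replicate the second stage so that the copies are independent of one another, which lets you read from multiple files in parallel. At the end of the second stage is an enqueue operation that pushes examples into a queue, from which the next stage dequeues them. Because we start threads that run these enqueue operations, our training loop can keep dequeuing examples from the example queue.

Without extra threads, enqueue operations would run in the main thread, but a Session can run operations from multiple threads concurrently. For data input, enqueueing means reading input from disk into memory, which is slow. A QueueRunner creates a set of new threads that perform the enqueue operations, so the main thread can keep consuming data. In neural-network training this makes training and data loading asynchronous: the main thread trains the network while other threads read data from disk into memory.

```python
import tensorflow as tf  # TensorFlow 1.x


class DataIterator:
    # self.image_names, self.labels, self.size and self.data_augmentation are set up
    # elsewhere in this class (not shown); FLAGS is defined at module level.

    # Batch generation
    def input_pipeline(self, batch_size, num_epochs=None, aug=False):
        # Convert the Python lists / numpy arrays to tensors
        images_tensor = tf.convert_to_tensor(self.image_names, dtype=tf.string)
        labels_tensor = tf.convert_to_tensor(self.labels, dtype=tf.int64)
        # Slice image_list and label_list into per-example (file name, label) pairs
        input_queue = tf.train.slice_input_producer([images_tensor, labels_tensor], num_epochs=num_epochs)

        labels = input_queue[1]
        images_content = tf.read_file(input_queue[0])
        images = tf.image.convert_image_dtype(tf.image.decode_png(images_content, channels=1), tf.float32)
        if aug:
            images = self.data_augmentation(images)
        new_size = tf.constant([FLAGS.image_size, FLAGS.image_size], dtype=tf.int32)
        images = tf.image.resize_images(images, new_size)
        image_batch, label_batch = tf.train.shuffle_batch([images, labels], batch_size=batch_size,
                                                          capacity=50000, min_after_dequeue=10000)
        # print('image_batch', image_batch.get_shape())
        return image_batch, label_batch
```

With data reading organized as described above, the training code is designed as follows:

```python
import os
import time

import tensorflow as tf  # TensorFlow 1.x

# DataIterator, build_graph, FLAGS, logger and gpu_options are defined elsewhere in Chinese_OCR.py.


def train():
    print('Begin training')
    # Paths to the training and test data
    train_feeder = DataIterator(data_dir='./dataset/train/')
    test_feeder = DataIterator(data_dir='./dataset/test/')
    model_name = 'chinese-rec-model'
    with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, allow_soft_placement=True)) as sess:
        # Batched input tensors
        train_images, train_labels = train_feeder.input_pipeline(batch_size=FLAGS.batch_size, aug=True)
        test_images, test_labels = test_feeder.input_pipeline(batch_size=FLAGS.batch_size)
        graph = build_graph(top_k=1)  # top k = 1 during training
        saver = tf.train.Saver()
        sess.run(tf.global_variables_initializer())

        # Thread coordinator for the queue runners
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        train_writer = tf.summary.FileWriter(FLAGS.log_dir + '/train', sess.graph)
        test_writer = tf.summary.FileWriter(FLAGS.log_dir + '/val')
        start_step = 0
        # Optionally resume training from the checkpoint saved at some step
        if FLAGS.restore:
            ckpt = tf.train.latest_checkpoint(FLAGS.checkpoint_dir)
            if ckpt:
                saver.restore(sess, ckpt)
                print("restore from the checkpoint {0}".format(ckpt))
                start_step = int(ckpt.split('-')[-1])

        logger.info(':::Training Start:::')
        try:
            i = 0
            while not coord.should_stop():
                i += 1
                start_time = time.time()
                train_images_batch, train_labels_batch = sess.run([train_images, train_labels])
                feed_dict = {graph['images']: train_images_batch,
                             graph['labels']: train_labels_batch,
                             graph['keep_prob']: 0.8,
                             graph['is_training']: True}
                _, loss_val, train_summary, step = sess.run(
                    [graph['train_op'], graph['loss'], graph['merged_summary_op'], graph['global_step']],
                    feed_dict=feed_dict)
                train_writer.add_summary(train_summary, step)
                end_time = time.time()
                logger.info("the step {0} takes {1} loss {2}".format(step, end_time - start_time, loss_val))
                if step > FLAGS.max_steps:
                    break
                # Evaluate on one test batch every eval_steps steps
                if step % FLAGS.eval_steps == 1:
                    test_images_batch, test_labels_batch = sess.run([test_images, test_labels])
                    feed_dict = {graph['images']: test_images_batch,
                                 graph['labels']: test_labels_batch,
                                 graph['keep_prob']: 1.0,
                                 graph['is_training']: False}
                    accuracy_test, test_summary = sess.run([graph['accuracy'], graph['merged_summary_op']],
                                                           feed_dict=feed_dict)
                    if step > 300:
                        test_writer.add_summary(test_summary, step)
                    logger.info('Eval a batch')
                    logger.info('the step {0} test accuracy: {1}'.format(step, accuracy_test))
                    logger.info('Eval a batch')
                # Save a checkpoint every save_steps steps
                if step % FLAGS.save_steps == 1:
                    logger.info('Save the ckpt of {0}'.format(step))
                    saver.save(sess, os.path.join(FLAGS.checkpoint_dir, model_name),
                               global_step=graph['global_step'])
        except tf.errors.OutOfRangeError:
            logger.info('Train Finished')
            saver.save(sess, os.path.join(FLAGS.checkpoint_dir, model_name), global_step=graph['global_step'])
        finally:
            # Stop and join the reader threads when training ends (e.g. after max_steps)
            coord.request_stop()
            coord.join(threads)
```

Run the following command to train the model. Because I used a TITAN X, training did not take long: roughly one hour. The loss and accuracy curves during training are shown in the figure below. The command sets the maximum number of training steps to 16002, runs validation every 100 steps, and saves a checkpoint every 500 steps:

```bash
python Chinese_OCR.py --mode=train --max_steps=16002 --eval_steps=100 --save_steps=500
```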
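The queue-based input pipeline described at the start of this section is a producer/consumer pattern: reader threads (the role QueueRunner plays) fill a bounded queue while the training loop dequeues batches. The following is a plain-Python analogy using `threading` and `queue.Queue`, not TensorFlow's API. The function names, the sentinel object standing in for `tf.errors.OutOfRangeError`, and the fake "decoding" step are illustrative assumptions, and the sketch omits the shuffling that `tf.train.shuffle_batch` performs.

```python
import queue
import threading

_END = object()  # sentinel: plays the role of tf.errors.OutOfRangeError for one reader


def reader(file_names, example_queue):
    """Producer: 'reads' each file and enqueues one example per file."""
    for name in file_names:
        example = "decoded:" + name  # stand-in for reading and decoding an image from disk
        example_queue.put(example)  # blocks while the bounded queue is full
    example_queue.put(_END)


def run_pipeline(file_shards, batch_size, capacity=8):
    """Start one reader thread per shard and consume examples in fixed-size batches."""
    example_queue = queue.Queue(maxsize=capacity)
    threads = [threading.Thread(target=reader, args=(shard, example_queue), daemon=True)
               for shard in file_shards]
    for t in threads:
        t.start()

    batches, batch, finished = [], [], 0
    # Consumer (the "training loop"): dequeue until every reader has signalled the end
    while finished < len(threads):
        item = example_queue.get()
        if item is _END:
            finished += 1
            continue
        batch.append(item)
        if len(batch) == batch_size:
            batches.append(batch)  # a real loop would run one training step here
            batch = []
    for t in threads:
        t.join()
    # Leftover examples (fewer than batch_size) are dropped, mirroring the default
    # allow_smaller_final_batch=False of tf.train.shuffle_batch.
    return batches


if __name__ == "__main__":
    shards = [["a.png", "b.png", "c.png"], ["d.png", "e.png"]]
    print(len(run_pipeline(shards, batch_size=2)))  # 2
```

The bounded queue is what decouples the two sides: readers block only when the queue is full, and the consumer blocks only when it is empty.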
4 Model Performance Evaluation

Next we evaluate the model by computing its top-1 and top-5 accuracy. Run:

```bash
python Chinese_OCR.py --mode=validation
```

```python
import time

import tensorflow as tf  # TensorFlow 1.x

# DataIterator, build_graph, FLAGS, logger and gpu_options are defined elsewhere in Chinese_OCR.py.


def validation():
    print('Begin validation')
    test_feeder = DataIterator(data_dir='./dataset/test/')

    final_predict_val = []
    final_predict_index = []
    groundtruth = []

    with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, allow_soft_placement=True)) as sess:
        test_images, test_labels = test_feeder.input_pipeline(batch_size=FLAGS.batch_size, num_epochs=1)
        graph = build_graph(top_k=5)
        saver = tf.train.Saver()

        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())  # initialize test_feeder's internal state (epoch counter)

        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        ckpt = tf.train.latest_checkpoint(FLAGS.checkpoint_dir)
        if ckpt:
            saver.restore(sess, ckpt)
            print("restore from the checkpoint {0}".format(ckpt))

        logger.info(':::Start validation:::')
        try:
            i = 0
            acc_top_1, acc_top_k = 0.0, 0.0
            while not coord.should_stop():
                i += 1
                start_time = time.time()
                test_images_batch, test_labels_batch = sess.run([test_images, test_labels])
                feed_dict = {graph['images']: test_images_batch,
                             graph['labels']: test_labels_batch,
                             graph['keep_prob']: 1.0,
                             graph['is_training']: False}
                batch_labels, probs, indices, acc_1, acc_k = sess.run([graph['labels'],
                                                                       graph['predicted_val_top_k'],
                                                                       graph['predicted_index_top_k'],
                                                                       graph['accuracy'],
                                                                       graph['accuracy_top_k']],
                                                                      feed_dict=feed_dict)
                # Accumulate predictions and labels across batches
                final_predict_val += probs.tolist()
                final_predict_index += indices.tolist()
                groundtruth += batch_labels.tolist()
                acc_top_1 += acc_1
                acc_top_k += acc_k
                end_time = time.time()
                logger.info("the batch {0} takes {1} seconds, accuracy = {2}(top_1) {3}(top_k)"
                            .format(i, end_time - start_time, acc_1, acc_k))

        except tf.errors.OutOfRangeError:
            # The single epoch is exhausted: turn summed per-batch accuracies into overall accuracies
            logger.info('Validation Finished')
            acc_top_1 = acc_top_1 * FLAGS.batch_size / test_feeder.size
            acc_top_k = acc_top_k * FLAGS.batch_size / test_feeder.size
            logger.info('top 1 accuracy {0} top k accuracy {1}'.format(acc_top_1, acc_top_k))
        finally:
            coord.request_stop()
            coord.join(threads)
    return {'prob': final_predict_val, 'indices': final_predict_index, 'groundtruth': groundtruth}
```
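validation() returns the collected top-5 indices and the ground-truth labels. As a minimal plain-Python sketch (not the article's code), top-1 and top-k accuracy can also be computed directly from those lists by counting hits per example. This counts only the examples that were actually evaluated, so it does not depend on `test_feeder.size` matching that number (for example, if a final partial batch is never evaluated). The function name and the toy predictions below are made up for illustration.

```python
def top_k_accuracy(indices, groundtruth, k):
    """Fraction of examples whose true label appears among the first k predicted indices.

    indices: one list of predicted class indices per example, sorted by descending
             probability (the shape of validation()['indices'] when top_k=5).
    groundtruth: one integer label per example (validation()['groundtruth']).
    """
    if len(indices) != len(groundtruth):
        raise ValueError("indices and groundtruth must have the same length")
    if not groundtruth:
        return 0.0
    hits = sum(1 for pred, label in zip(indices, groundtruth) if label in pred[:k])
    return hits / len(groundtruth)


if __name__ == "__main__":
    # Toy results for 4 examples (made-up numbers, not the model's output)
    result = {
        "indices": [[7, 2, 9, 1, 0], [3, 7, 8, 5, 4], [6, 1, 2, 0, 9], [5, 4, 3, 2, 1]],
        "groundtruth": [7, 8, 6, 0],
    }
    print(top_k_accuracy(result["indices"], result["groundtruth"], 1))  # 0.5
    print(top_k_accuracy(result["indices"], result["groundtruth"], 5))  # 0.75
```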
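5 Text Prediction

The previous step only used our own generated dataset as the test set, which is not a very reliable measure of model performance: the text we need to recognize in everyday use is not as clean and regular as synthesized characters. So let's try the model on text from real scenes to genuinely test its generalization. First, write the prediction code:

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x
from PIL import Image

# build_graph, FLAGS, logger and gpu_options are defined elsewhere in Chinese_OCR.py.


def inference(name_list):
    print('inference')
    image_set = []
    # Resize and normalize every image
    for image in name_list:
        temp_image = Image.open(image).convert('L')
        # Image.ANTIALIAS was an alias of LANCZOS and was removed in Pillow 10
        temp_image = temp_image.resize((FLAGS.image_size, FLAGS.image_size), Image.LANCZOS)
        temp_image = np.asarray(temp_image) / 255.0
        temp_image = temp_image.reshape([-1, 64, 64, 1])
        image_set.append(temp_image)

    # allow_soft_placement: if the specified device does not exist, let TF choose a device automatically
    with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, allow_soft_placement=True)) as sess:
        logger.info('start inference')
        # images = tf.placeholder(dtype=tf.float32, shape=[None, 64, 64, 1])
        # Pass a shadow label 0. This label will not affect the computation graph.
        graph = build_graph(top_k=3)
        saver = tf.train.Saver()
        # Automatically load the most recently saved model
        ckpt = tf.train.latest_checkpoint(FLAGS.checkpoint_dir)
        if ckpt:
            saver.restore(sess, ckpt)
        val_list = []
        idx_list = []
        # Predict each image
        for item in image_set:
            temp_image = item
            predict_val, predict_index = sess.run([graph['predicted_val_top_k'], graph['predicted_index_top_k']],
                                                  feed_dict={graph['images']: temp_image,
                                                             graph['keep_prob']: 1.0,
                                                             graph['is_training']: False})
            val_list.append(predict_val)
            idx_list.append(predict_index)
    # return predict_val, predict_index
    return val_list, idx_list
```

Note that I store the character images to be recognized in a folder named tmp, with the images numbered in order. At recognition time we read all images in that directory into memory and recognize them one by one.

```python
import os


# Get the names of the images in the folder to be predicted
def get_file_list(path):
    list_name = []
    files = os.listdir(path)
    files.sort()
    for file in files:
        file_path = os.path.join(path, file)
        list_name.append(file_path)
    return list_name
```

Now let's use the trained model to predict Chinese characters and look at the results. First I used a screenshot tool to capture a passage from a paper (PDF), then used a character segmentation algorithm to split the passage into single characters, as shown below. A few characters failed to segment, so they were discarded.

Figure: a text passage captured from an article with a screenshot tool.

Figure: the segmented single characters (white text on a black background).

Finally, combining all recognized characters in order into a paragraph shows that every character in this passage was recognized correctly, which suggests our deep-learning OCR system works quite well.

With that, the OCR system supporting recognition of 3,755 Chinese characters is complete, and testing shows good results. The model has not been heavily optimized, yet its top-1 accuracy on our generated test set reached 99.9%, a very good result. Recognition under fairly ideal conditions therefore works well, but on complex scenes or heavily noisy text images the results may be less satisfactory, and further optimization for the specific scenario would be needed.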
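Assembling the paragraph depends on processing the images in the right order, and `get_file_list` uses `files.sort()`, which orders names lexicographically: files numbered without zero padding come out as `1.png, 10.png, 2.png`. The minimal sketch below (an assumption-laden illustration, not the article's code) sorts by the numbers embedded in file names and then joins each image's top-1 prediction into a string. `index_to_char` is a hypothetical mapping from class index to character, since the project's own label-to-character table is not shown in this article; zero-padded file names (`0001.png`, ...) would avoid the ordering problem without a custom key.

```python
import re


def natural_sort_key(path):
    """Split a path into text and integer chunks so '2.png' sorts before '10.png'."""
    # re.split with a capturing group puts the digit runs at odd positions
    return [int(chunk) if i % 2 else chunk for i, chunk in enumerate(re.split(r"(\d+)", path))]


def assemble_text(idx_list, index_to_char):
    """Join the top-1 prediction of each image into one string.

    idx_list: one [1, top_k] array or nested list per image, as returned by inference().
    index_to_char: hypothetical dict mapping class index -> character.
    """
    return "".join(index_to_char[int(pred[0][0])] for pred in idx_list)


if __name__ == "__main__":
    names = ["tmp/10.png", "tmp/2.png", "tmp/1.png"]
    print(sorted(names, key=natural_sort_key))  # ['tmp/1.png', 'tmp/2.png', 'tmp/10.png']
    index_to_char = {0: "深", 1: "度", 2: "学", 3: "习"}  # hypothetical, for illustration
    idx_list = [[[0, 2, 1]], [[1, 0, 3]], [[2, 3, 0]], [[3, 1, 2]]]
    print(assemble_text(idx_list, index_to_char))  # 深度学习
```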
6 Final Notes

More materials and project sharing: https://gitee.com/dancheng-senior/postgraduate

