Excel文件读写
- 使用read_excel读取,读取后的结果为dataframe格式
- 读取excel文件和csv文件参数大致一样,但要考虑工作表sheet页
- 参数较多,可以自行控制,但很多时候使用默认参数
- 读取excel时,注意编码,常用编码为utf-8、gbk、gbk2312和gb18030等
- 使用to_excel快速保存为xlsx格式
import pandas as pd
import os
os.getcwd()
'D:\\Jupyter\\notebook\\Python数据清洗实战\\数据清洗之文件读写'
os.chdir('D:\\Jupyter\\notebook\\Python数据清洗实战\\数据')
df1 = pd.read_excel('meal_order_detail.xlsx', encoding='utf-8', sheet_name='meal_order_detail1')
df1.head(5)
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border=”1″ class=”dataframe”>
<thead>
<tr style="text-align: right;">
<th></th>
<th>detail_id</th>
<th>order_id</th>
<th>dishes_id</th>
<th>logicprn_name</th>
<th>parent_class_name</th>
<th>dishes_name</th>
<th>itemis_add</th>
<th>counts</th>
<th>amounts</th>
<th>cost</th>
<th>place_order_time</th>
<th>discount_amt</th>
<th>discount_reason</th>
<th>kick_back</th>
<th>add_inprice</th>
<th>add_info</th>
<th>bar_code</th>
<th>picture_file</th>
<th>emp_id</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>2956</td>
<td>417</td>
<td>610062</td>
<td>NaN</td>
<td>NaN</td>
<td>蒜蓉生蚝</td>
<td>0</td>
<td>1</td>
<td>49</td>
<td>NaN</td>
<td>2016-08-01 11:05:36</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>caipu/104001.jpg</td>
<td>1442</td>
</tr>
<tr>
<th>1</th>
<td>2958</td>
<td>417</td>
<td>609957</td>
<td>NaN</td>
<td>NaN</td>
<td>蒙古烤羊腿\r\n\r\n\r\n</td>
<td>0</td>
<td>1</td>
<td>48</td>
<td>NaN</td>
<td>2016-08-01 11:07:07</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>caipu/202003.jpg</td>
<td>1442</td>
</tr>
<tr>
<th>2</th>
<td>2961</td>
<td>417</td>
<td>609950</td>
<td>NaN</td>
<td>NaN</td>
<td>大蒜苋菜</td>
<td>0</td>
<td>1</td>
<td>30</td>
<td>NaN</td>
<td>2016-08-01 11:07:40</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>caipu/303001.jpg</td>
<td>1442</td>
</tr>
<tr>
<th>3</th>
<td>2966</td>
<td>417</td>
<td>610038</td>
<td>NaN</td>
<td>NaN</td>
<td>芝麻烤紫菜</td>
<td>0</td>
<td>1</td>
<td>25</td>
<td>NaN</td>
<td>2016-08-01 11:11:11</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>caipu/105002.jpg</td>
<td>1442</td>
</tr>
<tr>
<th>4</th>
<td>2968</td>
<td>417</td>
<td>610003</td>
<td>NaN</td>
<td>NaN</td>
<td>蒜香包</td>
<td>0</td>
<td>1</td>
<td>13</td>
<td>NaN</td>
<td>2016-08-01 11:11:30</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>caipu/503002.jpg</td>
<td>1442</td>
</tr>
</tbody>
</table>
</div>
df1 = pd.read_excel('meal_order_detail.xlsx', encoding='utf-8', sheet_name=0) # sheet_name 可根据索引值取工作簿 等于0即为第一个工作簿
df1.head(5)
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border=”1″ class=”dataframe”>
<thead>
<tr style="text-align: right;">
<th></th>
<th>detail_id</th>
<th>order_id</th>
<th>dishes_id</th>
<th>logicprn_name</th>
<th>parent_class_name</th>
<th>dishes_name</th>
<th>itemis_add</th>
<th>counts</th>
<th>amounts</th>
<th>cost</th>
<th>place_order_time</th>
<th>discount_amt</th>
<th>discount_reason</th>
<th>kick_back</th>
<th>add_inprice</th>
<th>add_info</th>
<th>bar_code</th>
<th>picture_file</th>
<th>emp_id</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>2956</td>
<td>417</td>
<td>610062</td>
<td>NaN</td>
<td>NaN</td>
<td>蒜蓉生蚝</td>
<td>0</td>
<td>1</td>
<td>49</td>
<td>NaN</td>
<td>2016-08-01 11:05:36</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>caipu/104001.jpg</td>
<td>1442</td>
</tr>
<tr>
<th>1</th>
<td>2958</td>
<td>417</td>
<td>609957</td>
<td>NaN</td>
<td>NaN</td>
<td>蒙古烤羊腿\r\n\r\n\r\n</td>
<td>0</td>
<td>1</td>
<td>48</td>
<td>NaN</td>
<td>2016-08-01 11:07:07</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>caipu/202003.jpg</td>
<td>1442</td>
</tr>
<tr>
<th>2</th>
<td>2961</td>
<td>417</td>
<td>609950</td>
<td>NaN</td>
<td>NaN</td>
<td>大蒜苋菜</td>
<td>0</td>
<td>1</td>
<td>30</td>
<td>NaN</td>
<td>2016-08-01 11:07:40</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>caipu/303001.jpg</td>
<td>1442</td>
</tr>
<tr>
<th>3</th>
<td>2966</td>
<td>417</td>
<td>610038</td>
<td>NaN</td>
<td>NaN</td>
<td>芝麻烤紫菜</td>
<td>0</td>
<td>1</td>
<td>25</td>
<td>NaN</td>
<td>2016-08-01 11:11:11</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>caipu/105002.jpg</td>
<td>1442</td>
</tr>
<tr>
<th>4</th>
<td>2968</td>
<td>417</td>
<td>610003</td>
<td>NaN</td>
<td>NaN</td>
<td>蒜香包</td>
<td>0</td>
<td>1</td>
<td>13</td>
<td>NaN</td>
<td>2016-08-01 11:11:30</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>caipu/503002.jpg</td>
<td>1442</td>
</tr>
</tbody>
</table>
</div>
os.getcwd()
'D:\\Jupyter\\notebook\\Python数据清洗实战\\数据'
df1.to_excel('asdf.xlsx', index=False, sheet_name='one')
正文完