Introduction to Data Science Homework

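Homework 1: click to download

How to put it: for this course I just quietly tell myself that I will get used to it. Quietly posting a GitHub link:

Homework 2: click to download

Again, here are a few GitHub links: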

https://github.com/lamalor/ds100/blob/182feda62685b0988f4b4afb52c85256f4ac7d92/hw6/hw6.ipynb

https://github.com/Dhanush123/data100/blob/master/hw6/hw6.ipynb

https://github.com/iewaij/introDataScience/blob/1eb96b44721ed6cc40761a7e3003f11a5c243cc3/legacy/material/homework/DS100%20SP17/hw5/hw5_solution.ipynb

https://github.com/dovahcrow/ds100/blob/30d629246c87f2fafbf10c155545c30344be23e6/sp17/hw/hw5/hw5_solution.ipynb

1. Changing Jupyter's default working directory

Run this command in the Anaconda Prompt:

jupyter notebook --generate-config

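This prints the location of the jupyter_notebook_config.py configuration file. Open that file, search for the keyword notebook_dir, set its value to the working directory you want, and uncomment the line (note: the path must not contain Chinese characters).

Then right-click the Jupyter Notebook shortcut, open Properties, remove the trailing %USERPROFILE% from the "Target" field, and click Apply, then OK.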
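The edit to jupyter_notebook_config.py described above can be sketched in Python. This is only an illustration, not part of the original notes: the helper name set_notebook_dir, the sample config text, and the directory D:\jupyter_workspace are hypothetical, and it assumes the generated file contains a commented line of the form # c.NotebookApp.notebook_dir = ''.

```python
import re


def set_notebook_dir(config_text: str, new_dir: str) -> str:
    """Return config_text with the notebook_dir line uncommented and set to new_dir."""
    # Match an optionally commented-out "c.<App>.notebook_dir = ..." line.
    pattern = re.compile(
        r"^#?[ \t]*(c\.\w+\.notebook_dir)[ \t]*=.*$", re.MULTILINE
    )
    # A function replacement keeps backslashes in Windows paths from being
    # interpreted as regex escape sequences.
    return pattern.sub(lambda m: f"{m.group(1)} = {new_dir!r}", config_text, count=1)


# Hypothetical excerpt of a freshly generated config file.
sample = "# c.NotebookApp.port = 8888\n# c.NotebookApp.notebook_dir = ''\n"

# Hypothetical target directory (ASCII only, as the note above requires).
updated = set_notebook_dir(sample, r"D:\jupyter_workspace")
print(updated)
```

Editing the file by hand should give the same result: the notebook_dir line loses its leading # and carries the new path, while every other line stays commented.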

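2. Using an Anaconda virtual Python environment in Jupyter

Open the Anaconda Prompt, activate a Python virtual environment you created earlier, and run conda install nb_conda inside it. After you restart the Jupyter Notebook server, the kernel for that environment becomes available.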
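To check that a notebook really runs in the environment you selected, a cell like the following can help. It is a small sketch using only the standard library; the function name describe_interpreter is made up for illustration, and the CONDA_DEFAULT_ENV variable is set by conda activate, so it may be missing depending on how the kernel was launched.

```python
import os
import sys


def describe_interpreter() -> dict:
    """Collect a few facts about the Python interpreter behind this kernel."""
    return {
        "executable": sys.executable,  # path to the running python binary
        "prefix": sys.prefix,          # root directory of the environment
        # May be None if the kernel was not started from an activated shell.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

If the printed prefix points into the environment's folder (rather than the base installation), the kernel switch worked.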

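3. Jupyter's two working modes and their shortcuts

There are two modes: command mode (blue border) and edit mode (green border).
In command mode, M switches the cell to Markdown and Y switches it to code.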

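4. Code auto-completion

Note: all of the steps below were done in the base environment; installing in other virtual environments does not seem to work.

Install nbextensions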

pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user

Install nbextensions_configurator

pip install jupyter_nbextensions_configurator
jupyter nbextensions_configurator enable --user

Restart Jupyter. The home page now shows a new Nbextensions tab; tick Hinterland there to enable code auto-completion.
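If the Nbextensions tab does not show up, one thing worth checking is whether both packages are importable from the interpreter that runs Jupyter. The snippet below is a hedged sketch using only importlib; the helper name installed_packages is hypothetical, and a successful import check does not by itself prove that the extensions were enabled.

```python
import importlib.util

# Import names of the two packages installed in the steps above.
REQUIRED = ("jupyter_contrib_nbextensions", "jupyter_nbextensions_configurator")


def installed_packages(names=REQUIRED) -> dict:
    """Map each top-level package name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = installed_packages()
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Running this in a notebook cell and in the Anaconda Prompt's python can reveal whether the packages landed in a different environment from the one Jupyter uses.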

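5. pandas tutorial [pan(el)-da(ta)-s]

1. pandas data comes in three kinds: the one-dimensional Series, the two-dimensional DataFrame, and the three-dimensional Panel (Panel was deprecated and later removed from pandas, so current versions only offer Series and DataFrame).

A Series behaves much like a dictionary.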


import numpy as np
import pandas as pd
data = np.random.randn(5)  # five random values (one-dimensional)
index = ['a', 'b', 'c', 'd', 'e']  # explicit index labels
s = pd.Series(data, index=index)
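The dictionary comparison can be made concrete with a short example. This is an illustrative sketch with made-up values, not code from the course: it shows label lookup, membership tests, and building a Series from a dict, plus one way a Series differs from a dict (vectorized arithmetic).

```python
import pandas as pd

# A Series with explicit labels, playing the role of dictionary keys.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

value_b = s['b']            # look up a value by its label, like a dict key
has_a = 'a' in s            # membership tests check the labels, not the values
labels = list(s.keys())     # keys() returns the index

# A dict converts directly: keys become the index, values become the data.
from_dict = pd.Series({'x': 1, 'y': 2})

# Unlike a dict, arithmetic applies to every element at once.
doubled = s * 2

print(value_b, has_a, labels)
print(doubled)
```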

Two-dimensional data can be built from: 1. a dict of Series, 2. a dict of lists, 3. a list of dicts.

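# Option 1: dict of Series; the keys 'one' and 'two' become columns
d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
# Option 2: dict of lists; the keys 'one' and 'two' become columns
d = {'one': [1, 2, 3, 4], 'two': [4, 3, 2, 1]}
# Option 3: list of dicts; the keys 'a', 'b', 'c' become columns
d = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(d)  # build a DataFrame (from whichever d was assigned last)
print(df)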
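To see how the three construction styles differ, it helps to keep each result in its own variable. The sketch below reuses the same sample data; the variable names are chosen for illustration only. It also shows that pandas fills positions with no data with NaN.

```python
import pandas as pd

# 1. Dict of Series: rows are the union of the Series indexes.
df_from_series = pd.DataFrame({
    'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
    'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
})

# 2. Dict of lists: all lists must have the same length; rows get 0..n-1.
df_from_lists = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [4, 3, 2, 1]})

# 3. List of dicts: each dict is one row; columns are the union of the keys.
df_from_records = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}])

print(df_from_series.shape, df_from_lists.shape, df_from_records.shape)

# Missing entries become NaN: row 'd' has no value in column 'one',
# and the first record has no key 'c'.
missing_in_series = pd.isna(df_from_series.loc['d', 'one'])
missing_in_records = pd.isna(df_from_records.loc[0, 'c'])
print(missing_in_series, missing_in_records)
```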

Selecting, adding, and deleting columns

df['column_name']   df.pop('column_name')   df.insert(position_index, 'new_column_name', values)
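The three column operations above can be tried on a tiny DataFrame. This is a minimal sketch with invented column names and values, meant only to show what each call does to the column order.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]})

# Select a column by name; the result is a Series.
col_one = df['one']

# Add a column by assignment; it is appended at the end.
df['three'] = df['one'] + df['two']

# Insert a column at a given position (0-based): here, right after 'one'.
df.insert(1, 'inserted', [7, 8, 9])

# pop() removes the column from df and returns it as a Series.
popped = df.pop('two')

print(list(df.columns))
print(popped.tolist())
```

Note that insert() and pop() both modify df in place rather than returning a new DataFrame.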

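Selecting rows, columns, and blocks

df.index    df.columns   df.values   df.describe()   df.T    df.dtypes
df.sort_index()    df.sort_values()
df.head()   df.tail()
df[row_slice]   df['column_name']   df.loc[[row_labels], [column_labels]]
# df.iloc   df.iat   (position-based selection)
df.drop('row_label')
df.append()   # removed in pandas 2.0; use pd.concat() there
df.shape   df.size   df.ndim
df.sum(axis=0)   df.sum(axis=1)   df.mean()   df.std()   # axis=0 is the default
Table-wise function application: pipe()   # operates on the whole DataFrame
Row- or column-wise function application: apply()   # operates on each row or column
Element-wise function application: applymap()   # operates on each individual element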
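Several entries from the list above are easier to remember next to a working example. The following sketch uses a made-up 3x2 DataFrame. Two assumptions are worth flagging: pd.concat stands in for df.append(), which newer pandas no longer provides, and the element-wise call picks DataFrame.map when it exists (newer pandas) and falls back to applymap otherwise.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]}, index=['a', 'b', 'c'])

# --- Selection ---
first_two_rows = df[0:2]             # slicing inside [] selects rows by position
col_x = df['x']                      # a name inside [] selects a column
block = df.loc[['a', 'c'], ['y']]    # loc selects a block by labels
by_position = df.iloc[0:2, 0:1]      # iloc selects a block by integer position
single = df.iat[1, 1]                # iat reads one scalar by position

# --- Dropping and appending rows ---
dropped = df.drop('a')               # returns a copy without row 'a'
new_row = pd.DataFrame({'x': [4], 'y': [40]}, index=['d'])
appended = pd.concat([df, new_row])  # stand-in for the removed df.append()

# --- Aggregation ---
col_sums = df.sum()                  # axis=0 (default): one result per column
row_sums = df.sum(axis=1)            # axis=1: one result per row


# --- Function application ---
def add_one(frame):
    """Table-wise function: receives the whole DataFrame."""
    return frame + 1


piped = df.pipe(add_one)
col_range = df.apply(lambda col: col.max() - col.min())  # once per column
elementwise = df.map if hasattr(df, 'map') else df.applymap
squared = elementwise(lambda v: v * v)                   # once per element

print(single, col_sums.tolist(), row_sums.tolist())
print(squared)
```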

In Python, use type(var) to check a variable's data type.

6. Showing line numbers in Jupyter

View >> Toggle Line Numbers

7. Garbled Chinese characters in matplotlib plots in Jupyter

https://blog.csdn.net/u014465934/article/details/80377470
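One commonly used fix, which may or may not match the linked article, is to point matplotlib at a font that contains Chinese glyphs before plotting. The sketch below assumes the SimHei font is installed (typical on Chinese-language Windows systems); on other machines a different installed CJK font name would be needed.

```python
import matplotlib

# Assumption: the SimHei font is available on this machine.
# Use it as the first choice for sans-serif text so Chinese labels render.
matplotlib.rcParams['font.sans-serif'] = ['SimHei']

# With a CJK font active, the Unicode minus sign can show up as a box,
# so fall back to the plain ASCII hyphen for negative numbers.
matplotlib.rcParams['axes.unicode_minus'] = False

print(matplotlib.rcParams['font.sans-serif'])
```

These settings need to run before the figure is drawn, for example in the first cell of the notebook.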